
OverloadedError: Model is overloaded #34

Open

farshidbalan opened this issue Sep 3, 2023 · 1 comment

Comments

@farshidbalan

I am using the meta-llama/Llama-2-70b-chat-hf model on a data frame with 3000 rows, each containing a 500-token text. After 10 rows are processed, I get the following error:

```
in call_llama2_api(self, messages)
     79 def call_llama2_api(self, messages):
     80     huggingface.prompt_builder = "llama2"
---> 81     response = huggingface.ChatCompletion.create(
     82         model="meta-llama/Llama-2-70b-chat-hf",
     83         messages=messages,

/usr/local/lib/python3.10/dist-packages/easyllm/clients/huggingface.py in create(messages, model, temperature, top_p, top_k, n, max_tokens, stop, stream, frequency_penalty, debug)
    205 generated_tokens = 0
    206 for _i in range(request.n):
--> 207     res = client.text_generation(
    208         prompt,
    209         details=True,

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_client.py in text_generation(self, prompt, details, stream, model, do_sample, max_new_tokens, best_of, repetition_penalty, return_full_text, seed, stop_sequences, temperature, top_k, top_p, truncate, typical_p, watermark, decoder_input_details)
   1063     decoder_input_details=decoder_input_details,
   1064 )
-> 1065 raise_text_generation_error(e)
   1066
   1067 # Parse output

/usr/local/lib/python3.10/dist-packages/huggingface_hub/inference/_text_generation.py in raise_text_generation_error(http_error)
    472     raise IncompleteGenerationError(message) from http_error
    473 if error_type == "overloaded":
--> 474     raise OverloadedError(message) from http_error
    475 if error_type == "validation":
    476     raise ValidationError(message) from http_error

OverloadedError: Model is overloaded
```

Is there any way to fix this problem, such as increasing the rate limit?
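One workaround that might help in the meantime is to catch the error and retry with exponential backoff. A minimal sketch below, assuming `OverloadedError` is importable from the path shown in the traceback above (it may differ across huggingface_hub versions):

```python
import time

from easyllm.clients import huggingface
from huggingface_hub.inference._text_generation import OverloadedError

huggingface.prompt_builder = "llama2"

def call_llama2_with_retries(messages, max_retries=5, base_delay=2.0):
    """Retry the chat completion when the hosted endpoint reports it is
    overloaded, backing off exponentially between attempts."""
    for attempt in range(max_retries):
        try:
            return huggingface.ChatCompletion.create(
                model="meta-llama/Llama-2-70b-chat-hf",
                messages=messages,
            )
        except OverloadedError:
            if attempt == max_retries - 1:
                raise  # give up after the last attempt
            # wait 2s, 4s, 8s, ... before retrying
            time.sleep(base_delay * 2 ** attempt)
```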

@murdadesmaeeli

What happens if you give it 3000 rows with max tokens set to 250? Do you get the same error?

This might be an out-of-memory (OOM) or endpoint problem. It would be helpful if you posted your system's RAM and VRAM, both specs and usage.
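For example, something like this (a sketch; `max_tokens` is a parameter of easyllm's `create()`, per the signature in the traceback above):

```python
# halve the generation budget to test whether the error is load- or memory-related
response = huggingface.ChatCompletion.create(
    model="meta-llama/Llama-2-70b-chat-hf",
    messages=messages,
    max_tokens=250,
)
```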
