I am interested in the limits on a single request. I am working on a simple project: sending a request containing a prompt plus some text, where I ask the model to create tests based on that text. So what is the maximum number of characters ("symbols") allowed in a single request? Sorry if this is a dumb question…
Thanks in advance!
Usage is counted by the number of API calls, not by the number of tokens passed to the API, so there are almost no restrictions on the number of tokens. However, the Free plan has very few API calls available…
Here’s the per-call limit on the Free plan:
No plan-specific character cap. One request is bounded by the model’s context window (tokens) and the server HTTP body size. The plan only changes credits and rate limits, not per-request size. (Hugging Face)
HTTP body cap: Hugging Face’s default payload limit is ~2,000,000 bytes on both TGI and TEI. Larger bodies return 413. (Hugging Face)
Token cap: Your input tokens + requested output tokens must fit the model’s context. TGI also exposes a --max-input-tokens guard. (Hugging Face)
Rough “symbols” math: 1 token ≈ 3–4 English characters.
With a 128k-token model (e.g., Qwen2.5 or Llama-3.1), you can usually fit roughly 380k–500k characters of input if you reserve some tokens for output. Always check the model card for the actual context length. (Hugging Face)
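To make that estimate concrete, here is a minimal sketch of the arithmetic. The 4,000 tokens reserved for output is my own assumption, and `char_budget` is just an illustrative helper, not part of any library. Keep in mind the prompt itself also counts toward the input.

```python
def char_budget(context_tokens: int, reserved_output_tokens: int, chars_per_token: float) -> int:
    """Approximate number of input characters that fit in the context window."""
    input_tokens = context_tokens - reserved_output_tokens
    if input_tokens <= 0:
        raise ValueError("reserved output tokens exceed the context window")
    return int(input_tokens * chars_per_token)


# Assumed numbers: 128,000-token context, 4,000 tokens kept back for the answer.
low = char_budget(128_000, 4_000, 3)   # pessimistic: 3 characters per token
high = char_budget(128_000, 4_000, 4)  # optimistic: 4 characters per token
print(low, high)  # 372000 496000
```

The real ratio depends on the tokenizer and the language of the text (non-English text and code often use more tokens per character), so treat the result as a ballpark and count tokens with the model's own tokenizer if you need an exact figure.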
Bottom line: On Free, send as much as fits within the model’s token window and under ~2 MB JSON body. If you hit 413, shrink the payload; if you hit context limits, shorten or chunk the text. (Hugging Face)
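If it helps, here is a rough sketch of how you could check the body size before sending and split the text when it is too long. The helper names and the `{"inputs": ...}` payload shape are only placeholders for illustration; use whatever body your endpoint expects. The 2,000,000-byte figure is the approximate default mentioned above, and the measured size assumes the body is serialized the same way your HTTP client does it.

```python
import json

BODY_LIMIT_BYTES = 2_000_000  # approximate default payload cap quoted above


def body_size_bytes(payload: dict) -> int:
    """Size of the JSON body in UTF-8 bytes, as serialized by json.dumps."""
    return len(json.dumps(payload, ensure_ascii=False).encode("utf-8"))


def chunk_text(text: str, max_chars: int) -> list[str]:
    """Split text into pieces of at most max_chars, preferring paragraph breaks."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Hard-split any single paragraph that is longer than the budget.
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)] or [""]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


# Hypothetical usage: one request per chunk, each checked against the body cap.
text = "first paragraph\n\nsecond paragraph"
for chunk in chunk_text(text, max_chars=20):
    payload = {"inputs": "Create tests based on this text:\n" + chunk}
    assert body_size_bytes(payload) <= BODY_LIMIT_BYTES
```

Pick `max_chars` from your character estimate for the model's context window, minus the length of the prompt you prepend to each chunk.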