feat(genapi): Update faq about maximum output tokens #5071

Merged
merged 2 commits on Jun 6, 2025

7 changes: 7 additions & 0 deletions pages/generative-apis/faq.mdx
@@ -117,6 +117,13 @@ To get started, explore the [Generative APIs Playground](/generative-apis/quicks
Yes, API rate limits define the maximum number of requests a user can make within a specific time frame, to ensure fair access and resource allocation between users. If you require increased rate limits (by a factor of 2 to 5), you can request them by [creating a ticket](https://console.scaleway.com/support/tickets/create). If you require even higher rate limits, especially to absorb infrequent peak loads, we recommend using [Managed Inference](https://console.scaleway.com/inference/deployments) instead, with dedicated provisioned capacity.
Refer to our dedicated [documentation](/generative-apis/reference-content/rate-limits/) for more information on rate limits.
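
On the client side, a common way to stay within these limits is to back off and retry when the API signals throttling. Below is a minimal Python sketch; it assumes the standard HTTP 429 status code and optional `Retry-After` header conventions, which are not confirmed by this FAQ:

```python
import time

import requests


def post_with_backoff(url: str, headers: dict, payload: dict, max_retries: int = 5):
    """POST with exponential backoff on HTTP 429 (rate limited)."""
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            return resp
        # Honor Retry-After if the server provides it; otherwise back off
        # exponentially (1s, 2s, 4s, ...).
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    return resp  # last 429 response after exhausting retries
```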

## Can I increase maximum output (completion) tokens for a model?
No, you cannot increase the maximum output tokens above the [limits for each model](https://www.scaleway.com/en/docs/generative-apis/reference-content/supported-models/) in Generative APIs.
These limits are in place to protect you against:
- Long generations, which may be cut off by an HTTP timeout. Limits are designed to ensure a model sends its HTTP response in less than 5 minutes.
- Uncontrolled billing, as several models are known to enter infinite generation loops (specific prompts can make a model generate the same sentence over and over, without ever stopping).
If you require higher maximum output tokens, you can use [Managed Inference](https://console.scaleway.com/inference/deployments), where these limits do not apply (your bill is bounded by the size of your deployment).
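
For reference, the output ceiling is applied per request through the `max_tokens` parameter. Below is a minimal Python sketch using the OpenAI-compatible client; the base URL, model name, and key are illustrative assumptions, not values confirmed by this FAQ:

```python
from openai import OpenAI

# Assumed values: adjust the endpoint, key, and model to your setup.
client = OpenAI(
    base_url="https://api.scaleway.ai/v1",  # assumed Generative APIs endpoint
    api_key="YOUR_SCALEWAY_SECRET_KEY",     # placeholder credential
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model name
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet."}],
    # Keep max_tokens at or below the model's documented output limit;
    # requesting more will not raise the ceiling the API enforces.
    max_tokens=1024,
)
print(response.choices[0].message.content)
```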

## What is the model lifecycle for Generative APIs?
Scaleway is dedicated to updating and offering the latest versions of generative AI models, ensuring improvements in capabilities, accuracy, and safety. As new versions of models are introduced, you can explore them through the Scaleway console. Learn more in our dedicated [documentation](/generative-apis/reference-content/model-lifecycle/).
