Operations

Rate Limits and Quota Errors

Rate limits are calculated per account and per model, with usage aggregated across all API keys under the account. A customer quote should include traffic assumptions and an escalation path.


Who this is for

Teams moving from demo usage to production traffic.

Configuration reference

Values to confirm before setup

Limit scope: Main account aggregate across subaccounts, workspaces, and API keys

Model behavior: Different models have independent limit rules

Common symptoms: Requests rate limit exceeded, quota exceeded, traffic spike protection

Setup flow

Practical steps

  1. Collect expected RPM, TPM, peak traffic, and concurrency.
  2. Map each workload to a model.
  3. Check official limits for that model and account.
  4. Add retry/backoff logic.
  5. Set monitoring alerts and a replenishment plan.
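
The first three steps can be sketched as a small headroom check. This is a minimal illustration, not a provider tool: the Workload fields, the model names, and every number in LIMITS are hypothetical placeholders that should be replaced with the official limits for the account. Peak demand is summed per model because limits aggregate at the main-account level.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    model: str
    peak_rpm: int  # expected requests per minute at peak
    peak_tpm: int  # expected tokens per minute at peak


# Hypothetical limits; replace with the official values for your account.
LIMITS = {
    "model-a": {"rpm": 500, "tpm": 200_000},
    "model-b": {"rpm": 100, "tpm": 50_000},
}


def headroom_report(workloads, limits, alert_ratio=0.8):
    """Sum peak demand per model and flag models near or over a limit."""
    demand = {}
    for w in workloads:
        totals = demand.setdefault(w.model, {"rpm": 0, "tpm": 0})
        totals["rpm"] += w.peak_rpm
        totals["tpm"] += w.peak_tpm

    report = {}
    for model, totals in demand.items():
        if model not in limits:
            # No known limit for this model: it must be looked up first.
            report[model] = {"status": "unknown-limit"}
            continue
        rpm_use = totals["rpm"] / limits[model]["rpm"]
        tpm_use = totals["tpm"] / limits[model]["tpm"]
        worst = max(rpm_use, tpm_use)
        if worst >= 1.0:
            status = "over-limit"
        elif worst >= alert_ratio:
            status = "alert"
        else:
            status = "ok"
        report[model] = {"rpm_use": rpm_use, "tpm_use": tpm_use, "status": status}
    return report
```

The alert_ratio threshold (0.8 here, an arbitrary choice) can double as the trigger for the monitoring alert in step 5.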

Customer explanation

A 429-style failure does not always mean the key is invalid. It may mean request frequency, token usage, or a rapid traffic spike exceeded the allowed envelope.
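
To make that distinction concrete for support triage, the sketch below sorts a failure into a rough category from its HTTP status and message. The function name, the category labels, and the substring matching are assumptions for illustration; real error bodies vary by provider and should be checked against the official error reference.

```python
def classify_failure(status_code, message=""):
    """Return a rough category for a failed request.

    The substrings checked here mirror the common symptoms listed in this
    guide; they are assumed wording, not a documented error format.
    """
    text = message.lower()
    if status_code in (401, 403):
        return "auth"  # the key or its permissions are the likely problem
    if status_code == 429:
        if "quota" in text:
            return "quota"
        if "spike" in text:
            return "traffic-spike"
        return "rate-limit"
    return "other"
```

Only the "auth" category points at the key itself; the three 429 categories call for throttling, replenishment, or smoothing traffic instead.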

Operational setup

Production apps need backoff, queueing, fallback models, and usage dashboards. A token purchase alone does not solve throughput design.
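
A minimal sketch of backoff combined with fallback models follows. It uses capped exponential backoff with full jitter, which is one common choice rather than a provider requirement. The send callable, the RateLimited exception, and the default values are hypothetical; queueing and dashboards are out of scope for this example.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied send function on a 429-style failure."""


def backoff_delays(attempts, base=1.0, cap=30.0, rng=random.random):
    """Yield one capped exponential delay with full jitter per attempt."""
    for attempt in range(attempts):
        yield rng() * min(cap, base * 2 ** attempt)


def call_with_fallback(send, models, attempts_per_model=4, sleep=time.sleep):
    """Try each model in order, backing off after every RateLimited error.

    `send(model)` is a hypothetical callable that performs one request and
    either returns a result or raises RateLimited.
    """
    last_error = None
    for model in models:
        for delay in backoff_delays(attempts_per_model):
            try:
                return send(model)
            except RateLimited as err:
                last_error = err
                sleep(delay)  # wait before the next attempt or next model
    if last_error is None:
        raise ValueError("no models configured")
    raise last_error
```

Passing sleep and rng as parameters keeps the function testable without real waiting. Because different models have independent limit rules, falling back to a second model can succeed while the first is still throttled.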

Common mistakes

Check these before escalating

  • Multiple API keys under the same main account can still share aggregate limits.
  • Burst traffic can trigger protection before a published limit is fully reached.
  • Rate-limit increases are decided by the official provider.
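
One way to act on the first two points is a single client-side limiter shared by every API key under the main account, with a burst size kept well below the published limit. The token bucket below is a generic sketch of that idea, not a provider feature; the class name and parameters are assumptions, and suitable rate and burst values depend on the official limits.

```python
import threading
import time


class TokenBucket:
    """Client-side limiter: refills `rate` tokens per second, holds at most `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens and return True if available, else return False."""
        with self.lock:
            now = self.clock()
            elapsed = now - self.updated
            # Refill for the time elapsed, never exceeding the burst size.
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= cost:
                self.tokens -= cost
                return True
            return False
```

A caller that receives False can queue the request or wait briefly instead of sending it. The same class can meter tokens rather than requests by passing the estimated token count as cost.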
