Rate Limiting
Rate limiting is a technique that controls how often a user or system can perform an action or send requests within a given time period. It is used to prevent any single source from overwhelming a system by consuming too many resources too quickly. This helps protect services from abuse, automated attacks, and accidental overload.
More formally, rate limiting is a network- and application-layer control that enforces a threshold on the frequency or volume of requests a system sends or receives within a defined time window. Typically implemented at the network interface, API gateway, or application layer, it regulates inbound and outbound traffic to maintain stability, prevent resource exhaustion, and mitigate abuse patterns such as credential stuffing, scraping, and denial-of-service attempts. Enforcement may be scoped per IP address, user identity, API key, or other client identifier, and commonly relies on algorithms such as token bucket, leaky bucket, fixed window, or sliding window counters. Because rate limiting operates at the traffic-processing level, it can restrict excessive request rates at runtime, but it typically cannot distinguish legitimate high-volume users from malicious actors without additional contextual signals.
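To make the algorithm list concrete, here is a minimal sketch of one of them, the token bucket, in Python. The class name, parameters, and method are illustrative, not any particular library's API: a bucket holds up to `capacity` tokens, refills at `rate` tokens per second, and each request spends one token. Requests are allowed while tokens remain, which permits short bursts up to `capacity` while capping the sustained rate.

```python
import time


class TokenBucket:
    """Illustrative token-bucket rate limiter (not a production implementation)."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens replenished per second (sustained rate)
        self.capacity = capacity  # maximum tokens held (allowed burst size)
        self.tokens = capacity    # start full so initial bursts are permitted
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        now = time.monotonic()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

In a real deployment one such bucket would be kept per client identifier (IP address, API key, or user ID, as described above), often in shared storage such as Redis so that all server instances enforce the same limit.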
Why it matters
Without rate limiting, a single client or automated process can send requests at arbitrary volume, potentially exhausting server CPU, memory, database connections, or downstream API quotas. This resource exhaustion can degrade service availability for all users, regardless of whether the cause is a deliberate attack or an unintentionally aggressive client. Services exposed to the internet are particularly vulnerable because unauthenticated endpoints have no inherent constraint on how frequently they can be called.
Who it's relevant to
Inside Rate Limiting
Common questions
Answers to the questions practitioners most commonly ask about Rate Limiting.