Building a sliding window rate limiter with Redis
For our Instagram crawler we needed a system to keep track of the amount of API calls we did to prevent us from hitting the rate limits. We could of course perform our HTTP requests without checking rate limits upfront, and wait until we get a 429 OAuthRateLimitException
from Instagram, but that would exhaust our tokens and block us from talking efficiently to their API.
All rate limits on the Instagram Platform are controlled separately for each access token, and on a sliding 1-hour window.
We're allowed to do 5000 API calls per access token each hour.
Every point on the axis represents an API call. The sliding window is an hour in this case. Every time we do an API call we add the timestamp (in microseconds) of the API call to a list. In pseudocode:
timestamps.push(Date.now());
When we're about to do an API call in our crawler we need to check if we're allowed to do one:
- How many calls did we do in the last hour?
callsInTheLastHour = timestamps.filter(timestamp => timestamp > now - slidingWindow);
count = callsInTheLastHour.length
- How many can we still do?
After we calculated the amount of API calls we did in the last hour we can calculate the remaining API calls:
remaining = maxCallsPerHour - count;
Let's say we did 4413
API calls in the last hour then we're allowed to do 587
more at this moment.
Great, we got our algorithm. Now we need some kind of database to store a list of timestamps grouped per access token. Maybe we could use MySQL or PostgreSQL? Yes, we could but then we would need a system that periodically removes outdated timestamps since neither MySQL or PostgreSQL allow us to set a time to life on a row. What about Memcached? Yes, that's also an option. Sadly Memcached doesn't have the concept of an array or list (we could serialize an array using our favourite programming language). We can do better. What about Redis? Yes, I like where this is going. Redis is a key/value store that supports lists, sets, sorted sets and more.
We're ready to translate our algorithm to Redis commands. Assuming you already have Redis installed, start a server: $ redis-server
. If you're on a mac, you can just $ brew install redis
to get it.
We're going to use a sorted set to hold our timestamps because it fits our needs.
> MULTI
> ZREMRANGEBYSCORE $accessToken 0 ($now - $slidingWindow)
> ZRANGE $accessToken 0 -1
> ZADD $accessToken $now $now
> EXPIRE $accessToken $slidingWindow
> EXEC
Let's break it down:
MULTI
to mark the start of a transaction block. Subsequent commands will be queued for atomic execution usingEXEC
.ZREMRANGEBYSCORE $accessToken 0 ($now - $slidingWindow)
to remove API call timestamps that were done before the start of the window.ZRANGE $accessToken 0 -1
to get a list of all API call timestamps that happened during the window.ZADD $accessToken $now $now
to add a log for the current API call that we're about to do.EXPIRE $accessToken $slidingWindow
to reset the expiry date for this sorted set of timestamps (for the current OAuth Token).EXEC
will execute all previously queued commands and restore the connection state to normal.
Instead of using the actual OAuth access tokens (and duplicating them to Redis), you might want to use an identifier or hash of the token instead as $accessToken
. It serves as the key for our Redis sorted set. Also note that, in the same transaction as reading the list of timestamps, we add a new timestamp to the list (the ZADD
command). We do this because this is being used in a distributed context (we have many workers performing API calls), and we don't want to write when we already exceeded our limits.
In PHP this might look like this:
// composer require predis/predis
require_once __DIR__ . '/vendor/autoload.php';
$maxCallsPerHour = 5000;
$slidingWindow = 3600;
$now = microtime(true);
$accessToken = md5('access-token');
$client = new Predis\Client();
$client->multi();
$client->zrangebyscore($accessToken, 0, $now - $slidingWindow);
$client->zrange($accessToken, 0, -1);
$client->zadd($accessToken, $now, $now);
$client->expire($accessToken, $slidingWindow);
$result = $client->exec();
// The second command inside the transaction was ZRANGE,
// which returns a list of timestamps within the last hour.
$timestamps = $result[1];
$remaining = max(0, $maxCallsPerHour - count($timestamps));
if ($remaining > 0) {
echo sprintf('%s: Allowed and %d remaining', $now, $remaining) . PHP_EOL;
} else {
echo sprintf('%s: Not allowed', $now) . PHP_EOL;
}
To conclude all of this, and to make this work within our codebase, we put this all nicely in a class, behind an interface RateLimiter
:
<?php
namespace CXSocial\RateLimiter;
interface RateLimiter
{
/**
* Request the remaining ratelimit points
*
* @param RateLimitedResource $rateLimitedResource
*
* @throws SorryRateLimitUnavailable
*
* @return int
*/
public function remaining(RateLimitedResource $rateLimitedResource);
}
This allows us to write code that doesn't couple to implemention too much. This has been working like a charm for us! We're huge fans.
Categories: Redis