Using AWS SQS and Lambda Reserved Concurrency to Rate Limit

Today I want to do a deep dive into some AWS architecture decisions. I was able to use a lot of cool AWS tools and I want to share how and why. Let’s talk about AWS SQS and how it helps with rate limiting!

Set the scene

fun gif

The Secretary of State API scrapes…surprise, Secretary of State websites. I have a Lambda function for each state and then a function in front that is connected to AWS API Gateway. I’m going to call this my “router” function. A user makes a request, it goes to the router function, which forwards it on to the appropriate state.

I want to protect the sites I scrape.

The last thing I want is for a state’s site to crash because we hit it with too many requests at once. So I need to do some kind of rate limiting. But it’s not traditional rate limiting; it’s more like…reverse rate limiting.

I need to measure the amount of load on a specific target rather than the load coming from a specific user. A user can send tons of requests in, and as long as they aren’t all to the same site, have at it!

The opposite applies, however. If many users are all sending requests to the same site, then I need to gracefully rate limit. I don’t want a user’s request to be rejected, because that would be very confusing. They would have no context for when or why their request was being rejected.

Enter AWS’ Simple Queue Service (SQS).

But first! Long polling!

fun gif about waiting
I love this movie and book.

This isn’t the main point of this post so I’m going to keep it brief. Using the queue would have been pretty much impossible without already having implemented long polling.

With long polling, we can gracefully handle any request that takes too long. We originally added it as a way to handle states that took forever to scrape, such as Delaware.

When we added rate limiting we just implemented long polling for all states.

Now the current architecture

aws sqs for rate limiting diagram

There it is in all of its beauty. The flow ends up going something like this:

  • Request is received and goes to router (sos-search)
  • Router sends request to state-environment specific SQS queue
  • Lambda function polls queue when it has capacity and starts scrape
  • When scrape completes, it updates the long polling table
  • sos-search checks the long polling table every 250ms for the completed scrape
  • When it’s completed, sos-search sends back the response to the user
  • If it takes longer than ~20 seconds, sos-search returns a retryId to the user
  • User can send retryId to check on status of request
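The long-polling side of those steps can be sketched roughly like this. This is a minimal reconstruction in Node.js, not the production code; `checkTable` stands in for whatever lookup the real long polling table uses:

```javascript
// Rough reconstruction of the router's polling loop (not the production
// code). `checkTable` stands in for the real long polling table lookup,
// e.g. a DynamoDB GetItem on the retryId.
async function pollForResult(checkTable, retryId, intervalMs = 250, timeoutMs = 20000) {
  const start = Date.now();
  while (Date.now() - start < timeoutMs) {
    const result = await checkTable(retryId);
    if (result) {
      // Scrape finished within the window; return the data directly.
      return { status: 'complete', data: result };
    }
    // Wait 250ms (by default) before checking the table again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // Still running after ~20 seconds; hand back a retryId instead.
  return { status: 'pending', retryId };
}
```

The router would call something like this after dropping the message on the queue; on "pending" it returns the retryId to the user instead of the scrape result.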

Writing that all out makes it seem like a lot of steps. And it kind of is.

We can control the Lambda capacity by using “Reserved concurrency” for each state function.

aws lambda reserved concurrency

With the “Reserved concurrency” set to 5 there will never be more than 5 instances of the function running at once.
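I set this in the console, but for reference, the same cap can be applied with the SDK’s `putFunctionConcurrency` call. A rough sketch (the function name `sos-idaho` is made up for illustration):

```javascript
// Hypothetical sketch: capping a state function's concurrency with the
// AWS SDK (v2) instead of the console. The function name "sos-idaho"
// is made up for illustration.
function reservedConcurrencyParams(functionName, limit) {
  return {
    FunctionName: functionName,
    // Hard cap: Lambda will never run more than `limit` instances at once,
    // so the SQS trigger can only pull messages that fast.
    ReservedConcurrentExecutions: limit,
  };
}

const params = reservedConcurrencyParams('sos-idaho', 5);
// With the aws-sdk package installed, applying it would look like:
//   const lambda = new AWS.Lambda();
//   await lambda.putFunctionConcurrency(params).promise();
```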

SQS Code Sample

Here are a few code samples:

const sqsResponse = await sqs.sendMessage({
	QueueUrl: `${state}-${stage}`,
	MessageBody: JSON.stringify({
		searchQuery: searchQuery,
		sosId: sosId,
		list: list
	})
}).promise();

The above is just a simple way to send a message to an SQS queue. I namespace it with the state and stage (alias). I have both a dev and a prod queue for each state.

sqs dev and prod

The Bad

AWS’ pricing for the Simple Queue Service currently includes a free tier of one million requests per month. This sounded perfect for me. I could build up my queues without any cost as I rolled out my product.

Surprise! When you connect a Lambda function to SQS, Lambda is constantly polling that queue. Even if there isn’t any data to grab, each time Lambda queries the queue it counts as a request.

After talking to AWS’ support and learning more, I found ways to reduce that somewhat. But even with no traffic I still end up paying ~$30 USD per month. The price scales really well with traffic, but I’d been so used to not having to pay for things I was tinkering with that this stung a little bit.
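For what it’s worth, one standard way to cut down on empty receives (whether or not it’s exactly what I tweaked) is SQS’s own long polling: setting `ReceiveMessageWaitTimeSeconds` on the queue makes a poll wait up to 20 seconds for a message instead of returning, and counting as a request, immediately. A sketch, with an illustrative queue URL:

```javascript
// Sketch: enabling SQS long polling on a queue so that an empty
// ReceiveMessage call waits up to 20 seconds for a message instead of
// returning (and counting as a billable request) immediately.
// The queue URL below is illustrative.
function longPollingParams(queueUrl) {
  return {
    QueueUrl: queueUrl,
    Attributes: {
      // 20 is the maximum wait SQS allows; attribute values are strings.
      ReceiveMessageWaitTimeSeconds: '20',
    },
  };
}

const params = longPollingParams(
  'https://sqs.us-east-1.amazonaws.com/123456789012/idaho-dev'
);
// With the aws-sdk package installed, applying it would look like:
//   const sqs = new AWS.SQS();
//   await sqs.setQueueAttributes(params).promise();
```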

And now to end with this calming picture from Unsplash. Photo by Aaron Burden on Unsplash.

Calm image

Video transcript:

Hello there. All right, here we go. We’re rolling. I’m Jordan Hanson. I’m from Cobalt Intelligence. Today I wanted to talk a little bit about some architecture decisions that I’d made with AWS. I think it was a pretty cool implementation, if I do say so myself. It took a lot of work, but actually AWS made it somewhat trivial.

Now there’s some pros and cons, and I’ll talk about those afterwards. But essentially, I’m going to talk about some of the architecture behind the Secretary of State API, and we’re going to use some cool stuff like SQS, which is Amazon’s Simple Queue Service. But essentially my problem is I need to rate limit.

But it’s almost like, when you’re web scraping, at least, I’m always concerned and thinking about that website I am scraping. I don’t want to put undue load upon it, so much that I crash it, because that would be illegal. And I don’t want to do that. Plus it’s disrespectful to those creating those websites.

So I need something reversed. Reverse rate limiting is kind of what I’m calling it: the target site needs to be protected, and I need to make sure that the rate at which I’m scraping it is not too high, that it’s at a controllable level. Now, if I’m the only one consuming my product, my web scraping, this is a non-issue, right?

Because I will control how hard I hit it. But the problem is, with the API, we now expose it to customers. So, you know, if we have a hundred customers and all of them happen to hit the same state at the same time, then all of a sudden we have a lot of load on that state, and I wanted a way to control that.

And so, as a preface: before we did any of that, we had to have something called long polling. And we’ve talked about that before, I’ll link to it in the description. We’ve done long polling before, which is just: if that request ever takes longer than expected, then we send a retryId back, and then the user can re-call.

And they can get their data back eventually. So here’s this cool diagram. A friend of mine actually works at AWS. He provided this for me and helped build it with me. He actually walked through this architecture with me, and he’s a really smart guy. Now, you know, there’s probably other ways to do this, and maybe better ways, but I dunno, this is where I landed.

I feel really comfortable with it. So, for each Lambda, for each state we’re scraping, all the states in the United States, all their Secretary of State sites, we create an API for each one. We have a different Lambda function for each one. So that’s represented here. So it’s like state X, which let’s say is Delaware.

And then we have a Lambda function for that. And then we have state Y, which let’s say is Texas. And so in front of all those, we have a Lambda function, which I call sos-search. So through API Gateway, you make the request via our API. It comes to the API, and then it goes to our router function, and this Lambda function right here will route you to whichever state you should go to.

So it says, okay, I see that you’re looking for New Jersey, so it’s going to send you over to New Jersey, and then New Jersey will go ahead and do the scraping and then send the data back. Now, since we’ve implemented this SQS, the queuing, the rate limiting, it kind of looks like this. This is a little more complicated, in which in front of each Lambda function we put a queue. And this queue really is unlimited.

Right. We can fill that thing up as much as we want, and the Lambda function will just grab from it as it becomes ready. And so it says, okay, well, I’m ready for the next one, grab it; ready for the next one, grab it. And that’s where we’re going to get a little bit into how Lambda works with some of its concurrency settings.

When I first started using it, I had no idea what that meant or why I’d want it, but it works perfectly for this scenario. So what I did is I went over to my Lambda, and this is Idaho, for example. And I went over to the reserved concurrency and I said, okay, I don’t want this function running more than five at a time.

I don’t want more than five instances of the function running at once. So I set the reserved concurrency to five, and then I created the SQS queue, Idaho, and I just connected them. Okay. I can go in here. I don’t know, dev maybe, edit. Yeah. So then coming here, and they’re connected somehow. I can’t remember.

I did it maybe in the trigger somewhere. I connected these two, or maybe I did it at the alias level. Here we go. The dev right there, triggers right here. So I have this right here. So now this Lambda function knows to poll from this SQS queue. I call it an SQS queue. Maybe that’s redundant, a Simple Queue Service queue. I don’t know.

That sounds weird. Okay. And so what it does is it goes through in here, and if there’s a message in there somewhere, it goes and pulls it in. It pulls from the list and it just starts working on them. Now we can watch this in action right here, which is going to be cool. So I had this call right here.

I’m going to go to Idaho, and it’s going to call over to Idaho, and it’s going to call with pizza, and then it’s going to call 50 times in a row without slowing. So you can see here this is asynchronous, right? It’s just going to fire these all off at once, and you’ll see it won’t return them all at once, because it’s going to queue them and some will get a retryId back.

Now, I’m using the SDK here, and we’ve talked about this before, but this will automatically handle the retries. Let’s get that image back up. So what happens is it goes into the queue, the Lambda grabs it, it starts working on it, and then it reports back to our long polling database to say, hey, this is done now.

And then we just check this long polling database periodically with our retryId, and then we can pull the result off from there at any time. Now, even before I do that, we can go into a little deeper architecture right here. We can look right here, and you can see that this is how we’re doing this.

I’m trying to decide how technical to get here, how deep we go, but we’re here. We’re going to depth. Let’s go. So we send it off, right? We send out the SQS message to the state and the stage, which is production or dev. And we send it out there with the stuff we have inside here: the search query, which is like the business name, and then the sosId.

And then we already know the state we’re going to send it to right here. Now, because API Gateway has that timeout limit, right, I give it about 20 seconds before I even kill the request. I try for 20 seconds to see if it finishes; if it finishes within 20 seconds, then we’ll send it back.

And they wouldn’t even know that this happened, that this decoupling happened. So what I do is I go over here, we make the request, and then I just check up until 20 seconds after the start time. I check every quarter of a second: I say, have we finished? Are we finished? Are we finished? Are we finished?

Are we finished? And it’s just handling all the retryId stuff with it. So it goes right there, it polls for the response, and it just keeps sending that over and over and over again, checking to see if it’s complete. Now, if it’s past the 20 seconds, it says, okay, this is going to take too long, we’re going to send a retryId back.

Does that make sense? I know that’s kind of hard. But this is the idea, and the tricky part, I deal with it. So that’s how we make it look like there’s still one request happening when they’re actually decoupled. I actually check this database every time a request comes in. So it goes here, hits the queue, Lambda starts working.

And as soon as that thing gets sent off to this queue, this router doesn’t know what’s happening to it. It says, okay, the queue took it, I don’t know if it’s done or not, I don’t have a response anymore. So it goes to check this database to say, is it done? Is it done?

Is it done? Is it done? It’s done. It’s done. And then once it’s done, it will grab it from here, return it, and then it can send it back here. If it’s past 20-some seconds, then this is just going to be like, okay, we don’t know, we’ll just send you a retryId and you can check back later. So then it sends it back to you and you retry with it.

All right. So here, let’s try it in practice. We’re going to send 50 times in a row, right, to Idaho. And it’s going to go here. There we go. I should have run this beforehand with pizza. Anyway, I don’t know what pizza is going to return. It’ll just go into Idaho and it’ll get 50. And the first thing I get back, I’m seeing 50, 50, 50.

I see it all coming in slowly, though. You can kind of see them now. In a second, after 20 seconds, I expect to get retryIds. There’s no way we’re going to get all 50 done in 20 seconds. They’ve all been queued already. Now, see, look, now we’re starting to get retryIds, because now we have some of them that took longer.

And so it’s okay, it sends a retryId back, and the SDK, again, takes care of all the retries automatically. Oh, it’s done already, man. That was fast. I know it was quick, but you can push it further, like, you know, if we sent a hundred. And look, it just handled that load beautifully, right?

It didn’t hit those all at once. It just handles that load and makes sure Idaho doesn’t get hammered; Idaho’s protected. And then we get our data back, and the customer has a great experience. Because what I didn’t want is, with this reserved concurrency over here at five,

customer A hits Idaho with 10 requests at once, then customer B comes over and tries to hit Idaho and gets a 429, like, hey, sorry, too many requests. And he’s like, what? I only sent one request, I don’t understand. That would be a bad experience, something I didn’t want.

And so this kind of handles that. And one bad thing, one thing that I didn’t know when I started this, and this is just me not understanding how SQS works, and this is the biggest downfall I’ve had so far, is that I have a passive charge even before I had any usage on this. I was getting charged like 30 to 40 dollars a month.

And I was like, what? And this is because this Lambda function doesn’t know when there’s something in the queue, so it’s always checking. It goes over, it connects over there, it checks, and it’s checking constantly. So with SQS, look, the first 1 million requests per month, that’s free. I was like, oh wow, easy, a million requests.

That’s a lot; people are making millions and millions of requests, and I’m happily paying for this amount. But that first million requests, I go through in like a day, because I have 50 of these queues per environment, 50 for dev, 50 for prod, and they’re all checking all the time to see if there’s anything in there.

And each one of those checks, whether there’s something in there or not, that’s a request. So it ends up being, passively, for 50 queues with a prod and a dev environment, close to 30 to 40 dollars. So anyway, that’s one of the bad things about this. Now, the other way, I could build my own rate limiting, maybe store state in Dynamo and check that, but this just made it so easy to handle the throttling automatically.

And it kind of just makes sure it’s all protected there. So I think that’s it. What else do we have to talk about? It’s in the configuration there, we’ve got our concurrency. So those are the steps, right? Oh, we didn’t see this. Oh yeah, we should run it again so we can see them in the queue.

So we go over here to Idaho dev. We can send it. We’ll poll for messages, and we’re going to send right here. See, they’re showing up. Look at that. See, these are all in the queue.

Oh, that’s the duration. See how these are here in the queue? It’s beautiful. I love it. It just works. I’m happy with AWS and what they’re doing. I think they did a cool job. And that’s done. So it’s probably going to be cleared out now. It’s polling for messages and not finding any, because now we’re finished.

This is my final diagram. This is how I use AWS SQS for reverse burst rate limiting, that’s kind of what I’m calling it, and that’s all. Thank you.