Lambda Pricing, Especially for Web Scrapers

AWS pricing can sometimes be confusing.

In short, Lambda is almost always going to be affordable (probably free) for web scrapers.

Here are some memory settings to remember as you are web scraping:

128 MB is plenty for almost all web scraping
1536 MB is what I use when using Puppeteer

Transcription:

Good morning. Hello? Okay. I’m Jordan Hansen with Cobalt Intelligence. Today I’m going to talk a little bit about the pricing of AWS Lambda. I mostly talk about web scraping, but pretty much all my stuff is done in the cloud with AWS.

I’ve grown to love AWS and I’m a really big fan of it. I just wanted to talk a little bit about the pricing of AWS and how it works. It seems pretty simple, and it really is; there isn’t too much to go into. But I remember when I first came in, I looked at the pricing and thought, what the freak does that even mean?

When you see these numbers, they seem so small that you think AWS will never make money and will never charge you. But it can be kind of expensive. I have a function that probably takes most of my costs. It’s probably about a hundred bucks a month just for that function.

So you’ve got to be aware. And whenever you’re thinking about pricing, you should be comparing it to what the price would be elsewhere. So not just the price here, but also whether it would be cheaper to do it somewhere else. Anyway, recently they introduced this Arm option, which is a different server architecture.

So you can run your function on either one of these. This is new to me, so all my functions are on this one (x86). I did try Arm a little. Chrome is not set up on AWS Lambda, so when you run Puppeteer on Lambda, or on Linux at all, you have to install some additional dependencies. I don’t know enough about operating systems to understand this very well, but I’m guessing those dependencies don’t work on Arm.

Because I tried it over here with my Puppeteer function and it didn’t work. It would be interesting, though, because the price is actually quite a bit different. I will look at it, because I don’t think my most expensive function uses Puppeteer, and I can probably save myself some money.

Okay. Sorry, I’m a little short of breath today. So, just a typical function. I think the other problem is you don’t know how much memory your function really needs to run. Most of the time, this is going to be plenty. For anything without Puppeteer, 128 is going to be fine.

Now, if you’re downloading big data sets, if you’re downloading files, or something else like that, you’re going to need more than that. But for any kind of web scraping function, 128 megabytes is what we use for almost everything. So two tiers, I guess. Right? Let’s see. I’m sorry.

I’m just trying to compare. Can I see what we use exactly for the other one? We use this for anything that is just a basic fetch request: 128 megabytes. That’s all you’ll need. And if we go to Puppeteer, we go all the way up to here. Now, with Puppeteer you could probably do 512, but I want it to have space.

I don’t want it to run out of memory. Trying to open a Chrome browser with 512 megabytes is not that much, and then we’ve got to worry about slowness and running out of memory. So anyway, let’s calculate that out. What you do is just think: okay, how many minutes is my function running? Or how many seconds? You go through the whole thing, right?

Let’s say it’s running three minutes a day, or 10 minutes, or 15 minutes a day. So let’s say we have 15 minutes, times 60 seconds, times a thousand milliseconds, times this long number, the price per millisecond. How many zeros is that? I’m just going to copy it. This one right here. Excellent. That’s how much it is per day. So any time you’re in this 128 megabyte tier, it’s pretty tough to have a significant bill.
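As a rough sketch of the arithmetic described here, the daily duration charge is just run time in milliseconds multiplied by the per-millisecond price for the chosen memory size. The price constant below is an assumption (the x86 list price for 128 MB in us-east-1 at the time of writing); the function name is made up for illustration, and the current pricing page should be checked before relying on the figure.

```python
# Assumed x86 list price for a 128 MB function, in USD per millisecond.
# This varies by region and can change, so treat it as a placeholder.
PRICE_PER_MS_128MB_X86 = 0.0000000021


def daily_duration_cost(minutes_per_day: float, price_per_ms: float) -> float:
    """Return the duration charge in USD for one day of run time."""
    milliseconds = minutes_per_day * 60 * 1000
    return milliseconds * price_per_ms


per_day = daily_duration_cost(15, PRICE_PER_MS_128MB_X86)
per_month = per_day * 30
print(f"per day: ${per_day:.5f}  per 30 days: ${per_month:.4f}")
```

With these assumed numbers, 15 minutes a day at 128 MB works out to a fraction of a cent per day and a few cents per month, before any free tier is applied.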

Over 30 days, yeah, it’s like nothing. Right? It’s pennies. In fact, I don’t think anybody even charges you if you’re below like a couple of dollars. They also have their free tier, over here, which covers up to a million requests per month and 400,000 gigabyte-seconds. Now, sorry, I forgot to mention this.

They charge this variable amount for duration, but they also charge 20 cents per million requests. I think for most people, though, the duration charge is the one that’s going to get you. So anyway, comparing those two, it’s simple math, right? You just do that.
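To see how the free tier and the per-request charge fit together, here is a minimal sketch. The prices and free-tier allowances are assumptions taken from the public us-east-1 x86 price list, the function name is hypothetical, and the free tier is treated as if this one function were the only thing in the account (in practice it is shared across all functions).

```python
# Assumed us-east-1 x86 list prices and free-tier allowances; verify before use.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20
FREE_GB_SECONDS = 400_000
FREE_REQUESTS = 1_000_000


def monthly_bill(memory_mb: int, seconds_per_month: float, requests_per_month: int):
    """Return (duration_charge, request_charge) in USD after the free tier."""
    gb_seconds = (memory_mb / 1024) * seconds_per_month
    billable_gb_seconds = max(0.0, gb_seconds - FREE_GB_SECONDS)
    billable_requests = max(0, requests_per_month - FREE_REQUESTS)
    duration_charge = billable_gb_seconds * PRICE_PER_GB_SECOND
    request_charge = billable_requests / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    return duration_charge, request_charge


# A 128 MB scraper running 15 minutes a day, invoked 3,000 times in the month.
small = monthly_bill(128, 15 * 60 * 30, 3_000)

# A 7,168 MB function running 3 hours a night, invoked once per night.
large = monthly_bill(7168, 3 * 60 * 60 * 30, 30)

print(small, large)
```

Under these assumptions the small scraper stays entirely inside the free tier, while the large function is billed almost entirely on duration and pays nothing for requests.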

You just see how much it is. What you probably need is 128; it’s probably plenty. If you’re using Puppeteer, then you probably need around 1,500, or more if you’re using something else that needs it. But for any kind of basic web scraping, 128 megabytes is going to be fine. Now I’m kind of curious about my big function. I’m going to see how much I use on it, because what that bigger function does is download a PDF.

Then it takes parts of the PDF, converts them into an image, and runs Tesseract on that image to get the data from it. So it uses a lot of memory. We’re at this amount over here, 7,168 megabytes, and we have to be, because all that downloading and parsing and converting is really slow.

And so we probably run it two to three hours a night. So let’s say we run this thing three hours a night. So we go three hours, times 60 minutes, times 60 seconds, times a thousand milliseconds. So we have 10 million, about 10.8 million milliseconds right there, times this price. So it’s about $1 a day. Is that right?

I feel like I’m paying more than that for sure. $37 a month. I’m pretty sure I’m paying more than that. So maybe we run it for more hours, I don’t know, double that or something. But it’s close to $1 a day. So that’s how much it is. Now I’m kind of curious: what if I was using Arm here? So let’s go over here and try it there.
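The three-hours-a-night estimate, and the same workload priced on Arm, can be sketched as follows. The per-GB-second prices are assumptions from the us-east-1 list price for each architecture, the function name is made up for illustration, and the free tier and request charges are left out to match the back-of-the-envelope math in the talk.

```python
# Assumed us-east-1 list prices in USD per GB-second; verify before relying on them.
PRICE_PER_GB_SECOND = {"x86": 0.0000166667, "arm": 0.0000133334}


def monthly_duration_cost(memory_mb: int, hours_per_night: float,
                          arch: str, nights: int = 30) -> float:
    """Return the monthly duration charge in USD, ignoring the free tier."""
    gb = memory_mb / 1024
    seconds = hours_per_night * 60 * 60 * nights
    return gb * seconds * PRICE_PER_GB_SECOND[arch]


x86_cost = monthly_duration_cost(7168, 3, "x86")
arm_cost = monthly_duration_cost(7168, 3, "arm")
print(f"x86: ${x86_cost:.2f}  arm: ${arm_cost:.2f}  "
      f"difference: ${x86_cost - arm_cost:.2f}")
```

With these assumed prices the x86 figure lands a little under $38 a month, and the difference between the two architectures is a few dollars a month per function. That only matters if the function's dependencies actually run on Arm.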

I need to get my history back. The Arm price doesn’t seem that much different. Wait, is this number right? No, this number: 10 million, 800 thousand, so 10,800,000 there, times the Arm price.

Okay. So it saved me seven bucks a month. That would save me some money. And if we’re really at double that, it’s 14 bucks a month, so it’s not insignificant. Something to compare, anyway. With Lambda pricing, I think probably the most important thing you need to know is this: the memory is often where you’re going to get hit.

20 cents per million requests is nothing. If you’re hitting this thing with a million requests, you can afford 20 cents. It’s when you’re going to use more memory, so if you’re doing something that requires Puppeteer, or if you’re going to be doing more processing for a long time. Now, if you’re doing web scraping constantly, let’s say you have one scraper that you want running all the time.

Hold on. Let’s see, how many seconds are in a month? Let me look that up.

Here we go. Okay, wait, this many minutes in a month, and we’re using the basic tier. So we have about 43,800 minutes in a month. That’s this many seconds, and, oh, nope, this many milliseconds: 2.6 billion milliseconds. And we’re going to multiply that by the price.

So if we’re running it every minute of the month, this scraper is running continuously. Say it’s scheduled to run every 15 minutes, forever, and it just keeps going. So we’ve got that. Oh, did I multiply by ten thousand? What did I do here? Where’s my 2 billion? 2,628,000,000 times the price.

Here we go. So $5 a month. That’s not too bad. Look at that: you have one scraper running continuously all month. Let’s say you had 20 of them: a hundred dollars. I don’t know, to me that doesn’t seem like much. If you’re running a scraper continuously, hopefully you’re getting some value out of it and don’t mind paying a little bit of money.
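For the always-on case, a small sketch of the same multiplication, scaled up to a fleet of scrapers. The per-millisecond price is again an assumed us-east-1 x86 figure for 128 MB, the month length is an average (365 days divided by 12), and the free tier is ignored.

```python
# Assumed x86 list price for 128 MB, in USD per millisecond; verify before use.
PRICE_PER_MS_128MB_X86 = 0.0000000021

# Average month: 365 days / 12 months * 24 hours * 60 minutes.
MINUTES_PER_MONTH = 43_800


def continuous_monthly_cost(scrapers: int, price_per_ms: float) -> float:
    """Return the monthly duration charge in USD for scrapers that never stop."""
    ms_per_month = MINUTES_PER_MONTH * 60 * 1000  # 2,628,000,000 ms
    return scrapers * ms_per_month * price_per_ms


one_scraper = continuous_monthly_cost(1, PRICE_PER_MS_128MB_X86)
twenty_scrapers = continuous_monthly_cost(20, PRICE_PER_MS_128MB_X86)
print(f"1 scraper: ${one_scraper:.2f}  20 scrapers: ${twenty_scrapers:.2f}")
```

Under these assumptions one continuous 128 MB scraper costs a little over $5 a month and twenty of them a little over $100, which is in line with the rounded figures in the talk.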

That should be worth paying for. I mean, if you’re running it that long, 2 billion milliseconds, you’re hopefully getting some value out of it. Okay. That’s it. I just wanted to talk about AWS Lambda pricing. I talk about Lambda a lot. I use it all the time. I think it’s pretty economical.

It’s going to get more expensive as you use more memory. That’s all. I’ve been talking forever. Peace. Jordan out.