Getting Started With Puppeteer

Transcription:

Hello there. This is Jordan Hanson from Cobalt Intelligence, and today we're talking about getting started with Puppeteer. I'm going to walk through Puppeteer a little bit. It drives a headless browser, which means that if you're looking to do things that emulate a browser, this is what you want to use.

Puppeteer is a JavaScript library, so I'm going to be doing this in a JavaScript project. There is something similar for Python called Pyppeteer. I'm not going to talk about that; I don't know how to use it, and I probably never will.

Here we go. This is the package.json. I'm going to make this code available as a sample, so you can pull it whenever you want; it's right here, feel free to use it. For dependencies, all I have is the newest Puppeteer, the types for it (because I use TypeScript), and TypeScript itself.

If you're not using TypeScript, you don't need that stuff, and you can set up your package.json slightly differently. Pull this down, run npm install, and you're good to go. So let's come over here and import Puppeteer. I'm going to set up an async block here just so I can use async/await. Then you want to create a browser with puppeteer.launch(), and there are some options you can pass to it. I'm going to pop in headless and set it to false. What this does is make the browser actually pop up so you can see it.

Otherwise you won't. If headless is true, you won't see any browser pop up. Normally you don't need to see the browser, but here we want to watch what happens, so we'll leave it false. Also, whenever I'm starting a project, I always like to put browser.close() in right away, because launching starts a real browser.

Unless you close it, your script will keep running, because the browser is still there with processes running in it. After that, we create a page with browser.newPage().
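Here is a minimal sketch of that setup, assuming a recent Puppeteer release (which ships its own TypeScript types). To keep it runnable offline it loads a tiny inline page with `page.setContent()` instead of a live site, and `SHOW_BROWSER` is a hypothetical environment variable used to flip `headless` to `false` when you want to watch the window, as in the video. The `try`/`finally` makes sure `browser.close()` runs even if a step throws.

```typescript
import puppeteer from "puppeteer";

// Hypothetical toggle: SHOW_BROWSER=1 opens a visible window (headless: false).
// Headless is the default here so the sketch also runs without a display.
const headless = process.env.SHOW_BROWSER !== "1";

async function openPageAndReadTitle(): Promise<string> {
  const browser = await puppeteer.launch({ headless });
  try {
    const page = await browser.newPage();
    // Inline HTML stands in for a real site, so no network is needed.
    await page.setContent("<title>Getting started</title><h1>Hello</h1>");
    return await page.title();
  } finally {
    // Without close(), the Chromium process keeps the Node script running.
    await browser.close();
  }
}

openPageAndReadTitle()
  .then((title) => console.log(title))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```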

With the page created, we just go somewhere with page.goto(). For now, and for the few tips and tricks you need, let's go to cobaltintelligence.com. Because Puppeteer is fast, I'm going to add await page.waitForTimeout(). This is one of the things you have to handle in Puppeteer: timing. It goes really fast.

It's faster than any human, for sure. So I'm going to put a little timeout in there, 7,500 milliseconds (seven and a half seconds), just so we can see it. We hit start, and you'll see it do its work. That's all it takes to get going; pretty simple so far. Bam, there it is. It waits the seven and a half seconds, and then it's done.
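The video pauses with `page.waitForTimeout()`. That method has been deprecated and removed in newer Puppeteer releases, so this sketch assumes a plain `setTimeout`-based helper, hypothetically named `sleep`, does the same job. The `data:` URL only exists so the sketch runs offline; in the video the target is cobaltintelligence.com.

```typescript
import puppeteer from "puppeteer";

// Hypothetical helper: a plain timer standing in for page.waitForTimeout().
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function visitAndPause(url: string, pauseMs: number): Promise<string> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // Hard pause so a human can watch; the script itself doesn't need it.
    await sleep(pauseMs);
    return await page.title();
  } finally {
    await browser.close();
  }
}

// Offline stand-in for https://cobaltintelligence.com
const demoUrl =
  "data:text/html," + encodeURIComponent("<title>Home</title><h1>Hi</h1>");

visitAndPause(demoUrl, 2500)
  .then((title) => console.log(title))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

A fixed pause is only useful for watching the run; the waiting strategies discussed later in the video are what you want in real scripts.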

Easy. Now, the cool thing about Puppeteer is that it has easy, jQuery-style selectors, so you can use CSS selectors. You can also use XPath if you're familiar with that. Say const title, or const h1, equals await page.$eval(). $eval is a key thing you're going to want to use.

It's the easiest way to get content. If you're doing web scraping, $eval is what you'll use 99% of the time. You pass it the selector, h1, and then a function that receives the matching element, and you can do various things with that element; textContent is a common one, so we'll do that one first.

Then run this baby. So remember, first things first: you launch Puppeteer, create the page, go to the page with goto, and then grab the stuff you want. And it's still waiting for 7,500 milliseconds.
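A minimal sketch of the `page.$eval()` call described above. The inline `<h1>` is a hypothetical stand-in for the heading on the page in the video. The callback runs inside the browser, and only its serializable return value comes back to Node; if nothing matches the selector, `$eval` rejects instead of returning `null`.

```typescript
import puppeteer from "puppeteer";

async function readHeading(): Promise<string | null> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Hypothetical markup standing in for the real page.
    await page.setContent("<h1>Hello from Puppeteer</h1>");
    // Find the first element matching "h1" and return its text content.
    return await page.$eval("h1", (element) => element.textContent);
  } finally {
    await browser.close();
  }
}

readHeading()
  .then((heading) => console.log(heading))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```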

I'll reduce that. And it closes right there, and there's the heading. Perfect. Let's change the timeout to 2,500. Now, you can also do things like this: let's go over to the site, inspect it, and grab something like a URL, this one right here. Do I want it like that? Let's see.

So we'll just look for it, trying to find the best way to select it. I'll go with nav, then a, then :nth-of-type(2), so nav a:nth-of-type(2). There we go. You can also get attributes, but text content first: const linkText = await page.$eval(selector, element => element.textContent). That gets the text content of this link right here.

Then I'm also going to get that link's href, and I'll show how to do that: $eval again, the same thing with element, except we call element.getAttribute() and pass in the attribute we want. In our case, we want href. Then console.log the link stuff: the link text and the link href. This time I'm not adding a timeout, so you'll see how fast it goes.

So with $eval you can get all that stuff right there. Bam, we got the link stuff: the title and the documentation link, that quickly. Bang, bang. It goes fast. So you get the text content, and with getAttribute you can get any attribute in there; href is probably the one you'll use 99% of the time.
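Here is a sketch of reading both the text and the `href` of one link with two `$eval` calls, using the `nav a:nth-of-type(2)` selector from the video. The nav markup is hypothetical. Note that `getAttribute("href")` returns the raw attribute value (here `/docs`), whereas the element's `.href` property would give the resolved absolute URL.

```typescript
import puppeteer from "puppeteer";

interface LinkInfo {
  text: string | null;
  href: string | null;
}

async function readSecondNavLink(): Promise<LinkInfo> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Hypothetical nav bar; the second link plays the "documentation" link.
    await page.setContent(
      '<nav><a href="/">Home</a><a href="/docs">Documentation</a></nav>',
    );
    const selector = "nav a:nth-of-type(2)";
    const text = await page.$eval(selector, (el) => el.textContent);
    // getAttribute returns the attribute exactly as written in the HTML.
    const href = await page.$eval(selector, (el) => el.getAttribute("href"));
    return { text, href };
  } finally {
    await browser.close();
  }
}

readSecondNavLink()
  .then((link) => console.log(link))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```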

So $eval is powerful. Now let's say you have a list of things; let's get those. We'll call them navLinks: const navLinks = await page.$$("nav a"). $$ gets an array of items, here all the links in the nav, and then you just loop through them with a for loop.

Since it's an array, we go for (const navLink of navLinks). Each item is actually called an ElementHandle. Then we say const navLinkText = ..., and this time you do the $eval within the children of that element. Ooh, can I do that? Because I'm grabbing the links themselves. I don't know if you can, because $eval has to have a selector to match. We'll see what happens.

If it breaks, that's just my selector problem, not a problem with the code. This is what you want to do when you get an array of items: use $$ to get the array, and then loop through them. I may have to do it a little differently. We'll see.

Yeah, it broke, because there's nothing for that selector to match right here. Okay, if I come over here and look, see, there are no divs; each one is its own a tag. Huh. That's not as fun an example. There are ways to do this; of course we could just call evaluate on it, but $eval is a lot more fun.
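The fix mentioned above, calling `evaluate` on each handle, can be sketched like this. `page.$$()` returns one `ElementHandle` per match; since each handle already is the `<a>`, there is no child for `navLink.$eval(...)` to find, and `navLink.evaluate()` runs the callback on the element itself. The three links are hypothetical markup.

```typescript
import puppeteer from "puppeteer";

async function readAllNavLinks(): Promise<string[]> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Hypothetical nav with three links.
    await page.setContent(
      '<nav><a href="/">Home</a><a href="/docs">Documentation</a><a href="/pricing">Pricing</a></nav>',
    );
    // $$ returns an array of ElementHandles, one per matching <a>.
    const navLinks = await page.$$("nav a");
    const texts: string[] = [];
    for (const navLink of navLinks) {
      // evaluate() receives the element the handle points to.
      const text = await navLink.evaluate((el) => el.textContent ?? "");
      texts.push(text);
    }
    return texts;
  } finally {
    await browser.close();
  }
}

readAllNavLinks()
  .then((texts) => console.log(texts))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```

If you only need the values, `page.$$eval("nav a", (els) => els.map((el) => el.textContent))` does the same in a single round trip; the loop form is handy when you also want to click or inspect each handle.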

All right. evaluate is another tool that's out there. It's something like this, and then you go... I can't remember. I don't use it enough; I look it up when I need it. But let's say these links had divs around each one of them.

Then you could pull them out. But because $eval looks inside the element for a child, you have to have some kind of CSS selector, and because we're already on the a tag itself, there's no CSS selector left to use with $eval. Is there a better example? Let's go over here; maybe we can get these features. Yeah: feature-label. There we go.

Yeah, I like that; that'll be good. So let's get the feature labels, and then we'll get the text of each one right there. I'll call these featureLabels. Hmm, I think I did the same thing as before, right? The exact same thing: it's grabbing the labels themselves. What I really want is the container around all this stuff, which is not very easy to select. Come on. There isn't another one on this side. Perfect, there we go. This will work, I think.

So we grab the containers like that, and then inside the loop the selector is .feature-label, because it's a child. You get what we're doing. There you go, like this.

Oh, you're kidding me.

Well, that's what you're supposed to do. I've done this a million times; I should have prepared this a little better, but that's the idea. You get the elements, loop through them, and get the children of each one. You can do that this way because each one is now an ElementHandle. Okay, trust me. All right, now we're going to move on to clicking things.
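Here is a sketch of the pattern the demo was aiming for: select the containers with `page.$$()`, then call `$eval` on each handle with a child selector. `ElementHandle.$eval()` searches inside that handle, which is why the selector must name a child. The `.feature` and `.feature-label` class names and the markup are hypothetical stand-ins for the site's real structure.

```typescript
import puppeteer from "puppeteer";

// Hypothetical markup: each .feature card wraps a .feature-label child.
const html = `
  <div class="feature"><span class="feature-label">Feature one</span><p>Details</p></div>
  <div class="feature"><span class="feature-label">Feature two</span><p>Details</p></div>
`;

async function readFeatureLabels(): Promise<string[]> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html);
    const features = await page.$$(".feature");
    const labels: string[] = [];
    for (const feature of features) {
      // Scoped to this card: finds the .feature-label inside it.
      const label = await feature.$eval(".feature-label", (el) => el.textContent ?? "");
      labels.push(label);
    }
    return labels;
  } finally {
    await browser.close();
  }
}

readFeatureLabels()
  .then((labels) => console.log(labels))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```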

I think the problem is that my selectors are not very good. It's kind of a pain: look at all the classes on my site. I'm using Tailwind now, and man, it doesn't make selecting things easy.

Let me check... yeah, there's only one. Okay, we're going to click this button. I don't want this loop anymore, so we say await page.click() and put the CSS selector we want in there, like that. Then await page.waitForTimeout() with a nice 2,500 so we can see it.

So you can click buttons with Puppeteer, and that's what's great: click buttons, use dropdowns, type into things, all of it. Anything you can do with a browser, you can do with Puppeteer. It clicks that button, and it loads over to here. Perfect.
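A sketch of clicking, typing, and choosing from a dropdown, using `page.type()`, `page.select()`, and `page.click()`. The form, its ids, and the inline `onclick` handler are hypothetical; the handler copies both values into the `<h1>` so the sketch can check that all three interactions happened.

```typescript
import puppeteer from "puppeteer";

// Hypothetical form: clicking #go writes "<name>/<state>" into the <h1>.
const html =
  '<input id="name">' +
  '<select id="state"><option value="CA">CA</option><option value="NV">NV</option></select>' +
  "<button id=\"go\" onclick=\"document.querySelector('h1').textContent = " +
  "document.getElementById('name').value + '/' + document.getElementById('state').value\">Go</button>" +
  "<h1>waiting</h1>";

async function fillAndClick(): Promise<string | null> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html);
    await page.type("#name", "Jordan"); // focuses the input, then types
    await page.select("#state", "NV"); // picks an <option> by its value
    await page.click("#go"); // real mouse click on the button
    return await page.$eval("h1", (el) => el.textContent);
  } finally {
    await browser.close();
  }
}

fillAndClick()
  .then((result) => console.log(result))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```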

Here's another cool thing. Right now we're using waitForTimeout, which gives us a hard timeout: we have to wait the full 2,500 milliseconds every time, and that's time we waste. So here's what we do instead. Let's go over here, click on this, and look for something on the next page. Is this an iframe?

No, it's not. Good. We could probably use button[type=submit].

What's its type? Come on. We're looking for some selector here. Yeah, like that. That should do it, I think.

Well, I'm going to try it anyway, because I think it should work. So we come over here, and instead of waitForTimeout, what if we use waitForSelector and pass in the selector? Then as soon as that selector appears on the page, the script continues. It comes over here, it clicks it, and bam, done. So much faster, because it didn't have to wait. That really handles your timing.

So those are really the key things I think you're going to want to know; I'm tired of the video already. When you're starting out, you can do great things like navigate to pages. This is how you select basic text with $eval.

You can get attributes, like hrefs. You can get multiple elements and loop through them. You can click on buttons. And you can wait for selectors; managing your waits and timeouts in Puppeteer is important. Remember, the page has to load, and if you immediately start grabbing stuff before it's there, it's going to fail because it hasn't loaded into the page yet.
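A sketch of swapping a fixed pause for `page.waitForSelector()`. The hypothetical page adds a `#result` element one second after the `button[type="submit"]` is clicked, standing in for content that loads after a click. `waitForSelector` resolves as soon as the element appears and rejects if it hasn't shown up within the timeout (set to 5 seconds here as an arbitrary choice).

```typescript
import puppeteer from "puppeteer";

// Hypothetical page: clicking Submit appends <p id="result"> after 1 second.
const html = `
  <button type="submit" onclick="setTimeout(() => {
    const p = document.createElement('p');
    p.id = 'result';
    p.textContent = 'Loaded';
    document.body.appendChild(p);
  }, 1000)">Submit</button>`;

async function clickAndWait(): Promise<string | null> {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html);
    await page.click('button[type="submit"]');
    // Continues the moment #result exists instead of sleeping a fixed time.
    const result = await page.waitForSelector("#result", { timeout: 5000 });
    return result ? await result.evaluate((el) => el.textContent) : null;
  } finally {
    await browser.close();
  }
}

clickAndWait()
  .then((text) => console.log(text))
  .catch((err) => {
    console.error(err);
    process.exitCode = 1;
  });
```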

So wait somehow; that's important, and waitForSelector is probably your best option. The other ones don't work as well, and I think we'll talk about those in another video. So there we go. That's getting started with Puppeteer. It's a great library. There's another, similar library called Playwright; both are very powerful, very good libraries.

So use them and love them. Thanks.