Getting Started With Playwright

Playwright is a headless browser tool that is similar to Puppeteer.

Although I currently use only Puppeteer in all of my web scraping scripts, the entire Puppeteer team moved over to Microsoft and started Playwright a few years ago, so I thought it would be worth checking out. Playwright seems great and useful. There are some nice things, such as choosing between Chromium, Firefox, and WebKit at launch, and the API is almost identical to Puppeteer. At this point I don’t find anything compelling enough to make me consider switching to Playwright, since Puppeteer continues to be supported by Google.

Here’s my sample code – https://github.com/cobalt-intelligence/getting-started-with-playwright

Playwright – https://playwright.dev/

Transcription:

Hi, I’m Jordan Hanson and I’m from Cobalt Intelligence. Today we’re going to talk about something called Playwright. Now, I use Puppeteer. If you’ve seen any of my videos, I mention Puppeteer in almost all of them. Puppeteer is a headless browser. It pops up and helps you navigate the web and do stuff: web automation, web scraping, whatever, with what is almost like a browser. It’s pretty much Chrome, where you can boot it up and it’ll go log in to things and do all that kind of stuff.

Now, Playwright. Puppeteer was a Google project, made by the Chrome DevTools team. Pretty much the whole team moved from Chrome over to Microsoft. Microsoft hired them all and they started building Playwright. So that’s what Playwright is. A lot of the stuff is similar.

I think the API is very similar, but today we’re going to go through it. Now, it’s kind of funny. Look, I use Puppeteer so much that I named this directory “getting started with Puppeteer.” It’s not getting started with Puppeteer, it’s getting started with Playwright. And I have a GitHub repo.

That one, the GitHub repo, is named correctly. Now, I have only installed one thing. In my package.json, all I have is playwright. That’s it. Maybe we’ll need more, I don’t know. I’ve used Playwright before, but not a lot, so we’re going to go through it. I’m pretty sure we’re going to be able to do some basic things.

And we’re going to try to do that here, pretty much the same as in Puppeteer. I’m kind of curious how much I can do just from memory. So let’s try it. First, we’re going to import playwright, right there. Then open an async block, because it’s all promises.

I want easy ways to do it. So I guess I’m going to boot up a browser: await playwright.launch. Oh, you’ve got to go chromium. Yeah, look at that. Okay, so you can pick a browser. First change: in Puppeteer, you go .launch right here. Instead, you can choose which browser you want. That is very interesting.

We’re going to start with chromium, but let’s try out Firefox later too. And we’re going to set headless to false because we want to see it, and we’re going to close this browser at the end because we don’t want it to stay open. Then let’s go const page equals await browser.newPage, right there.

And then let’s go somewhere: page.goto, and we’ll go to cobaltintelligence.com. And then we’re going to say await page.waitForTimeout, I think. Okay. Maybe there’s more stuff here. Sometime I probably should change over, I don’t know. I’m just so used to Puppeteer. It’s been really good for me.

I don’t know. Let’s see what happens here.
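The steps so far (pick a browser, launch it with headless off, open a page, go to a URL, pause, close) can be sketched roughly as below. The three small interfaces are stand-ins invented for this sketch so it compiles without Playwright installed; the assumption is that Playwright's real `chromium`, `firefox`, and `webkit` objects fit them. The function name and the pause length are also just illustrative.

```typescript
// Minimal shapes for the calls used so far. These interfaces are assumptions
// made for this sketch; the browser types exported by 'playwright' are
// expected to satisfy them.
interface PageLike {
  goto(url: string): Promise<unknown>;
  waitForTimeout(ms: number): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

interface BrowserTypeLike {
  launch(options?: { headless?: boolean }): Promise<BrowserLike>;
}

// Launch a visible browser, open one page, visit the URL, pause, then close.
async function visitOnce(
  browserType: BrowserTypeLike,
  url: string,
  pauseMs: number
): Promise<void> {
  const browser = await browserType.launch({ headless: false });
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // A fixed pause is only useful for watching the window during a demo.
    await page.waitForTimeout(pauseMs);
  } finally {
    // Always close, otherwise the browser process stays open.
    await browser.close();
  }
}

// With Playwright installed, usage would look something like:
//   import { chromium, firefox } from 'playwright';
//   await visitOnce(chromium, 'https://cobaltintelligence.com', 5000);
//   await visitOnce(firefox, 'https://cobaltintelligence.com', 5000);
```

Passing the browser type in as a parameter mirrors the main difference from Puppeteer noted here: the engine is chosen before `launch` is called.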

Here we go. There’s our browser; it pops up. Interesting, it has a full window right here. So what if I go firefox here? I don’t think I have Firefox installed, but maybe they all come with it. I don’t know. I don’t know how to do this on Lambda either. That seems like it could be tricky. There’s Firefox right there.

I guess. I mean, it’s not Chrome. I don’t know. What else have we got in there? It’s easy to use different browsers. That’s cool. Now what if I go android? I can’t launch that. What?

I don’t know what this is. I’m just picking the first device in there. Okay, well, that was a long shot. I don’t know how to do that yet. We’re not going to worry about that right now. We’re going to stick with chromium. Did I have another one? WebKit. WebKit is, I think, Safari, isn’t it?

What is WebKit?

A browser engine. And wait, it did not pop up. Or did it? Did it pop up? Did I miss it? I was just over here and, oh, there it is.

It doesn’t look very fancy there, huh? What is it? WebKit is used as the rendering engine within Safari. So maybe that’s because I am not on a Mac. Okay. So first things first, that all worked easily. Now let’s get some examples here. Yeah, sure, Cobalt Intelligence right there. And we’re going to say const title equals await,

await page.$eval. What do I want? I want to see if this is the same thing: title, then element, element.textContent. So far so good. All the same. Beautiful. Okay. So one difference here is the page is full screen. In Puppeteer the viewport is small, and you’d have to set the viewport to be big.

It doesn’t really matter, right? That’s not a big deal. Of course, I’m still using WebKit. Sweet. Here we go. This is my title. Title works great. $eval, that’s all good. I’m going to go back to chromium. Let’s click stuff. Here we’ll just say a, whatever. This is something like nav link, something like that. Nav link, nth-of-type.

Actually, let’s go somewhere else. So nav link, nth-of-type four, because that one right there is just a different page. Yeah, I bet. So that’s going to go there. There we go: await page.click. Now, hold on. Then let’s get the title again.
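Reading the title with `$eval` and `textContent`, as described above, might look like the sketch below. The `HasText` and `EvalPage` interfaces are assumptions made so the snippet stands alone; a real Playwright `Page` is expected to fit `EvalPage`, and `readText` is a hypothetical helper name.

```typescript
// Assumed minimal shape of an element and of the one page method used here.
interface HasText {
  textContent: string | null;
}

interface EvalPage {
  $eval<R>(selector: string, pageFunction: (element: HasText) => R): Promise<R>;
}

// Return the trimmed text of the first element matching the selector,
// or an empty string when the element has no text.
async function readText(page: EvalPage, selector: string): Promise<string> {
  // In a real browser the callback runs inside the page, so it must not
  // reference variables from the surrounding script.
  const text = await page.$eval(selector, (element) => element.textContent);
  return (text ?? '').trim();
}

// With a real Playwright page this would be called as:
//   const title = await readText(page, 'title');
```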

What I’m watching for is timeouts. How’s navigation going to work? If I click this, here is the page. Is it going to wait at all, do any kind of waiting? Let’s see. It should go to the blog, and it’s going to take a little bit of time to navigate. We’ll see how fast it is, and whether it’s any different.

I don’t think it waited. Let’s try that again.

Yeah. Okay. Now let’s grab posts like this: const topics equals await page.$$eval, I think it’s topic. I’m just going from memory here.

Then console.log topics. So what I’m grabbing now, and we can go over here to my blog, are these right here. I think these are topics. And it’s just, okay, that’s mobile, huh? Whatever. Okay, these are the topics. And so I just want to get them and log them out, just the titles. What’s going on with it?

Why is the ax? Yes.

Oh, that’s actually the image. Okay. This looks weird. So I would expect it to have the actual names of these here,

but what I’m trying to see is, ha, this is what I was worried about. Now, what if I change it? I don’t know how you’d avoid this: await page.waitForSelector. Because what had happened is it hadn’t rendered yet. Topics wasn’t on the page yet. The title apparently was there already, but the topics weren’t by the time it got there.

Yeah. See, look, here we go. I have to make sure it arrives first, but this right here was on there. So far, everything is the same as Puppeteer. We wait for selectors, we’re clicking stuff. Let’s type in some things. Let’s go to, like, the New York Secretary of State or something.
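The fix described above, waiting for the selector after the click and before querying, could be sketched like this. The interfaces are assumed stand-ins for the Playwright `Page` methods involved, and the selectors in the usage comment are hypothetical, since the exact ones used in the video are not recoverable from the transcript.

```typescript
// Assumed minimal shapes; a real Playwright Page is expected to fit BlogPage.
interface TextHolder {
  textContent: string | null;
}

interface BlogPage {
  click(selector: string): Promise<void>;
  waitForSelector(selector: string): Promise<unknown>;
  $$eval<R>(selector: string, pageFunction: (elements: TextHolder[]) => R): Promise<R>;
}

// Click a link, wait until the target elements exist, then collect their text.
async function readTopicsAfterClick(
  page: BlogPage,
  linkSelector: string,
  topicSelector: string
): Promise<string[]> {
  await page.click(linkSelector);
  // Without this wait, the query below can run before the new page has
  // rendered and come back empty.
  await page.waitForSelector(topicSelector);
  return page.$$eval(topicSelector, (elements) =>
    elements.map((element) => (element.textContent ?? '').trim())
  );
}

// Hypothetical usage (both selectors are made up for illustration):
//   const topics = await readTopicsAfterClick(page, '.nav-link:nth-of-type(4)', '.topic');
```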

And then let’s go over here. We’ll say await page.waitFor, oh wait, await page.goto.

Then await page.type, and type into this.

Right there. And then we’re going to say, oh, we’ve got to type something too. The second argument is a “pizza” string, because we’re not fools. Okay. And then we go await page.click, and we’ll click some stuff. Let’s click one of these.

Limited liability company. Yeah, I like that better.

And then we want to click our button,

okay. So we’ll click that: await page.click, like that.

Now, if it works right, we should see it search there. Let’s see. Here we go. It clicks on my blog, navigates, and it’s going to there. There, that looks great. It all feels the same as Puppeteer to me, I don’t know. But I’m working within my framework, right? My knowledge is only Puppeteer stuff, so that’s what I know. But I didn’t have any problems with that.
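The search-form steps (go to the page, type a query, click an option, click the search button) can be sketched as follows. `FormPage` is an assumed stand-in for the Playwright `Page` methods used, and the URL and selectors in the usage comment are hypothetical placeholders, not the real ones from the New York site.

```typescript
// Assumed minimal shape; a real Playwright Page is expected to fit it.
interface FormPage {
  goto(url: string): Promise<unknown>;
  type(selector: string, text: string): Promise<void>;
  click(selector: string): Promise<void>;
}

// Hypothetical grouping of the three selectors a search form needs.
interface SearchSelectors {
  input: string;
  entityType: string;
  submit: string;
}

// Open the search page, type the query, choose an entity type, and submit.
async function runSearch(
  page: FormPage,
  url: string,
  selectors: SearchSelectors,
  query: string
): Promise<void> {
  await page.goto(url);
  await page.type(selectors.input, query);
  await page.click(selectors.entityType);
  await page.click(selectors.submit);
}

// Hypothetical usage (URL and selectors are placeholders):
//   await runSearch(page, 'https://example.com/business-search',
//     { input: '#name', entityType: '#llc', submit: 'button[type="submit"]' },
//     'pizza');
```

Recent Playwright releases mark `page.type` as deprecated in favor of `fill`; `type` is kept here because it is what the walkthrough uses.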

That’s good. What happens if I go to a rough one? Ohio is kind of a rough one.

Detection is over here.

Business search. What do you think, is it business search, all business search? We’ve got every selector. Remember, every selector is named. These in here. Nope. Or type button though. We’re going to have to click this either way. Every selector is named by a developer, so some human is in there naming every single selector.

Pretty much. Now, I know some of the component-driven libraries generate their own.

That doesn’t look like we clicked it, does it?

What is this? Waiting for selector, type. Ooh, look at this. This is kind of cool, there were a lot of them. This is cool. 24. Okay. So it’s giving me, can I do that again? I want to try it again. Hold on.

Okay. It was telling me some other good debug information.

Why was it waiting so long though? Why is it waiting? Did the error only pop up when I hit Control-C? This is not going to say anything.

Okay. So it was me hitting this. If I hit Control-C it says, okay, I see we have an error. We found 24 elements and we’re proceeding with the first one. Selector resolved to a hidden element. Attempting click action, waiting for element to be visible, enabled, and stable. I love that. It’s not visible, waiting.

That is some cool information. Now, why does it only happen when I hit Control-C? What if I have it on Lambda and it runs and it does that? I don’t know about that. All right. So I need something more specific than this. Can I do this, like value? It’s always worked for me. There’s multiple. Why are there multiple?
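One way to avoid sitting on a hidden element until Control-C is to give the click an explicit `timeout` and catch the failure. This is only a sketch: `ClickPage` is an assumed stand-in for the Playwright `Page`, `tryClick` is a hypothetical helper, and the timeout value is arbitrary. Whether a click fails on its own without this option depends on the default timeout in the Playwright version being used.

```typescript
// Assumed minimal shape; page.click in Playwright accepts a timeout option.
interface ClickPage {
  click(selector: string, options?: { timeout?: number }): Promise<void>;
}

// Try to click with a bounded wait. Returns null on success, or the error
// message (which, in Playwright, includes the waiting log) on failure.
async function tryClick(
  page: ClickPage,
  selector: string,
  timeoutMs: number
): Promise<string | null> {
  try {
    await page.click(selector, { timeout: timeoutMs });
    return null;
  } catch (error) {
    return error instanceof Error ? error.message : String(error);
  }
}

// Hypothetical usage:
//   const problem = await tryClick(page, 'input[type="button"]', 5000);
//   if (problem !== null) { /* log it and pick a more specific selector */ }
```

Returning the message instead of rethrowing keeps the debug log available to the script, which matters in an environment such as Lambda where nobody is there to press Control-C.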

They’re all in here. Okay. All right. Well, I need something inside this.

Nope. This one.

Can we go,

Okay, let’s try that now. Now we’re checking for anti-scraping stuff. They have Cloudflare and they use it pretty rigorously on Ohio. Okay, so it clicked. I didn’t see anything. Did you? Let’s go like this. Let’s go over here and say await page.waitForTimeout, 15 seconds, right there. And we’re going to inspect the dev tools here.

I’m going to get there.

where we are. 15 seconds is to navigate that in just a moment.

I would think this would clear. Yeah. Ah, 403. Oh man. Look at that. Let’s do that again,

waiting 15 seconds. We were supposed to have the dev tools open. Whew, glad I had that 15-second wait here. It was sitting here clicking it like a fool. And look at that: 403. Cloudflare hit it. Okay. So web scraping protection is about the same. Okay. Now we’re going to just glance at some documentation to see if anything’s cool.

So, getting started with Playwright. If you’re here, it looks super simple, very similar to Puppeteer. If you’ve used Puppeteer before, you’re going to know what to do. If you haven’t, these are the main tenets you need to know. You create a browser by instantiating something with launch. Normally it’s playwright dot whatever, but it looks like in this part you’ve got to call the browser you want, which is cool.

This is probably the number one difference I’ve found so far: instead of puppeteer.launch, you can select the browser immediately, which is very cool. Browser and apparently device, although I don’t know how that works. And then you launch it. Headless will be true by default, I’m sure. You do a bunch of other stuff, and then you’re going to create a page instance, and that page is going to be your workhorse.

This is what you’re really using. And then you do things like, when you want to navigate, you use goto. Then you can grab the data with $eval: hit the selector you want, and you get the text content. You could also go element.attribute, or maybe it’s getAttribute, like that.

And then you pass in, like, href, right? So find the href out of there. That gets the href, and it outputs the text that is the href. So, text content right there. Then if you want to click things, you just click the button. If you want to wait, you’re going to wait for a selector. This is key. Timing is a big deal.
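Grabbing both the text and the `href` of a link, as just described, might look like the sketch below. `LinkPage` is an assumed stand-in for the Playwright `Page` (which has `textContent` and `getAttribute` methods that take a selector), and `readLink` and `LinkInfo` are hypothetical names.

```typescript
// Assumed minimal shape; a real Playwright Page is expected to fit it.
interface LinkPage {
  textContent(selector: string): Promise<string | null>;
  getAttribute(selector: string, name: string): Promise<string | null>;
}

interface LinkInfo {
  text: string;
  href: string | null;
}

// Read the visible text and the href attribute of the first matching link.
async function readLink(page: LinkPage, selector: string): Promise<LinkInfo> {
  const text = await page.textContent(selector);
  const href = await page.getAttribute(selector, 'href');
  // href stays null when the element has no such attribute.
  return { text: (text ?? '').trim(), href };
}

// Hypothetical usage:
//   const link = await readLink(page, 'a.nav-link');
```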

When you use these headless browsers, you want to make sure your page is loaded before you start doing stuff. Otherwise it will fail, as we saw. So you have to be careful with that. You’ve got to wait correctly, and waiting for selectors is the best way. I have a video about waiting that just came out. I think right before this; it probably came out last Monday, or no, I don’t know.

I have them all scheduled, so I’m not sure. And then same thing, waitForTimeout, but you don’t want to use this normally. Otherwise, it’s going to be goto, and then you’re going to fill. This is how you fill out an input: you type it into the ID or whatever the input field is. You put it in there, then click on the buttons again.

Always close the browser at the end, otherwise your script will remain open. That’s Playwright. That’s getting started with Playwright right there, not Puppeteer, ignore this. And again, GitHub will have the repository with all this in there, and you can use it. Now, I’m trying to see, this is kind of funny, because they built it as a test runner.

I wonder how they feel that it’s really used a lot for web scraping. I don’t know, they answer questions on GitHub all about it, this team I mentioned before. This is tests right here; the test runner has things built in there. Okay, assertions. I’m trying to see if there’s anything else cool that I’m missing.

Okay. I don’t want text. Pictures. I’m trying to see commands here. More documentation somewhere. Nope. I want auto-waiting. Was it? Ah, this is cool.

Is there a check,

but this method checks an element matching the selector by finding the first element matching the selector; if there is none, it waits until one is attached to the DOM. It ensures the matched element is a checkbox. Oh, so it checks before you do any checking. It does it. Now what about type, or what did we grab? Okay, so textContent. It waits until it sees that it’s attached, but it doesn’t care if it’s visible, anything like that.

Connie,

You can also force actions. I don’t know if this exists. This is kind of neat stuff. I’m not going to go over all this. This isn’t very helpful for a getting started video. There we go. Here’s your basic framework. You can use the code. Hope you enjoy it. Playwright, it’s great. It looks fine. I think it’s very functional.

The end. This video is not sponsored by Playwright or anyone. It’s sponsored by me, Cobalt Intelligence.