Puppeteer: End to End Testing framework
I saw a video a few days ago on DevTips where they tried out Puppeteer. I’d never used it myself and thought it looked really cool, so I gave it a try, and I’m sharing what I’ve learned here.
What is Puppeteer?
Before we dive into the code, it’s important to understand what the technology we’re using is and why it exists.
A Headless Browser
Puppeteer comes with Chromium and runs “headless” by default. What is a headless browser? A headless browser is a browser for machines. It has no UI and allows a program — often called a scraper or a crawler — to read and interact with it.
An API
Headless browsers are great and all, but they can be a pain to use sometimes. Puppeteer, however, provides a really nice API (a set of functions) for interacting with one.
Why use any of this?
There’s so much you can do with Puppeteer and web scraping in general!
- Run automated tests on a real web page
- Generate PDFs
- Take screenshots
- Grab data from websites and save it
- Automate boring tasks
Puppeteer specifically is perhaps the best tool you can use for these jobs, IMO.
On with the code!
Let’s get started!
Prerequisites
If you’re following along, you’ll need Node.js installed, plus basic knowledge of the command line, JavaScript and the DOM.
Note: Your scraper code doesn’t have to be perfect. When doing your own projects don’t overthink it.
Project Setup
- Make a folder (name it whatever).
- Open the folder in your terminal/command prompt.
- In your terminal, run:
  npm init -y
  This will generate a package.json for managing project dependencies.
- Then run:
  npm install puppeteer
  This will install Puppeteer, which includes Chromium, so don’t be surprised if it’s large.
- Finally, open the folder in your favorite code editor and create an index.js file. You’ll also need these folders: screenshots, pdfs, and json, if you’re following my example exactly.
A Simple Example
Now let’s try something simple (but really cool!) to verify that our setup is working. We’re going to take a screenshot of a web page and generate a PDF file.
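Here’s a rough sketch of what that can look like in index.js. A few assumptions on my part: the URL is passed on the command line (so nothing here depends on a specific site), the helper names outputPaths and capture are just illustrative, and the output files are called page.png and page.pdf. It writes into the screenshots and pdfs folders from Project Setup.

```javascript
const path = require('path');

// Build output paths inside the screenshots/ and pdfs/ folders.
// (outputPaths is an illustrative helper, not part of Puppeteer.)
function outputPaths(name) {
  return {
    screenshot: path.join('screenshots', `${name}.png`),
    pdf: path.join('pdfs', `${name}.pdf`),
  };
}

// Open the page, then save a full-page screenshot and an A4 PDF of it.
async function capture(url, name) {
  // Loaded here so Puppeteer is only needed when a capture actually runs.
  const puppeteer = require('puppeteer');
  const out = outputPaths(name);
  const browser = await puppeteer.launch(); // headless by default
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    await page.screenshot({ path: out.screenshot, fullPage: true });
    await page.pdf({ path: out.pdf, format: 'A4' });
  } finally {
    // Always close the browser, even if something above failed.
    await browser.close();
  }
  return out;
}

// Usage: node index.js https://example.com
const url = process.argv[2];
if (url && /^https?:\/\//.test(url)) {
  capture(url, 'page')
    .then((out) => console.log('Saved', out.screenshot, 'and', out.pdf))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Run it with node index.js followed by the address of the page you want to capture. If both files show up, your setup is working.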
Grabbing Data — Preparations
Using the same site from the example above we will grab some data and save it to a file. Let’s say in this scenario we only want the team name, year, wins and losses. The first step is to create some selectors.
A selector is just a path to the data (think CSS selectors). We’ll come up with the paths here by using our browser’s developer tools. Open them on the page by opening your browser menu and looking for “developer tools”. I’ll be using Chrome, where you can just press CTRL + Shift + I to open them.
On the site open the elements tab in your developer tools and find what data you want to grab. Take note of its structure, classes, etc.
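To give an idea of what the selectors might end up looking like, here’s a small sketch. The class names below are placeholders I made up for illustration, so swap in whatever structure and classes you actually see in the Elements tab.

```javascript
// Hypothetical selectors: these class names are placeholders,
// not taken from the real page.
const selectors = {
  row: 'tr.team', // one table row per team
  name: 'td.name',
  year: 'td.year',
  wins: 'td.wins',
  losses: 'td.losses',
};

// To check a selector, try it in the DevTools console on the page, e.g.:
//   document.querySelectorAll('tr.team').length
console.log(selectors);
```

Keeping the selectors together in one object makes them easy to update if the page’s markup changes.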
Grabbing Data — In Code
Time to apply this to our code.
The main part of this is page.evaluate(), which lets us run JS code in the browser and communicate back any data we want. This is all it takes to fetch data.
You may have noticed that we have access to the DOM here. Because the code passed to page.evaluate() runs inside the page, we get to use the same familiar DOM API we’d use in any browser script!
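As a minimal sketch of the idea, assuming made-up selectors and a URL passed on the command line: extractTeams is a name I picked, and it’s the function that gets sent to the browser by page.evaluate(). It can only use its argument and browser globals like document, which is why the selectors are passed in as a parameter.

```javascript
// Hypothetical selectors: placeholders, replace with the ones from DevTools.
const selectors = {
  row: 'tr.team',
  name: 'td.name',
  year: 'td.year',
  wins: 'td.wins',
  losses: 'td.losses',
};

// Runs inside the page. It only sees its argument and browser globals.
function extractTeams(sel) {
  // Read the trimmed text of a cell, or null if the cell is missing.
  const text = (row, selector) => {
    const el = row.querySelector(selector);
    return el ? el.textContent.trim() : null;
  };
  return Array.from(document.querySelectorAll(sel.row)).map((row) => ({
    name: text(row, sel.name),
    year: text(row, sel.year),
    wins: text(row, sel.wins),
    losses: text(row, sel.losses),
  }));
}

async function scrape(url) {
  // Loaded here so Puppeteer is only needed when a scrape actually runs.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // The function is serialized and executed in the page; the return
    // value is sent back to Node as plain data.
    return await page.evaluate(extractTeams, selectors);
  } finally {
    await browser.close();
  }
}

// Usage: node index.js https://example.com
const url = process.argv[2];
if (url && /^https?:\/\//.test(url)) {
  scrape(url)
    .then((teams) => console.log(teams))
    .catch((err) => {
      console.error(err);
      process.exitCode = 1;
    });
}
```

Everything comes back as strings in this sketch. Convert the year, wins and losses to numbers afterwards if you need them that way.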
Saving Data to a File
As a final touch, we’ll save this data to a file. In my case, I want the data in JSON format because that’s most easily used with JS.
- Load the file system module from Node
- Convert the data to JSON with JSON.stringify()
- Write the file with fs.writeFile()
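Those three steps can be sketched roughly like this. saveJson is a helper name I made up, and I’ve added an fs.mkdir() call (my own addition) so the write doesn’t fail if the target folder is missing.

```javascript
const fs = require('fs');
const path = require('path');

// Write `data` as pretty-printed JSON to `filePath`.
// `done` is a Node-style callback: it receives an error on failure,
// or null on success.
function saveJson(filePath, data, done) {
  const json = JSON.stringify(data, null, 2);
  // Create the target folder first if it does not exist yet.
  fs.mkdir(path.dirname(filePath), { recursive: true }, (mkdirErr) => {
    if (mkdirErr) return done(mkdirErr);
    fs.writeFile(filePath, json, 'utf8', done);
  });
}
```

For example, calling saveJson(path.join('json', 'teams.json'), teams, callback) once the scrape finishes would put the results in the json folder from Project Setup. The teams.json file name is only an example.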
More Advanced Scraping
Puppeteer supports things like single-page applications (SPAs), simulating input, tests and more. They’re beyond the scope of this tutorial, but you can find examples in the Puppeteer documentation.
References and Links
Thanks for reading! Leave any feedback or questions in the comments below.