How to scrape the web with Playwright in Python

26 July 2024

In this article, we discuss the Playwright framework, its features and advantages, and how to scrape a basic webpage with it.

Playwright is a framework for web testing and automation. It is a fairly new tool from Microsoft, introduced to let users automate web pages more efficiently and with fewer initial requirements than the established tool Selenium. Playwright is generally regarded as faster, easier to use, and more reliable than Selenium, and it can drive Chromium, Firefox, and WebKit through a single API. It is built to enable cross-browser web automation that is reliable and fast.

Features of Playwright

Headless execution.
Auto wait for elements.
Intercept network activity.
Emulate mobile devices, geolocation, and permissions.
Support web components via shadow piercing selectors.
Capture video, screenshots, and HAR files.
Contexts allow for isolated sessions.
Parallel execution.
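
As one illustration of the network interception feature listed above, here is a minimal sketch rather than an official recipe. It assumes Playwright is already installed (covered in the sections below). The names BLOCKED_TYPES, skip_heavy_resources, and scrape_title are hypothetical, and the set of blocked resource types is an assumption chosen for the example.

```python
# Resource types the scraper does not need; an illustrative choice, not a rule.
BLOCKED_TYPES = {"image", "font", "media"}


def skip_heavy_resources(route):
    """Route handler: abort blocked resource types, let everything else through."""
    if route.request.resource_type in BLOCKED_TYPES:
        route.abort()
    else:
        route.continue_()


def scrape_title(url):
    """Open url in headless Chromium with the handler installed; return the page title."""
    # Imported here so the handler above can be reused without starting a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless is the default
        page = browser.new_page()
        # Send every request on this page through the handler.
        page.route("**/*", skip_heavy_resources)
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Calling scrape_title('https://quotes.toscrape.com/') would load the page without downloading images, fonts, or media, which can make a text-only scrape lighter.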

Advantages of Playwright

Cross-browser execution
Completely open source
Good documentation
Executes tests in parallel
API testing
Context isolation
Python support

Creating a Python virtual environment

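It is advisable to work in a separate virtual environment, especially when a project depends on a particular library. Here, we create a virtual environment named “venv” and activate it.

Creating the virtual environment (this requires the virtualenv package; python -m venv venv is the standard-library equivalent)

virtualenv venv

Activating it on Windows

venv\Scripts\activate

Activating it on Linux or macOS

source venv/bin/activate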

Installing and setting up Playwright (the second command downloads the browser binaries):

pip install playwright
playwright install

Automating and scraping data from a webpage

With the Playwright library installed, it is time to write some code to automate a webpage. For this article, we will use quotes.toscrape.com.

Step 1: We will import some necessary packages and set up the main function.

Python3

from playwright.sync_api import sync_playwright
 
def main():
    pass
 
if __name__ == '__main__':
    main()

Step 2: Now we will write our code in the ‘main’ function. This code opens the above webpage in a visible Chromium window, waits for 10000 milliseconds (10 seconds), and then closes the browser.

Python3

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto('https://quotes.toscrape.com/')
    page.wait_for_timeout(10000)
    browser.close()
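
The fixed 10-second pause in Step 2 is handy for watching the browser, but a scraper usually only needs to wait until the content it wants is present. The following is a minimal sketch of that idea using Playwright's wait_for_selector. The helper name open_when_ready and the default timeout are assumptions made for this example.

```python
def open_when_ready(page, url, selector, timeout_ms=10_000):
    """Navigate to url and wait until selector matches a visible element."""
    page.goto(url)
    # Returns the matching element as soon as it is visible, or raises a
    # timeout error if nothing shows up within timeout_ms milliseconds.
    return page.wait_for_selector(selector, timeout=timeout_ms)


def main():
    # Imported here so open_when_ready can be reused without starting a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        open_when_ready(page, 'https://quotes.toscrape.com/', '.quote')
        browser.close()
```

Calling main() opens the page and closes the browser as soon as the first quote box is visible, instead of always waiting the full 10 seconds.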

Step 3: This selects all boxes with the ‘quote’ class. With a for loop, we iterate through each box and extract the quote text and its author name. It is recommended to use a Python dictionary to store the different data fields as key-value pairs. After that, we print each dictionary in the terminal.

Python3

all_quotes = page.query_selector_all('.quote')
for quote in all_quotes:
    text = quote.query_selector('.text').inner_text()
    author = quote.query_selector('.author').inner_text()
    print({'Author': author, 'Quote': text})
page.wait_for_timeout(10000)
browser.close()
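
Printing is enough for a demo, but the dictionaries from Step 3 are usually collected and saved. The following is a minimal sketch under a few assumptions: the helper names extract_quotes and save_quotes are hypothetical, JSON is just one possible output format, and boxes missing a text or author element are skipped rather than treated as errors.

```python
import json


def extract_quotes(page):
    """Return one dictionary per '.quote' box found on the page."""
    records = []
    for quote in page.query_selector_all('.quote'):
        text = quote.query_selector('.text')
        author = quote.query_selector('.author')
        # query_selector returns None when nothing matches; skip incomplete boxes.
        if text is None or author is None:
            continue
        records.append({'Author': author.inner_text(), 'Quote': text.inner_text()})
    return records


def save_quotes(records, path):
    """Write the list of dictionaries to path as UTF-8 JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(records, f, ensure_ascii=False, indent=2)
```

Inside the sync_playwright block, a call such as save_quotes(extract_quotes(page), 'quotes.json') could take the place of the print loop. The file name quotes.json is only an example.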

Code Implementation

Complete code to scrape quotes and their authors:

Python3

from playwright.sync_api import sync_playwright
 
 
def main():
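    with sync_playwright() as p:
        # Launch a visible Chromium window; pass headless=True to hide it.
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        page.goto('https://quotes.toscrape.com/')
        # Each quote on the page sits in an element with the 'quote' class.
        all_quotes = page.query_selector_all('.quote')
 
        for quote in all_quotes:
            text = quote.query_selector('.text').inner_text()
            author = quote.query_selector('.author').inner_text()
            print({'Author': author, 'Quote': text})
 
        # Keep the window open for 10 seconds before closing the browser.
        page.wait_for_timeout(10000)
        browser.close()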
 
 
if __name__ == '__main__':
    main()

Output: the script prints one dictionary per quote on the page, each with ‘Author’ and ‘Quote’ keys.
