How to use XPath with BeautifulSoup?

28 July 2024

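In this article, we will see how to use XPath with BeautifulSoup. BeautifulSoup has no XPath support of its own, so the usual approach is to hand the parsed HTML to lxml, which can evaluate XPath expressions. An XPath expression addresses an element much as a path addresses a file in a traditional file system.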

Modules needed and installation:

First, install the following modules.

BeautifulSoup: Our primary module. It parses HTML into a tree that can be searched and modified. It does not fetch pages over HTTP itself.

pip install bs4

lxml: A library for processing XML and HTML in Python. It provides the XPath support used in this article.

pip install lxml

requests: Simplifies sending HTTP requests and reading the responses.

pip install requests

To get data from an element on a webpage with lxml, we address that element with an XPath expression.

Using XPath

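XPath works very much like a traditional file system: an expression lists the steps from the root of the document down to an element, just as a file path lists the folders that lead to a file.

To access file 1,

C:/File1

Similarly, to access file 2,

C:/Documents/User1/File2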
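
The analogy can be sketched with lxml on a small document. The HTML below and its id values are made up for illustration; only etree.HTML() and xpath() come from lxml.

```python
from lxml import etree

# Hypothetical document, used only to illustrate the analogy.
HTML = """
<html>
  <body>
    <div id="documents">
      <div id="user1">
        <p id="file2">contents of file 2</p>
      </div>
    </div>
    <p id="file1">contents of file 1</p>
  </body>
</html>
"""

dom = etree.HTML(HTML)

# Like C:/File1: a short path, close to the root.
file1 = dom.xpath("/html/body/p")[0]

# Like C:/Documents/User1/File2: each step descends one level.
file2 = dom.xpath("/html/body/div/div/p")[0]

print(file1.text)
print(file2.text)
```

Each slash-separated step names a child tag, so a longer expression reaches deeper into the page in the same way a longer path reaches deeper into a folder tree.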

To find the XPath for a particular element on a page:

Right-click the element in the page and click on Inspect.
Right-click on the highlighted element in the Elements tab.
Choose Copy, then click on Copy XPath.

Approach

Import the modules.
Scrape the content of a webpage.
BeautifulSoup does not support XPath by default, so to use XPath we need to convert the soup object to an lxml etree object.
lxml supports XPath 1.0. It also has a BeautifulSoup-compatible mode in which it tries to parse broken HTML the way BeautifulSoup does.
To copy the XPath of an element, inspect the element, right-click on its HTML, and copy the XPath.
After this, call the xpath() method of the lxml element to select the concerned element and read the value inside it.
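
The steps above can be sketched without any network access. This is a minimal illustration that assumes an inline HTML string in place of a downloaded page; the markup and variable names are hypothetical.

```python
from bs4 import BeautifulSoup
from lxml import etree

# Hypothetical HTML standing in for a downloaded page.
HTML = (
    "<html><body>"
    "<h1 id='title'>Shoes</h1>"
    "<ul><li>Air</li><li>Zoom</li></ul>"
    "</body></html>"
)

# Parse with BeautifulSoup first.
soup = BeautifulSoup(HTML, "html.parser")

# Serialize the soup back to a string and re-parse it with lxml,
# which gives us an element tree that understands XPath.
dom = etree.HTML(str(soup))

# xpath() always returns a list, even for a single match.
title = dom.xpath('//h1[@id="title"]')[0].text

# An expression ending in text() returns strings instead of elements.
items = dom.xpath("//ul/li/text()")

print(title)
print(items)
```

The page is parsed twice here, once by each library. That is the cost of keeping the soup object around for BeautifulSoup-style searches while also querying with XPath.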

Note: If the XPath does not give you the desired result, copy the full XPath instead of the XPath. The rest of the steps stay the same.
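
To illustrate the difference between the two forms, the sketch below selects the same element with a short id-based XPath and with a full path from the root. The HTML fragment is hypothetical, and the full path is computed by lxml's getpath() rather than copied from a browser, so the exact string a browser gives for a real page may differ.

```python
from lxml import etree

# Hypothetical page fragment, used only for illustration.
HTML = """
<html>
  <body>
    <div class="header">
      <h1 id="firstHeading">Nike, Inc.</h1>
    </div>
  </body>
</html>
"""

dom = etree.HTML(HTML)

# Short form: find the element by its id, wherever it sits in the tree.
heading = dom.xpath('//*[@id="firstHeading"]')[0]

# Full form: every step from the root, computed here by lxml itself.
full_path = dom.getroottree().getpath(heading)
print(full_path)

# Both expressions select the same heading.
same_heading = dom.xpath(full_path)[0]
print(heading.text, "|", same_heading.text)
```

The short form depends on an attribute such as id being present, while the full form depends on the exact nesting of the page, so either one can stop matching when the page changes.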

Given below is an example that shows how XPath can be used with BeautifulSoup:

Program:

Python3

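from bs4 import BeautifulSoup
from lxml import etree
import requests

URL = "https://en.wikipedia.org/wiki/Nike,_Inc."

# Browser-like headers. Adjacent string literals are joined by Python,
# so the User-Agent contains no stray whitespace from line continuations.
HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    ),
    "Accept-Language": "en-US, en;q=0.5",
}

# Download the page and parse it with BeautifulSoup.
webpage = requests.get(URL, headers=HEADERS)
soup = BeautifulSoup(webpage.content, "html.parser")

# Re-parse the soup's HTML with lxml so that XPath queries are available.
dom = etree.HTML(str(soup))

# xpath() returns a list of matches; take the first one.
heading = dom.xpath('//*[@id="firstHeading"]')[0]

# itertext() also collects text from nested tags; .text alone is None
# when the heading wraps its text in a child element.
print("".join(heading.itertext()))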

Output:

Nike, Inc.
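
Because xpath() returns a list, indexing with [0] raises an IndexError when nothing matches. A small guard can make that case explicit. The helper name first_text and the inline HTML are hypothetical and not part of lxml; the sketch assumes the expression selects elements rather than text nodes.

```python
from lxml import etree


def first_text(dom, expression, default=None):
    """Return the text of the first element matching expression, or default."""
    matches = dom.xpath(expression)
    if not matches:
        # No element matched, so fall back instead of raising IndexError.
        return default
    # itertext() includes text held in nested child elements.
    return "".join(matches[0].itertext()).strip()


# Hypothetical markup in which the heading text sits inside a <span>.
dom = etree.HTML(
    '<html><body><h1 id="firstHeading"><span>Nike, Inc.</span></h1></body></html>'
)

found = first_text(dom, '//*[@id="firstHeading"]')
missing = first_text(dom, '//*[@id="missing"]', default="not found")

print(found)
print(missing)
```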
