Extract the HTML code of the given tag and its parent using BeautifulSoup

By Ted Musemwa

26 July 2024

In this article, we will discuss how to extract the HTML code of the given tag and its parent using BeautifulSoup.

Modules Needed

First, we need to install all these modules on our computer.

BeautifulSoup: Our primary module. It parses HTML and XML documents and lets us search and navigate the resulting tree. It does not download pages itself.

pip install bs4

lxml: A fast HTML and XML parser that BeautifulSoup can use as its parsing backend.

pip install lxml

requests: Sends HTTP requests and returns the server's response. We use it to download the page.

pip install requests

Scraping A Sample Website

We import BeautifulSoup and requests, download the page with requests.get(), and parse the response with the lxml parser. Some websites reject or block requests that do not carry a browser-like User-Agent header. The snippet below sends the request without custom headers, so add a User-Agent if the target site refuses the request.

Python3

# importing the modules 
from bs4 import BeautifulSoup 
import requests 
  
# URL to be scraped 
URL = "https://en.wikipedia.org/wiki/Machine_learning"
  
# getting the contents of the website and parsing them 
webpage = requests.get(URL) 
soup = BeautifulSoup(webpage.content, "lxml") 
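
Setting a User-Agent can be sketched as follows. The header value is a hypothetical placeholder, not something a particular site requires, and the request is only prepared (not sent) so the headers can be inspected without network access.

```python
import requests

URL = "https://en.wikipedia.org/wiki/Machine_learning"

# Hypothetical User-Agent string; replace it with one that identifies your script.
HEADERS = {"User-Agent": "example-scraper/0.1 (contact: you@example.com)"}


def build_request(url, headers):
    """Prepare a GET request without sending it, so its headers can be checked."""
    return requests.Request("GET", url, headers=headers).prepare()


prepared = build_request(URL, HEADERS)
print(prepared.headers["User-Agent"])

# To actually fetch the page, pass the same headers to requests.get:
# webpage = requests.get(URL, headers=HEADERS, timeout=10)
# webpage.raise_for_status()  # raise an exception on 4xx/5xx responses
```

Passing a timeout and calling raise_for_status() are optional additions, but they make a blocked or failed request visible instead of silently parsing an error page.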

To target an element, right-click it in the browser and choose Inspect. In the inspector window, look for an HTML attribute that uniquely identifies the element. Most of the time this is the element's id.

For example, the page title is an h1 tag with the id firstHeading, so we can extract its HTML using that id.

Python3

# getting the h1 with id as firstHeading and printing it 
title = soup.find("h1", attrs={"id": 'firstHeading'}) 
print(title) 
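
The same lookup can be tried offline on a small piece of markup. The HTML string below is a hypothetical stand-in for a downloaded page, and the built-in "html.parser" backend is used here only so the sketch runs without lxml. It shows three equivalent ways to find a tag by id, and what find() returns when nothing matches.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<html><body>
  <h1 id="firstHeading">Machine learning</h1>
  <p id="intro">An introductory paragraph.</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Three equivalent lookups for the same element.
by_attrs = soup.find("h1", attrs={"id": "firstHeading"})
by_keyword = soup.find(id="firstHeading")
by_css = soup.select_one("h1#firstHeading")

print(by_attrs)
print(by_attrs == by_keyword == by_css)

# find() returns None when no element matches, so check before using the result.
missing = soup.find("h1", attrs={"id": "doesNotExist"})
print(missing)  # None
```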

To extract the text content of the tag rather than its HTML, we can use the .get_text() method:

Python3

# getting the text/content inside the h1 tag we 
# parsed on the previous line 
cont = title.get_text() 
print(cont) 

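Now, to extract the HTML of the parent of an element, let’s take the span with the ID “Machine_learning_approaches” as an example.

The parent is available through the .parent attribute, and printing it shows the parent's HTML, including the span itself. Note that .parent is an attribute, not a method: writing .parent() calls the parent tag, which is shorthand for find_all() and returns a list of its descendant tags instead. If the page no longer contains a span with this id, find() returns None and the attribute access raises an AttributeError.

Python3

# getting the HTML of the parent of the span 
# with id Machine_learning_approaches 
parent = soup.find("span",  
                   attrs={"id": 'Machine_learning_approaches'}).parent 
print(parent)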
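
The difference between reading the parent attribute and calling it can be seen on a small, hypothetical fragment (the ids and structure below are invented for illustration, and "html.parser" is used so the sketch runs without lxml):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: a span inside an h2 inside a div.
html = """
<div id="content">
  <h2><span id="Approaches">Approaches</span><span class="edit">[edit]</span></h2>
  <p>Body text.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
span = soup.find("span", attrs={"id": "Approaches"})

# Attribute access returns a single Tag: the enclosing <h2>.
parent = span.parent
print(parent.name)  # h2
print(parent)

# Calling a Tag is shorthand for find_all(), so this returns a
# list-like ResultSet of the parent's descendant tags (the two spans).
descendants = span.parent()
print(len(descendants))  # 2

# .parent can be chained to walk further up the tree.
grandparent = span.parent.parent
print(grandparent.get("id"))  # content
```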

Below is the complete program:

Python3

# importing the modules 
from bs4 import BeautifulSoup  
import requests  
  
# URL to be scraped 
URL = "https://en.wikipedia.org/wiki/Machine_learning"
  
# getting the contents of the website and parsing them 
webpage = requests.get(URL)  
soup = BeautifulSoup(webpage.content, "lxml") 
  
# getting the h1 with id as firstHeading and printing it 
title = soup.find("h1", attrs={"id": 'firstHeading'}) 
print(title) 
  
# getting the text/content inside the h1 tag we  
# parsed on the previous line 
cont = title.get_text() 
print(cont) 
  
# getting the HTML of the parent of the span 
# with id Machine_learning_approaches 
# (.parent is an attribute, not a method) 
parent = soup.find("span",  
                   attrs={"id": 'Machine_learning_approaches'}).parent 
print(parent)
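
Because find() returns None when an element is missing, a more defensive variant of the steps above might wrap them in a small helper. The function name tag_and_parent_html and its parameters are hypothetical, chosen for this sketch. It takes an HTML string rather than a URL so it can be tried without network access.

```python
from bs4 import BeautifulSoup


def tag_and_parent_html(html, name, element_id, parser="html.parser"):
    """Return (tag_html, parent_html) as strings, or (None, None) if no tag matches."""
    soup = BeautifulSoup(html, parser)
    tag = soup.find(name, attrs={"id": element_id})
    if tag is None:
        return None, None
    parent = tag.parent
    return str(tag), (str(parent) if parent is not None else None)


# Hypothetical markup standing in for webpage.content.
sample = '<div class="box"><h1 id="firstHeading">Machine learning</h1></div>'

tag_html, parent_html = tag_and_parent_html(sample, "h1", "firstHeading")
print(tag_html)
print(parent_html)

# A missing id yields (None, None) instead of raising AttributeError.
print(tag_and_parent_html(sample, "span", "doesNotExist"))
```

With a live page, the same helper could be called as tag_and_parent_html(webpage.content, "h1", "firstHeading", parser="lxml").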
