Python – Find text using BeautifulSoup then replace in original soup variable

27 July 2024

Beautiful Soup is a Python library for parsing HTML and XML, which makes it a popular choice for web scraping. Web scraping is the process of extracting data from websites with automated tools instead of by hand. The BeautifulSoup object represents the parsed document as a whole. In this article, we’ll scrape a simple website and replace content in the parsed “soup” variable.

For this article, let’s create a virtual environment (venv). It lets us keep separate package installations for different projects and avoids conflicts between dependencies and interpreters.

To learn more about creating a virtual environment, see: Create a virtual environment

Creating a virtual environment

Navigate to your project directory and run this command to create a virtual environment named “env” there.

python3 -m venv env

Activate the “env” environment (on Linux or macOS) by running:

source env/bin/activate

With the environment activated, its name appears in the command line before the :~$ prompt.

Installing required modules

BeautifulSoup: A library for parsing and scraping web pages.

pip install bs4

requests: A library that simplifies sending HTTP requests.

pip install requests

Step-by-step Approach

Let’s start by importing the libraries and storing the response of a GET request in a variable.

Python3

from bs4 import BeautifulSoup
import requests

# send a GET request and keep the response
response = requests.get("https://isitchristmas.today/")
print(response.status_code)

This prints the HTTP status code of the response. A status code of 200 indicates a successful request.
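In practice a request can fail or hang, so it may help to wrap the call in a small helper. The sketch below is one way to do that. The fetch_html name and the 10-second default timeout are assumptions made for illustration and are not part of the original program.

```python
import requests


def fetch_html(url, timeout=10):
    """Return the body of `url` as text, raising on HTTP error statuses.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    # timeout stops the call from waiting indefinitely on a slow server
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses
    response.raise_for_status()
    return response.text


# Example use (performs a real network request):
# html = fetch_html("https://isitchristmas.today/")
```

With a helper like this, a failed request stops the script with an exception instead of passing an error page on to the parser.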

Now let’s parse the content into a BeautifulSoup object, extract the title and h1 heading tags of the website, and replace their text in the original soup variable. The find() method returns the first matching tag from the soup object, or None if nothing matches.

Python3

# create the soup object from the response body
soup = BeautifulSoup(response.text, "html.parser")

# find the title tag
title = soup.find("title")

# find the first h1 heading
heading = soup.find("h1")

print(title)
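To see how find() behaves without depending on a live website, here is a minimal sketch on a made-up HTML string. The markup and variable names are assumptions for illustration. It compares find() with find_all(), which returns every match instead of only the first.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the example runs without a network call
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <h1>First heading</h1>
    <h1>Second heading</h1>
  </body>
</html>
"""

sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

first_h1 = sample_soup.find("h1")      # first matching tag only
all_h1 = sample_soup.find_all("h1")    # list-like result with every match
missing = sample_soup.find("h2")       # no <h2> in the sample, so this is None

print(first_h1.get_text())   # First heading
print(len(all_h1))           # 2
print(missing)               # None
```

Because find() can return None, it is worth checking the result before assigning to it on pages whose structure you do not control.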

Replace the content of the parsed soup object by assigning to the “.string” attribute of each tag.

Python3

# replace
title.string = "Is GFG day today?"
heading.string = "Welcome to GFG"

Thus, the contents of the title and heading tags have been replaced in the original soup variable.
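One detail worth knowing: assigning to .string replaces everything inside the tag, including any nested tags. A minimal sketch on a made-up snippet (not the page scraped above) shows both this and the fact that the change is visible through the soup object itself:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a tag nested inside the heading
demo_soup = BeautifulSoup("<h1>Hello <b>world</b></h1>", "html.parser")
demo_heading = demo_soup.find("h1")

print(demo_heading.string)   # None, because <h1> has more than one child

demo_heading.string = "Welcome to GFG"

# The nested <b> tag is gone, and demo_soup reflects the change
print(demo_soup)             # <h1>Welcome to GFG</h1>
```

The tag returned by find() is a reference into the parsed tree and not a copy, which is why editing it updates the original soup variable.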

Note: These changes affect only our local parsed copy. We can’t push the modified page back to the website, because the pages are served from the servers where they are hosted.
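The steps above locate a tag by name and then replace its text. It is also possible to search for the text itself and replace it where it is found. The sketch below assumes a Beautiful Soup version that accepts the string argument (older releases used text instead), and the markup is made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration
text_soup = BeautifulSoup(
    "<h1>Hello</h1><p>Hello there</p><p>Goodbye</p>", "html.parser"
)

# Exact match: find() with string= returns the first matching text node
exact_node = text_soup.find(string="Hello")
exact_node.replace_with("Welcome")

# Substring match: a compiled regex matches text nodes that contain "Hello"
for text_node in text_soup.find_all(string=re.compile("Hello")):
    text_node.replace_with(text_node.replace("Hello", "Welcome"))

print(text_soup)
# <h1>Welcome</h1><p>Welcome there</p><p>Goodbye</p>
```

replace_with() swaps the text node in place, so the surrounding tags and their attributes stay untouched.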

Below is the complete program:

Python3

from bs4 import BeautifulSoup
import requests


# send a GET request
response = requests.get("https://isitchristmas.today/")

# a status of 200 implies a successful request
# print(response.status_code)

soup = BeautifulSoup(response.text, "html.parser")
# print(soup)

title = soup.find("title")
heading = soup.find("h1")

# replace the text of both tags
title.string = "Is GFG day today?"
heading.string = "Welcome to GFG"

# display replaced content
print(soup)
# The title and the heading tag contents
# get changed in the parsed soup object.
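Since the modified page exists only in memory, one option is to write it to a local file. This is a minimal sketch: the modified_page.html file name is an assumption, and a small hard-coded document stands in for the soup built by the program above.

```python
from pathlib import Path
from bs4 import BeautifulSoup

# Stand-in for the modified soup from the program above (hypothetical markup)
page_soup = BeautifulSoup(
    "<html><head><title>Is GFG day today?</title></head>"
    "<body><h1>Welcome to GFG</h1></body></html>",
    "html.parser",
)

# Hypothetical output path; adjust as needed
output_path = Path("modified_page.html")
output_path.write_text(str(page_soup), encoding="utf-8")

# Re-parse the saved file to confirm the replaced title was written
saved_soup = BeautifulSoup(output_path.read_text(encoding="utf-8"), "html.parser")
print(saved_soup.title.string)   # Is GFG day today?
```

str(soup) serializes the whole tree back to HTML, so the saved file contains every replacement made so far.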
