How to extract paragraph from a website and save it as a text file?

27 July 2024


Prerequisites:

Scraping is an essential technique for retrieving useful data from a URL or an HTML file so that it can be reused elsewhere. This article shows how to extract the paragraphs from a URL and save them as a text file.

Modules Needed

bs4: Beautiful Soup (bs4) is a Python library for extracting data from HTML and XML files. It can be installed as follows:

pip install bs4
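
As a small illustration of what the library does, the sketch below parses an HTML string held in memory and collects the text of its paragraph tags. The sample markup and the variable names are made up for this example; only BeautifulSoup(), find_all() and get_text() come from bs4.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration
sample_html = """
<html><body>
  <h1>Title</h1>
  <p>First paragraph.</p>
  <p>Second <b>paragraph</b>.</p>
</body></html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# get_text() returns the text of a tag with nested markup stripped
paragraphs = [p.get_text() for p in soup.find_all("p")]
print(paragraphs)  # ['First paragraph.', 'Second paragraph.']
```

The heading is skipped because only 'p' tags are requested, which is the same filter the main example relies on.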

urllib: urllib is a package that collects several modules for working with URLs. It is part of the Python 3 standard library, so it is already available in the environment and does not need to be installed with pip.
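
To give a feel for the modules that urllib bundles, here is a minimal sketch that uses urllib.parse to split the URL from this article into its components. It is only an illustration: the program in this article uses urllib.request, the module that performs the download.

```python
from urllib.parse import urlparse, parse_qs

# The URL that is scraped in this article's example
url = "https://www.geeksforgeeks.org/grep-command-in-unixlinux/?ref=leftbar-rightbar"

parts = urlparse(url)
print(parts.scheme)           # https
print(parts.netloc)           # www.geeksforgeeks.org
print(parts.path)             # /grep-command-in-unixlinux/
print(parse_qs(parts.query))  # {'ref': ['leftbar-rightbar']}
```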

Approach:

Create a text file to hold the downloaded HTML.
Import the required modules, then pass the URL and the path of the .txt file to urllib.request.urlretrieve(). This saves a copy of the page's HTML code on your local machine.
The call sends the request to the URL and writes the response into that file.
Open the saved file in read mode and read its contents.
Pass the contents to the BeautifulSoup() constructor together with a parser such as 'html.parser'.
Create another file for the output (or write/append to an existing file).
Iterate over all the 'p' tags returned by find_all() and write the text of each paragraph to the output file.

The implementation is given below:

Example:

Python3

import urllib.request
from bs4 import BeautifulSoup

# URL to scrape and the path where its HTML copy is saved
# (change the path to a location on your own machine)
url = "https://www.geeksforgeeks.org/grep-command-in-unixlinux/?ref=leftbar-rightbar"
html_path = "/home/gpt/PycharmProjects/pythonProject1/test/text_file.txt"

# download the page and store its HTML at html_path
urllib.request.urlretrieve(url, html_path)

# read the saved HTML back from the same path and parse it
# (the encoding is assumed to be UTF-8)
with open(html_path, "r", encoding="utf-8") as file:
    contents = file.read()
soup = BeautifulSoup(contents, "html.parser")

# traverse paragraphs from soup and write one per line
with open("test1.txt", "w", encoding="utf-8") as f:
    for data in soup.find_all("p"):
        f.write(data.get_text() + "\n")
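
The intermediate HTML copy is convenient for inspection but not strictly required. As a hedged variant, the sketch below fetches the page into memory with urllib.request.urlopen() and writes the paragraphs straight to the output file. The function names extract_paragraphs and save_paragraphs are hypothetical helpers introduced for this illustration, and the page is assumed to be UTF-8 encoded.

```python
import urllib.request
from bs4 import BeautifulSoup


def extract_paragraphs(html):
    """Return the text of every <p> tag in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text() for p in soup.find_all("p")]


def save_paragraphs(url, out_path):
    """Fetch url and write one paragraph per line to out_path."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    paragraphs = extract_paragraphs(html)
    with open(out_path, "w", encoding="utf-8") as f:
        for text in paragraphs:
            f.write(text + "\n")
    # Return the paragraph count so the caller can check the result
    return len(paragraphs)
```

Calling save_paragraphs(url, "test1.txt") with the URL from the example above should produce the same kind of output file without leaving a saved HTML copy behind. Keeping the parsing step in its own function also makes it easy to try on a plain HTML string.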

Output:

Running the example writes the text of each paragraph found on the page to test1.txt.
