Saturday, November 16, 2024

How to write the output to HTML file with Python BeautifulSoup?

In this article, we will write the output of Python BeautifulSoup to an HTML file. BeautifulSoup is a Python library used mainly for web scraping; here we focus on saving the parsed page to an HTML file.

Modules needed and installation:

pip install beautifulsoup4 requests

Approach:

  • Import the required libraries.
  • Make a GET request to the desired URL and extract its page content.
  • Use Python's file handling to write the output to a new file.

Steps to be followed:

Step 1: Import the required libraries.

Python3




# Import libraries
from bs4 import BeautifulSoup
import requests


Step 2: Send a GET request to the Google home page, read the page content from the response, and pass it to BeautifulSoup to build a soup object. The second argument, html.parser, selects Python's built-in HTML parser.

Note: To parse an XML document, pass "xml" as the parser instead of "html.parser"; this requires the lxml package to be installed.

Python3




# set the url to perform the get request
# (the Google home page, as described in this step)
URL = "https://www.google.com/"
page = requests.get(URL)
  
# load the page content
text = page.content
  
# make a soup object by using beautiful
# soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")
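
To try the parsing step without making a network request, the following minimal sketch builds a soup object from a small inline HTML string. The sample markup and the variable names are illustrative assumptions, not part of the steps above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used here instead of a downloaded page
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# "html.parser" is the parser that ships with Python, so no extra install is needed
soup = BeautifulSoup(sample_html, "html.parser")

# Navigate the parse tree by tag name
title_text = soup.title.string        # text inside <title>
first_paragraph = soup.p.get_text()   # text inside the first <p>

print(title_text)
print(first_paragraph)
```

The same BeautifulSoup call accepts the bytes returned by page.content, so the sketch carries over to a real response unchanged.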


Step 3: Open the output file in write mode with Python's built-in open() function, set the encoding to UTF-8, and write the soup object to it. Calling .prettify() on the soup object returns the markup as an indented string, which is easier to read and can be written to the file directly.

The output file is saved in the current working directory with the name output.html

Python3




# open the file in w mode
# set encoding to UTF-8
with open("output.html", "w", encoding = 'utf-8') as file:
    
    # prettify() already returns a string, so no str() call is needed
    file.write(soup.prettify())
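
As a quick check of the writing step, this sketch writes prettified markup to a file and reads it back. It assumes a made-up inline HTML string and a temporary directory in place of the real page and output.html, so it can run offline.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Hypothetical sample markup with a non-ASCII character to exercise UTF-8
sample_html = "<html><body><p>Héllo</p></body></html>"
soup = BeautifulSoup(sample_html, "html.parser")

# prettify() returns a str with nested tags indented on separate lines
pretty = soup.prettify()

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "output.html")

    # Write with an explicit encoding so the result does not depend on the OS default
    with open(out_path, "w", encoding="utf-8") as file:
        file.write(pretty)

    # Read the file back with the same encoding
    with open(out_path, encoding="utf-8") as file:
        round_trip = file.read()

print(round_trip == pretty)
```

Passing the same encoding when reading and writing is what keeps characters outside ASCII intact.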


Below is the full implementation:

Python3




# Import libraries
from bs4 import BeautifulSoup
import requests
  
# set the url to perform the get request
# (the Google home page, as described in Step 2)
URL = "https://www.google.com/"
page = requests.get(URL)
  
# load the page content
text = page.content
  
# make a soup object by using
# beautiful soup and set the markup as html parser
soup = BeautifulSoup(text, "html.parser")
  
# open the file in w mode
# set encoding to UTF-8
with open("output.html", "w", encoding = 'utf-8') as file:
    
    # prettify() already returns a string, so no str() call is needed
    file.write(soup.prettify())
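
The script above does not handle a slow or failed request. One possible variant, sketched here under assumptions, wraps the same steps in two functions and adds a timeout and a status check. The function names html_to_pretty_file and save_page and the 10-second default are hypothetical choices for illustration, not part of the original implementation.

```python
from bs4 import BeautifulSoup
import requests


def html_to_pretty_file(html, path):
    """Parse html (str or bytes) and write the prettified markup to path."""
    soup = BeautifulSoup(html, "html.parser")
    with open(path, "w", encoding="utf-8") as file:
        file.write(soup.prettify())


def save_page(url, path, timeout=10):
    """Download url and save its prettified HTML to path."""
    # timeout stops the call from waiting indefinitely on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of saving an error page
    response.raise_for_status()
    html_to_pretty_file(response.content, path)


# Example call (needs network access), following the article's Google example:
# save_page("https://www.google.com/", "output.html")
```

Keeping the file-writing step in its own function means it can be reused with HTML from any source, not only a live request.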


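Output: Running the script creates output.html in the current working directory, containing the prettified HTML of the requested page.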
