In this article, we are going to write output to an HTML file with Python's BeautifulSoup. BeautifulSoup is a Python library used mainly for web scraping; here we use it to download a page, parse it, and write the parsed HTML to a file.
Modules needed and installation:
pip install beautifulsoup4 requests
Approach:
- We will first import all the required libraries.
- Make a GET request to the desired URL and extract its page content.
- Parse the content with BeautifulSoup and write the output to a new file using Python's built-in open() function.
Steps to be followed:
Step 1: Import the required libraries.
```python
# Import libraries
from bs4 import BeautifulSoup
import requests
```
Step 2: We will perform a GET request to the Google home page, extract its page content, and create a soup object by passing that content to BeautifulSoup with "html.parser" as the parser.
Note: if you are parsing an XML document, pass "xml" as the parser instead (this requires the lxml library to be installed).
```python
# set the URL to perform the GET request on
URL = "https://www.google.com"
page = requests.get(URL)

# load the page content (raw bytes)
text = page.content

# make a soup object using BeautifulSoup and
# set the parser to html.parser
soup = BeautifulSoup(text, "html.parser")
```
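Once the soup object exists, it can be queried before anything is written out. The sketch below is illustrative only: it parses a small hypothetical HTML byte string (standing in for `page.content`) so it runs without network access, and shows that BeautifulSoup accepts bytes directly.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, standing in for page.content from requests.get()
html = b"""<html><head><title>Sample page</title></head>
<body><a href="/about">About</a><a href="/help">Help</a></body></html>"""

# BeautifulSoup accepts bytes (like page.content) as well as str (like page.text)
soup = BeautifulSoup(html, "html.parser")

title = soup.title.string                         # text inside <title>
links = [a["href"] for a in soup.find_all("a")]   # href of every <a> tag

print(title)   # Sample page
print(links)   # ['/about', '/help']
```

Passing bytes lets BeautifulSoup work out the document's encoding itself, which is why the article uses `page.content` rather than `page.text`.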
Step 3: We open the output file with Python's built-in open() function in write mode, set the encoding to UTF-8, and write the soup to it. Calling .prettify() on the soup object returns the document as an indented string, which makes it easier to read; because it is already a string, it can be passed directly to write().
The output file, output.html, is created in the current working directory (the directory the script is run from).
```python
# open the file in write ("w") mode with UTF-8 encoding
with open("output.html", "w", encoding="utf-8") as file:
    # prettify() returns the parsed document as an indented string
    file.write(soup.prettify())
```
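To check that the file was written correctly, it can be read back with the same encoding and parsed again. This is a minimal sketch, assuming a small hypothetical document that contains a non-ASCII character (to show why `encoding="utf-8"` matters) and a temporary directory instead of the real output.html so that no files are left behind.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical document with a non-ASCII character
soup = BeautifulSoup("<html><body><p>Café menu</p></body></html>", "html.parser")

# Temporary location used only for this example
path = os.path.join(tempfile.mkdtemp(), "output.html")

pretty = soup.prettify()  # already a str, so no str() call is needed
with open(path, "w", encoding="utf-8") as file:
    file.write(pretty)

# Read the file back with the same encoding and parse it again
with open(path, encoding="utf-8") as file:
    saved = file.read()

reparsed = BeautifulSoup(saved, "html.parser")
print(type(pretty).__name__)             # str
print(reparsed.p.get_text(strip=True))   # Café menu
```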
Below is the full implementation:
```python
# Import libraries
from bs4 import BeautifulSoup
import requests

# set the URL to perform the GET request on
URL = "https://www.google.com"
page = requests.get(URL)

# load the page content (raw bytes)
text = page.content

# make a soup object using BeautifulSoup and
# set the parser to html.parser
soup = BeautifulSoup(text, "html.parser")

# open the file in write ("w") mode with UTF-8 encoding
with open("output.html", "w", encoding="utf-8") as file:
    # prettify() returns the parsed document as an indented string
    file.write(soup.prettify())
```
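A slightly more defensive version of the same steps might look like the sketch below. The `save_page()` function name, the 10-second timeout and the `raise_for_status()` check are additions for illustration, not part of the original script: the timeout keeps the request from hanging indefinitely, and `raise_for_status()` raises `requests.HTTPError` on a 4xx/5xx response instead of silently saving an error page.

```python
import requests
from bs4 import BeautifulSoup


def save_page(url: str, path: str = "output.html", timeout: float = 10.0) -> None:
    """Fetch `url`, parse it with html.parser and write prettified HTML to `path`."""
    # the timeout (in seconds) stops the script from waiting forever on a slow server
    page = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses
    page.raise_for_status()

    # page.content is raw bytes; BeautifulSoup detects the encoding itself
    soup = BeautifulSoup(page.content, "html.parser")

    # write the prettified document as UTF-8 text
    with open(path, "w", encoding="utf-8") as file:
        file.write(soup.prettify())
```

Calling `save_page("https://www.google.com")` would then write output.html to the current working directory, just like the original script.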
Output: