Prerequisite: Requests, BeautifulSoup, strip
The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content.
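Modules Needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python, and the examples here use its "lxml" parser, which needs the separate lxml package (pip install lxml). To install bs4, type the command below in the terminal.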
pip install bs4
- requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built in with Python. To install it, type the command below in the terminal.
pip install requests
Approach:
- Get the HTML code.
- Iterate through each tag.
- Fetch the text from the tag and remove surrounding whitespace using strip.
- If the length of the remaining text is zero, remove the tag from the HTML code.
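The steps above can be sketched as one small reusable function. This is a minimal sketch, not a definitive implementation: the function name `remove_empty_tags` and the `parser` parameter are hypothetical, and it defaults to Python's built-in "html.parser" rather than "lxml" so that it runs without an extra install.

```python
from bs4 import BeautifulSoup


def remove_empty_tags(html, parser="html.parser"):
    """Return html with every tag that has no visible text removed.

    The function name and the parser default are assumptions made
    for this sketch.
    """
    # Step 1: get the HTML code into a parse tree
    soup = BeautifulSoup(html, parser)

    # Step 2: iterate through each tag (find_all() returns a list,
    # so removing tags while looping is safe)
    for tag in soup.find_all():
        # Step 3: fetch the text with surrounding whitespace stripped
        text = tag.get_text(strip=True)

        # Step 4: a zero-length result means the tag is empty
        if len(text) == 0:
            tag.extract()

    return str(soup)


cleaned = remove_empty_tags("<div><p>  </p><span>kept</span></div>")
print(cleaned)
```

Here the `<p>` holds only whitespace, so it is removed, while the `<span>` and its parent `<div>` stay because they still contain text.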
Example 1: Remove empty tags.
Python3
# Import module
from bs4 import BeautifulSoup

# HTML object
html_object = """ <p> <p></p> <strong>some<br>text<br>here</strong></p> """

# Parse the HTML code
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag
for x in soup.find_all():

    # Fetch the text from the tag with whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML code with empty tags removed
print(soup)
Output:
<html><body><strong>sometexthere</strong> </body></html>
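Note that the `<br>` tags are gone from the output as well: a void element such as `<br>` never contains text, so the length check treats it as empty. If that is not wanted, one possible variation is to skip an allow-list of tags. The sketch below is an assumption-laden illustration, not part of the original approach: `KEEP_TAGS` and `remove_empty_tags_keep_void` are hypothetical names, the list of tags is only an example, and "html.parser" is used to avoid the extra lxml dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical allow-list: tags that never hold text but may still matter.
KEEP_TAGS = ["br", "hr", "img", "input"]


def remove_empty_tags_keep_void(html):
    """Remove text-less tags, except allow-listed ones and their ancestors."""
    soup = BeautifulSoup(html, "html.parser")

    for tag in soup.find_all():
        # Keep allow-listed tags, and any tag that contains one
        if tag.name in KEEP_TAGS or tag.find(KEEP_TAGS) is not None:
            continue

        # Otherwise apply the same check as before
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()

    return str(soup)


cleaned_keep = remove_empty_tags_keep_void("<p></p><strong>some<br>text</strong>")
print(cleaned_keep)
```

With this variation the empty `<p>` is still removed, but the `<br>` inside `<strong>` survives. Which tags belong in the allow-list depends on the page being cleaned.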
Example 2: Remove empty tags from a given URL.
Python3
# Import modules
from bs4 import BeautifulSoup
import requests

# Page URL (not given in the source; set this to the page you want to clean)
URL = ""

# Fetch the page content from the URL
page = requests.get(URL)

# Parse the HTML code
soup = BeautifulSoup(page.content, "lxml")

# Iterate over every tag
for x in soup.find_all():

    # Fetch the text from the tag with whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML code with empty tags removed
print(soup)
Output: