Saturday, November 16, 2024

How to remove empty tags using BeautifulSoup in Python?

Prerequisites: Requests, BeautifulSoup, strip()

The task is to write a program that removes empty tags from HTML code. Beautiful Soup has no built-in method for removing tags that have no content, so the program has to check each tag itself.

Modules Needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built in with Python. To install it, run the following command in the terminal:
pip install beautifulsoup4
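  • requests: Requests lets you send HTTP/1.1 requests easily. It does not come built in with Python either. To install it, run the following command in the terminal:
pip install requests
  • lxml: the examples below pass "lxml" to BeautifulSoup as the HTML parser, and it must also be installed separately:
pip install lxml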
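Both examples pass "lxml" as the parser, and BeautifulSoup raises `FeatureNotFound` when the requested parser is not installed. A minimal sketch, assuming you would rather fall back to Python's built-in "html.parser" than fail; the helper name `make_soup` is hypothetical. The two parsers can build slightly different trees from the same malformed HTML, so the printed results may not match the outputs shown for lxml.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup):
    """Parse markup with lxml if it is installed, else with html.parser.

    make_soup is a hypothetical helper, not part of Beautiful Soup.
    """
    try:
        return BeautifulSoup(markup, "lxml")
    except FeatureNotFound:
        # lxml is missing: use the parser that ships with Python
        return BeautifulSoup(markup, "html.parser")


soup = make_soup("<p>hi</p>")
print(soup.p.get_text())  # hi
```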

Approach:

  • Parse the HTML code.
  • Iterate through each tag:
    • Fetch the tag's text with surrounding whitespace removed, using get_text(strip=True).
    • If the stripped text is empty (length zero), remove the tag from the HTML code.
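The approach above fits in a small reusable function. A minimal sketch, assuming the markup is parsed with Python's built-in "html.parser" so no extra parser is needed; the function name `remove_empty_tags` is hypothetical. Because `find_all()` returns a list built before the loop starts, extracting tags while looping over it is safe, and a parent whose children are all empty has empty text itself, so one pass removes it together with them.

```python
from bs4 import BeautifulSoup


def remove_empty_tags(soup):
    """Remove every tag whose text is empty after stripping whitespace.

    remove_empty_tags is a hypothetical helper name.
    """
    # find_all() with no arguments returns every tag, in document order
    for tag in soup.find_all():
        # get_text(strip=True) strips each text fragment and drops empty ones
        if len(tag.get_text(strip=True)) == 0:
            # Detach the tag, together with everything inside it
            tag.extract()
    return soup


html = "<div><p>Hello</p><p>   </p><span><b></b></span></div>"
print(remove_empty_tags(BeautifulSoup(html, "html.parser")))
# <div><p>Hello</p></div>
```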

Example 1: Remove empty tags.

Python3




# Import Module
from bs4 import BeautifulSoup
  
# HTML Object
html_object = """
  
<p>
<p></p>
<strong>some<br>text<br>here</strong></p>
  
"""
  
# Parse the HTML code
soup = BeautifulSoup(html_object, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Fetch the tag's text with surrounding whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML code with the empty tags removed
print(soup)


Output (the <br> tags are removed as well, because they contain no text):

<html><body><strong>sometexthere</strong>
</body></html>
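To keep line breaks and images, one option is to skip void elements (which never hold text) and to keep any tag that contains one. A minimal sketch under that assumption: the tag list below is the set of HTML void elements, the helper name `remove_empty_tags_keep_void` is hypothetical, and "html.parser" is used so that lxml is not required.

```python
from bs4 import BeautifulSoup

# HTML void elements: they never hold text, so get_text() is always "" for them
VOID_TAGS = ["area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"]


def remove_empty_tags_keep_void(soup):
    """Remove text-less tags, but keep void elements and tags that contain them.

    remove_empty_tags_keep_void is a hypothetical helper name.
    """
    for tag in soup.find_all():
        if tag.name in VOID_TAGS:
            continue  # never remove <br>, <img>, ...
        # A tag with no text but with an <img> (or another void element) inside is kept
        if len(tag.get_text(strip=True)) == 0 and tag.find(VOID_TAGS) is None:
            tag.extract()
    return soup


html = ("<div><p></p><strong>some<br>text<br>here</strong>"
        "<span><img src='a.png'></span><em> </em></div>")
print(remove_empty_tags_keep_void(BeautifulSoup(html, "html.parser")))
```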

Example 2: Remove empty tags from the page at a given URL.

Python3




# Import Modules
from bs4 import BeautifulSoup
import requests

# Page URL (placeholder: replace it with the page you want to clean)
URL = "https://example.com/"

# Fetch the page content from the website URL
page = requests.get(URL, timeout=10)
page.raise_for_status()

# Parse the HTML code
soup = BeautifulSoup(page.content, "lxml")

# Iterate over every tag in the document
for x in soup.find_all():

    # Fetch the tag's text with surrounding whitespace stripped
    if len(x.get_text(strip=True)) == 0:

        # Remove the empty tag
        x.extract()

# Print the HTML code with the empty tags removed
print(soup)


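Output: the page's HTML is printed with its empty tags removed; the exact result depends on the page being fetched.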
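Example 2 can only be checked against a live page. A minimal sketch, assuming you want to test the cleaning step offline and see how much it removed: the hypothetical `clean_page` helper takes raw bytes such as `page.content`, applies the same loop, and returns the cleaned HTML together with the number of tags removed. It defaults to "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup


def clean_page(content, parser="html.parser"):
    """Return (cleaned_html, removed_count) for raw page bytes or text.

    clean_page is a hypothetical helper; pass parser="lxml" to match Example 2.
    """
    soup = BeautifulSoup(content, parser)
    before = len(soup.find_all())
    for tag in soup.find_all():
        if len(tag.get_text(strip=True)) == 0:
            tag.extract()
    # Tags inside an extracted parent leave the tree too, so they are counted
    removed = before - len(soup.find_all())
    return str(soup), removed


raw = b"<html><body><p>Hi</p><div><span></span></div><ul><li> </li></ul></body></html>"
cleaned, removed = clean_page(raw)
print(cleaned)  # <html><body><p>Hi</p></body></html>
print(removed)  # 4
```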
