Remove all style, scripts, and HTML tags using BeautifulSoup

28 July 2024

Prerequisite: BeautifulSoup, Requests

Beautiful Soup is a Python library for pulling data out of HTML and XML files. This article shows how to remove all style, script, and HTML tags from a document using Beautiful Soup, leaving only the visible text.

Required Modules:

bs4: Beautiful Soup (bs4) is a Python library primarily used to extract data from HTML, XML, and other markup languages. It is one of the most widely used libraries for web scraping.
Run the following command in the terminal to install this library (the package is published on PyPI as beautifulsoup4 and imported as bs4):

pip install beautifulsoup4

requests: This library is used for making HTTP requests in Python.
Run the following command in the terminal to install this library:

pip install requests

Approach:

Import bs4 library
Create an HTML doc
Parse the content into a BeautifulSoup object
Iterate over the style and script tags and remove each one from the document using the decompose() method
Use the stripped_strings generator (a property, not a method) to retrieve the remaining text with surrounding whitespace stripped
Print the extracted data
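As a minimal sketch of the removal step in isolation: calling the soup object with a list of tag names is shorthand for find_all(), and decompose() deletes each matched tag together with its contents. The HTML snippet and variable names below are illustrative assumptions, not part of the original example.

```python
from bs4 import BeautifulSoup

# Small illustrative document (hypothetical, for demonstration only)
html = "<div><style>p {color: red;}</style><p>Hello</p><script>var x = 1;</script></div>"
soup = BeautifulSoup(html, "html.parser")

# Calling the soup object is shorthand for soup.find_all(...)
targets = soup(["style", "script"])
same_targets = soup.find_all(["style", "script"])

# Record the tag names before removal; a decomposed tag should not be used afterwards
names = [tag.name for tag in targets]

for tag in targets:
    # decompose() removes the tag and everything inside it from the tree
    tag.decompose()

remaining = str(soup)
print(names)
print(remaining)
```

After the loop, only the div and p markup should remain in the tree, so the text extracted afterwards contains no CSS or JavaScript source.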

Implementation:

Python3

# Import Module
from bs4 import BeautifulSoup
 
# HTML Document
HTML_DOC = """
              <html>
                <head>
                    <title> GeeksforLazyroar </title>
                    <style>.call {background-color:black;} </style>
                    <script>getit</script>
                </head>
                <body>
                    is a
                    <div>Computer Science portal.</div>
                </body>
              </html>
            """
 
# Function to remove tags
def remove_tags(html):
 
    # parse html content
    soup = BeautifulSoup(html, "html.parser")
 
    for data in soup(['style', 'script']):
        # Remove tags
        data.decompose()
 
    # return data by retrieving the tag content
    return ' '.join(soup.stripped_strings)
 
 
# Print the extracted data
print(remove_tags(HTML_DOC))

Output:

GeeksforLazyroar is a Computer Science portal.
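For comparison, get_text() with a separator and strip=True should produce the same string as joining stripped_strings. The sketch below uses a made-up sample page and illustrative variable names; it is one possible alternative, not a replacement for the approach above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, similar in shape to the document above
html = """
<html>
  <head>
    <title> Sample page </title>
    <style>.call {background-color: black;}</style>
    <script>getit</script>
  </head>
  <body>
    is a
    <div>short example.</div>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")
for tag in soup(["style", "script"]):
    tag.decompose()

# Approach used in this article: join the whitespace-stripped strings
joined = " ".join(soup.stripped_strings)

# Alternative: let get_text() strip each string and join with a separator
via_get_text = soup.get_text(separator=" ", strip=True)

print(joined)
print(joined == via_get_text)
```

Either form works once the style and script tags are gone; joining stripped_strings leaves room to filter or transform individual strings before they are combined.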

Removing all style, scripts, and HTML tags from a URL

Approach:

Import the bs4 and requests libraries
Get the content of the given URL using requests.get()
Parse the content into a BeautifulSoup object
Iterate over the style and script tags and remove each one from the document using the decompose() method
Use the stripped_strings generator (a property, not a method) to retrieve the remaining text with surrounding whitespace stripped
Print the extracted data
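A response body fetched over HTTP is typically raw bytes, while the earlier example parsed a str. The sketch below suggests that BeautifulSoup accepts either; it uses a hard-coded byte string as a stand-in for a real response so that it runs without network access. The encoding parameter added to remove_tags and the sample markup are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def remove_tags(html, encoding=None):
    # from_encoding only applies to bytes input; leave it as None for str
    soup = BeautifulSoup(html, "html.parser", from_encoding=encoding)
    for tag in soup(["style", "script"]):
        tag.decompose()
    return " ".join(soup.stripped_strings)


# Hypothetical stand-in for the bytes body of an HTTP response
raw_bytes = (
    '<html><head><meta charset="utf-8"><style>p {margin: 0;}</style></head>'
    "<body><p>Café menu</p><script>track()</script></body></html>"
).encode("utf-8")

# Parse the bytes directly, naming the encoding explicitly
from_bytes = remove_tags(raw_bytes, encoding="utf-8")

# Decode first, then parse the resulting str
from_text = remove_tags(raw_bytes.decode("utf-8"))

print(from_bytes == from_text)
```

When the encoding is not known in advance, passing bytes and leaving from_encoding unset lets Beautiful Soup attempt to detect it, which may or may not match the page's real encoding.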

Implementation:

Python3

# Import Module
from bs4 import BeautifulSoup
import requests
 
# Website URL
URL = 'https://www.geeksforgeeks.org/data-structures/'
 
# Page content from Website URL (fail fast on timeouts and HTTP errors)
page = requests.get(URL, timeout=10)
page.raise_for_status()
 
# Function to remove tags
def remove_tags(html):
 
    # parse html content
    soup = BeautifulSoup(html, "html.parser")
 
    for data in soup(['style', 'script']):
        # Remove tags
        data.decompose()
 
    # return data by retrieving the tag content
    return ' '.join(soup.stripped_strings)
 
 
# Print the extracted data
print(remove_tags(page.content))

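Output:

The script prints the page's visible text as one space-separated string. The exact text depends on the content served at the URL when the request is made.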
