Wednesday, November 27, 2024

Python BeautifulSoup – find all class

Prerequisites: Requests, BeautifulSoup

The task is to write a program that finds all the classes used on the page at a given website URL. Beautiful Soup has no built-in method that returns every class in a document, so the class names have to be collected by walking the parsed tags.

Modules needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It is not part of the Python standard library. To install it, run the command below in the terminal (the bs4 package on PyPI is a thin wrapper that installs beautifulsoup4).
pip install bs4

  • requests: Requests lets you send HTTP/1.1 requests with very little code. It is also not part of the standard library. To install it, run the command below in the terminal.
pip install requests

Method #1: Finding a class in a given HTML document.

Approach:

  • Import the module.
  • Create an HTML document.
  • Parse the content with BeautifulSoup.
  • Find the element by class name.

Code:

Python3




# import module
from bs4 import BeautifulSoup

# html code
html_doc = """<html><head><title>Welcome to neveropen</title></head>
<body>
<p class="title"><b>Geeks</b></p>

<p class="body">neveropen a computer science portal for Lazyroar
</body>
"""

# parse html content
soup = BeautifulSoup(html_doc, 'html.parser')

# find the first element whose class is "body" and print it;
# html.parser closes the unterminated <p> tag automatically
print(soup.find(class_="body"))


Output:

<p class="body">neveropen a computer science portal for Lazyroar
</p>
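
Note that find() stops at the first match. As a minimal sketch of how every element carrying a given class could be retrieved instead, the example below uses find_all() and a CSS selector via select(). The sample_html string and its class names are hypothetical, made up for illustration rather than taken from the document above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document, loosely modeled on the one in Method #1
sample_html = """<html><body>
<p class="title"><b>Geeks</b></p>
<p class="body intro">first paragraph</p>
<p class="body">second paragraph</p>
</body></html>"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns only the first matching element (or None if nothing matches)
first_body = soup.find(class_="body")

# find_all() returns every element whose class list contains "body"
all_body = soup.find_all(class_="body")

# select() takes a CSS selector; "p.body" matches <p> tags with class "body"
selected = soup.select("p.body")

print(first_body.get_text())
print([p["class"] for p in all_body])
print(len(selected))
```

Beautiful Soup treats class as a multi-valued attribute, so p["class"] is a list of names, and a search for "body" also matches an element whose class is "body intro".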

 

Method #2: Finding all classes in a URL.

Approach:

  • Import the modules.
  • Send a GET request to the URL with requests.
  • Pass the response content to the BeautifulSoup() constructor.
  • Iterate over all tags and collect their class names.

Code:

Python3




# Import modules
from bs4 import BeautifulSoup
import requests

# Website URL (placeholder: the original URL is missing from this listing,
# so substitute the page you want to inspect)
URL = "https://example.com"

# set of class values found on the page
class_list = set()

# Page content from Website URL
page = requests.get(URL)

# parse html content
soup = BeautifulSoup(page.content, 'html.parser')

# iterate over every tag in the document
for tag in soup.find_all():

    # if the tag has a non-empty class attribute
    if tag.has_attr("class") and len(tag['class']) != 0:

        # store the full class value, e.g. "btn btn-primary"
        class_list.add(" ".join(tag['class']))

print(class_list)


Output: a set of the class attribute values found on the page. The exact contents depend on the URL.
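
The program in Method #2 stores each element's whole class attribute as one string, so a tag with two classes contributes a single combined entry. If individual class names are wanted instead, one possible variation is sketched below. The helper collect_class_names and the sample_html string are hypothetical names introduced for this illustration, and the sketch works on an inline HTML string rather than a live page so it can be run without network access.

```python
from collections import Counter

from bs4 import BeautifulSoup


def collect_class_names(html):
    """Return a Counter mapping each individual class name to its use count."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    # class_=True matches every tag that has a class attribute
    for tag in soup.find_all(class_=True):
        # tag["class"] is a list such as ["card", "highlighted"]
        counts.update(tag["class"])
    return counts


# Hypothetical markup used only to exercise the helper
sample_html = """
<div class="card highlighted">
  <p class="card-title">One</p>
  <p class="card-text muted">Two</p>
  <span class="muted">Three</span>
</div>
"""

class_counts = collect_class_names(sample_html)
print(sorted(class_counts))
print(class_counts["muted"])
```

To apply the same idea to a live page, the text of a requests response could be passed in place of sample_html.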
