Beautiful Soup is a Python library used for web scraping. It can also be used to modify parsed HTML documents. This article shows how Beautiful Soup can be used to extract a div and its content by its ID. For this, the find() method of the module is used to locate the div by its ID.
Approach:
- Import module
- Scrape data from a webpage
- Parse the scraped string as HTML
- Find the div with its ID
- Print its content
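The steps above can be sketched as two small helpers. The names fetch_html and div_content_by_id are hypothetical, chosen only for this illustration; fetch_html uses the standard library's urlopen and is defined but not called here, so the sketch runs without network access on an inline sample string.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its body as text (hypothetical helper, not called below)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def div_content_by_id(html, div_id):
    """Parse html and return the text of the div with the given id, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", id=div_id)
    return None if div is None else div.get_text(strip=True)


# Inline sample used in place of a scraped page.
sample_html = '<html><body><div id="container">Div Content</div></body></html>'
print(div_content_by_id(sample_html, "container"))
```

For a live page, the same function could be given the string returned by fetch_html, assuming the page is reachable and decodes as UTF-8.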
Syntax: find(tag_name, **kwargs)
Parameters:
- The tag_name argument tells Beautiful Soup to consider only tags with the given name. Text strings are ignored, as are tags whose names don't match.
- The **kwargs arguments filter on tag attributes. Here the id keyword is used, so find() returns the first matching tag whose 'id' attribute equals the given value.
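As a small illustration of these parameters, the sketch below looks up the same div in three equivalent ways and also shows what find() gives back when nothing matches. The markup and variable names are made up for this example.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<div id="container">Div Content</div><div id="footer">Footer</div>'
soup = BeautifulSoup(html, "html.parser")

# Filter by tag name plus the id keyword argument.
by_kwarg = soup.find("div", id="container")

# The same filter written as an attrs dictionary.
by_attrs = soup.find("div", attrs={"id": "container"})

# Omitting the tag name matches any tag with that id.
any_tag = soup.find(id="container")

# find() returns None when no tag matches, so check before using the result.
missing = soup.find("div", id="no-such-id")

print(by_kwarg.string)
print(by_kwarg == by_attrs == any_tag)
print(missing)
```

Because a failed lookup yields None, accessing .string on the result without a check would raise an AttributeError.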
Below is the implementation:
Example 1:
Python3
# importing module
from bs4 import BeautifulSoup

markup = '''<html><body><div id="container">Div Content</div></body></html>'''
soup = BeautifulSoup(markup, 'html.parser')

# finding the div with the id
div_bs4 = soup.find('div', id="container")

print(div_bs4.string)
Output:
Div Content
Example 2:
Python3
# importing module
from bs4 import BeautifulSoup

markup = """
<!DOCTYPE html>
<html>
<head><title>Example</title></head>
<body>
<p>Nested div</p>
<div id="first">
Div with ID first
<div id="second">Div with id second</div>
</div>
</body>
</html>
"""

# parsing the string as HTML
soup = BeautifulSoup(markup, 'html.parser')

# finding the div with the id
div_bs4 = soup.find('div', id="second")

print(div_bs4.string)
Output:
Div with id second
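One point worth noting with nested divs: .string only returns text when a tag has a single child. The sketch below, using a trimmed-down version of the nested markup, shows that the outer div's .string is None, while get_text() collects the text of the div and everything inside it. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed-down nested markup for illustration.
markup = """
<div id="first">
  Div with ID first
  <div id="second">Div with id second</div>
</div>
"""
soup = BeautifulSoup(markup, "html.parser")

outer = soup.find("div", id="first")

# The outer div has several children (text plus a nested div), so .string is None.
print(outer.string)

# get_text() joins all text inside the div, including nested tags.
outer_text = outer.get_text(" ", strip=True)
print(outer_text)

# Searching from the outer div limits the lookup to its descendants.
inner_text = outer.find("div", id="second").get_text(strip=True)
print(inner_text)
```

Using get_text() is a reasonable default when the div may contain other tags; .string suits divs known to hold plain text only.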