Searching the parse tree means finding tags and their content in a parsed HTML document. This can be done in several ways, but the most commonly used methods are find() and find_all(), both provided by Beautiful Soup.
To search the parse tree, follow the steps below.
Step 1: For scraping, import the BeautifulSoup class from the bs4 module, and import the requests module to request the website page.
from bs4 import BeautifulSoup
import requests
Step 2: The second step is to create a soup of the website or HTML page by passing the page content and the name of the HTML parser to the BeautifulSoup constructor.
page = requests.get(sample_website)
soup = BeautifulSoup(page.content, 'html.parser')
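Because the parser works on markup rather than on a URL, this step can also be tried without any network access. The following is a minimal sketch, assuming a small hypothetical HTML string (the name `sample_html` and its content are illustrative and not part of the original example):

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for illustration; for a real page the
# markup would come from requests.get(...).content instead.
sample_html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Build the soup with Python's built-in HTML parser.
soup = BeautifulSoup(sample_html, "html.parser")

# The soup exposes the parsed tree, for example the text of the <title> tag.
title_text = soup.title.string
print(title_text)
```

Working from an inline string like this is a convenient way to experiment with the search methods before pointing the code at a live page.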
Step 3: The soup can be searched with two methods: find() and find_all(). The find() method returns the first tag that satisfies the condition, while find_all() returns a list of all the tags that satisfy it.
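The difference between the two methods can be sketched on a small inline document. The table below is a hypothetical stand-in for a real page, so the names `sample_html`, `first_header`, and `all_headers` are assumptions made for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical table used only for illustration.
sample_html = """
<table>
  <tr><th>S.No.</th><th>ARTICLE</th><th>BLOG</th></tr>
  <tr><td>1</td><td>2</td><td>3</td></tr>
</table>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# find() returns a single Tag: the first <th> in document order.
first_header = soup.find("th")

# find_all() returns a list-like ResultSet with every matching <th>.
all_headers = soup.find_all("th")

print(first_header)
print(len(all_headers))
```

Here `first_header` is one tag, while `all_headers` holds three tags that can be iterated like an ordinary list.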
Example 1: Using the find() method
Python3
from bs4 import BeautifulSoup
import requests

# sample website: assign the URL of the page to sample_website

# call get method to request the page
page = requests.get(sample_website)

# with the help of BeautifulSoup method and
# html parser created soup
soup = BeautifulSoup(page.content, 'html.parser')

# With the help of find method perform searching
# in parser tree
print(soup.find('th'))
Output:
<th>S.No.</th>
Example 2: Using the find_all() method
Python3
from bs4 import BeautifulSoup
import requests

# sample website: assign the URL of the page to sample_website

# call get method to request the page
page = requests.get(sample_website)

# with the help of BeautifulSoup method and html
# parser created soup
soup = BeautifulSoup(page.content, 'html.parser')

# With the help of find_all method perform searching
# in parser tree
print(soup.find_all('th'))
Output:
[<th>S.No.</th>, <th>ARTICLE</th>, <th>BLOG</th>]
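In practice the matched tags are usually processed further, and a search may find nothing at all. The sketch below shows a few common follow-ups, assuming a hypothetical inline table rather than the page used above (`sample_html` and the variable names are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for illustration.
sample_html = "<table><tr><th>S.No.</th><th>ARTICLE</th><th>BLOG</th></tr></table>"
soup = BeautifulSoup(sample_html, "html.parser")

# Extract just the text of each matching tag.
header_texts = [th.get_text() for th in soup.find_all("th")]
print(header_texts)

# The limit argument caps how many matches find_all() returns.
first_two = soup.find_all("th", limit=2)
print(len(first_two))

# When nothing matches, find() returns None and find_all() returns
# an empty list, so check the result of find() before using it.
missing_one = soup.find("h1")
missing_all = soup.find_all("h1")
if missing_one is None:
    print("no <h1> tag found")
```

Checking for `None` before accessing attributes of a find() result avoids an AttributeError on pages that lack the expected tag.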