In this article, we are going to see how to find an HTML tag that contains certain text using BeautifulSoup.
Methods used:
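open(filename, mode): Opens the file named filename in the given mode (for example, "r" for reading) and returns a file object.
find_all(name, string=...): Returns a list of every tag called name whose text matches the given string. Older versions of BeautifulSoup call this argument text.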
The code below searches several different tags for a given piece of text (the pattern) and prints every tag whose text matches it. When the pattern is a plain string, the match is exact: the tag's whole text must equal the pattern.
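As a minimal sketch of that call, the snippet below parses a short inline string rather than a file. The markup and the variable names are made up for illustration and are not part of the demonstration file used later.

```python
from bs4 import BeautifulSoup

# Hypothetical inline markup, used here instead of a file on disk.
html = "<p>Hello</p><p>Python Program</p><h1>Hello</h1>"
soup = BeautifulSoup(html, "html.parser")

# Only <p> tags whose entire text equals "Hello" are returned;
# the <h1> is skipped because its tag name does not match.
matches = soup.find_all("p", string="Hello")
print(matches)  # [<p>Hello</p>]
```

The result is a list of tag objects, so each entry can be inspected further (for example with .name or .get_text()).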
Approach:
We first import BeautifulSoup (the code also imports the re module, although the exact-text matches used here do not need it). We then open the HTML file we want to parse with the open function and read its contents. Next, we call find_all with the name of the tag to search and the text that tag must contain. Every tag of that name whose text matches is added to the returned list.
Finally, the list is printed. An empty list means that no tag of that name contains the text we were looking for.
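The empty-list case, and the reason one might import re at all, can be sketched as follows. This is an illustrative example on made-up markup, not the article's own code: a plain string must equal the tag's whole text, while a compiled regular expression matches any tag whose text contains it.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<ul>
  <li>Python Program</li>
  <li>Python Code</li>
  <li>Java Code</li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the tag's whole text, so this finds nothing.
exact = soup.find_all("li", string="Python")
print(exact)  # []

# A compiled regex is searched for inside the text, so partial matches count.
partial = soup.find_all("li", string=re.compile("Python"))
print(partial)  # [<li>Python Program</li>, <li>Python Code</li>]

if not exact:
    print("No <li> tag has exactly that text.")
```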
Below is the HTML file for demonstration:
HTML
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>GFG</title>
</head>
<body>
    <a href="https://www.geeksforgeeks.org/">Geeks For Geeks</a>
    <a href="Dummy Check Text">Geeks For Geeks</a>
    <a href="Dummywebsite.com">Dummy Text</a>
    <h1>Hello</h1>
    <h1>Python Program</h1>
    <span class=true>Geeks For Geeks</span>
    <span class=false>Geeks For Geeks</span>
    <li class=1>Python Program</li>
    <li class=2>Python Code</li>
    <table>
        <tr>GFG Website</tr>
    </table>
</body>
</html>
Below is the implementation:
Python3
# Python program to find an HTML tag
# that contains certain text using BeautifulSoup

# Importing libraries
from bs4 import BeautifulSoup
import re  # only needed for regular-expression patterns; unused below

# Opening and reading the HTML file; the with block closes it afterwards
with open("gfg.html", "r") as file:
    contents = file.read()

soup = BeautifulSoup(contents, "html.parser")

# First pattern (the exact text to look for).
# string= is the current name of the older text= argument.
pattern = "Geeks For Geeks"

# Anchor tags
text1 = soup.find_all("a", string=pattern)
print(text1)

# Span tags
text2 = soup.find_all("span", string=pattern)
print(text2)

# Second pattern
pattern2 = "Python Program"

# Heading tags
text3 = soup.find_all("h1", string=pattern2)
print(text3)

# List item tags
text4 = soup.find_all("li", string=pattern2)
print(text4)

# Third pattern
pattern3 = "GFG Website"

# Table row tags
text5 = soup.find_all("tr", string=pattern3)
print(text5)
Output:
[<a href="https://www.geeksforgeeks.org/">Geeks For Geeks</a>, <a href="Dummy Check Text">Geeks For Geeks</a>]
[<span class="true">Geeks For Geeks</span>, <span class="false">Geeks For Geeks</span>]
[<h1>Python Program</h1>]
[<li class="1">Python Program</li>]
[<tr>GFG Website</tr>]
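The program above names the tag to search each time. If the tag name is not known in advance, one possible variation (a sketch on made-up markup, not part of the original program) is to search for the text alone and then step up to the enclosing tag through .parent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '<a href="#">Geeks For Geeks</a><span>Geeks For Geeks</span><h1>Hello</h1>'
soup = BeautifulSoup(html, "html.parser")

# With no tag name, find_all returns the matching text nodes themselves.
strings = soup.find_all(string="Geeks For Geeks")

# Each text node's parent is the tag that directly contains it.
tags = [s.parent for s in strings]
names = [t.name for t in tags]
print(names)  # ['a', 'span']
```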