In this article, we will find the length of the text of the first occurrence of a given tag using BeautifulSoup.
Let us look at an example. In the code below, soup = BeautifulSoup(html_doc, 'html.parser') parses the entire HTML document with Python's built-in html.parser. soup.find('h2') then searches the document for the first <h2> tag, and its .text attribute holds the text inside that tag. find() accepts any valid HTML tag name. If the tag is present, the remaining operations run on it. If the tag is not present, find() returns None, and accessing .text on None raises an AttributeError.
In this example we are calculating a length, so we use the len() function. len() returns the number of items in an object; for a string, it returns the number of characters in that string.
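The missing-tag case described above can be guarded against explicitly. The following is a minimal sketch, not part of the original examples; the helper name text_length and the one-line sample HTML are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def text_length(soup, tag_name):
    """Return the text length of the first <tag_name>, or None if it is absent."""
    tag = soup.find(tag_name)
    if tag is None:
        # find() found nothing, so there is no .text to measure
        return None
    return len(tag.text)


# Hypothetical sample document, used only for this sketch
sample = BeautifulSoup("<h2>Hello</h2>", "html.parser")
print(text_length(sample, "h2"))  # the tag exists: prints 5
print(text_length(sample, "h3"))  # the tag is missing: prints None
```

Returning None keeps the caller in control; raising a custom error would be an equally reasonable choice.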
Example 1:
In this example, we read the text inside the <h2> tag and count the number of characters in that string.
Python3
# import module
from bs4 import BeautifulSoup

# assign HTML document
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>An example of HTML page to find the length of the first tag</title>
</head>
<body>
<h2>An example of HTML page to find the length of the first tag</h2>
<p> Beautiful Soup is a library which is essential to scrape information from web pages. It helps to iterate, search and modifying the parse tree.</p>
</body>
</html>
"""

# create beautiful soup object
soup = BeautifulSoup(html_doc, 'html.parser')

# get length
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))
Output:
Length of the text of the first <h2> tag: 59
The expression soup.find('h2').text retrieves the text enclosed by the first <h2> tag, and len() then returns the length of that text.
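Because .text returns the tag's text exactly as it appears in the document, any leading or trailing whitespace is counted too. The sketch below, which uses a made-up one-line document rather than the article's html_doc, compares the raw length with the length after get_text(strip=True).

```python
from bs4 import BeautifulSoup

# Hypothetical heading with two spaces on each side of the word
soup = BeautifulSoup("<h2>  Hello  </h2>", "html.parser")
heading = soup.find("h2")

raw_length = len(heading.text)                       # counts the surrounding spaces
stripped_length = len(heading.get_text(strip=True))  # whitespace stripped first

print(raw_length, stripped_length)  # prints: 9 5
```

Which of the two numbers is wanted depends on whether whitespace should count as part of the text.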
Example 2 :
Get the text length of every HTML tag present in the given HTML.
Python3
# import module
from bs4 import BeautifulSoup

# assign html document
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>An example of HTML page to find the length of the first tag</title>
</head>
<body>
<h2>An example of HTML page to find the length of the first tag</h2>
<p> Beautiful Soup is a library which is essential to scrape information from web pages. It helps to iterate, search and modifying the parse tree.</p>
</body>
</html>
"""

# create beautiful soup object
soup = BeautifulSoup(html_doc, 'html.parser')

# Get all the tags present in the html and
# getting their length
for tag in soup.findAll(True):
    print(tag.name, " : ", len(soup.find(tag.name).text))
Output:
findAll(True) returns every tag in the document (find_all() is the current name for the same method). The statement for tag in soup.findAll(True): iterates over the tags that were found, and print(tag.name, " : ", len(soup.find(tag.name).text)) displays each tag name together with the length of its text. Note that soup.find(tag.name) looks up the first tag with that name, so if a tag name occurs more than once, every occurrence reports the length of the first one.
To get only the first tag, add a break statement after the print statement in the code above.
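The loop above looks each tag up again by name, which only matters when a tag name repeats. As a hedged illustration, the sketch below uses a made-up document with two <p> tags and contrasts that lookup with reading tag.text directly; the variable names by_name and direct are assumptions for this sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical document with a repeated tag name
soup = BeautifulSoup("<div><p>one</p><p>three</p></div>", "html.parser")

# Lookup by name: soup.find() always returns the FIRST tag with that name
by_name = [(tag.name, len(soup.find(tag.name).text)) for tag in soup.find_all(True)]

# Direct access: each tag reports the length of its own text
direct = [(tag.name, len(tag.text)) for tag in soup.find_all(True)]

print(by_name)  # [('div', 8), ('p', 3), ('p', 3)]
print(direct)   # [('div', 8), ('p', 3), ('p', 5)]
```

For a document in which every tag name is unique, such as html_doc above, both approaches give the same result.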
Python3
# get length of first tag only
for tag in soup.findAll(True):
    print(tag.name, " : ", len(soup.find(tag.name).text))
    break
Output:
html : 270
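As an alternative to looping and breaking, find() also accepts True and then returns the first tag of any name. The following is a small sketch on a made-up document, not the article's html_doc.

```python
from bs4 import BeautifulSoup

# Hypothetical minimal document
soup = BeautifulSoup("<html><body><h2>Hi</h2></body></html>", "html.parser")

# find(True) matches any tag and stops at the first one in the document
first_tag = soup.find(True)
print(first_tag.name, ":", len(first_tag.text))  # prints: html : 2
```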
Example 3:
In this example, we find the text length of one specific tag in an HTML document.
Python3
# import module
from bs4 import BeautifulSoup

# assign HTML document
html_doc = """
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>An example of HTML page to find the length of the first tag</title>
</head>
<body>
<h2>An example of HTML page to find the length of the first tag</h2>
<p> Beautiful Soup is a library which is essential to scrape information from web pages. It helps to iterate, search and modifying the parse tree.</p>
</body>
</html>
"""

# create beautiful soup object
soup = BeautifulSoup(html_doc, 'html.parser')

# assign tag
tag = "html"

# get length (use the soup object created above)
print("Length of the text of", tag, "tag is:", len(soup.find(tag).text))
Output:
Length of the text of html tag is: 270
Example 4:
Now let us see how to get a tag and its text length from a web page such as Monster. Since the data comes from a URL, we need the requests module to fetch the page.
Python3
# import module
from bs4 import BeautifulSoup
import requests

# assign URL (the address is not given in this article;
# replace the placeholder with the page you want to fetch)
monsterPageURL = "<URL of the page>"
monsterPage = requests.get(monsterPageURL)

# create Beautiful Soup object
soupResults = BeautifulSoup(monsterPage.content, 'html.parser')

# assign tag
tag = "title"

# get length of the tags
print("Length of the text of", tag, "tag is:", len(soupResults.find(tag).text))
Output:
Length of the text of title tag is: 57
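When the HTML comes from the network, the request can fail and the page may lack the tag altogether. The sketch below is one possible way to handle both cases, not a definitive implementation; the function names title_length and fetch_title_length, the url parameter, and the 10-second timeout are all assumptions chosen for illustration.

```python
import requests
from bs4 import BeautifulSoup


def title_length(html):
    """Return the length of the <title> text in html, or None if there is no <title>."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("title")
    return None if title is None else len(title.text)


def fetch_title_length(url):
    """Download url (supplied by the caller) and return the length of its <title> text."""
    response = requests.get(url, timeout=10)  # assumed timeout, adjust as needed
    response.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
    return title_length(response.content)


# The parsing step can be tried without any network access
print(title_length("<html><head><title>Hi there</title></head></html>"))  # prints 8
```

Keeping the parsing in its own function means it can be checked on a literal string, while fetch_title_length only adds the download step.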