Find the length of the text of the first given tag using BeautifulSoup

26 July 2024

In this article, we are going to find the length of the text of the first given tag using BeautifulSoup.

Let us look at a sample. In the code below, soup = BeautifulSoup(html_doc, 'html.parser') parses the entire HTML document with Python's built-in html.parser. The call soup.find('h2') searches the document for the first <h2> tag, and .text returns the text inside it. Any valid HTML tag name can be passed to find(). If no matching tag is present, find() returns None, so accessing .text on the result raises an AttributeError.

In this example we are calculating a length, so we use the len() function. len() returns the number of items in an object; for a string, it returns the number of characters in that string, including spaces and line breaks.
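Because find() returns None when nothing matches, a small guard avoids the AttributeError. The following is a minimal sketch; the helper name first_tag_text_length and the sample HTML are assumptions made for illustration and are not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup


def first_tag_text_length(html, tag_name):
    """Return the text length of the first `tag_name` tag, or None if absent.

    Hypothetical helper for illustration; it is not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find(tag_name)  # find() returns None when nothing matches
    if tag is None:
        return None
    return len(tag.text)


# Sample document invented for this sketch
sample = "<html><body><h2>Hello</h2></body></html>"
print(first_tag_text_length(sample, "h2"))
print(first_tag_text_length(sample, "h3"))
```

Returning None for a missing tag is one possible design; raising a more descriptive exception would work just as well.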

Example 1:

In this example, we read the text inside the first "h2" tag and count the number of characters in that string.

Python3

# import module
from bs4 import BeautifulSoup
 
# assign HTML document
html_doc = """
<html>
 
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>An example of HTML page to find the length of
the first tag</title>
</head>
 
<body>
<h2>An example of HTML page to find the length of the
first tag</h2>
 
 
 
<p>
Beautiful Soup is a library which is essential to scrape
information from web pages.
It helps to iterate, search and modifying the parse tree.</p>
 
 
</body>
</html>
"""
 
# create beautiful soup object
soup = BeautifulSoup(html_doc, 'html.parser')
 
# get length
print("Length of the text of the first <h2> tag:")
print(len(soup.find('h2').text))

Output:

Length of the text of the first <h2> tag:
59

The soup.find('h2').text expression retrieves the text enclosed in the first matching tag, and the len() function then returns the length of that text. The line break inside the heading counts as one character.
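Since len() counts whitespace as well, the result can differ from the number of visible characters. The sketch below, using a made-up heading, compares the raw length with the length after collapsing runs of whitespace; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Heading with extra spaces and a line break, invented for this sketch
soup = BeautifulSoup("<h2>  Hello,\n   world  </h2>", "html.parser")
raw_text = soup.find("h2").text

# len() counts every character, including spaces and the newline
raw_length = len(raw_text)

# Collapse each run of whitespace into a single space before counting
normalized_text = " ".join(raw_text.split())
normalized_length = len(normalized_text)

print("Raw length:", raw_length)
print("Normalized length:", normalized_length)
```

Which of the two numbers is the right one depends on whether the layout whitespace in the markup matters for your use case.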

Example 2:

Get the text length of every HTML tag present in the given HTML document.

Python3

# import module
from bs4 import BeautifulSoup
 
# assign html document
html_doc = """
<html>
 
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>An example of HTML page to find the length of
the first tag</title>
</head>
 
<body>
<h2>An example of HTML page to find the length of the 
first tag</h2>
 
 
<p>
Beautiful Soup is a library which is essential to scrape
information from web pages.
It helps to iterate, search and modifying the parse tree.</p>
 
 
</body>
</html>
"""
 
# create beautiful soup object 
soup = BeautifulSoup(html_doc, 'html.parser')
 
# Get all the tags present in the html and 
# getting their length
for tag in soup.find_all(True):
    print(tag.name, " : ", len(tag.text))

Output:

Passing True as the search argument matches every tag in the document, so the for loop visits each tag in document order. For each one, the print statement displays the tag name followed by the length of its text.
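When a tag name occurs more than once, soup.find(name) always returns the first occurrence, so reading the text from the loop variable itself is what gives a length per tag. The sketch below illustrates the difference on a made-up snippet with two <p> tags.

```python
from bs4 import BeautifulSoup

# Snippet with a repeated tag name, invented for this sketch
html = "<div><p>one</p><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Looking the tag up again by name always lands on the first <p>
by_name = [(tag.name, len(soup.find(tag.name).text))
           for tag in soup.find_all(True)]

# Reading from the loop variable reports each tag's own text
by_tag = [(tag.name, len(tag.text)) for tag in soup.find_all(True)]

print(by_name)
print(by_tag)
```

In a document where every tag name is unique, both approaches print the same values.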

To get only the first tag, add a break statement after the print statement in the Example 2 loop.

Python3

# get length of first tag only
for tag in soup.find_all(True):
    print(tag.name, " : ", len(tag.text))
    break

Output:

html  :  270
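As an alternative to looping and breaking, find(True) returns the first tag of any name directly. This is a minimal sketch on a made-up document, not a replacement for the loop shown above.

```python
from bs4 import BeautifulSoup

# Small document invented for this sketch
soup = BeautifulSoup("<html><body><p>abc</p></body></html>", "html.parser")

# True matches any tag name; find() stops at the first match
first_tag = soup.find(True)
if first_tag is not None:
    print(first_tag.name, " : ", len(first_tag.text))
```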

Example 3:

In this example, we will find the text length of a particular given tag from an HTML document.

Python3

# import module
from bs4 import BeautifulSoup
 
# assign HTML document
html_doc = """
<html>
 
<head>
<meta http-equiv="Content-Type" content="text/html;
charset=iso-8859-1">
<title>An example of HTML page to find the length of
the first tag</title>
</head>
 
<body>
<h2>An example of HTML page to find the length of the
first tag</h2>
 
 
<p>
Beautiful Soup is a library which is essential to scrape 
information from web pages.
It helps to iterate, search and modifying the parse tree.</p>
 
 
</body>
</html>
"""
 
# create beautiful soup object
soup = BeautifulSoup(html_doc, 'html.parser')
 
# assign tag
tag = "html"
 
# get length
print("Length of the text of", tag, "tag is:",
      len(soup.find(tag).text))

Output:

Length of the text of html tag is: 270

Example 4:

Now let us see how to get a tag and its text length from a live web page. Because the page has to be fetched from its URL, we also need the requests module.

Python3

# import module
from bs4 import BeautifulSoup
import requests
 
# assign URL
monsterPageURL = 'https://www.geeksforgeeks.org/how-to-scrape-all-pdf-files-in-a-website/'
monsterPage = requests.get(monsterPageURL)
 
# create Beautiful Soup object
soupResults = BeautifulSoup(monsterPage.content, 'html.parser')
 
# assign tag
tag = "title"
 
# get length of the text of the tag
print("Length of the text of", tag, "tag is:",
      len(soupResults.find(tag).text))

Output:

Length of the text of title tag is: 57
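A live page can change or fail to load, so the number above reflects the page at the time of the request. The sketch below separates parsing from fetching and adds a timeout and a status check; the function names tag_text_length and fetch_tag_text_length and the 10-second timeout are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def tag_text_length(markup, tag_name):
    """Return the text length of the first `tag_name` tag, or None if absent.

    Hypothetical helper; `markup` may be a str or bytes.
    """
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find(tag_name)
    return None if tag is None else len(tag.text)


def fetch_tag_text_length(url, tag_name, timeout=10):
    """Download `url` and measure the first `tag_name` tag (hypothetical helper)."""
    # Imported here so tag_text_length() can be used without requests installed
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
    return tag_text_length(response.content, tag_name)


# Offline check on made-up markup; no network access needed
print(tag_text_length(b"<html><head><title>Demo page</title></head></html>", "title"))
```

Calling fetch_tag_text_length(monsterPageURL, "title") would perform the same lookup as Example 4, with the result depending on the page content at that moment.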
