Friday, November 15, 2024

Get all HTML tags with BeautifulSoup

Web scraping is the process of using bots, software programs called web scrapers, to extract information from HTML or XML content. Beautiful Soup is one such Python library for scraping data. It parses the HTML content of a web page and exposes it for iteration, searching, and modification. To provide these features, it works with a parser that converts the content into a parse tree. With a parser you are comfortable with, it is fairly easy to crawl through web pages using BeautifulSoup.
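As a minimal, offline sketch of the parse-tree idea, the snippet below parses a small hard-coded HTML string (hypothetical markup, not taken from any real page) with Python's built-in "html.parser" and shows navigation, searching, and modification on the resulting tree:

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML document used only for illustration
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <p class="intro">Hello</p>
    <p>World</p>
  </body>
</html>
"""

# Build the parse tree with Python's built-in parser
soup = BeautifulSoup(html, "html.parser")

# Navigation: walk down the tree by tag name
print(soup.head.title.string)  # Demo page

# Searching: find every <p> tag and read its text
paragraphs = soup.find_all("p")
print([p.get_text() for p in paragraphs])  # ['Hello', 'World']

# Modification: replace the text of the first paragraph
paragraphs[0].string = "Hi"
print(soup.find("p", class_="intro"))  # <p class="intro">Hi</p>
```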

To get all the HTML tags of a web page with BeautifulSoup, first import BeautifulSoup along with the requests library, which is used to make a GET request to the web page.

Step-by-step Approach:

  • Import required modules.

Python3




from bs4 import BeautifulSoup
import requests


  • After importing the libraries, assign the web page's URL to a variable and make a GET request to fetch the raw HTML content:

Python3




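# Assign URL (placeholder: replace with the page you want to scrape)
url = "https://example.com"

# Make a GET request to fetch the raw HTML content (timeout in seconds)
html_content = requests.get(url, timeout=10).text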

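In practice it can help to bound how long the request may take and to fail loudly on HTTP errors instead of parsing an error page. The helper below is a sketch; the name fetch_html and the 10-second default timeout are illustrative assumptions, not part of the original steps. It relies only on requests.get(), Response.raise_for_status(), and Response.text:

```python
import requests


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Return the raw HTML of `url`, raising on network or HTTP errors."""
    # A timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # Turn 4xx/5xx status codes into a requests.exceptions.HTTPError
    response.raise_for_status()
    return response.text
```

With this helper, the GET step would become `html_content = fetch_html(url)`.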

  • Now parse the HTML content:

Python3




# Parse the HTML content with Python's built-in "html.parser" (any supported parser works)
soup = BeautifulSoup(html_content, "html.parser")

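Since any parser can be used, one option is to prefer lxml when it is installed and fall back to the built-in "html.parser" otherwise. This is a sketch under that assumption; make_soup is a hypothetical helper name, and lxml has to be installed separately (for example with `pip install lxml`):

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html: str) -> BeautifulSoup:
    """Parse `html` with lxml if available, else with html.parser."""
    try:
        # lxml is fast but is a third-party dependency
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # html.parser ships with Python, so it is always available
        return BeautifulSoup(html, "html.parser")
```

Different parsers can build slightly different trees (lxml, for instance, wraps a bare fragment in `<html>` and `<body>` tags), so the list of tag names may vary with the parser chosen.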

  • Now, to get all the HTML tags of the web page, call find_all() with no arguments, which returns every tag in the document, and collect each tag's .name attribute in a list comprehension:

Python3




tags = [tag.name for tag in soup.find_all()]

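The list built from find_all() keeps duplicates, in document order. If you only need the distinct tag names or a count per tag, a small extension along these lines may help; the inline HTML is a hypothetical example:

```python
from collections import Counter

from bs4 import BeautifulSoup

html = "<html><head><title>T</title></head><body><p>a</p><p>b</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# Every tag name, in document order (duplicates included)
all_tags = [tag.name for tag in soup.find_all()]
print(all_tags)  # ['html', 'head', 'title', 'body', 'p', 'p']

# Distinct tag names, keeping first-seen order (dicts preserve insertion order)
unique_tags = list(dict.fromkeys(all_tags))
print(unique_tags)  # ['html', 'head', 'title', 'body', 'p']

# How many times each tag name occurs
tag_counts = Counter(all_tags)
print(tag_counts["p"])  # 2
```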

Below is the complete program:

Python3




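# Import modules
from pprint import pprint

import requests
from bs4 import BeautifulSoup

# Assign URL (placeholder: replace with the page you want to scrape)
url = "https://example.com"

# Make a GET request to fetch the raw HTML content (timeout in seconds)
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early on 4xx/5xx responses
html_content = response.text

# Parse the HTML content using Python's built-in parser
soup = BeautifulSoup(html_content, "html.parser")

# Collect and display the name of every HTML tag
tags = [tag.name for tag in soup.find_all()]
pprint(tags)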

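To experiment without network access, the same logic can be wrapped in a function that takes an HTML string instead of a URL. get_tag_names and the sample markup are hypothetical, for illustration only:

```python
from pprint import pprint
from typing import List

from bs4 import BeautifulSoup


def get_tag_names(html: str) -> List[str]:
    """Return the name of every tag in `html`, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() with no arguments matches every tag in the tree
    return [tag.name for tag in soup.find_all()]


# Hypothetical sample document
sample = "<html><head><meta charset='utf-8'><title>T</title></head><body></body></html>"
pprint(get_tag_names(sample))
```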

Output (the tags listed depend on the page fetched):

['html',
 'head',
 'meta',
 'meta',
 'meta',
 'link',
 'meta',
 'meta',
 'meta',
 'meta',
 'meta',
 'script',
 'script',
 'link',
 'title',
 'link',
 'link',
 'script',
 'script']
