Sunday, November 17, 2024

Extract CSS tag from a given HTML using Python

Prerequisite: Implementing Web Scraping in Python with BeautifulSoup

In this article, we are going to see how to extract elements from an HTML document or URL by their CSS class, using Python and Beautiful Soup's CSS selector support.

Modules Needed:

  • bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. This module does not come built in with Python. To install it, type the command below in the terminal.
pip install bs4
  • requests: Requests allows you to send HTTP/1.1 requests extremely easily. This module also does not come built in with Python. To install it, type the command below in the terminal.
pip install requests
  • lxml: The examples in this article pass "lxml" as the parser to BeautifulSoup, which requires the lxml package. To install it, type the command below in the terminal.
pip install lxml
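
Because the examples in this article pass "lxml" as the parser, it can help to check which parser is usable before running them. The following is a minimal sketch, not part of the original article: pick_parser is a hypothetical helper name, and the fallback to Python's built-in "html.parser" is an assumption about what you would want when lxml is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def pick_parser(preferred="lxml", fallback="html.parser"):
    """Return `preferred` if Beautiful Soup can use it, else `fallback`.

    pick_parser is a hypothetical helper for illustration, not part of bs4.
    """
    try:
        # Beautiful Soup raises FeatureNotFound when the parser is unavailable
        BeautifulSoup("<p>test</p>", preferred)
        return preferred
    except FeatureNotFound:
        return fallback


parser = pick_parser()
print("Using parser:", parser)
```

The returned name can then be passed as the second argument to BeautifulSoup() in place of a hard-coded "lxml".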

Approach:

  • Import the module
  • Create an HTML document containing elements with the CSS class you want to select
  • Pass the HTML document into the BeautifulSoup() function
  • Select the matching elements with the select() method, which takes a CSS selector such as ".example"

Implementation:

Python3




# import module
from bs4 import BeautifulSoup

# HTML document to parse
html_doc = """
<html>
<head>
<title>Geeks</title>
</head>
<body>
<h2>paragraphs</h2>

<p>Welcome Lazyroar.</p>

<p>Hello Lazyroar.</p>

<a class="example" href="www.neveropen.com" id="dsx_23">java</a>
<a class="example" href="www.neveropen.com/python"  id="sdcsdsdf">python</a>
</body>
</html>
"""

# parse the document; the "lxml" parser requires the lxml package
# (Python's built-in "html.parser" can be used instead)
soup = BeautifulSoup(html_doc, "lxml")

# select every element whose class is "example"
print("display by CSS class:")
print(soup.select(".example"))


Output:

display by CSS class:
[<a class="example" href="www.neveropen.com" id="dsx_23">java</a>, 
<a class="example" href="www.neveropen.com/python" id="sdcsdsdf">python</a>]
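
The select() method accepts other CSS selectors as well, and the returned elements expose their attributes and text. The following is a small sketch based on a trimmed copy of the document above; it uses the built-in "html.parser" so that it runs without lxml, and the variable names are chosen for illustration only.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
<body>
<p>Welcome Lazyroar.</p>
<a class="example" href="www.neveropen.com" id="dsx_23">java</a>
<a class="example" href="www.neveropen.com/python" id="sdcsdsdf">python</a>
</body>
</html>
"""

# html.parser ships with Python, so this sketch needs no extra parser package
soup = BeautifulSoup(html_doc, "html.parser")

# select() returns every match; "a.example" also restricts by tag name
links = soup.select("a.example")
hrefs = [a["href"] for a in links]        # read an attribute
labels = [a.get_text() for a in links]    # read the text content

# select_one() returns only the first match, or None if nothing matches
first_by_id = soup.select_one("#dsx_23")
missing = soup.select_one(".no-such-class")

print(hrefs)                   # ['www.neveropen.com', 'www.neveropen.com/python']
print(labels)                  # ['java', 'python']
print(first_by_id.get_text())  # java
print(missing)                 # None
```

Checking for None before using the result of select_one() avoids an AttributeError when the selector matches nothing.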

Now let's select elements by CSS class from a page fetched by URL:

Python3




# import modules
from bs4 import BeautifulSoup
import requests


def getdata(url):
    # make a GET request and return the page's HTML as text
    r = requests.get(url, timeout=10)
    # raise an exception on 4xx/5xx responses instead of parsing an error page
    r.raise_for_status()
    return r.text


html_doc = getdata('https://www.geeksforgeeks.org/')
soup = BeautifulSoup(html_doc, "lxml")

# select every element whose class is "header-main__wrapper"
print("\nTags by CSS class:")
print(soup.select(".header-main__wrapper"))


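Output:

The output depends on the live page's markup at the time of the request: after the "Tags by CSS class:" line, the script prints a list of the elements that carry the header-main__wrapper class, or an empty list if no element on the page has that class.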
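
Since a live request can fail or the page's class names can change, it may be useful to keep the parsing step separate from the download step. The following is a hedged sketch rather than the article's own method: select_by_css and select_from_url are hypothetical helper names, the 10-second timeout is an assumed default, and the sample markup is invented for the offline check.

```python
import requests
from bs4 import BeautifulSoup


def select_by_css(html, selector, parser="html.parser"):
    """Return the elements in `html` that match the CSS `selector`."""
    return BeautifulSoup(html, parser).select(selector)


def select_from_url(url, selector, timeout=10):
    """Fetch `url` and return matching elements, or [] if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.RequestException as exc:
        # covers connection errors, timeouts, bad URLs and HTTP error codes
        print(f"Request failed: {exc}")
        return []
    return select_by_css(response.text, selector)


# Usage against a live page (not run here, since it needs network access):
#   print(select_from_url("https://www.geeksforgeeks.org/", ".header-main__wrapper"))

# Offline check with a small invented document
sample = '<div class="header-main__wrapper"><span>menu</span></div>'
matches = select_by_css(sample, ".header-main__wrapper")
print(len(matches))  # 1
```

Keeping select_by_css() free of network code means the selector logic can be tried on saved or inline HTML before pointing it at a real site.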

Dominic Rubhabha-Wardslaus