Prerequisite: Implementing Web Scraping in Python with BeautifulSoup
In this article, we will see how to extract elements from an HTML document or URL by their CSS class using Python.
Modules Needed:
- bs4: Beautiful Soup (bs4) is a Python library for pulling data out of HTML and XML files. It does not come built in with Python. To install it, type the command below in the terminal.
pip install bs4
- requests: Requests lets you send HTTP/1.1 requests with very little code. This module also does not come built in with Python. To install it, type the command below in the terminal.
pip install requests
Approach:
- Import the module
- Create an HTML document that contains elements with the CSS class you want to select
- Pass the HTML document to the BeautifulSoup() constructor
- Select the matching tags with the select() method, which takes a CSS selector.
Implementation:
Python3
# import module
from bs4 import BeautifulSoup

# HTML document
html_doc = """
<html>
<head>
<title>Geeks</title>
</head>
<body>
<h2>paragraphs</h2>
<p>Welcome Lazyroar.</p>
<p>Hello Lazyroar.</p>
<a class="example" href="www.neveropen.com" id="dsx_23">java</a>
<a class="example" href="www.neveropen.com/python" id="sdcsdsdf">python</a>
</body>
</html>
"""

# "html.parser" ships with Python; "lxml" also works if the lxml package is installed
soup = BeautifulSoup(html_doc, "html.parser")

# select every tag whose class is "example"
print("display by CSS class:")
print(soup.select(".example"))
Output:
display by CSS class:
[<a class="example" href="www.neveropen.com" id="dsx_23">java</a>, <a class="example" href="www.neveropen.com/python" id="sdcsdsdf">python</a>]
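The select() method accepts other CSS selector forms besides a bare class name. The following is a minimal sketch, assuming the same two links as in the document above; the variable names (links, by_id, python_links, missing) are illustrative and not part of the original example.

```python
from bs4 import BeautifulSoup

html_doc = """
<html>
<body>
<a class="example" href="www.neveropen.com" id="dsx_23">java</a>
<a class="example" href="www.neveropen.com/python" id="sdcsdsdf">python</a>
</body>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# select() returns a list of tags; read an attribute and the text of each one
links = [(a["href"], a.get_text()) for a in soup.select("a.example")]

# select_one() returns only the first match, here looked up by id
by_id = soup.select_one("#dsx_23")

# attribute selector: <a> tags whose href ends with "python"
python_links = soup.select('a[href$="python"]')

# select_one() returns None when nothing matches
missing = soup.select_one(".no-such-class")

print(links)
print(by_id.get_text())
print([a["id"] for a in python_links])
print(missing)
```

Because select() returns a list and select_one() may return None, it is usually worth checking the result before indexing into it.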
Now let's select tags by CSS class from a page fetched by URL:
Python3
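# import modules
from bs4 import BeautifulSoup
import requests

# link to extract HTML data from
# (placeholder: the original link is not given; replace it with the page to scrape)
url = "https://example.com/"

# make a GET request and return the page's HTML
def getdata(url):
    r = requests.get(url)
    return r.text

html_doc = getdata(url)
soup = BeautifulSoup(html_doc, "html.parser")

# select every tag whose class is "header-main__wrapper"
print("\nTags by CSS class:")
print(soup.select(".header-main__wrapper"))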
Output:
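When fetching a live page, it can help to separate the network request from the parsing step so that the selector logic can be checked without a connection. The sketch below is one possible arrangement, not the article's original code; the helper names select_from_html and select_from_url, the 10-second timeout, and the sample markup are all assumptions made for illustration.

```python
import requests
from bs4 import BeautifulSoup


def select_from_html(html, selector):
    """Parse HTML text and return the tags matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select(selector)


def select_from_url(url, selector, timeout=10):
    """Fetch a page, then return the tags matching a CSS selector."""
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return select_from_html(response.text, selector)


# offline check with hypothetical markup that uses the class from the example above
sample = '<div class="header-main__wrapper"><a href="/">Home</a></div>'
matches = select_from_html(sample, ".header-main__wrapper")
print(len(matches))

# network usage (requires a real URL in place of the placeholder):
# print(select_from_url("https://example.com/", ".header-main__wrapper"))
```

If the page has no element with the given class, select() returns an empty list, so an empty result usually means the selector does not match that page rather than that the request failed.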