Web scraping is a common way to gather information from a webpage. With this technique, we can extract a large amount of data, save it, and reuse it later wherever we need it.
To scrape data, we need a few modules. They are not part of the standard Python installation, so we have to install them first (for example, with pip install requests beautifulsoup4).
Importing Libraries
For data scraping, we need to import the Requests and BeautifulSoup modules.
We also need the link to the page we want to scrape. Open the Flipkart website in your browser, search for the item whose data you want, and copy the address of that page. Assign this address to a variable; the code that follows refers to it as link.
Python3
import requests
from bs4 import BeautifulSoup as bs
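The imports do not yet define the link variable that the request step uses. Pasting the address copied from the browser into a string is enough; as a hedged alternative, the sketch below builds the address from a search term. The base address, the q parameter, and the build_search_link helper are assumptions for illustration and should be checked against the address your browser shows.

```python
from urllib.parse import urlencode

# Assumed base address of the search page; replace it with the address
# copied from your browser if it differs.
SEARCH_URL = "https://www.flipkart.com/search"


def build_search_link(query):
    """Return a search-page address for the given query (hypothetical helper)."""
    # urlencode escapes spaces and special characters in the query.
    return SEARCH_URL + "?" + urlencode({"q": query})


link = build_search_link("mobile phones")
print(link)
```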
Next, we need the content of that webpage, so we send a request to the site. The Requests module provides the get() function, which makes the request for the webpage.
Python3
# url holds the response object; the page's HTML is available as url.text
url = requests.get(link)
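A bare requests.get() call has no timeout by default and does not raise an error for HTTP error codes. The sketch below adds a timeout, a browser-like User-Agent header, and a status check. The fetch_html name, the header value, and the injectable get parameter are illustrative assumptions, not part of the original code.

```python
import requests


def fetch_html(link, get=requests.get, timeout=10):
    """Download a page and return its HTML text (hypothetical helper)."""
    # Some sites reject the default Requests User-Agent; this value is an assumption.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = get(link, headers=headers, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

Calling fetch_html(link) would return the same text as url.text in the snippet above. The get parameter exists only so that a stand-in function can replace the network call when trying the helper offline.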
Now we make a soup from the response that Requests returned, using the BeautifulSoup module.
The soup parses the page's HTML and organizes it into a tree of tags that we can search and navigate.
To make the soup, use the command below.
Python3
# Name the parser explicitly; otherwise BeautifulSoup picks one and emits a warning
soup = bs(url.text, "html.parser")
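To see what a soup offers without touching the network, the sketch below parses a small hand-written HTML string. The markup, the class name price, and the variable names are invented for illustration.

```python
from bs4 import BeautifulSoup as bs

# Made-up HTML standing in for the text of a downloaded page.
sample_html = """
<html>
  <head><title>Sample shop</title></head>
  <body>
    <div class="price">₹499</div>
  </body>
</html>
"""

sample_soup = bs(sample_html, "html.parser")

# Text of the <title> tag.
page_title = sample_soup.title.text
# Text of the first <div> whose class is "price".
first_price = sample_soup.find("div", class_="price").text
print(page_title, first_price)
```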
Now we need to locate the data we want on the website.
To do that, open the link in your browser and follow these steps:
- Click the menu button in the top-right corner of the browser window
- Then click on More tools
- Click on Developer tools
- Or simply press Ctrl+Shift+I.
After opening this, you will get a screen in the following format.
Now we have to select the class that contains the price information. For that, click the element-selection icon (an arrow pointing into a box) in the top-left corner of the Developer tools panel.
Then move the mouse cursor to the price and click on it; this reveals the class that contains the price. Here the price is in a div of class “_30jeq3”, and this div sits inside a div of class “_25b18c”. In the same way, start from the outermost class that contains the price and follow the nested classes inward, one level at a time, until you reach the innermost class that contains the price.
In our code, the soup holds all the data of the page. Now we find the class that includes the price.
Python3
elements = soup.find_all("div", class_="_1AtVbE col-12-12")
Here each div of class “_1AtVbE col-12-12” contains the price details of one product, so we fetch every such div for the products visible on the page.
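One detail worth knowing: when class_ is given a string with several class names, BeautifulSoup compares it with the whole class attribute, so the order of the names matters. A CSS selector is a hedged alternative that requires both classes but ignores their order. The sketch below shows the difference on made-up markup; only the class names come from the article.

```python
from bs4 import BeautifulSoup as bs

# Made-up markup: the same two classes, written in different orders.
sample_html = """
<div class="_1AtVbE col-12-12">first</div>
<div class="col-12-12 _1AtVbE">second</div>
<div class="_1AtVbE">third</div>
"""
sample_soup = bs(sample_html, "html.parser")

# A multi-class string is matched against the full class attribute.
exact = sample_soup.find_all("div", class_="_1AtVbE col-12-12")

# A CSS selector requires both classes, in any order.
by_selector = sample_soup.select("div._1AtVbE.col-12-12")

exact_texts = [tag.text for tag in exact]
selector_texts = [tag.text for tag in by_selector]
print(exact_texts)
print(selector_texts)
```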
Now we have a list of all the matching elements on the page, so we go through them one by one.
Python3
for element in elements:
    if element:
        a = element.find("div", class_="_13oc-S _1t9ceu")
Inside each of these elements, another class, “_13oc-S _1t9ceu”, contains the price details; the loop looks it up for every product visible on the page. If this class exists, we move down to the next nested class, and so on, and finally print the text of the innermost class that contains the price.
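As a self-contained illustration of this drill-down, the sketch below runs the same idea on made-up markup. The nesting is shortened to three levels, the second product deliberately has no price, and the drill_down helper is a hypothetical name, not part of the original code.

```python
from bs4 import BeautifulSoup as bs

# Made-up markup that mimics the nesting; the second product has no price block.
sample_html = """
<div class="_1AtVbE col-12-12">
  <div class="_13oc-S _1t9ceu">
    <div class="_25b18c"><div class="_30jeq3">₹499</div></div>
  </div>
</div>
<div class="_1AtVbE col-12-12">
  <div class="_13oc-S _1t9ceu"></div>
</div>
"""


def drill_down(tag, class_chain):
    """Follow a chain of div classes; return None as soon as one is missing."""
    for class_name in class_chain:
        if tag is None:
            return None
        tag = tag.find("div", class_=class_name)
    return tag


sample_soup = bs(sample_html, "html.parser")
chain = ["_13oc-S _1t9ceu", "_25b18c", "_30jeq3"]

prices = []
for element in sample_soup.find_all("div", class_="_1AtVbE col-12-12"):
    price_tag = drill_down(element, chain)
    # Skip products where some level of the chain was not found.
    if price_tag is not None:
        prices.append(price_tag.text)

print(prices)
```

Collecting the texts in a list instead of printing them inside the loop makes it easier to save the data later.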
Python3
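        # Continues inside the loop: go one level deeper only if the previous one exists
        if a:
            b = a.find("div", class_="_1xHGtK _373qXS")
            if b:
                c = b.find("div", class_="_2B099V")
                if c:
                    d = c.find("div", class_="_25b18c")
                    if d:
                        e = d.find("div", class_="_30jeq3")
                        # e is None when no price element is found
                        if e:
                            print(e.text)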
Output: