Let us see how to extract data from Justdial using Selenium and Python. Justdial is a company that provides local search for different services in India over the phone, website and mobile apps. In this article we will be extracting the following data:
- Phone number
- Name
- Address
We can then save the data in a CSV file.
Approach:
- Import the following modules: webdriver from selenium, ChromeDriverManager, pandas, time and os.
- Use the driver.get() method and pass the link you want to get information from.
- Use the driver.find_elements_by_class_name() method and pass ‘store-details’.
- Instantiate empty lists to store the values.
- Iterate the StoreDetails and start fetching the individual details that are required.
- Create a user-defined function strings_to_number() to convert the extracted string to numbers.
- Display the details and save them as a CSV file according to the requirements.
Python3
# importing the modules from selenium import webdriver from webdriver_manager.chrome import ChromeDriverManager driver = webdriver.Chrome(ChromeDriverManager().install()) import pandas as pd import time import os # driver.get method() will navigate to a page given by the URL address # the user-defined function def strings_to_num(argument): switcher = { 'dc' : '+' , 'fe' : '(' , 'hg' : ')' , 'ba' : '-' , 'acb' : '0' , 'yz' : '1' , 'wx' : '2' , 'vu' : '3' , 'ts' : '4' , 'rq' : '5' , 'po' : '6' , 'nm' : '7' , 'lk' : '8' , 'ji' : '9' } return switcher.get(argument, "nothing" ) # fetching all the store details storeDetails = driver.find_elements_by_class_name( 'store-details' ) # instantiating empty lists nameList = [] addressList = [] numbersList = [] # iterating the storeDetails for i in range ( len (storeDetails)): # fetching the name, address and contact for each entry name = storeDetails[i].find_element_by_class_name( 'lng_cont_name' ).text address = storeDetails[i].find_element_by_class_name( 'cont_sw_addr' ).text contactList = storeDetails[i].find_elements_by_class_name( 'mobilesv' ) myList = [] for j in range ( len (contactList)): myString = contactList[j].get_attribute( 'class' ).split( "-" )[ 1 ] myList.append(strings_to_num(myString)) nameList.append(name) addressList.append(address) numbersList.append("".join(myList)) # initialize data of lists. data = { 'Company Name' : nameList, 'Address' : addressList, 'Phone' : numbersList} # Create DataFrame df = pd.DataFrame(data) print (df) # Save Data as .csv df.to_csv( 'demo1.csv' , mode = 'a' , header = False ) |
Output: