Scrape Table from Website using Python – Selenium

28 July 2024


Selenium is a browser automation tool that can open a website, perform actions on it, and read data from it. It was developed chiefly to ease testing by automating web applications. Beyond testing, it can also take over tedious, repetitive browser work. One such task is extracting the data from a table on a website, which Selenium handles easily. This article explains the steps needed to scrape table data from a website.

Approach to be followed:

Let us consider a simple HTML page containing only a table to understand the approach to scraping a table from a website.

HTML

<!DOCTYPE html>
<html>
   <head>
      <title>Selenium Table</title>
   </head>
   <body>
      <table border="1">
        <thead>
         <tr>
            <th>Name</th>
            <th>Class</th>
         </tr>
        </thead>
        <tbody>
         <tr>
            <td>Vinayak</td>
            <td>12</td>
         </tr>
         <tr>
            <td>Ishita</td>
            <td>10</td>
         </tr>
        </tbody>
      </table>
   </body>
</html>

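Browser Output: a bordered two-column table with the headers Name and Class and the rows Vinayak, 12 and Ishita, 10.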

Follow the below-given steps:

Once you have created the HTML file, follow the steps below to extract the data from its table.

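First, declare the web driver. In Selenium 4 the driver path is passed through a Service object (imported from selenium.webdriver.chrome.service) instead of the removed executable_path argument.

driver = webdriver.Chrome(service=Service("Declare the path where web driver is installed"))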

Now, open the website from which you want to obtain table data

driver.get("Specify the path of the website")

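Next, find the number of rows in the table. The find_elements_by_xpath method was removed in Selenium 4, so use find_elements with By.XPATH (By is imported from selenium.webdriver.common.by).

rows = 1 + len(driver.find_elements(By.XPATH, "Specify the altered path"))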

Here, the altered XPath means that if the XPath of row 1 is /html/body/table/tbody/tr[1], the altered XPath is /html/body/table/tbody/tr. In other words, remove the index from the table row so that the expression matches every row in the table body.

NOTE: The XPath above matches only the rows in the table body, so 1 is added to account for the table header. The body rows themselves are still numbered from 1 in the XPath.

Further, find the number of columns in the table

cols = len(driver.find_elements(By.XPATH, "Specify the altered path"))

Here, the altered XPath means that if the XPath of the cell showing Vinayak is /html/body/table/tbody/tr[1]/td[1], the altered XPath is /html/body/table/tbody/tr[1]/td. Remove only the index from the table data and keep the row index: without it, the expression /html/body/table/tbody/tr/td matches the cells of every row, which gives the total number of cells instead of the number of columns.
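The two counts can be sanity-checked without launching a browser. The sketch below is not Selenium: it swaps in Python's standard xml.etree.ElementTree, which understands a small XPath subset, and runs it on the sample table. The DOCTYPE line is dropped and the paths are written relative to the html root element; both are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET

# The sample page from above, without the DOCTYPE line
SAMPLE_HTML = """<html>
  <head><title>Selenium Table</title></head>
  <body>
    <table border="1">
      <thead>
        <tr><th>Name</th><th>Class</th></tr>
      </thead>
      <tbody>
        <tr><td>Vinayak</td><td>12</td></tr>
        <tr><td>Ishita</td><td>10</td></tr>
      </tbody>
    </table>
  </body>
</html>"""

root = ET.fromstring(SAMPLE_HTML)  # root is the <html> element

# Row index removed: matches every row of the table body
body_rows = len(root.findall("./body/table/tbody/tr"))
rows = 1 + body_rows  # plus one for the header row

# Row index kept, cell index removed: cells of the first body row only
cols = len(root.findall("./body/table/tbody/tr[1]/td"))

# Both indices removed: every cell of every body row
all_cells = len(root.findall("./body/table/tbody/tr/td"))

print(rows, cols, all_cells)  # prints: 3 2 4
```

The last count shows why the row index matters when counting columns: with both indices removed the expression returns 4 cells, not 2 columns.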

Moreover, obtain data from each column of the table body

# rows includes the header, so the body rows are tr[1] .. tr[rows-1]
for r in range(1, rows):
    for p in range(1, cols+1):
        value = driver.find_element(By.XPATH, "Specify the altered path").text

Here, the altered XPath means that if the XPath of the cell showing Vinayak is /html/body/table/tbody/tr[1]/td[1], the altered XPath is "/html/body/table/tbody/tr[" + str(r) + "]/td[" + str(p) + "]". In other words, replace the index of the table row with str(r) and the index of the table data with str(p).
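The string concatenation above can also be written with an f-string. A minimal sketch, assuming the sample page's table sits at /html/body/table; the helper name cell_xpath is hypothetical and not part of Selenium.

```python
def cell_xpath(table_xpath, row, col):
    """Build the XPath of one body cell; row and col are 1-based."""
    return f"{table_xpath}/tbody/tr[{row}]/td[{col}]"


# XPath of the sample table (an assumption matching the HTML above)
SAMPLE_TABLE = "/html/body/table"

# The sample table has 2 body rows and 2 columns
paths = [cell_xpath(SAMPLE_TABLE, r, p)
         for r in range(1, 3)
         for p in range(1, 3)]

for path in paths:
    print(path)
```

Each generated string can be passed to driver.find_element(By.XPATH, ...) in place of the concatenated expression.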

Finally, print data of the table

        print(value, end='       ')
    print()
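A fixed run of spaces after each value leaves the columns ragged when the cell texts differ in length. One possible alternative is sketched below, assuming the cell values have already been collected into a non-empty list of equally long rows; the helper name format_rows is hypothetical.

```python
def format_rows(table, gap=2):
    """Return one string per row, with every column padded to equal width."""
    # Widest text in each column
    widths = [max(len(row[i]) for row in table) for i in range(len(table[0]))]
    lines = []
    for row in table:
        padded = [cell.ljust(width) for cell, width in zip(row, widths)]
        # Join the padded cells and drop trailing spaces from the last column
        lines.append((" " * gap).join(padded).rstrip())
    return lines


for line in format_rows([["Vinayak", "12"], ["Ishita", "10"]]):
    print(line)
```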

How to scrape table data from the website in Selenium?

Now that we have seen the approach for extracting table data with Selenium, let's look at a complete example. The program below extracts the table from the GeeksforGeeks page whose URL is passed to driver.get.

Python

# Python program to scrape table from website
 
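# import libraries selenium and time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from time import sleep
 
# Create webdriver object (Selenium 4 takes the driver path via Service;
# the raw string keeps the Windows backslashes literal)
driver = webdriver.Chrome(
    service=Service(r"C:\selenium\chromedriver_win32\chromedriver.exe"))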
 
# Get the website
driver.get(
    "https://www.geeksforgeeks.org/find_element_by_link_text-driver-method-selenium-python/")
 
# Make Python sleep for some time
sleep(2)
 
# Obtain the number of rows in the table body, plus 1 for the header row
rows = 1+len(driver.find_elements(By.XPATH,
    "/html/body/div[3]/div[2]/div/div[1]/div/div/div/article/div[3]/div/table/tbody/tr"))
 
# Obtain the number of columns in table
cols = len(driver.find_elements(By.XPATH,
    "/html/body/div[3]/div[2]/div/div[1]/div/div/div/article/div[3]/div/table/tbody/tr[1]/td"))
 
# Print rows and columns
print(rows)
print(cols)
 
# Printing the table headers
print("Locators           "+"             Description")
 
# Printing the data of the table
# rows includes the header, so the body rows are tr[1] .. tr[rows-1]
for r in range(1, rows):
    for p in range(1, cols+1):
       
        # obtaining the text from each column of the table
        value = driver.find_element(By.XPATH,
            "/html/body/div[3]/div[2]/div/div[1]/div/div/div/article/div[3]/div/table/tbody/tr["+str(r)+"]/td["+str(p)+"]").text
        print(value, end='       ')
    print()

# Close the browser once the table has been printed
driver.quit()

Save the program as run.py and run it with:

python run.py
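The nested loops look up every cell with its own absolute XPath. A possible variation is to fetch the body rows once and then search for the cells relative to each row element. The sketch below assumes the same table layout as the sample page; the function name scrape_table is hypothetical, and the locator string "xpath" is the value of Selenium's By.XPATH, written out so the sketch needs no imports.

```python
def scrape_table(driver, table_xpath):
    """Return the table body as a list of rows, each a list of cell texts."""
    # One lookup for all body rows; "xpath" is the value of By.XPATH
    rows = driver.find_elements("xpath", table_xpath + "/tbody/tr")
    data = []
    for row in rows:
        # "./td" is evaluated relative to the current row element
        cells = row.find_elements("xpath", "./td")
        data.append([cell.text for cell in cells])
    return data
```

With a driver that has already loaded the sample page, scrape_table(driver, "/html/body/table") would return the two body rows as lists of strings, which can then be printed or written to a file.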

