Wednesday, September 3, 2025
HomeLanguagesScrape Tables From any website using Python

Scrape Tables From any website using Python

Scraping is a very essential skill for everyone to get data from any website. Scraping and parsing a table can be very tedious work if we use standard Beautiful soup parser to do so. Therefore, here we will be describing a library with the help of which any table can be scraped from any website easily. With this method you don’t even have to inspect element of a website, you only have to provide the URL of the website. That’s it and the work will be done within seconds.

Installation

You can use pip to install this library:

pip install html-table-parser-python3

Getting Started

Step 1: Import the necessary libraries required for the task

# Library for opening url and creating 
# requests
import urllib.request

# pretty-print python data structures
from pprint import pprint

# for parsing all the tables present 
# on the website
from html_table_parser.parser import HTMLTableParser

# for converting the parsed data in a
# pandas dataframe
import pandas as pd

Step 2 : Defining a function to get contents of the website

# Opens a website and read its
# binary contents (HTTP Response Body)
def url_get_contents(url):

    # Opens a website and read its
    # binary contents (HTTP Response Body)

    #making request to the website
    req = urllib.request.Request(url=url)
    f = urllib.request.urlopen(req)

    #reading contents of the website
    return f.read()

Now, our function is ready so we have to specify the url of the website from which we need to parse tables.

Note: Here we will be taking the example of moneycontrol.com website since it has many tables and will give you a better understanding. You can view the website here . 

Step 3 : Parsing tables

# defining the html contents of a URL.
xhtml = url_get_contents('Link').decode('utf-8')

# Defining the HTMLTableParser object
p = HTMLTableParser()

# feeding the html contents in the
# HTMLTableParser object
p.feed(xhtml)

# Now finally obtaining the data of
# the table required
pprint(p.tables[1])

Each row of the table is stored in an array. This can be converted into a pandas dataframe easily and can be used to perform any analysis. 

Complete Code:

Python3




# Library for opening url and creating
# requests
import urllib.request
 
# pretty-print python data structures
from pprint import pprint
 
# for parsing all the tables present
# on the website
from html_table_parser.parser import HTMLTableParser
 
# for converting the parsed data in a
# pandas dataframe
import pandas as pd
 
 
# Opens a website and read its
# binary contents (HTTP Response Body)
def url_get_contents(url):
 
    # Opens a website and read its
    # binary contents (HTTP Response Body)
 
    #making request to the website
    req = urllib.request.Request(url=url)
    f = urllib.request.urlopen(req)
 
    #reading contents of the website
    return f.read()
 
# defining the html contents of a URL.
xhtml = url_get_contents('https://www.moneycontrol.com/india\
/stockpricequote/refineries/relianceindustries/RI').decode('utf-8')
 
# Defining the HTMLTableParser object
p = HTMLTableParser()
 
# feeding the html contents in the
# HTMLTableParser object
p.feed(xhtml)
 
# Now finally obtaining the data of
# the table required
pprint(p.tables[1])
 
# converting the parsed data to
# dataframe
print("\n\nPANDAS DATAFRAME\n")
print(pd.DataFrame(p.tables[1]))


Output:

 

 

RELATED ARTICLES

Most Popular

Dominic
32260 POSTS0 COMMENTS
Milvus
81 POSTS0 COMMENTS
Nango Kala
6625 POSTS0 COMMENTS
Nicole Veronica
11795 POSTS0 COMMENTS
Nokonwaba Nkukhwana
11854 POSTS0 COMMENTS
Shaida Kate Naidoo
6746 POSTS0 COMMENTS
Ted Musemwa
7023 POSTS0 COMMENTS
Thapelo Manthata
6694 POSTS0 COMMENTS
Umr Jansen
6714 POSTS0 COMMENTS