A CSV (comma-separated values) file stores tabular data as plain text, with a comma separating the values in each row. CSV files are widely used in machine learning, data handling, and data visualization. In this article, we will discuss how to convert an HTML table into a CSV file.
Converting an HTML Table into a CSV File in Python
Example: Suppose an HTML file contains a table. That table can be converted to a CSV file using the BeautifulSoup and pandas modules of Python. These modules are not part of the Python standard library. To install them, run the commands below in a terminal.
pip install beautifulsoup4
pip install pandas
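The script in this article reads its input from a file named html.html. A minimal sketch for creating such a file is shown here, assuming a small three-column table. The helper name write_sample_html, the column names, and the row values are hypothetical placeholders for illustration, not data from the article.

```python
from pathlib import Path

# Hypothetical sample table; the names and values are placeholders.
SAMPLE_HTML = """<!DOCTYPE html>
<html>
<body>
<table>
  <tr><th>Name</th><th>Language</th><th>Score</th></tr>
  <tr><td>Asha</td><td>Python</td><td>91</td></tr>
  <tr><td>Ravi</td><td>Java</td><td>84</td></tr>
</table>
</body>
</html>
"""


def write_sample_html(path="html.html"):
    """Write the sample table to `path` and return the path as a string."""
    Path(path).write_text(SAMPLE_HTML, encoding="utf-8")
    return str(path)
```

Calling write_sample_html() with no arguments creates html.html in the current working directory, which is the location the conversion script expects.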
Python3 code for converting the HTML table into a CSV file:
# Import the required modules
import pandas as pd
from bs4 import BeautifulSoup

path = 'html.html'

# Parse the HTML file; the with-block closes the file afterwards
with open(path) as html_file:
    soup = BeautifulSoup(html_file, 'html.parser')

# Take the first table in the document and collect its rows
rows = soup.find_all('table')[0].find_all('tr')

# The first row holds the column headers
list_header = [cell.get_text().strip() for cell in rows[0].find_all(['th', 'td'])]

# Every remaining row holds data; rows without cells are skipped
data = []
for row in rows[1:]:
    sub_data = [cell.get_text().strip() for cell in row.find_all(['td', 'th'])]
    if sub_data:
        data.append(sub_data)

# Store the data in a pandas DataFrame
dataFrame = pd.DataFrame(data=data, columns=list_header)

# Write the DataFrame to a CSV file
# (pass index=False to omit the row-index column)
dataFrame.to_csv('Geeks.csv')
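Output: Running the script creates Geeks.csv in the current working directory. The file contains the table's header row followed by one line per data row, and pandas adds a leading index column by default.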
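For comparison, the same header-row-then-data-rows conversion can be sketched using only the standard library. This is not the BeautifulSoup and pandas approach used in this article; it swaps in html.parser.HTMLParser and the csv module. The names TableParser and html_table_to_csv are hypothetical, and the sketch assumes that every cell has an explicit closing tag and that tables are not nested.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of the first <table> as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._row = None       # row being built, or None outside <tr>
        self._cell = None      # text pieces of the current cell, or None
        self._tables_seen = 0  # used to ignore every table after the first

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._tables_seen += 1
        elif self._tables_seen == 1 and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_csv(html):
    """Return the first table in `html` as CSV text (first row = header)."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # csv.writer quotes any cell that contains a comma
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Unlike DataFrame.to_csv with its default settings, this sketch writes no index column. For real-world pages with irregular markup, a full parser such as BeautifulSoup is likely the more robust choice.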