Thursday, December 26, 2024
Google search engine
HomeLanguagesHow to read multiple data files into Pandas?

How to read multiple data files into Pandas?

In this article, we are going to see how to read multiple data files into pandas, data files are of multiple types, here are a few ways to read multiple files by using the pandas package in python.

The demonstrative files can be download from here

Method 1: Reading CSV files

If our data files are in CSV format then the read_csv() method must be used. read_csv takes a file path as an argument. it reads the content of the CSV. To read multiple CSV files we can just use a simple for loop and iterate over all the files. 

Example: Reading Multiple CSV files using Pandas

In this example we make a list of our data files or file path and then iterate through the file paths using a for loop, a for loop is used to iterate through iterables like list, tuples, strings, etc. And then create a data frame using pd.DataFrame(), concatenate each dataframe into a main dataframe using pd.concat(), then convert the final main dataframe into a CSV file using to_csv() method which takes the name of the new CSV file we want to create as an argument.

Python3




# importing pandas
import pandas as pd
  
file_list=['a.csv','b.csv','c.csv']
  
main_dataframe = pd.DataFrame(pd.read_csv(file_list[0]))
  
for i in range(1,len(file_list)):
    data = pd.read_csv(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe,df],axis=1)
print(main_dataframe)


Output:

Method 2: Using the glob package

The glob module in python is used to retrieve files or pathnames matching a specified pattern. 

This program is similar to the above program but the only difference is instead of keeping track of file names using a list we use the glob package to retrieve files matching a specified pattern.

Example: Reading multiple CSV files using Pandas and glob.

Python3




# importing packages
import pandas as pd
import glob
  
folder_path = 'Path_of_file/csv_files'
file_list = glob.glob(folder_path + "/*.csv")
main_dataframe = pd.DataFrame(pd.read_csv(file_list[0]))
for i in range(1,len(file_list)):
    data = pd.read_csv(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe,df],axis=1)
print(main_dataframe)


Output:

Method 3: Reading text files using Pandas:

To read text files, the panda’s method read_table() must be used.

Example: Reading text file using pandas and glob.

Using glob package to retrieve files or pathnames and then iterate through the file paths using a for loop. Create a data frame of the contents of each file after reading it using pd.read_table() method which takes the file path as an argument. Concatenate each dataframe into a main dataframe using pd.concat(), then convert the final main dataframe into a CSV file using to_csv() method which takes the name of the new CSV file we want to create as an argument.

Python3




# importing packages
import pandas as pd
import glob
  
folder_path = 'Path_/files'
file_list = glob.glob(folder_path + "/*.txt")
main_dataframe = pd.DataFrame(pd.read_table(file_list[0]))
  
for i in range(1,len(file_list)):
    data = pd.read_table(file_list[i])
    df = pd.DataFrame(data)
    main_dataframe = pd.concat([main_dataframe, df], axis = 1)
  
print(main_dataframe)
  
# creating a new csv file with
# the dataframe we created
main_dataframe.to_csv('new_csv1.csv')


Output:

RELATED ARTICLES

Most Popular

Recent Comments