Friday, December 27, 2024

Getting all CSV files from a directory using Python

Python's built-in modules, together with third-party libraries such as pandas, make it easy to work with CSV files in the workspace. CSV files stored in a system's directories and subdirectories can be accessed, modified, and edited. Their contents can either be printed to the shell or loaded into a DataFrame to be worked with later.

In this article, we will see how to iterate through any number of CSV files contained in a directory (folder/path) that also holds files of other types, and how to work with the contents of those files. We apply two different methods to this task.

Input directory for both approaches: csvfoldergfg (image not available)

CSV files used: CSV 1, CSV 2 and CSV 3 (images of their contents not available)

Method 1: Using Glob module

  • First, the path of the source directory, in this case the folder “csvfoldergfg”, is stored in the variable path.
path = "csvfoldergfg"
  • To locate all CSV files, whose names may not be known in advance, the glob() function of the glob module is called with a pattern built from that path. It returns a list of the paths of all matching files. The pattern *.csv is a shell-style wildcard (not a regular expression) that matches every file name ending in .csv.
glob.glob(path + "/*.csv")
  • A for loop then iterates over these files, and each one is read into a DataFrame with the pandas read_csv() function. The information fetched this way can then be manipulated.
pd.read_csv(filename)
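The introduction mentions that CSV files may also sit in subdirectories. With the pattern *.csv, glob.glob() only looks one level deep, but the ** wildcard combined with recursive=True makes it descend into subfolders as well. The following is a minimal sketch of that idea; the helper name find_csv_files and its include_subdirs parameter are hypothetical and not part of the original article.

```python
import glob
import os


def find_csv_files(path, include_subdirs=False):
    """Return a sorted list of .csv file paths under `path` (hypothetical helper)."""
    if include_subdirs:
        # "**" matches zero or more nested directories, but only
        # when recursive=True is passed to glob.glob()
        pattern = os.path.join(path, "**", "*.csv")
        files = glob.glob(pattern, recursive=True)
    else:
        # "*.csv" is a shell-style wildcard, not a regular expression
        pattern = os.path.join(path, "*.csv")
        files = glob.glob(pattern)
    # glob returns matches in arbitrary order; sort for repeatable results
    return sorted(files)


if __name__ == "__main__":
    # glob simply returns an empty list if the folder does not exist
    print(find_csv_files("csvfoldergfg"))
    print(find_csv_files("csvfoldergfg", include_subdirs=True))
```

Because ** can also match zero directories, the recursive call still returns the CSV files that sit directly inside the folder.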

The following code was executed on the local machine, where both the script and the directory whose path is specified are stored in the same working directory:

Python3




# importing the required modules
import glob
import pandas as pd

# specifying the path to csv files
path = "csvfoldergfg"

# list of all .csv files in the path
files = glob.glob(path + "/*.csv")

# defining an empty list to store the
# data frame read from each file
content = []

# reading every csv file found in the
# specified path
for filename in files:
    df = pd.read_csv(filename, index_col=None)
    content.append(df)

# combining all the data frames into one
data_frame = pd.concat(content)
print(data_frame)


Output:

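Because pd.concat() keeps each file's own row index, the combined frame in Method 1 can contain repeated index labels, and pd.concat() raises a ValueError when the list is empty (no CSV files found). The following is a minimal sketch of one way to handle both, assuming you also want to record which file each row came from; the helper combine_csvs and the source_file column name are hypothetical.

```python
import glob
import os

import pandas as pd


def combine_csvs(path):
    """Read every .csv file in `path` into one DataFrame (hypothetical helper)."""
    files = sorted(glob.glob(os.path.join(path, "*.csv")))
    if not files:
        # pd.concat([]) raises ValueError, so return an empty frame instead
        return pd.DataFrame()

    frames = []
    for filename in files:
        df = pd.read_csv(filename)
        # record which file each row was read from
        df["source_file"] = os.path.basename(filename)
        frames.append(df)

    # ignore_index=True renumbers the rows 0..n-1 instead of keeping
    # each file's own 0-based index
    return pd.concat(frames, ignore_index=True)


if __name__ == "__main__":
    print(combine_csvs("csvfoldergfg"))
```

Returning an empty DataFrame is only one possible choice; raising a clearer error may suit scripts where a missing input should stop the run.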
Method 2: Using OS module

  • First, the path of the source directory, in this case the folder “csvfoldergfg”, is stored in the string variable dir_name.
dir_name = "csvfoldergfg"
  • To locate all the files, whose names may not be known in advance, the listdir() function of the os module is called as os.listdir(dir_name). It returns a list of the names (not the full paths) of every entry in the directory, including files that are not CSV files.
os.listdir(dir_name)
  • A for loop then iterates over these names. Because they are bare names, non-CSV entries must be skipped and each remaining name joined with dir_name before the file is read into a DataFrame with the pandas read_csv() function. The information fetched this way can then be manipulated.
pd.read_csv(os.path.join(dir_name, file))
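Unlike glob.glob(), os.listdir() returns bare entry names, in arbitrary order, for every kind of entry in the folder, subfolders included. The following is a minimal sketch of turning that list into the full paths of the CSV files only; the helper name list_csv_paths is hypothetical, and the case-insensitive extension check is a design choice rather than something the original code does.

```python
import os


def list_csv_paths(dir_name):
    """Return sorted full paths of the .csv files directly inside dir_name."""
    paths = []
    for name in os.listdir(dir_name):
        # os.listdir() gives only the name, so build the full path
        full_path = os.path.join(dir_name, name)
        # skip subdirectories and files without a .csv extension
        # (the check is case-insensitive, so "DATA.CSV" is included)
        if os.path.isfile(full_path) and name.lower().endswith(".csv"):
            paths.append(full_path)
    # os.listdir() order is arbitrary; sort for a repeatable read order
    return sorted(paths)
```

For example, each path returned by list_csv_paths("csvfoldergfg") could be passed straight to pd.read_csv(). Note that os.listdir() raises FileNotFoundError if the folder does not exist, whereas glob.glob() just returns an empty list.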

The following code was executed on the local machine, again from the working directory that contains the folder csvfoldergfg:

Python3




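# importing the required packages
# in python
import os
import pandas as pd

dir_name = "csvfoldergfg"

# specifying an empty list for content
df_list = []
for file in os.listdir(dir_name):

    # skipping files that are not csv files
    if not file.endswith(".csv"):
        continue

    # os.listdir() returns bare file names, so join
    # each name with the directory before reading it
    df = pd.read_csv(os.path.join(dir_name, file))
    df_list.append(df)

# combining all the data frames into one
# (DataFrame.append() was removed in pandas 2.0)
final_content = pd.concat(df_list)
print(final_content)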


Output:
