Python and its ecosystem provide many packages and modules for working with CSV files. CSV files stored anywhere in a system’s directories and subdirectories can be located, read, and modified. Their contents can either be printed to the shell or loaded into a DataFrame for further processing.
In this article, we will see how to iterate through any number of CSV files stored in a directory (folder/path) that also contains other types of files, and how to work with the contents of those files. Two different methods are used for this task.
Input directory for both approaches: the folder “csvfoldergfg”, which contains three CSV files (CSV 1, CSV 2 and CSV 3) alongside files of other types.
Method 1: Using Glob module
- First, the path of the source directory, in this case the folder “csvfoldergfg”, is stored in the variable path.
path = "csvfoldergfg"
- To locate all CSV files, whose names may be unknown, the glob module is imported and its glob() function is called with a pattern made by appending /*.csv to path. It returns a list of the paths of all matching files. Note that glob uses shell-style wildcards, not regular expressions: * matches any sequence of characters, so *.csv matches every file name that ends in .csv.
glob.glob(path + "/*.csv")
- The matched files are then iterated over with a for loop, and each one is read into a DataFrame with the read_csv() function of the pandas library. The resulting data can then be manipulated.
pd.read_csv(filename)
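The steps above can be tried end to end with a minimal sketch. It assumes a throwaway folder created with tempfile as a stand-in for csvfoldergfg; the file names, the columns and the extra source column are hypothetical additions for illustration. It shows that the *.csv pattern skips both a .txt file and a .csv.bak file, and records which file each row came from.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for csvfoldergfg: two CSV files plus files of other types
path = tempfile.mkdtemp()
samples = {
    "first.csv": "name,score\nalice,90\nbob,85\n",
    "second.csv": "name,score\ncarol,78\n",
    "notes.txt": "not a csv\n",
    "old.csv.bak": "name,score\ndave,60\n",
}
for name, text in samples.items():
    with open(os.path.join(path, name), "w") as fh:
        fh.write(text)

# "*" is a shell-style wildcard, not a regex; only names ending in ".csv" match.
# glob returns matches in no guaranteed order, so sort them for reproducible output.
files = sorted(glob.glob(os.path.join(path, "*.csv")))

frames = []
for filename in files:
    df = pd.read_csv(filename)  # read_csv takes the path of the file
    df["source"] = os.path.basename(filename)  # record where each row came from
    frames.append(df)

print([os.path.basename(f) for f in files])
print(frames[0])
```

Sorting the glob result is a design choice: glob makes no promise about ordering, and sorting keeps any later concatenation reproducible.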
The following code was executed on a local machine, with the script and the “csvfoldergfg” directory stored in the same working directory:
```python
# importing the required modules
import glob
import pandas as pd

# specifying the path to csv files
path = "csvfoldergfg"

# csv files in the path
files = glob.glob(path + "/*.csv")

# defining an empty list to store the content of each file
content = []

# checking all the csv files in the specified path
for filename in files:
    # reading content of csv file into a data frame
    df = pd.read_csv(filename, index_col=None)
    content.append(df)

# converting content to a single data frame
data_frame = pd.concat(content)
print(data_frame)
```
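The pattern used in this method only searches the top level of the folder. Since CSV files can also live in subdirectories, one way to reach them with glob is the ** wildcard together with recursive=True. The sketch below assumes a hypothetical temporary folder tree with made-up file names; it is an illustration, not part of the original method.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical tree: root/top.csv, root/sub/nested.csv and a non-CSV file in sub/
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
samples = {
    "top.csv": "id,value\n1,10\n",
    os.path.join("sub", "nested.csv"): "id,value\n2,20\n",
    os.path.join("sub", "readme.txt"): "not a csv\n",
}
for rel_path, text in samples.items():
    with open(os.path.join(root, rel_path), "w") as fh:
        fh.write(text)

# With recursive=True, "**" matches the folder itself and every subfolder below it
files = sorted(glob.glob(os.path.join(root, "**", "*.csv"), recursive=True))

# Read every match and stack the rows; ignore_index renumbers them 0..n-1
data_frame = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
print(data_frame)
```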
Method 2: Using OS module
- As before, the path of the source directory, the folder “csvfoldergfg”, is stored in the string variable dir_name.
dir_name = "csvfoldergfg"
- To locate all the files, whose names may be unknown, the os module is imported and its listdir() function is called as os.listdir(dir_name). It returns a list with the names (not full paths) of every entry in the directory, whatever its file type, so the names have to be filtered and joined with dir_name before they can be read.
os.listdir(dir_name)
- The names are then iterated over with a for loop. Each name ending in .csv is joined with dir_name to form a full path, and the file is read into a DataFrame with the read_csv() function of the pandas library. The resulting data can then be manipulated.
pd.read_csv(os.path.join(dir_name, file))
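Filtering out non-CSV names and joining each name with dir_name can be checked with a small sketch. It assumes a hypothetical temporary folder with made-up files and columns. It combines the frames with pd.concat, since DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import os
import tempfile

import pandas as pd

# Hypothetical folder mixing CSV files with other file types
dir_name = tempfile.mkdtemp()
samples = {
    "one.csv": "city,temp\nParis,21\nRome,25\n",
    "two.csv": "city,temp\nOslo,12\n",
    "image.png": "not really an image\n",
    "notes.txt": "plain text\n",
}
for name, text in samples.items():
    with open(os.path.join(dir_name, name), "w") as fh:
        fh.write(text)

names = os.listdir(dir_name)  # bare names of every entry, in no guaranteed order

df_list = []
for file in sorted(names):
    if not file.endswith(".csv"):
        continue  # skip .png, .txt and any other non-CSV entries
    # listdir gives bare names, so join each one with the folder path before reading
    df_list.append(pd.read_csv(os.path.join(dir_name, file)))

# pd.concat stacks the frames row-wise; ignore_index renumbers rows 0..n-1
final_content = pd.concat(df_list, ignore_index=True)
print(final_content)
```

The endswith(".csv") check is case-sensitive, so a file named DATA.CSV would be skipped; file.lower().endswith(".csv") is one way to accept either case.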
The following code is executed on the local machine, again from the working directory that contains “csvfoldergfg”:
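```python
# importing the required packages
import os
import pandas as pd

dir_name = "csvfoldergfg"

# specifying an empty list for the content of each file
df_list = []

for file in os.listdir(dir_name):
    # skip files that are not CSV files
    if not file.endswith(".csv"):
        continue
    # listdir returns bare names, so join them with the folder path
    df = pd.read_csv(os.path.join(dir_name, file))
    df_list.append(df)

# combining all data frames into one
# (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
final_content = pd.concat(df_list)
print(final_content)
```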
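os.listdir() does not descend into subdirectories. If the CSV files are spread across nested folders, os.walk() can be used instead. The sketch below is one possible approach, assuming a hypothetical temporary folder tree with made-up file names and columns.

```python
import os
import tempfile

import pandas as pd

# Hypothetical nested folder: root/a.csv and root/inner/b.csv, plus a non-CSV file
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "inner"))
samples = {
    "a.csv": "k,v\nx,1\n",
    os.path.join("inner", "b.csv"): "k,v\ny,2\nz,3\n",
    os.path.join("inner", "log.txt"): "ignore me\n",
}
for rel_path, text in samples.items():
    with open(os.path.join(root, rel_path), "w") as fh:
        fh.write(text)

content = []
# os.walk yields (directory, subdirectory names, file names) for every folder in the tree
for current_dir, _subdirs, filenames in os.walk(root):
    for filename in sorted(filenames):
        if filename.endswith(".csv"):
            # file names are bare, so join them with the directory they came from
            content.append(pd.read_csv(os.path.join(current_dir, filename)))

final_content = pd.concat(content, ignore_index=True)
print(final_content)
```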