Saturday, December 28, 2024

How to Merge all excel files in a folder using Python?

In this article, we will see how to combine all Excel files present in a folder into a single file.

Modules used:

The Python libraries used are:

  • Pandas: pandas is a Python library for manipulating and analyzing data. It is widely used in data science and data analytics.
  • Glob: The glob module finds all pathnames matching a specified pattern, following the rules used by the Unix shell.

Excel files used:

Three Excel files stored in one folder will be combined into a single Excel file using Python. The three files are x1.xlsx, x2.xlsx, and x3.xlsx.

Stepwise Approach:

  • First, import the required libraries and modules.

Python3




# importing pandas libraries and 
# glob module
import pandas as pd
import glob


  • Setting the path of the folder where the files are stored. Here the files are in a folder named test, given relative to the current working directory.

Python3




# path of the folder
path = r'test'


  • Listing the Excel files in the folder using the glob module. glob.glob() searches the given path for all files with the .xlsx extension and returns their paths as a list. print() then displays the names of the matched files.

Python3




# listing all the excel files in the folder
# (a forward slash works on Windows, Linux and macOS)
filenames = glob.glob(path + "/*.xlsx")
print('File names:', filenames)
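The pattern matching can be tried without any real workbooks. The following is a minimal sketch, assuming a throwaway temporary folder and a hypothetical helper named list_excel_files; it also sorts the result, since glob.glob() does not guarantee any particular order.

```python
import glob
import os
import tempfile


def list_excel_files(folder):
    """Return the .xlsx paths in `folder`, sorted by name (hypothetical helper)."""
    return sorted(glob.glob(os.path.join(folder, "*.xlsx")))


# Build a throwaway folder with two .xlsx names and one unrelated file.
# The files are empty placeholders; only their names matter for matching.
with tempfile.TemporaryDirectory() as folder:
    for name in ("x2.xlsx", "x1.xlsx", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in list_excel_files(folder)]

print(found)  # ['x1.xlsx', 'x2.xlsx']
```

The pattern matches on file names only, so any file ending in .xlsx is returned whether or not it is a valid workbook.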


  • Initializing an empty data frame. A DataFrame is the tabular data structure that pandas provides for analyzing and manipulating data. Here we initialize an empty data frame that will hold the combined data from the three files.

Python3




# Initializing empty data frame
finalexcelsheet = pd.DataFrame()


  • Iterating through all the files in the folder one by one. A for loop visits each file. pd.read_excel() with sheet_name=None reads every worksheet of a file and returns them as a dictionary of data frames; pd.concat() then stacks those worksheets into a single data frame, df. This matters for files with multiple sheets, such as the third Excel file in this example. A second pd.concat() call appends df to finalexcelsheet. The DataFrame.append() method that older code used for this step was removed in pandas 2.0.

Python3




# to iterate excel file one by one 
# inside the folder
for file in filenames:

    # combining multiple excel worksheets
    # into single data frames
    df = pd.concat(pd.read_excel(file, sheet_name=None),
                   ignore_index=True, sort=False)

    # appending excel files one by one
    # (DataFrame.append was removed in pandas 2.0)
    finalexcelsheet = pd.concat([finalexcelsheet, df],
                                ignore_index=True)
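To see what the two concatenation steps do without touching any files, here is a small sketch that uses in-memory dictionaries as stand-ins for what pd.read_excel(file, sheet_name=None) returns. The helper name combine_sheets and the sample values are assumptions made for illustration. It also collects the per-file frames in a list and concatenates once at the end, a common alternative to growing a data frame inside the loop.

```python
import pandas as pd


def combine_sheets(sheets):
    """Stack a {sheet_name: DataFrame} mapping into one DataFrame."""
    return pd.concat(sheets, ignore_index=True, sort=False)


# Stand-ins for the dictionaries returned by
# pd.read_excel(file, sheet_name=None); the values are made up.
file_a = {"Sheet1": pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})}
file_b = {
    "Sheet1": pd.DataFrame({"id": [3], "name": ["c"]}),
    "Sheet2": pd.DataFrame({"id": [4], "name": ["d"]}),
}

# One combined frame per file, then a single concat over all files
frames = [combine_sheets(sheets) for sheets in (file_a, file_b)]
combined = pd.concat(frames, ignore_index=True)

print(combined)
```

With ignore_index=True the sheet names and the original row labels are discarded, so the result carries a fresh 0..n-1 index.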


  • Displaying the combined data. To display the combined data, call print(finalexcelsheet).

Python3




# to print the combined data
print('Final Sheet:')
print(finalexcelsheet)
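For larger results, printing the whole data frame is rarely useful. As a rough sketch, assuming a small made-up data frame in place of the real combined data, the shape and the first few rows give a quicker check:

```python
import pandas as pd

# Small stand-in for the combined data (hypothetical values)
finalexcelsheet = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Number of rows and columns in the combined data
rows, cols = finalexcelsheet.shape
print(f"{rows} rows x {cols} columns")

# First two rows, printed without the index column
print(finalexcelsheet.head(2).to_string(index=False))
```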


  • Writing the combined data to a new Excel file. Passing index=False to to_excel() leaves the row index out of the output file.

Python3




# save combined data
finalexcelsheet.to_excel(r'Final.xlsx',index=False)


Below is the complete Python program based on the above approach:

Python3




# import modules
import pandas as pd
import glob

# path of the folder
path = r'test'

# listing all the excel files in the folder
# (a forward slash works on Windows, Linux and macOS)
filenames = glob.glob(path + "/*.xlsx")
print('File names:', filenames)

# initializing empty data frame
finalexcelsheet = pd.DataFrame()

# to iterate excel file one by one 
# inside the folder
for file in filenames:

    # combining multiple excel worksheets
    # into single data frames
    df = pd.concat(pd.read_excel(
        file, sheet_name=None), ignore_index=True, sort=False)

    # appending excel files one by one
    # (DataFrame.append was removed in pandas 2.0)
    finalexcelsheet = pd.concat(
        [finalexcelsheet, df], ignore_index=True)

# to print the combined data
print('Final Sheet:')
print(finalexcelsheet)

# save combined data
finalexcelsheet.to_excel(r'Final.xlsx', index=False)


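Output:

The program prints the names of the matched files and the combined data, then writes the combined data to Final.xlsx.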
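The same approach can be wrapped in a reusable function. This is only a sketch under stated assumptions: the function name merge_excel_folder and its parameters are hypothetical, it uses pathlib instead of glob, and it raises an error when the folder contains no .xlsx files rather than silently writing an empty workbook.

```python
from pathlib import Path

import pandas as pd


def merge_excel_folder(folder, output="Final.xlsx"):
    """Combine every sheet of every .xlsx file in `folder` into `output`.

    Hypothetical helper: the name and parameters are illustrative.
    """
    files = sorted(Path(folder).glob("*.xlsx"))
    if not files:
        raise FileNotFoundError(f"no .xlsx files found in {folder!r}")

    frames = []
    for file in files:
        # sheet_name=None returns {sheet_name: DataFrame} for all sheets
        sheets = pd.read_excel(file, sheet_name=None)
        frames.append(pd.concat(sheets, ignore_index=True, sort=False))

    # Single concat over all files, then write without the row index
    combined = pd.concat(frames, ignore_index=True, sort=False)
    combined.to_excel(output, index=False)
    return combined
```

Calling merge_excel_folder("test") would reproduce the program above. Keeping the output file outside the input folder avoids picking up Final.xlsx as an input on a second run.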
