Thursday, December 26, 2024
Google search engine
HomeLanguagesHow to merge multiple excel files into a single files with Python...

How to merge multiple excel files into a single files with Python ?

Normally, we’re working with Excel files, and we surely have come across a scenario where we need to merge multiple Excel files into one. The traditional method has always been using a VBA code inside excel which does the job but is a multi-step process and is not so easy to understand. Another method is manually copying long Excel files into one which is not only time-consume, troublesome but also error-prone. 

This task can be done easily and quickly with few lines of code in Python with the Pandas module. First, we need to install the module with pip. So let’s get the installation out of our way. 

Use the following command in the terminal:

pip install pandas

Method 1: Using dataframe.append()

Pandas dataframe.append() function is used to append rows of other dataframe to the end of the given dataframe, returning a new dataframe object. Columns not in the original dataframes are added as new columns and the new cells are populated with NaN value.

Syntax : DataFrame.append(other, ignore_index=False, verify_integrity=False, sort=None)

Parameters :

  • other : DataFrame or Series/dict-like object, or list of these
  • ignore_index : If True, do not use the index labels. default False.
  • verify_integrity : If True, raise ValueError on creating index with duplicates. default False.
  • sort : Sort columns if the columns of self and other are not aligned. default False.

Returns: appended DataFrame

Example:

Excel Used: FoodSales1-1, FoodSales2-1
 

Python3




# importing the required modules
import glob
import pandas as pd
 
# specifying the path to csv files
path = "C:/downloads"
 
# csv files in the path
file_list = glob.glob(path + "/*.xlsx")
 
# list of excel files we want to merge.
# pd.read_excel(file_path) reads the excel
# data into pandas dataframe.
excl_list = []
 
for file in file_list:
    excl_list.append(pd.read_excel(file))
 
# create a new dataframe to store the
# merged excel file.
excl_merged = pd.DataFrame()
 
for excl_file in excl_list:
     
    # appends the data into the excl_merged
    # dataframe.
    excl_merged = excl_merged.append(
      excl_file, ignore_index=True)
 
# exports the dataframe into excel file with
# specified name.
excl_merged.to_excel('total_food_sales.xlsx', index=False)


Output :

‘total_food_sales.xlsx’

Method 2: Using pandas.concat()

The pandas.concat() function does all the heavy lifting of performing concatenation operations along with an axis of Pandas objects while performing optional set logic (union or intersection) of the indexes (if any) on the other axes.

Syntax: concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)

Parameters:

  • objs: Series or DataFrame objects
  • axis: axis to concatenate along; default = 0 //along rows
  • join: way to handle indexes on other axis; default = ‘outer’
  • ignore_index: if True, do not use the index values along the concatenation axis; default = False
  • keys: sequence to add an identifier to the result indexes; default = None
  • levels: specific levels (unique values) to use for constructing a MultiIndex; default = None
  • names: names for the levels in the resulting hierarchical index; default = None
  • verify_integrity: check whether the new concatenated axis contains duplicates; default = False
  • sort: sort non-concatenation axis if it is not already aligned when join is ‘outer’; default = False
  • copy: if False, do not copy data unnecessarily; default = True

Returns: a pandas dataframe with concatenated data.

Example:

In the last example, we worked on only two Excel files with a few rows. Let’s try merging more files each containing approximately 5000 rows and 7 columns. We have 5 files BankE, BankD, BankC, BankB, BankA having historical stock data for respective bank. Let’s merge them into a single ‘Bank_Stocks.xlsx’ file. Here we are using the pandas.concat() method.

Python3




# importing the required modules
import glob
import pandas as pd
 
# specifying the path to csv files
path = "C:/downloads"
 
# csv files in the path
file_list = glob.glob(path + "/*.xlsx")
 
# list of excel files we want to merge.
# pd.read_excel(file_path) reads the 
# excel data into pandas dataframe.
excl_list = []
 
for file in file_list:
    excl_list.append(pd.read_excel(file))
 
# concatenate all DataFrames in the list
# into a single DataFrame, returns new
# DataFrame.
excl_merged = pd.concat(excl_list, ignore_index=True)
 
# exports the dataframe into excel file
# with specified name.
excl_merged.to_excel('Bank_Stocks.xlsx', index=False)


Output :

Bank_Stocks.xlsx

RELATED ARTICLES

Most Popular

Recent Comments