How to merge many TSV files by common key using Python Pandas?

26 July 2024

0

For data analysis the most important thing is data and we need to prepare it before we can use it for analysis. Sometimes required data can be scattered in multiple files and we need to merge them. In this article, we are going to merge multiple TSV (Tab Separated Values) files with a common key. This can be possible by using the merge method of the pandas Python library. This method allows us to combine files by using a common key.

Approach:

Import pandas library
Then read first two tsv files and merge them using pd.merge() function by setting the ‘on’ parameter to the common column present in both files. Then store the result in a new dataframe called ‘Output_df’.
Store remaining files in a list.
Run a loop that will iterate over these file names. Read these files one by one and merge them with ‘Output_df’ dataframe
Save ‘Output_df’ in tsv file

Example 1:

In this example, we will merge tsv files by using an inner join. We have taken four tsv files for this example as follows.

Used file: Customer.tsv, Account.tsv, Branch.tsv, Loan.tsv

Python3

# Import pandas library
import pandas as pd
 
# Read first two csv files with '\t' separator
tsv1 = pd.read_csv("Documents/Customer.tsv", sep='\t')
tsv2 = pd.read_csv("Documents/Account.tsv", sep='\t')
 
# store the result in Output_df dataframe.
# Here common column is 'ID' column
Output_df = pd.merge(tsv1, tsv2, on='ID',
                     how='inner')
 
# store remaining file names in list
tsv_files = ["Branch.tsv", "Loan.tsv"]
 
# One by one read tsv files and merge with
# 'Output_df' dataframe and again store
# the final result in Output_df
for i in tsv_files:
    path = "Documents/"+i
    tsv = pd.read_csv(path, sep='\t')
    Output_df = pd.merge(Output_df, tsv,
                         on='ID', how='inner')
 
# Now store the 'Output_df'
# in tsv file 'Output.tsv'
Output_df.to_csv("Documents/Output.tsv", 
                 sep="\t", header=True, 
                 index=False)

Output:

Output.tsv

Example 2:

In this example, we will merge tsv files by using an outer join. We have taken four tsv files for this example as follows.

Used file: Course.tsv, Teacher.tsv, Credits.tsv, Marks.tsv

Python3

# Import pandas library
import pandas as pd
 
# Read first two csv files with '\t' separator
tsv3 = pd.read_csv("Documents/Course.tsv", sep='\t')
tsv4 = pd.read_csv("Documents/Teacher.tsv", sep='\t')
 
# store the result in Output_df dataframe.
# Here common column is 'Course_ID' column
Output_df2 = pd.merge(tsv3, tsv4, on='Course_ID', how='outer')
 
# store remaining file names in list
tsv_files = ["Credits.tsv", "Marks.tsv"]
 
# One by one read tsv files and merge with
# 'Output_df2' dataframe and again store
# the final result in 'Output_df2'
for i in tsv_files:
    path = "Documents/"+i
    tsv = pd.read_csv(path, sep='\t')
    Output_df2 = pd.merge(Output_df2, tsv,
                          on='Course_ID', how='outer')
 
 
# Now store the 'Output_df2' in tsv file 'Output_outer.tsv'
# Here we replacing nan values with NA
Output_df2.to_csv("Documents/Output_outer.tsv", sep="\t",
                  header=True, index=False, na_rep="NA")

Output:

How to merge many TSV files by common key using Python Pandas?

Approach:

Example 1:

Python3

Example 2:

Python3

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

Interview With Bill Reed – CEO at RemotelyMe by Shauli Zacks

Samsung’s Galaxy S24 FE plummets to the price it should have been at launch

Samsung’s new periscope camera fits telephoto lenses into an even slimmer design

OnePlus’ decision to ditch Samsung’s OLED screens could backfire in the US

Recent Comments

EDITOR PICKS

Interview With Bill Reed – CEO at RemotelyMe by Shauli Zacks

Samsung’s Galaxy S24 FE plummets to the price it should have been at launch

Samsung’s new periscope camera fits telephoto lenses into an even slimmer design

POPULAR POSTS

Interview With Bill Reed – CEO at RemotelyMe by Shauli Zacks

Samsung’s Galaxy S24 FE plummets to the price it should have been at launch

Samsung’s new periscope camera fits telephoto lenses into an even slimmer design

POPULAR CATEGORY

ABOUT US

FOLLOW US