In this article, we will discuss how to convert TSV files to JSON using Python.
Converting TSV to JSON
For demonstration purposes, we will use a sample flower data set stored in a small TSV file.
Python3
import json


def tsv2json(input_file, output_file):
    arr = []
    # Open the TSV file; the with-block closes it automatically.
    with open(input_file, 'r') as file:
        # The first line consists of the headings of the records,
        # so we store them in a list and move to the next line.
        a = file.readline()
        titles = [t.strip() for t in a.split('\t')]
        for line in file:
            d = {}
            # Convert each row into a dictionary with the titles as keys.
            for t, f in zip(titles, line.split('\t')):
                d[t] = f.strip()  # strip() removes the trailing '\n'
            # Append each row's dictionary to the list.
            arr.append(d)

    # Dump the list of dictionaries into the output file.
    with open(output_file, 'w', encoding='utf-8') as out:
        out.write(json.dumps(arr, indent=4))


# Driver Code
input_filename = 'flower.tsv'
output_filename = 'flower.json'
tsv2json(input_filename, output_filename)
Explanation:
- We’ll begin by importing the json module, which is part of Python’s standard library and needs no separate installation.
- Open the input TSV file and read only the first line, splitting it on tabs and saving the stripped headings in a list (titles) to use as our data labels.
- We’ll need an empty list, let’s call it arr. This list will eventually hold one dictionary per row and will be written to the JSON file.
- Now we’ll read the remaining lines in a loop. Inside the loop we define an empty dictionary and pair the tab-separated values in the line with the labels we retrieved from the first line.
- We’ll use Python’s built-in zip() function to pair labels with data.
- We’ll also call the string method .strip() on each value before adding it to the dictionary, because the last value in the row ends with a line break, which we don’t want.
- We’ll append this dictionary to our list arr at the end of the loop block. Then, outside the loop, we’ll write arr to a JSON file using json.dumps(), which converts a Python object to a JSON string.
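The steps above can be traced on a single in-memory row. The sketch below is only an illustration: the column names and values (name, petals, color, rose) are hypothetical and do not come from the flower data set.

```python
import json

# Hypothetical header line and data line, as they would be read from a TSV file.
header_line = "name\tpetals\tcolor\n"
data_line = "rose\t5\tred\n"

# Split the header on tabs and strip whitespace to get the labels.
titles = [t.strip() for t in header_line.split("\t")]

# Pair each label with the value in the same position, stripping the
# trailing line break from the last value.
row = {t: f.strip() for t, f in zip(titles, data_line.split("\t"))}

# Every value is still a string at this point, including the number.
print(json.dumps([row], indent=4))
```

Note that zip() stops at the shorter of its two inputs, so a row with fewer fields than headings yields a dictionary with the trailing keys missing rather than raising an error.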
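Output:
flower.json contains a JSON array with one object per data row, keyed by the column headings, with every value stored as a string.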
Let’s look at another example. Here’s a sample hospital dataset containing patient IDs, total visits, and systolic blood pressure. The code remains the same; all we need to do is update the input filename and the desired output filename. Choose the output name carefully: if a file with that name already exists, it will be overwritten.
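If overwriting an existing output file is a concern, one option is to open the file in exclusive-creation mode ('x'), which raises FileExistsError instead of replacing the file. The helper name write_json_no_overwrite below is hypothetical and not part of the script in this article; treat it as a sketch of one possible guard, not the only way to protect the write.

```python
import json
import os
import tempfile


def write_json_no_overwrite(rows, output_file):
    """Write rows as JSON, refusing to replace an existing file.

    Hypothetical helper for illustration. Returns True if the file
    was written and False if it already existed.
    """
    try:
        # Mode 'x' creates the file and fails if it already exists.
        with open(output_file, "x", encoding="utf-8") as out:
            out.write(json.dumps(rows, indent=4))
    except FileExistsError:
        return False
    return True


# Demonstration in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "hospital.json")
    first = write_json_no_overwrite([{"id": "1"}], target)
    second = write_json_no_overwrite([{"id": "2"}], target)
    with open(target, encoding="utf-8") as f:
        kept = json.load(f)
```

Here the first call writes the file, the second call is refused, and the file on disk still holds the data from the first call.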
Python3
import json


def tsv2json(input_file, output_file):
    arr = []
    # Open the TSV file; the with-block closes it automatically.
    with open(input_file, 'r') as file:
        # The first line consists of the headings of the records,
        # so we store them in a list and move to the next line.
        a = file.readline()
        titles = [t.strip() for t in a.split('\t')]
        for line in file:
            d = {}
            # Convert each row into a dictionary with the titles as keys.
            for t, f in zip(titles, line.split('\t')):
                d[t] = f.strip()  # strip() removes the trailing '\n'
            # Append each row's dictionary to the list.
            arr.append(d)

    # Dump the list of dictionaries into the output file.
    with open(output_file, 'w', encoding='utf-8') as out:
        out.write(json.dumps(arr, indent=4))


# Driver Code
input_filename = 'hospital.tsv'
output_filename = 'hospital.json'
tsv2json(input_filename, output_filename)
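Output:
hospital.json contains a JSON array with one object per data row, keyed by the column headings, with every value stored as a string.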
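As an alternative to splitting lines by hand, the standard library's csv module can parse tab-separated data when given delimiter='\t'. The sketch below swaps csv.DictReader in for the manual loop. The function name tsv2json_csv and the sample data are hypothetical, and unlike the script above this version does not strip whitespace around values.

```python
import csv
import io
import json


def tsv2json_csv(tsv_file):
    """Return a JSON string built from an open TSV file object.

    Hypothetical variant of tsv2json that uses csv.DictReader.
    """
    # DictReader takes the first row as field names and yields one
    # dictionary per remaining row.
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return json.dumps(list(reader), indent=4)


# Hypothetical sample data held in memory instead of a file on disk.
sample = io.StringIO("id\tvisits\n1\t3\n2\t5\n")
result = tsv2json_csv(sample)
print(result)
```

When reading from disk instead of memory, the csv documentation recommends opening the file with newline='' so that the reader handles line endings itself.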