Thursday, July 4, 2024

How to Save Pandas Dataframe as gzip/zip File?

Pandas is an open-source library built on top of NumPy. It is a Python package that offers data structures and operations for manipulating numerical data and time series, and it is popular because it makes importing and analyzing data much easier. Pandas is fast and offers high performance and productivity for users.

Converting to zip/gzip file

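The to_pickle() method in Pandas pickles (serializes) a DataFrame into a file, optionally compressing it on the way. The resulting file is a compressed pickle, so it is meant to be read back with read_pickle() rather than opened as a CSV. The method uses the syntax given below: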

Syntax:

DataFrame.to_pickle(path,
                    compression='infer',
                    protocol=pickle.HIGHEST_PROTOCOL)

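This method supports compression formats such as zip, gzip, bz2, and xz. With the default compression='infer', the format is chosen from the file extension in the path (.zip, .gz, .bz2, .xz). The examples below show how to save a DataFrame as a zip file and as a gzip file.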
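As a minimal sketch of how the compression argument behaves, the snippet below writes the same DataFrame three ways and looks at the first two bytes of each file. The sample data, the temporary directory, the file names, and the `leading_bytes` helper are all hypothetical and exist only for this illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})


def leading_bytes(path, n=2):
    """Return the first n bytes of a file (hypothetical helper)."""
    with open(path, "rb") as fh:
        return fh.read(n)


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Extension-based inference: '.zip' and '.gz' are recognized.
    df.to_pickle(tmp / "inferred.zip")
    df.to_pickle(tmp / "inferred.gz")

    # Explicit choice: the file name no longer decides the format.
    df.to_pickle(tmp / "explicit.bin", compression="gzip")

    zip_magic = leading_bytes(tmp / "inferred.zip")
    gz_magic = leading_bytes(tmp / "inferred.gz")
    explicit_magic = leading_bytes(tmp / "explicit.bin")

    # '.bin' implies nothing, so reading needs the same explicit hint.
    restored = pd.read_pickle(tmp / "explicit.bin", compression="gzip")

# zip archives start with b'PK'; gzip streams start with b'\x1f\x8b'
print(zip_magic, gz_magic, explicit_magic)
```

Passing compression explicitly is useful when the file name cannot carry one of the recognized extensions; the same value then has to be passed to read_pickle().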

Example 1: Save Pandas Dataframe as zip File

# importing packages
import pandas as pd

# dictionary of data
dct = {'ID': {0: 23, 1: 43, 2: 12, 3: 13, 4: 67},
       'Name': {0: 'Ajay', 1: 'Deep', 2: 'Deepanshi', 3: 'Mira', 4: 'Yash'},
       'Marks': {0: 89, 1: 97, 2: 45, 3: 78, 4: 56},
       'Grade': {0: 'B', 1: 'A', 2: 'F', 3: 'C', 4: 'E'}}

# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)

# using to_pickle function to form file
# by default, the compression type is inferred from the file extension in the specified path
# file will be created in the given path
data.to_pickle('file.zip')
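
One way to confirm that the saved file really is a zip archive is to open it with Python's standard zipfile module. The sketch below is only an illustration: it uses a trimmed copy of the data and writes into a temporary directory (both assumptions made here) so that it does not touch files in the working directory.

```python
import tempfile
import zipfile
from pathlib import Path

import pandas as pd

# Trimmed sample data, assumed here only to keep the sketch short.
df = pd.DataFrame({"ID": [23, 43, 12], "Name": ["Ajay", "Deep", "Deepanshi"]})

with tempfile.TemporaryDirectory() as tmp:
    archive_path = Path(tmp) / "file.zip"

    # Compression is inferred from the '.zip' extension.
    df.to_pickle(archive_path)

    # Check the container format and list what is stored inside it.
    is_zip = zipfile.is_zipfile(archive_path)
    with zipfile.ZipFile(archive_path) as archive:
        members = archive.namelist()

print(is_zip, members)
```

The archive holds the pickled DataFrame as its only member, which is why it should be read back with read_pickle() rather than extracted and opened as text.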



Example 2: Save Pandas Dataframe as gzip File

# importing packages
import pandas as pd
  
# dictionary of data
dct = {"C1": range(5), "C2": range(5, 10)}
  
# forming dataframe and printing
data = pd.DataFrame(dct)
print(data)
  
# using to_pickle function to form file
# gzip compression is inferred from the '.gz' extension; '.gzip' is not a
# recognized extension, so that name would need compression='gzip' passed explicitly
# file will be created in the given path
data.to_pickle('file.gz')

Reading zip/gzip file

To read the created files, use the read_pickle() function. It uses the syntax given below:

pandas.read_pickle(filepath_or_buffer,
                   compression='infer')

Example 1: Reading zip file

# reading from the zip file
pd.read_pickle('file.zip')

Example 2: Reading gzip File

# reading from the gzip file
pd.read_pickle('file.gz')


As the two examples show, read_pickle() reads both compressed files with the same call. Only the file name changes, because the default compression='infer' picks the decompressor from the file extension.
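
To check that nothing is lost along the way, a round trip can be verified programmatically. The following is a minimal sketch under a few assumptions: the sample DataFrame, the temporary directory, and the `results` dictionary are introduced here for illustration only.

```python
import tempfile
from pathlib import Path

import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical sample data for the round-trip check.
original = pd.DataFrame({"C1": range(5), "C2": range(5, 10)})

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name in ("file.zip", "file.gz"):
        path = Path(tmp) / name

        # Write with inferred compression, then read back the same way.
        original.to_pickle(path)
        restored = pd.read_pickle(path)

        # Raises AssertionError if values, dtypes, or index differ.
        assert_frame_equal(restored, original)
        results[name] = restored.equals(original)

print(results)
```

Because pickling stores the DataFrame object itself, dtypes and the index survive the round trip, which is not guaranteed with text formats such as CSV.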

Shaida Kate Naidoo