Sunday, November 17, 2024

How to add metadata to a DataFrame or Series with Pandas in Python?

Metadata is data about data. For a pandas object it can include a description or summary of the data, its memory footprint, and the data type of each column. In this article we display the metadata pandas already provides and then attach metadata of our own.

Scenario:

  • We can view existing metadata with the info() method.
  • We can add metadata to existing data and then view what we added.

Steps:

  • Create a data frame
  • View the metadata which is already existing
  • Create the metadata and view the metadata.

Here, we create a data frame, view its existing metadata, and then add metadata of our own.

Ways to view existing metadata:

  • dataframe_name.info() – Prints a table of the column data types, non-null counts and memory usage. It prints the report and returns None.
  • dataframe_name.columns – An attribute, not a method (no parentheses). It holds an Index with all the column names in the data frame.
  • dataframe_name.describe() – Returns descriptive statistics for the numeric columns: count, mean, standard deviation, minimum, quartiles (the 50% row is the median) and maximum.
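
As a small sketch of the three calls listed above: because info() prints its report and returns None, capturing the report as text needs its buf parameter. The two-row frame and the variable names below are made up for illustration.

```python
import io
import pandas as pd

# Small made-up frame, for illustration only
df = pd.DataFrame({'Name': ['Sravan', 'Deepak'], 'Age': [22, 32]})

# info() writes its report to a buffer (stdout by default) and returns None
buffer = io.StringIO()
result = df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# columns is an attribute (no parentheses) holding an Index of names
column_names = list(df.columns)
print(column_names)

# describe() returns a new DataFrame of summary statistics
summary = df.describe()
print(summary.loc['mean', 'Age'])
```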

Create Metadata

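pandas has no dataframe.scale() or dataframe.offset() methods. Instead, we can assign values to attribute names of our own choosing, such as scale and offset, directly on the data frame object and treat them as metadata. Such attributes belong to that one Python object only; pandas does not carry them over to copies or to the results of operations.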

Syntax:

dataframe_name.scale=value

dataframe_name.offset=value

Below are some examples which depict how to add metadata to a DataFrame or Series:

Example 1

Initially create and display a dataframe.

Python3




# import required modules
import pandas as pd
 
# initialise data of lists using dictionary
data = {'Name': ['Sravan', 'Deepak', 'Radha', 'Vani'],
        'College': ['vignan', 'vignan Lara', 'vignan', 'vignan'],
        'Department': ['CSE', 'IT', 'IT', 'CSE'],
        'Profession': ['Student', 'Assistant Professor',
                       'Programmer & ass. Proff',
                       'Programmer & Scholar'],
        'Age': [22, 32, 45, 37]
        }
 
# create dataframe
df = pd.DataFrame(data)
 
# print dataframe (a bare `df` displays only in a notebook)
print(df)


Output:

Then check dataframe attributes and description.

Python3




# data information (info() prints its own report)
df.info()
 
# column names
print(df.columns)
 
# descriptive statistics of the numeric columns
print(df.describe())


Output:
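
Since the introduction also mentions data types and storage in memory, here is a short sketch of how those can be read directly. It uses a reduced version of the data frame above; dtypes, memory_usage() and the include parameter of describe() are standard pandas, while the variable names are only illustrative.

```python
import pandas as pd

# Reduced version of the data frame used in this example
df = pd.DataFrame({'Department': ['CSE', 'IT', 'IT', 'CSE'],
                   'Age': [22, 32, 45, 37]})

# data type of each column
print(df.dtypes)

# bytes used per column; deep=True also counts the string contents
memory = df.memory_usage(deep=True)
print(memory)

# include='all' summarises text columns too (count, unique, top, freq)
full_summary = df.describe(include='all')
print(full_summary)
```

By default describe() would report only the Age column here, because Department holds text.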

Initialize offset and scale of the dataframe.

Python3




# initializing scale and offset
# for creating meta data
df.scale = 0.1
df.offset = 15
 
# display scale and offset
print('Scale:', df.scale)
print('Offset:', df.offset)


Output:
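
One caveat worth sketching: an attribute assigned this way lives only on that one Python object. The example below compares it with the attrs dictionary, which is not used in the original article; pandas documents attrs as experimental, so treat this as an illustration rather than a recommendation.

```python
import pandas as pd

df = pd.DataFrame({'Age': [22, 32, 45, 37]})

# ad hoc attribute: stored on this Python object only
df.scale = 0.1

# attrs: a dictionary pandas keeps for dataset-level metadata
# (documented as experimental, so its behaviour may change)
df.attrs['scale'] = 0.1
df.attrs['offset'] = 15

copied = df.copy()

# the ad hoc attribute is not carried over to the copy
print(hasattr(copied, 'scale'))

# the attrs dictionary is carried over
print(copied.attrs)
```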

Next, we store the data frame in HDF5 format together with its metadata, then read both back and display them. pd.HDFStore requires the PyTables package (installed as tables).

Python3




# store in hdf5 file format
storedata = pd.HDFStore('college_data.hdf5')
 
# data
storedata.put('data_01', df)
 
# including metadata
metadata = {'scale': 0.1, 'offset': 15}
 
# getting attributes
storedata.get_storer('data_01').attrs.metadata = metadata
 
# closing the storedata
storedata.close()
 
# getting data
with pd.HDFStore('college_data.hdf5') as storedata:
    data = storedata['data_01']
    metadata = storedata.get_storer('data_01').attrs.metadata
 
# display data
print('\nDataframe:\n', data)
 
# display the store object itself (already closed at this point)
print('\nStored Data:\n', storedata)
 
# display metadata
print('\nMetadata:\n', metadata)


Output:
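
The same store-and-read-back steps can be wrapped in two small helpers so that the file is always closed, even if an error occurs. This is a minimal sketch, assuming PyTables is installed; the function names save_with_metadata, load_with_metadata and round_trip_demo are hypothetical and not part of pandas.

```python
import os
import tempfile
import pandas as pd


def save_with_metadata(path, key, obj, metadata):
    """Write obj under key and attach metadata to the stored node."""
    # mode='w' starts a fresh file; the with-block closes it even on error
    with pd.HDFStore(path, mode='w') as store:
        store.put(key, obj)
        store.get_storer(key).attrs.metadata = metadata


def load_with_metadata(path, key):
    """Return (object, metadata) read back from the file."""
    with pd.HDFStore(path, mode='r') as store:
        obj = store[key]
        metadata = store.get_storer(key).attrs.metadata
    return obj, metadata


def round_trip_demo():
    """Save a small frame with metadata in a temporary folder and reload it."""
    df = pd.DataFrame({'Age': [22, 32, 45, 37]})
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, 'college_data.hdf5')
        save_with_metadata(path, 'data_01', df, {'scale': 0.1, 'offset': 15})
        return load_with_metadata(path, 'data_01')
```

Calling round_trip_demo() returns the reloaded data frame together with the metadata dictionary that was stored next to it.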

Example 2

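A Series has no columns attribute, and Series.info() is available only from pandas 1.4 onwards, so here we skip the inspection step and go straight to creating and displaying metadata. Note that passing a dictionary of lists to pd.Series() produces a Series of five elements, one list per dictionary key.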

Python3




# import required module
import pandas as pd
 
# initialise data of lists using dictionary.
data = {'Name': ['Sravan', 'Deepak', 'Radha', 'Vani'],
        'College': ['vignan', 'vignan Lara', 'vignan', 'vignan'],
        'Department': ['CSE', 'IT', 'IT', 'CSE'],
        'Profession': ['Student', 'Assistant Professor',
                       'Programmer & ass. Proff',
                       'Programmer & Scholar'],
        'Age': [22, 32, 45, 37]
        }
 
# Create series
ser = pd.Series(data)
 
# display data (a bare `ser` displays only in a notebook)
print(ser)


Output:

Now we will store the metadata and then display it.

Python3




# storing data in hdf5 file format
storedata = pd.HDFStore('college_data.hdf5')
 
# data
storedata.put('data_01', ser)
 
# mentioning scale and offset
metadata = {'scale': 0.1, 'offset': 15}
 
storedata.get_storer('data_01').attrs.metadata = metadata
 
# storing close
storedata.close()
 
# getting attributes
with pd.HDFStore('college_data.hdf5') as storedata:
    data = storedata['data_01']
    metadata = storedata.get_storer('data_01').attrs.metadata
 
# display data
print('\nData:\n', data)
 
# display the store object itself (already closed at this point)
print('\nStored Data:\n', storedata)
 
# display Metadata
print('\nMetadata:\n', metadata)


Output:
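
For a Series that holds one plain column of values, some metadata is available without a file. The sketch below uses the built-in name attribute, describe(), and the attrs dictionary; attrs is not used in the original article and is documented by pandas as experimental, and the 'unit' key is our own choice.

```python
import pandas as pd

# one column of values, labelled through the built-in name attribute
ages = pd.Series([22, 32, 45, 37], name='Age')

# attrs is documented as experimental; the key used here is our own choice
ages.attrs['unit'] = 'years'

# describe() works on a Series as well as on a DataFrame
stats = ages.describe()

print(ages.name, ages.attrs)
print(stats['mean'])
```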
