Tuesday, December 24, 2024

Slicing, Indexing, Manipulating and Cleaning Pandas Dataframe

Pandas lets us perform many operations on a data set, such as slicing, indexing, manipulating, and cleaning a data frame.

Case 1: Slicing Pandas Data frame using DataFrame.iloc[]

Example 1: Slicing Rows 

Python3




# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])
 
# data frame before slicing
df


Output:

Python3




# Slicing rows in data frame
df1 = df.iloc[0:4]
 
# data frame after slicing
df1


Output:

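In the above example, df.iloc[0:4] selects the rows at positions 0 to 3. As with ordinary Python slices, the end position (4) is excluded, so the result has four rows.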
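The start:stop:step rules that apply to Python lists also apply to iloc. The minimal sketch below reuses the article's player data; the variable names first_four, last_two and every_other are only illustrative.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

first_four = df.iloc[0:4]    # positions 0, 1, 2, 3 (end position 4 is excluded)
last_two = df.iloc[-2:]      # negative positions count from the end
every_other = df.iloc[::2]   # a step of 2 keeps positions 0, 2, 4, 6

print(len(first_four))             # 4
print(last_two['Name'].tolist())   # ['J.Root', 'K.Peterson']
print(every_other.index.tolist())  # [0, 2, 4, 6]
```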

Example 2: Slicing Columns 

Python3




# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])
 
# data frame before slicing
df


Output:

Python3




# Slicing columns in data frame
df1 = df.iloc[:, 0:2]
 
# data frame after slicing
df1


Output:

In the above example, df.iloc[:, 0:2] keeps every row (the : before the comma) and slices the columns at positions 0 and 1, that is, Name and Age.
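Rows and columns can also be sliced in a single call, and DataFrame.loc offers label-based slicing. The sketch below (same player data, illustrative variable names) shows one important difference: a position slice with iloc excludes its end point, while a label slice with loc includes it.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# iloc takes [rows, columns]; both parts are positional and exclude the end
block = df.iloc[1:3, 1:3]          # rows 1-2, columns Age and Weight
print(block.shape)                 # (2, 2)
print(block.columns.tolist())      # ['Age', 'Weight']

# loc slices by label, and a label slice INCLUDES its end label
by_label = df.loc[:, 'Name':'Age']
print(by_label.columns.tolist())   # ['Name', 'Age']
```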

Case 2: Indexing Pandas Data frame 

Python3




# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe and indexing it using letters
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])
 
 
# Displaying data frame
df


Output:

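In the above example, the index parameter replaces the default integer row labels (0 to 6) with the letters A to G, so each row can now be referred to by its letter.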
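With letter labels in place, DataFrame.loc can select rows by label. The following sketch rebuilds the same data frame so it runs on its own, and also shows set_index(), which uses an existing column as the index instead of a separate list of labels; the variable names are illustrative.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'],
                  index=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

# Select a single row by its label; the result is a Series
row_c = df.loc['C']
print(row_c['Name'])                    # V.Kholi

# Label slices include both end points: rows B, C and D
print(df.loc['B':'D'].index.tolist())   # ['B', 'C', 'D']

# Select one cell by row label and column name
print(df.loc['E', 'Weight'])            # 100

# Alternatively, promote an existing column to the index
by_name = df.set_index('Name')
print(by_name.loc['J.Root', 'Age'])     # 33
```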

Case 3: Manipulating Pandas Data frame

A data frame can be manipulated in many ways, such as applying functions, changing the data type of columns, splitting columns, and adding rows or columns.
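Two of the operations listed above, changing a column's data type and adding rows or columns, can be sketched with astype() and pd.concat(). The sketch uses two of the students from Example 1; the 'Passed' column and its pass mark of 300 are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame([['Rohan', 455], ['Elvish', 250]],
                  columns=['Name', 'Univ_Marks'])

# Change the data type of a column: integers -> floating point
df['Univ_Marks'] = df['Univ_Marks'].astype(float)

# Add a column derived from an existing one (hypothetical pass mark of 300)
df['Passed'] = df['Univ_Marks'] >= 300

# Add a row by concatenating a one-row data frame with the same columns
new_row = pd.DataFrame([['Deepak', 495.0, True]], columns=df.columns)
df = pd.concat([df, new_row], ignore_index=True)

print(df)
print(df.dtypes)
```

Building the new row as a small data frame and concatenating it keeps the column order and types consistent; ignore_index=True renumbers the rows 0, 1, 2.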

Example 1: Applying a lambda function to a column using DataFrame.assign()

Python3




# importing pandas library
import pandas as pd
 
# creating and initializing a list
values = [['Rohan', 455], ['Elvish', 250], ['Deepak', 495],
          ['Sai', 400], ['Radha', 350], ['Vansh', 450]]
 
# creating a pandas dataframe
df = pd.DataFrame(values, columns=['Name', 'Univ_Marks'])
 
# Applying lambda function to find percentage of
# 'Univ_Marks' column using df.assign()
df = df.assign(Percentage=lambda x: (x['Univ_Marks'] / 500 * 100))
 
# displaying the data frame
df


Output:

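In the above example, assign() passes the whole data frame to the lambda as x, so x['Univ_Marks'] / 500 * 100 computes the percentage for every row at once, and the result is stored in the new column ‘Percentage’. Because assign() returns a new data frame, the result is assigned back to df.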
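Since assign() returns a new data frame, the original stays untouched unless the result is assigned back, and several columns can be created in one call. In the sketch below, the 'Distinction' column and its threshold of 85 percent are hypothetical; later keyword arguments to assign() can refer to columns created earlier in the same call (supported since pandas 0.23).

```python
import pandas as pd

values = [['Rohan', 455], ['Elvish', 250], ['Deepak', 495],
          ['Sai', 400], ['Radha', 350], ['Vansh', 450]]
df = pd.DataFrame(values, columns=['Name', 'Univ_Marks'])

# assign() returns a NEW data frame; df itself keeps its two original columns
result = df.assign(
    Percentage=lambda x: x['Univ_Marks'] / 500 * 100,
    # A later keyword may use a column created earlier in the same call
    # (hypothetical rule: 85 percent or more counts as a distinction)
    Distinction=lambda x: x['Percentage'] >= 85,
)

print(df.columns.tolist())      # ['Name', 'Univ_Marks']
print(result.columns.tolist())  # ['Name', 'Univ_Marks', 'Percentage', 'Distinction']
print(result.loc[result['Distinction'], 'Name'].tolist())  # ['Rohan', 'Deepak', 'Vansh']
```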

Another Method:

Another way to obtain the same output is to create the new column with [ ] brackets: put the new column name inside the brackets on the left side of the assignment, and the operation that computes its values on the right side.

Note: This method creates a new column derived from one or more existing columns of the data frame. Unlike assign(), it modifies df directly instead of returning a new data frame.

Python3




# importing pandas library
import pandas as pd
 
# creating and initializing a list
values = [['Rohan', 455], ['Elvish', 250], ['Deepak', 495],
          ['Sai', 400], ['Radha', 350], ['Vansh', 450]]
 
# creating a pandas dataframe
df = pd.DataFrame(values, columns=['Name', 'Univ_Marks'])
 
# displaying the data frame
 
print('Data frame before calculating percentage\n')
print(df)
print('\nData frame with Percentage Column\n')
 
# Creating new column Percentage derived from Univ_Marks
df["Percentage"] = df["Univ_Marks"]/500*100
 
# displaying the data frame
print(df)


Output:

 

Example 2: Sorting the Data frame in Ascending order

Python3




# importing pandas library
import pandas as pd
 
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])
 
# Sorting by column 'Weight'
df.sort_values(by=['Weight'])


Output:

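In the above example, sort_values() sorts the rows by the ‘Weight’ column in ascending order, which is the default. It returns a new, sorted data frame and leaves df itself unchanged.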
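sort_values() also accepts ascending=False for descending order, and ignore_index=True (pandas 1.0 and later) to renumber the sorted rows instead of keeping their original labels. A short sketch with the same player data; the variable names are illustrative.

```python
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', 38, 74, 3428000],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', 40, 100, 4528000],
               ['J.Root', 33, 72, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# Descending order: heaviest player first
heaviest_first = df.sort_values(by='Weight', ascending=False)
print(heaviest_first['Name'].tolist()[:3])  # ['C.Gayle', 'K.Peterson', 'S.Smith']

# The original row labels travel with the rows...
print(heaviest_first.index.tolist()[:3])    # [4, 6, 3]

# ...unless ignore_index=True renumbers the result from 0
lowest_salary_first = df.sort_values(by='Salary', ignore_index=True)
print(lowest_salary_first.index.tolist())   # [0, 1, 2, 3, 4, 5, 6]
print(lowest_salary_first['Name'].iloc[0])  # K.Peterson
```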

Case 4: Cleaning Pandas Data frame 

Python3




# importing pandas and NumPy libraries
import pandas as pd
import numpy as np
 
# Initializing the nested list with Data set
player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', np.nan, 74, np.nan],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', np.nan, 100, np.nan],
               [np.nan, 33, np.nan, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
 
# creating a pandas dataframe
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])
 
df


Output:

Python3




# Checking for missing values
df.isnull().sum()


Output:

Python3




# dropping or cleaning the missing data
df = df.dropna()
df


Output:

In the above example, dropna() removes every row that contains at least one missing value, so only the rows for M.S.Dhoni, V.Kholi, S.Smith and K.Peterson remain.
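Dropping rows is not the only option, and it can discard a lot of data: here three of the seven rows are lost. The sketch below assumes that a missing name makes a row unusable but that missing numbers may be estimated. dropna(subset=...) drops rows only when specific columns are missing, and fillna() then replaces the remaining gaps with each column's mean. Filling with the mean is just one common choice, not the right one for every data set.

```python
import numpy as np
import pandas as pd

player_list = [['M.S.Dhoni', 36, 75, 5428000],
               ['A.B.D Villers', np.nan, 74, np.nan],
               ['V.Kholi', 31, 70, 8428000],
               ['S.Smith', 34, 80, 4428000],
               ['C.Gayle', np.nan, 100, np.nan],
               [np.nan, 33, np.nan, 7028000],
               ['K.Peterson', 42, 85, 2528000]]
df = pd.DataFrame(player_list, columns=['Name', 'Age', 'Weight', 'Salary'])

# Drop only the rows whose Name is missing (one row here)
named = df.dropna(subset=['Name'])
print(len(named))  # 6

# Fill the remaining numeric gaps with each column's mean
numeric_cols = ['Age', 'Weight', 'Salary']
filled = named.copy()
filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].mean())

print(filled.isnull().sum().sum())  # 0
print(filled)
```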
