A pandas DataFrame stores data in tabular form. Sometimes we need to retrieve rows that meet certain conditions, for example the top N records of each group. To do this, we create a DataFrame and apply the methods described below.
Get topmost N records within each group
First, we create a pandas DataFrame in Python:
Python3
# importing pandas as pd
import pandas as pd

# creating the DataFrame
df = pd.DataFrame({
    'Variables': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

df
Output:
    Variables  Value
0           A      2
1           A      5
2           A      0
3           A      3
4           B      1
5           B      0
6           B      9
7           C      0
8           C      7
9           C      5
10          C      4
Using the groupby() function of pandas to group the rows
Now we get the first N rows of each group in the 'Variables' column. groupby() groups the rows by 'Variables', and head(N) returns the first N rows of each group in their original order, not the N largest values. reset_index(drop=True) then replaces the original row labels with a new sequential index.
Example 1: Suppose the value of N=2
Python3
# setting value of N as 2
N = 2

# using groupby to group according to
# column 'Variables'
print(df.groupby('Variables').head(N).reset_index(drop=True))
Output:
   Variables  Value
0          A      2
1          A      5
2          B      1
3          B      0
4          C      0
5          C      7
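To see what reset_index(drop=True) changes, it can help to compare the row labels before and after the call. This is a minimal sketch using the same data; the names first_rows and renumbered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Variables': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2

# head(N) keeps the original row labels of the selected rows
first_rows = df.groupby('Variables').head(N)
print(first_rows.index.tolist())   # [0, 1, 4, 5, 7, 8]

# reset_index(drop=True) discards those labels and renumbers from 0
renumbered = first_rows.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3, 4, 5]
```

Keeping the original labels can be useful when the selected rows need to be traced back to the source DataFrame.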
Example 2: Now, suppose the value of N=4
Python3
# setting value of N as 4
N = 4

# using groupby to group according to
# column 'Variables'
print(df.groupby('Variables').head(N).reset_index(drop=True))
Output:
    Variables  Value
0           A      2
1           A      5
2           A      0
3           A      3
4           B      1
5           B      0
6           B      9
7           C      0
8           C      7
9           C      5
10          C      4
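Because head(N) returns rows in their existing order, getting the N largest values of each group requires sorting first. The following is a minimal sketch, assuming the same data as above; the name top_n is illustrative, and sorting by both columns is one possible approach rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'Variables': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2

# sort by group, then by 'Value' in descending order within each group,
# so that head(N) picks the N largest values of every group
top_n = (df.sort_values(['Variables', 'Value'], ascending=[True, False])
           .groupby('Variables')
           .head(N)
           .reset_index(drop=True))

print(top_n)
```

With N = 2, this should keep the values 5 and 3 for group A, 9 and 1 for group B, and 7 and 5 for group C.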
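Using the nlargest() function of pandas to get the largest values
Here, the nlargest() function returns the N rows with the largest values in the specified column. Called directly on the DataFrame, as in the code below, it ranks rows across the whole DataFrame rather than within each group of the 'Variables' column, so it returns the 4 rows with the largest 'Value' overall.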
Python3
# importing pandas as pd
import pandas as pd

# creating the DataFrame
df = pd.DataFrame({
    'Variables': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

# print(df)
d = df.nlargest(4, 'Value')
print(d)
Output:
  Variables  Value
6         B      9
8         C      7
1         A      5
9         C      5
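To apply nlargest() within each group instead of across the whole DataFrame, one option is to select the 'Value' column after groupby(). The sketch below assumes the same data; the names top_per_group and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Variables': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C'],
    'Value': [2, 5, 0, 3, 1, 0, 9, 0, 7, 5, 4]})

N = 2

# nlargest on the grouped 'Value' column returns a Series indexed by
# (group key, original row label)
top_per_group = df.groupby('Variables')['Value'].nlargest(N)

# drop the original row labels and turn the group key back into a column
result = top_per_group.reset_index(level=1, drop=True).reset_index()

print(result)
```

Here each group contributes its own N largest values, so a group with small values such as A is still represented even though its values would not all rank among the largest overall.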