Monday, January 27, 2025

Python | Pandas DataFrame.sample()

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.

Pandas sample() returns a random sample of rows or columns from the DataFrame (or Series) it is called on.
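A minimal sketch of the basic behaviour, using a small in-memory DataFrame instead of the article's CSV file (the column names and values are hypothetical, chosen only for illustration). With no arguments sample() returns one row; with n=3 it returns three distinct rows, because sampling is without replacement by default.

```python
import pandas as pd

# Small, hypothetical DataFrame used only for illustration
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    "salary": [50, 60, 70, 80, 90],
})

one_row = df.sample()        # n defaults to 1 when neither n nor frac is given
three_rows = df.sample(n=3)  # three distinct rows, drawn without replacement

print(one_row)
print(three_rows)
```

The sampled rows keep their original index labels, so you can always tell which rows of the source frame were picked.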

Syntax:

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

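Parameters:

n: int, optional. Number of items (rows, or columns when axis=1) to return. Defaults to 1 when frac is not given. Cannot be used with frac.
frac: float, optional. Fraction of items to return; for example, frac=0.25 returns 25% of the rows (frac * length, rounded to a whole number). Cannot be used with n. Values greater than 1 require replace=True.
replace: bool, default False. If True, sample with replacement, so the same row can appear more than once.
weights: str or array-like, optional. Sampling weights; by default every row is equally likely. When sampling rows, a string is treated as the name of a column whose values are used as weights. Weights that do not sum to 1 are normalized.
random_state: int or numpy.random.RandomState, optional. If set to a fixed integer, the same rows are returned on every call.
axis: 0 or 'index' for rows, 1 or 'columns' for columns. Defaults to rows for a DataFrame.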
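The parameters above can be exercised on a small, hypothetical DataFrame (the values are invented for illustration). This sketch shows that a fixed random_state reproduces the same sample, that axis=1 samples columns, that frac greater than 1 works together with replace=True, and that weights can steer which row is picked.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]})

# Same integer seed -> same rows on every call
s1 = df.sample(n=2, random_state=42)
s2 = df.sample(n=2, random_state=42)

# axis=1 samples columns instead of rows (every row is kept)
cols = df.sample(n=2, axis=1, random_state=0)

# frac > 1 needs replace=True; rows may then appear more than once
upsampled = df.sample(frac=2, replace=True, random_state=0)

# All weight is on the row labelled 3, so that row is always chosen
heavy = df.sample(n=1, weights=[0, 0, 0, 1])

print(s1.equals(s2))    # True
print(cols.shape)       # (4, 2)
print(len(upsampled))   # 8
print(list(heavy.index))  # [3]
```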

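Return type: A new object of the same type as the caller (a DataFrame for DataFrame.sample(), a Series for Series.sample()) containing the sampled items.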
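A short sketch of the return type, on a hypothetical one-column frame: sampling a DataFrame gives a DataFrame, sampling a Series gives a Series, and the original index labels are preserved. The ignore_index argument used at the end is not in the syntax listed above; it exists in pandas 1.3 and later and resets the result to a 0-based index.

```python
import pandas as pd

# Hypothetical one-column frame and the Series taken from it
df = pd.DataFrame({"x": [10, 20, 30]})
series = df["x"]

sampled_df = df.sample(n=2, random_state=1)
sampled_series = series.sample(n=2, random_state=1)

print(type(sampled_df).__name__)      # DataFrame
print(type(sampled_series).__name__)  # Series

# pandas >= 1.3: discard the original labels and number the result 0..n-1
reset = df.sample(n=2, random_state=1, ignore_index=True)
print(list(reset.index))  # [0, 1]
```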

The examples below read a CSV file named employees.csv.

Example #1: Random row from a data frame

In this example, two random rows are generated with separate calls to the .sample() method and then compared.




# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# generating one random row
row1 = data.sample(n=1)

# display
print(row1)

# generating another random row
row2 = data.sample(n=1)

# display
print(row2)


Output:
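As shown in the output image, the two randomly sampled rows are different from each other. Because each call draws independently, the rows will usually differ, but nothing prevents the same row from being drawn twice.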
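Two refinements of Example #1, sketched on a hypothetical stand-in for the employees data (the columns and values are invented): passing the same random_state to both calls makes them return the same row, and drawing both rows in a single call with n=2 guarantees they are distinct, since sampling is without replacement by default.

```python
import pandas as pd

# Hypothetical stand-in for the employees data
data = pd.DataFrame({"Team": ["A", "B", "C", "D"], "Salary": [1, 2, 3, 4]})

# Fixing random_state makes separate calls reproducible
row1 = data.sample(n=1, random_state=7)
row2 = data.sample(n=1, random_state=7)
print(row1.equals(row2))  # True

# One call with n=2 never returns the same row twice
pair = data.sample(n=2, random_state=7)
print(pair.index.is_unique)  # True
```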

 
Example #2: Generating a 25% sample of a data frame
In this example, a random sample containing 25% of the rows is drawn from the data frame.




# importing pandas package
import pandas as pd

# making data frame from csv file
data = pd.read_csv("employees.csv")

# generating a random sample containing 25% of the rows
rows = data.sample(frac=0.25)

# checking that the sample holds 25% of the rows
# (pandas rounds frac * len(data) to a whole number of rows)
if len(rows) == round(0.25 * len(data)):
    print("Cool")
    print(len(data), len(rows))

# display
print(rows)


Output:
As shown in the output image, the generated sample contains 25% of the rows of the data frame, and those rows are selected at random.
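Comparing len(rows) with 0.25 * len(data) for exact equality only works when that product is a whole number, because pandas rounds frac times the number of rows to get the sample size. A minimal sketch with a hypothetical 7-row frame, where 0.25 * 7 = 1.75:

```python
import pandas as pd

# Hypothetical 7-row frame: 0.25 * 7 = 1.75 is not a whole number
df = pd.DataFrame({"v": range(7)})

rows = df.sample(frac=0.25, random_state=0)

print(0.25 * len(df))  # 1.75
print(len(rows))       # 2 (1.75 rounded to the nearest integer)

# Comparing against the rounded size passes regardless of the frame length
if len(rows) == round(0.25 * len(df)):
    print("Cool")
```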
