Python | Pandas Dataframe.sample()

28 July 2024

5

Python is a great language for doing data analysis, primarily because of the fantastic ecosystem of data-centric python packages. Pandas is one of those packages and makes importing and analyzing data much easier.

Pandas sample() is used to generate a sample random row or column from the function caller data frame.

Syntax:

DataFrame.sample(n=None, frac=None, replace=False, weights=None, random_state=None, axis=None)

Parameters:

n: int value, Number of random rows to generate.
frac: Float value, Returns (float value * length of data frame values ). frac cannot be used with n.
replace: Boolean value, return sample with replacement if True.
random_state: int value or numpy.random.RandomState, optional. if set to a particular integer, will return same rows as sample in every iteration.
axis: 0 or ‘row’ for Rows and 1 or ‘column’ for Columns.

Return type: New object of same type as caller.

To download the CSV file used, Click Here.

Example #1: Random row from Data frame

In this example, two random rows are generated by the .sample() method and compared later.

# importing pandas package
import pandas as pd
  
# making data frame from csv file 
data = pd.read_csv("employees.csv")
  
# generating one row 
row1 = data.sample(n = 1)
  
# display
row1
  
# generating another row
row2 = data.sample(n = 1)
  
# display
row2

Output:
As shown in the output image, the two random sample rows generated are different from each other.

Example #2: Generating 25% sample of data frame
In this example, 25% random sample data is generated out of the Data frame.

# importing pandas package
import pandas as pd
  
# making data frame from csv file 
data = pd.read_csv("employees.csv")
  
# generating one row 
rows = data.sample(frac =.25)
  
# checking if sample is 0.25 times data or not
  
if (0.25*(len(data))== len(rows)):
    print( "Cool")
    print(len(data), len(rows))
  
# display
rows

Output:
As shown in the output image, the length of sample generated is 25% of data frame. Also the sample is generated randomly.

Python | Pandas Dataframe.sample()

Java Program for Longest Common Subsequence

Maximum height of Tree when any Node can be considered as Root

Print Fibonacci sequence using 2 variables

LEAVE A REPLY Cancel reply

Most Popular

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

Interview With Willem Dewulf – CEO of ProBackup by Shauli Zacks

Recent Comments

EDITOR PICKS

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

POPULAR POSTS

Samsung offers free screen replacements for users still suffering green line issues

7 Best Free Antiviruses for Mac in 2024: Are They Any Good? by Katarina Glamoslija

Is Microsoft Teams Secure? Use Teams Safely in 2024 by Tyler Cross

POPULAR CATEGORY

ABOUT US

FOLLOW US