CSV (Comma-Separated Values) files store tabular data as plain text, with one record per line and the fields separated by commas. To read data from a CSV file, we use the Pandas function read_csv(), which returns the data as a DataFrame.
Syntax of read_csv()
Here is the syntax of pd.read_csv(), showing the parameters used in this article.
Syntax: pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, engine=None, skiprows=None, nrows=None)
Parameters:
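- filepath_or_buffer: Location of the CSV file. It accepts a string path, a URL, or a file-like object such as an open file or an io.StringIO buffer.
- sep: The field separator; the default is ','. A separator longer than one character (other than '\s+') is interpreted as a regular expression.
- header: Row number(s) to use as the column names, which also marks the start of the data. It accepts an int or a list of ints. The default, 'infer', uses the first line of the file. If header=None, no row is used for the column names, and the columns are labeled 0, 1, 2, and so on.
- usecols: Reads only the selected columns, given as a list of column names or column positions.
- nrows: Number of data rows to read from the file.
- index_col: Column(s) to use as the row labels of the DataFrame. If None (the default), a default integer index 0, 1, 2, ... is used.
- engine: The parser engine to use. The C engine is the default and the fastest; the 'python' engine supports extra features such as regular-expression separators.
- skiprows: Line numbers to skip (a list of 0-indexed line numbers), or the number of lines to skip at the start of the file (an int).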
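As a minimal sketch of the header and index_col defaults described above, the snippet below reads a small in-memory CSV. It assumes an io.StringIO buffer as a stand-in for a file (filepath_or_buffer accepts file-like objects); the three rows come from the people.csv sample used later in this article, and the variable names are illustrative.

```python
import io

import pandas as pd

# Three rows from the people.csv sample, without a header line
data = """Shelby,Terrell,Male
Phillip,Summers,Female
Kristine,Travis,Male
"""

# header=None: no row is used for the column names, so columns are labeled 0, 1, 2
df_no_header = pd.read_csv(io.StringIO(data), header=None)
print(df_no_header)

# Default header='infer': the first line ("Shelby,Terrell,Male") becomes the column names
df_inferred = pd.read_csv(io.StringIO(data))
print(df_inferred.columns.tolist())  # ['Shelby', 'Terrell', 'Male']

# index_col=None (the default) gives a plain 0, 1, 2, ... row index
print(df_no_header.index.tolist())  # [0, 1, 2]
```

Note how the inferred header silently "uses up" the first record, so header=None matters for files that have no header line.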
Read CSV File using Pandas read_csv
Before using this function, we must import the Pandas library. We then load the CSV file with read_csv().
Python3
# Import pandas
import pandas as pd
# Read the CSV file into a DataFrame
df = pd.read_csv("people.csv")
# Print the first five rows
print(df.head())
Output:
First Name Last Name Sex Email Date of birth Job Title
0 Shelby Terrell Male elijah57@example.net 1945-10-26 Games developer
1 Phillip Summers Female bethany14@example.com 1910-03-24 Phytotherapist
2 Kristine Travis Male bthompson@example.com 1992-07-02 Homeopath
3 Yesenia Martinez Male kaitlinkaiser@example.com 2017-08-03 Market researcher
4 Lori Todd Male buchananmanuel@example.net 1938-12-01 Veterinary surgeon
Using sep in read_csv()
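In this example, the CSV file mixes several separator characters (':', ',', '|', '_' and spaces). Passing the regular expression '[:, |_]' as sep splits the fields on any one of these characters. Because ', ' is two separator characters in a row, it produces an empty field, which shows up as an "Unnamed" column or a NaN value in the output.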
Python3
# Contents of sample.csv:
# totalbill_tip, sex:smoker, day_time, size
# 16.99, 1.01:Female|No, Sun, Dinner, 2
# 10.34, 1.66, Male, No|Sun:Dinner, 3
# 21.01:3.5_Male, No:Sun, Dinner, 3
# 23.68, 3.31, Male|No, Sun_Dinner, 2
# 24.59:3.61, Female_No, Sun, Dinner, 4
# 25.29, 4.71|Male, No:Sun, Dinner, 4
# Importing pandas library
import pandas as pd
# Load the CSV data; a regular-expression separator requires the Python engine
df = pd.read_csv('sample.csv', sep='[:, |_]', engine='python')
# Print the DataFrame
print(df)
Output:
totalbill tip Unnamed: 2 sex smoker Unnamed: 5 day time Unnamed: 8 size
16.99 NaN 1.01 Female No NaN Sun NaN Dinner NaN 2
10.34 NaN 1.66 NaN Male NaN No Sun Dinner NaN 3
21.01 3.50 Male NaN No Sun NaN Dinner NaN 3.0 None
23.68 NaN 3.31 NaN Male No NaN Sun Dinner NaN 2
24.59 3.61 NaN Female No NaN Sun NaN Dinner NaN 2
25.29 NaN 4.71 Male NaN No Sun NaN Dinner NaN 4
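The empty "Unnamed" columns above come from ', ' being matched as two separators. As an alternative sketch (not the approach used in the example above), the regex below absorbs any whitespace around a single separator character, so every row splits into the same seven fields. It assumes the sample.csv content from the comment above, read from an io.StringIO buffer instead of a file.

```python
import io

import pandas as pd

# Same content as sample.csv in the example above
sample = """totalbill_tip, sex:smoker, day_time, size
16.99, 1.01:Female|No, Sun, Dinner, 2
10.34, 1.66, Male, No|Sun:Dinner, 3
21.01:3.5_Male, No:Sun, Dinner, 3
23.68, 3.31, Male|No, Sun_Dinner, 2
24.59:3.61, Female_No, Sun, Dinner, 4
25.29, 4.71|Male, No:Sun, Dinner, 4
"""

# One separator character from the set, plus any surrounding whitespace;
# regular-expression separators need the Python parsing engine
df = pd.read_csv(io.StringIO(sample), sep=r"\s*[:,|_]\s*", engine="python")
print(df)
```

The resulting columns are totalbill, tip, sex, smoker, day, time and size, with no NaN padding.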
Using usecols in read_csv()
Here, we load only three columns, "First Name", "Sex" and "Email", and pass header=0 so that the first line of the file supplies the column names (the default header='infer' behaves the same way when no names are passed).
Python3
import pandas as pd
# Load only three columns; line 0 of the file holds the column names
df = pd.read_csv('people.csv', header=0, usecols=["First Name", "Sex", "Email"])
# Printing the DataFrame
print(df.head())
Output:
First Name Sex Email
0 Shelby Male elijah57@example.net
1 Phillip Female bethany14@example.com
2 Kristine Male bthompson@example.com
3 Yesenia Male kaitlinkaiser@example.com
4 Lori Male buchananmanuel@example.net
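One detail worth knowing about usecols: the order of the list is ignored, and the columns come back in the order they appear in the file. The sketch below shows this, plus the integer-position and callable forms of usecols. It assumes an io.StringIO buffer holding two rows and four columns of the people.csv sample; the variable names are illustrative.

```python
import io

import pandas as pd

# First two rows of the people.csv sample (four columns, for brevity)
data = """First Name,Last Name,Sex,Email
Shelby,Terrell,Male,elijah57@example.net
Phillip,Summers,Female,bethany14@example.com
"""

# usecols ignores the order of the list: columns come back in file order
df_file_order = pd.read_csv(io.StringIO(data), usecols=["Email", "First Name"])
print(df_file_order.columns.tolist())  # ['First Name', 'Email']

# To control the order, select the columns again after reading
df = df_file_order[["Email", "First Name"]]

# usecols also accepts integer positions, or a callable applied to each column name
df_pos = pd.read_csv(io.StringIO(data), usecols=[0, 2])
df_call = pd.read_csv(io.StringIO(data), usecols=lambda name: "Name" in name)
print(df_pos.columns.tolist())   # ['First Name', 'Sex']
print(df_call.columns.tolist())  # ['First Name', 'Last Name']
```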
Using index_col in read_csv()
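Here, we pass index_col=["Sex", "Job Title"] to use these two columns as a two-level row index (a MultiIndex), with "Sex" as the outer level and "Job Title" as the inner level. The remaining selected column, "Email", becomes the only data column. When printed, repeated outer-level labels are left blank, which is why the last two rows show no "Sex" value.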
Python3
import pandas as pd
# Use "Sex" and "Job Title" as a two-level row index; "Email" stays as a column
df = pd.read_csv('people.csv', header=0, index_col=["Sex", "Job Title"],
                 usecols=["Sex", "Job Title", "Email"])
print(df.head())
Output:
Sex Job Title
Male Games developer elijah57@example.net
Female Phytotherapist bethany14@example.com
Male Homeopath bthompson@example.com
Market researcher kaitlinkaiser@example.com
Veterinary surgeon buchananmanuel@example.net
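Once the file is loaded with a two-level index, rows can be selected by the outer label or by a full (Sex, Job Title) tuple. A minimal sketch, assuming the five rows shown above in an io.StringIO buffer; sorting the index first is a common precaution, since label lookups on an unsorted MultiIndex can raise a PerformanceWarning and label slices require a sorted index.

```python
import io

import pandas as pd

# The five rows from the people.csv sample above (only the columns needed here)
data = """Sex,Job Title,Email
Male,Games developer,elijah57@example.net
Female,Phytotherapist,bethany14@example.com
Male,Homeopath,bthompson@example.com
Male,Market researcher,kaitlinkaiser@example.com
Male,Veterinary surgeon,buchananmanuel@example.net
"""

df = pd.read_csv(io.StringIO(data), index_col=["Sex", "Job Title"])

# Sort the MultiIndex before label-based selection
df = df.sort_index()

# Selecting an outer-level label returns that group,
# indexed by the remaining "Job Title" level
males = df.loc["Male"]
print(males)

# A full (Sex, Job Title) tuple plus a column label selects a single value
print(df.loc[("Female", "Phytotherapist"), "Email"])  # bethany14@example.com
```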
Using nrows in read_csv()
Here, we read only the first 3 data rows of the file by passing nrows=3.
Python3
import pandas as pd
# Same as above, but read only the first 3 data rows
df = pd.read_csv('people.csv', header=0, index_col=["Sex", "Job Title"],
                 usecols=["Sex", "Job Title", "Email"], nrows=3)
print(df)
Output:
Sex Job Title
Male Games developer elijah57@example.net
Female Phytotherapist bethany14@example.com
Male Homeopath bthompson@example.com
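Note that nrows counts data rows only, not the header line. Combined with skiprows, it can also read a "window" from the middle of a file. A minimal sketch, assuming an io.StringIO buffer holding the first five rows of the people.csv sample (three columns only):

```python
import io

import pandas as pd

# First five rows of the people.csv sample (three columns only)
data = """First Name,Sex,Job Title
Shelby,Male,Games developer
Phillip,Female,Phytotherapist
Kristine,Male,Homeopath
Yesenia,Male,Market researcher
Lori,Male,Veterinary surgeon
"""

# nrows counts data rows only; the header line is not included in the count
first_three = pd.read_csv(io.StringIO(data), nrows=3)
print(len(first_three))  # 3

# Skip file lines 1-2 (the first two data rows, keeping the header on line 0),
# then read the next two rows
window = pd.read_csv(io.StringIO(data), skiprows=range(1, 3), nrows=2)
print(window["First Name"].tolist())  # ['Kristine', 'Yesenia']
```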
Using skiprows in read_csv()
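The skiprows parameter skips lines of the file before parsing. When it is a list, its values are 0-indexed line numbers in the file, and line 0 is the header. To drop the second and sixth records of the previous output (Phillip Summers and Erin Day), we therefore skip file lines 2 and 6.
Python3
import pandas as pd
# Read the full dataset
df = pd.read_csv("people.csv")
print("Previous Dataset: ")
print(df)
# Skip file lines 2 and 6 (line 0 is the header row)
df = pd.read_csv("people.csv", skiprows=[2, 6])
print("Dataset After skipping rows: ")
print(df)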
Output:
Previous Dataset:
First Name Last Name Sex Email Date of birth Job Title
0 Shelby Terrell Male elijah57@example.net 1945-10-26 Games developer
1 Phillip Summers Female bethany14@example.com 1910-03-24 Phytotherapist
2 Kristine Travis Male bthompson@example.com 1992-07-02 Homeopath
3 Yesenia Martinez Male kaitlinkaiser@example.com 2017-08-03 Market researcher
4 Lori Todd Male buchananmanuel@example.net 1938-12-01 Veterinary surgeon
5 Erin Day Male tconner@example.org 2015-10-28 Management officer
6 Katherine Buck Female conniecowan@example.com 1989-01-22 Analyst
7 Ricardo Hinton Male wyattbishop@example.com 1924-03-26 Hydrogeologist
Dataset After skipping rows:
First Name Last Name Sex Email Date of birth Job Title
0 Shelby Terrell Male elijah57@example.net 1945-10-26 Games developer
1 Kristine Travis Male bthompson@example.com 1992-07-02 Homeopath
2 Yesenia Martinez Male kaitlinkaiser@example.com 2017-08-03 Market researcher
3 Lori Todd Male buchananmanuel@example.net 1938-12-01 Veterinary surgeon
4 Katherine Buck Female conniecowan@example.com 1989-01-22 Analyst
5 Ricardo Hinton Male wyattbishop@example.com 1924-03-26 Hydrogeologist
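skiprows behaves differently depending on whether it gets a list, a callable, or an int. The sketch below contrasts the three forms on a small hypothetical file (the data and variable names are illustrative only, read from an io.StringIO buffer). Note that an int also counts the header line.

```python
import io

import pandas as pd

# Hypothetical four-row file; line 0 is the header
data = """name,score
a,1
b,2
c,3
d,4
"""

# A list skips those 0-indexed file lines: here lines 1 and 3 (rows "a" and "c")
by_list = pd.read_csv(io.StringIO(data), skiprows=[1, 3])
print(by_list["name"].tolist())  # ['b', 'd']

# A callable receives each line number and returns True to skip that line;
# excluding line 0 keeps the header
by_callable = pd.read_csv(io.StringIO(data), skiprows=lambda i: i > 0 and i % 2 == 0)
print(by_callable["name"].tolist())  # ['a', 'c']

# An int skips that many lines from the top, including the header,
# so the next line ("a,1") becomes the column names
by_int = pd.read_csv(io.StringIO(data), skiprows=1)
print(by_int.columns.tolist())  # ['a', '1']
```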