Suppose you are working on a data science project and have just finished one of its most important tasks: data cleaning. You don't want to lose the cleaned DataFrame, so you save it as a CSV file. Let us see how to export a Pandas DataFrame to a CSV file.
Pandas lets us do this with the built-in DataFrame.to_csv() method.
First, let's create a sample DataFrame:
Python3
# importing the module
import pandas as pd
# making the data
scores = {'Name': ['a', 'b', 'c', 'd'],
          'Score': [90, 80, 95, 20]}
# creating the DataFrame
df = pd.DataFrame(scores)
# displaying the DataFrame
print(df)
Output:
  Name  Score
0    a     90
1    b     80
2    c     95
3    d     20
Now let us export this DataFrame to a CSV file named your_name.csv:
Python3
# converting to CSV file
df.to_csv("your_name.csv")
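Output (contents of your_name.csv):
,Name,Score
0,a,90
1,b,80
2,c,95
3,d,20
The first, unnamed column is the DataFrame index.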
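To confirm that the export worked, the file can be read back with pd.read_csv(). The sketch below is a minimal round trip under two assumptions that are not part of the original example: it writes into a temporary directory so that nothing is left on disk, and it passes index_col=0 so that the first, unnamed column is treated as the index again.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c', 'd'],
                   'Score': [90, 80, 95, 20]})

# Assumption: a temporary directory is used only to keep the example tidy.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'your_name.csv')
    df.to_csv(path)
    # index_col=0 turns the first CSV column back into the index
    restored = pd.read_csv(path, index_col=0)

print(restored)
```

Without index_col=0, the saved index would come back as an extra data column, typically labeled "Unnamed: 0".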
If you get a UnicodeEncodeError, pass the encoding parameter explicitly with the value 'utf-8'.
Python3
# converting to CSV file
df.to_csv("your_name.csv", encoding='utf-8')
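As a rough illustration of the encoding parameter, the sketch below writes a DataFrame that contains non-ASCII names and reads it back with the same encoding. The sample names and the temporary directory are assumptions made for the example, not part of the original data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data with non-ASCII characters
df_names = pd.DataFrame({'Name': ['José', 'Zoë'], 'Score': [90, 80]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'your_name.csv')
    df_names.to_csv(path, index=False, encoding='utf-8')
    # Read the raw bytes to see what was actually stored on disk
    with open(path, 'rb') as f:
        raw = f.read()
    # Use the same encoding when loading the file again
    restored_names = pd.read_csv(path, encoding='utf-8')

print(raw.decode('utf-8'))
print(restored_names['Name'].tolist())
```

Reading with the same encoding that was used for writing is what keeps the accented characters intact.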
Possible Customizations
1. Include index number
You can choose whether the row index is written to the file with the index parameter. The default value is True. To leave the index out, set it to False:
Python3
# converting to CSV file
df.to_csv('your_name.csv', index=False)
Output (contents of your_name.csv):
Name,Score
a,90
b,80
c,95
d,20
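A convenient way to compare such options without creating files is to call to_csv() with no path, in which case it returns the CSV text as a string. The sketch below uses that behavior to put the two index settings side by side; the variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c', 'd'],
                   'Score': [90, 80, 95, 20]})

# With no path argument, to_csv() returns the CSV text instead of writing a file
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)
print(without_index)
```

The same trick works for every customization in this section, which makes it handy for checking a setting before overwriting a real file.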
2. Export only selected columns
To export only selected columns, pass them to to_csv() as a list through the columns parameter, e.g. columns=['col1', 'col2'].
Python3
# converting to CSV file
df.to_csv("your_name.csv", columns=['Name'])
Output (contents of your_name.csv):
,Name
0,a
1,b
2,c
3,d
3. Export header
You can choose if you want your column names to be exported or not by setting the header parameter to True or False. The default value is True.
Python3
# converting to CSV file
df.to_csv('your_name.csv', header=False)
Output (contents of your_name.csv):
0,a,90
1,b,80
2,c,95
3,d,20
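Besides True and False, the header parameter also accepts a list of strings that are written as aliases for the column names. The sketch below shows this with two made-up aliases ('Student' and 'Marks' are assumptions for the example) and returns the CSV text as a string instead of writing a file.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c', 'd'],
                   'Score': [90, 80, 95, 20]})

# Hypothetical aliases: one entry per exported column, in column order
renamed = df.to_csv(index=False, header=['Student', 'Marks'])

# header=False drops the header row entirely
no_header = df.to_csv(index=False, header=False)

print(renamed)
print(no_header)
```

The aliases only affect the exported text; the DataFrame itself keeps its original column names.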
4. Handle NaN
If your DataFrame has NaN values, you can replace them in the output with a string of your choice through the na_rep parameter. The default value is '' (an empty string).
Python3
# converting to CSV file
df.to_csv("your_name.csv", na_rep='nothing')
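The sample DataFrame used so far has no missing values, so na_rep has no visible effect on it. The sketch below assumes a small variant of the data with one missing score (an assumption made purely for illustration) and compares the default output with na_rep='nothing'.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing score
df_missing = pd.DataFrame({'Name': ['a', 'b', 'c'],
                           'Score': [90, np.nan, 95]})

# Default: NaN is written as an empty field
default_text = df_missing.to_csv(index=False)

# Custom: NaN is written as the given string
custom_text = df_missing.to_csv(index=False, na_rep='nothing')

print(default_text)
print(custom_text)
```

Note that the Score column is stored as floating point once it contains NaN, so the remaining values are written as 90.0 and 95.0.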
5. Separate with something else
Instead of separating the values with a comma, we can use a different single-character delimiter through the sep parameter, for example a tab:
Python3
# converting to CSV file
# separated with tabs
df.to_csv("your_name.csv", sep='\t')
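Because tab characters are hard to see in plain text, the sketch below prints the first exported line with repr() and then loads the text back with the matching separator. It works on an in-memory string rather than a file, which is an assumption made to keep the example self-contained.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['a', 'b', 'c', 'd'],
                   'Score': [90, 80, 95, 20]})

# Export as tab-separated text (returned as a string, no file written)
tsv_text = df.to_csv(sep='\t', index=False)

# repr() makes the tab characters visible as \t
print(repr(tsv_text.splitlines()[0]))

# The same separator has to be given when reading the data back
restored_tsv = pd.read_csv(io.StringIO(tsv_text), sep='\t')
print(restored_tsv)
```

If the separator passed to pd.read_csv() does not match the one used for writing, each line is usually parsed as a single column.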