Thursday, December 26, 2024

Ways to import CSV files in Google Colab

Colab (short for Colaboratory) is Google’s free, cloud-hosted Jupyter Notebook service for writing and running Python code. It lets us train machine learning models directly in the cloud at no cost. Google Colab does whatever a local Jupyter Notebook does and a bit more: for example, you can use GPUs and TPUs for free. Other advantages include not having to install anything locally and real-time sharing of notebooks between users.
However, loading a CSV file takes a few extra lines of code. In this article, we will discuss three different ways to load a CSV file and store it in a pandas dataframe: from the local drive, from GitHub, and from Google Drive. To get started, sign in to your Google Account, go to “https://colab.research.google.com” and click on “New Notebook”.
 

Ways to import CSV

Load data from local drive 

To upload a file from the local drive, write the following code in a cell and run it.
 

Python3




from google.colab import files
 
 
uploaded = files.upload()


 
 

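Running the cell displays a “Choose Files” button below it. Click it, then select the CSV file on your local drive to upload it. files.upload() returns a dictionary that maps each uploaded file name to its contents as bytes, so the next snippet wraps those bytes in io.BytesIO and reads them into a pandas dataframe.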

 

Python3




import io

import pandas as pd

# 'file.csv' must match the name of the file you uploaded
df = pd.read_csv(io.BytesIO(uploaded['file.csv']))
print(df)


 
 

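Because files.upload() returns a dictionary keyed by file name, the hard-coded 'file.csv' key can be avoided by looping over whatever was uploaded. The following is a minimal sketch rather than part of the original walkthrough: the helper name frames_from_upload is hypothetical, and fake_upload is a stand-in with made-up contents so the code can also run outside Colab. In a notebook you would pass the real result of files.upload() instead.

```python
import io

import pandas as pd


def frames_from_upload(uploaded):
    """Build one dataframe per uploaded CSV from a {filename: bytes} dict."""
    frames = {}
    for name, content in uploaded.items():
        # Skip anything that is not a CSV file
        if name.lower().endswith(".csv"):
            frames[name] = pd.read_csv(io.BytesIO(content))
    return frames


# Stand-in for the dict returned by files.upload() (hypothetical name and data).
fake_upload = {"file.csv": b"x,y\n1,2\n3,4\n"}

frames = frames_from_upload(fake_upload)
df = frames["file.csv"]
print(df.shape)  # (2, 2)

# In Colab:
# from google.colab import files
# frames = frames_from_upload(files.upload())
```

The keys of the returned dictionary are the names Colab reports after the upload, so printing frames.keys() is a quick way to check which name to use.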

 

From GitHub

 

This is the easiest way to load a CSV file into Colab. Go to the dataset in your GitHub repository and click on “View Raw”. Copy the link to the raw file (it starts with https://raw.githubusercontent.com/) and pass it as a parameter to read_csv() in pandas to get the dataframe.
 

 

Python3




import pandas as pd
url = 'copied_raw_github_link'  # raw link copied from GitHub
df = pd.read_csv(url)


 
 

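If you have the address of the file's regular GitHub page rather than its raw link, the raw link can usually be derived from it. Here is a minimal sketch, assuming the common https://github.com/<user>/<repo>/blob/<branch>/<path> layout. The helper to_raw_github_url and the repository names in the example are hypothetical.

```python
from urllib.parse import urlparse


def to_raw_github_url(url):
    """Convert a github.com 'blob' page URL to its raw.githubusercontent.com form."""
    parts = urlparse(url)
    if parts.netloc == "raw.githubusercontent.com":
        return url  # already a raw link
    segments = parts.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file URL: {url}")
    user, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


# Hypothetical repository and file, for illustration only.
page_url = "https://github.com/someuser/somerepo/blob/main/data/file.csv"
raw_url = to_raw_github_url(page_url)
print(raw_url)

# The result can then be passed to pandas:
# df = pd.read_csv(raw_url)
```

This only works for files in public repositories, since read_csv() fetches the link without any GitHub credentials.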

 

 

From your Google Drive

 

We can import datasets stored in our Google Drive in two ways:
1. Using PyDrive
This is the most involved of the methods covered here. First install the PyDrive library with pip, then authenticate and create a Drive client by running the following.

 

Python3




!pip install -U -q PyDrive
 
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
 
 
# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)


 
 


 

 

Click the link shown in the output to authorize access to your Drive. You will see a screen with “Google Cloud SDK wants to access your Google Account” at the top. After you grant permission, copy the given verification code and paste it into the box in Colab.
Next, open the CSV file in your Drive, get its shareable link, and store it in a string variable named link in Colab. The link has the form https://drive.google.com/file/d/<file id>/view?usp=sharing, so the file ID is the second-to-last segment when the link is split on “/”. To load the file into a dataframe, run the following code.

 

Python3




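import pandas as pd

# link holds the shareable link copied from Drive, e.g.
# https://drive.google.com/file/d/<file id>/view?usp=sharing
file_id = link.split("/")[-2]

# Download the file into the Colab working directory
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('xclara.csv')

df = pd.read_csv('xclara.csv')
print(df)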


 
 

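Taking the second-to-last piece of the link works for the usual .../file/d/<file id>/view?usp=sharing form, but it returns the wrong piece if the link has a different shape. Here is a slightly more defensive sketch, assuming only the two link shapes shown in the comments. The helper drive_file_id is hypothetical and the file ID is made up.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(link):
    """Extract the file ID from a Google Drive shareable link."""
    parts = urlparse(link)
    segments = parts.path.strip("/").split("/")
    # Shape 1: https://drive.google.com/file/d/<id>/view?usp=sharing
    if "d" in segments:
        idx = segments.index("d")
        if idx + 1 < len(segments):
            return segments[idx + 1]
    # Shape 2: https://drive.google.com/open?id=<id>
    ids = parse_qs(parts.query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"could not find a file ID in: {link}")


# Made-up ID, for illustration only.
view_link = "https://drive.google.com/file/d/FILE_ID_123/view?usp=sharing"
open_link = "https://drive.google.com/open?id=FILE_ID_123"

print(drive_file_id(view_link))
print(drive_file_id(open_link))
```

The returned string can be passed to drive.CreateFile({'id': ...}) in place of the value obtained with split().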

 

 

2. Mounting the drive 
This method is simpler and cleaner than the previous one.

 

  • Create a folder in your Google Drive.
  • Upload the CSV file to this folder.
  • Write the following code in your Colab Notebook:
     
from google.colab import drive

drive.mount('/content/drive')

 

Just like the previous method, this command takes you through a Google authentication step; complete the verification as before. Once the drive is mounted, its contents appear under /content/drive (typically in /content/drive/MyDrive). In the notebook, open the File menu at the top-left, click on “Locate in Drive”, and find your data. Then copy the path of the CSV file into a variable in your notebook and read the file using read_csv().

 

path = "copied path"
df_bonus = pd.read_csv(path)

 

Now, to read the file, run the following code, replacing file_path with the path you copied.

 

Python3




import pandas as pd
 
df = pd.read_csv("file_path")
print(df)


 
 

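Instead of browsing for the file by hand, you can list the CSV files under the mounted folder and copy the path from there. Here is a minimal sketch using only the standard library. The helper find_csv_files is hypothetical, and /content/drive/MyDrive is assumed to be where Colab exposes your Drive after mounting. The demo runs against a temporary folder so it also works outside Colab.

```python
import tempfile
from pathlib import Path


def find_csv_files(root):
    """Return all .csv files below root, sorted by path."""
    return sorted(p for p in Path(root).rglob("*.csv") if p.is_file())


# Demo on a temporary folder standing in for /content/drive/MyDrive.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "datasets"
    folder.mkdir()
    (folder / "file.csv").write_text("x,y\n1,2\n")
    (folder / "notes.txt").write_text("not a csv")
    found = [p.relative_to(tmp).as_posix() for p in find_csv_files(tmp)]

print(found)  # ['datasets/file.csv']

# In Colab, after drive.mount('/content/drive'):
# paths = find_csv_files('/content/drive/MyDrive')
# df = pd.read_csv(paths[0])
```

Searching a large Drive recursively can be slow, so pointing the helper at the specific folder you created for the dataset is usually the better choice.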

 

 
