Colab (short for Colaboratory) is Google’s free, cloud-based platform for writing and running Python code. It is built on Jupyter Notebook and lets us train machine learning models directly in the cloud at no cost. Google Colab does whatever your Jupyter Notebook does and a bit more: it also offers free access to GPUs and TPUs. Other advantages include no local installation and real-time sharing of notebooks between users.
However, loading a CSV file into Colab requires a few extra lines of code. In this article, we discuss three different ways to load a CSV file and store it in a pandas dataframe. To get started, sign in to your Google Account, go to “https://colab.research.google.com”, and click on “New Notebook”.
Ways to import CSV
Load data from local drive
To upload a file from your local drive, write the following code in a cell and run it:
Python3
from google.colab import files

uploaded = files.upload()
Running the cell displays a file-upload prompt.
Click on “Choose Files”, then select the CSV file on your local drive to upload it. Then use the following code snippet to import it into a pandas dataframe, replacing 'file.csv' with the name of the file you uploaded.
Python3
import io

import pandas as pd

# `uploaded` maps each uploaded file name to its raw bytes
df = pd.read_csv(io.BytesIO(uploaded['file.csv']))
print(df)
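The key in `uploaded` is the name of whichever file was picked, so a hardcoded 'file.csv' fails for any other name. The following is a minimal sketch of one way around that, assuming `uploaded` is a dict that maps file names to raw bytes. The helper name `load_first_csv` and the sample data are hypothetical, and the dict is simulated here so that the snippet also runs outside Colab.

```python
import io

import pandas as pd


def load_first_csv(uploaded):
    """Return (filename, DataFrame) for the first .csv entry in `uploaded`.

    `uploaded` is assumed to map file names to raw bytes.
    """
    for name, content in uploaded.items():
        if name.lower().endswith(".csv"):
            return name, pd.read_csv(io.BytesIO(content))
    raise ValueError("no CSV file found among the uploaded files")


# Simulated upload result (hypothetical data); in Colab, use
# `uploaded = files.upload()` instead.
uploaded = {"scores.csv": b"name,score\nAda,90\nLinus,85\n"}
name, df = load_first_csv(uploaded)
print(name, df.shape)
```

In a notebook, passing the real `uploaded` value to the helper avoids editing the file name by hand after every upload.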
From GitHub
This is the easiest way to load a CSV file in Colab. Go to the dataset in your GitHub repository and click on “Raw” (shown as “View raw” for larger files). Copy the link to the raw dataset and pass it as a parameter to read_csv() in pandas to get the dataframe.
Python3
import pandas as pd

url = 'copied_raw_github_link'
df = pd.read_csv(url)
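If you only have the address of the regular file page rather than the raw link, the raw link can usually be derived from it. The sketch below assumes a public repository and the common page layout `github.com/<user>/<repo>/blob/<branch>/<path>`; the helper name `github_raw_url` and the repository shown are hypothetical.

```python
from urllib.parse import urlsplit


def github_raw_url(blob_url):
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <user>/<repo>/blob/<branch>/<path to file>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub file page URL: {blob_url}")
    user, repo, _, *rest = segments
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, *rest])


# Hypothetical repository and file, for illustration only.
page = "https://github.com/example-user/example-repo/blob/main/data/iris.csv"
print(github_raw_url(page))
```

The returned string can then be passed to read_csv() in the same way as a link copied by hand.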
From your Google Drive
We can import datasets that are stored on our Google Drive in two ways:
1. Using PyDrive
This is the most complex of the methods for importing datasets. We first need to install the PyDrive library with the Python package installer (pip) and then authenticate, as follows.
Python3
!pip install -U -q PyDrive

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate and create the PyDrive client.
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
Click on the link that appears in order to authorize access to your Drive. You will see a screen with “Google Cloud SDK wants to access your Google Account” at the top. After you allow permission, copy the given verification code and paste it into the box in Colab.
Now, go to the CSV file in your Drive, get its shareable link, and store it in a string variable named link in Colab. Then run the following code to load the file into a dataframe.
Python3
import pandas as pd

# The file ID is the second-to-last segment of the shareable link
file_id = link.split("/")[-2]
downloaded = drive.CreateFile({'id': file_id})
# Save the Drive file to the Colab runtime, then read it
downloaded.GetContentFile('xclara.csv')
df = pd.read_csv('xclara.csv')
print(df)
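Splitting on "/" works only when the shareable link has the form `https://drive.google.com/file/d/<ID>/view?...`. A slightly more tolerant sketch is shown below; it also accepts the older `open?id=<ID>` form. The helper name `drive_file_id` and the ID in the example are hypothetical, and other link formats may exist that it does not cover.

```python
import re
from urllib.parse import parse_qs, urlsplit


def drive_file_id(link):
    """Extract the file ID from a Google Drive share link."""
    # Form 1: https://drive.google.com/file/d/<ID>/view?usp=sharing
    match = re.search(r"/d/([A-Za-z0-9_-]+)", link)
    if match:
        return match.group(1)
    # Form 2: https://drive.google.com/open?id=<ID>
    ids = parse_qs(urlsplit(link).query).get("id")
    if ids:
        return ids[0]
    raise ValueError(f"could not find a file ID in: {link}")


# Hypothetical ID, for illustration only.
link = "https://drive.google.com/file/d/1AbC_dEf-123/view?usp=sharing"
print(drive_file_id(link))
```

The result can be passed to drive.CreateFile({'id': ...}) in place of the value obtained by splitting the link.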
2. Mounting the drive
This method is simpler and cleaner than the previous one.
- Create a folder in your Google Drive.
- Upload the CSV file to this folder.
- Write the following code in your Colab Notebook:
from google.colab import drive

drive.mount('/content/drive')
Just like with the previous method, this command brings you to a Google authentication step; complete the verification as before. Then open the Files panel in the left sidebar of the notebook, browse to your CSV file under the mounted drive folder, right-click it, and copy its path. Store the path in a variable in your notebook and read the file using read_csv().
path = "copied path"
df_bonus = pd.read_csv(path)
Now, to read the file, run the following code.
Python3
import pandas as pd

df = pd.read_csv("file_path")
print(df)
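Instead of copying the path by hand, it can also be built in code. The sketch below assumes the Drive is mounted at '/content/drive' and that the Drive root appears there as a folder named 'MyDrive'; the helper name `drive_csv_path` and the folder and file names are hypothetical.

```python
from pathlib import Path


def drive_csv_path(*parts, mount_point="/content/drive"):
    """Build the path of a file inside the mounted Drive.

    Assumes the Drive root appears under '<mount_point>/MyDrive'.
    """
    return Path(mount_point, "MyDrive", *parts)


# Hypothetical folder and file names, for illustration only.
path = drive_csv_path("datasets", "file.csv")
print(path)
# True only once the Drive is mounted and the file exists at this path
print(path.exists())
```

Checking path.exists() before calling read_csv() makes a wrong folder name or an unmounted Drive easier to spot than a FileNotFoundError raised from inside pandas.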