Saturday, November 16, 2024

10 Colab Tips and Hacks for Efficient use of it

This article was published as a part of the Data Science Blogathon

Introduction

Colaboratory, or “Colab” for short, is a hosted Jupyter Notebook service from Google that lets you write and execute Python code in your browser. It is easy to use and is linked to your Google account. Colab provides free access to GPUs and TPUs, requires zero configuration, and makes it easy to share your code with the community.

Colab has a fascinating history. It began as an internal tool for data analysis at Google. It was later released publicly, and since then many people have used it for their machine learning tasks. Many students and other users who do not have a GPU rely on Colab's free resources to run their data science experiments.

This article collects some helpful tips and hacks that I use to make my work in Colab easier. I have tried to credit the sources where I first read them. These tips should help you get the most out of your Colab notebooks.

Using local runtimes

Typically, Colab gives you free GPU resources. However, if you already have your own GPUs and still want to use the Colab UI, there is a workaround. You can use the Colab UI with a local runtime as follows:

[Image: Using local runtimes. Image by Author]

You can use this method to run code on your local hardware and access your local file system without leaving the Colab notebook. The Colab documentation on local runtimes goes deeper into how it works.

Scratchpad

Do you end up creating multiple notebooks with names like “untitled.ipynb” and “untitled1.ipynb”? I think many of us are in the same boat. If so, the Cloud scratchpad notebook might be for you.

[Image: Scratchpad. Image by Author]

The Cloud scratchpad is a special notebook, available at https://colab.research.google.com/notebooks/empty.ipynb, that is not automatically saved to your Drive account. It is helpful for experimentation or throwaway work and doesn’t take up space in Google Drive.

Get Notified about completed cell executions

Colab can notify you when a cell finishes executing, even if you have switched to another tab, window, or application. Enable it through Tools > Settings > Site > Show desktop notifications (and allow browser notifications when prompted).

Here is how the notification appears even if you are on another tab.

[Image: Get notified about completed cell executions. Image by Author]

Here is a demo of the notification appearing after you navigate to another tab.

[Image: Notification demo. Image by Author]

Open GitHub Jupyter Notebooks directly in Colab

Colab notebooks are designed to integrate easily with GitHub, which means you can load and save Colab notebooks to GitHub directly. There is also an easy shortcut for opening them, thanks to Seungjae Ryan Lee.

When you are viewing a notebook on GitHub that you want to open in Colab, replace github with githubtocolab in the URL, leaving everything else unchanged.

[Image: Opening a GitHub Jupyter notebook directly in Colab. Image by Author]
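
The URL swap described above can also be done programmatically. The sketch below is only an illustration: the function name github_to_colab and the example notebook URL are hypothetical, and it assumes the notebook lives under the github.com host.

```python
from urllib.parse import urlsplit, urlunsplit


def github_to_colab(url: str) -> str:
    """Swap the github.com host for githubtocolab.com, keeping the rest of the URL."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    if host != "github.com":
        raise ValueError(f"not a github.com URL: {url}")
    # Only the host changes; path, query and fragment are left untouched.
    return urlunsplit(parts._replace(netloc="githubtocolab.com"))


# Hypothetical notebook URL, used only for illustration
print(github_to_colab("https://github.com/user/repo/blob/main/demo.ipynb"))
```

Rejecting other hosts up front avoids silently producing a link that cannot work.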

Kaggle Datasets into Google Colab

If you are on a tight budget and have used up your GPU quota on Kaggle, this hack might come as a relief. You can download any dataset from Kaggle straight into your Colab workspace. Here is what you should do:

[Image: Kaggle datasets into Google Colab. Image by Author]

After you click the ‘Create New API Token’ button, a kaggle.json file containing your API token is generated. Create a folder named Kaggle in your Google Drive and store the kaggle.json file in it.

[Image: Create New API Token. Image by Author]

Mount your Google Drive in the Colab notebook
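
As a minimal sketch of the mounting step, the snippet below uses the google.colab drive helper. The wrapper function mount_drive is hypothetical; it is written so that the same cell does nothing, instead of failing, when it is run outside Colab.

```python
def mount_drive(mount_point: str = "/content/drive") -> bool:
    """Mount Google Drive when running inside Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)  # asks for authorization on first use
    return True


mounted = mount_drive()
print("Drive mounted" if mounted else "Not running in Colab, skipping mount")
```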

[Image: Mounting Drive in a Colab notebook. Image by Author]

Point the Kaggle config path at the folder that contains kaggle.json, and change the current working directory to that folder:

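import os

# Tell the Kaggle CLI which folder holds kaggle.json
os.environ['KAGGLE_CONFIG_DIR'] = "/content/drive/MyDrive/Kaggle"
# Work inside the same Drive folder so downloads land there
%cd /content/drive/MyDrive/Kaggle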
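
Before downloading anything, it can help to confirm that the token file is where the Kaggle CLI will look for it. The following is a small sketch under stated assumptions: the helper kaggle_config_ok is hypothetical, and it assumes kaggle.json is a JSON object with "username" and "key" fields.

```python
import json
import os


def kaggle_config_ok(config_dir: str) -> bool:
    """Return True if kaggle.json exists in config_dir and has both credential fields."""
    path = os.path.join(config_dir, "kaggle.json")
    if not os.path.isfile(path):
        return False
    try:
        with open(path) as f:
            creds = json.load(f)
    except (OSError, json.JSONDecodeError):
        return False
    # Both fields must be present and non-empty.
    return isinstance(creds, dict) and bool(creds.get("username")) and bool(creds.get("key"))


# Falls back to the Drive folder used in this article if the variable is unset
config_dir = os.environ.get("KAGGLE_CONFIG_DIR", "/content/drive/MyDrive/Kaggle")
print("kaggle.json looks usable:", kaggle_config_ok(config_dir))
```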

Copy the API command for the dataset you want to download

[Image: Copying the API command of a dataset. Image by Author]

For datasets linked to competitions, the API command is found under the ‘Data’ tab.

[Image: Kaggle. Image by Author]

Finally, run one of the following commands to download the dataset:

!kaggle datasets download -d alexanderbader/forbes-billionaires-2021-30
!kaggle competitions download -c google-smartphone-decimeter-challenge
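
The download usually arrives as a .zip archive in the working directory, so one more step is needed before the files can be read. The sketch below assumes that behaviour and assumes the archive is named after the dataset slug; the helper extract_archive and the data target folder are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str) -> list[str]:
    """Extract zip_path into dest_dir and return the names of the archive members."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


# Assumed archive name: the dataset slug from the command above plus ".zip"
archive_name = "forbes-billionaires-2021-30.zip"
if Path(archive_name).exists():
    print(extract_archive(archive_name, "data"))
```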

[Image: Kaggle datasets download. Image by Author]

Search for your notebooks in drive

Want to search for a specific Colab notebook in Drive? Go to the Drive search box and enter:

 application/vnd.google.colaboratory

This lists all the notebooks in your Google Drive. In addition, you can specify the title and ownership of the notebook. For example, if I need to find a notebook created long ago with ‘Transfer’ in its title, the following helps:

[Image: Searching for notebooks in Drive. Image by Author]

Data Table extension

Colab includes an extension that renders pandas dataframes as interactive displays that can be dynamically sorted, filtered, and explored. Run the code below in a notebook cell to enable the data table display for pandas dataframes.

# To enable the data table display
%load_ext google.colab.data_table

# To disable the display
%unload_ext google.colab.data_table

Here’s a quick demo of it:

[Image: Data Table extension. Image by Author]

Comparing Notebooks

Colab makes it easy to compare two notebooks. Use View > Diff notebooks from the Colab menu, or navigate to https://colab.research.google.com/diff and paste the URLs of the two notebooks into the input boxes to see the differences.

[Image: Comparing notebooks. Image by Author]

Stop Colab from Disconnecting

Disconnected due to idleness:

This is a significant disadvantage of Google Colab, and I’m sure many of you have experienced it at least once. You decide to take a break, but when you return, your notebook is disconnected!

In fact, if we leave the notebook idle for more than 30 minutes, Google Colab automatically disconnects it.

Open Chrome DevTools by pressing F12 on Windows or Ctrl+Shift+I on Linux, and then paste the following JavaScript code into the console:

function KeepClicking() {
  console.log("Clicking");
  document.querySelector("colab-connect-button").click();
}
setInterval(KeepClicking, 60000);

Every 60 seconds, this function clicks the connect button. As a result, Colab believes that the notebook is not idle, and you no longer need to worry about being disconnected!

Disconnection while a task is running:

To begin, keep in mind that when you connect to a GPU, you are only allowed to use the Cloud Machine for a maximum of 12 hours at a time.

You may still be disconnected at some point during these 12 hours. According to the Colab FAQ, “Colaboratory is meant for interactive use,” and long-running background computations, particularly on GPUs, may be terminated.

Use TensorBoard with Colab

TensorBoard is a tool for displaying metrics and visualizations throughout a Deep Learning workflow. It is immediately usable within Colab.

Load the TensorBoard notebook extension first:

%load_ext tensorboard

Once your model is set up and writing logs to the logs directory, launch TensorBoard within the notebook by running:

%tensorboard --logdir logs

[Image: Using TensorBoard with Colab. Image by Author]
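
TensorBoard only has something to show once training has written event files under the directory passed to --logdir. The sketch below shows one way to prepare a per-run folder under logs; the helper make_log_dir is hypothetical, and the commented Keras lines assume TensorFlow is installed and that model, x_train, and y_train already exist.

```python
import os
from datetime import datetime


def make_log_dir(root: str = "logs") -> str:
    """Create and return a timestamped run directory under root."""
    run_dir = os.path.join(root, datetime.now().strftime("%Y%m%d-%H%M%S"))
    os.makedirs(run_dir, exist_ok=True)
    return run_dir


log_dir = make_log_dir()
print("Write training logs to:", log_dir)

# With TensorFlow/Keras installed, the directory could be passed to the
# TensorBoard callback (model, x_train and y_train are placeholders):
#   import tensorflow as tf
#   tb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
#   model.fit(x_train, y_train, epochs=5, callbacks=[tb])
```

Using one subfolder per run keeps separate experiments apart when they are all viewed under the same logs root.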

Conclusion

These are a few tricks that I have found very useful, particularly when training ML models on GPUs. Even though Colab notebooks can only run for a maximum of 12 hours, with the hacks shared above you should be able to get the most out of your session.

I hope you have found this article useful. Have a wonderful day, and thank you.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Pavan Kalyan

01 Jul 2021

I am Pavan Kalyan, with 2 years of experience developing and deploying scalable machine learning models, and an interest in exploring data and discovering useful insights. I like to participate in MachineHack and Kaggle competitions.
