
The Lottery Ticket Hypothesis

The Lottery Ticket Hypothesis was presented as a research paper at ICLR 2019 by Jonathan Frankle and Michael Carbin of MIT, where it received a Best Paper Award.

Background: Network Pruning
Pruning means reducing the size of a neural network by removing superfluous and unwanted parts. Network pruning is a commonly used practice for reducing the size, storage footprint, and computational cost of a neural network, for example so that an entire network can fit on your phone. The idea of network pruning originated in the 1990s and was popularized again in 2015.
 
How do you “prune” a neural network?
We can summarize the process of pruning in four major steps:

  1. Train the Network
  2. Remove superfluous structures
  3. Fine-tune the network
  4. Optionally: repeat steps 2 and 3 iteratively

But before we move further ahead, you should know:

  • Pruning is usually done after a neural network has been trained on data.
  • The superfluous structures can be weights, neurons, filters, or channels. Here we consider “sparse pruning”, which means pruning individual weights.
  • A heuristic is needed to decide whether a structure is superfluous. Common heuristics are magnitudes, gradients, and activations. Here we choose magnitudes: we prune the weights with the lowest magnitudes (a minimal sketch follows this list).
  • By removing parts of the neural network, we have somewhat damaged the function it computes, so we train the model a bit more. This is known as fine-tuning.
  • If the steps are followed correctly, the parameters of networks like LeNet-300-100 and AlexNet can be compressed by a factor of 9x to 12x without losing any accuracy.
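
The magnitude heuristic can be illustrated with a short PyTorch-style sketch. This is a minimal, illustrative example rather than code from the paper: the helper name magnitude_prune_mask and the 20% pruning fraction are assumptions.

```python
# Minimal sketch of one round of magnitude-based ("sparse") weight pruning
# for a single layer. The helper name and pruning fraction are illustrative.
import torch

def magnitude_prune_mask(weight: torch.Tensor, prune_fraction: float) -> torch.Tensor:
    """Return a 0/1 mask that zeroes out the lowest-magnitude weights."""
    num_to_prune = int(prune_fraction * weight.numel())
    if num_to_prune == 0:
        return torch.ones_like(weight)
    # Magnitude of the k-th smallest weight; everything at or below it is pruned.
    threshold = weight.abs().flatten().kthvalue(num_to_prune).values
    return (weight.abs() > threshold).float()

# Example: prune 20% of a layer's weights; fine-tuning would follow.
layer = torch.nn.Linear(300, 100)
mask = magnitude_prune_mask(layer.weight.data, prune_fraction=0.2)
with torch.no_grad():
    layer.weight.mul_(mask)  # pruned weights are set to zero
```

During fine-tuning, the mask is typically re-applied (or the gradients of pruned weights zeroed) so that pruned connections stay removed.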
       
Can’t we randomly initialize a pruned network and train it to convergence?

Many researchers have pondered this question, and they all arrived at the same answer: no. It turns out that training a pruned model from scratch performs worse than retraining a pruned model, which may indicate the difficulty of training a network with small capacity.

However, this is no longer the case. The ICLR 2019 paper shows that we can indeed train pruned networks from scratch, and that networks do not need to be overparameterized to learn. The weights pruned after training could have been pruned before training, provided you reuse the same initializations.
       
How do you train pruned networks?

  1. Randomly initialize the full network
  2. Train it and prune superfluous structure
  3. Reset each remaining weight to its value from Step 1 (the original random initialization) and retrain the resulting subnetwork.

This suggests that “a randomly-initialized, dense neural network contains a subnetwork that, when trained in isolation, can match or even outperform the accuracy of the original network.”
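
The three-step procedure above can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' reference implementation: it reuses the magnitude_prune_mask helper from the earlier sketch and assumes a hypothetical train(model, masks) routine that keeps masked weights at zero (for example by re-applying the masks after every optimizer step).

```python
# Minimal sketch of the lottery-ticket procedure: initialize, train, prune,
# then rewind the surviving weights to their original initial values.
import copy
import torch

# Step 1: randomly initialize the full network (architecture is illustrative).
model = torch.nn.Sequential(
    torch.nn.Linear(784, 300), torch.nn.ReLU(),
    torch.nn.Linear(300, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 10),
)
initial_state = copy.deepcopy(model.state_dict())  # remember the random init

# Step 2: train the full network, then prune the lowest-magnitude weights.
train(model, masks=None)  # hypothetical training routine (assumption)
masks = {
    name: magnitude_prune_mask(param.data, prune_fraction=0.2)
    for name, param in model.named_parameters()
    if "weight" in name
}

# Step 3: reset every remaining weight to its value from Step 1 and retrain
# the resulting subnetwork (the "winning ticket") in isolation.
model.load_state_dict(initial_state)
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])  # pruned weights stay at zero
train(model, masks=masks)
```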
 
Advantages of Trained Pruned Networks

  • A fully-connected network trained on MNIST with more than 600K parameters can supposedly be reduced to a subnetwork of about 21K parameters that achieves the same accuracy as the original network.
  • Retention of the original training setup – dropout, weight decay, batch normalization, residual connections, your favourite optimizer, etc. all still apply.

 
Further Scope of Research

  • Subnetworks are found retroactively, i.e., only after the full network has been trained.
  • Finding subnetworks is very expensive, since it requires repeatedly training and pruning the full network.
  • The results so far cover only small vision networks and tasks.

 
Link to the research paper: The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks

