Deep learning isn’t always the most appropriate algorithmic approach for every modeling problem. Madison May’s primary focus at Indico Solutions is giving businesses the ability to develop machine learning models despite limited training data, through a process called transfer learning.
Primary Limitations of Deep Learning
Representation learning could be a more accurate term than deep learning, given what these models are really after: representing the input in a way that suits a target task. In vision, for example, the input is fed through the network’s layers, and the resulting activations form the model’s representation of that input.
Deep learning is also training-data intensive: if you don’t have more than 10,000 examples, deep learning probably isn’t on the table at all. The same representation-learning process appears in natural language processing. A simple word2vec model, for example, learns a vector for each word by predicting words from their surrounding context, so that words used in similar contexts end up with similar vectors.
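As a rough illustration of that idea, the sketch below trains a tiny word2vec model on a toy corpus and looks up a word’s nearest neighbors. The gensim library, the corpus, and the parameters are all illustrative assumptions, not something from the talk.

```python
# Minimal word2vec sketch (illustrative; gensim, toy corpus, and parameters are assumptions).
from gensim.models import Word2Vec

# A tiny pre-tokenized corpus; real models are trained on millions of sentences.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
]

# Skip-gram (sg=1) learns a vector for each word by predicting its context words.
model = Word2Vec(sentences=corpus, vector_size=32, window=2, min_count=1, sg=1, epochs=50)

# Words that appear in similar contexts end up with similar vectors.
print(model.wv.most_similar("cat", topn=3))
```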
Deep learning isn’t always the best approach for these kinds of data sets. Extreme training requirements, long computation times and, most importantly, expense put deep learning out of reach in many contexts.
How Transfer Learning Resolves These Problems
Everyone has problems, but not everyone has data. Big data is actually less of an issue than small data. Transfer learning is the application of knowledge gained in one context to another context: reusing the parameters of an existing model can cut training time and sidestep deep learning’s data requirements when tackling “small” data problems.
For example, using deep learning to teach an algorithm to recognize a tiger from scratch is far too labor intensive for such a small task. Instead, an existing model can supply high-level concepts of the input, such as “How large? How striped? What color?”, each of which produces high activations for an image of a tiger. The relationship between the input features and the target becomes much more straightforward, requiring far less training data and compute.
The basic process goes like this:
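A minimal sketch of that process, assuming a torchvision ResNet-18 as the pre-trained source model and scikit-learn for the target model (the random tensors below simply stand in for a small labeled image set):

```python
# Transfer learning sketch: frozen pre-trained backbone + simple target model.
# Assumptions: torch, torchvision, and scikit-learn; random tensors stand in for real images.
import torch
import torchvision.models as models
from sklearn.linear_model import LogisticRegression

# 1. Take a source model pre-trained on a large dataset (ImageNet here) and drop its head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the learned representation, not the classifier
backbone.eval()

# 2. Run the small target dataset through the frozen backbone to get feature vectors.
images = torch.randn(40, 3, 224, 224)        # placeholder for ~40 labeled target images
labels = torch.randint(0, 2, (40,)).numpy()  # placeholder binary labels (e.g. tiger / not tiger)
with torch.no_grad():
    features = backbone(images).numpy()      # shape: (40, 512)

# 3. Fit a simple target model on top of those features.
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print("training accuracy:", clf.score(features, labels))
```

The heavy lifting is done by the source model, which is never retrained; only the small logistic regression on top is fit, which is why target training takes seconds rather than days.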
Transfer learning then addresses these deep learning issues in three ways:
- simpler training requirements, thanks to pre-trained models
- much smaller memory requirements
- considerably shortened target model training – seconds rather than days
Applying Transfer Learning in Practice
If you’ve watched any of the HBO show Silicon Valley, you’re familiar with the “not hotdog” app. This would be a hilarious but real application of the transfer learning process. While “not hotdog” isn’t exactly solving the world’s problems, it does raise the question of what’s possible with this type of learning.
For NLP, the process is more complicated. Visual inputs tend to be more concrete: the app needs only a few hundred hotdog/not hotdog examples to discern between the two categories. Language models, by contrast, face far more variety in the data and far more variance in terminology. The components that make up vision features are fairly generic, whereas a model trained on finance text won’t transfer as well to biomedical tasks. Overall, there’s a lack of agreement on what even constitutes a good source model.
Here’s how you can see the benefits in your own work:
- Align vocabulary in the source model with that of your target model. This is the most critical variable.
- Pull source tasks that have good general representations such as natural language inference, machine translation, or multitask learning.
- Keep target models simple. Allow logistic regression to do its job (see the sketch after this list).
- At the same time, be careful about undersampling, because classic machine learning problems (for example, class imbalance) are highly exaggerated in small training data sets.
- Consider second-order optimization methods. The target models in transfer learning are much simpler, so methods that fall short for deep learning are a great fit here.
- Measure variance in model performance across runs. High variance generally corresponds to poor generalization.
- Don’t forget feature engineering. Applying transformations can make the mapping between the input data and the target task a lot more intuitive. Pairing domain experts with deep learning experts, for example, can produce more specific results.
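As a concrete illustration of the “keep it simple, watch the variance” advice above, here is a minimal scikit-learn sketch; the synthetic features stand in for pre-trained representations, and every parameter is an illustrative assumption rather than a recommendation from the talk.

```python
# Simple target model + variance measurement on a small, imbalanced dataset.
# Assumptions: scikit-learn; synthetic features stand in for pre-trained representations.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Pretend these 300 rows are pre-trained features for a small, imbalanced target dataset.
X, y = make_classification(n_samples=300, n_features=256, n_informative=32,
                           weights=[0.85, 0.15], random_state=0)

# A simple target model; class_weight guards against the exaggerated class-imbalance
# problems that show up in small training sets.
clf = LogisticRegression(max_iter=1000, class_weight="balanced")

# Repeated stratified cross-validation exposes performance variance; high variance
# usually signals poor generalization.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv, scoring="f1")
print(f"F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```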
Monitoring Performance Benchmarks
May and his team built their own framework, Enso, to evaluate these transfer learning approaches. They built it to prevent human overfitting, to ensure higher-fidelity baselines, and to benchmark on many datasets so they can understand where a particular approach is effective.
The workflow simplifies these benchmarks so that you can quickly plug in data and replicate results across many different data sets. With fewer than 1,000 examples, these transfer learning approaches are the ideal way to train. Enso’s full documentation is available online, and the project is open to contributions.
Research For Understanding the Current State of Transfer Learning
It’s illuminating to read some of the research that backs up these kinds of task models, even if most of it is applied to datasets with more than 1,000 examples. The studies shed light on how engineers are carrying these techniques from the concrete world of vision into the less generalized world of NLP.
- Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning – Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J Pal
- Universal Language Model Fine-tuning for Text Classification – Jeremy Howard, Sebastian Ruder
- Deep Contextualized Word Representations – Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
Moving Forward With Transfer Learning
The most significant takeaway from this research is how much simplicity helps illuminate solutions. Small-data problems are often harder to pin down, and transfer learning can take the place of deep learning to reduce the time and effort spent solving them. Because businesses and organizations no longer have to expend massive amounts of time and resources on small-scale issues, transfer learning could open the door to previously untouched data.