I started working with Deep Learning (DL) in the 2016–2017 time frame, when the framework ecosystem was much more diverse and fragmented than it is today. Theano was the gold standard at the time, Tensorflow had just been released, and DeepLearning4j was still being built. Both Theano and Tensorflow were relatively low-level frameworks, and were somewhat painful to work with, especially for newbies like me. The Keras and Lasagne libraries helped alleviate the pain somewhat by offering a higher-level API: Lasagne wrapped Theano, while Keras provided a common interface to either Theano or Tensorflow.
[Related article: Deep Learning in R with Keras]
My first DL project was Image Classification. By this time, the ImageNet problem was pretty much considered solved, with submitted ML models routinely scoring in excess of 95% accuracy. For reference, the ImageNet task was to distinguish between about 1000 types of everyday objects. My classifier, on the other hand, needed to distinguish between 8 (later 11) classes of medical images.
Using Transfer Learning to leverage the knowledge in the ImageNet models seemed like a natural fit. In my case, Transfer Learning would involve taking an existing high-performing ImageNet model, and fine-tuning it with my labeled data to fit the medical image classification task.
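The fine-tuning recipe described above can be sketched in a few lines of tf.keras. This is a minimal illustration, not the article's actual model: the backbone choice (VGG16), input size, head layers, and hyperparameters are all assumptions for the sake of the example.

```python
import tensorflow as tf

def build_finetune_model(num_classes, weights="imagenet"):
    """Transfer learning: trained ImageNet backbone + new classifier head."""
    # Load a trained ImageNet model without its original 1000-way classifier.
    base = tf.keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3))
    # Freeze the ImageNet weights so only the new head trains at first;
    # a later phase can unfreeze some layers for deeper fine-tuning.
    base.trainable = False
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# For an 8-class medical image task: model = build_finetune_model(8)
```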
The first iteration of my model ended up being a hybrid of a Caffe backend to generate vector image representations using trained ImageNet models, and a Keras Dense network to classify these vectors into one of 8 classes. Caffe is a C++ framework specialized for image tasks, and the main reason I chose it was because it provided downloadable trained ImageNet models. Later, when the Keras project provided its own downloadable ImageNet models, I built a second iteration of the model, which fine-tuned the ImageNet model and learned to classify medical images end-to-end.
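The second stage of that hybrid, the Keras Dense network over precomputed image vectors, might look roughly like the sketch below. The feature dimension and layer sizes are assumptions (Caffe's VGG-style models typically emit 4096-dimensional fc-layer features), not the author's actual code.

```python
import tensorflow as tf

FEATURE_DIM = 4096   # assumed size of the Caffe-produced image vectors
NUM_CLASSES = 8      # the 8 medical image classes from the article

# A small fully-connected classifier over the precomputed feature vectors.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(FEATURE_DIM,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # regularize the small labeled dataset
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Training this head on fixed feature vectors is fast and cheap compared to the end-to-end fine-tuning of the second iteration, which is why the hybrid was a reasonable first cut.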
I bring up this story to illustrate how easy and natural it felt to start working with Keras. For a long time, that was the only library I needed for building my models. Over this period, I have been consistently delighted by its intuitive API, its sensible default parameters, and the overall quality of tutorial material on its website.
However, if you are like me, you often have to understand and train other people's models in addition to your own. Since these models are often built by DL researchers using Pytorch and Tensorflow, I ended up learning these frameworks as well. I can attest from experience that Keras is by far the easiest to learn and use.
Today, the Deep Learning ecosystem is much more mature, so thankfully one can get by with learning fewer frameworks. While many excellent frameworks have been released over these intervening years, and are being used in specialized niches, the major ones are Keras, Tensorflow, and Pytorch. Pytorch became popular because of its eager execution model, which Tensorflow did not allow, and which Keras hid behind its cleverly-designed API. Keras has since been subsumed into Tensorflow as tf.keras, but the original Keras lives on as well, with an additional CNTK (from Microsoft) backend. For its part, Tensorflow, in its 2.x incarnation, has embraced Pytorch’s eager execution model, and made tf.keras its default API.
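To make the eager-execution point concrete: in Tensorflow 2.x, operations run immediately and return inspectable values, with no separate graph-construction and Session step as in Tensorflow 1.x. A trivial example:

```python
import tensorflow as tf

# Eager execution: tf.matmul runs immediately and returns a concrete
# tensor you can print or convert to NumPy, with no Session.run() call.
x = tf.constant([[1.0, 2.0],
                 [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())  # a plain NumPy array, available right away
```

This is the debugging-friendly style that made Pytorch popular, and that Keras's fit/predict API had always shielded its users from having to think about.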
So there has been lots of convergence, and while I recommend learning all three of the frameworks listed above, if you need to get productive quickly and don’t have a framework imposed upon you (by the project or by corporate policy), I would recommend you start with Keras. If you are proficient in one of the others, you should still learn Keras, because for a majority of tasks you are likely to be more productive with Keras than with your current framework.
Of course, the simplicity and elegance of Keras comes at a price. Most things are easy and intuitive to do in Keras, but certain things are very hard or even impossible. Some of these tasks are possible in the other lower-level frameworks, but you pay for that convenience with more verbose code and a steeper learning curve. However, with tf.keras (and to some extent with the original Keras using Tensorflow backend), you have access to the full Tensorflow substrate. In addition, the Keras team has recently been busy refining their code, which now makes it possible to do certain things in Keras that were previously thought to be impossible.
I am honored to present Keras: from soup to nuts – an example-driven tutorial at ODSC West this year, where I hope to touch upon some of these things that make Keras such a simple yet powerful addition to your DL toolbox. I hope to see you there!
About the author/ODSC West speaker: Sujit Pal is an applied data scientist at Elsevier Labs, an advanced technology group within the Reed-Elsevier Group of companies. His areas of interests include Semantic Search, Natural Language Processing, Machine Learning, and Deep Learning. At Elsevier, he has worked on several machine learning initiatives involving large image and text corpora, and other initiatives around recommendation systems and knowledge graph development. He has co-authored Deep Learning with Keras (https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-keras) and Deep Learning with Tensorflow 2.x and Keras (https://www.packtpub.com/data/deep-learning-with-tensorflow-2-0-and-keras-second-edition), and writes about technology on his blog Salmon Run (https://sujitpal.blogspot.com/).