As a consultant in data science and machine learning, and also a tech journalist, I’m in a position to recognize current trends in the industry. One of the latest crazes centers around “automated machine learning” or AutoML as many call it. In fact, I’ve written a couple of articles in the past year about the growing interest in AutoML:
[Related Article: Automated Machine Learning: Myth Versus Reality.]
My familiarity with this class of solution is what drew me to a recent ODSC West 2018 talk by Randal S. Olson, PhD, titled “The Past, Present, and Future of Automated Machine Learning.” Olson works as Lead Data Scientist for Life Epigenetics, Inc. He has also spent a number of years as a researcher in the AutoML space, so he’s well qualified to provide important insights into this area of technology. The slides for Olson’s presentation can be found HERE.
An idea that’s been around since the 1990s, AutoML has been described as a “quiet revolution in AI” that is poised to dramatically change the data science landscape by automating a large portion of the machine learning process. Academic researchers, startups, and tech giants alike have begun developing AutoML methods and tools ranging from simple open source prototypes to industry-scale software products. Yet beyond all the hype and vague tech jargon, many are left wondering: What is AutoML, really? In this talk, Olson draws from his AutoML research experience to discuss the benefits of AutoML and highlight some promising future directions of the field.
Earlier this year, at the company’s annual H2O World conference, a representative from H2O.ai explained the need for AutoML solutions to me this way: AutoML allows data scientists to extend their productivity without adding more members to the organization’s data science team. In effect, AutoML addresses the widening gap between the demand for data science talent and the availability of that talent. This is why so many managers of data science teams are motivated to evaluate these products, and why the products are gaining in popularity.
As depicted in the figure below, AutoML addresses the steps in the ML workflow enclosed in the box.
Olson reported on the results of an independent study he carried out to see how well various ML algorithms perform with just their default parameters. He applied 13 different classification algorithms from scikit-learn to 165 classification data sets from a variety of sources. He then compared the classification accuracy of each algorithm using its default parameters against a tuned version of the same algorithm on the same data sets, to measure how much improvement tuning provides. In the plot below from this study, the y-axis lists the ML algorithms and the x-axis shows the distribution of performance improvements. Some algorithms gain little from tuning, while others improve greatly. On average, tuning yields about a 5-10% improvement in classification accuracy over the default parameters. This suggests that no single parameter combination works well for all problems, so tuning is needed to get the most out of an algorithm, and this feature is built into all AutoML solutions. AutoML also selects the best model for the problem at hand, and can even handle “some” of the typical data cleansing and prep work required for most projects.
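To make the default-versus-tuned comparison concrete, here is a minimal sketch in plain Python. It is not Olson's benchmark code: the k-nearest-neighbors classifier, the synthetic two-blob data set, the candidate grid, and all function names are assumptions chosen for illustration, and the "default" of 5 neighbors simply mirrors the `n_neighbors=5` default of scikit-learn's `KNeighborsClassifier`. The idea is only to show the mechanics: score the default setting, score a grid of alternatives, and keep the best.

```python
import random


def make_data(n, seed):
    """Synthetic two-class data: two noisy 2-D blobs (hypothetical stand-in data)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 0.0 if label == 0 else 1.5
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data


def knn_predict(train, point, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    def dist(item):
        (x, y), _ = item
        return (x - point[0]) ** 2 + (y - point[1]) ** 2

    nearest = sorted(train, key=dist)[:k]
    votes_for_one = sum(label for _, label in nearest)
    return 1 if votes_for_one * 2 > k else 0


def cv_accuracy(data, k, folds=5):
    """Mean accuracy over `folds` interleaved cross-validation splits."""
    scores = []
    for f in range(folds):
        test = data[f::folds]
        train = [item for i, item in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, p, k) == y for p, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds


data = make_data(200, seed=0)

DEFAULT_K = 5                  # assumption: stands in for a library default
grid = [1, 3, 5, 9, 15, 25]    # hypothetical candidate values (odd, so votes never tie)

# Score every candidate with the same cross-validation procedure.
results = {k: cv_accuracy(data, k) for k in grid}
default_score = results[DEFAULT_K]
best_k = max(results, key=results.get)
tuned_score = results[best_k]

print(f"default k={DEFAULT_K}: accuracy {default_score:.3f}")
print(f"tuned   k={best_k}: accuracy {tuned_score:.3f}")
print(f"improvement from tuning: {tuned_score - default_score:+.3f}")
```

Because the default value is part of the grid, the tuned score can never be lower than the default score here; how much higher it is depends on the data. Note that picking the best setting and reporting its score on the same folds gives an optimistic estimate, so a real comparison would hold out separate data for the final evaluation. AutoML tools extend this same loop across many algorithms and many parameters at once, which is how they also perform model selection.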
Olson demos a couple of AutoML solutions: TPOT, an open source solution that he helped develop, and a commercial solution from DataRobot.
At the end of the talk, he shares his perspective on the future of AutoML over the next 3-5 years. Here are some of his predictions:
- AutoML will also handle most of the data cleaning process
- AutoML will vastly improve deep learning
- AutoML will scale to large data sets
- AutoML will become human-competitive
- AutoML will transform the practice of data science as we know it
- AutoML is only a small part of a greater meta-learning movement
This is an important talk for any data scientist, as it addresses three questions: what AutoML is, why you should care about it, and most importantly, why you should use it in your day-to-day workflows. If you’re working in data science today, or managing a team of data scientists, you should view Olson’s presentation. To take a deeper dive into how AutoML can improve your own efficiency or your team’s, check out Olson’s compelling talk from ODSC West 2018.
[Related Article: Should You Build or Buy Your Data Science Platform?]
Key Takeaways:
- AutoML is a significant trend in data science today
- There are two main efforts underway in building scalable AutoML tools: open source and commercial
- Today’s AutoML tools address many parts of the standard data science process
- AutoML addresses the current skills gap in machine learning talent, and increases the data scientist’s productivity
- The future of AutoML over the next 3-5 years looks very promising, as the reach of today’s tools is expected to expand