This article is co-written by Joaquin Vanschoren and Pieter Gijsbers.
Today’s society increasingly relies on machine learning models for complex tasks such as decision making and personalized medicine. Constructing a good machine learning model is complicated and time-intensive. Relevant data has to be collected and cleaned, features might need to be engineered, the right machine learning algorithms have to be chosen, their hyperparameters have to be tuned, and the resulting model’s performance should be evaluated. Automated Machine Learning (AutoML) aims to automate these steps.
[Related Article: What Do Managers and Decision Makers Need to Know About AutoML?]
Research in AutoML has focused on automatic model construction through both machine learning pipeline construction and neural architecture search (NAS). Machine learning pipelines are built from discrete components, e.g. a number of preprocessing steps, one or more machine learning algorithms, and model ensembles, and are typically better suited for structured data. For unstructured data such as images, audio, or text, NAS instead optimizes the configuration of a neural network, e.g. the number of layers, the layer types, or the connections between layers. For both types of tasks, AutoML methods have achieved near- or super-human performance across various datasets.
In our workshop we will focus on AutoML methods for machine learning pipeline construction. AutoML tools that fit into this category include auto-sklearn, TPOT, ML-Plan, and GAMA. We will cover the different methods used to optimize over arbitrary pipelines, such as random search, evolutionary optimization, and Bayesian optimization.
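To give a flavor of the simplest of these strategies, the sketch below implements random search over a toy pipeline search space. The search space and the scoring function are illustrative stand-ins (our assumptions for this example), not the API of any particular tool; in practice, evaluation means cross-validating each candidate pipeline on the actual data.

```python
import random

# Toy search space: each pipeline = (scaler, algorithm, max_depth).
# Real AutoML tools search much larger, conditional spaces.
SEARCH_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "algorithm": ["knn", "tree", "svm"],
    "max_depth": [2, 4, 8, 16],
}

def sample_pipeline(rng):
    """Draw one pipeline configuration uniformly at random."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}

def evaluate(pipeline):
    """Stand-in for cross-validated performance. In a real tool this
    would fit the pipeline on training folds and return the validation
    score; here we just fake a score from the configuration."""
    score = 0.5
    if pipeline["scaler"] != "none":
        score += 0.1
    if pipeline["algorithm"] == "svm":
        score += 0.2
    score += 0.01 * pipeline["max_depth"]
    return score

def random_search(n_iter=50, seed=0):
    """Sample n_iter configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_pipeline, best_score = None, float("-inf")
    for _ in range(n_iter):
        candidate = sample_pipeline(rng)
        score = evaluate(candidate)
        if score > best_score:
            best_pipeline, best_score = candidate, score
    return best_pipeline, best_score

best, score = random_search()
```

Evolutionary optimization and Bayesian optimization replace the independent sampling above with smarter proposals: the former mutates and recombines good pipelines, the latter fits a model of configuration performance to decide what to try next.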
Meta-Learning
Humans learn over time which machine learning algorithms work well for which types of data, which hyperparameters are more important to tune than others, and which values should be tried. Likewise, AutoML methods can learn across tasks to construct good machine learning models faster. This learning across multiple tasks is called meta-learning.
To use meta-learning in an automated fashion, we can use a meta-dataset: a dataset of machine learning experiment results detailing how effective different machine learning algorithm configurations are on different datasets. Such a dataset can be obtained by running many machine learning experiments, or by downloading results from an open source repository such as OpenML. We can use these meta-datasets, or any meta-models or other knowledge learned from them, to guide the search of AutoML methods. We will cover how to build meta-datasets with OpenML and different meta-learning techniques to leverage them, such as warm starting, surrogate modeling, and more complex meta-modeling.
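As a minimal sketch of warm starting, the example below seeds a new search with the best configurations from the most similar previously seen dataset. The meta-dataset, the meta-features, and the similarity measure are all toy assumptions for illustration; real meta-datasets (e.g. built from OpenML run data) are far larger and use richer dataset meta-features.

```python
# Toy meta-dataset: past results as dataset -> {configuration: score}.
META_DATASET = {
    "dataset_a": {"svm": 0.91, "tree": 0.84, "knn": 0.79},
    "dataset_b": {"svm": 0.70, "tree": 0.88, "knn": 0.86},
    "dataset_c": {"svm": 0.93, "tree": 0.81, "knn": 0.75},
}

# Hypothetical meta-features per dataset: (number of instances, number
# of features), used to measure similarity between datasets.
META_FEATURES = {
    "dataset_a": (1000, 20),
    "dataset_b": (200, 5),
    "dataset_c": (1200, 25),
}

def similarity(f1, f2):
    """Negative Euclidean distance between meta-feature vectors."""
    return -sum((a - b) ** 2 for a, b in zip(f1, f2)) ** 0.5

def warm_start_candidates(new_meta_features, k=2):
    """Return the top-k configurations from the most similar dataset,
    to be evaluated first on the new task."""
    nearest = max(
        META_FEATURES,
        key=lambda d: similarity(META_FEATURES[d], new_meta_features),
    )
    results = META_DATASET[nearest]
    return sorted(results, key=results.get, reverse=True)[:k]

# A new dataset whose meta-features resemble dataset_c: the search is
# seeded with that dataset's two best configurations.
candidates = warm_start_candidates((1150, 24))
```

Instead of starting from random configurations, an AutoML tool would evaluate these seeded candidates first, which often reaches good models much faster.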
GAMA
We will finish the workshop with a demo of GAMA, a novel, research-oriented AutoML tool in Python. It is designed to be easy to use for both end users and AutoML researchers. It sports different search strategies and a dashboard that visualizes its search results. If you are curious, it's really easy to get started with this quickstart guide. We'd love for you to give it a go!
See you soon!
[Related Article: How to Prepare for an Automated Future: 7 Steps to Machine Learning]
In our session, we will give more background on the different AutoML approaches, show how meta-learning can be applied to enhance AutoML, and give a demonstration of the AutoML tool GAMA. If you are interested in learning more in the meantime, we published an open-access book on AutoML and an open-source benchmark comparing different open source AutoML tools. See you in London!