Last week, ODSC hosted a talk by Dr. Francesca Lazzeri, Senior Machine Learning Scientist at Microsoft, on the capabilities of automated and interpretable machine learning software in Microsoft’s Azure. Notably, this talk is part of a series that covers a variety of data science topics. The talks are great networking opportunities; the room was packed with technical and non-technical individuals interested in machine learning. Additionally, there was pizza! You can find information on upcoming talks here.
[Related Article: Interpretable Machine Learning – Fairness, Accountability, and Transparency in ML systems]
Dr. Lazzeri explained the role of the automated machine learning tools available with Microsoft's automated ML. While "automated machine learning" may sound circular, the idea is fairly intuitive: Microsoft's tool uses probabilistic machine learning models to guide decisions throughout the data mining process, reducing the time and resources required. Essentially, the automated component of the software applies machine learning "best practices" in a precise, repeatable manner. The process involves loading the data, defining the goals of the model, and applying constraints. The software then engineers features, evaluates multiple iterations of modeling techniques that appear well suited to the task, and ultimately selects the best combination of model and features. Automated ML is essentially a recommendation system, trained to match the right model to the right task and then evaluate its performance. The point of the software, as Dr. Lazzeri put it, is not to automate away the data scientist, but to use computing power to save time and resources, often providing a second opinion on manually conducted modeling efforts.
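To make that workflow concrete, here is a minimal sketch of what configuring such a run can look like with the Azure ML Python SDK. The workspace setup, dataset name, label column, and constraint values are illustrative assumptions, not details from the talk.

```python
# Minimal automated ML sketch with the Azure ML Python SDK.
# Dataset name, label column, and constraints below are hypothetical.
from azureml.core import Workspace, Dataset, Experiment
from azureml.train.automl import AutoMLConfig

ws = Workspace.from_config()  # loads an existing workspace from config.json
training_data = Dataset.get_by_name(ws, name="my-training-data")  # hypothetical

# Define the goal (task + metric) and apply constraints (time, validation);
# automated ML then iterates over candidate models and featurizations.
automl_config = AutoMLConfig(
    task="classification",
    primary_metric="AUC_weighted",
    training_data=training_data,
    label_column_name="label",       # hypothetical target column
    featurization="auto",            # automatic feature engineering
    n_cross_validations=5,
    experiment_timeout_minutes=30,   # constraint on total run time
)

run = Experiment(ws, "automl-demo").submit(automl_config, show_output=True)
best_run, fitted_model = run.get_output()  # best model/feature combination
```

The constraints matter in practice: the timeout and validation settings are what keep an open-ended model search within a predictable budget.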
Automated ML in Azure can automatically preprocess data and engineer features, selecting those that yield the best model performance. The data is normalized or scaled automatically to ensure each algorithm performs at its best. Additional functionality lets the software impute missing values, apply encodings, and add transformations.
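For a sense of what that automatic preprocessing replaces, the sketch below builds roughly equivalent steps by hand with scikit-learn: imputation, scaling, and encoding. The column names are hypothetical stand-ins.

```python
# Hand-built version of the preprocessing automated ML handles for you:
# impute missing values, scale numeric columns, encode categoricals.
# Column names are hypothetical.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["age", "income"]
categorical_features = ["occupation"]

preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # fill missing values
        ("scale", StandardScaler()),                    # normalize/scale
    ]), numeric_features),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_features),
])
```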
The second part of Dr. Lazzeri's talk focused on the interpretability of machine learning. Automating components of the data mining process can, in some cases, distance data scientists from the models enough that interpretability is lost. Dr. Lazzeri stressed that understanding why machine learning models make the predictions they make is key to ensuring ethical data analysis. The decisions machine learning models drive can have real-world consequences, so fairness is a genuine concern: when a model predicts a patient's risk of cancer, for example, it is imperative to understand why it scored that patient as high or low risk. Interpretability can also be key to a model's value in business, since data scientists often need to explain the rationale behind model results to decision-makers or other internal clients. To address this problem, the Azure ML team developed an interpretability toolkit. The kit includes a number of functions that analyze feature importance and provide interactive visualizations. The main Python package is "azureml.explain.model," with its functionality documented here. The interpretability toolkit is available on its own as well as with the automated ML discussed previously.
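As a hedged sketch of how that package can be used to surface feature importance (the fitted model, data matrices, and feature names below are stand-ins, and the import path simply follows the package name mentioned above):

```python
# Sketch of feature-importance analysis with the interpretability toolkit.
# `model`, `X_train`, `X_test`, and `feature_names` are assumed to exist:
# a fitted scikit-learn-style estimator and its feature matrices.
from azureml.explain.model.tabular_explainer import TabularExplainer

explainer = TabularExplainer(model, X_train, features=feature_names)

# Global explanation: which features drive predictions overall?
global_explanation = explainer.explain_global(X_test)
print(global_explanation.get_feature_importance_dict())

# Local explanation: why did the model score this one row as it did?
local_explanation = explainer.explain_local(X_test[:1])
```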
[Related Article: Watch: The Future of Machine Learning]
Dr. Lazzeri demonstrated how intelligent applications for machine learning can save time and resources, but, to be clear, the Azure services can be costly. There is a trade-off between wielding open-source tools like scikit-learn and Keras and paying for automated ML software. The decision to use Azure's tools likely depends on the financial resources available and the amount of modeling to be conducted.
If you found this topic interesting, stop by the next ODSC meetup talk on August 14th, titled “Understanding Machine Learning Results to Increase their Value & Avoid Pitfalls” by Dr. Linda M. Zeger of Auroral LLC.