Key Takeaways:
- Data scientists should understand the so-called “gap” between statistics and machine learning and recognize how much the two fields have in common; much of the difference is a matter of perspective.
- PyMC3 is a very useful probabilistic programming framework for Python.
- Methods from both the machine learning side and the statistics side can help bridge the gap.
Thomas Wiecki has spent his entire professional career since graduate school at Quantopian, a fintech business, developing Bayesian models to evaluate trading algorithms. In his talk at ODSC Europe 2018, Wiecki started off with the comic below.
Slide copyright Thomas Wiecki, Ph.D., ODSC Europe 2018
“The title of the talk implies that there is indeed a gap and that is actually not an uncontested point and there are quite a few members of the machine learning community who like similarly in the statistics community feel that it’s all just one thing,” said Wiecki. “There’s this comic that I think is quite funny and also has some truth to it. The sentiment is if you’re standing in front of a crowd like me right now and you’re talking about artificial intelligence, which of course sounds amazing, but really that is just machine learning and that really is just window dressing for good old statistics. I think there’s some truth to that and definitely there are very deep connections between machine learning and statistics but there are also many differences in the cultures and the histories of how the two fields developed and the emphasis they placed on certain problems that they want to solve.”
What are the differences between machine learning and statistics? Answers to this question vary as widely as the array of tools employed by the two disciplines. Although the two share more similarities than differences, their cultures, roots, and language are quite different. More recently, however, we can see a healthy cross-pollination in which each field has started to adopt ideas from the other. Wiecki’s talk looks at the ideas the two disciplines have developed and identifies those that have already crossed the chasm and those that are still grounded in one of the two fields. Some examples of such concepts are informative priors, neural networks, uncertainty, regularization, and hierarchical models.
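One textbook instance of this overlap, offered here as an illustration and not taken from the talk, is that L2 regularization and a zero-centered Normal prior lead to the same estimate. The minimal sketch below estimates a mean both ways in plain Python. The function names, the sample values, and the variances are all hypothetical, and the noise variance is assumed to be known.

```python
# Illustrative sketch: the same shrunken mean estimate, derived two ways.
# All names and numbers below are made up for the example.

def ridge_mean(data, penalty):
    """Machine learning view: minimize sum((x - m)**2) + penalty * m**2.

    Setting the derivative to zero gives m = sum(x) / (n + penalty).
    """
    return sum(data) / (len(data) + penalty)


def posterior_mean(data, noise_var, prior_var):
    """Statistics view: Normal(0, prior_var) prior on the mean and a
    Normal likelihood with known variance noise_var.

    The posterior mean is a precision-weighted combination of the
    prior mean (zero) and the data.
    """
    n = len(data)
    posterior_precision = n / noise_var + 1.0 / prior_var
    return (sum(data) / noise_var) / posterior_precision


# Hypothetical daily returns of a trading strategy.
returns = [0.02, -0.01, 0.03, 0.01]
noise_var = 0.04   # assumed known observation variance
prior_var = 0.01   # assumed prior variance: a skeptical, informative prior

sample_mean = sum(returns) / len(returns)
ml_estimate = ridge_mean(returns, penalty=noise_var / prior_var)
stats_estimate = posterior_mean(returns, noise_var, prior_var)

print("sample mean:        ", sample_mean)
print("ridge estimate:     ", ml_estimate)
print("posterior estimate: ", stats_estimate)
```

The two estimates coincide when the penalty equals `noise_var / prior_var`, so a tighter prior acts like a stronger penalty, and both pull the estimate toward zero relative to the plain sample mean.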
The talk shows how to combine these various ideas into a rich toolbox for solving wide-ranging data science problems. To do this, Wiecki defined a number of characteristics for two protagonists in a techie version of a Socratic dialog. On one side was “Statistical Rick,” who carried the point of view of traditional statisticians, while “Machine Learning Morty” had the more modern perspective of a present-day data scientist. The conversations rang true for me, as I’ve played Rick’s role in the past due to my academic background in applied statistics.
Slide copyright Thomas Wiecki, Ph.D., ODSC Europe 2018
Wiecki’s proposition is that there is not enough cross-talk between these two groups, even though their methods are often complementary and can be combined to solve complex problems that neither group can solve alone. The balance of the talk supports this perspective, with Rick and Morty working together to solve increasingly complex problems, in this case in quantitative finance and based on his company’s algorithmic trading platform. The discussion includes ways to bridge the disciplines of machine learning and statistics.
Dr. Wiecki provided coding examples using PyMC3, a probabilistic programming framework for Python that supports Bayesian modeling and probabilistic machine learning and uses Theano as its computational backend.
Slide copyright Thomas Wiecki, Ph.D., ODSC Europe 2018
To take a deeper dive into the relationship between machine learning and statistics, check out Dr. Wiecki’s very fun full talk from ODSC Europe 2018 below.