In this blog post, I want to introduce some basic concepts of reinforcement learning and some important terminology, and show a simple use case in which I create a game-playing AI in KNIME Analytics Platform. After reading this, I hope you’ll have a better understanding of the usefulness of reinforcement learning, as well as some key vocabulary to help you learn more.
Reinforcement Learning and How It’s Used
You may have heard of Reinforcement Learning (RL) being used to train robots to walk or to pick up objects gently, or perhaps of its use in discovering new chemical compounds for medicine. It is even being applied to regulating vehicle and network traffic! In any case, we’ll start at the beginning.
Reinforcement learning is an area of machine learning that has grown into a broad field of study with many different algorithmic frameworks. In brief, it is the attempt to build an agent that can interpret its environment and take actions that maximize its reward.
At first glance, this sounds similar to supervised learning, where you also seek to maximize a reward or minimize a loss. The key difference is that in reinforcement learning those rewards or losses come not from labeled data points but from direct interaction with an environment, be it reality or a simulation. The agent itself may consist entirely of a machine learning model, contain one as a component, or contain none at all.
Fig 1: Reinforcement Learning cycle wherein the agent recursively interacts with its environment and learns by associating rewards with its actions.
https://commons.wikimedia.org/wiki/File:Reinforcement_learning_diagram.svg
A simple example of an agent that contains no machine learning model is a dictionary, or look-up table. Imagine you’re playing “Rock-Paper-Scissors” against an agent that can see your hand before it makes its move. Building this look-up table is straightforward, as there are only three possible game states for the agent to encounter:
Player’s Move | Agent’s Move
Rock | Paper
Paper | Scissors
Scissors | Rock
Fig 2: Look-up table instructing a Rock-Paper-Scissors agent on which move to take based on its opponent’s move.
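The table in Fig 2 is literally a dictionary in code. To tie it back to the reinforcement learning cycle, the sketch below also lets an agent learn the same table purely from rewards, by trial and error, instead of having it written by hand. This is an illustrative sketch, not part of the KNIME workflows: the names `LOOKUP_TABLE`, `reward`, and `learn_table`, the +1/0/-1 reward scheme, and the epsilon-greedy exploration are all assumptions.

```python
import random

MOVES = ["rock", "paper", "scissors"]

# Hand-written look-up table from Fig 2: player's move -> agent's move.
LOOKUP_TABLE = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

# BEATS[m] is the move that m defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def reward(agent_move, player_move):
    """Assumed reward scheme: +1 for a win, 0 for a tie, -1 for a loss."""
    if agent_move == player_move:
        return 0
    return 1 if BEATS[agent_move] == player_move else -1


def learn_table(episodes=5000, epsilon=0.2, seed=0):
    """Learn a move for each state from rewards alone (epsilon-greedy trial and error)."""
    rng = random.Random(seed)
    totals = {(s, a): 0.0 for s in MOVES for a in MOVES}  # summed reward per (state, action)
    counts = {(s, a): 0 for s in MOVES for a in MOVES}    # times each pair was tried

    def average(s, a):
        return totals[(s, a)] / counts[(s, a)] if counts[(s, a)] else 0.0

    for _ in range(episodes):
        state = rng.choice(MOVES)  # environment: the player's visible move
        if rng.random() < epsilon:
            action = rng.choice(MOVES)  # explore: try a random move
        else:
            action = max(MOVES, key=lambda a: average(state, a))  # exploit best estimate
        r = reward(action, state)  # environment returns a reward
        totals[(state, action)] += r
        counts[(state, action)] += 1

    # Greedy policy: the best-scoring action for each state.
    return {s: max(MOVES, key=lambda a: average(s, a)) for s in MOVES}


learned = learn_table()
print(learned == LOOKUP_TABLE)  # True
```

The agent is never told the rules; it recovers the table only because the winning move is the one that keeps earning a reward. With three states and three actions this takes a fraction of a second, but the number of (state, action) pairs to try grows with the size of the game.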
This can get out of hand very quickly, however. Even a game as simple as Tic-Tac-Toe has 5,478 board states that can arise in legal play (out of 3^9 = 19,683 ways to fill the nine cells with X, O, or blank) and 255,168 distinct possible games. A hand-written look-up table quickly stops being practical, and let’s not even talk about the number of board states in games like Chess or Go…
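To get a feel for how quickly the state space grows, a short brute-force enumeration can count Tic-Tac-Toe positions directly. This Python sketch uses hypothetical helper names (it is not taken from the KNIME workflows) and walks every legal game from the empty board, stopping as soon as a player completes a line or the board is full.

```python
# Brute-force enumeration of Tic-Tac-Toe game states (illustrative helper names).
# A board is a 9-character string read row by row: "X", "O", or " " (blank).

LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def is_over(board):
    return winner(board) is not None or " " not in board


def children(board):
    """All boards reachable in one legal move (X always moves first)."""
    player = "X" if board.count("X") == board.count("O") else "O"
    return [board[:i] + player + board[i + 1:] for i, c in enumerate(board) if c == " "]


def reachable_states():
    """Distinct boards that can occur in legal play, including the empty board."""
    start = " " * 9
    seen, frontier = {start}, [start]
    while frontier:
        board = frontier.pop()
        if is_over(board):
            continue  # no moves are made after a win or a full board
        for child in children(board):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return seen


def count_games(board=" " * 9):
    """Number of distinct move sequences from this board to the end of the game."""
    if is_over(board):
        return 1
    return sum(count_games(child) for child in children(board))


print(3 ** 9)                   # 19683 (naive upper bound on layouts)
print(len(reachable_states()))  # 5478
print(count_games())            # 255168
```

Even these modest numbers would make a tedious table to write by hand; the same enumeration is hopeless for Chess or Go.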
This is where machine learning comes into the equation
With a machine learning model, most commonly a neural network trained iteratively, an agent can learn to generalize: it can make sensible decisions in environment states it has never seen before.
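One common way to let a neural network stand in for the look-up table is Q-learning with function approximation; it is shown here as an illustration, not as the exact method used in the KNIME workflows. A network with weights θ scores each action a in state s as Q(s, a; θ). After each step, its prediction is pulled toward a target y built from the observed reward r and the best score it predicts for the next state s′, with a discount factor γ between 0 and 1:

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta),
\qquad
L(\theta) = \bigl( y - Q(s, a; \theta) \bigr)^{2}
```

Minimizing L(θ) by gradient descent, with y held fixed, adjusts the shared weights rather than a single table cell, so what the agent learns about one board carries over to similar boards it has never encountered. When s′ is a finished game, there is no next move and the target is simply y = r.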
While it is true that Tic-Tac-Toe has many possible board states and a look-up table is impractical, it would still be possible to build an optimal agent with a few simple IF statements. I use the Tic-Tac-Toe example anyway because of its simple environment and well-known rules.
Agent-agent game sessions
In my example workflow, the agent plays against itself a configured number of times. By default, the network plays 25 sets of 100 games, for a total of 2,500 games. This is the Easy Mode AI available on the KNIME WebPortal. The Hard Mode AI was allowed to play an additional 100 sets of 100 games, for a total of 12,500 games. To improve the AI further, we could tune the network architecture or experiment with different reward functions.
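The self-play schedule can be sketched in plain Python. This is a simplified stand-in, not the workflow itself: a dictionary of (board, move) value estimates replaces the neural network, and the reward scheme (+1 win, 0 draw, -1 loss), the exploration rate, the learning rate, and the once-per-set update are all assumptions chosen for illustration. What it does mirror is the structure described above: one agent playing both sides, in sets of 100 games, with Hard Mode continuing from where Easy Mode stopped.

```python
import random

# Hypothetical self-play sketch: one agent plays both X and O. A dictionary of
# (board, move) -> value estimates stands in for the neural network.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_game(values, epsilon, rng):
    """Play one self-play game; return each side's (board, move) pairs and the winner."""
    board, player = " " * 9, "X"
    history = {"X": [], "O": []}
    while True:
        moves = [i for i, c in enumerate(board) if c == " "]
        if rng.random() < epsilon:
            move = rng.choice(moves)  # explore
        else:
            move = max(moves, key=lambda m: values.get((board, m), 0.0))  # exploit
        history[player].append((board, move))
        board = board[:move] + player + board[move + 1:]
        result = winner(board)
        if result or " " not in board:
            return history, result
        player = "O" if player == "X" else "X"


def self_play(n_sets=25, games_per_set=100, values=None, epsilon=0.2, alpha=0.1, seed=0):
    """Play n_sets sets of games, updating the value estimates after each set."""
    rng = random.Random(seed)
    values = {} if values is None else values
    games_played = 0
    for _ in range(n_sets):
        batch = []  # (board, move, final reward) collected during this set
        for _ in range(games_per_set):
            history, result = play_game(values, epsilon, rng)
            for side, pairs in history.items():
                r = 0.0 if result is None else (1.0 if result == side else -1.0)
                batch.extend((b, m, r) for b, m in pairs)
            games_played += 1
        # "Retrain" once per set: move each estimate toward the observed outcome.
        for b, m, r in batch:
            old = values.get((b, m), 0.0)
            values[(b, m)] = old + alpha * (r - old)
    return values, games_played


easy, n_easy = self_play()                                       # 25 x 100 games
hard, n_more = self_play(n_sets=100, values=dict(easy), seed=1)  # 100 x 100 more
print(n_easy, n_easy + n_more)                                   # 2500 12500
```

Nudging each estimate toward the final game result is a simple Monte Carlo style update. In this sketch, tuning the network architecture corresponds to changing how `values` is represented, and trying a different reward function corresponds to changing how `r` is computed.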
The game as a web application
The second application we need is a web application, from which a user can play against the agent in a web browser. To deploy the game, we use the KNIME WebPortal, a feature of the KNIME Server.
In KNIME Analytics Platform, JavaScript-based nodes for data visualization can build parts of web pages. Encapsulating such JavaScript-based nodes into components allows the construction of dashboards as web pages with fully connected and interactive plots and charts. In particular, we used the Tile View node to display each of the nine sections of the Tic-Tac-Toe board, and show a blank, human, or KNIME icon on each.
The deployment workflow that allows a human to play against (and further train) the agent is shown in Fig 3. A game session on the resulting web application is shown in Fig 4.
Fig 3: KNIME workflow that creates the playable WebPortal application.
Fig 4: Playing against the Hard Mode AI on the KNIME Server WebPortal.
Summing up
If this brief look at Reinforcement Learning has inspired you and you’d like to read more about this use case and the mathematics behind it, check out the full article on the KNIME blog, or download the training and deployment workflows from the KNIME Hub.
About the author
Corey Weisinger is a Data Scientist with KNIME, based in Austin, Texas. He studied Mathematics at Michigan State University, focusing on Actuarial Techniques and Functional Analysis. Before coming to KNIME, he worked as an Analytics Consultant for the auto industry in Detroit, Michigan. He currently focuses on Signal Processing and Numeric Prediction techniques and is the author of the Alteryx to KNIME ebook.
Further Reading and References
Markov Decision Process: https://arxiv.org/abs/1907.10243
Reinforcement Learning: http://incompleteideas.net/sutton/book/the-book.html
Deep Reinforcement Learning: https://arxiv.org/abs/1811.12560
KNIME Transfer Learning Blog: https://www.knime.com/blog/transfer-learning-made-easy-with-deep-learning-keras-integration
Tic-Tac-Toe: https://en.wikipedia.org/wiki/Tic-tac-toe
Chemical Drug Discovery: https://arxiv.org/abs/1911.07630
Bayesian Optimization: https://arxiv.org/abs/1807.02811
Traffic Regulation: https://arxiv.org/abs/2007.10960
Deep RL for Robotics: https://arxiv.org/abs/1610.00633
KNIME Analytics Platform: https://www.knime.com/knime-analytics-platform
KNIME Server: https://www.knime.com/knime-server
KNIME WebPortal: https://www.knime.com/knime-software/knime-webportal
KNIME and Keras: https://www.knime.com/deeplearning/keras
KNIME Cheat-Sheets: https://www.knime.com/sites/default/files/110519_KNIME_Machine_Learning_Cheat%20Sheet.pdf
Tic-Tac-Toe Learning Workflow: https://kni.me/w/pjN-0Sm6RtZ3b3Hl
Tic-Tac-Toe Playing Workflow: https://kni.me/w/JwmYV-QHc1cWF5xK
KNIME Parameter Optimization: https://kni.me/w/lkw5Tu3h_pVXzVUe