
Teaching KNIME to Play Tic-Tac-Toe

In this blog post I want to introduce some basic concepts of reinforcement learning, some important terminology, and show a simple use case where I create a game playing AI in KNIME Analytics Platform. After reading this, I hope you’ll have a better understanding of the usefulness of reinforcement learning, as well as some key vocabulary to facilitate learning more.

Reinforcement Learning and How It’s Used

You may have heard of Reinforcement Learning (RL) being used to train robots to walk or gently pick up objects; or perhaps you’ve heard of its uses in the discovery of new chemical compounds for medical applications. It’s even being applied to regulating vehicle and network traffic! In any case, we’ll start at the beginning.

Reinforcement learning is an area of Machine Learning and has become a broad field of study with many different algorithmic frameworks. Summarized briefly, it is the attempt to build an agent that is capable of interpreting its environment and taking an action to maximize its reward.

At first glance this sounds similar to supervised learning, where you also seek to maximize a reward or minimize a loss. The key difference is that those rewards or losses are not obtained from labeled data points but from direct interaction with an environment, be it reality or simulation. The agent itself may be built around a machine learning model entirely, partially, or not at all.

Fig 1: Reinforcement Learning cycle, wherein the agent recursively interacts with its environment and learns by associating rewards with its actions. (Image source: https://commons.wikimedia.org/wiki/File:Reinforcement_learning_diagram.svg)
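In code, this cycle boils down to a small loop. Below is a minimal Python sketch; the `env` and `agent` objects are hypothetical stand-ins for whatever environment and agent you build, not part of the KNIME workflow:

```python
# Minimal sketch of the agent-environment loop. The `env` and `agent`
# objects are hypothetical stand-ins, not part of the KNIME workflow.

def run_episode(env, agent):
    """Play one episode, letting the agent learn from every transition."""
    state = env.reset()                  # initial environment state
    done = False
    total_reward = 0.0
    while not done:
        action = agent.act(state)        # interpret the state, pick an action
        next_state, reward, done = env.step(action)    # environment responds
        agent.learn(state, action, reward, next_state) # tie reward to action
        state = next_state
        total_reward += reward
    return total_reward
```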

A simple example of an agent that contains no machine learning model is a dictionary or a look-up table. Imagine you’re playing “Rock-Paper-Scissors” against an agent that can see your hand before it makes its move. It’s fairly straightforward to build this look-up table, as there are only three possible game states for the agent to encounter:

Player’s Move    Agent’s Move
Rock             Paper
Paper            Scissors
Scissors         Rock

 Fig 2:  Look-up table instructing a Rock-Paper-Scissors agent on which move to take based on its opponent’s move.
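In Python, that entire agent fits in a three-entry dictionary. A toy sketch, not part of the KNIME workflow:

```python
# The complete "agent": a look-up table mapping the opponent's
# (visible) move to the winning counter-move.
AGENT_POLICY = {
    "Rock": "Paper",
    "Paper": "Scissors",
    "Scissors": "Rock",
}

def agent_move(players_move: str) -> str:
    return AGENT_POLICY[players_move]

print(agent_move("Rock"))  # -> Paper
```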

This can get out of hand very quickly, however. Even a very simple game such as Tic-Tac-Toe allows 255,168 distinct games across 5,478 reachable board positions. Writing a look-up table for all of them by hand is hardly practical, and let’s not even talk about the number of board states in games like Chess or Go…
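Those numbers are easy to verify with a short brute-force enumeration. The sketch below plays out every legal game from the empty board (X moves first, play stops as soon as someone wins) and counts both the games and the distinct positions they visit:

```python
# Count every distinct Tic-Tac-Toe game and every reachable position.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def explore(board, player, seen):
    if winner(board) or " " not in board:  # terminal: win or draw
        return 1                           # exactly one game ends here
    games = 0
    for i, cell in enumerate(board):
        if cell == " ":
            nxt = board[:i] + player + board[i+1:]
            seen.add(nxt)
            games += explore(nxt, "O" if player == "X" else "X", seen)
    return games

seen = {" " * 9}
print(explore(" " * 9, "X", seen))  # 255168 distinct games
print(len(seen))                    # 5478 reachable positions
```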

This is where machine learning comes into the equation

Through different modeling techniques, most commonly Neural Networks with their iterative training algorithms, an agent can learn to make decisions based on environment states it has never seen before.
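To make that concrete: the actual workflow builds its network with the KNIME Keras integration nodes, so the Python/Keras snippet below is only a rough equivalent, and the layer sizes are illustrative assumptions rather than the workflow’s real architecture:

```python
# Rough Keras equivalent of a move-scoring network for Tic-Tac-Toe.
# Layer sizes are illustrative assumptions, not the workflow's actual
# architecture (which is built with the KNIME Keras integration nodes).
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(9,)),                # 9 cells, e.g. encoded 1 / -1 / 0
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(9, activation="softmax"),  # one preference score per cell
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```

Because the network outputs a score for every cell, it can produce a sensible move for any board you feed it, including boards it never encountered during training.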

While it is true that Tic-Tac-Toe has too many board states for a hand-written look-up table, it would still be possible to build an optimal agent with a few simple IF statements. I use the Tic-Tac-Toe example anyway because of its simple environment and well-known rules.
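For contrast, here is roughly what such an IF-statement agent could look like: the classic win/block/position heuristic, sketched for illustration (a truly optimal agent would also need a rule or two for handling forks):

```python
# Rule-based Tic-Tac-Toe agent: win if you can, block if you must,
# otherwise prefer the center, then corners, then edges.
LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def rule_based_move(board, me, opponent):
    for player in (me, opponent):          # 1. win, then 2. block
        for a, b, c in LINES:
            cells = [board[a], board[b], board[c]]
            if cells.count(player) == 2 and cells.count(" ") == 1:
                return (a, b, c)[cells.index(" ")]
    for i in (4, 0, 2, 6, 8, 1, 3, 5, 7):  # 3. center, corners, edges
        if board[i] == " ":
            return i

# X threatens the top row; the agent (playing O) blocks at index 1.
print(rule_based_move("X X  O  O", "O", "X"))  # -> 1
```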

Agent-agent game sessions

In my example workflow, the agent plays against itself a configured number of times. By default the network plays 25 sets of 100 games, for a total of 2,500 games. This is the Easy Mode AI available in the KNIME WebPortal. The Hard Mode AI was allowed to play an additional 100 sets of 100 games, for a total of 12,500 games. To further improve the AI we could tune the network architecture or experiment with different reward functions.
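Conceptually, each set of self-play games looks something like the sketch below. The real workflow implements this loop with KNIME nodes; `play_one_game` is a hypothetical helper left as a stub here, and the reward scheme (the final outcome assigned to every move of the game) is one common choice rather than the workflow’s confirmed design:

```python
# Sketch of the self-play training loop. `play_one_game` is a
# hypothetical stub; the real workflow uses KNIME loop nodes.

def play_one_game(agent):
    """Return the list of (state, move) pairs and the final outcome."""
    ...  # the agent picks moves for both X and O until the game ends

def train_agent(agent, sets=25, games_per_set=100):
    for _ in range(sets):
        batch = []
        for _ in range(games_per_set):
            history, outcome = play_one_game(agent)
            # Label every move with the game's result, e.g. +1 for the
            # winner's moves, -1 for the loser's, 0 for draws.
            batch.extend((state, move, outcome) for state, move in history)
        agent.update(batch)  # one network update per set of games
```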

The game as a web application

The second application we need is a web application: from a web browser, a user should be able to play against the agent. To make the game available in a browser we use the KNIME WebPortal, a feature of the KNIME Server.

In KNIME Analytics Platform, JavaScript-based nodes for data visualization can build parts of web pages. Encapsulating such JavaScript-based nodes into components allows the construction of dashboards as web pages with fully connected and interactive plots and charts. In particular, we used the Tile View node to display each of the nine sections of the Tic-Tac-Toe board, and show a blank, human, or KNIME icon on each.

The deployment workflow that allows a human to play against (and further train) the agent is shown in Fig 3. A game session in the resulting web application is shown in Fig 4.

Fig 3: KNIME workflow for creating the playable WebPortal application seen below.

Fig 4: Playing against the AI on Hard Mode on the KNIME Server WebPortal.

Summing up

If this brief look at Reinforcement Learning has inspired you, and you’d like to read more about this use case and some of the mathematics behind it, check out the full article on the KNIME blog. Or download the training and deployment workflows on the KNIME Hub.


About the author

Corey Weisinger is a Data Scientist with KNIME, based in Austin, Texas. He studied Mathematics at Michigan State University, focusing on Actuarial Techniques and Functional Analysis. Before coming to KNIME, he worked as an Analytics Consultant for the auto industry in Detroit, Michigan. He currently focuses on Signal Processing and Numeric Prediction techniques and is the author of the Alteryx to KNIME ebook.

Further Reading and References

Markov Decision Process: https://arxiv.org/abs/1907.10243

Reinforcement Learning: http://incompleteideas.net/sutton/book/the-book.html

Deep Reinforcement Learning: https://arxiv.org/abs/1811.12560

KNIME Transfer Learning Blog: https://www.knime.com/blog/transfer-learning-made-easy-with-deep-learning-keras-integration

Tic-Tac-Toe: https://en.wikipedia.org/wiki/Tic-tac-toe

Chemical Drug Discovery: https://arxiv.org/abs/1911.07630

Bayesian Optimization: https://arxiv.org/abs/1807.02811

Traffic Regulation: https://arxiv.org/abs/2007.10960

Deep RL for Robotics: https://arxiv.org/abs/1610.00633

KNIME Analytics Platform: https://www.knime.com/knime-analytics-platform

KNIME Server: https://www.knime.com/knime-server

KNIME WebPortal: https://www.knime.com/knime-software/knime-webportal

KNIME and Keras: https://www.knime.com/deeplearning/keras

KNIME Cheat-Sheets: https://www.knime.com/sites/default/files/110519_KNIME_Machine_Learning_Cheat%20Sheet.pdf

Tic-Tac-Toe Learning Workflow: https://kni.me/w/pjN-0Sm6RtZ3b3Hl

Tic-Tac-Toe Playing Workflow: https://kni.me/w/JwmYV-QHc1cWF5xK

KNIME Parameter Optimization: https://kni.me/w/lkw5Tu3h_pVXzVUe
