Data science and artificial intelligence are everywhere. So are video games. It was only a matter of time until people started combining the two in creative ways. And no, I’m not talking about improving in-game AI (because clearly, Skyrim doesn’t care), and I’m not talking about analyzing game sales either. Today, I want to look at some fun reinforcement learning use cases, mostly in the world of gaming, ranging from creating new AI to beating games with bots.
First of all, what is reinforcement learning?
Short answer (aka the answer for non-data scientists): it’s a data science thing. Long answer (for the data scientists): reinforcement learning is a machine learning paradigm in which an AI agent learns through experience. It interacts with an environment and receives rewards from that environment as feedback. In gamer terms, it’s like fighting an NPC or boss that keeps learning how you play and gets harder as time goes on. Or, if you’re lucky, it’s a companion AI that gets better, not worse, at working with you.
Now, to connect that logic to data science, some smart folks have created their own AI/bots that learn how a game works and keep getting better through experience; in other words, they learn through trial and error. Because of this experimental nature, agents (aka the AI) can learn a set of rules even in environments or situations they’ve never encountered before, leading to some interesting reinforcement learning use cases.
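To make that trial-and-error loop concrete, here’s a minimal sketch in Python: a tabular Q-learning agent in a made-up one-dimensional corridor world. None of this comes from any particular game or paper; the environment, reward, and hyperparameters are all invented just to show the agent–environment–reward cycle.

```python
import random

# Toy environment: a 1-D corridor. The agent starts at cell 0 and gets
# a reward of +1 only when it reaches the goal cell on the right.
GOAL, N_STATES, ACTIONS = 5, 6, [-1, +1]  # actions: move left / move right

def step(state, action):
    next_state = max(0, min(GOAL, state + action))
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Tabular Q-learning: the agent improves its action-value estimates
# purely from the rewards the environment feeds back.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # made-up hyperparameters

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit what we know, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Learn from experience: nudge Q toward reward + discounted future value.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy policy should point right in every cell.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

No one tells the agent to “go right”; it simply keeps the actions that paid off, which is exactly the learning-through-experience idea above.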
Starting simple with tic-tac-toe
Gaming isn’t just PS5s and OP PCs – it can be as simple as good ol’ tic-tac-toe. On its online portal, KNIME built a tic-tac-toe bot that learned as it played. Two agents, given only a vague, general idea of the rules, played against each other a set number of times and learned the game well enough to become quite formidable. In a follow-up experiment, agents that were allowed to play additional training games earned the “hard mode” title.
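KNIME built its version with visual workflows rather than code, so the sketch below is only a rough Python approximation of the same self-play idea: two tabular agents play each other repeatedly and credit the final win/loss/draw reward back to the moves they chose. The board representation, update rule, and hyperparameters are all assumptions for illustration.

```python
import random
from collections import defaultdict

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def moves(board):
    return [i for i, cell in enumerate(board) if cell == " "]

# One value table per player, keyed by (board string, move); both start at zero.
Q = {"X": defaultdict(float), "O": defaultdict(float)}
alpha, epsilon = 0.3, 0.2  # made-up learning rate and exploration rate

def pick(board, player):
    legal = moves(board)
    if random.random() < epsilon:
        return random.choice(legal)  # explore
    return max(legal, key=lambda m: Q[player][("".join(board), m)])  # exploit

for game in range(50_000):
    board, player, history = [" "] * 9, "X", {"X": [], "O": []}
    while True:
        move = pick(board, player)
        history[player].append(("".join(board), move))
        board[move] = player
        win = winner(board)
        if win or not moves(board):
            # Terminal reward: +1 winner, -1 loser, 0 draw, credited back
            # through every (state, move) pair each agent chose this game.
            for p in "XO":
                reward = 0.0 if not win else (1.0 if p == win else -1.0)
                for state, m in history[p]:
                    Q[p][(state, m)] += alpha * (reward - Q[p][(state, m)])
            break
        player = "O" if player == "X" else "X"
```

The “hard mode” effect falls out naturally: the more self-play games you allow, the more of the state space each table covers and the harder the bot is to beat.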
Moving on to Chess
Chess is a lot more complicated than tic-tac-toe’s 3×3 grid: the board is bigger, there are more rules, and each piece moves differently. It’s hard enough for a human to learn chess, let alone an AI agent. A while back, Google’s subsidiary DeepMind built on AlphaGo, its game-playing AI famous for beating world-class Go players, to create AlphaZero, which taught itself chess from scratch. After less than 4 hours of learning chess through self-play, AlphaZero was able to beat the world’s best chess program, Stockfish 8, in a 100-game match-up.
Going virtual with Atari
Atari had more than just Pong and that terrible E.T. game, so it was a fun place for some researchers to start. DeepMind also used RL to play Atari, training agents on Pong, Space Invaders, and more. In their research paper, a single agent reliably learned to play seven different games (which is still probably more than your out-of-touch parents can manage), surpassing the skill of a human expert on three of them. See it in action here.
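For a sense of what that agent looked like under the hood, here’s a hedged PyTorch sketch of the convolutional Q-network architecture described in DeepMind’s 2013 DQN paper: four stacked 84×84 preprocessed frames in, one Q-value per action out. PyTorch itself postdates the paper, and details like the action count are placeholders, so treat this as an approximation rather than the original implementation.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Maps a stack of 4 preprocessed 84x84 Atari frames to one
    Q-value per joystick action (per the 2013 DQN architecture)."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),  # 20x20 -> 9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per action
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)

# Greedy play: feed in the current frame stack, act on the argmax Q-value.
q_net = DQN(n_actions=6)           # 6 is a placeholder action count
state = torch.zeros(1, 4, 84, 84)  # batch of one frame stack
action = q_net(state).argmax(dim=1)
```

The same network shape works for every game because only the output layer depends on the action set – which is what let one architecture cover all seven games.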
Researchers from NVIDIA also explored the design and implementation of a CUDA port of the Arcade Learning Environment (ALE), a system for developing and evaluating deep reinforcement learning (DRL) algorithms using Atari games.
Doting with Dota 2
A few years ago, OpenAI created OpenAI Five, a team of five AI agents trained with reinforcement learning to play the MOBA game Dota 2 (the sequel to Defense of the Ancients) against human players. This project really showcased learning over time: in the span of a few months, the agents went from merely competing with high-ranking players to demolishing even the best players in the competition. I’m going to use that as my excuse for why I lost really, really badly when I (once) attempted Dota 2. Yep. AI.
Actually designing games with reinforcement learning
Reinforcement learning isn’t just for creating OP AI – it can be used to design games and game levels too. RL has been used for procedural content generation, meaning algorithmically generated levels, scenarios, bosses, and so on. For non-gamers, think of it as playing Mario where each new level is completely unique every time you play. For those really out of the loop, it’s like driving to work but your office is different every day you go in. It keeps things exciting.
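As a toy illustration of that idea, the sketch below frames level design itself as the RL problem: each action edits one tile, and the reward is how much the edit improved the level. The “playability” metric and everything else here are made up purely for illustration; real procedural-generation systems use far richer level representations and quality measures.

```python
import random

# Toy "level designer" problem: a level is a row of floor (".") and
# wall ("#") tiles. The agent edits one tile at a time and is rewarded
# when its edit makes the level more playable.
WIDTH = 8

def reachable(level):
    """Invented playability metric: floor tiles walkable right from cell 0."""
    count = 0
    for tile in level:
        if tile == "#":
            break
        count += 1
    return count

def design_step(level, position):
    """Action = toggle one tile; reward = change in playability."""
    before = reachable(level)
    level[position] = "." if level[position] == "#" else "#"
    return reachable(level) - before

# Random "designer" for demonstration; a real RL agent would learn a
# policy over these edit actions instead of sampling them uniformly.
level = [random.choice(".#") for _ in range(WIDTH)]
for _ in range(20):
    reward = design_step(level, random.randrange(WIDTH))
    # A learning agent would use `reward` here to update its edit policy.
print("".join(level), "| playable tiles:", reachable(level))
```

Swap the random edits for a trained policy and the reward for a real playability test, and you have the skeleton of RL-driven level generation.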
Hope for better AI
Here’s where I get really curious – Microsoft and Ninja Theory working together to make better AI. Project Paidia is a collaboration between the two that uses reinforcement learning to develop better gaming AI, such as for team-based games that depend on tight cooperation. They give the example of the Ninja Theory game Bleeding Edge, which requires players to work with AI to score points and achieve goals. It’s still early, but I’m hopeful that this will lead to better AI companions in gaming.
Taking over the galaxy in StarCraft II
Possibly the example that got the most news attention in recent memory: another DeepMind AI, AlphaStar, placed in the top 0.15% of all ranked players in StarCraft II, an incredibly popular online game with thousands of players known for being quite devoted and passionate about mastering the game.
Hope for the future
Looking across the reinforcement learning use cases above, a few common themes emerge in what RL can bring to gaming. We’re on the cusp of better random level generation, final-boss-tier AI in competition, and, most importantly, better AI companions. Not everything will improve by tomorrow, but with so many tech giants now involved in gaming, the future is bright for AI and gaming.
Learn more about reinforcement learning with Pieter Abbeel at ODSC West 2022.
There’s quite a lot that you can do with reinforcement learning – whether it’s related to these reinforcement learning use cases or not. The core skills can be used across a variety of purposes, from stock trading and finance to cybersecurity and art. Regardless of your application, there’s always a use for reinforcement learning. At ODSC West 2022, reinforcement learning expert and Professor at the University of California, Berkeley, Pieter Abbeel will present an in-depth tutorial on the fundamentals of Reinforcement Learning. This tutorial will cover the foundations of Deep Reinforcement Learning, including MDPs, DQN, Policy Gradients, TRPO, PPO, DDPG, SAC, TD3, model-based RL, as well as current research frontiers.
Session Outline:
Module 1: Introduction to Markov Decision Processes (MDPs) and Exact Solution Methods (which only apply to small problems)
Module 2: Deep Q Networks and Application to Atari
Module 3: Policy Gradients, Trust Region Policy Optimization (TRPO), Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradients (DDPG), Twin Delayed Deep Deterministic Policy Gradients (TD3), Soft Actor Critic (SAC) and Application to Robot Learning
Module 4: Model-based Reinforcement Learning
Module 5: Current Research Frontier
Looking for something a little sooner? Check out our upcoming webinar, “Foundations of Deep Reinforcement Learning” – an interview with Pieter Abbeel, PhD – on September 16th at 1 PM EST.