What is reinforcement learning?
We look at a method of AI development built on the idea of positive and negative feedback
One of the most compelling areas of artificial intelligence is reinforcement learning. It's a subset of machine learning (ML) that's become a popular technology to test on games, such as abstract board game Go, but its advancement could have wider benefits for the world.
It's a strand of AI that aims to replicate human-like abilities in machines and, in terms of games, it has risen to that challenge, taking on and beating a number of world champions in their respective fields.
The most recent example is Ke Jie. The Chinese player was a three-time world champion at Go, dominating the 'sport' from 2014 onwards. But in 2017 he was beaten three times by DeepMind's AlphaGo program, which used reinforcement learning to thwart him.
The year before, AlphaGo defeated 18-time international champion Lee Se-dol 4-1, although Lee won a single game and remains the only human to have beaten the machine. He announced his retirement in 2019, citing the dominance of AI and stating that it "cannot be defeated".
While gaming is the best use case to date, this subset of machine learning has huge potential for developing technologies such as robotics and automation. Any breakthroughs in reinforcement learning could have a significant impact on business and wider society.
What is RL?
Reinforcement learning is a method of training machine learning algorithms to find their own way of reaching complex end goals instead of making choices based on a preloaded list of possible decisions set by a programmer. Using positive and negative reinforcement, correct decisions made towards achieving a goal are rewarded while incorrect decisions are penalised. While in the case of a human, a reward may symbolise a treat of some kind, in the case of machine learning the reward is simply a positive evaluation of an action.
This differs from supervised learning, which has its limitations: it involves giving a machine learning algorithm a predefined set of decisions to choose from. Using the game of Go as an example, someone training the algorithm could give it a list of moves to make in a given scenario, which the program could then choose from. The problem with this model is that the algorithm becomes only as good as the human programming it, meaning the machine cannot learn by itself.
The goal of reinforcement learning is to train the algorithm to make sequential decisions towards an end goal; over time, through reinforcement, it learns to reach that goal in the most efficient way. When trained using reinforcement learning, artificial intelligence systems can draw on experience from many more decision trees than humans can, which makes them better at solving complex tasks, at least in gamified environments.
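The reward-and-penalty loop described above can be sketched in a few lines of code. The toy task, parameters and names below are all invented for illustration: an agent in a five-cell corridor is rewarded only for reaching the final cell, with a small penalty per step, so the shortest route scores best. It is a minimal tabular Q-learning sketch, not the approach any particular system uses.

```python
import random

# Hypothetical toy task: a 5-cell corridor. The agent starts in cell 0 and
# is rewarded (+1) only for reaching cell 4; every step costs -0.01, so the
# shortest route scores best. Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q-value table: q[state][action]

def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else -0.01  # positive/negative reinforcement
    return nxt, reward, nxt == GOAL

random.seed(0)
for _ in range(500):                        # episodes of trial and error
    state, done = 0, False
    while not done:
        if random.random() < EPSILON:
            action = random.randrange(2)    # occasionally explore
        else:
            action = max((0, 1), key=lambda a: q[state][a])  # exploit
        nxt, reward, done = step(state, action)
        # Reward correct moves, penalise wasteful ones (Q-learning update).
        q[state][action] += ALPHA * (reward + GAMMA * max(q[nxt]) - q[state][action])
        state = nxt

# After training, the greedy policy heads straight for the goal:
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # every non-goal state should prefer action 1 (right)
```

No list of correct moves is ever supplied; the table of values emerges purely from rewarded and penalised attempts, which is the distinction from supervised learning drawn above.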
Learning to win
Reinforcement learning shares some similarities with supervised learning in a classroom. A framework establishing the ground rules is still required, but the software agent is never told which instructions to follow, nor is it given a database to draw upon. This approach allows a system to create its own dataset from its actions, built through trial and error, to establish the most efficient route to a reward.
This is all done sequentially - a software agent will take one action at a time until it encounters a state for which it is penalised. For example, a virtual car leaving a road or track will produce an error state and revert the agent to its starting position.

For many processes, we don't actually need the system to learn to make new decisions as it develops, just to refine its data processing capabilities, as is the case with facial recognition technology. For some, however, reinforcement learning is by far the most beneficial form of development.
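The virtual car example above can be sketched as an episode loop. Everything here is made up for illustration: a car on a one-dimensional track of positions 0 to 5, where steering off the left edge is an error state that ends the episode and sends the car back to the start, and every attempted action is recorded into the agent's own dataset.

```python
import random

# Illustrative sketch (all names invented): a virtual car on a 1-D track.
# Position 5 is the finish; driving off the left edge is an error state
# that ends the episode and reverts the car to its starting position.
FINISH, START = 5, 0
random.seed(1)

dataset = []                              # (position, action, outcome)
for episode in range(20):
    pos = START                           # revert to starting position
    while True:
        action = random.choice([-1, +1])  # no instructions given upfront
        nxt = pos + action
        if nxt < 0:                       # left the track: penalised
            dataset.append((pos, action, "penalty"))
            break
        dataset.append((pos, action, "ok"))
        if nxt == FINISH:                 # reached the goal
            break
        pos = nxt

penalties = sum(1 for _, _, o in dataset if o == "penalty")
print(f"{len(dataset)} recorded actions, {penalties} crashes")
```

The point of the sketch is the structure, not the (deliberately random) driving: the agent builds its experience one action at a time, and every penalised episode still contributes data about what not to do.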
One of the most famous examples comes from Google's DeepMind, whose Deep Q-Learning algorithm was created to master Breakout, the classic 1970s Atari arcade game in which players smash through eight rows of blocks with a ball and paddle. During its development, the software agent was only provided with the information that appeared on screen and was tasked simply with maximising its score.
As you might expect, the agent struggled to get to grips with the game early on. Researchers found it was unable to grasp the controls and consistently missed the ball with the paddle. After a great deal of trial and error, the agent eventually figured out that if it angled the ball so that it became stuck between the highest layer of blocks and the top wall, it could break down the majority of the wall with only a small number of paddle hits. More than that, it appeared to understand that every time the ball travelled back to the paddle, the run became less efficient and the game longer.
The agent based its decisions on a policy network. Every action it took was recorded by the network, along with the result and what could be done differently to change that result. The result, also known as a state, can therefore be predicted by the agent.
Problems with reinforcement learning
The example above is useful for understanding the fundamental principles of reinforcement learning, but gaming environments, however large, offer limited scope for learning and rarely provide anything meaningful beyond simple testing.

Success in games is not easily translated into real-world use cases, particularly as the technique relies on reward and failure states that are often ambiguous in reality. Tasking an agent with solving a particular challenge within tight parameters is one thing; creating a realistic simulation that's applicable to everyday use is far harder.
If we take the example of an autonomous vehicle system, creating a simulation for it to learn from can be incredibly challenging. Not only does the simulation need to accurately represent a real-world road, and convey the various laws and restrictions that govern car use, but it also needs to take into account consistent changes in traffic volume, the sudden actions of other human drivers, and random obstacles.
There are also a variety of technical challenges that limit the potential of this type of learning. There are examples of systems 'forgetting' older actions, results and predictions when new knowledge is acquired. There's also the problem of agents successfully reaching a desired positive state, but doing so in an inefficient or undesired way. For example, in work by deepsense.ai that sought to teach an algorithm to run, the agent developed a tendency to jump instead, because jumping reached the rewarded state far more quickly.
The future of machine learning?
In the real world, there is a range of applications that RL could potentially revolutionise, but it would require agents to learn far more complicated environments than games offer. So, while it could accelerate automated software for robotics and factory machines, web system configuration, or even medical diagnosis, it might be some time before any real progress is made.
We are still some way off a machine being able to learn like a human, and reinforcement learning is not an easy technology to implement. But with time, it could be the driving force of future technology.