What is reinforcement learning?
We look at a method of AI development built on the idea of positive and negative feedback
The field of reinforcement learning has exploded in popularity in recent years. A number of compelling use cases have seen this strand of artificial intelligence not only replicate human-like abilities in machines, but even prove that software can beat world champions at their own games.
In 2017, Ke Jie, who had been the reigning world champion for abstract board game Go since 2014, was beaten three times by AlphaGo, a program built by Google's DeepMind that used reinforcement learning to outsmart the world's number one.
But it's not just used for figuring out games. This subset of machine learning (ML) relies on limited human instruction, which has huge potential for the development of robotics and automation. It could be the application that comes to fully define artificial intelligence, as it is where the 'learning' part of ML really happens.
What is RL?
Reinforcement learning is a method of training machine learning algorithms to find their own way of reaching complex end goals instead of making choices based on a preloaded list of possible decisions set by a programmer. Using positive and negative reinforcement, correct decisions made towards achieving a goal are rewarded while incorrect decisions are penalised. While in the case of a human, a reward may symbolise a treat of some kind, in the case of machine learning the reward is simply a positive evaluation of an action.
It differs from supervised learning, which has limitations of its own. Supervised learning involves, as mentioned previously, giving a machine learning algorithm a set of decisions to choose from. Using the game of Go as an example, someone training the algorithm could give it a list of moves to make in a given scenario, which the program could then choose from. The problem with this model is that the algorithm becomes only as good as the human programming it, which means the machine cannot learn by itself.
The goal of reinforcement learning is to train the algorithm to make sequential decisions that reach an end goal. Over time, reinforcement teaches the algorithm to make the decisions that reach that goal in the most efficient way. When trained using reinforcement learning, artificial intelligence systems can draw experience from many more decision trees than humans can, which makes them better at solving complex tasks, at least in gamified environments.
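One common way to express "reaching the goal in the most efficient way" numerically is to discount rewards the later they arrive. The short Python sketch below is illustrative only: the discount factor gamma and the two example reward sequences are assumptions made for this example and do not come from any system described in this article.

```python
def discounted_return(rewards, gamma=0.9):
    """Add up a sequence of rewards, shrinking each one by gamma per step of delay."""
    total = 0.0
    for step, reward in enumerate(rewards):
        total += (gamma ** step) * reward
    return total


# Two hypothetical episodes that both end at the goal (reward 1).
short_path = [0, 0, 1]            # goal reached on the third step
long_path = [0, 0, 0, 0, 0, 1]    # goal reached on the sixth step

short_score = discounted_return(short_path)
long_score = discounted_return(long_path)
```

Because the same final reward is worth less the longer it takes to earn, an agent that maximises this quantity is nudged towards the shorter route.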
Learning to win
Reinforcement learning shares many similarities with supervised learning in a classroom. A framework establishing the ground rules is still required, but the software agent is never told what instructions it should follow, nor is it given a database to draw upon. This type of approach allows a system to create its own dataset from its actions, built using trial and error, to establish the most efficient route to a reward.
This is all done sequentially - a software agent will take one action at a time until it encounters a state for which it is penalised. For example, a virtual car leaving a road or track will produce an error state, and the car is returned to its starting position.
For many processes, we don't actually need the system to learn to make new decisions as it develops, but rather just to refine its data processing capabilities, as is the case with facial recognition technology. However, for some, reinforcement learning is by far the most beneficial form of development.
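The act-until-penalised loop described above, using the virtual car as the example, can be sketched roughly as follows. Everything here is hypothetical: the ToyTrack class, its reward values and the random choice of actions are assumptions made for illustration, and a real agent would use what it has collected to pick better actions rather than acting at random.

```python
import random


class ToyTrack:
    """Hypothetical road: the car's sideways offset must stay within +/- half_width."""

    def __init__(self, half_width=2):
        self.half_width = half_width
        self.offset = 0

    def reset(self):
        # Put the car back at the centre of the road.
        self.offset = 0
        return self.offset

    def step(self, action):
        # action is -1 (steer left), 0 (straight on) or +1 (steer right).
        self.offset += action
        off_road = abs(self.offset) > self.half_width
        reward = -1.0 if off_road else 0.1  # penalise leaving the road
        return self.offset, reward, off_road


def collect_experience(env, episodes=5, max_steps=20, seed=0):
    """Act one step at a time, restarting whenever the car leaves the road."""
    rng = random.Random(seed)
    experience = []  # the dataset the agent builds from its own actions
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = rng.choice([-1, 0, 1])
            next_state, reward, done = env.step(action)
            experience.append((state, action, reward, next_state))
            if done:
                break  # error state: the next episode starts from the beginning
            state = next_state
    return experience


experience = collect_experience(ToyTrack())
```

Each entry in experience records a state, the action taken, the reward received and the state that followed, which is the kind of self-generated dataset the agent learns from.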
One of the most famous examples is the case of Google's DeepMind, which uses a Deep Q-Learning algorithm. This was created to master Atari Breakout, the classic 70s arcade game, in which players are required to smash through eight rows of blocks with a ball and paddle. During its development, the software agent was only provided with the information that appeared on screen and was tasked with simply maximising its score.
As you might expect, the agent struggled to get to grips with the game early on. Researchers found it was unable to grasp the controls and consistently missed the ball with the paddle. After a great deal of trial and error, the agent eventually figured out that if it angled the ball so that it became stuck between the highest layer and the top wall, it could break down the majority of the wall with only a small number of paddle hits. Not only that, it was able to understand that each time the ball travelled back to the paddle, the efficiency of the run dropped, and the length of the game increased.
The agent based its decisions on a policy network. Every action taken by the agent was recorded by the network, which also noted the result and what could be done differently to change that result. The result, also known as a state, could therefore be predicted by the agent.
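Deep Q-Learning uses a neural network to estimate how valuable each action is, but the underlying idea is easier to see with a plain lookup table. The sketch below is tabular Q-learning on a made-up five-cell corridor, not DeepMind's Breakout agent; the environment and the values of alpha, gamma and epsilon are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # move one cell left or right


def step(state, action):
    """Hypothetical environment: reward 1 for reaching the last cell, else 0."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated long-term value
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore: try something random
            else:
                # Exploit: pick the best-known action, breaking ties at random.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate towards the reward received
            # plus the discounted value of the best action in the next state.
            future = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best-known action for every non-terminal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]


q_table = train()
policy = greedy_policy(q_table)
```

After training, policy lists the preferred move in each cell. The agent is never told which direction the goal lies in; it works that out from the rewards alone.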
Problems with reinforcement learning
The example above is useful for understanding the fundamental principles of reinforcement learning, but gaming environments, no matter how large, only offer limited scope for learning and rarely offer anything meaningful beyond simple testing.
Success in a game is not always easily translated into real-world use cases, particularly as reinforcement learning relies on a system of reward and failure states that are often ambiguous in reality. Tasking an agent with solving a particular challenge within tight parameters is one thing, but creating a realistic simulation that's applicable for everyday use is far harder.
If we take the example of an autonomous vehicle system, creating a simulation for it to learn from can be incredibly challenging. Not only does the simulation need to accurately represent a real-world road, and convey the various laws and restrictions that govern car use, but it also needs to take into account constant changes in traffic volume, the sudden actions of human drivers, and random obstacles.
There are also a variety of technical challenges that limit the potential of this type of learning. There are examples of systems 'forgetting' older actions, results and predictions when new knowledge is acquired. There's also a problem with agents successfully achieving a desired positive state, but doing so in an inefficient or undesired way. For example, recent work by deepsense.ai, which sought to teach an algorithm to run, found that the agent developed a tendency to jump instead, as this brought it to its future positive state far more quickly.
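The jumping problem comes down to how the reward is written. The sketch below is a toy illustration rather than a reconstruction of the deepsense.ai experiment: the Stride class, both reward functions and every number in it are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Stride:
    """One hypothetical movement step; the values used below are made up."""
    forward_metres: float
    is_jump: bool


def naive_reward(stride):
    # Rewards forward progress only, so any movement that covers ground scores well.
    return stride.forward_metres


def shaped_reward(stride, jump_penalty=1.0):
    # Same progress term, minus a penalty whenever the step is a jump.
    penalty = jump_penalty if stride.is_jump else 0.0
    return stride.forward_metres - penalty


run = Stride(forward_metres=1.0, is_jump=False)
jump = Stride(forward_metres=1.5, is_jump=True)

prefers_jump_when_naive = naive_reward(jump) > naive_reward(run)
prefers_run_when_shaped = shaped_reward(run) > shaped_reward(jump)
```

With the naive reward, jumping scores higher simply because it covers more ground, so an agent maximising it has no reason to run. Adding a penalty changes which behaviour wins, though choosing such penalties by hand is itself a source of ambiguity.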
The future of machine learning?
Gaming environments, no matter how large they are, offer a limited scale for machine learning and are really only useful for testing. In the real world, there is a range of applications that RL could potentially revolutionise, but it would require agents to learn far more complicated environments. So, while it could accelerate automated software for robotics and factory machines, web system configuration, or even medical diagnosis, it might be some time before any real progress is made.
We are still some way off a machine being able to learn like a human, and reinforcement learning is not an easy technology to implement. But with time, it could be the driving force of future technology.