Reinforcement Learning: Optimising Data-Driven Decision-making by Rewarding Positive Outcomes

AI Dec 14, 2021

You may have heard of the term reinforcement learning (RL) in different contexts such as gaming and robotics. However, many people are not entirely clear on what RL is and the types of challenges for which it presents a great solution.

Let’s start with understanding what reinforcement learning is using a simple non-technical and real-world example: training a dog to learn new behaviours.

If you present the dog with a treat at around the same time as you make a sound (using a whistle or a clicker), the dog will gradually become conditioned to expect a treat whenever it hears that sound. This is essentially “reinforcing” an association between a stimulus, a response and a reward. Over time, the dog will become accustomed to the sound and respond every time it hears it. With this technique, you can train your dog to learn certain behaviours and be rewarded for them.

Now let’s apply this “treat-and-behave” model to the context of reinforcement learning. In this scenario, the dog becomes the agent, the treat becomes the reward, and good behaviour is the ‘optimal solution’ or outcome we are seeking. The terms “agent” and “reward system” might be unfamiliar to you, but we will get to them later.

What is Reinforcement Learning (RL)?

Reinforcement learning is a branch of machine learning based on rewarding desired behaviours and punishing undesired ones. Referring to our example, one would reward the dog’s good behaviour with a treat and ‘punish’ undesired behaviour by withholding it. In this example, the dog is the agent: an entity that decides how to act based on the rewards and punishments it receives. To put this another way, the starting state and the desired outcome are known, but there may be many ways for the agent to get from one to the other. Reinforcement learning algorithms try to find the best route from the starting state to the desired outcome. To do so, an RL agent must be able to perceive and interpret its environment, take actions and learn through trial and error. By using the reward signal to identify which actions move it towards its goal and which hold it back, the agent can choose the helpful actions more often.
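To make the perceive, act and learn cycle more concrete, here is a minimal sketch in Python. It is only an illustration: the five-cell corridor, the names `env_step` and `run_episode`, and the reward values are assumptions made up for this example, and the agent here acts purely at random rather than learning anything yet.

```python
import random

GOAL = 4  # hypothetical goal: the right-hand end of a 5-cell corridor (cells 0..4)


def env_step(position, action):
    """Toy environment: move left (-1) or right (+1).

    Returns (next_position, reward, done).
    """
    nxt = min(max(position + action, 0), GOAL)  # walls at both ends
    if nxt == GOAL:
        return nxt, 1.0, True     # desired outcome reached: positive reward
    return nxt, -0.1, False       # small penalty for every extra step


def run_episode(seed=0, max_steps=1000):
    """One episode of pure trial and error (random actions)."""
    rng = random.Random(seed)
    position, total_reward, steps, done = 0, 0.0, 0, False
    while not done and steps < max_steps:
        action = rng.choice((-1, 1))                          # the agent acts
        position, reward, done = env_step(position, action)   # the environment responds
        total_reward += reward                                # feedback an agent could learn from
        steps += 1
    return position, total_reward, steps


final_position, total_reward, steps = run_episode()
print(final_position, round(total_reward, 2), steps)
```

A learning agent would replace the random choice with a rule that gradually favours the actions that led to higher rewards.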

The key to this machine learning method is an appropriate reward system. Such a system assigns positive values to desired actions to encourage the agent and negative values to undesired ones. Because the agent tries to maximise its total reward over the long term rather than the immediate reward of any single action, it is pushed towards the optimal solution.
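One common way to express “long-term reward” is a discounted sum, in which rewards that arrive later count slightly less than rewards that arrive now. The sketch below is only an illustration: the function name `discounted_return`, the discount factor of 0.9 and the two reward sequences are assumptions chosen for this example.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, where the reward at step t is weighted by gamma ** t."""
    total = 0.0
    for r in reversed(rewards):   # work backwards: G_t = r_t + gamma * G_{t+1}
        total = r + gamma * total
    return total


# Two hypothetical sequences of rewards for the same task.
greedy_path = [1.0, -5.0, 0.0]    # tempting first step, costly afterwards
patient_path = [-1.0, 0.0, 10.0]  # small cost up front, large payoff later

print(discounted_return(greedy_path), discounted_return(patient_path))
```

An agent that only looked at the first reward would prefer the first sequence; one that compares the discounted totals would prefer the second.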

To learn from the reward system, a Q-table is often used. It lists every action the agent could take in each situation (state) it might encounter and stores an estimate of the long-term reward of each one, which the agent updates as it plays. Take the game Flappy Bird as an example. Listing the possible moves for this game is simple: click the mouse or don’t click. Clicking makes the bird fly up; not clicking lets the bird drop. Telling the agent where the bird is relative to the next pillar, rewarding it for staying alive and penalising it for crashing is then enough to establish the reward system, enabling an RL system to learn how to play the game.
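As a rough illustration of how a Q-table can be filled in, the sketch below applies tabular Q-learning to a heavily simplified stand-in for Flappy Bird. Everything specific here is an assumption made for the example: there are no pillars, the only state is the bird's altitude (0 to 4), flying above 4 or dropping below 0 counts as a crash, and the names (`step`, `train`), reward values and learning settings are hypothetical.

```python
import random

ACTIONS = (0, 1)  # 0 = don't click (bird drops), 1 = click (bird rises)
MAX_ALT = 4       # valid altitudes are 0..MAX_ALT; outside that range the bird crashes


def step(altitude, action):
    """Toy dynamics: return (next_altitude, reward, done)."""
    nxt = altitude + 1 if action == 1 else altitude - 1
    if nxt < 0 or nxt > MAX_ALT:
        return altitude, -10.0, True   # crashed into the floor or the ceiling
    return nxt, 1.0, False             # survived one more step


def train(episodes=2000, max_steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Q-table: one row per state (altitude), one column per action.
    q = [[0.0, 0.0] for _ in range(MAX_ALT + 1)]
    for _ in range(episodes):
        s = rng.randint(0, MAX_ALT)    # start each episode at a random altitude
        for _ in range(max_steps):
            # Epsilon-greedy: mostly follow the table, sometimes explore.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                a = 0 if q[s][0] >= q[s][1] else 1
            s2, r, done = step(s, a)
            # Q-learning update: nudge the estimate towards
            # reward + discounted value of the best next action.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
            s = s2
    return q


q_table = train()
policy = [0 if row[0] >= row[1] else 1 for row in q_table]  # best action per altitude
print(policy)
```

After training, the table should favour clicking when the bird is at the lowest altitude and not clicking when it is at the highest, which is the behaviour that avoids a crash in this toy setting.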

A breakthrough in RL

One of the most significant accomplishments of RL is AlphaGo by DeepMind. Go is a board game that originated in China over 3000 years ago and requires multiple layers of strategy. Two players, using either white or black stones, take turns placing their stones on a board. The goal is to surround and capture the opponent's stones or strategically create spaces of territory. Once all possible moves have been played, both the stones on the board and the empty points are tallied. The player with the higher total wins. AlphaGo is a computer program that combines advanced search trees with deep neural networks. These neural networks take a description of the Go board as an input and process it through a number of different network layers containing millions of neuron-like connections. What made AlphaGo famous was a match in October 2015 against the reigning three-time European Champion Fan Hui. AlphaGo won 5-0, the first time a computer program had beaten a professional Go player.

Until this event, no AI had beaten a professional human player at Go on a full-sized board without a handicap, even though computers had already beaten top human players at other board games such as chess. AlphaGo demonstrated the power of RL even in the most challenging of domains. The success of AlphaGo pushed the limits of AI capability and inspired new interest in the potential of RL.

But the applications of RL are not limited to 3000-year-old board games: there are many real-life use cases and benefits that can be derived from RL.

Reinforcement Learning in Healthcare

Reinforcement learning presents an opportunity to transform healthcare by recommending treatments based on policies that RL systems learn from data on previous patients, treatments and outcomes. Given the adaptive nature of reinforcement learning, this technique could offer several advantages over many control-based systems, especially in chronic disease care.

In this scenario, the input is a set of clinical observations and assessments of a patient. The outputs are the treatment options for each stage of the patient’s condition. A significant advantage of RL is its ability to make time-dependent decisions, choosing the best treatment for a patient at a specific time, which means that patients with chronic diseases could experience less stress and more confidence with the assistance of RL systems. For example, an RL system could help diabetes patients monitor their blood sugar level throughout the day and suggest appropriate diet, exercise and insulin intake.


Reinforcement learning is an exciting domain of AI that promises to unlock many opportunities in areas such as gaming, robotics, healthcare and more. Hopefully this brief insight into the power and application of RL will help inspire others to consider its transformative potential.

If you wish to learn more about reinforcement learning or another area or application of artificial intelligence, please connect with the team at Brainpool by sending us a message at [email protected]

Written by: Clayton Black and Alex Shen

Brainpool AI

Brainpool is an artificial intelligence consultancy specialising in developing bespoke AI solutions for business.