Reinforcement Learning

Also known as: RL, Reward-Based Learning

Reinforcement learning (RL) is a machine learning paradigm in which an agent learns to make decisions by interacting with an environment, receiving numerical rewards for desirable actions and penalties (negative rewards) for undesirable ones. The agent's goal is to learn a policy, a mapping from states to actions, that maximizes cumulative long-term reward. RL has achieved superhuman performance in chess, Go, and many Atari video games, and is used in robotics, recommendation systems, and large language model alignment.
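The interaction loop and the idea of cumulative reward can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of any library: the four-state corridor environment, the `step` and `run_episode` helpers, the reward values, and the use of a discount factor `gamma` to weight later rewards.

```python
from typing import Dict, List, Tuple

State = int
Action = str


def step(state: State, action: Action) -> Tuple[State, float, bool]:
    """Hypothetical corridor with states 0..3; reaching state 3 ends the episode."""
    next_state = min(state + 1, 3) if action == "right" else max(state - 1, 0)
    # Assumed rewards: +1.0 for reaching the goal, a small penalty for every other move.
    reward = 1.0 if next_state == 3 else -0.1
    return next_state, reward, next_state == 3


def run_episode(policy: Dict[State, Action], start: State = 0,
                max_steps: int = 20) -> List[float]:
    """Let the agent act according to `policy` and collect the rewards it receives."""
    state, rewards = start, []
    for _ in range(max_steps):
        action = policy[state]                     # policy: state -> action
        state, reward, done = step(state, action)  # environment responds
        rewards.append(reward)
        if done:
            break
    return rewards


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma**t * r_t, accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


always_right = {0: "right", 1: "right", 2: "right"}
rewards = run_episode(always_right)
total = discounted_return(rewards, gamma=0.9)
```

Here the policy is a plain dictionary, which is only practical for tiny state spaces. A learning algorithm would adjust that mapping so that the discounted return grows over time.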

Key Formula

Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]

LaTeX: Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]

| Symbol | Meaning | Unit |
| --- | --- | --- |
| Q(s,a) | Expected cumulative reward for state s and action a | dimensionless |
| \alpha | Learning rate | dimensionless (0 to 1) |
| r | Immediate reward received | dimensionless |
| \gamma | Discount factor (weight of future rewards) | dimensionless (0 to 1) |
| Q(s',a') | Q-value of the next state-action pair | dimensionless |
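As a rough illustration of the update rule above, the following sketch applies it to a dictionary-backed Q-table. The function name `q_update`, the state and action labels, the default values of `alpha` and `gamma`, and the `done` flag are all assumptions made for this example. The `done` flag, which drops the bootstrap term on terminal transitions, is a common convention that the formula itself does not spell out.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, r: float, s_next: Hashable,
             next_actions: Iterable[Hashable], alpha: float = 0.5,
             gamma: float = 0.9, done: bool = False) -> float:
    """Apply one Q-learning update to q in place and return the new Q(s, a)."""
    # max over a' of Q(s', a'); assumed to be 0 when s' is terminal.
    best_next = 0.0 if done else max(q[(s_next, a2)] for a2 in next_actions)
    # Bracketed term of the formula: r + gamma * max Q(s', a') - Q(s, a).
    td_error = r + gamma * best_next - q[(s, a)]
    q[(s, a)] += alpha * td_error
    return q[(s, a)]


q: QTable = defaultdict(float)  # unseen state-action pairs start at 0.0
actions = ("left", "right")

# Transition into a terminal state with reward 1.0:
# 0 + 0.5 * (1.0 + 0 - 0) = 0.5
first = q_update(q, s=2, a="right", r=1.0, s_next=3,
                 next_actions=actions, done=True)

# Transition into state 2 with reward 0.0; its best known value is now 0.5:
# 0 + 0.5 * (0.0 + 0.9 * 0.5 - 0) = 0.225
second = q_update(q, s=1, a="right", r=0.0, s_next=2, next_actions=actions)
```

The second update shows how value propagates backwards: state 1 received no reward of its own, yet its Q-value rises because the next state already looks promising.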

Reinforcement Learning Algorithms Comparison

| Algorithm | Type | Key Feature | Notable Application |
| --- | --- | --- | --- |
| Q-Learning | Model-free, value-based | Tabular Q-table | Simple grid worlds |
| DQN | Deep model-free | Neural network Q-function | Atari games (DeepMind) |
| Policy Gradient (REINFORCE) | Policy-based | Direct policy optimization | Robotics control |
| PPO | Actor-critic | Stable, scalable updates | ChatGPT RLHF alignment |
| AlphaZero | Model-based + MCTS | Self-play tree search | Chess, Go, Shogi |

Interactive Tools

OpenAI Gym / Gymnasium

Standard Python toolkit for developing and comparing RL algorithms

Spinning Up in Deep RL (OpenAI)

Educational resource for learning deep reinforcement learning

Brilliant.org — Machine Learning

Interactive introduction to RL concepts including rewards and policies

Figure: The agent-environment interaction loop in reinforcement learning (Wikimedia Commons, CC BY-SA).

Related Terms

"Reinforcement" derives from the verb "reinforce", formed from "re-" (again) and "enforce" on the model of French "renforcer" (to strengthen), ultimately from Latin "fortis" (strong). The term "reinforcement learning" was established in the AI literature largely through the work of Richard Sutton and Andrew Barto, whose 1998 textbook "Reinforcement Learning: An Introduction" became the standard reference.

reinforcement-learning, agent, reward, policy, deep-rl