Reinforcement Learning

Also known as: RL, Reward-Based Learning

Reinforcement learning (RL) is a machine learning paradigm in which an agent learns to make decisions by interacting with an environment. After each action the agent receives a numerical reward: positive for desirable outcomes, negative (a penalty) for undesirable ones. The agent's goal is to learn a policy, that is, a mapping from states to actions, that maximizes the expected cumulative reward over time, usually with future rewards discounted. RL has achieved superhuman performance in games such as chess, Go, and Atari video games, and is used in robotics, recommendation systems, and large language model alignment.
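The interaction loop and the notion of cumulative discounted reward can be sketched in a few lines of Python. This is a minimal illustration, not a standard API: the `CoinFlipEnv` class, its 80% bias, the five-step episode length, and the `run_episode` helper are all hypothetical names and values chosen for the example.

```python
import random

def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma**t for its time step t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

class CoinFlipEnv:
    """Hypothetical toy environment: guess a biased coin for 5 steps."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.steps_left = 5

    def reset(self):
        self.steps_left = 5
        return 0  # a single dummy state

    def step(self, action):
        outcome = 1 if self.rng.random() < 0.8 else 0  # assumed 80% bias
        reward = 1.0 if action == outcome else -1.0    # reward or penalty
        self.steps_left -= 1
        done = self.steps_left == 0
        return 0, reward, done

def run_episode(env, policy):
    """Run one episode: the policy maps each state to an action."""
    state = env.reset()
    rewards = []
    done = False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        rewards.append(reward)
    return rewards

# A fixed policy that always guesses 1, regardless of state.
rewards = run_episode(CoinFlipEnv(seed=0), lambda state: 1)
print(rewards, discounted_return(rewards))
```

A learning agent would replace the fixed policy with one that is adjusted from the observed rewards.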

Key Formula

Q-learning update: Q(s,a) ← Q(s,a) + alpha * [r + gamma * max over a' of Q(s',a') - Q(s,a)]

LaTeX: Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]

Symbol	Meaning	Unit
Q(s,a)	Estimated expected cumulative (discounted) reward for taking action a in state s	dimensionless
\alpha	Learning rate	dimensionless (0 to 1)
r	Immediate reward received	dimensionless
\gamma	Discount factor (weight of future rewards)	dimensionless (0 to 1)
Q(s',a')	Q-value of the next state s' and a candidate next action a' (the update takes the maximum over a')	dimensionless
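The update rule can be tried on a small problem. The sketch below assumes a hypothetical five-state chain in which the agent starts at state 0, moves left or right, and receives a reward of 1 only on reaching the last state. The function names, the epsilon-greedy exploration scheme, and all hyperparameter values are illustrative choices rather than part of the formula.

```python
import random

def q_update(q, s, a, r, s_next, alpha, gamma):
    """Apply one Q-learning update to the table q in place."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])

def train_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical chain; actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])      # exploit, breaking ties at random
                a = rng.choice([i for i in range(2) if q[s][i] == best])
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            q_update(q, s, a, r, s_next, alpha, gamma)
            s = s_next
    return q

# One hand-checkable step: 0 + 0.1 * (1 + 0.9 * 2 - 0) is approximately 0.28.
table = [[0.0, 0.0], [0.0, 2.0]]
q_update(table, s=0, a=1, r=1.0, s_next=1, alpha=0.1, gamma=0.9)
print(table[0][1])

q = train_chain()
greedy_policy = [row.index(max(row)) for row in q[:-1]]
print(greedy_policy)
```

The terminal state's row is never updated and stays at zero, so the target for the final transition reduces to the immediate reward.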

Reinforcement Learning Algorithms Comparison

Algorithm	Type	Key Feature	Notable Application
Q-Learning	Model-free, value-based	Tabular Q-function (Q-table)	Simple grid worlds
DQN	Model-free, value-based (deep)	Neural network Q-function	Atari games (DeepMind)
Policy Gradient (REINFORCE)	Model-free, policy-based	Direct policy optimization	Robotics control
PPO	Model-free, actor-critic	Stable, scalable policy updates	RLHF alignment of language models (e.g., ChatGPT)
AlphaZero	Model-based (known game rules) + MCTS	Self-play tree search	Chess, Go, Shogi
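To contrast the policy-based rows with the value-based ones, here is a minimal REINFORCE-style sketch. It assumes a hypothetical two-armed bandit with fixed rewards (0 for the first arm, 1 for the second) and a softmax policy over per-arm preferences. The function names, step count, and learning rate are illustrative, and no baseline is used. Instead of estimating Q-values, it adjusts the policy parameters directly along the gradient of the log-probability of the chosen action, scaled by the reward.

```python
import math
import random

def softmax(prefs):
    """Convert preferences into action probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(arm_rewards=(0.0, 1.0), steps=2000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical bandit with deterministic arm rewards."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_rewards)  # policy parameters
    for _ in range(steps):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = arm_rewards[action]
        for i in range(len(prefs)):
            # Gradient of log softmax: indicator(i == action) - probs[i]
            grad_log_pi = (1.0 if i == action else 0.0) - probs[i]
            prefs[i] += lr * reward * grad_log_pi
    return softmax(prefs)

probs = reinforce_bandit()
print(probs)  # the probability of the rewarding arm should dominate
```

Methods such as PPO build on the same idea but add a learned value estimate and constrain the size of each policy update.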

Interactive Tools

OpenAI Gym / Gymnasium

Standard Python toolkit for developing and comparing RL algorithms

Spinning Up in Deep RL (OpenAI)

Educational resource for learning deep reinforcement learning

Brilliant.org — Machine Learning

Interactive introduction to RL concepts including rewards and policies

Related Terms

Computer Science

Machine Learning

Machine learning is a branch of artificial intelligence in which systems learn from data to improve their performance on tasks without being explicitly programmed for each task. It works by identifying statistical patterns in training data and using those patterns to make predictions or decisions on new, unseen data. Machine learning powers applications ranging from spam filters and recommendation engines to medical diagnosis and autonomous vehicles.

Computer Science

Neural Network

A neural network is a computational model loosely inspired by the structure of biological brains, consisting of layers of interconnected nodes (neurons) that process and transform data. Each neuron computes a weighted sum of its inputs, applies a non-linear activation function, and passes the result to the next layer. Neural networks are the foundation of modern AI and are capable of learning highly complex patterns in images, text, audio, and tabular data.

Computer Science

Deep Learning

Deep learning is a subset of machine learning that uses neural networks with many hidden layers (hence "deep") to automatically extract hierarchical representations from raw data. Lower layers learn low-level features (edges, phonemes), while deeper layers combine them into increasingly abstract concepts (faces, words). Deep learning has revolutionized computer vision, natural language processing, and speech recognition, achieving human-level or superhuman performance on many benchmarks.

"Reinforcement" comes from the verb "reinforce", formed from "re-" (again) and "enforce" (to strengthen), which traces back through Old French to Latin "fortis" (strong). The term reinforcement learning was established in the AI literature through the work of Richard Sutton and Andrew Barto, whose 1998 textbook "Reinforcement Learning: An Introduction" became the field's standard reference.

Tags: reinforcement-learning, agent, reward, policy, deep-rl