A neural network is a computational model loosely inspired by the structure of biological brains, consisting of layers of interconnected nodes (neurons) that process and transform data. Each neuron computes a weighted sum of its inputs, adds a bias, applies a non-linear activation function, and passes the result to the next layer. Neural networks underpin most modern AI systems and can learn complex patterns in images, text, audio, and tabular data.
Activation at layer l = sigma( W[l] * activation[l-1] + bias[l] )
LaTeX: a^{(l)} = \sigma\!\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)
| Symbol | Meaning | Unit |
|---|---|---|
| a^{(l)} | Activation vector at layer l | dimensionless |
| \sigma | Non-linear activation function (e.g., ReLU, sigmoid) | dimensionless |
| W^{(l)} | Weight matrix at layer l | dimensionless |
| a^{(l-1)} | Activation vector from previous layer | dimensionless |
| b^{(l)} | Bias vector at layer l | dimensionless |
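
To make the layer formula concrete, here is a minimal sketch of a single dense layer in plain Python. The weights, biases, and inputs are made-up illustrative values, and the helper name `layer_forward` is hypothetical rather than taken from any library. Sigmoid is used for \sigma only as an example choice.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, one possible choice for the activation sigma."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(W, a_prev, b, activation=sigmoid):
    """Compute a = sigma(W a_prev + b) for one layer.

    W is a list of rows (one row per neuron), a_prev is the previous
    layer's activation vector, and b holds one bias per neuron.
    """
    # Weighted sum plus bias for each neuron in the layer
    z = [sum(w * a for w, a in zip(row, a_prev)) + b_i
         for row, b_i in zip(W, b)]
    # Apply the activation element-wise
    return [activation(z_i) for z_i in z]

# Illustrative values only: a layer with 2 neurons and 3 inputs
W = [[0.2, -0.5, 0.1],
     [0.7, 0.3, -0.4]]
b = [0.0, 0.1]
a_prev = [1.0, 2.0, 3.0]

a = layer_forward(W, a_prev, b)
print(a)  # approximately [0.378, 0.550]
```

Feeding the returned vector into another call to `layer_forward` with a second weight matrix would correspond to moving from layer l to layer l+1.
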
| Function | Formula | Output Range | Typical Use |
|---|---|---|---|
| Sigmoid | 1 / (1 + e^{-x}) | (0, 1) | Binary classification output |
| Tanh | (e^x − e^{−x}) / (e^x + e^{−x}) | (−1, 1) | Hidden layers (older nets) |
| ReLU | max(0, x) | [0, ∞) | Default hidden layer activation |
| Leaky ReLU | max(0.01x, x) | (−∞, ∞) | Avoids dying ReLU problem |
| Softmax | e^{x_i} / Σ_j e^{x_j} | (0, 1) summing to 1 | Multi-class output layer |
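
The formulas in the table can be written out directly. The following is a rough sketch using only Python's standard library. The function names are illustrative, the `slope` parameter simply exposes the 0.01 from the table, and subtracting the maximum inside `softmax` is a common numerical-stability trick that is not part of the formula above (it does not change the result).

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Equivalent to (e^x - e^-x) / (e^x + e^-x)
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    # For 0 < slope < 1 this returns x when x >= 0 and slope * x otherwise
    return max(slope * x, x)

def softmax(xs):
    # Shift by the max so exp() never sees large positive values
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(sigmoid(0.0))      # 0.5
print(relu(-3.0))        # 0.0
print(sum(probs))        # very close to 1.0
```

Note that softmax operates on a whole vector, while the other functions are applied element by element.
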
TensorFlow Playground
Interactive browser-based neural network visualizer for experimentation
Khan Academy: Neural Networks
Introductory lessons on neural networks and pattern recognition
3Blue1Brown: Neural Networks (YouTube)
Visually intuitive video series explaining how neural networks work
Wikimedia Commons, CC BY-SA
Deep learning is a subset of machine learning that uses neural networks with many hidden layers (hence "deep") to automatically extract hierarchical representations from raw data. Lower layers learn low-level features (edges, phonemes), while deeper layers combine them into increasingly abstract concepts (faces, words). Deep learning has transformed computer vision, natural language processing, and speech recognition, matching or exceeding human performance on a number of benchmarks.
Backpropagation (backward propagation of errors) is the algorithm used to train neural networks by efficiently computing the gradient of the loss function with respect to every weight in the network. It applies the chain rule of calculus in a reverse pass through the network — from the output layer back to the input layer — so that each weight can be updated in the direction that reduces the loss. Without backpropagation, training deep neural networks with millions of parameters would be computationally infeasible.
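As a small illustration of the reverse pass, the sketch below hand-derives the gradients for a deliberately tiny network (one input, one sigmoid hidden unit, one linear output, squared-error loss) and compares them against finite-difference estimates. The network shape, the parameter values, and all function names are assumptions chosen for the example and do not come from any particular framework.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(params, x):
    """Forward pass: x -> sigmoid hidden unit -> linear output."""
    w1, b1, w2, b2 = params
    z1 = w1 * x + b1
    a1 = sigmoid(z1)
    y_hat = w2 * a1 + b2
    return a1, y_hat

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * (y_hat - y) ** 2

def backprop(params, x, y):
    """Apply the chain rule from the output back toward the input."""
    w1, b1, w2, b2 = params
    a1, y_hat = forward(params, x)
    d_yhat = y_hat - y               # dL/dy_hat
    d_w2 = d_yhat * a1               # dL/dw2
    d_b2 = d_yhat                    # dL/db2
    d_a1 = d_yhat * w2               # gradient passed back to the hidden unit
    d_z1 = d_a1 * a1 * (1.0 - a1)    # sigmoid'(z1) = a1 * (1 - a1)
    d_w1 = d_z1 * x                  # dL/dw1
    d_b1 = d_z1                      # dL/db1
    return [d_w1, d_b1, d_w2, d_b2]

def numerical_grad(params, x, y, eps=1e-6):
    """Central finite differences, used only to sanity-check backprop."""
    grads = []
    for i in range(len(params)):
        plus = list(params)
        minus = list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, x, y) - loss(minus, x, y)) / (2 * eps))
    return grads

params = [0.5, -0.3, 0.8, 0.1]   # illustrative w1, b1, w2, b2
x, y = 1.5, 1.0                  # a single made-up training example

analytic = backprop(params, x, y)
numeric = numerical_grad(params, x, y)
for name, g_a, g_n in zip(["w1", "b1", "w2", "b2"], analytic, numeric):
    print(name, g_a, g_n)        # the two columns should agree closely
```

The finite-difference check needs two extra loss evaluations per parameter, whereas `backprop` obtains all four gradients from one forward and one backward pass. That difference is what makes the approach practical for networks with millions of weights.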
Gradient descent is an iterative optimization algorithm that minimizes a function (such as a neural network's loss function) by repeatedly moving the parameters in the direction opposite to the gradient of the function at the current point. Because the gradient points in the direction of steepest ascent, subtracting a small multiple of it (scaled by a step size called the learning rate) moves the parameters toward a local (or global) minimum. Variants like Stochastic Gradient Descent (SGD) and Adam are the workhorses of modern deep learning training.
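The update rule can be sketched on a one-dimensional toy problem. The function f(w) = (w - 3)^2, the starting point, the learning rate of 0.1, and the step count are all assumptions picked for illustration. This is plain (full-batch) gradient descent, not SGD or Adam.

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient: w <- w - learning_rate * grad(w)."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w

def f(w):
    # Toy objective with its minimum at w = 3
    return (w - 3.0) ** 2

def grad_f(w):
    # Derivative of f with respect to w
    return 2.0 * (w - 3.0)

w_min = gradient_descent(grad_f, w0=0.0)
print(w_min, f(w_min))  # w_min ends up very close to 3, f(w_min) close to 0
```

With this objective each step shrinks the distance to the minimum by a constant factor of 0.8. A learning rate above 1.0 would make the iterates overshoot and diverge, which is why the step size has to be chosen with care.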
The concept of the artificial neural network is commonly traced to Warren McCulloch and Walter Pitts and their 1943 paper "A Logical Calculus of the Ideas Immanent in Nervous Activity," which modeled neurons as simple logical units. "Neural" derives from the Greek "neuron" (nerve, sinew), the same root used to name biological neurons.