Transfer Learning

Also known as: Fine-Tuning, Domain Adaptation, Pre-trained Model Adaptation

Transfer learning is a machine learning technique in which a model trained on a large source task is adapted, typically by fine-tuning, to a different but related target task, reusing previously learned representations instead of training from scratch. Because features such as edge detectors in vision models or syntactic patterns in language models carry over, the new task usually needs far less labelled data and computation. Transfer learning is foundational to modern AI: pre-trained models such as ResNet, BERT, and GPT can be fine-tuned for specialised applications with small datasets.
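
The idea of reusing learned representations can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: the two-row `PRETRAINED_ENCODER` stands in for a real pre-trained backbone, the six-point `TARGET_DATA` set is made up, and the new head is a simple logistic regression. Only the head is trained; the encoder weights are never touched.

```python
import math

# Hypothetical "pre-trained" encoder weights, standing in for a backbone
# learned on a large source task. They are never updated below.
PRETRAINED_ENCODER = [[1.0, -1.0], [0.5, 0.5]]


def encode(x, encoder):
    """Frozen encoder: linear projection followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in encoder]


def predict(x, encoder, head, bias):
    """Probability of class 1 from the new head on top of frozen features."""
    feats = encode(x, encoder)
    z = sum(h * f for h, f in zip(head, feats)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(data, encoder, head, bias):
    """Average binary cross-entropy, clamped to avoid log(0)."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = min(max(predict(x, encoder, head, bias), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_head(data, encoder, epochs=200, lr=0.5):
    """Train only the head (logistic regression) with plain SGD."""
    head, bias = [0.0] * len(encoder), 0.0
    for _ in range(epochs):
        for x, y in data:
            feats = encode(x, encoder)
            err = predict(x, encoder, head, bias) - y  # gradient of loss w.r.t. logit
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias


# Small made-up target dataset: the label is 1 when x[0] > x[1].
TARGET_DATA = [([2, 0], 1), ([1, -1], 1), ([3, 1], 1),
               ([0, 2], 0), ([-1, 1], 0), ([1, 3], 0)]

loss_before = mean_loss(TARGET_DATA, PRETRAINED_ENCODER, [0.0, 0.0], 0.0)
head, bias = train_head(TARGET_DATA, PRETRAINED_ENCODER)
loss_after = mean_loss(TARGET_DATA, PRETRAINED_ENCODER, head, bias)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Because the first encoder feature already responds to the difference `x[0] - x[1]`, six labelled examples are enough for the head to separate the classes. This mirrors, on a very small scale, why a good pre-trained representation reduces the data needed for the target task.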

Transfer Learning Strategies and When to Use Them

Strategy | Layers Frozen | New Data Size | Similarity to Source Task | Example
Feature extraction | All (use as fixed encoder) | Small | High | ImageNet → medical X-ray
Fine-tuning (shallow) | All but last 2–3 layers | Small–Medium | Medium | BERT → sentiment
Fine-tuning (deep) | None or few early layers | Medium–Large | Low | GPT → code generation
Domain adaptation | Partially frozen | Medium | Domain shift only | English → French NLP
Multi-task learning | Shared backbone | Variable | Multiple tasks | Object detect + segment
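
The "Layers Frozen" column for the first three strategies can be sketched as a small helper that marks which layers stay trainable. This is an illustrative sketch, not a library API: the function name `freeze_plan`, the strategy strings, the layer names, and the `top_k` and `frozen_early` parameters are all hypothetical. Domain adaptation and multi-task learning are left out because the table does not say which layers they freeze.

```python
def freeze_plan(backbone_layers, strategy, top_k=2, frozen_early=0):
    """Return {layer_name: is_trainable} for a backbone plus a new task head.

    backbone_layers: layer names ordered from input (early) to output (late).
    top_k:           number of late layers left trainable in shallow fine-tuning.
    frozen_early:    number of early layers kept frozen in deep fine-tuning.
    """
    n = len(backbone_layers)
    if strategy == "feature_extraction":
        n_frozen = n                      # whole backbone is a fixed encoder
    elif strategy == "shallow_fine_tuning":
        n_frozen = max(n - top_k, 0)      # all but the last top_k layers
    elif strategy == "deep_fine_tuning":
        n_frozen = min(frozen_early, n)   # none, or a few early layers
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    plan = {name: i >= n_frozen for i, name in enumerate(backbone_layers)}
    plan["head"] = True                   # the new task head is always trained
    return plan


# Hypothetical layer names for a five-layer backbone.
layers = ["embed", "block1", "block2", "block3", "block4"]
for strategy in ("feature_extraction", "shallow_fine_tuning", "deep_fine_tuning"):
    trainable = [name for name, on in freeze_plan(layers, strategy).items() if on]
    print(strategy, "->", trainable)
```

In a real framework the same plan would be applied by switching off gradient updates for the frozen layers; the exact mechanism depends on the library in use.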

Interactive Tools

TensorFlow Transfer Learning Tutorial

Hugging Face Fine-Tuning Guide

Kaggle Transfer Learning Course

Figure: Diagram illustrating transfer learning from a large pre-trained model to a smaller fine-tuned model (Wikimedia Commons, CC BY-SA)

Related Terms

Computer Science

Convolutional Neural Network

A convolutional neural network (CNN) is a deep learning architecture designed for processing structured grid data such as images, using learnable convolutional filters that detect spatial features like edges, textures, and shapes. The network stacks convolutional layers (feature extraction) with pooling layers (spatial downsampling) and fully connected layers (classification). CNNs revolutionised computer vision after AlexNet won the ImageNet competition in 2012 with significantly lower error rates than prior methods.
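
As a rough illustration of how a convolutional filter detects an edge, the sketch below slides a 3×3 kernel over a small image in plain Python. The function name `conv2d_valid`, the image, and the hand-set vertical-edge kernel are assumptions for illustration; in a trained CNN the kernel values are learned rather than chosen by hand. As in most deep learning libraries, the operation shown is technically cross-correlation (the kernel is not flipped).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 4x6 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-set filter that responds to a dark-to-bright transition from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)
print(feature_map)
```

The feature map is zero over the flat regions and positive only where the window straddles the boundary, which is the sense in which a filter "detects" an edge.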

Transformer (AI)

The Transformer is a deep learning architecture introduced by Vaswani et al. in 2017 that relies entirely on self-attention rather than recurrence or convolutions, modelling relationships between all positions in a sequence in parallel. The original design is an encoder–decoder built from multi-head attention, positional encodings, and feed-forward layers; later models often keep only one half (BERT is encoder-only, GPT is decoder-only). Transformers are the foundation of modern large language models including BERT, GPT, T5, and PaLM, and have also been applied to vision, audio, and multimodal tasks.
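
The core self-attention step can be sketched in plain Python as scaled dot-product attention, softmax(QKᵀ/√d)·V. This is a simplified, single-head sketch: the function names are illustrative, and the learned query/key/value projections, multiple heads, and positional encodings of a real Transformer are deliberately omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    Returns (outputs, weights), where weights[i][j] is how strongly
    position i attends to position j.
    """
    d = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Self-attention: the same sequence supplies queries, keys, and values.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(sequence, sequence, sequence)
for row in weights:
    print([round(w, 3) for w in row])
```

Each output position is a weighted average of all value vectors, and every row of weights is computed independently of the others, which is what allows all positions to be processed in parallel.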

Regularization (ML)

Regularization in machine learning refers to techniques that add a penalty term to the loss function to discourage model complexity, thereby reducing overfitting and improving generalisation to unseen data. The two most common forms are L1 (Lasso) regularization, which promotes sparsity by penalising the absolute values of weights, and L2 (Ridge) regularization, which penalises the squared values, shrinking all weights toward zero. Regularization is a fundamental concept in statistical learning theory, closely tied to the bias–variance trade-off.
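
The penalty terms described above can be written out explicitly. The symbols are notational assumptions rather than part of the original entry: $\mathcal{L}(\mathbf{w})$ is the unregularized loss, $\mathbf{w} = (w_1, \dots, w_n)$ the model weights, and $\lambda \ge 0$ the regularization strength.

```latex
\begin{align}
  \mathcal{L}_{\text{L1}}(\mathbf{w}) &= \mathcal{L}(\mathbf{w}) + \lambda \sum_{i=1}^{n} \lvert w_i \rvert
  && \text{(Lasso: penalises absolute values, promotes sparsity)} \\
  \mathcal{L}_{\text{L2}}(\mathbf{w}) &= \mathcal{L}(\mathbf{w}) + \lambda \sum_{i=1}^{n} w_i^{2}
  && \text{(Ridge: penalises squared values, shrinks weights toward zero)}
\end{align}
```

Setting $\lambda = 0$ recovers the unregularized loss, and larger values of $\lambda$ trade a closer fit to the training data for smaller weights.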

The concept of "transfer" in learning was discussed by Lorien Pratt in 1993 in the context of neural networks, and later formalised by Bengio, Hinton, and others. "Transfer" derives from Latin transferre (to carry across); "learning" from Old English leornian.

transfer-learning, fine-tuning, pre-trained-models, deep-learning, few-shot-learning