Transfer learning
Reuse pretrained torchvision models on your own dataset to reach strong accuracy with very little data and very little training time.
Goal of the lesson
By the end of this 3-hour session you should be able to:
- explain why pretrained features generalize across vision tasks,
- distinguish feature extraction from fine-tuning,
- load a pretrained torchvision model and inspect its parts,
- replace the classifier head and freeze the backbone,
- apply the model’s own preprocessing transforms,
- train a classifier on a small dataset and reach >90% accuracy in minutes,
- compare two architectures and report on the winner.
Suggested timing
| Block | Topic |
|---|---|
| 15 min | What transfer learning is and why it works |
| 25 min | Load a pretrained model, inspect its layers |
| 25 min | Replace head, freeze backbone, train |
| 30 min | Evaluate, predict, save |
| 25 min | Fine-tuning the backbone |
| 60 min | Capstone — pretrained model on your own dataset |
Why transfer learning
Training a CNN from scratch needs a lot of data and a lot of compute. The first few layers of any vision network learn very generic features — edges, corners, textures, colors — that are useful for almost any image task. The deeper layers combine those into more task-specific patterns.
Transfer learning is the trick that makes deep learning practical for small projects:
- take a model pretrained on a large dataset (typically ImageNet, ~1.3M images, 1000 classes),
- keep most of its weights,
- retrain only the final classifier on your data.
Two flavors:
| Approach | Trainable parameters | When to use |
|---|---|---|
| Feature extraction | Only the new classifier head | Small dataset, similar domain. Default starting point. |
| Fine-tuning | The new head + (some of) the backbone, with a tiny learning rate | Larger dataset, or domain that drifts noticeably from ImageNet (medical, satellite, drawings). |
We’ll start with feature extraction and add fine-tuning at the end.
Setup
uv init --python 3.12 transfer
cd transfer
uv add torch torchvision matplotlib pillow requests
We’ll reuse the pizza/steak/sushi dataset from the datasets chapter.
Load a pretrained model
torchvision.models exposes architectures plus their pretrained weights. Each set of weights advertises:
- the model itself,
- the transforms it expects at inference time,
- metadata (number of parameters, ImageNet accuracy).