Course Outline
CSCI 4052U — Machine Learning II
Course Description
This course builds on the foundations of machine learning to explore modern deep learning architectures for computer vision, natural language processing, and generative modelling. Students progress from basic neural network building blocks through increasingly sophisticated architectures, culminating in state-of-the-art generative AI systems.
Unit 1: Preliminaries
Review of core deep learning building blocks that serve as the foundation for all subsequent units.
- Linear Networks: Linear regression as a neural network, regularisation techniques
- Linear Classification: Softmax classifiers, cross-entropy loss
- Multilayer Perceptrons: Hidden layers, activation functions, universal approximation
- Convolutional Networks: Convolution operations, pooling, feature maps
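Two of the building blocks above, the softmax classifier and cross-entropy loss, can be sketched in a few lines of plain Python. This is an illustrative sketch, not code from the course materials:

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    # Negative log-likelihood of the true class.
    return -math.log(probs[label])

probs = softmax([2.0, 1.0, 0.1])   # a probability distribution over 3 classes
loss = cross_entropy(probs, 0)     # loss when class 0 is the true label
```

The higher the probability the model assigns to the true class, the smaller the loss, which is exactly the training signal used for linear classification and beyond.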
Unit 2: ConvNets for Image Classification
A survey of landmark convolutional neural network architectures that defined the modern era of computer vision.
- Early ConvNets: AlexNet, VGGNet, Network-in-Network, GoogLeNet/Inception
- ResNet: Residual connections, skip connections, deep network training
- DenseNet: Dense connectivity patterns, feature reuse
- MobileNet: Depthwise separable convolutions, efficient architectures for deployment
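The residual connection that makes ResNet-style deep training possible reduces to a single addition. A minimal sketch (the `transform` argument stands in for the block's convolutional layers, which are omitted here):

```python
def residual_block(x, transform):
    # ResNet's key idea: add the input back onto the transformed output,
    # so the block learns a residual f(x) rather than a full mapping.
    return [xi + fi for xi, fi in zip(x, transform(x))]

# With a zero transform the block is the identity mapping, which is why
# very deep stacks of residual blocks remain easy to optimise.
identity_out = residual_block([1.0, 2.0], lambda v: [0.0] * len(v))
doubled = residual_block([1.0, 2.0], lambda v: list(v))
```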
Unit 3: ConvNets for Object Detection
Extending image classification to the more challenging task of localising and classifying multiple objects within an image.
- The R-CNN Family: R-CNN, Fast R-CNN, Faster R-CNN — from selective search to learned region proposals
- SSD (Single Shot MultiBox Detector): Multi-scale feature maps, default boxes, single-pass detection
- YOLO (v1 and v2): Real-time object detection, grid-based prediction, architectural improvements
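All of the detectors above are trained and evaluated with intersection-over-union (IoU) between predicted and ground-truth boxes. A minimal sketch, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(box_a, box_b):
    # Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle; clamp to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

overlap = iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))   # partial overlap
disjoint = iou((0.0, 0.0, 1.0, 1.0), (2.0, 2.0, 3.0, 3.0))  # no overlap
```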
Unit 4: Language Modelling
Transitioning from vision to sequence modelling, covering the architectures that underpin modern NLP.
- Attention and Transformers: Self-attention mechanism, multi-head attention, positional encodings, the Transformer architecture
- Landmark Architectures: GPT (autoregressive), BERT (masked language modelling), ViT (vision transformers), T5 (text-to-text)
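The self-attention mechanism at the heart of the Transformer can be sketched in plain Python: scores are dot products of queries and keys scaled by the square root of the head dimension, softmaxed over keys, then used to weight the values. This sketch is single-head and unbatched for clarity:

```python
import math

def attention(Q, K, V):
    # Scaled dot-product attention over lists of vectors:
    # for each query, softmax(q . k / sqrt(d)) weights a sum of values.
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs; it should attend
# more strongly to the key it is most similar to.
out = attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]])
```

Multi-head attention runs several such maps in parallel on learned projections and concatenates the results; positional encodings are added to the inputs so the otherwise order-invariant mechanism can see sequence position.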
Unit 5: Generative AI
The probabilistic and architectural foundations of modern generative models, progressing from theory to state-of-the-art systems.
- Probabilistic Models: Probability distributions, discriminative vs. generative models, explicit vs. implicit density, maximum likelihood, KL divergence, latent variable models
- Variational Autoencoders: Autoencoders, latent variable models, the ELBO, reparameterisation trick, generation and disentanglement
- Diffusion Models: Forward (noising) and reverse (denoising) processes, diffusion kernels, ELBO simplification to noise prediction, U-Net architecture, time embeddings, DDIMs, cascaded generation, conditional guidance
- CLIP (Contrastive Language-Image Pre-training): Dual-encoder architecture, contrastive training objective, zero-shot classification, text conditioning for diffusion models
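The forward (noising) process from the diffusion-models topic admits a closed-form sample at any timestep: given the cumulative noise schedule value, x_t is a weighted mix of the clean input and Gaussian noise. A minimal sketch, where `alpha_bar_t` stands for the cumulative product of the schedule up to step t (the schedule itself is an assumption here, not fixed by the outline):

```python
import math
import random

def forward_diffuse(x0, alpha_bar_t, rng):
    # One-shot sample from q(x_t | x_0) in a DDPM-style forward process:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise.
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * rng.gauss(0.0, 1.0) for x in x0]

rng = random.Random(0)
clean = [1.0] * 4
slightly_noised = forward_diffuse(clean, 0.99, rng)    # early step: close to x_0
fully_noised = forward_diffuse(clean, 0.001, rng)      # late step: nearly pure noise
```

The reverse (denoising) model is then trained to predict the injected noise, which is the "ELBO simplification to noise prediction" listed above.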