Course Outline

CSCI 4052U — Machine Learning II

Course Description

This course builds on the foundations of machine learning to explore modern deep learning architectures for computer vision, natural language processing, and generative modelling. Students progress from basic neural network building blocks through increasingly sophisticated architectures, culminating in state-of-the-art generative AI systems.

Unit 1: Preliminaries

Review of core deep learning building blocks that serve as the foundation for all subsequent units.

  • Linear Networks: Linear regression as a neural network, regularisation techniques
  • Linear Classification: Softmax classifiers, cross-entropy loss
  • Multilayer Perceptrons: Hidden layers, activation functions, universal approximation
  • Convolutional Networks: Convolution operations, pooling, feature maps
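As a minimal sketch of the softmax classifier and cross-entropy loss covered in this unit, the following NumPy snippet computes class probabilities from logits and the mean negative log-likelihood of the true labels (the example logits and labels are illustrative, not from any course dataset):

```python
import numpy as np

def softmax(z):
    # Subtract the row-wise max before exponentiating, for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-probability assigned to the true class of each example.
    n = probs.shape[0]
    return -np.log(probs[np.arange(n), labels]).mean()

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])
labels = np.array([0, 1])

probs = softmax(logits)
loss = cross_entropy(probs, labels)
```

Each row of `probs` is a valid distribution over the three classes, and the loss shrinks toward zero as the model places more mass on the correct labels.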

Unit 2: ConvNets for Image Classification

A survey of landmark convolutional neural network architectures that defined the modern era of computer vision.

  • Early ConvNets: AlexNet, VGGNet, Network-in-Network, GoogLeNet/Inception
  • ResNet: Residual connections, skip connections, deep network training
  • DenseNet: Dense connectivity patterns, feature reuse
  • MobileNet: Depthwise separable convolutions, efficient architectures for deployment
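To make ResNet's central idea concrete, here is a simplified residual block using fully connected layers in place of convolutions (an illustration of the skip connection only, not the actual ResNet architecture): the input is added back to the transformed features, so when the residual branch outputs zero the block reduces to the identity.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    # Simplified residual block: out = relu(x + F(x)), where F is a small
    # two-layer transform. The skip path lets gradients bypass F entirely.
    h = relu(x @ W1)
    return relu(x + h @ W2)

x = np.array([[1.0, 2.0]])
W1 = np.ones((2, 3))
W2 = np.zeros((3, 2))   # residual branch outputs zero -> block acts as identity
out = residual_block(x, W1, W2)
```

This identity-when-zero behaviour is what makes very deep networks trainable: each block only has to learn a *correction* to its input.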

Unit 3: ConvNets for Object Detection

Extending image classification to the more challenging task of localising and classifying multiple objects within an image.

  • The R-CNN Family: R-CNN, Fast R-CNN, Faster R-CNN — from selective search to learned region proposals
  • SSD (Single Shot MultiBox Detector): Multi-scale feature maps, default boxes, single-pass detection
  • YOLO (v1 and v2): Real-time object detection, grid-based prediction, architectural improvements
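A quantity shared by all the detectors above, though not named in the list, is intersection-over-union (IoU), used to match predicted boxes to ground truth and to suppress duplicates. A minimal implementation for axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(a, b):
    # Intersection-over-union of two boxes (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```

For example, two 2×2 boxes overlapping in a unit square have IoU 1/7 (intersection 1, union 4 + 4 − 1 = 7).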

Unit 4: Language Modelling

Transitioning from vision to sequence modelling, covering the architectures that underpin modern NLP.

  • Attention and Transformers: Self-attention mechanism, multi-head attention, positional encodings, the Transformer architecture
  • Landmark Architectures: GPT (autoregressive), BERT (masked language modelling), ViT (vision transformers), T5 (text-to-text)

Unit 5: Generative AI

The probabilistic and architectural foundations of modern generative models, progressing from theory to state-of-the-art systems.

  • Probabilistic Models: Probability distributions, discriminative vs. generative models, explicit vs. implicit density, maximum likelihood, KL divergence, latent variable models
  • Variational Autoencoders: Autoencoders, latent variable models, the ELBO, reparameterisation trick, generation and disentanglement
  • Diffusion Models: Forward (noising) and reverse (denoising) processes, diffusion kernels, ELBO simplification to noise prediction, U-Net architecture, time embeddings, DDIMs, cascaded generation, conditional guidance
  • CLIP (Contrastive Language-Image Pre-training): Dual-encoder architecture, contrastive training objective, zero-shot classification, text conditioning for diffusion models
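The forward (noising) process of a diffusion model can be sampled in closed form, which is what makes training tractable: x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε with ε ~ N(0, I). The sketch below uses a DDPM-style linear noise schedule as an illustrative choice; the specific schedule values are an assumption, not taken from the course materials.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    # Sample x_t from q(x_t | x_0) directly, without simulating every step:
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # linear schedule (illustrative values)
alpha_bar = np.cumprod(1.0 - betas)     # cumulative product of (1 - beta_t)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(16,))
xt, eps = forward_diffuse(x0, t=T - 1, alpha_bar=alpha_bar, rng=rng)
```

Note that ᾱ_t decreases monotonically from near 1 to near 0, so early steps barely perturb x₀ while the final x_T is close to pure Gaussian noise; the reverse model is then trained to predict ε from x_t, which is the "ELBO simplification to noise prediction" listed above.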