Convnets for image classification

We discuss convolutional neural network architectures for the task of image classification.

Early Convolutional Neural Network Architectures: AlexNet, NiN, VGG, and GoogLeNet

Abstract

This lecture explores the foundational architectures in the modern era of deep convolutional neural networks (CNNs) for image classification. We begin with AlexNet (2012), examining its pioneering combination of a deep stack of convolutional layers, ReLU activations, and GPU-accelerated training. We then discuss Network-in-Network (NiN, 2013), which introduced 1x1 convolutions and Global Average Pooling to replace heavy fully-connected layers. We proceed to VGG (2014), which demonstrated the power of depth using modular, uniform 3x3 convolutions. Finally, we analyze GoogLeNet (Inception v1, 2014), which introduced parallel processing paths within the Inception module to increase depth and width without a corresponding explosion in computational cost.
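
To make the Inception idea concrete, the following is a minimal sketch of an Inception-style block with four parallel paths whose outputs are concatenated along the channel axis. It assumes PyTorch, and the channel counts are illustrative rather than a faithful reproduction of the published GoogLeNet configuration.

```python
# Minimal sketch of an Inception-style block (parallel paths, concatenated on the
# channel axis). Assumes PyTorch; channel counts are illustrative only.
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Path 1: 1x1 convolution
        self.p1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Path 2: 1x1 reduction followed by 3x3 convolution
        self.p2 = nn.Sequential(
            nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Path 3: 1x1 reduction followed by 5x5 convolution
        self.p3 = nn.Sequential(
            nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(inplace=True),
            nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Path 4: 3x3 max pooling followed by 1x1 projection
        self.p4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # All paths preserve spatial size, so outputs concatenate on channels.
        return torch.cat([self.p1(x), self.p2(x), self.p3(x), self.p4(x)], dim=1)

block = InceptionBlock(192, 64, 96, 128, 16, 32, 32)
out = block(torch.randn(1, 192, 28, 28))
print(out.shape)  # torch.Size([1, 256, 28, 28]); 64 + 128 + 32 + 32 = 256
```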

Deep Residual Learning for Image Recognition

Abstract

Training substantially deeper neural networks is challenging due to the degradation problem, where accuracy saturates and then degrades rapidly as depth increases. This document explores the deep residual learning framework, which reformulates layers as learning residual functions with reference to the layer inputs. By utilizing identity shortcut connections, these residual networks (ResNets) are easier to optimize and can gain accuracy from considerably increased depth, achieving state-of-the-art results on ImageNet and COCO datasets.
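
The residual reformulation is easiest to see in code: the stacked layers learn a residual function F(x), and the block outputs F(x) + x through an identity shortcut. The sketch below assumes PyTorch, and the layer sizes are illustrative rather than a specific ResNet configuration.

```python
# Minimal sketch of a residual block with an identity shortcut: the stacked
# layers learn F(x) and the block outputs F(x) + x. Assumes PyTorch;
# sizes are illustrative, not a specific ResNet configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        residual = self.bn2(self.conv2(F.relu(self.bn1(self.conv1(x)))))
        # Identity shortcut: add the input back before the final nonlinearity.
        return F.relu(residual + x)

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```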

Densely Connected Convolutional Networks (DenseNet)

Abstract

This lecture explores the Densely Connected Convolutional Network (DenseNet), an architecture that builds upon the insights of residual learning to maximize information flow and feature reuse in deep neural networks. By connecting every layer to every subsequent layer within a dense block, DenseNet alleviates the vanishing gradient problem, strengthens feature propagation, and substantially reduces the number of parameters required compared to traditional architectures. We examine the motivation behind this connectivity pattern, formalize the architecture using concatenation rather than summation, and analyze its training behavior and efficiency on benchmarks like CIFAR and ImageNet.
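
The key difference from residual learning is concatenation rather than summation: each layer receives the concatenation of all preceding feature maps and contributes a fixed number of new channels (the growth rate). Below is a minimal sketch of a dense block, assuming PyTorch; the BN-ReLU-Conv ordering follows the DenseNet paper, but the sizes are illustrative.

```python
# Minimal sketch of a dense block: each layer receives the concatenation of all
# preceding feature maps and contributes `growth_rate` new channels.
# Assumes PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch, growth_rate, num_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            ch = in_ch + i * growth_rate
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth_rate, 3, padding=1, bias=False)))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # Concatenate (not sum) everything produced so far, then append
            # the new feature maps.
            features.append(layer(torch.cat(features, dim=1)))
        return torch.cat(features, dim=1)

block = DenseBlock(in_ch=16, growth_rate=12, num_layers=4)
print(block(torch.randn(1, 16, 32, 32)).shape)  # torch.Size([1, 64, 32, 32]); 16 + 4*12
```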

MobileNets: Architectural Efficiency as a First-Class Design Principle

Abstract

This document presents the MobileNet architecture, a class of efficient convolutional neural networks designed for mobile and embedded vision applications. Unlike approaches that rely on compressing existing large models, MobileNets are built on a streamlined architecture that uses depthwise separable convolutions to construct lightweight deep neural networks. We analyze the mathematical formulation of depthwise separable convolutions, quantifying the reduction in computation and parameters compared to standard convolutions. Furthermore, we introduce two global hyperparameters—width multiplier and resolution multiplier—that allow model developers to explicitly trade off latency and accuracy, effectively treating efficiency as a parameterized design variable rather than an afterthought.
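
The sketch below illustrates a depthwise separable convolution (a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution) and compares its multiply-add count against a standard convolution, recovering the familiar 1/N + 1/D_K^2 cost ratio. It assumes PyTorch, and the feature-map and channel sizes are illustrative rather than taken from a specific MobileNet layer.

```python
# Minimal sketch of a depthwise separable convolution (3x3 depthwise per channel,
# then a 1x1 pointwise convolution), plus a multiply-add comparison against a
# standard convolution. Assumes PyTorch; sizes are illustrative.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # Depthwise: one 3x3 filter per input channel (groups=in_ch).
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                                   groups=in_ch, bias=False)
        # Pointwise: 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1, self.bn2 = nn.BatchNorm2d(in_ch), nn.BatchNorm2d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.bn1(self.depthwise(x)))
        return self.relu(self.bn2(self.pointwise(x)))

# Multiply-add counts for a D_F x D_F feature map, D_K x D_K kernel, M -> N channels.
D_K, M, N, D_F = 3, 64, 128, 56
standard = D_K * D_K * M * N * D_F * D_F
separable = D_K * D_K * M * D_F * D_F + M * N * D_F * D_F
print(separable / standard)  # equals 1/N + 1/D_K**2, roughly an 8-9x reduction for 3x3 kernels
```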
