Lecture 2.1
Steerable Basis
Welcome to Module 2! This module focuses on Steerable Group Convolutions, a class of methods that represent a highly efficient and mathematically elegant approach to equivariant deep learning. In this first lecture, we introduce the foundational concept: the Steerable Basis.
Overview of Module 2
Throughout this module, we will focus primarily on the 2D case ($SO(2)$) to build intuition, before moving to the more complex 3D setting ($SO(3)$) in Module 3. The roadmap for this module is:
- 2.1 Steerable Basis: What are steerable kernels?
- 2.2 Regular G-Conv: How to use them in standard group convolutions.
- 2.3 & 2.4 Group Theory: Irreducible representations, Fourier transforms, and induced representations.
- 2.5 Steerable G-CNNs: The full framework.
- 2.6 General Design: Activation functions and architecture design.
- 2.7 Harmonic Networks: A deep dive into a classic paper.
1. Steerable Kernels and Basis Functions
The core idea of steerable methods is to avoid discretizing the group (e.g., rotating an image on the pixel grid, which requires interpolation) and instead work with a basis whose transformation under the group is known analytically.
A vector of basis functions $\mathbf{\Psi}(\mathbf{x}) = (\psi_1(\mathbf{x}), \dots, \psi_N(\mathbf{x}))^T$ is called steerable if a transformation of the domain (e.g., a rotation $\mathbf{R}$) can be exactly mimicked by a linear transformation of the basis functions themselves.
Formally, for any group element $g \in G$, there exists a matrix $\rho(g)$ such that:
$$ \mathbf{\Psi}(g^{-1} \cdot \mathbf{x}) = \rho(g) \mathbf{\Psi}(\mathbf{x}) $$
Here, $\rho(g)$ is a representation of the group $G$. It tells us how the basis functions "mix" when rotated.
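As a small numerical check of this definition, the sketch below uses a hypothetical example basis that is not introduced in the lecture: the two quadratic polynomials $\mathbf{\Psi}(x, y) = (x^2 - y^2,\ 2xy)^T$. In polar coordinates these are $r^2\cos(2\alpha)$ and $r^2\sin(2\alpha)$, so under the convention above the steering matrix should be a planar rotation by $-2\theta$. The function names (`rotation`, `psi`, `rho`, `steering_error`) are illustrative only.

```python
import numpy as np

def rotation(theta):
    """2x2 matrix rotating the plane counter-clockwise by theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def psi(x):
    """Hypothetical example basis: Psi(x, y) = (x^2 - y^2, 2xy)."""
    return np.array([x[0] ** 2 - x[1] ** 2, 2.0 * x[0] * x[1]])

def rho(theta):
    """Assumed steering matrix for this basis: a rotation by -2 * theta."""
    return rotation(-2.0 * theta)

def steering_error(theta, x):
    """Largest deviation between Psi(g^{-1} x) and rho(g) Psi(x)."""
    lhs = psi(rotation(theta).T @ x)  # basis evaluated on the rotated domain
    rhs = rho(theta) @ psi(x)         # linear mixing of the original basis values
    return float(np.max(np.abs(lhs - rhs)))

# Check the identity at a few random angles and points.
rng = np.random.default_rng(0)
errors = [
    steering_error(rng.uniform(0.0, 2.0 * np.pi), rng.normal(size=2))
    for _ in range(5)
]
max_error = max(errors)
print(max_error)
```

The error should sit at floating-point round-off level, which is what "exactly mimicked by a linear transformation" means in practice: no resampling of the domain is involved.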
2. Circular Harmonics (The Fourier Basis on the Ring)
In the 2D setting, the group of interest is $SO(2)$ (rotations). The natural steerable basis for functions on the circle $S^1$ is given by the Circular Harmonics (the Fourier basis), indexed by an integer frequency $m$:
$$ Y_m(\alpha) = e^{i m \alpha} $$
These functions are perfectly steerable. Rotating the domain by an angle $\theta$ corresponds to a simple phase shift in the codomain:
$$ Y_m(\alpha - \theta) = e^{i m (\alpha - \theta)} = e^{-i m \theta} \, Y_m(\alpha) $$
Here, the "matrix" $\rho(g)$ is just the complex scalar $e^{-i m \theta}$. This is an Irreducible Representation (Irrep) of $SO(2)$. Ideally, we want to represent any angular function as a weighted sum of these harmonics:
$$ f(\alpha) = \sum_{m} w_m Y_m(\alpha) $$
3. Polar Separability in $\mathbb{R}^2$
Images exist on $\mathbb{R}^2$, not just the circle. However, rotations in $\mathbb{R}^2$ only affect the angular coordinate, leaving the radius invariant. We can therefore use Polar Coordinates $(r, \alpha)$ and construct Separable Kernels:
$$ K(r, \alpha) = \phi(r) \cdot \psi(\alpha) $$
- Radial Part $\phi(r)$: Only depends on distance from origin. Can be any function (e.g., Gaussian, B-Splines, or a learned MLP). It is invariant to rotations.
- Angular Part $\psi(\alpha)$: Only depends on angle. We expand this in our steerable Circular Harmonic basis.
This allows us to rotate the entire 2D kernel $K(\mathbf{x})$ simply by transforming the angular weights, which is exact (no interpolation or resampling is needed) and computationally cheap.
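The following is a minimal sketch of that claim, not a reference implementation. It builds a separable kernel $K(r, \alpha) = \phi(r) \sum_m w_m e^{i m \alpha}$ with an assumed Gaussian radial profile and random complex weights, then compares two ways of rotating it by $\theta$: evaluating the kernel at inverse-rotated coordinates, and phase-shifting the weights, $w_m \mapsto e^{-i m \theta} w_m$. The frequency range, grid, and function names are all illustrative choices.

```python
import numpy as np

def radial_profile(r, sigma=1.0):
    """Assumed radial part phi(r): a Gaussian (any function of r would do)."""
    return np.exp(-0.5 * (r / sigma) ** 2)

def kernel(x, y, weights, freqs):
    """K(r, alpha) = phi(r) * sum_m w_m exp(i m alpha), sampled at points (x, y)."""
    r = np.hypot(x, y)
    alpha = np.arctan2(y, x)
    angular = sum(w * np.exp(1j * m * alpha) for w, m in zip(weights, freqs))
    return radial_profile(r) * angular

def steer_weights(weights, freqs, theta):
    """Rotate the kernel by theta via a phase shift of each angular weight."""
    return weights * np.exp(-1j * freqs * theta)

freqs = np.array([-2, -1, 0, 1, 2])
rng = np.random.default_rng(0)
weights = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)

# An even number of samples keeps the origin off the grid, where the
# angle alpha is undefined.
coords = np.linspace(-2.0, 2.0, 8)
x, y = np.meshgrid(coords, coords)

theta = np.pi / 5
c, s = np.cos(theta), np.sin(theta)
# Inverse-rotated coordinates R_theta^{-1} (x, y).
x_rot = c * x + s * y
y_rot = -s * x + c * y

direct = kernel(x_rot, y_rot, weights, freqs)                        # rotate the domain
steered = kernel(x, y, steer_weights(weights, freqs, theta), freqs)  # rotate the weights
max_diff = float(np.max(np.abs(direct - steered)))
print(max_diff)
```

Both routes should agree to round-off, for any angle $\theta$ rather than only multiples of 90 degrees. The radial profile never enters the steering step, which reflects its rotation invariance.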
4. Real-Valued Representations
In practice, we often prefer to work with real numbers rather than complex ones. We can replace the complex basis $e^{i m \alpha}$ with pairs of real functions: $\cos(m\alpha)$ and $\sin(m\alpha)$.
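The representation $\rho(\theta)$ then becomes a block-diagonal matrix of $2 \times 2$ rotation matrices, one per frequency $m \geq 1$ (the constant $m = 0$ component is left unchanged):
$$ \rho_m(\theta) = \begin{pmatrix} \cos(m\theta) & -\sin(m\theta) \\ \sin(m\theta) & \cos(m\theta) \end{pmatrix} $$
This matrix acts on the coefficient pair of $(\cos(m\alpha), \sin(m\alpha))$ when the function is rotated by $\theta$; the basis pair itself is mixed by the transpose $\rho_m(\theta)^T = \rho_m(-\theta)$, the real counterpart of the complex phase $e^{-i m \theta}$. Each frequency $m$ forms an independent 2D subspace that rotates at a speed of $m$ times the input rotation. This block-diagonal structure is key to understanding steerable networks and harmonic networks.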
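To make the block-diagonal structure concrete, here is a small sketch under stated assumptions: the real basis is ordered as $(1, \cos\alpha, \sin\alpha, \cos 2\alpha, \sin 2\alpha, \dots)$, with a trivial $1 \times 1$ block for $m = 0$, and the coefficients are random. It checks that rotating $f(\alpha) = \sum_k c_k \Psi_k(\alpha)$ by $\theta$ matches multiplying the coefficient vector by the block-diagonal matrix, and that the basis vector itself mixes with the transpose. The helper names are hypothetical.

```python
import numpy as np

def rho_m(m, theta):
    """2x2 rotation block for frequency m."""
    c, s = np.cos(m * theta), np.sin(m * theta)
    return np.array([[c, -s], [s, c]])

def block_diag_rho(max_freq, theta):
    """Block-diagonal matrix: a 1x1 block for m=0, then 2x2 blocks for m=1..max_freq."""
    size = 1 + 2 * max_freq
    rho = np.zeros((size, size))
    rho[0, 0] = 1.0
    for m in range(1, max_freq + 1):
        i = 2 * m - 1
        rho[i:i + 2, i:i + 2] = rho_m(m, theta)
    return rho

def real_basis(alpha, max_freq):
    """Stack (1, cos a, sin a, cos 2a, sin 2a, ...) along the first axis."""
    rows = [np.ones_like(alpha)]
    for m in range(1, max_freq + 1):
        rows += [np.cos(m * alpha), np.sin(m * alpha)]
    return np.stack(rows)

max_freq = 3
theta = 0.9
rng = np.random.default_rng(1)
coeffs = rng.normal(size=1 + 2 * max_freq)
alpha = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)

rho = block_diag_rho(max_freq, theta)
rotated_direct = coeffs @ real_basis(alpha - theta, max_freq)   # f(alpha - theta)
rotated_steered = (rho @ coeffs) @ real_basis(alpha, max_freq)  # steer the coefficients
basis_mixed = rho.T @ real_basis(alpha, max_freq)               # basis mixes with the transpose

coeff_error = float(np.max(np.abs(rotated_direct - rotated_steered)))
basis_error = float(np.max(np.abs(real_basis(alpha - theta, max_freq) - basis_mixed)))
print(coeff_error, basis_error)
```

Because the matrix is block-diagonal, each frequency can be steered independently, so the cost grows linearly with the number of frequencies rather than quadratically.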