Lecture 1.1

Introduction & Motivation

Welcome to the first lecture of the course on Group Equivariant Deep Learning (GEDL). This module is part of the "Deep Learning 2" course at the University of Amsterdam. In this series, we explore how to build neural networks that respect the geometric structure of data.

1. Geometric Guarantees & Invariance

In many applications, we want to build neural networks that have specific geometric guarantees. A prime example comes from histopathology, where the task is to classify a cell as either healthy or pathological. When a cell is imaged, its orientation is arbitrary: the underlying biology does not depend on whether the microscope slide was rotated by 90 degrees. Therefore, if a network classifies a cell as "healthy," it should also classify any rotated version of that cell as "healthy."

Standard Convolutional Neural Networks (CNNs) do not guarantee this. They might correctly classify the upright image but fail on the rotated one. This lack of robustness is problematic for safety-critical applications.

Problem: Lack of Invariance

A standard CNN is not rotation invariant: in general, $f(\text{image}) \neq f(\text{rotated image})$. We want $f(g \odot \mathbf{x}) = f(\mathbf{x})$ for all geometric transformations $g$ (e.g., rotations).
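This failure is easy to check empirically. The sketch below is a minimal, illustrative test (assuming a PyTorch image classifier `model` taking batches of shape `(B, C, H, W)`; the names are placeholders): it compares predictions on an image and on a 90-degree rotated copy. A standard CNN will generally fail this check.

```python
# Minimal sketch: empirically checking rotation invariance of a classifier.
# `model` is any image classifier (e.g., a standard CNN); names are illustrative.
import torch

def is_rotation_invariant(model, x, atol=1e-5):
    """Compare predictions on x and on a 90-degree rotated copy of x.

    x: batch of images of shape (B, C, H, W).
    """
    with torch.no_grad():
        y = model(x)
        y_rot = model(torch.rot90(x, k=1, dims=(-2, -1)))  # rotate by 90 degrees
    return torch.allclose(y, y_rot, atol=atol)
```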

2. Data Augmentation vs. Equivariance

A common engineering solution is data augmentation: training the network on many rotated copies of the data. While this encourages the network to learn invariance, it does not guarantee it. It forces the network to use its capacity to learn the same feature in multiple orientations (e.g., a vertical edge detector and a horizontal edge detector are learned separately), leading to redundancy.
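For reference, a minimal sketch of the augmentation approach is shown below, assuming a standard PyTorch training setup (`model`, `optimizer`, `loss_fn`, and a batch `(x, y)` are illustrative names). Rotations are applied to the inputs only, while the labels stay fixed, which is exactly what encourages, but does not enforce, invariance.

```python
# Minimal sketch of rotation augmentation in a standard PyTorch training step.
# All names (model, optimizer, loss_fn, x, y) are illustrative placeholders.
from torchvision import transforms

augment = transforms.RandomRotation(degrees=180)  # one random angle sampled per call

def training_step(model, optimizer, loss_fn, x, y):
    x_aug = augment(x)                 # rotate the input batch; the labels y are unchanged
    loss = loss_fn(model(x_aug), y)    # the network is merely encouraged to be invariant
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```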

In this course, we aim to bake geometric constraints into the architecture. This approach guarantees invariance (or equivariance) by design, improves data efficiency, and reduces parameter redundancy.

3. Equivariance & Weight Sharing

The core property we are interested in is equivariance. While invariance means the output doesn't change when the input transforms, equivariance means the output transforms predictably.

Definition: Equivariance

An operator $\Phi: \mathcal{X} \to \mathcal{Y}$ is equivariant to a group $G$ if for every transformation $g \in G$, transforming the input and then applying the operator is the same as applying the operator and then transforming the output:

$$ \Phi(g \odot x) = g \odot \Phi(x) $$

Here, $\odot$ denotes the action of the group on the respective spaces.
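A minimal numerical sketch of this definition, for the one case we already know: a standard 2D convolution is equivariant to pixel translations. Circular padding is used below so that the translation action is exact on a periodic grid; the tensors and names are illustrative.

```python
# Verify Phi(g . x) == g . Phi(x) for Phi = 2D convolution, g = translation.
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 16, 16)   # input "image"
w = torch.randn(1, 1, 3, 3)     # convolution kernel (the operator Phi)

def phi(x):
    # circular padding keeps the translation action exact on a periodic grid
    return F.conv2d(F.pad(x, (1, 1, 1, 1), mode="circular"), w)

def g(x, shift=(3, 5)):
    # the group action: a translation of the pixel grid
    return torch.roll(x, shifts=shift, dims=(-2, -1))

assert torch.allclose(phi(g(x)), g(phi(x)), atol=1e-5)
```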

Why is this important?

  • No Information Loss: If features shift or rotate, we don't lose them; they just move to a new location in the feature map.
  • Weight Sharing: Just as standard convolutions share weights across translations (detecting a cat in the top-left is the same as in the bottom-right), group convolutions share weights across rotations and other transformations. This drastically improves parameter efficiency.

Redundancy in Representations

A fascinating analysis by Olah et al. (Distill) shows that standard CNNs learn "rotated copies" of the same features (e.g., edge detectors at various angles). Group CNNs eliminate this redundancy by learning a single "canonical" feature and generating the rotated versions via the group structure.

4. Beyond Scalar Problems: Physics & Chemistry

Equivariance is not just for efficiency; sometimes the problem demands it. Consider predicting forces in an N-body simulation or deformations in a molecule.

  • Vector/Tensor Fields: If you rotate an entire molecule, the force vectors acting on its atoms should rotate with it. A model predicting these forces must be rotation equivariant: the output is not a scalar (invariant) but a vector (geometric quantity). See the sketch after this list.
  • Computational Chemistry: Predicting low-energy conformations requires understanding how atoms interact in 3D space, respecting the symmetries of Euclidean space (SE(3)).
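A toy illustration of the vector case (not a learned model; purely for intuition): pairwise inverse-square forces between point particles are rotation equivariant by construction, and we can verify $\mathbf{F}(R\mathbf{x}) = R\,\mathbf{F}(\mathbf{x})$ numerically. All names below are illustrative.

```python
# Toy check that a pairwise force field rotates with the point cloud.
import torch

def forces(pos):
    """Toy pairwise inverse-square forces; pos: (N, 3) -> forces: (N, 3)."""
    diff = pos[None, :, :] - pos[:, None, :]                # r_j - r_i, shape (N, N, 3)
    dist = diff.norm(dim=-1, keepdim=True).clamp(min=1e-9)  # avoids division by zero for i == j
    return (diff / dist**3).sum(dim=1)                      # sum of (unit vector) / r^2 terms

pos = torch.randn(5, 3, dtype=torch.float64)

# Random 3D rotation matrix via QR decomposition, with determinant fixed to +1
Q, _ = torch.linalg.qr(torch.randn(3, 3, dtype=torch.float64))
R = Q * torch.sign(torch.linalg.det(Q))

# F(R x) == R F(x): the predicted forces rotate with the molecule
assert torch.allclose(forces(pos @ R.T), forces(pos) @ R.T, atol=1e-8)
```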

5. Recognition by Components

The philosophy of Group CNNs aligns with the "Recognition by Components" theory in psychology. Humans recognize objects by identifying parts (geons) and their relative poses.

For example, a "face" consists of two eyes, a nose, and a mouth in a specific geometric configuration. A Group CNN can explicitely model these relationships. If the face rotates, the relative configuration of its parts (eyes relative to nose) remains the invariant, but the global pose changes. Group convolutions allow us to detect these parts and their poses explicitly.

6. Symmetry in Nature

Finally, we observe that symmetry is ubiquitous in nature (plants, crystals, animals). Evolution favors symmetric structures because they are efficient to encode using a compact set of rules (a genetic "generator"). Similarly, Group CNNs use symmetry to efficiently encode visual representations, allowing us to build powerful models with fewer parameters and less data.