Lecture 1.2
Group Theory I: Groups, Actions, and Representations
This lecture covers the minimal group theoretical prerequisites needed to understand Regular Group Convolutional Neural Networks. We look at groups as mathematical constructs representing transformations.
1. What is a Group?
Intuitively, a group is a set of transformations that we can combine (perform one after another) and undo (invert). Formally:
A group is a set $G$ equipped with a binary operation $\cdot$ (group product) that satisfies four axioms:
- Closure: For all $g, h \in G$, $g \cdot h \in G$.
- Associativity: For all $g, h, k \in G$, $(g \cdot h) \cdot k = g \cdot (h \cdot k)$.
- Identity: There exists an element $e \in G$ such that $e \cdot g = g \cdot e = g$ for all $g \in G$.
- Inverse: For every $g \in G$, there exists $g^{-1} \in G$ such that $g \cdot g^{-1} = g^{-1} \cdot g = e$.
Example: The Translation Group $(\mathbb{R}^2, +)$
The set of 2D translation vectors forms a group.
Product: Adding two translation vectors $\mathbf{x} + \tilde{\mathbf{x}}$.
Identity: The zero vector $\mathbf{0}$.
Inverse: The negative vector $-\mathbf{x}$.
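As a quick numerical sanity check of the axioms for this example, here is a minimal sketch in Python with NumPy. The helper names `product` and `inverse` are illustrative choices, not notation from the lecture, and the check uses a few sample vectors rather than proving anything in general.

```python
import numpy as np

def product(x, x_tilde):
    """Group product of (R^2, +): vector addition."""
    return x + x_tilde

def inverse(x):
    """Group inverse of (R^2, +): the negative vector."""
    return -x

identity = np.zeros(2)  # the zero vector plays the role of e

# Three arbitrary sample translations.
x = np.array([1.0, 2.0])
y = np.array([-0.5, 3.0])
z = np.array([4.0, -1.0])

# Associativity: (x . y) . z == x . (y . z)
assoc_ok = np.allclose(product(product(x, y), z), product(x, product(y, z)))
# Identity: e . x == x . e == x
identity_ok = np.allclose(product(identity, x), x) and np.allclose(product(x, identity), x)
# Inverse: x . x^{-1} == e
inverse_ok = np.allclose(product(x, inverse(x)), identity)

print(assoc_ok, identity_ok, inverse_ok)
```

Closure holds trivially here, since the sum of two vectors in $\mathbb{R}^2$ is again a vector in $\mathbb{R}^2$.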
2. The Roto-Translation Group ($SE(2)$)
Things get interesting when we combine different types of transformations, like translations and rotations. The Special Euclidean group $SE(2)$ consists of pairs $(\mathbf{x}, \theta)$ where $\mathbf{x} \in \mathbb{R}^2$ is a translation vector and $\theta \in [0, 2\pi)$ is a rotation angle.
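Notice that we cannot simply add these elements as vectors: rotating and then translating gives a different result than translating and then rotating. The group product reflects the concatenation of transformations, where in $g_2 \cdot g_1$ the element $g_1$ is applied first:
$$ (\mathbf{x}_2, \theta_2) \cdot (\mathbf{x}_1, \theta_1) = (\mathbf{x}_2 + \mathbf{R}_{\theta_2} \mathbf{x}_1, \; \theta_2 + \theta_1 \bmod 2\pi) $$
Here, the rotation $\mathbf{R}_{\theta_2}$ acts on the translation vector $\mathbf{x}_1$. This mixing of components makes $SE(2)$ a semi-direct product group: $SE(2) \cong \mathbb{R}^2 \rtimes SO(2)$.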
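The following sketch illustrates the product numerically. It assumes the convention that in $g_2 \cdot g_1$ the element $g_1$ is applied first, so that the translation part is $\mathbf{x}_2 + \mathbf{R}_{\theta_2} \mathbf{x}_1$ and the angles add modulo $2\pi$. The function names `rot`, `se2_product`, and `se2_inverse` are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def se2_product(g2, g1):
    """Product g2 . g1 (assumed convention: g1 is applied first, then g2)."""
    x2, th2 = g2
    x1, th1 = g1
    return (x2 + rot(th2) @ x1, (th2 + th1) % (2 * np.pi))

def se2_inverse(g):
    """Inverse element: undo the translation in the rotated-back frame."""
    x, th = g
    return (-rot(-th) @ x, (-th) % (2 * np.pi))

g1 = (np.array([1.0, 0.0]), np.pi / 2)
g2 = (np.array([0.0, 2.0]), np.pi / 4)

g21 = se2_product(g2, g1)
g12 = se2_product(g1, g2)

# The translation parts differ, so the product is not commutative.
commutes = np.allclose(g21[0], g12[0])

# g . g^{-1} should be the identity (zero translation, zero angle mod 2*pi).
e = se2_product(g1, se2_inverse(g1))

print("commutes:", commutes)
print("identity translation:", e[0])
```

Swapping the order of the two factors changes the translation part, which is exactly the rotate-then-translate versus translate-then-rotate difference described above.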
Matrix Representation
Often, it is convenient to represent group elements as matrices. For $SE(2)$, we can use $3 \times 3$ homogeneous matrices:
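$$ \mathbf{M}(g) = \begin{pmatrix} \cos\theta & -\sin\theta & x \\ \sin\theta & \cos\theta & y \\ 0 & 0 & 1 \end{pmatrix} $$
Here $g = (\mathbf{x}, \theta)$ with $\mathbf{x} = (x, y)^\top$. The group product then corresponds simply to matrix multiplication: $\mathbf{M}(g \cdot h) = \mathbf{M}(g)\,\mathbf{M}(h)$.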
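A small sketch of this correspondence, assuming NumPy and a hypothetical helper `se2_matrix` that builds the homogeneous matrix above: multiplying two such matrices yields another matrix of the same form, whose translation column is $\mathbf{x}_2 + \mathbf{R}_{\theta_2}\mathbf{x}_1$ and whose rotation block has angle $\theta_2 + \theta_1$.

```python
import numpy as np

def se2_matrix(x, theta):
    """3x3 homogeneous matrix for the SE(2) element (x, theta)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c,  -s,  x[0]],
                     [s,   c,  x[1]],
                     [0.0, 0.0, 1.0]])

x1, th1 = np.array([1.0, 0.0]), np.pi / 2
x2, th2 = np.array([0.0, 2.0]), np.pi / 4

# Matrix product of the two representations.
M21 = se2_matrix(x2, th2) @ se2_matrix(x1, th1)

# The same element built directly from the combined translation and angle.
R2 = se2_matrix(x2, th2)[:2, :2]  # rotation block of the left factor
expected = se2_matrix(x2 + R2 @ x1, th2 + th1)
matches = np.allclose(M21, expected)

# The matrix inverse represents the inverse group element.
inv_ok = np.allclose(np.linalg.inv(M21) @ M21, np.eye(3))

print(matches, inv_ok)
```

One practical benefit of this form is that acting on a point only requires appending a 1 to its coordinates and doing a single matrix-vector product.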
3. Affine Groups
Many groups we care about (translation, scale, rotation) fall under the umbrella of Affine Groups. These are groups $G = \mathbb{R}^d \rtimes H$ consisting of a translation part $\mathbb{R}^d$ and a linear transformation group $H$ (like rotations $SO(d)$ or scaling).
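As a worked illustration (a sketch, with $h, h' \in H$ acting on $\mathbb{R}^d$ as matrices; the primed symbols are notation introduced here rather than taken from the lecture), the product and inverse in such an affine group take the form:
$$ (\mathbf{x}, h) \cdot (\mathbf{x}', h') = (\mathbf{x} + h\,\mathbf{x}', \; h\,h') $$
$$ (\mathbf{x}, h)^{-1} = (-h^{-1}\mathbf{x}, \; h^{-1}) $$
Choosing $H = SO(2)$ recovers $SE(2)$. Choosing $H = \mathbb{R}_{>0}$, with $h = s$ acting by scalar multiplication, gives the scale-translation group with product $(\mathbf{x}, s) \cdot (\mathbf{x}', s') = (\mathbf{x} + s\,\mathbf{x}', \; s\,s')$.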
4. Group Actions
The group product tells us how group elements interact with each other. But how do they affect the world (e.g., vectors, images)? This is defined by a Group Action.
A group action $\odot$ of $G$ on a set $X$ is a map $G \times X \to X$ such that, for all $g, h \in G$ and $x \in X$:
$$ g \odot (h \odot x) = (g \cdot h) \odot x $$
$$ e \odot x = x $$
For example, $SO(2)$ acts on vectors in $\mathbb{R}^2$ by matrix-vector multiplication: $\mathbf{R}_\theta \odot \mathbf{x} = \mathbf{R}_\theta \mathbf{x}$.
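The two action axioms can be checked numerically for the $SO(2)$ example. This is a minimal sketch assuming NumPy; rotations are parameterized by their angle, so the group product is angle addition modulo $2\pi$, and `act` is a hypothetical name for the action.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def act(theta, x):
    """Action of the rotation with angle theta on a vector x in R^2."""
    return rot(theta) @ x

g, h = 0.7, 1.9              # two rotation angles (group elements)
x = np.array([1.0, -2.0])    # a point in the set X = R^2

# Compatibility: acting with h and then g equals acting once with g . h.
gh = (g + h) % (2 * np.pi)
compat_ok = np.allclose(act(g, act(h, x)), act(gh, x))

# Identity: the zero-angle rotation leaves x unchanged.
identity_ok = np.allclose(act(0.0, x), x)

print(compat_ok, identity_ok)
```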
5. Representations
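When the set $X$ is a vector space (like $\mathbb{R}^d$ or the space of images/functions $\mathbb{L}_2(\mathbb{R}^d)$) and the action is linear, we call it a Representation, denoted $\rho(g)$. Each $\rho(g)$ is then an invertible linear map, and the maps compose like the group elements: $\rho(g \cdot h) = \rho(g)\,\rho(h)$.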
The Regular Representation
How does a group act on an image/function? If we have a function $f: \mathbb{R}^2 \to \mathbb{R}$ (an image) and we transform its domain by $g$ (for example, a rotation), the function transforms as:
$$ [\mathcal{L}_g f](\mathbf{x}) = f(g^{-1} \odot \mathbf{x}) $$
In words: to find the value of the transformed image at $\mathbf{x}$, we look up the value of the original image at the coordinate that $\mathbf{x}$ came from, namely the inverse transform $g^{-1} \odot \mathbf{x}$.
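The lookup rule can be sketched in code by treating an image as a Python function on $\mathbb{R}^2$. The sketch below assumes rotations as the group, writes $\mathcal{L}_g f$ for the transformed function (so that $[\mathcal{L}_g f](\mathbf{x}) = f(g^{-1}\mathbf{x})$), and uses a Gaussian blob as a toy image; `left_regular` and the blob are illustrative choices, not part of the lecture.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix R_theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def left_regular(theta, f):
    """Return the function L_g f, defined by [L_g f](x) = f(R_theta^{-1} x)."""
    R_inv = rot(-theta)
    return lambda x: f(R_inv @ x)

def f(x):
    """Toy 'image': a Gaussian blob centered at (1, 0)."""
    return np.exp(-np.sum((x - np.array([1.0, 0.0])) ** 2))

# Rotating by 90 degrees moves the blob from (1, 0) to (0, 1).
g = np.pi / 2
Lf = left_regular(g, f)
peak_value = Lf(np.array([0.0, 1.0]))   # looks up f at (1, 0)

# Composition property: L_g (L_h f) agrees with L_{g.h} f.
h = 0.3
x = np.array([0.4, -1.2])
lhs = left_regular(g, left_regular(h, f))(x)
rhs = left_regular(g + h, f)(x)

print(peak_value, np.isclose(lhs, rhs))
```

Using the inverse inside the lookup is what makes the composition property hold in the right order: $\mathcal{L}_g \mathcal{L}_h = \mathcal{L}_{g \cdot h}$.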
6. Equivariance Defined
Finally, we define the core concept of the course formally.
An operator $\Phi$ is equivariant if it commutes with the group action:
$$ \Phi(\mathcal{L}_g f) = \mathcal{L}'_g (\Phi(f)) $$
Note that the representations $\mathcal{L}$ and $\mathcal{L}'$ can be different. For example, the input might rotate (standard representation), while the output (a feature map) might undergo a shift-twist transformation (induced representation).
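To make the definition concrete, here is a minimal sketch for the simplest case, where the group is cyclic translations of a 1D signal and the input and output representations coincide ($\mathcal{L} = \mathcal{L}'$). The operator `phi` is a circular cross-correlation chosen for illustration, and `psi` is a deliberately position-dependent counterexample; both names, the kernel, and the signal are assumptions of this example rather than anything defined in the lecture.

```python
import numpy as np

def shift(f, t):
    """L_t f: cyclic translation by t samples, [L_t f](x) = f(x - t)."""
    return np.roll(f, t)

def phi(f, kernel):
    """Circular cross-correlation: out[x] = sum_k kernel[k] * f[(x + k) % n]."""
    n = len(f)
    return np.array([
        sum(kernel[k] * f[(x + k) % n] for k in range(len(kernel)))
        for x in range(n)
    ])

def psi(f):
    """Position-dependent scaling, which should NOT be equivariant."""
    return f * np.arange(len(f))

rng = np.random.default_rng(0)
f = rng.normal(size=8)
kernel = np.array([1.0, -2.0, 0.5])
t = 3

# Equivariance: transforming then applying phi equals applying phi then transforming.
equivariant = np.allclose(phi(shift(f, t), kernel), shift(phi(f, kernel), t))

# The counterexample breaks the commutation relation.
psi_equivariant = np.allclose(psi(shift(f, t)), shift(psi(f), t))

print(equivariant, psi_equivariant)
```

The correlation commutes with shifts because it applies the same kernel at every position, whereas `psi` treats positions differently and therefore fails the test.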