Lecture 2.5
Steerable Group Convolutions
We synthesize the previous concepts to define the Steerable Group Convolution, the general equivariant linear operator between feature fields. We derive the constraint that any such operator must satisfy.
1. The Equivariance Constraint
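We seek a linear operator $\mathcal{K}$ that maps an input feature field $f_{in}$ (of type $\rho_{in}$, dimension $d_{in}$) to an output field $f_{out}$ (of type $\rho_{out}$, dimension $d_{out}$) and commutes with the group action. Equivariance under translations alone already forces this operator to be a convolution with a matrix-valued kernel $K: \mathbb{R}^d \to \mathbb{R}^{d_{out} \times d_{in}}$; the remaining symmetries (rotations, and reflections if included) then constrain $K$ itself.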
For the convolution to be equivariant, the kernel $K(\mathbf{x})$ must satisfy:
$$ K(g\mathbf{x}) = \rho_{out}(g) K(\mathbf{x}) \rho_{in}(g)^{-1} $$
This equation says: If you rotate the spatial domain ($g\mathbf{x}$), the kernel matrix at that new location must look like the original kernel matrix ($K(\mathbf{x})$), but with its input and output indices rotated by $\rho_{in}$ and $\rho_{out}$ respectively.
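The constraint can be checked numerically for a concrete case. The sketch below assumes the group is $SO(2)$ acting on $\mathbb{R}^2$, takes both $\rho_{in}$ and $\rho_{out}$ to be the standard $2 \times 2$ rotation representation, and uses the illustrative kernel $K(\mathbf{x}) = \mathbf{x}\mathbf{x}^\top$. The names `rot`, `kernel`, and `constraint_residual` are invented for this example and do not come from any library.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix: the standard (vector) representation of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def kernel(x):
    """Illustrative vector-to-vector kernel K(x) = x x^T (assumption, not from the lecture)."""
    return np.outer(x, x)

def constraint_residual(K, x, theta, rho_out, rho_in):
    """Largest absolute entry of K(g x) - rho_out(g) K(x) rho_in(g)^{-1}."""
    g = rot(theta)  # g acts on the spatial domain by rotation
    lhs = K(g @ x)
    rhs = rho_out(theta) @ K(x) @ np.linalg.inv(rho_in(theta))
    return np.max(np.abs(lhs - rhs))

x = np.array([0.3, -1.2])
theta = 0.7

# K(x) = x x^T satisfies the constraint up to floating-point error.
res_good = constraint_residual(kernel, x, theta, rot, rot)

# A constant, non-isotropic matrix does not: it ignores how x rotates.
res_bad = constraint_residual(lambda x: np.diag([1.0, 2.0]), x, theta, rot, rot)

print(res_good, res_bad)
```

The first residual should be at the level of rounding error, while the second is clearly nonzero, which is what it means for a kernel to fail the constraint.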
2. Solving the Constraint
This constraint restricts the space of possible kernels. For example:
- Scalar to Scalar ($\rho_{in} = \rho_{out} = 1$): $K(g\mathbf{x}) = K(\mathbf{x})$. The kernel must be rotationally invariant (isotropic), i.e., a function only of the radius $r = \|\mathbf{x}\|$.
- Vector to Vector: The kernel's angular dependence must exactly match the rotation of the input and output vectors. The radial profile remains free; the angular part is found by expanding the kernel in a basis of harmonic functions (circular harmonics in 2D, spherical harmonics in 3D) combined with Clebsch-Gordan coefficients.
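Both cases can be made concrete for $SO(2)$ in the plane. The sketch below assumes a Gaussian radial profile for the scalar kernel and, for the vector case, writes down four angular basis matrices (two of angular frequency 0, two of frequency 2) with the radial profiles omitted. This list is an illustration under those assumptions rather than a derivation, and the function names are hypothetical.

```python
import numpy as np

def rot(theta):
    """2x2 rotation matrix: the standard (vector) representation of SO(2)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def scalar_kernel(x):
    """Isotropic scalar-to-scalar kernel: depends on x only through r = |x|."""
    r = np.linalg.norm(x)
    return np.exp(-r**2)  # Gaussian profile chosen only for illustration

def vector_basis(x):
    """Angular parts of four vector-to-vector kernels at the point x."""
    phi = np.arctan2(x[1], x[0])
    c2, s2 = np.cos(2 * phi), np.sin(2 * phi)
    return [
        np.eye(2),                            # frequency 0
        np.array([[0.0, -1.0], [1.0, 0.0]]),  # frequency 0 (rotation by 90 degrees)
        np.array([[c2, s2], [s2, -c2]]),      # frequency 2
        np.array([[-s2, c2], [c2, s2]]),      # frequency 2
    ]

x = np.array([0.8, 0.5])
theta = 1.1
g = rot(theta)

# Scalar case: K(g x) - K(x) should vanish.
scalar_residual = abs(scalar_kernel(g @ x) - scalar_kernel(x))

# Vector case: K(g x) - rho(g) K(x) rho(g)^{-1} should vanish for each basis element.
residuals = [
    np.max(np.abs(B_rot - g @ B @ g.T))  # g.T equals the inverse of a rotation matrix
    for B, B_rot in zip(vector_basis(x), vector_basis(g @ x))
]

print(scalar_residual, residuals)
```

Any linear combination of these four matrices, each multiplied by its own radial function, satisfies the same constraint, because the constraint is linear in $K$ and rotations leave $r$ unchanged.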
3. Practical Implementation
Fortunately, you rarely need to solve this constraint yourself. Libraries such as e2cnn automate it:
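- You specify the Input Type (e.g., `FieldType(r2_act, [r2_act.trivial_repr] * 3)` for 3 RGB channels, each a scalar field).
- You specify the Output Type (e.g., `FieldType(r2_act, [r2_act.irrep(1)] * 16)` for 16 vector channels, assuming `r2_act` is a rotation gspace such as `gspaces.Rot2dOnR2(N=8)`).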
- The library automatically constructs the basis for $K$ that satisfies the constraint for these specific types.
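To get a feel for what such a library automates, the constraint can be solved by brute force in a toy setting. The sketch below is not how e2cnn constructs its bases. It assumes the cyclic group $C_4$ (rotations by multiples of 90 degrees), a $3 \times 3$ sampled kernel, a scalar input ($\rho_{in} = 1$) and a vector output ($\rho_{out}$ the $2 \times 2$ rotation representation), and finds every kernel satisfying $K(g\mathbf{x}) = \rho_{out}(g) K(\mathbf{x})$ as the null space of a linear system.

```python
import numpy as np

# rho_out(g) for the generator g of C4 (rotation by 90 degrees); rho_in is trivial.
R90 = np.array([[0.0, -1.0], [1.0, 0.0]])

# Sample points of a 3x3 kernel, centred on the origin.
points = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
index = {p: n for n, p in enumerate(points)}

def rotate_point(p):
    """Rotate a grid point by 90 degrees: (x, y) -> (-y, x)."""
    x, y = p
    return (-y, x)

# Unknowns: K(p) in R^{2x1} for each of the 9 points, flattened to length 18.
# Each point p contributes the two equations K(g p) - R90 K(p) = 0.
rows = []
for p in points:
    i, j = index[p], index[rotate_point(p)]
    block = np.zeros((2, 2 * len(points)))
    block[:, 2 * j:2 * j + 2] += np.eye(2)  # coefficient of K(g p)
    block[:, 2 * i:2 * i + 2] -= R90        # coefficient of K(p)
    rows.append(block)
A = np.vstack(rows)

# The null space of A is the space of equivariant kernels.
_, sing, Vt = np.linalg.svd(A)
rank = int(np.sum(sing > 1e-10))
basis = Vt[rank:]  # each row is one flattened basis kernel

print(basis.shape)
```

Imposing the constraint for the generator alone is enough, since every element of $C_4$ is a power of it. The solver should find four basis kernels: the centre pixel is forced to zero (no nonzero vector is fixed by a 90 degree rotation), and each of the two remaining orbits (edges and corners) contributes one free 2D vector. A trained layer then learns only the four coefficients of this basis per input-output channel pair, rather than all 18 kernel entries.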