Lecture 2.6
Activation Functions
Designing activation functions for Steerable CNNs requires care: we cannot simply apply an element-wise ReLU to a vector field, because doing so breaks equivariance.
1. Why Element-wise ReLU Fails
Applying a nonlinearity $\sigma$ to each component of a vector $\mathbf{v} = (v_1, v_2)$ does not, in general, commute with rotation, and so destroys its geometric properties.
Example: Consider a vector $\mathbf{v} = (1, 0)$. Rotating by 90° (counter-clockwise) gives $\mathbf{v}' = (0, 1)$.
- Apply ReLU then Rotate: $\text{Rot}(\text{ReLU}(1,0)) = \text{Rot}(1,0) = (0,1)$.
- Rotate then Apply ReLU: $\text{ReLU}(\text{Rot}(1,0)) = \text{ReLU}(0,1) = (0,1)$. (Matches)
Counter-Example: Consider $\mathbf{v} = (1, -1)$. Rotating 90° gives $(1, 1)$.
- Apply ReLU then Rotate: $\text{Rot}(\text{ReLU}(1,-1)) = \text{Rot}(1, 0) = (0, 1)$.
- Rotate then Apply ReLU: $\text{ReLU}(\text{Rot}(1,-1)) = \text{ReLU}(1, 1) = (1, 1)$. Mismatch! (The short NumPy check below confirms this.)
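To make the mismatch concrete, here is a minimal NumPy check of the counter-example above; the names `R90` and `relu` are just illustrative:

```python
import numpy as np

# 90° counter-clockwise rotation matrix
R90 = np.array([[0.0, -1.0],
                [1.0,  0.0]])
relu = lambda x: np.maximum(x, 0.0)

v = np.array([1.0, -1.0])
print(R90 @ relu(v))   # ReLU then rotate: [0. 1.]
print(relu(R90 @ v))   # rotate then ReLU: [1. 1.]  -> mismatch
```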
2. Permissible Nonlinearities
To maintain equivariance, the operation must commute with the group action. We have three main options:
A. Norm-Based Nonlinearities
Apply the nonlinearity only to the magnitude of the vector, preserving its direction:
$$ \mathbf{v}' = \sigma(\|\mathbf{v}\|) \frac{\mathbf{v}}{\|\mathbf{v}\|} $$
Since the norm $\|\mathbf{v}\|$ is invariant to rotation, this operation is safe.
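As a concrete sketch (not a canonical implementation), here is a NumPy version for 2D vector features; `norm_relu`, `bias`, and `eps` are illustrative names, and we assume $\sigma(n) = \text{ReLU}(n - b)$ as the function applied to the norm:

```python
import numpy as np

def norm_relu(v, bias=0.1, eps=1e-8):
    """Norm nonlinearity for vector features stored in the last axis, shape (..., 2).

    Applies sigma(n) = ReLU(n - bias) to the norm n = ||v|| and rescales v
    accordingly; the direction v / ||v|| is left untouched.
    """
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    gated_norm = np.maximum(norm - bias, 0.0)  # sigma applied to the norm only
    return v * (gated_norm / (norm + eps))     # rescale while preserving direction
```

A rotation changes only the direction of each vector, never its norm, so rotating the input and applying `norm_relu` commute.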
B. Gated Nonlinearities
This is the most common approach. Alongside the vector field $\mathbf{v}$, we produce a separate scalar gate $s$ (a type-0 field). We apply the sigmoid function to the gate and use it to scale the corresponding vector channels:
$$ \mathbf{v}' = \sigma(s) \cdot \mathbf{v} $$
Since $s$ transforms trivially ($\rho(g)=1$), the scaling factor $\sigma(s)$ is invariant to rotation, preserving the covariance of $\mathbf{v}$.
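A minimal sketch in the same spirit; `gated_nonlinearity` is an illustrative name, and producing the gate $s$ as an extra scalar output channel of the previous layer is an assumption consistent with the text above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_nonlinearity(v, s):
    """Gate a vector field v of shape (..., 2) with a scalar field s of shape (..., 1).

    s is a type-0 (invariant) field, so sigmoid(s) is a rotation-invariant
    scalar, and multiplying v by it leaves v's transformation law intact.
    """
    return sigmoid(s) * v
```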
C. Inverse Fourier Nonlinearities
We can also (1) use the inverse Fourier transform to recover the signal on the group from its Fourier coefficients, (2) apply a standard point-wise ReLU to those samples, and (3) transform back with the Fourier transform. This is computationally more expensive, but it allows using standard ReLUs.
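For intuition, here is a minimal NumPy sketch for the cyclic group $C_N$, where the Fourier transform on the group reduces to the ordinary DFT; `fourier_relu` is an illustrative name, and we ignore the band-limiting a practical implementation for a continuous group would need. For finite $C_N$ the operation is exactly equivariant, since a rotation acts on the group samples as a cyclic shift, which commutes with point-wise ReLU:

```python
import numpy as np

def fourier_relu(coeffs):
    """Point-wise ReLU applied in the 'group domain' of a C_N feature.

    coeffs: DFT coefficients of a real signal sampled on the N rotations of C_N.
    """
    signal = np.fft.ifft(coeffs).real  # (1) inverse Fourier transform: signal on the group
    signal = np.maximum(signal, 0.0)   # (2) standard point-wise ReLU on the samples
    return np.fft.fft(signal)          # (3) Fourier transform back to coefficients

# Equivariance check: acting by a rotation (a cyclic shift of the samples)
# before or after the nonlinearity gives the same result.
x = np.random.default_rng(0).normal(size=8)  # signal on the 8 rotations of C_8
act = lambda sig, k: np.roll(sig, k)         # group action: shift by k steps
lhs = np.fft.ifft(fourier_relu(np.fft.fft(act(x, 3)))).real
rhs = act(np.fft.ifft(fourier_relu(np.fft.fft(x))).real, 3)
assert np.allclose(lhs, rhs)
```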