Part I — Definitions & invariances (locked-in foundations)
I.1 FTLE field and heterogeneity metric
We define the FTLE field as a scalar function over input space, and we measure the spatial heterogeneity of this field with respect to a chosen measure over inputs. That measure can be:
- grid-uniform over a 2D evaluation grid,
- data-uniform over sampled datapoints, or
- boundary-band (conditioned near the decision boundary).
This choice matters because global variance can be dominated by regions with larger area weight (e.g., far outside the boundary).
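To make the dependence on the measure concrete, here is a minimal sketch in plain Python. Everything specific in it is an assumption for illustration: the two-layer tanh map and its weights are hypothetical, the FTLE is taken as `log(sigma_max(J(x))) / T` with `T` equal to the number of layers, heterogeneity is taken as the weighted variance of that field, and the boundary band is defined as the first output lying within 0.2 of zero.

```python
import math

# Hypothetical 2-layer tanh map R^2 -> R^2; the weights are illustrative only.
W1 = [[1.5, -0.5], [0.3, 2.0]]
W2 = [[1.0, 0.8], [-0.6, 1.2]]
T = 2  # assumed normalisation (here: number of layers)


def matvec(W, v):
    return [W[0][0] * v[0] + W[0][1] * v[1], W[1][0] * v[0] + W[1][1] * v[1]]


def forward(x):
    """Return hidden and output activations of x -> tanh(W2 tanh(W1 x))."""
    h1 = [math.tanh(z) for z in matvec(W1, x)]
    h2 = [math.tanh(z) for z in matvec(W2, h1)]
    return h1, h2


def jacobian(x):
    """Input-output Jacobian via the chain rule: diag(1-h2^2) W2 diag(1-h1^2) W1."""
    h1, h2 = forward(x)
    A = [[(1 - h2[i] ** 2) * W2[i][j] for j in range(2)] for i in range(2)]
    B = [[(1 - h1[i] ** 2) * W1[i][j] for j in range(2)] for i in range(2)]
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sigma_max(J):
    """Largest singular value of a 2x2 matrix in closed form."""
    s = sum(J[i][j] ** 2 for i in range(2) for j in range(2))  # trace of J^T J
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    disc = max(s * s - 4 * det * det, 0.0)  # clamp tiny negative round-off
    return math.sqrt((s + math.sqrt(disc)) / 2)


def ftle(x):
    return math.log(sigma_max(jacobian(x))) / T


def weighted_variance(values, weights):
    """Variance of `values` under the (unnormalised) measure given by `weights`."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    return sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total


n = 21
axis = [-2 + 4 * i / (n - 1) for i in range(n)]
grid = [(a, b) for a in axis for b in axis]
field = [ftle(p) for p in grid]

grid_uniform = [1.0] * len(grid)
# Hypothetical boundary band: first output within 0.2 of its zero level set.
band = [1.0 if abs(forward(p)[1][0]) < 0.2 else 0.0 for p in grid]

g_grid = weighted_variance(field, grid_uniform)
g_band = weighted_variance(field, band)
print(f"grid-uniform variance:  {g_grid:.4f}")
print(f"boundary-band variance: {g_band:.4f}")
```

The same field gives two different numbers depending only on the weights, which is why the measure has to be stated alongside any reported heterogeneity value.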
Two technical clarifications that matter later:
- What is the sampling measure? In practice, you compute the FTLE on a grid in the input plane, not on i.i.d. samples from the training distribution. That's okay, but you must be explicit about which case applies:
  - either the input is sampled from the data distribution,
  - or the input is sampled from a reference measure on the plane (e.g., uniform over the evaluation square),
  - or the measure is the empirical uniform measure over grid points.
Note:
I.2 Kernel / representation rotation metrics (KA / RA)
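Let KA be the rotation metric for the NTK and RA the rotation metric for the representation; each is a cosine similarity between Gram matrices.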
Interpretation:
- KA close to 1 means the NTK barely rotates (lazy / kernel-like).
- RA close to 1 means the representation geometry barely rotates.
- Smaller values indicate stronger feature motion / geometry rotation.
Range: for PSD Gram/kernel matrices, KA and RA lie in [0, 1].
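A minimal sketch of such a cosine-similarity metric, assuming the cosine is taken under the Frobenius inner product and that the two matrices being compared are Gram matrices from two checkpoints. The helper names (`frobenius_inner`, `alignment`, `gram`) and the feature values are hypothetical, chosen only for illustration.

```python
import math


def frobenius_inner(A, B):
    """Frobenius inner product <A, B>_F = sum_ij A_ij * B_ij."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def alignment(A, B):
    """Cosine similarity between two matrices under the Frobenius inner product."""
    return frobenius_inner(A, B) / math.sqrt(frobenius_inner(A, A) * frobenius_inner(B, B))


def gram(features):
    """Gram matrix F F^T of a list of feature vectors (PSD by construction)."""
    return [[sum(u * v for u, v in zip(fi, fj)) for fj in features] for fi in features]


# Hypothetical features for three inputs at two checkpoints.
feats_start = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
feats_end = [[1.0, 0.0], [1.0, 0.2], [0.9, 0.1]]

G_start, G_end = gram(feats_start), gram(feats_end)
ra_same = alignment(G_start, G_start)   # identical geometry
ra_moved = alignment(G_start, G_end)    # features have moved
print(f"unchanged features: {ra_same:.3f}, moved features: {ra_moved:.3f}")
```

The lower bound of 0 for PSD inputs follows from the fact that the Frobenius inner product of two PSD matrices, tr(AB), is non-negative; the upper bound of 1 is Cauchy-Schwarz.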
I.3 Scale invariance (magnitude vs geometry)
KA/RA are cosine similarities, hence invariant to global positive scaling of either matrix being compared.
So:
- pure magnitude rescaling of the kernel or the representation Gram matrix does not change KA or RA;
- changes in KA or RA imply a change in direction (geometry), not just scale.
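The invariance can be checked in one line. The notation here is introduced only for this check: A and B stand for the two matrices being compared, the cosine is assumed to use the Frobenius inner product, and a, b > 0 are arbitrary scale factors.

```latex
\[
\cos(aA,\, bB)
  = \frac{\langle aA,\, bB \rangle_F}{\|aA\|_F \, \|bB\|_F}
  = \frac{ab \, \langle A, B \rangle_F}{ab \, \|A\|_F \, \|B\|_F}
  = \cos(A, B), \qquad a, b > 0 .
\]
```

The restriction to positive factors matters: a negative factor would flip the sign of the cosine.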
I.4 Gain: post-hoc scaling vs training dynamics
Key distinction:
- Post-hoc scaling (rescale a trained network without retraining):
  If the scaling multiplies the relevant Jacobian by a constant factor uniformly over the input domain, the result is a constant shift in the FTLE field. The shift is constant only if the multiplicative factor is the same at every input; an input-dependent factor does not qualify.
- Gain as a training hyperparameter:
  Gain can change the optimization trajectory (effective gradient scale, feature evolution), so the kernel and the representation can rotate differently. Therefore gain can affect KA/RA and also affect FTLE heterogeneity indirectly through training-changed geometry.
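The uniformity requirement can be illustrated numerically. This is a sketch under stated assumptions: the FTLE is taken as `log(sigma_max) / T`, heterogeneity as the population variance of that value across inputs, and the singular values, the gain `c`, and `T` are all hypothetical numbers.

```python
import math
from statistics import pvariance

T = 3      # assumed normalisation constant (e.g., depth)
c = 2.5    # hypothetical uniform gain applied to the Jacobian
sigmas = [0.4, 1.0, 2.2, 0.9, 3.1]  # hypothetical top singular values at 5 inputs

ftle = [math.log(s) / T for s in sigmas]

# Uniform scaling: every Jacobian is multiplied by the same factor c.
ftle_uniform = [math.log(c * s) / T for s in sigmas]
shifts = [b - a for a, b in zip(ftle, ftle_uniform)]

# Non-uniform scaling: the factor depends on the input (second input only).
gains = [1.0, 4.0, 1.0, 1.0, 1.0]
ftle_nonuniform = [math.log(g * s) / T for g, s in zip(gains, sigmas)]

print("shift per input:", [round(d, 6) for d in shifts])
print("variance before / uniform / non-uniform:",
      round(pvariance(ftle), 6),
      round(pvariance(ftle_uniform), 6),
      round(pvariance(ftle_nonuniform), 6))
```

Under the uniform factor every input is shifted by the same amount, `log(c) / T`, so the variance is unchanged. With an input-dependent factor the shift varies across inputs and the variance changes.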
I.5 What is “locked in” after Part I
- KA/RA measure rotation / feature motion, not magnitude.
- The heterogeneity metric measures spatial heterogeneity of FTLE over the chosen measure.
- Gain effects must be split into:
  - pure rescaling (does not change RA/KA; shifts the FTLE by a constant under uniform Jacobian scaling), vs
  - training-dynamics change (can change geometry and thus all metrics).
Next reading: Part II — Dataset audits (facts first, interpretation second)