Part II — Dataset audits (facts first, interpretation second)
II.1 Dataset A: FTLE heterogeneity
Dataset A — FTLE heterogeneity grid (NL_Glambda_trend.csv)
A.1.1 What the dataset actually contains (fact check)
From your description and usage:
- Grid over:
  - width
  - depth
- For each (width, depth) grid point:
  - 16 runs from gain × base learning-rate combinations
- Recorded: a spatial heterogeneity measure of the resulting FTLE field for each run
This is internally consistent.
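As a quick structural check of the above, the grid layout can be verified directly from the file. This is a minimal sketch, not part of the original pipeline; the column names (`width`, `depth`, `gain`, `base_lr`) are assumptions, since the actual header of NL_Glambda_trend.csv is not shown here.

```python
import pandas as pd

# Column names are assumptions; adjust to the actual header of NL_Glambda_trend.csv.
df = pd.read_csv("NL_Glambda_trend.csv")

# Each (width, depth) cell should contain exactly 16 runs,
# one per gain x base-learning-rate combination.
runs_per_cell = df.groupby(["width", "depth"]).size()
assert (runs_per_cell == 16).all(), runs_per_cell[runs_per_cell != 16]

# No duplicated (gain, base_lr) pair inside a cell.
assert not df.duplicated(subset=["width", "depth", "gain", "base_lr"]).any()
```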
II.1.1 Experimental grid
We evaluate the FTLE field over a fixed grid of inputs for each trained network; a sketch of one possible estimator follows the parameter list.
Parameters:
- Width
- Depth
- 16 runs per (width, depth) point, from gain × learning-rate combinations
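The sketch below is an assumption about the estimator, not a transcription of the pipeline that produced the dataset: it takes the FTLE at an input x to be (1/horizon) times the log of the largest singular value of the network's input-output Jacobian, with depth playing the role of the time horizon, and estimates the Jacobian by finite differences.

```python
import numpy as np

def ftle(f, x, horizon, eps=1e-5):
    """Finite-time Lyapunov exponent of the map f at input x:
    (1 / horizon) * log(largest singular value of the Jacobian of f at x),
    with the Jacobian estimated by forward finite differences."""
    x = np.asarray(x, dtype=float)
    base = f(x)
    # Columns of J are directional derivatives along the input coordinate axes.
    J = np.stack([(f(x + eps * e) - base) / eps for e in np.eye(x.size)], axis=1)
    sigma_max = np.linalg.svd(J, compute_uv=False)[0]
    return float(np.log(sigma_max) / horizon)

# The "FTLE field" is this quantity evaluated over a fixed grid of inputs, e.g.
# field = [[ftle(net, np.array([u, v]), horizon=depth) for u in xs] for v in ys]
```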
II.1.2 Empirical facts
From the data:
- Width dependence: for fixed depth, the FTLE heterogeneity decreases as width increases.
- Depth dependence at moderate/large width: for fixed width, increasing depth decreases the FTLE heterogeneity.
- Narrow-width anomaly (smallest width): the heterogeneity is non-monotone in depth (a “U-turn”).
- Variability across hyperparameters is largest at the smallest width.
These are purely observational statements.
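These trends can be tabulated by a simple aggregation over the 16 runs in each grid cell. A sketch, again with hypothetical column names (`width`, `depth`, and a heterogeneity column here called `ftle_het`):

```python
import pandas as pd

df = pd.read_csv("NL_Glambda_trend.csv")

# Mean and spread of the heterogeneity column per (width, depth) cell,
# aggregated over the 16 gain x learning-rate runs.
summary = (df.groupby(["width", "depth"])["ftle_het"]
             .agg(["mean", "std"])
             .unstack("depth"))
print(summary)  # rows indexed by width, columns by depth: inspect the trends directly
```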
II.1.3 Immediate but safe interpretation
- A decrease in the heterogeneity measure means the FTLE field becomes more spatially uniform with respect to the grid measure.
- Consequently, at large width, increasing depth appears to homogenize the FTLE field.
- At small width, training dynamics are sensitive and the geometry is unstable.
No claim is made yet about feature learning.
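To make "spatially uniform with respect to the grid measure" concrete: one natural heterogeneity measure is the dispersion of the FTLE values over the input grid, e.g. their standard deviation. The dataset's actual definition of the heterogeneity column may differ; this sketch only illustrates the notion.

```python
import numpy as np

def ftle_heterogeneity(ftle_field):
    """Dispersion of an FTLE field sampled on a grid of inputs.

    Every grid point is weighted equally (the "grid measure"); a value of zero
    corresponds to a perfectly uniform field."""
    field = np.asarray(ftle_field, dtype=float).ravel()
    return float(field.std())
```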
II.2 Dataset B: Kernel and representation alignment (KA / RA)
Dataset B — RA / KA grid (RA_KA_NL_dataset.csv)
A.2.1 What the dataset actually contains
- Same (width, depth) grid
- One KA value and one RA value per grid point
- These are alignments between initial and final geometry
This is important: RA/KA measure total rotation over training, not instantaneous dynamics.
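The same kind of structural check applies to Dataset B: exactly one KA value and one RA value per grid point. A minimal sketch; the column names (`width`, `depth`, `KA`, `RA`) are assumptions about the header of RA_KA_NL_dataset.csv.

```python
import pandas as pd

df = pd.read_csv("RA_KA_NL_dataset.csv")

# Exactly one row (one KA value and one RA value) per (width, depth) point.
rows_per_point = df.groupby(["width", "depth"]).size()
assert (rows_per_point == 1).all()
assert df[["KA", "RA"]].notna().all().all()
```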
II.2.1 Definitions (reminder)
Kernel alignment:
$$\mathrm{KA}(K_0, K_T) = \frac{\langle K_0, K_T\rangle_F}{\|K_0\|_F\,\|K_T\|_F},$$
where $K_0$ and $K_T$ are the kernel (Gram) matrices at initialization and at the end of training.
Representation alignment:
$$\mathrm{RA}(\Phi_0, \Phi_T) = \frac{\langle \Phi_0\Phi_0^\top,\ \Phi_T\Phi_T^\top\rangle_F}{\|\Phi_0\Phi_0^\top\|_F\,\|\Phi_T\Phi_T^\top\|_F},$$
where $\Phi_0$ and $\Phi_T$ are the hidden-representation matrices at initialization and at the end of training.
Both measure rotation away from initialization, not magnitude change.
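A minimal sketch of these quantities, under the assumption that both are normalized Frobenius inner products between Gram matrices at initialization and at the end of training (the dataset may use a centered or otherwise normalized variant):

```python
import numpy as np

def gram_alignment(A, B):
    """Normalized Frobenius inner product <A, B>_F / (||A||_F ||B||_F).

    Equals 1 when B is a positive multiple of A, so it responds to rotation
    of the geometry but not to an overall rescaling."""
    return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))

def kernel_alignment(K_init, K_final):
    # K_* are kernel (Gram) matrices evaluated on a fixed probe set.
    return gram_alignment(K_init, K_final)

def representation_alignment(Phi_init, Phi_final):
    # Phi_* are hidden-representation matrices (samples x features).
    return gram_alignment(Phi_init @ Phi_init.T, Phi_final @ Phi_final.T)
```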
II.2.2 Empirical facts
From the data:
- Width dependence: for fixed depth, RA and KA increase with width (less rotation away from initialization).
- Depth dependence: for fixed width, RA and KA decrease with depth (more rotation), with the strongest effect at small width.
- Wide-but-deep regime: even at the largest width, increasing depth produces nontrivial rotation.
II.2.3 Meaning (strictly limited)
- RA and KA quantify feature motion / geometry rotation.
- They do not encode where in input space the changes occur.
- They do not imply localization or ridges.
II.3 Cross-dataset summary (no synthesis yet)
From Part II alone:
- Width suppresses both:
  - feature motion (RA/KA increase),
  - FTLE heterogeneity (the heterogeneity measure decreases).
- Depth:
  - promotes feature motion (RA/KA decrease),
  - but often suppresses FTLE heterogeneity at large width.
This establishes a tension, not a contradiction:
- feature rotation and FTLE heterogeneity are not the same axis.
This tension motivates the theory development in later parts.
Status after Part II (progress bar)
- ✔ Definitions fixed and invariant properties established (Part I)
- ✔ Empirical trends recorded cleanly (Part II)
- ⏳ Theory explaining why these axes decouple (next)