Gain, Geometry, and Rich vs Lazy Dynamics
(Math Notes for Phase 2 / Phase 3)
1. What “geometry” means in our theory
In this project, geometry does not mean Euclidean distance in input space.
It means the relative structure of sensitivity induced by the network.
Formally, geometry lives in:
- ratios of singular values of the Jacobian
- directional imbalance (anisotropy)
- spatial variation of sensitivity over the data manifold
Geometry is about shape, not magnitude.
2. Jacobian as the local structure of the network
Let the network representation at depth $\ell$ be $h_\ell(x)$.
The input–representation Jacobian is
$$J_\ell(x) \;=\; \frac{\partial h_\ell(x)}{\partial x}.$$
For a small perturbation $\delta x$,
$$h_\ell(x + \delta x) \;\approx\; h_\ell(x) + J_\ell(x)\,\delta x.$$
The Jacobian fully characterizes the network’s local behavior.
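As a concrete (purely illustrative) sketch of this local linearization, here is a minimal NumPy example for a toy two-layer tanh network; the architecture, sizes, and weights are assumptions for illustration, not the project's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer tanh network: h(x) = tanh(W2 @ tanh(W1 @ x))
d_in, d_hid, d_out = 4, 8, 6
W1 = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_out, d_hid)) / np.sqrt(d_hid)

def representation(x):
    return np.tanh(W2 @ np.tanh(W1 @ x))

def jacobian(x):
    """Analytic input-representation Jacobian J(x) = dh/dx via the chain rule."""
    a1 = np.tanh(W1 @ x)            # first-layer activation
    a2 = np.tanh(W2 @ a1)           # second-layer activation
    D1 = np.diag(1.0 - a1**2)       # tanh derivative at layer 1
    D2 = np.diag(1.0 - a2**2)       # tanh derivative at layer 2
    return D2 @ W2 @ D1 @ W1        # shape (d_out, d_in)

x = rng.standard_normal(d_in)
J = jacobian(x)

# First-order check: h(x + dx) ~ h(x) + J(x) dx
dx = 1e-5 * rng.standard_normal(d_in)
lhs = representation(x + dx) - representation(x)
rhs = J @ dx
print(np.max(np.abs(lhs - rhs)))    # tiny residual (~1e-10), confirming the local linearization
```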
3. Singular values = directional stretching
Take the singular value decomposition:
$$J_\ell(x) \;=\; U\,\Sigma\,V^\top, \qquad \Sigma = \mathrm{diag}(\sigma_1 \ge \sigma_2 \ge \dots).$$
- Each right singular vector $v_i$ is an orthogonal input direction
- Each singular value $\sigma_i$ is how much that direction is stretched
This defines the local geometry induced by the network.
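A short sketch of the same idea, with a stand-in random matrix playing the role of $J_\ell(x)$ (purely illustrative), checking that each right singular vector $v_i$ is stretched by exactly $\sigma_i$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a local Jacobian J(x); in practice this comes from the network.
J = rng.standard_normal((6, 4))

U, S, Vt = np.linalg.svd(J, full_matrices=False)

for i, (sigma, v) in enumerate(zip(S, Vt)):
    stretch = np.linalg.norm(J @ v)          # how much direction v_i is stretched
    print(f"direction {i}: sigma={sigma:.3f}, ||J v||={stretch:.3f}")

# The right singular vectors form an orthonormal basis of input directions.
print(np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))
```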
4. Isotropy vs anisotropy
Isotropic Jacobian
- All directions treated equally
- Local geometry is spherical
- No preferred features
- Kernel-like / lazy behavior
Anisotropic Jacobian
- Certain directions dominate
- Geometry is elongated / folded
- Selective, data-aligned feature learning
- Rich behavior
5. Anisotropy scalar
We summarize spectral shape (not scale) using an anisotropy scalar; one natural choice, consistent with the singular-value ratios above, is
$$a(x) \;=\; \frac{\sigma_{\max}\big(J_\ell(x)\big)}{\tfrac{1}{k}\sum_{i=1}^{k}\sigma_i\big(J_\ell(x)\big)}.$$
Dataset-averaged anisotropy:
$$\bar a \;=\; \mathbb{E}_{x\sim\mathcal{D}}\big[a(x)\big].$$
Interpretation:
- $a(x) \approx 1$: no dominant directions (flat geometry)
- $a(x) \gg 1$: strong directional dominance (structured geometry)
Anisotropy measures geometric engagement, not task correctness.
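A minimal sketch of the dataset-averaged anisotropy, assuming the ratio-based definition above ($\sigma_{\max}$ over the mean singular value); the toy one-layer network, the dataset, and the exact definition used in the project code are assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

d_in, d_hid = 4, 16
W1 = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)

def jacobian(x):
    """Jacobian of a one-layer tanh representation h(x) = tanh(W1 @ x)."""
    a = np.tanh(W1 @ x)
    return np.diag(1.0 - a**2) @ W1

def anisotropy(J):
    """Assumed definition: largest singular value over the mean singular value.
    Equals 1 for a perfectly flat spectrum, grows with directional dominance."""
    s = np.linalg.svd(J, compute_uv=False)
    return s.max() / s.mean()

X = rng.standard_normal((256, d_in))                  # toy dataset
a_bar = np.mean([anisotropy(jacobian(x)) for x in X])
print(f"dataset-averaged anisotropy: {a_bar:.3f}")
```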
6. FTLE: accumulated sensitivity
Finite-Time Lyapunov Exponent (FTLE):
$$\lambda_L(x) \;=\; \frac{1}{L}\,\log\,\sigma_{\max}\big(J_L(x)\big).$$
FTLE measures accumulated stretching across depth or iterations.
- Flat FTLE field → geometry-free dynamics
- Structured FTLE field → ridges, valleys, spatial variation
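A sketch of how an FTLE field could be computed for a toy depth-$L$ tanh network, assuming the $\frac{1}{L}\log\sigma_{\max}$ definition above; depth, width, and initialization here are illustrative, not the project's settings:

```python
import numpy as np

rng = np.random.default_rng(3)

d, L = 8, 12                                           # toy width and depth
Ws = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]

def ftle(x, gain=1.0):
    """FTLE proxy: (1/L) * log sigma_max(J_L(x)) for a depth-L tanh network.
    The end-to-end Jacobian is accumulated layer by layer via the chain rule."""
    J = np.eye(d)
    h = x
    for W in Ws:
        h = np.tanh(gain * (W @ h))
        J = np.diag(1.0 - h**2) @ (gain * W) @ J       # chain rule across one layer
    sigma_max = np.linalg.svd(J, compute_uv=False)[0]
    return np.log(sigma_max) / L

X = rng.standard_normal((200, d))
field = np.array([ftle(x) for x in X])
print("FTLE field: mean %.3f, std %.3f" % (field.mean(), field.std()))
```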
7. $\mathrm{Var}_x[\lambda_L(x)]$: geometry visibility, not geometry itself
We quantify FTLE-field heterogeneity by its variance over the dataset:
$$\mathrm{Var}_{x\sim\mathcal{D}}\big[\lambda_L(x)\big].$$
Key distinction:
- Anisotropy → local, directional structure
- $\mathrm{Var}_x[\lambda_L(x)]$ → global, spatial variation of accumulated sensitivity
They are related but not equivalent: strong local anisotropy typically produces a structured FTLE field,
but not vice versa.
8. What the gain parameter is
Each layer’s weights are scaled by a gain $g$:
$$W_\ell \;\to\; g\,W_\ell.$$
Thus each layer's Jacobian scales (to leading order) as:
$$J_\ell(x) \;\to\; g\,J_\ell(x).$$
Across depth $L$, the end-to-end Jacobian therefore scales roughly as $g^L\,J_L(x)$, so every singular value is multiplied by the same factor.
Gain rescales magnitude, not geometry.
9. What gain does not change
Because gain multiplies all singular values equally:
- anisotropy ratios are unchanged
- preferred directions are unchanged
- geometry is unchanged
Gain cannot create anisotropy.
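A minimal numerical check of these invariances on a stand-in Jacobian (pure linear algebra; in a nonlinear network the per-layer rescaling is only approximate, as noted above):

```python
import numpy as np

rng = np.random.default_rng(4)

J = rng.standard_normal((10, 10))                      # stand-in local Jacobian
g = 3.0                                                # gain factor

s = np.linalg.svd(J, compute_uv=False)
s_gain = np.linalg.svd(g * J, compute_uv=False)

print(np.allclose(s_gain, g * s))                      # every sigma_i is multiplied by g
print(np.allclose(s_gain / s_gain.mean(), s / s.mean()))  # spectral shape (ratios) unchanged

# Preferred directions are unchanged too: right singular vectors agree up to sign.
V = np.linalg.svd(J)[2]
V_gain = np.linalg.svd(g * J)[2]
print(np.allclose(np.abs(V_gain @ V.T), np.eye(10), atol=1e-8))
```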
10. Visual intuition (precise)
Lazy regime (isotropic)
Before gain:
- FTLE field ≈ flat plateau
After gain:
- FTLE field ≈ taller flat plateau
Shape unchanged.
Rich regime (anisotropic)
Before gain:
- FTLE field has ridges and valleys
After gain:
- ridges get taller
- valleys get deeper
- locations unchanged
Shape preserved, contrast amplified.
11. Why gain does NOT define Rich vs Lazy
Rich vs Lazy is determined by scaling laws:
- parameterization
- width
- depth
Gain only controls where we sit inside a regime:
- contractive
- critical
- expansive
Gain reveals geometry; it does not decide its existence.
12. Important caveat: learning can be real even if anisotropy stays low
A network may be learning faithfully while remaining isotropic if:
- the data distribution is intrinsically isotropic
- signal is high-rank and evenly distributed
- task is already linear
In such cases, a flat, isotropic geometry is correct behavior, not laziness.
13. Final conceptual hierarchy
- Anisotropy: does geometry exist?
- FTLE structure: where does geometry appear?
- Gain: how visible is the geometry?
14. One-line summary
Gain lifts or lowers the sensitivity landscape; anisotropy shapes it.
This separation is the backbone of the theory.
(Addendum) 15. $\mathrm{Var}_x[G(x)]$: variance of stretch (not log-stretch)
In the code, besides the FTLE-field variance $\mathrm{Var}_x[\lambda_L(x)]$, we also track a second heterogeneity measure: the variance of the exponentiated stretch, $\mathrm{Var}_x[G(x)]$, defined below.
15.1 From FTLE to a Jacobian-norm proxy
Recall FTLE (finite-time log-growth rate):
$$\lambda_L(x) \;=\; \frac{1}{L}\,\log\,\sigma_{\max}\big(J_L(x)\big).$$
In your depth-wise setting, a common approximation (and the one implicitly used in the code) is
$$\lambda_L(x) \;\approx\; \frac{1}{L}\,\log\big\|J_L(x)\big\|,$$
so that
$$\big\|J_L(x)\big\| \;\approx\; e^{\,L\,\lambda_L(x)}.$$
In the script you do exactly this conversion (sketched below):
- compute $\lambda_L(x)$,
- exponentiate to get a proxy for $\|J_L(x)\|$,
- then take the variance over the dataset.
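A sketch of that conversion pipeline on synthetic FTLE values (the variable names, depth, and synthetic field are assumptions for illustration, not the project script itself):

```python
import numpy as np

rng = np.random.default_rng(5)
L = 12                                           # network depth

# Stand-in FTLE field lambda_L(x) over a dataset; in practice this comes from
# the Jacobian computation, here it is synthetic for illustration.
ftle = rng.normal(loc=0.05, scale=0.02, size=1000)

var_ftle = ftle.var()                            # Var_x[lambda_L(x)]  (log domain)
G = np.exp(L * ftle)                             # Jacobian-norm proxy G(x) = e^{L * lambda}
var_G = G.var()                                  # Var_x[G(x)]         (linear domain)

print(f"Var[FTLE] = {var_ftle:.5f}")
print(f"Var[G]    = {var_G:.5f}")
```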
15.2 Definition of $G(x)$
Define the Jacobian-norm proxy:
$$G(x) \;:=\; e^{\,L\,\lambda_L(x)} \;\approx\; \big\|J_L(x)\big\|.$$
Then the two dataset-level quantities we compare are $\mathrm{Var}_x[\lambda_L(x)]$ and $\mathrm{Var}_x[G(x)]$.
So:
- $\mathrm{Var}_x[\lambda_L(x)]$ = variance of the log-stretch rate across the dataset
- $\mathrm{Var}_x[G(x)]$ = variance of the stretch magnitude across the dataset
15.3 Why $\mathrm{Var}_x[G(x)]$ is different from $\mathrm{Var}_x[\lambda_L(x)]$
Because exponentiation amplifies tails:
- small differences in $\lambda_L(x)$ can create huge differences in $G(x) = e^{L\,\lambda_L(x)}$,
- especially when $L$ is large.
Heuristically, if $\lambda_L(x)$ has spread $\delta$ across the dataset, then $G(x)$ has a multiplicative spread of roughly $e^{L\delta}$.
So:
- $\mathrm{Var}_x[\lambda_L(x)]$ is the stable geometry-heterogeneity indicator (log domain)
- $\mathrm{Var}_x[G(x)]$ is the sensitive tail indicator (linear domain)
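As a concrete illustration of the heuristic above: a log-domain spread of $\delta = 0.05$ corresponds to a multiplicative linear-domain spread of $e^{L\delta} = e^{2} \approx 7.4$ at depth $L = 40$, and $e^{5} \approx 148$ at $L = 100$; the same tiny FTLE heterogeneity looks enormous once exponentiated.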
15.4 Interpretation in our theory
- In a lazy/isotropic regime:
  - $\lambda_L(x)$ is nearly constant
  - $G(x)$ is nearly constant
- In a rich/structured regime:
  - $\lambda_L(x)$ varies across space
  - tails get amplified
  - $\mathrm{Var}_x[G(x)]$ can become very large (especially at large $L$)
15.5 Relationship to gain
Since gain rescales Jacobians roughly as $\|J_L(x)\| \to g^L\,\|J_L(x)\|$ (equivalently, $\lambda_L(x) \to \lambda_L(x) + \log g$):
- This tends to shift $G(x)$ multiplicatively:
$$G(x) \;\to\; g^{L}\,G(x),$$
so
$$\mathrm{Var}_x\big[G(x)\big] \;\to\; g^{2L}\,\mathrm{Var}_x\big[G(x)\big],$$
while $\mathrm{Var}_x[\lambda_L(x)]$ is unchanged by the additive shift.
- But gain still does not create geometry: if the FTLE field is flat, both $\mathrm{Var}_x[\lambda_L(x)]$ and $\mathrm{Var}_x[G(x)]$ remain near zero (just rescaled).
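A minimal numerical check of this rescaling under the idealized additive-shift model above (synthetic FTLE values; $g$, $L$, and the field are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
L, g = 12, 1.5

ftle = rng.normal(0.05, 0.02, size=1000)         # stand-in FTLE field
ftle_g = ftle + np.log(g)                        # gain shifts the log domain additively

G, G_g = np.exp(L * ftle), np.exp(L * ftle_g)

print(np.isclose(ftle_g.var(), ftle.var()))              # Var[FTLE] unchanged by gain
print(np.isclose(G_g.var(), g**(2 * L) * G.var()))        # Var[G] rescaled by g^(2L)
```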
15.6 Quick summary
- $\mathrm{Var}_x[\lambda_L(x)]$: heterogeneity of log-stretch (geometry visibility, stable)
- $\mathrm{Var}_x[G(x)]$: heterogeneity of stretch (tail-sensitive, can explode with depth/gain)
The correct identity is:
$$G(x) \;=\; e^{\,L\,\lambda_L(x)}, \qquad \mathrm{Var}_x\big[G(x)\big] \;=\; \mathrm{Var}_x\big[e^{\,L\,\lambda_L(x)}\big].$$
So what’s happening is:
- FTLE compresses growth into a log-average
- $G(x)$ undoes that compression by exponentiating
Conceptually: FTLE lives in the log domain, while $G(x)$ (and its variance) lives in the linear domain.