Gain, Geometry, and Rich vs Lazy Dynamics

(Math Notes for Phase 2 / Phase 3)

1. What “geometry” means in our theory

In this project, geometry does not mean Euclidean distance in input space.
It means the relative structure of sensitivity induced by the network.

Formally, geometry lives in the relative structure of the Jacobian's singular values and singular directions (defined below), not in their overall scale.

Geometry is about shape, not magnitude.

2. Jacobian as the local structure of the network

Let the network representation at depth L be

$$h_L : \mathbb{R}^d \to \mathbb{R}^m.$$

The input–representation Jacobian is

$$J(x) = \frac{\partial h_L(x)}{\partial x}.$$

For a small perturbation $\delta x$,

$$h_L(x + \delta x) \approx h_L(x) + J(x)\,\delta x.$$

To first order, the Jacobian fully characterizes the network's local behavior.
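A minimal sketch (not the project's code) of this linearization, using a small random MLP; the shapes and the PyTorch Jacobian helper are illustrative choices:

```python
import torch

torch.manual_seed(0)
# A small stand-in for h_L : R^8 -> R^8 (architecture is illustrative).
h_L = torch.nn.Sequential(
    torch.nn.Linear(8, 16), torch.nn.Tanh(), torch.nn.Linear(16, 8)
)

x = torch.randn(8)
dx = 1e-3 * torch.randn(8)

J = torch.autograd.functional.jacobian(h_L, x)   # input-representation Jacobian, shape (8, 8)
linear_pred = h_L(x) + J @ dx
# The residual is second order in ||dx||, i.e. tiny for a small perturbation.
print(torch.norm(h_L(x + dx) - linear_pred))
```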

3. Singular values = directional stretching

Take the singular value decomposition:

$$J(x) = U(x)\,\Sigma(x)\,V(x)^\top, \qquad \Sigma(x) = \operatorname{diag}\big(\sigma_1(x), \sigma_2(x), \dots\big).$$

The singular values $\sigma_i(x)$ are the local stretch factors along the singular directions; together they define the local geometry induced by the network.
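A small numerical illustration of this decomposition, with a random matrix standing in for $J(x)$ (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 4))        # stand-in for J(x), with m = 6, d = 4

U, S, Vt = np.linalg.svd(J, full_matrices=False)

# A unit input perturbation along the i-th right singular direction is mapped
# to an output change of size S[i] along the i-th left singular direction.
for i in range(len(S)):
    assert np.allclose(J @ Vt[i], S[i] * U[:, i])
print("directional stretch factors:", S)
```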

4. Isotropy vs anisotropy

Isotropic Jacobian

$$\sigma_1(x) \approx \sigma_2(x) \approx \cdots$$

Anisotropic Jacobian

$$\sigma_1(x) \gg \sigma_2(x) \geq \sigma_3(x) \geq \cdots$$

5. Anisotropy scalar $A_L$

We summarize spectral shape (not scale) using an anisotropy scalar:

$$A_L(x) = \frac{\sigma_1(x)}{\operatorname{bulk}\big(\{\sigma_i(x)\}\big)}.$$

Dataset-averaged anisotropy:

$$\Gamma_{A_h}(L) = \mathbb{E}_{x \sim \mathcal{D}}\big[A_L(x)\big].$$

Interpretation:

Anisotropy measures geometric engagement, not task correctness.
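A minimal sketch of $A_L$ and its dataset average, assuming "bulk" means the mean of the non-leading singular values (one concrete choice; the notes leave the exact bulk statistic open):

```python
import numpy as np

def anisotropy(J):
    # A_L(x) = sigma_1 / bulk({sigma_i}); "bulk" here is the mean of the
    # remaining singular values (an assumption, not fixed by the notes).
    s = np.linalg.svd(J, compute_uv=False)     # sorted, largest first
    return s[0] / s[1:].mean()

def dataset_anisotropy(jacobians):
    # Gamma_{A_h}(L): anisotropy averaged over inputs drawn from the dataset.
    return float(np.mean([anisotropy(J) for J in jacobians]))

rng = np.random.default_rng(0)
iso = [np.eye(4) + 0.01 * rng.standard_normal((4, 4)) for _ in range(32)]
aniso = [np.diag([10.0, 1.0, 1.0, 1.0]) @ np.linalg.qr(rng.standard_normal((4, 4)))[0]
         for _ in range(32)]
print(dataset_anisotropy(iso))     # close to 1  (isotropic)
print(dataset_anisotropy(aniso))   # close to 10 (anisotropic)
```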

6. FTLE: accumulated sensitivity

Finite-Time Lyapunov Exponent (FTLE):

$$\lambda_T(x) = \frac{1}{T}\,\log\big\lVert\Phi_T(x)\big\rVert, \qquad \Phi_T(x) = J(x_T)\cdots J(x_0).$$

FTLE measures accumulated stretching across depth or iterations.
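A minimal sketch of a depth-wise FTLE computed from per-layer Jacobians; the spectral norm, the random layers, and the variable names are illustrative assumptions, not the project's actual pipeline:

```python
import numpy as np

def ftle(layer_jacobians):
    # Phi_T = J_T ... J_0, lambda_T = (1/T) * log ||Phi_T|| (spectral norm).
    Phi = np.eye(layer_jacobians[0].shape[1])
    for J_t in layer_jacobians:
        Phi = J_t @ Phi
    return np.log(np.linalg.norm(Phi, ord=2)) / len(layer_jacobians)

rng = np.random.default_rng(0)
T, d, g = 6, 16, 1.5
Js = [g * rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(T)]
print(ftle(Js))   # relative to g = 1, the per-layer gain g shifts this by about log(g)
```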

7. Gλ: geometry visibility, not geometry itself

We quantify FTLE field heterogeneity by:

$$G_\lambda = \operatorname{Var}_{x \sim \mathcal{D}}\big[\lambda_T(x)\big].$$

Key distinction: $A_L$ captures the per-input geometry itself (the shape of the Jacobian spectrum at a point), whereas $G_\lambda$ captures how visible that geometry is at the dataset level (how much the FTLE field varies across inputs).

They are related but not equivalent:

$$\text{Anisotropy} \;\Longrightarrow\; \text{large } G_\lambda \quad \text{(typically)},$$

but not vice versa.
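A minimal sketch of $G_\lambda$ as a plain dataset variance over FTLE values (function and variable names are hypothetical):

```python
import numpy as np

def G_lambda(ftle_values):
    # Variance of the FTLE field lambda_T(x) over a batch of inputs x ~ D.
    return float(np.var(np.asarray(ftle_values)))

print(G_lambda([0.10, 0.11, 0.09, 0.10]))   # flat field          -> small G_lambda
print(G_lambda([0.02, 0.45, 0.10, 0.30]))   # heterogeneous field -> larger G_lambda
```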

8. What the gain parameter is

Each layer’s weights are scaled by a gain g:

$$W = g\,\widetilde{W}.$$

Thus each Jacobian scales as:

$$J(x) \;\to\; g\,J(x).$$

Across depth L:

$$J_L(x) \;\approx\; g^L \times \widetilde{J}_L(x).$$

Gain rescales magnitude, not geometry.

9. What gain does not change

Because gain multiplies all singular values equally:

$$\frac{g\,\sigma_1}{g\,\sigma_2} = \frac{\sigma_1}{\sigma_2}.$$

Gain cannot create anisotropy.
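A quick numerical check of this point (illustrative only): scaling a Jacobian by $g$ scales every singular value by $g$ and leaves all spectral ratios unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
J = rng.standard_normal((5, 5))
g = 3.0

s = np.linalg.svd(J, compute_uv=False)
s_g = np.linalg.svd(g * J, compute_uv=False)

print(s_g / s)                        # every singular value scaled by exactly g
print(s[0] / s[1], s_g[0] / s_g[1])   # the anisotropy ratio is unchanged
```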

10. Visual intuition (precise)

Lazy regime (isotropic)

Before gain: the FTLE field $\lambda(x)$ is essentially flat across inputs.

After gain:

$$\lambda(x) \;\to\; \lambda(x) + \log g.$$

The entire field shifts by the same constant. Shape unchanged.

Rich regime (anisotropic)

Before gain: the FTLE field varies substantially across inputs; some inputs are stretched far more strongly than others.

After gain: every $\lambda(x)$ again shifts by $\log g$, so in the linear (stretch) domain each input's stretching is multiplied by $g^L$.

Shape preserved, contrast amplified (illustrated in the sketch below).
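The sketch below makes both pictures concrete with synthetic FTLE fields (purely illustrative numbers): the gain shift moves the mean by $\log g$ but leaves the field's variance, i.e. its shape, untouched.

```python
import numpy as np

rng = np.random.default_rng(2)
g = 2.0

lam_lazy = 0.10 + 0.001 * rng.standard_normal(200)   # flat field (isotropic picture)
lam_rich = 0.10 + 0.100 * rng.standard_normal(200)   # heterogeneous field (anisotropic picture)

for name, lam in (("lazy", lam_lazy), ("rich", lam_rich)):
    shifted = lam + np.log(g)
    # Mean moves up by log(g); the variance (the field's shape) does not change.
    print(name, shifted.mean() - lam.mean(), np.var(shifted) - np.var(lam))
```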

11. Why gain does NOT define Rich vs Lazy

Rich vs Lazy is determined by scaling laws: how the parameterization behaves as width and depth grow (see the hierarchy in Section 13), not by an overall multiplicative factor.

Gain only controls where we sit inside a regime: it raises or lowers the whole sensitivity landscape without reshaping it.

Gain reveals geometry; it does not decide its existence.

12. Important caveat: learning can be real even if $A_L \approx 1$

A network may be learning faithfully while remaining isotropic if the task itself does not demand directionally selective sensitivity, for example when all input directions carry comparable information.

In such cases,

$$A_L \approx 1$$

is correct behavior, not laziness.

13. Final conceptual hierarchy

Parameterization / Width / Depth → Jacobian Anisotropy → FTLE Field Geometry → Task Alignment (or Not)

14. One-line summary

Gain lifts or lowers the sensitivity landscape; anisotropy shapes it.

This separation is the backbone of the theory.

(Addendum) 15. GJ: variance of stretch (not log-stretch)

In the code, besides Gλ you also compute a quantity often named GJ, intended to reflect heterogeneity of the Jacobian norm (or a proxy for it) across the dataset.

15.1 From FTLE to a Jacobian-norm proxy

Recall FTLE (finite-time log-growth rate):

$$\lambda_T(x) = \frac{1}{T}\,\log\big\lVert\Phi_T(x)\big\rVert, \qquad \Phi_T(x) = J(x_T)\cdots J(x_0).$$

In your depth-wise setting, a common approximation (and the one implicitly used in the code) is:

$$\log\big\lVert J(x)\big\rVert \;\approx\; L\,\lambda(x),$$

so that

$$\big\lVert J(x)\big\rVert \;\approx\; \exp\big(L\,\lambda(x)\big).$$

In the script you do exactly this conversion:
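A minimal sketch of what that conversion looks like; the array name `ftle`, the depth `L`, and the clip threshold are stand-ins for whatever the script actually uses:

```python
import numpy as np

# Hypothetical names: `ftle` holds lambda_T(x) for a batch of inputs x ~ D,
# `L` is the depth, and 50.0 stands in for the script's actual clip threshold.
L = 8
ftle = np.random.default_rng(3).normal(0.05, 0.10, size=1000)

log_stretch = np.clip(L * ftle, a_min=None, a_max=50.0)   # clip L*lambda before exponentiating
J_norm = np.exp(log_stretch)                              # ||J(x)|| proxy = exp(L * lambda)
```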

15.2 Definition of GJ

Define the Jacobian-norm proxy:

$$J_{\mathrm{norm}}(x) := \exp\big(L\,\lambda(x)\big).$$

Then:

$$G_J := \operatorname{Var}_{x \sim \mathcal{D}}\big[J_{\mathrm{norm}}(x)\big] = \operatorname{Var}_{x \sim \mathcal{D}}\big[\exp\big(L\,\lambda(x)\big)\big].$$

So: $G_\lambda$ is the variance of the log-stretch (the FTLE itself), while $G_J$ is the variance of the stretch in the linear domain.

15.3 Why GJ is different from Gλ

Because exponentiation amplifies tails, $G_J$ is dominated by the most strongly stretched inputs, whereas $G_\lambda$ weighs every input on the same logarithmic scale.

Heuristically, if $\lambda$ is approximately Gaussian with mean $\mu$ and variance $\sigma^2$, then $\exp(L\lambda)$ is approximately lognormal, and its variance grows very rapidly with $L^2\sigma^2$. (This is exactly why you clip $L\lambda$ in code to avoid numerical blow-up.)
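For reference, the standard lognormal variance identity behind this heuristic (a textbook fact, stated for completeness): if $\lambda \sim \mathcal{N}(\mu, \sigma^2)$, then $L\lambda \sim \mathcal{N}(L\mu, L^2\sigma^2)$ and

$$\operatorname{Var}\!\left[e^{L\lambda}\right] = \left(e^{L^2\sigma^2} - 1\right) e^{2L\mu + L^2\sigma^2},$$

so $G_J$ grows roughly like $e^{L^2\sigma^2}$ even when $G_\lambda = \sigma^2$ is held fixed.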

So: $G_J$ can grow enormously with depth even when $G_\lambda$ stays moderate, and it is far more sensitive to the tails of the FTLE distribution than $G_\lambda$ is.

15.4 Interpretation in our theory

$G_\lambda$ probes heterogeneity of the sensitivity landscape in the log domain, so it is the natural measure of geometry visibility; $G_J$ probes heterogeneity of the raw stretch in the linear domain, so it mixes geometric shape with overall magnitude (and hence with gain and depth).

15.5 Relationship to gain

Since gain rescales Jacobians roughly as $\lVert J\rVert \to g^L \lVert J\rVert$, in the FTLE/log domain this is an additive shift,

$$\lambda(x) \;\to\; \lambda(x) + \log g,$$

while in the linear domain it is multiplicative,

$$\exp(L\lambda) \;\to\; \exp(L\lambda)\,\exp(L\log g) = g^L \exp(L\lambda),$$

so GJ is especially sensitive to gain (because it lives in the linear domain).
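A small numerical check of this asymmetry, with a synthetic FTLE field (illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(4)
L, g = 8, 1.3
lam = rng.normal(0.05, 0.05, size=2000)          # synthetic FTLE field

# Log domain: adding log(g) leaves the variance (G_lambda) untouched.
print(np.var(lam), np.var(lam + np.log(g)))

# Linear domain: the same shift multiplies every stretch by g**L,
# so G_J grows by a factor of g**(2L).
G_J_before = np.var(np.exp(L * lam))
G_J_after = np.var(np.exp(L * (lam + np.log(g))))
print(G_J_after / G_J_before, g ** (2 * L))
```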

15.6 Quick summary

The correct identity is:

$$\exp\!\left(\frac{L}{T}\,\log\big\lVert\delta x_t\big\rVert\right) = \big\lVert\delta x_t\big\rVert^{L/T}.$$

So what's happening is: the script exponentiates the depth-scaled FTLE, converting a per-step log growth rate back into an effective total stretch factor, and $G_J$ is the dataset variance of that stretch.

Conceptually:

FTLE lives in the log domain,
GJ lives back in the linear (stretch) domain.