ZEHEN Labs

The structure
underneath.

The common assumption is that alignment gets harder as models scale. We measured it across 102 models from 16 families and found a phase transition: below a critical scale, alignment and capabilities fight; above it, they cooperate. The transition is engineerable, which makes the crossover a design parameter rather than a fixed cost of scale.

102 models
16 Families · 10 Labs
Phase transition
Fight → Cooperate
Engineerable
Data · Width · Architecture
arXiv 2026
2 Papers (Under Review)

Capabilities fight, then cooperate

Coupling is how strongly two capabilities move together. We measure it between reasoning and truthfulness as models scale. Below a critical size, they’re anticorrelated — improving one hurts the other. Above it, they reinforce. This transition is sharp, reproducible across every family we tested, and invisible to loss curves. The same mathematics governs phase transitions in superconductors and sleep-stage dynamics in neuroscience.
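As a rough illustration of the idea, here is a minimal sketch that treats coupling as the Pearson correlation between reasoning and truthfulness scores within a group of models. This is an assumption made for illustration, not necessarily the estimator used in the papers. The function names, the tolerance band `tol`, and all scores are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("scores must vary to define a correlation")
    return cov / sqrt(vx * vy)

def classify_phase(coupling, tol=0.1):
    """Map a coupling value to a phase label (tol is an assumed dead band)."""
    if coupling < -tol:
        return "tax"         # capabilities anticorrelated
    if coupling > tol:
        return "bonus"       # capabilities reinforce
    return "transition"      # near the critical point

# Made-up scores for illustration only (not measured data).
small_reasoning = [0.20, 0.25, 0.30, 0.35]
small_truth     = [0.45, 0.42, 0.40, 0.36]
large_reasoning = [0.60, 0.68, 0.75, 0.82]
large_truth     = [0.50, 0.57, 0.63, 0.70]

small_coupling = pearson(small_reasoning, small_truth)
large_coupling = pearson(large_reasoning, large_truth)
print(classify_phase(small_coupling), classify_phase(large_coupling))
```

With these made-up numbers the small group has negative coupling and the large group positive coupling, which is the sign flip the text describes.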

Tax Phase
Alignment and capabilities fight. Scale alone can’t fix this.
Transition
Critical point. Small interventions have maximum leverage.
Bonus Phase
Capabilities cooperate. Scale freely. Two cascade levels confirmed, more predicted.

The transition point varies by family, architecture, and training data — it’s a design parameter, not a physical constant. Three levers shift it: data curation, model width, and architecture.
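One simple way to picture a family-specific transition point is to look for the model size where measured coupling changes sign. The sketch below assumes you already have a coupling value per model size and interpolates linearly in log10(parameters). The function name and the sample numbers are hypothetical, and the real analysis may locate the transition differently.

```python
from math import log10

def critical_scale(sizes, couplings):
    """Estimate the parameter count where coupling first crosses from
    negative to non-negative, interpolating linearly in log10(size).
    Returns None if no such crossing is found."""
    points = sorted(zip(sizes, couplings))
    for (s0, c0), (s1, c1) in zip(points, points[1:]):
        if c0 < 0 <= c1:
            frac = -c0 / (c1 - c0)  # fraction of the way to the crossing
            log_s = log10(s0) + frac * (log10(s1) - log10(s0))
            return 10 ** log_s
    return None

# Made-up couplings for two hypothetical families (illustration only).
family_a = critical_scale([1e8, 1e9, 1e10], [-0.4, -0.2, 0.2])
family_b = critical_scale([1e8, 1e9, 1e10], [-0.3, 0.3, 0.5])
print(family_a, family_b)
```

In this toy data the second family crosses at a smaller scale than the first, which is what a shifted transition point would look like.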

Steering a model in real time

Add a truth-direction vector at one layer (quarter-depth). The model’s output changes — zero retraining. These are real activation-level results from TransformerLens, not prompt engineering. Verified on GPT-2, Pythia-160M, and Pythia-410M.
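The mechanism can be sketched without any model at all. The toy below uses plain Python lists as hidden states and identity functions as layers, then adds a scaled direction vector after the quarter-depth layer. Everything here is an assumption for illustration: the function names, the choice of floor(n_layers / 4) as the layer index, the choice to steer after the layer rather than before it, and the scale `alpha`. In a real setup the same addition would typically be done with a forward hook on the residual stream.

```python
def quarter_depth_layer(n_layers):
    """Layer index at quarter depth (assumed to be floor(n_layers / 4))."""
    if n_layers < 1:
        raise ValueError("n_layers must be positive")
    return n_layers // 4

def steer(hidden, direction, alpha):
    """Add alpha times the unit-normalized direction to a hidden state."""
    norm = sum(d * d for d in direction) ** 0.5
    if norm == 0:
        raise ValueError("direction must be non-zero")
    return [h + alpha * d / norm for h, d in zip(hidden, direction)]

def forward(hidden, layers, steer_at=None, direction=None, alpha=0.0):
    """Run toy layers in order, steering right after layer `steer_at`."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        if i == steer_at:
            hidden = steer(hidden, direction, alpha)
    return hidden

# Toy 12-layer "model" whose layers leave the hidden state unchanged.
layers = [lambda h: list(h) for _ in range(12)]
layer_idx = quarter_depth_layer(len(layers))
truth_direction = [3.0, 4.0]  # hypothetical direction, not a learned one

baseline = forward([0.0, 0.0], layers)
steered = forward([0.0, 0.0], layers, steer_at=layer_idx,
                  direction=truth_direction, alpha=2.0)
print(layer_idx, baseline, steered)
```

No weights change in this sketch. Only the hidden state passing through one layer is shifted, which is why no retraining is involved.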

[Demo panel: model output without steering vs. with CAPE steering]

This runs via cape-steer, an open-source CLI that works on any open-weight model. It auto-detects the architecture and steers at quarter-depth. Full demo → GitHub →

Tools Built on This

Enter benchmarks, get your model’s alignment phase. Or steer any open-weight model from the command line. The physics is open.

CAPE Dashboard

LIVE

Enter your model’s benchmark scores. Get its alignment phase, coupling trajectory, h-field diagnostic, and concrete interventions. Phase classification, ODE trajectory fitting, frontier analysis, and activation-level steering demo — all in one tool.

63 + 39
Models
10
Labs
7
Predictions
5.6%
ODE MAE
Open Dashboard →

cape-steer

OPEN SOURCE

Activation-level alignment correction for any open-weight model. Auto-detects architecture, finds the coupling bottleneck at quarter-depth (layer n_l/4, where n_l is the number of layers), and steers the model’s hidden state toward truth. Zero retraining. Works on CPU.

$ cape-steer diagnose --model pythia-410m
$ cape-steer steer --prompt "Are vaccines dangerous?"
GitHub → See live demo above ↑

Basin Memory

PRE-RELEASE

Memory as an energy landscape. Retrieval is Boltzmann-weighted — temperature controls whether you explore (high T, creative) or exploit (low T, precise). Memories deepen with use. Offline consolidation merges, prunes, and strengthens — the same dynamics as biological sleep.
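A minimal sketch of Boltzmann-weighted retrieval, assuming each memory carries a scalar energy (lower means deeper and more retrievable) and that retrieval samples with probability proportional to exp(-E / T). The function names, the `deepen` step size, and the energies are hypothetical. How Basin Memory actually computes energies is not described here.

```python
import math
import random

def boltzmann_weights(energies, T):
    """Probabilities proportional to exp(-E / T), computed stably."""
    if T <= 0:
        raise ValueError("temperature must be positive")
    e_min = min(energies)  # shift energies so the largest exponent is 0
    ws = [math.exp(-(e - e_min) / T) for e in energies]
    z = sum(ws)
    return [w / z for w in ws]

def retrieve(energies, T=0.7, rng=random):
    """Sample one memory index from the Boltzmann distribution."""
    probs = boltzmann_weights(energies, T)
    return rng.choices(range(len(energies)), weights=probs, k=1)[0]

def deepen(energies, idx, amount=0.1):
    """Lower the energy of a retrieved memory so it deepens with use."""
    energies[idx] -= amount

energies = [0.0, 1.0, 2.0]  # made-up energies for three memories
cold = boltzmann_weights(energies, T=0.1)    # low T: exploit
hot = boltzmann_weights(energies, T=100.0)   # high T: explore

idx = retrieve(energies, T=0.7, rng=random.Random(0))
deepen(energies, idx)
print(cold, hot, idx)
```

At low temperature nearly all probability sits on the lowest-energy memory. At high temperature the distribution flattens toward uniform, matching the explore versus exploit description above.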

Not just for agents. The energy landscape applies to any system with persistent memory: conversational AI, knowledge bases, research tools, clinical note systems, education platforms. The physics is domain-agnostic.

100%
LoCoMo Judge
100%
LongMemEval
158ms
Latency
98.1
BasinBench (ours)
Beats Mem0 (49.7%), OpenClaw (72.5%), PropMem (82.3%) on LoCoMo. 9 physics signals. Hebbian deepening. Kramers escape rates for forgetting dynamics.
Request early access →
// How it works (conceptual)
remember(doc, context) // store
retrieve(query, T=0.7) // Boltzmann
sleep() // consolidate offline

// Early access — paper first
EMNLP ARR · May 25 deadline
Paper: “Physics-Informed Agent Memory with Dynamic Free Energy Learning”

Current Work

Two papers on arXiv (under review). More in preparation across multiple domains.

3A
“Lying Is Just a Phase” — The Hidden Alignment Transition in Language Model Scaling
3B
“The Growing Pains of Frontier Models” — When Leaderboards Stop Separating

ZEHEN Labs

We look for mathematical structure in complex systems — drawing from physics, dynamical systems, network theory, information theory, and whatever else the problem needs. When we find structure, we build tools on it.

Current focus: AI scaling laws. We discovered that the coupling between model capabilities undergoes a phase transition at a critical scale, and that transition is predictable, measurable, and actionable.

Founded by Adil Amin. Based in Milwaukee, WI.

ذہن
ze·hen  /ˈzɛ.hɛn/
mind · intellect · understanding
Urdu · Persian
Z · E · H · E · N
Zones of Emergent Hierarchical Energy Networks

Get in Touch

Interested in collaboration, consulting, preprints, or early access?