The structure
underneath.

The common assumption is that alignment gets harder as models scale. We measured it across 102 models from 16 families and found a phase transition: below a critical scale, alignment and capabilities fight; above it, they cooperate. The transition point can be engineered, which makes it a design parameter rather than a fixed cost of scale.

102 models

16 Families · 10 Labs

Phase transition

Fight → Cooperate

Engineerable

Data · Width · Architecture

arXiv 2026

2 Papers (Under Review)

Try the Dashboard → · Read on arXiv · See Steering in Action

The Discovery

Capabilities fight, then cooperate

Coupling is how strongly two capabilities move together. We measure it between reasoning and truthfulness as models scale. Below a critical size, they’re anticorrelated — improving one hurts the other. Above it, they reinforce. This transition is sharp, reproducible across every family we tested, and invisible to loss curves. The same mathematics governs phase transitions in superconductors and sleep-stage dynamics in neuroscience.
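One way to make coupling concrete is as a correlation between two benchmark scores, computed over a sliding window of models ordered by size. The sketch below assumes Pearson correlation and a fixed window of four models. The estimator is not specified on this page, so the function names, the window size, and the scores are all illustrative, and the numbers are synthetic rather than measurements.

```python
import numpy as np


def coupling(reasoning, truthfulness):
    """Pearson correlation between two capability score series (assumed estimator)."""
    r = np.asarray(reasoning, dtype=float)
    t = np.asarray(truthfulness, dtype=float)
    return float(np.corrcoef(r, t)[0, 1])


def windowed_coupling(sizes, reasoning, truthfulness, window=4):
    """Coupling over a sliding window of models sorted by size.

    Returns (geometric-mean size of the window, coupling) pairs.
    """
    order = np.argsort(sizes)
    s = np.asarray(sizes, dtype=float)[order]
    r = np.asarray(reasoning, dtype=float)[order]
    t = np.asarray(truthfulness, dtype=float)[order]
    out = []
    for i in range(len(s) - window + 1):
        centre = float(np.exp(np.mean(np.log(s[i:i + window]))))
        out.append((centre, coupling(r[i:i + window], t[i:i + window])))
    return out


# Synthetic scores, for illustration only: reasoning rises with size,
# truthfulness falls for the small models and rises for the large ones.
sizes = [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]   # billions of parameters
reasoning = [20, 24, 28, 32, 36, 40, 44, 48]
truthfulness = [40, 38, 36, 34, 36, 38, 40, 42]

trajectory = windowed_coupling(sizes, reasoning, truthfulness)
for size, c in trajectory:
    print(f"{size:6.2f}B  coupling = {c:+.2f}")
```

On this toy data the coupling is negative in the smallest windows and positive in the largest ones, so the sign change marks where a transition would sit.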

Read the full story →

Tax Phase

Alignment and capabilities fight. Scale alone can’t fix this.

Transition

Critical point. Small interventions have maximum leverage.

Bonus Phase

Capabilities cooperate. Scale freely. Two cascade levels confirmed, more predicted.

The transition point varies by family, architecture, and training data — it’s a design parameter, not a physical constant. Three levers shift it: data curation, model width, and architecture.

What It Looks Like

Steering a model in real time

Add a truth-direction vector at one layer (quarter-depth). The model’s output changes — zero retraining. These are real activation-level results from TransformerLens, not prompt engineering. Verified on GPT-2, Pythia-160M, and Pythia-410M.

[Interactive demo: model output without steering vs. with CAPE steering]

This runs via cape-steer, an open-source CLI that works on any open-weight model. It auto-detects the architecture and steers at quarter-depth. Full demo → · GitHub →
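The steering step described above can be illustrated on a toy residual network: pick the layer at one quarter of the depth and add a scaled direction vector to the hidden state there. This is a minimal sketch, not the cape-steer implementation or the TransformerLens API. The toy `forward` function, the random "truth direction", the strength `alpha`, and the use of integer division for quarter-depth are all assumptions made for illustration.

```python
import numpy as np


def quarter_depth(n_layers):
    """Layer index at one quarter of the depth (assumes floor division)."""
    return n_layers // 4


def forward(x, layers, steer_layer=None, direction=None, alpha=0.0):
    """Run a toy residual stack, optionally adding alpha * direction after one layer."""
    h = np.asarray(x, dtype=float).copy()
    for i, w in enumerate(layers):
        h = h + np.tanh(h @ w)              # residual update of the hidden state
        if i == steer_layer and direction is not None:
            h = h + alpha * direction       # the steering step: no weights change
    return h


rng = np.random.default_rng(0)
d_model, n_layers = 8, 12
layers = [rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(n_layers)]

# Stand-in for a truth direction; a real one would be estimated from activations.
direction = rng.normal(size=d_model)
direction /= np.linalg.norm(direction)
x = rng.normal(size=d_model)

layer = quarter_depth(n_layers)
baseline = forward(x, layers)
steered = forward(x, layers, steer_layer=layer, direction=direction, alpha=4.0)
print("steering layer:", layer)
print("output shift:", np.linalg.norm(steered - baseline))
```

Because the vector is added to the hidden state at inference time, the weights in `layers` are identical in both runs; only the activations downstream of the chosen layer differ.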

Try It

Tools Built on This

Enter benchmarks, get your model’s alignment phase. Or steer any open-weight model from the command line. The physics is open.

CAPE Dashboard

LIVE

Enter your model’s benchmark scores. Get its alignment phase, coupling trajectory, h-field diagnostic, and concrete interventions. Phase classification, ODE trajectory fitting, frontier analysis, and activation-level steering demo — all in one tool.

63 + 39

Models

Labs

Predictions

5.6%

ODE MAE

Open Dashboard →

cape-steer

OPEN SOURCE

Activation-level alignment correction for any open-weight model. Auto-detects architecture, finds the coupling bottleneck at quarter-depth (layer n_l/4), and steers the model’s hidden state toward truth. Zero retraining. Works on CPU.

$ cape-steer diagnose --model pythia-410m
$ cape-steer steer --prompt "Are vaccines dangerous?"

GitHub → · See live demo above ↑

Basin Memory

PRE-RELEASE

Memory as an energy landscape. Retrieval is Boltzmann-weighted — temperature controls whether you explore (high T, creative) or exploit (low T, precise). Memories deepen with use. Offline consolidation merges, prunes, and strengthens — the same dynamics as biological sleep.

Not just for agents. The energy landscape applies to any system with persistent memory: conversational AI, knowledge bases, research tools, clinical note systems, education platforms. The physics is domain-agnostic.

100%

LoCoMo Judge

100%

LongMemEval

158ms

Latency

98.1

BasinBench (ours)

Beats Mem0 (49.7%), OpenClaw (72.5%), PropMem (82.3%) on LoCoMo. 9 physics signals. Hebbian deepening. Kramers escape rates for forgetting dynamics.

Request early access →

// How it works (conceptual)
remember(doc, context) // store
retrieve(query, T=0.7) // Boltzmann
sleep() // consolidate offline
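As a rough illustration of the `retrieve(query, T=0.7)` call above, Boltzmann-weighted retrieval can be sketched as sampling a memory with probability proportional to exp(-E / T). Basin Memory is pre-release and its internals are not described here, so the energy definition (negative cosine similarity minus a per-memory depth), the deepening increment of 0.1, and every name below are assumptions.

```python
import numpy as np


def boltzmann_weights(energies, T):
    """Probabilities p_i proportional to exp(-E_i / T), computed stably."""
    e = np.asarray(energies, dtype=float)
    w = np.exp(-(e - e.min()) / T)
    return w / w.sum()


def retrieve(query, memory_vecs, depths, T=0.7, rng=None):
    """Sample one memory index; low T favours the best match, high T explores."""
    if rng is None:
        rng = np.random.default_rng()
    q = query / np.linalg.norm(query)
    m = memory_vecs / np.linalg.norm(memory_vecs, axis=1, keepdims=True)
    energies = -(m @ q) - depths            # assumed energy: similar and deep = low
    p = boltzmann_weights(energies, T)
    idx = int(rng.choice(len(p), p=p))
    depths[idx] += 0.1                      # assumed rule: a recalled memory deepens
    return idx, p


rng = np.random.default_rng(0)
memories = rng.normal(size=(5, 16))         # five stored memory embeddings
depths = np.zeros(5)
query = memories[2] + 0.1 * rng.normal(size=16)

idx, p = retrieve(query, memories, depths, T=0.7, rng=rng)
print("sampled memory:", idx)
print("probabilities:", np.round(p, 3))

# Temperature controls how peaked the distribution is.
cold = boltzmann_weights([-1.0, -0.5, 0.0], T=0.1)
hot = boltzmann_weights([-1.0, -0.5, 0.0], T=10.0)
```

In this sketch `cold` concentrates almost all probability on the lowest-energy memory, while `hot` is close to uniform, which matches the exploit versus explore behaviour described for low and high T.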

// Early access — paper first

EMNLP ARR · May 25 deadline
Paper: “Physics-Informed Agent Memory with Dynamic Free Energy Learning”

About

ZEHEN Labs

We look for mathematical structure in complex systems — drawing from physics, dynamical systems, network theory, information theory, and whatever else the problem needs. When we find structure, we build tools on it.

Current focus: AI scaling laws. We discovered that the coupling between model capabilities undergoes a phase transition at a critical scale, and that transition is predictable, measurable, and actionable.

Founded by Adil Amin. Based in Milwaukee, WI.

ذہن

ze·hen /ˈzɛ.hɛn/

mind · intellect · understanding
Urdu · Persian

Z · E · H · E · N

Zones of Emergent Hierarchical Energy Networks
