Research program spanning AI scaling, agent memory, quantum materials, and more. Contact [email protected] for preprints.
Submitted
Paper 3A · NeurIPS 2026
“Lying Is Just a Phase”
The Hidden Alignment Transition in Language Model Scaling
Below a family-dependent critical scale Nc, the coupling between reasoning (HellaSwag) and truthfulness (TruthfulQA) is negative — scaling reasoning hurts truthfulness. Above Nc, they cooperate. Nc varies 60× across families (0.12B–7B) and is a design parameter, not a physical constant: width, data curation, and architecture each shift it independently. A coupled ODE cross-predicts held-out Llama-2 at 5.6% MAE. The isocline classifier separates standard-trained from curated families. Curated models (Phi, Qwen3) bypass the tax entirely.
63 base models · 16 families · r = −0.989 pre-Nc · 5.6% ODE MAE
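One way to make the sign change concrete: a minimal coupled-ODE form consistent with the description above (illustrative only; α_R, α_T, and g_0 are free parameters, and the paper's fitted system may differ):

\frac{dR}{d\ln N} = \alpha_R R + g(N)\,T, \qquad
\frac{dT}{d\ln N} = \alpha_T T + g(N)\,R, \qquad
g(N) = g_0 \tanh\!\left(\ln\frac{N}{N_c}\right)

Here R and T track reasoning and truthfulness scores with scale N: g(N) < 0 below N_c (the alignment tax) and g(N) > 0 above it (cooperation).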
At frontier scale (SWE-bench vs GPQA Diamond, 34+5 models, 10 labs), capabilities remain cooperative (r = +0.72, slope 0.513). The h-field diagnostic — deviation from the cooperation trend — reveals each lab’s training philosophy: Google is reasoning-specialist (h̄ = +5.5), Anthropic is coding-rich (h̄ = −6.9). Per-lab coupling slopes span 5× (Google 1.15 vs DeepSeek 0.23). Tax excursions are temporary — Sonnet 4.6 (h = −13.1) recovers at Opus 4.6 (h = +3.5). The h-field is descriptive, not causal. Seven falsifiable predictions with timestamped deadlines.
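As a reading aid, here is a minimal sketch of how such an h-field can be computed as the residual from the fitted cooperation trend. All model names and scores below are invented placeholders, and the regression direction and sign convention are assumptions, chosen so that h > 0 reads "reasoning-specialist" and h < 0 reads "coding-rich" as in the text:

```python
import numpy as np

# Hypothetical (SWE-bench, GPQA Diamond) score pairs; values are made up.
scores = {
    "lab1_model": (45.0, 60.0),
    "lab2_model": (55.0, 52.0),
    "lab3_model": (40.0, 58.0),
    "lab4_model": (62.0, 61.0),
}
swe = np.array([v[0] for v in scores.values()])
gpqa = np.array([v[1] for v in scores.values()])

# Fit the cross-lab cooperation trend (the paper reports slope 0.513 on
# its own 34+5-model sample; this toy fit will differ).
slope, intercept = np.polyfit(swe, gpqa, 1)

# h = deviation from the trend line: positive means more reasoning
# than the model's coding score predicts, negative means coding-rich.
for name, (s, g) in scores.items():
    h = g - (slope * s + intercept)
    print(f"{name}: h = {h:+.1f}")
```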
Physics-Informed Agent Memory with Dynamic Free Energy Learning
Agent memory that uses energy landscapes instead of vector similarity. Retrieval is Boltzmann-weighted (temperature controls exploration vs exploitation), memories deepen with use (frequency = basin depth), and offline “sleep” consolidation merges, prunes, and strengthens — the same physics that governs CAPE coupling.
Why it’s different: Every existing agent memory (Mem0, Zep, Letta, MemGPT) uses semantic similarity + heuristics. Basin Memory uses a free-energy landscape — retrieval quality improves with use because the energy surface reshapes. It’s a learning theory, not a retrieval system.
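To make the mechanism concrete, here is a minimal sketch of Boltzmann-weighted, energy-landscape retrieval as described above. This is not the basin-memory package API: the energy model (basin depth grows with access count) and the consolidation rule are illustrative assumptions.

```python
import numpy as np

class BasinMemorySketch:
    """Toy energy-landscape memory: retrieval is Boltzmann-weighted,
    and every retrieval deepens the retrieved memory's basin."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature   # high T: explore; low T: exploit
        self.items = []                  # [embedding, text, access_count]

    def store(self, embedding, text):
        self.items.append([np.asarray(embedding, float), text, 0])

    def _energy(self, query):
        # Energy = semantic mismatch minus basin depth; frequently used
        # memories sit in deeper basins and are easier to reach.
        q = np.asarray(query, float)
        return np.array([
            np.linalg.norm(q - emb) - np.log1p(count)
            for emb, _, count in self.items
        ])

    def retrieve(self, query):
        e = self._energy(query)
        w = np.exp(-(e - e.min()) / self.temperature)  # Boltzmann weights
        i = np.random.choice(len(self.items), p=w / w.sum())
        self.items[i][2] += 1                          # deepen the basin
        return self.items[i][1]

    def sleep(self, min_count=1):
        # Offline consolidation, here reduced to pruning shallow basins;
        # the merge/strengthen steps described above are omitted.
        self.items = [it for it in self.items if it[2] >= min_count]
```

The key design point the sketch tries to capture: retrieval reshapes the energy surface, so quality improves with use rather than staying fixed as in similarity-only systems.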
Benchmarks
LoCoMo (full, Judge): 100% (1986/1986)
LongMemEval (Judge): 100% (500/500)
BasinBench (our benchmark, to be released): 98.1/100
Retrieval latency: 158 ms
vs Mem0: 49.7% LoCoMo
vs OpenClaw: 72.5% LoCoMo
vs PropMem: 82.3% LoCoMo
Access: pip install basin-memory (pre-release). Request early access for the full physics engine.
100% LoCoMo Judge · 158 ms latency · 9 physics signals
Near-Ready
Paper 1 · 85% Written
“A Calorimeter Is All You Need”
Inverse Gor’kov Framework for Superconductor Classification · Nature / PRB
Classifies superconductor pairing symmetry from bulk thermodynamics alone. 33/33 known materials classified correctly; leave-one-out error 3.9%. A det(a) sign-change discriminator separates s-wave from sign-changing order parameters. L1/L0 boosting-ratio fingerprints: 0.2× conventional, 3–5× s±, 11× d-wave.
33/33 classified · 3.9% LOO error
Paper 2 · 85% Written
“Thirty-Three Times Too Heavy”
Mass Enhancement and Pairing in Heavy Fermions · Nature Physics / PRL
Three falsifiable predictions from one framework: an FeSe eight-pressure trajectory (β ≈ 0.12), La3Ni2O7 ΔC/γTc = 1.7 ± 0.3, and a UTe2 Leggett mode at 0.6–14 GHz (Method B ≈ 4.6 GHz, unmeasured).
0.6–14 GHz UTe2 prediction · 3 falsifiable predictions
Paper 3E · Data Complete
SFEE Universality
From CeRh2As2 to AI Scaling · PRL
R² = 0.855, Bayes factor 10^49.6. The same free-energy structure governs superconductor phase boundaries and AI scaling transitions.
R² = 0.855 · BF = 10^49.6
Sleep · 75% Written
GL Dynamics for EEG Sleep Stage Transitions
Nature (target)
Critical slowing down 2.60×, susceptibility 4.6×, Kramers escape rate within 1.4× of observation. Sleep consolidation connects to Basin Memory offline processing.
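For reference, the escape-rate comparison presumably uses the standard overdamped Kramers formula for noise-driven escape from a well of an effective potential U(x) (an assumption; the paper may work in a different friction limit):

r_K = \frac{\sqrt{U''(x_{\min})\,\lvert U''(x_b)\rvert}}{2\pi\gamma}\,\exp\!\left(-\frac{\Delta U}{D}\right)

where x_min is the well minimum, x_b the barrier top, ΔU = U(x_b) − U(x_min) the barrier height, γ the friction, and D the effective noise strength playing the role of k_B T.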