Research program spanning AI scaling, agent memory, quantum materials, and more. Contact [email protected] for preprints.
Submitted
Paper 3A · NeurIPS 2026
“Lying Is Just a Phase”
The Hidden Alignment Transition in Language Model Scaling
Below a family-dependent critical scale Nc, the coupling between reasoning (HellaSwag) and truthfulness (TruthfulQA) is negative — scaling reasoning hurts truthfulness. Above Nc, they cooperate. Nc varies 60× across families (0.12B–7B) and is a design parameter, not a physical constant: width, data curation, and architecture each shift it independently. A coupled ODE cross-predicts held-out Llama-2 at 5.6% MAE. The isocline classifier separates standard-trained from curated families. Curated models (Phi, Qwen3) bypass the tax entirely.
63 base models · 16 families · r = −0.989 pre-Nc · 5.6% ODE MAE
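One way to make the sign change concrete: a minimal coupled-ODE form consistent with the description above (illustrative only; α_R, α_T, and g_0 are free parameters, and the paper's fitted system may differ):

\frac{dR}{d\ln N} = \alpha_R R + g(N)\,T, \qquad
\frac{dT}{d\ln N} = \alpha_T T + g(N)\,R, \qquad
g(N) = g_0 \tanh\!\left(\ln\frac{N}{N_c}\right)

Here R and T track reasoning and truthfulness scores with scale N: g(N) < 0 below N_c (the alignment tax) and g(N) > 0 above it (cooperation).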
At frontier scale (SWE-bench vs GPQA Diamond, 34+5 models, 10 labs), capabilities remain cooperative (r = +0.72, slope 0.513). The h-field diagnostic — deviation from the cooperation trend — reveals each lab’s training philosophy: Google is reasoning-specialist (h̄ = +5.5), Anthropic is coding-rich (h̄ = −6.9). Per-lab coupling slopes span 5× (Google 1.15 vs DeepSeek 0.23). Tax excursions are temporary — Sonnet 4.6 (h = −13.1) recovers at Opus 4.6 (h = +3.5). The h-field is descriptive, not causal. Seven falsifiable predictions with timestamped deadlines.
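As a reading aid, here is a minimal sketch of how such an h-field can be computed as the residual from the fitted cooperation trend. All model names and scores below are invented placeholders, and the regression direction and sign convention are assumptions, chosen so that h > 0 reads "reasoning-specialist" and h < 0 reads "coding-rich" as in the text:

```python
import numpy as np

# Hypothetical (SWE-bench, GPQA Diamond) score pairs; values are made up.
scores = {
    "lab1_model": (45.0, 60.0),
    "lab2_model": (55.0, 52.0),
    "lab3_model": (40.0, 58.0),
    "lab4_model": (62.0, 61.0),
}
swe = np.array([v[0] for v in scores.values()])
gpqa = np.array([v[1] for v in scores.values()])

# Fit the cross-lab cooperation trend (the paper reports slope 0.513 on
# its own 34+5-model sample; this toy fit will differ).
slope, intercept = np.polyfit(swe, gpqa, 1)

# h = deviation from the trend line: positive means more reasoning
# than the model's coding score predicts, negative means coding-rich.
for name, (s, g) in scores.items():
    h = g - (slope * s + intercept)
    print(f"{name}: h = {h:+.1f}")
```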
Physics-Informed Agent Memory with Dynamic Free Energy Learning
Agent memory that uses energy landscapes instead of vector similarity. Retrieval is Boltzmann-weighted (temperature controls exploration vs exploitation), memories deepen with use (frequency = basin depth), and offline “sleep” consolidation merges, prunes, and strengthens — the same physics that governs CAPE coupling.
Why it’s different: Every existing agent memory (Mem0, Zep, Letta, MemGPT) uses semantic similarity + heuristics. Basin Memory uses a free-energy landscape — retrieval quality improves with use because the energy surface reshapes. It’s a learning theory, not a retrieval system.
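To make the mechanism concrete, here is a minimal sketch of Boltzmann-weighted, energy-landscape retrieval as described above. This is not the basin-memory package API: the energy model (basin depth grows with access count) and the consolidation rule are illustrative assumptions.

```python
import numpy as np

class BasinMemorySketch:
    """Toy energy-landscape memory: retrieval is Boltzmann-weighted,
    and every retrieval deepens the retrieved memory's basin."""

    def __init__(self, temperature=1.0):
        self.temperature = temperature   # high T: explore; low T: exploit
        self.items = []                  # [embedding, text, access_count]

    def store(self, embedding, text):
        self.items.append([np.asarray(embedding, float), text, 0])

    def _energy(self, query):
        # Energy = semantic mismatch minus basin depth; frequently used
        # memories sit in deeper basins and are easier to reach.
        q = np.asarray(query, float)
        return np.array([
            np.linalg.norm(q - emb) - np.log1p(count)
            for emb, _, count in self.items
        ])

    def retrieve(self, query):
        e = self._energy(query)
        w = np.exp(-(e - e.min()) / self.temperature)  # Boltzmann weights
        i = np.random.choice(len(self.items), p=w / w.sum())
        self.items[i][2] += 1                          # deepen the basin
        return self.items[i][1]

    def sleep(self, min_count=1):
        # Offline consolidation, here reduced to pruning shallow basins;
        # the merge/strengthen steps described above are omitted.
        self.items = [it for it in self.items if it[2] >= min_count]
```

The key design point the sketch tries to capture: retrieval reshapes the energy surface, so quality improves with use rather than staying fixed as in similarity-only systems.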
Benchmarks
LoCoMo (full, Judge): 100% (1986/1986)
LongMemEval (Judge): 100% (500/500)
BasinBench (our benchmark, to be released): 98.1/100
Retrieval latency: 158 ms
vs Mem0: 49.7% LoCoMo
vs OpenClaw: 72.5% LoCoMo
vs PropMem: 82.3% LoCoMo
Access: pip install basin-memory (pre-release). Request early access for the full physics engine.
100% LoCoMo Judge · 158 ms latency · 9 physics signals
Near-Ready
Paper 1 · 85% Written
“A Calorimeter Is All You Need”
Inverse Gor’kov Framework for Superconductor Classification · Nature / PRB
Classifies superconductor pairing symmetry from bulk thermodynamics alone. 33/33 known materials classified correctly; leave-one-out error 3.9%. A det(a) sign-change discriminator separates s-wave from sign-changing order parameters. L1/L0 boosting-ratio fingerprints: 0.2× conventional, 3–5× s±, 11× d-wave.
33/33 classified · 3.9% LOO error
Paper 2 · 85% Written
“Thirty-Three Times Too Heavy”
Mass Enhancement and Pairing in Heavy Fermions · Nature Physics / PRL
Three falsifiable predictions from one framework: an FeSe eight-pressure trajectory (β ≈ 0.12), La3Ni2O7 ΔC/γTc = 1.7 ± 0.3, and a UTe2 Leggett mode at 0.6–14 GHz (Method B ≈ 4.6 GHz, unmeasured).
0.6–14 GHz UTe2 prediction · 3 falsifiable predictions
Paper 3E · Data Complete
SFEE Universality
From CeRh2As2 to AI Scaling · PRL
R² = 0.855, Bayes factor 10^49.6. The same free-energy structure governs superconductor phase boundaries and AI scaling transitions.
R² = 0.855 · BF = 10^49.6
Sleep · 75% Written
GL Dynamics for EEG Sleep Stage Transitions
Nature (target)
Critical slowing down 2.60×, susceptibility 4.6×, Kramers escape rate within 1.4× of observation. Sleep consolidation connects to Basin Memory offline processing.
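For reference, the escape-rate comparison presumably uses the standard overdamped Kramers formula for noise-driven escape from a well of an effective potential U(x) (an assumption; the paper may work in a different friction limit):

r_K = \frac{\sqrt{U''(x_{\min})\,\lvert U''(x_b)\rvert}}{2\pi\gamma}\,\exp\!\left(-\frac{\Delta U}{D}\right)

where x_min is the well minimum, x_b the barrier top, ΔU = U(x_b) − U(x_min) the barrier height, γ the friction, and D the effective noise strength playing the role of k_B T.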