
Hey, I'm Ed. I'm currently researching mechanistic interpretability under Neel Nanda, focusing on understanding what language models are really doing. My background is in Maths & Stats, and I was previously a quant. In general, I'm interested in building awesome prediction tools and developing deep understanding.
Current Research Interests
- Better understanding the geometry of representations within transformers
- Understanding why transformers generalise as they do
- Ambitious mechanistic interpretability
Recent Updates
- Our Emergent Misalignment work is now a homework assignment at Harvard
- Published our research update showing that narrow misalignment is possible but unstable
- Built easy-dataset-share, a pip-installable CLI tool to help researchers share datasets
- Built Training Lens, a website for exploring how representations form during model training, with a focus on interactive visuals
- Our paper “Model Organisms for Emergent Misalignment” (arXiv:2506.11613) was accepted to the ICML Reliable and Responsible Foundation Models workshop
- Our paper “Convergent Linear Representations of Emergent Misalignment” (arXiv:2506.11618) was accepted to the ICML Actionable Interpretability workshop
- Our work was covered by MIT Technology Review