
Hey, I'm Ed. I'm a member of technical staff at Anthropic on the Horizons team. Before this, I researched mechanistic interpretability under Neel Nanda, worked as a quant trader, and studied Maths & Stats at Oxford. In general, I'm interested in building awesome, aligned prediction tools and in developing a deep understanding of how they work.
Current Research Interests
- Reinforcement learning
- Understanding the geometry of representations within transformers
- Understanding why transformers generalise as they do
Recent Updates
- Joined Anthropic
- Our Emergent Misalignment work is now a homework assignment at Harvard
- Published our research update showing that narrow misalignment is possible but unstable
- Built easy-dataset-share, a pip-installable CLI tool to help researchers share datasets
- Our paper “Model Organisms for Emergent Misalignment” (arXiv:2506.11613) was accepted to the ICML Reliable and Responsible Foundation Models workshop
- Our paper “Convergent Linear Representations of Emergent Misalignment” (arXiv:2506.11618) was accepted to the ICML Actionable Interpretability workshop
- Our work was covered by MIT Technology Review