
Hey, I'm Ed. I'm currently researching mechanistic interpretability under Neel Nanda, focusing on understanding what language models are really doing. My background is in Maths & Stats, and I was previously a quant. In general, I'm interested in building awesome prediction tools and developing deep understanding.
Current Research Interests
- Better understanding the geometry of representations within transformers
- Understanding why transformers generalise as they do
- Ambitious mechanistic interpretability
Recent Updates
- Our Emergent Misalignment work is now a homework assignment at Harvard
- Published our research update showing that narrow misalignment is possible but unstable
- Built easy-dataset-share, a pip-installable CLI tool to help researchers share datasets
- Built Training Lens, a website for exploring how representations form during model training, with a focus on interactive visuals
- Our paper “Model Organisms for Emergent Misalignment” (arXiv:2506.11613) was accepted to the ICML Reliable and Responsible Foundation Models workshop
- Our paper “Convergent Linear Representations of Emergent Misalignment” (arXiv:2506.11618) was accepted to the ICML Actionable Interpretability workshop
- Our work was covered by MIT Technology Review