Email: evgunter at gmail dot com
CV, last updated 2/25.
Learning theory and mechanistic interpretability
- Geometry of loss landscapes
- How the loss landscape's local geometric properties during training depend on the specific optimizer used; I mentored a MARS project on this (paper)
- Understanding to what extent different training runs lead to qualitatively different solutions; whether the finding that all local minima are global in deep linear nets generalizes to nets with nonlinearities such as ReLUs
- Understanding how LLMs work, and application of these insights to mechanistic interpretability
- How well SAEs represent true underlying features
- How LLMs perform computations on their internal representations of features
- How important attention is at scale compared to MLP layers
- Adversarial examples
- Whether adversarial examples are preventable, or a necessary consequence of some aspect of how ML models work, such as superposition
- Transfer of adversarial examples between models with different architectures or trained on different datasets
- Theoretical minimum sizes of models with certain capabilities; bounding the greatest possible hardware overhang
- The limits of out-of-distribution generalization; in particular, whether LLMs can gain capabilities qualitatively beyond those demonstrated by humans in their training data
Scalable oversight and control
- Self-refining alignment strategies such as Constitutional AI, Superalignment, and multimodal model self-supervision
- Non-alignment strategies for safe AI in the short term, especially leveraging the low agency of frontier models
- Moral patienthood of AI systems, and connections between goals and affective states
- Anthropic reasoning, on which I wrote a thesis advised by Chip Sebens
- Consequences of different formalizations of anthropic reasoning
- Application to the many-worlds interpretation of quantum mechanics
- Application to multiverses governed by different laws
- Addressing mathematical issues in infinite universes
- The hard problem of consciousness and Russellian monism, on which I did a small research project advised by Frederick Eberhardt, with collaborator Alex Denko
- Philosophy of fundamental physics
- Ontology of physical laws
- Implications of the many-worlds interpretation of quantum mechanics
Projects and publications
I recently transitioned to AI safety research.
Previously, I was a research engineer at Granica, working on data compression.
I graduated from Caltech in 2019, with B.S. degrees in mathematics (advisor Nets Katz), computer science (advisor Chris Umans), and philosophy (advisor Chip Sebens).