Email: evgunter at gmail dot com
CV, last updated 2/25.
Learning theory and mechanistic interpretability
- Geometry of loss landscapes
- How the loss landscape's local geometric properties during training depend on the specific optimizer used; I mentored a MARS project on this (paper)
- Understanding to what extent different training runs lead to qualitatively different solutions; whether the finding that all local minima are global in deep linear nets generalizes to nets with nonlinearities such as ReLUs
- Understanding how LLMs work, and application of these insights to mechanistic interpretability
- How well SAEs represent true underlying features
- How LLMs perform computations on their internal representations of features
- How important attention is at scale compared to MLP layers
- Adversarial examples
- Whether adversarial examples are preventable, or a necessary consequence of some aspect of how ML models work, such as superposition
- Transfer of adversarial examples between models with different architectures or trained on different datasets
- Theoretical minimum sizes of models with certain capabilities; bounding the greatest possible hardware overhang
- The limits of out-of-distribution generalization; in particular, whether LLMs can gain capabilities qualitatively beyond those demonstrated by humans in their training data
Scalable oversight and control
- Self-refining alignment strategies such as Constitutional AI, Superalignment, and multimodal model self-supervision
- Non-alignment strategies for safe AI in the short term, especially leveraging the low agency of frontier models
- Moral patienthood of AI systems, and connections between goals and affective states
- Anthropic reasoning, on which I wrote a thesis advised by Chip Sebens
- Consequences of different formalizations of anthropic reasoning
- Application to the many-worlds interpretation of quantum mechanics
- Application to multiverses governed by different laws
- Addressing mathematical issues in infinite universes
- The hard problem of consciousness and Russellian monism, on which I did a small research project advised by Frederick Eberhardt, with collaborator Alex Denko
- Philosophy of fundamental physics
- Ontology of physical laws
- Implications of the many-worlds interpretation of quantum mechanics
Projects and publications
I recently transitioned to AI safety research.
Previously, I was a research engineer at Granica, working on data compression.
I graduated from Caltech in 2019, with B.S. degrees in mathematics (advisor Nets Katz), computer science (advisor Chris Umans), and philosophy (advisor Chip Sebens).