On the Biology of a Large Language Model

Authors

Jack Lindsey (Anthropic)
Wes Gurnee (Anthropic)
Emmanuel Ameisen (Anthropic)
Brian Chen (Anthropic)
Adam Pearce (Anthropic)
Nicholas L. Turner (Anthropic)
Craig Citro (Anthropic)
David Abrahams (Anthropic)
Shan Carter (Anthropic)
Basil Hosmer (Anthropic)
Jonathan Marcus (Anthropic)
Michael Sklar (Anthropic)
Adly Templeton (Anthropic)
Trenton Bricken (Anthropic)
Callum McDougall (Anthropic)
Hoagy Cunningham (Anthropic)
Thomas Henighan (Anthropic)
Adam Jermyn (Anthropic)
Andy Jones (Anthropic)
Andrew Persic (Anthropic)
Zhenyi Qi (Anthropic)
T. Ben Thompson (Anthropic)
Sam Zimmerman (Anthropic)
Kelley Rivoire (Anthropic)
Thomas Conerly (Anthropic)
Chris Olah (Anthropic)
Joshua Batson (Anthropic)

Abstract

Large language models display impressive capabilities. However, for the most part, the mechanisms by which they do so are unknown. The black-box nature of models is increasingly unsatisfactory as they advance in intelligence and are deployed in a growing number of applications. Our goal is to reverse engineer how these models work on the inside, so we may better understand them and assess their fitness for purpose.

The challenges we face in understanding language models resemble those faced by biologists. Living organisms are complex systems which have been sculpted by billions of years of evolution. While the basic principles of evolution are straightforward, the biological mechanisms it produces are spectacularly intricate. Likewise, while language models are generated by simple, human-designed training algorithms, the mechanisms born of these algorithms appear to be quite complex.
