#9 Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

The paper suggests that the inner workings of frontier LLMs can be partly understood by extracting interpretable features from the activations of a middle model layer.