#9 Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet

2024-05-22

It turns out it may be possible to understand how frontier LLMs work by extracting interpretable features from the activations of a model's middle layers.