#9 Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
It turns out it might be possible to understand how frontier LLMs work by inspecting a model's middle layers.