The Anatomy of an LLM

I think I’ve finished something I’ve been building for a while:

The Anatomy of an LLM

It is a visual, interactive walk-through of the main machinery inside a large language model.

Not just “it predicts the next token”, but what actually happens along the way:

  • How text becomes tokens
  • How tokens become vectors
  • How neural networks process those vectors
  • How attention works
  • How transformer blocks are stacked together
  • How the model ends up choosing the next token
  • And why things like context windows, KV cache and quantization matter

It started as a small explanation, but of course that quickly got out of hand.

It is still marked beta, and I’m sure there are things that can be improved. But at some point you have to stop tweaking and publish.

Take a look: The Anatomy of an LLM