Tokens, Latent Spaces, Embeddings#

The Question#

What’s a token?

In 2026, most people have heard of a “large language model” and know, roughly, that it generates the next most plausible word, one word at a time. But what happens under the hood that lets it do that? An LLM takes words, turns them into numbers, does math on those numbers, and gets back new words.

How does it know that chicken and steak are more similar than chicken and spinach? It’s not reading a dictionary. It’s doing geometry. How?

Kitchen Sink to Ordered Table#

Let’s say you have a table covered with a bunch of ingredients. Leafy vegetables, spices, meats, grains, bread, pastas, herbs, rice, tomatoes, roots, potatoes, seafood, flours.

Unorganized Table

They’re scattered randomly. You ask someone to arrange them, just “organize this.” Think for a second about what you would do. Green things should be close together. Leafy vegetables, then herbs, which should be next to spices because herbs and spices are used similarly. Maybe flour is near spices because they’re both powdery. Potatoes near the roots feels right. Seafood and meat should be close to one another, maybe near the pasta. And the pasta near the potatoes.

Organized Table

There could be thousands of ways to organize this table. But if you asked a thousand people to do it, common patterns would emerge. People would cluster similar things together and push different things apart without needing detailed instructions. Organization is just encoding what’s similar and what’s different.

A table gives you two dimensions to work with, left/right and up/down. Say the table has a grid, 14 squares wide by 6 deep, starting from the bottom left. The bottom left square is (1, 1), one square to the right is (2, 1), one square on top of that is (2, 2), and so on.

Organized Table with Grid

Now every ingredient has a meaningful pair of numbers, taking the center-ish of where each item lies:

  • chicken: (12, 5)
  • steak: (13, 4)
  • flour: (7, 5)
  • kale: (2, 2)

Those numbers aren’t arbitrary. Chicken isn’t just a word anymore. It has a location.
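To make the geometry concrete, here’s a minimal sketch in Python. The coordinates are the hand-placed examples from the grid above, not anything a model learned:

```python
import math

# Hand-placed 2D "table" coordinates for a few ingredients
table = {
    "chicken": (12, 5),
    "steak": (13, 4),
    "flour": (7, 5),
    "kale": (2, 2),
}

def distance(a, b):
    """Straight-line (Euclidean) distance between two spots on the table."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(distance(table["chicken"], table["steak"]))  # ~1.41: side by side
print(distance(table["chicken"], table["kale"]))   # ~10.44: far apart
```

Nearby coordinates mean similar ingredients. The distances are doing the work a dictionary definition can’t.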

More Dimensions, More Meaning#

Now think about your dresser and closet. Underwear and socks go near each other. Jackets and coats go near each other. T-shirts with tank tops and workout shirts. With a closet and a dresser you get three dimensions: up/down, left/right, front/back. You might have a drawer of socks, dress socks on the left and athletic socks on the right. And a drawer below has underwear and undershirts.

Dresser

It’s hard to visualize more than three dimensions. But going to four, five, hundreds of dimensions just means expanding the ways you can place things relative to one another.

More dimensions solve a real problem. On a two-dimensional table, chicken is close to steak, both meats. But chicken is also close to eggs, both from the same animal. Move chicken closer to eggs and you push it away from steak. You can’t satisfy both at once. With hundreds of dimensions, each one captures a different axis of meaning. Chicken can be close to steak on one axis, close to eggs on another, and far from spinach on both. The relationships coexist without conflicting.

The extra dimensions aren’t wasted space. They’re what allow the model to encode the full complexity of how concepts relate to each other.
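Here’s a toy sketch of that idea. The four axes and every number are invented for illustration; real models learn hundreds of unlabeled dimensions during training:

```python
# Hypothetical 4-axis embedding: (meat, from-a-chicken, leafy, powdery).
vecs = {
    "chicken": [0.9, 0.9, 0.0, 0.0],
    "steak":   [0.9, 0.0, 0.0, 0.0],
    "eggs":    [0.1, 0.9, 0.0, 0.0],
    "spinach": [0.0, 0.0, 0.9, 0.0],
}

def dot(a, b):
    """Dot product: adds up agreement axis by axis."""
    return sum(x * y for x, y in zip(a, b))

print(dot(vecs["chicken"], vecs["steak"]))    # 0.81, close on the "meat" axis
print(dot(vecs["chicken"], vecs["eggs"]))     # 0.90, close on "from-a-chicken"
print(dot(vecs["chicken"], vecs["spinach"]))  # 0.0, far apart on every axis
```

Chicken gets to be near steak and near eggs at the same time, because the closeness lives on different axes.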

The Latent Space#

That organized, multidimensional table has a name: the latent space. “Latent” because the structure is hidden. It’s not something anyone designs by hand. The model discovers it during training. “Space” because it’s a coordinate system, just like our grid table, but stretched across hundreds or thousands of dimensions. Every concept the model understands has a location in this space, and the distances between locations encode meaning.

So What’s an Embedding?#

The coordinates of an ingredient on the organized table are its embedding. Chicken at (12, 5) — those numbers are chicken’s embedding. A location in the latent space that captures its meaning and relationship to everything else.

Closing the Loop#

So when someone tells you that an LLM is “doing math on vectors,” here’s what they mean: every piece of text gets split into tokens (whole words or chunks of words), and every token gets mapped to a point in a vast organized space. Locations that are close together mean similar things. The model looks at the locations of your input tokens, does math, and predicts where the next token should land in that space. Then it converts that location back into a word. The whole thing is just a very sophisticated version of organizing a messy table. And then navigating it.
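That loop can be sketched in a few lines of Python. Everything here is a stand-in: the 2D coordinates are hand-placed, and the “math” step is a plain average rather than a trained network:

```python
import math

# Hand-placed 2D embeddings; a real model uses thousands of dimensions.
embeddings = {
    "chicken": (12.0, 5.0),
    "steak":   (13.0, 4.0),
    "pork":    (12.0, 3.0),
    "kale":    (2.0, 2.0),
    "spinach": (1.0, 3.0),
}

def predict_point(tokens):
    """Stand-in for the model's math: average the input locations."""
    points = [embeddings[t] for t in tokens]
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def nearest_token(point, exclude=()):
    """Convert a location back into a token: pick the closest known one."""
    candidates = (t for t in embeddings if t not in exclude)
    return min(candidates, key=lambda t: math.dist(point, embeddings[t]))

tokens = ["chicken", "steak"]
target = predict_point(tokens)
print(nearest_token(target, exclude=tokens))  # "pork": lands in the meat corner
```

Input tokens become locations, the math produces a new location, and the nearest point in the space becomes the output word.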