Embeddings — How Models Represent Meaning

PM: Read in full — 20 min

The Meaning of Meaning, Numerically

A language model can't work with words directly — it works with numbers. Embeddings are how text gets turned into numbers in a way that preserves meaning.

An embedding is a vector: a list of numbers (typically hundreds to thousands of values) representing the semantic content of a word, sentence, or document. The geometric relationships between vectors capture semantic relationships between the things they represent.

The Classic Example: Vector Arithmetic on Meaning

The canonical demonstration comes from Word2Vec (Mikolov et al., 2013). Take the embedding for "king," subtract the vector for "man," add the vector for "woman," and the result lands near the embedding for "queen."

king − man + woman ≈ queen

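This is not a coincidence. It suggests the model has learned consistent directions in embedding space: one that corresponds to something like gender, and another that corresponds to royalty. Each is a direction spread across many dimensions, not a single labeled dimension. Semantic relationships become geometric ones.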
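Here is a toy sketch of the arithmetic. It uses hand-made 3-dimensional vectors rather than real Word2Vec embeddings, and the axis meanings and values are invented for illustration. As in standard analogy evaluations, the input words are excluded from the candidates, because the vector nearest to the result can be one of the inputs.

```python
import math

# Hypothetical toy vectors. The axes loosely mean (royalty, maleness, "person-ness").
# Real Word2Vec vectors have hundreds of learned, unlabeled dimensions.
vocab = {
    "king":  [0.9, 0.8, 1.0],
    "queen": [0.9, -0.8, 1.0],
    "man":   [0.1, 0.8, 1.0],
    "woman": [0.1, -0.8, 1.0],
    "apple": [0.0, 0.0, 0.1],
}

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vocab[w], target))

print(analogy("king", "man", "woman"))  # queen
```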

Visualizing Embedding Space

Embedding vectors typically have hundreds to a few thousand dimensions, far too many to visualize directly. Researchers use dimensionality reduction (t-SNE, UMAP) to project them into 2D. The structure that emerges:

Words with similar meanings cluster together. That clustering exists in the full high-dimensional space, not only in the 2D picture, and it is what makes similarity search work.

Sentence and Document Embeddings

Word embeddings were the first generation. Modern systems use sentence or document embeddings: a single vector representing an entire passage.

These are produced by passing text through an embedding model and pooling the per-token representations. The resulting vector captures the meaning of the full passage, not individual words.
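Mean pooling is one common pooling choice. Some models instead use the vector of a special [CLS] token. The sketch below is minimal and assumes the model has already produced one vector per token, along with an attention mask that marks real tokens (1) versus padding (0). The numbers are made up.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose mask is 1, so padding positions are ignored."""
    dim = len(token_vectors[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_vectors, attention_mask):
        if keep:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    if count == 0:
        raise ValueError("attention_mask contains no real tokens")
    return [t / count for t in total]

# Hypothetical per-token outputs for a 3-token passage padded to length 4.
tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]
print(mean_pool(tokens, mask))  # [3.0, 4.0]
```

Ignoring padding matters. If the last row were averaged in, the passage vector would change depending only on how much padding the batch happened to need.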

This is the foundation of semantic search:

At index time: embed every document. Store the vectors.
At query time: embed the query using the same model.
Find documents whose vectors are most similar to the query vector (cosine similarity or dot product).
Return those documents.
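The four steps above map onto a few lines of code. This is a minimal sketch with made-up 3-dimensional document and query vectors, which stand in for output from a real embedding model. Vectors are normalized once at index time, so a plain dot product equals cosine similarity at query time.

```python
import math

def normalize(v):
    """Scale a vector to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

# Index time: store one unit-length vector per document.
# These vectors are hypothetical; a real system gets them from an embedding model.
index = {
    "steps to terminate my account": normalize([0.9, 0.1, 0.0]),
    "update your billing address":   normalize([0.1, 0.9, 0.1]),
    "our office holiday schedule":   normalize([0.0, 0.1, 0.9]),
}

def search(query_vector, k=2):
    """Return the k documents whose vectors are most similar to the query vector."""
    q = normalize(query_vector)
    scored = [(sum(a * b for a, b in zip(q, d)), doc) for doc, d in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc for _, doc in scored[:k]]

# Query time: hypothetical vector for "How do I cancel my subscription?"
print(search([0.8, 0.2, 0.0], k=1))  # ['steps to terminate my account']
```

A production system would replace the linear scan with an approximate nearest-neighbor (ANN) index. The same embedding model must produce both the stored vectors and the query vector.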

The difference from keyword search: semantic search finds documents that mean the same thing even if they share no keywords. "How do I cancel my subscription?" matches "steps to terminate my account."

Dimensions: What They Mean and How Many You Need

Every embedding model description includes a number — 768, 1536, 3072 — and every vector database requires you to configure one. Here is how to reason about it.

What a dimension is

Each dimension is one independent axis in the model's learned semantic space. Two-dimensional geography uses latitude and longitude to locate any point on Earth's surface. A 768-dimensional embedding locates a piece of text using 768 independently varying features the model extracted during training. No single dimension has a human-interpretable label — dimension 412 doesn't mean "formality" — but together they form a space where semantic relationships become geometric ones.

Adding a dimension gives the model one more axis along which to separate concepts. Phrases that would land almost on top of each other in a 2D space can be clearly distinct in 768D, because there are 766 more axes on which their differences can register. Concepts that crowd together in low-dimensional space spread out as dimensionality increases.

What you gain from more dimensions

Finer semantic discrimination. Subtle distinctions — "cancel my order" vs "cancel my subscription" vs "cancel my account" — are easier to separate in higher-dimensional space.
Less crowding. In low dimensions, many semantically unrelated concepts end up geometrically close to each other simply because there isn't room to keep them apart.
Better coverage of specialized domains. Technical, legal, and scientific vocabulary forms tight clusters in low-dimensional general-purpose spaces. More dimensions give the model room to spread them out.

When more dimensions stop helping

Three forces push back against adding dimensions.

Diminishing returns: after a point, additional dimensions encode increasingly subtle distinctions or correlated redundancies. Quality gains per added dimension shrink toward zero.

The curse of dimensionality: in very high-dimensional spaces, cosine similarity loses discriminating power — pairwise distances between random vectors converge as dimensionality grows, so everything starts to look roughly equally similar to everything else. This isn't a practical concern at the ranges current models use (hundreds to a few thousand), but it is why you cannot keep adding dimensions indefinitely.
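A quick simulation shows the concentration effect. It assumes random Gaussian vectors. Real embeddings are not random, which is part of why the effect matters little at today's dimension counts. For random vectors, the spread of cosine similarity between pairs shrinks roughly as 1/√d.

```python
import math
import random

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def cosine_spread(dim, pairs=500, seed=0):
    """Standard deviation of cosine similarity across random Gaussian vector pairs."""
    rng = random.Random(seed)
    sims = []
    for _ in range(pairs):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        b = [rng.gauss(0, 1) for _ in range(dim)]
        sims.append(cosine(a, b))
    mean = sum(sims) / len(sims)
    return math.sqrt(sum((s - mean) ** 2 for s in sims) / len(sims))

for d in (2, 32, 768):
    print(d, round(cosine_spread(d), 3), "vs 1/sqrt(d):", round(1 / math.sqrt(d), 3))
```

As d grows, nearly all random pairs score close to 0. A fixed similarity gap therefore carries more signal in low dimensions than in very high ones.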

Cost: storage and ANN index size scale linearly with dimensions. A 3072-dimensional collection costs roughly twice as much to store and index as a 1536-dimensional one with the same number of vectors.
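Here is the arithmetic behind that ratio. It assumes float32 storage (4 bytes per value) and counts only the raw vectors. ANN graph structures and metadata add overhead on top of this, and quantized formats shrink it.

```python
BYTES_PER_FLOAT32 = 4

def raw_vector_bytes(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for the vectors alone (no ANN index overhead, no metadata)."""
    return num_vectors * dims * bytes_per_value

for dims in (768, 1536, 3072):
    gb = raw_vector_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>5} dims x 1M vectors: {gb:.2f} GB")
```

For one million vectors this prints 3.07 GB, 6.14 GB, and 12.29 GB. Doubling the dimension count doubles the raw footprint.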

What experiments find

The MTEB leaderboard (Massive Text Embedding Benchmark) evaluates embedding models across retrieval, classification, clustering, and reranking tasks and provides a consistent cross-model view of quality vs. dimension count.

Dimension range	Where you find it	Quality profile
256–512	Lightweight/fast retrieval, MRL-truncated models	Good for broadly distinct concepts; degrades on fine-grained distinctions
768–1024	Most open-source models (BERT-family, sentence-transformers); Cohere embed-v3 (1024)	Sweet spot for most production retrieval workloads
1536–3072	OpenAI text-embedding-3 (small: 1536, large: 3072)	Measurable gains on hard retrieval tasks; storage and index cost rise proportionally
>3072	Uncommon; mainly large LLM-based open models (some at 4096)	Gains rarely justify the cost for typical workloads

The jump from 256 → 768 is usually substantial. The jump from 768 → 1536 is real but modest for most tasks. Beyond 1536, gains are incremental.

Matryoshka Representation Learning: buying back dimensions cheaply

A relatively recent training approach allows dimension truncation without retraining. Matryoshka Representation Learning (MRL) (Kusupati et al., 2022) trains the model to front-load the most important variation into the earliest dimensions. As a result, any prefix of the full embedding is itself a valid representation, and quality degrades gradually rather than abruptly as the prefix shrinks. You get a 1536-dimensional model that can serve at 256, 512, or 1024 dimensions by discarding the later dimensions and re-normalizing, provided stored documents and queries are truncated to the same length.
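Truncation itself is trivial. The step that is easy to miss is re-normalizing afterward: a prefix of a unit-length vector is no longer unit length, which skews dot-product scores. This is a minimal sketch on a made-up 4-dimensional vector. If your provider truncates for you at request time, you may not need this step.

```python
import math

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` values of an MRL-trained embedding, then rescale to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix]

full = [0.6, 0.0, 0.8, 0.0]  # hypothetical unit-length embedding
short = truncate_and_normalize(full, 2)
print(short)  # [1.0, 0.0]
```

Apply the same `dims` to every stored vector and every query. Comparing a 256-dimensional query against 1024-dimensional documents is not meaningful.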

OpenAI's text-embedding-3 models expose this via a dimensions API parameter. Several other providers offer selectable output dimensions on newer models, and a growing number of open-source sentence-transformers models are trained with MRL as well.

This matters operationally: you can prototype at 256 dimensions (cheap, fast), measure retrieval quality on your actual queries, and increase dimensions only where quality drops materially.

Finding the right dimension count for your use case

Use an MRL-compatible model, such as text-embedding-3-small or text-embedding-3-large (OpenAI) or any Matryoshka-trained sentence-transformers model.
Build a query eval set from your actual domain — 50 to 200 queries with known relevant documents. A set drawn from real user queries beats a generic benchmark.
Measure recall@10 at decreasing dimension levels: 1536 → 1024 → 512 → 256.
Find the knee: scanning downward, the first dimension level where recall@10 falls more than 2–3 percentage points below the full-dimension score. Use the value just above it.
Lock it in before production — your vector database collection dimension is set at creation. Changing it means re-embedding every stored vector.
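The measurement steps can be sketched as two small functions. This sketch makes several assumptions. Each eval query yields a list of retrieved document ids and a set of known relevant ids. The recall numbers in `measured` are invented. The "knee" is read relative to the full-dimension score, with a 2.5-point threshold.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of the known relevant documents that appear in the top-k results."""
    if not relevant:
        raise ValueError("each query needs at least one known relevant document")
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def mean_recall(results, k=10):
    """results: list of (retrieved_ids, relevant_ids) pairs, one per eval query."""
    return sum(recall_at_k(r, rel, k) for r, rel in results) / len(results)

def pick_dimension(recall_by_dim, max_drop_points=2.5):
    """Scan from the largest dimension downward. Stop at the first level whose recall
    falls more than max_drop_points (percentage points) below the full-size score,
    and return the level just above it."""
    dims = sorted(recall_by_dim, reverse=True)
    baseline = recall_by_dim[dims[0]]
    chosen = dims[0]
    for d in dims[1:]:
        if (baseline - recall_by_dim[d]) * 100 > max_drop_points:
            break
        chosen = d
    return chosen

# Hypothetical mean recall@10 values measured on your own eval set.
measured = {1536: 0.91, 1024: 0.90, 512: 0.89, 256: 0.84}
print(pick_dimension(measured))  # 512
```

Comparing each level to the one just before it is a defensible alternative. However, it can let several small drops accumulate without ever crossing the threshold.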

PM Takeaway

Dimension count is a cost-quality dial, not a "bigger is better" decision. For most enterprise retrieval tasks, 768–1024 dimensions hits the knee of the quality curve. If your model supports MRL, measure before committing — the right answer is specific to your content and query distribution, not to a model benchmark.

Why Embeddings Matter for AI Products

RAG systems that use dense retrieval depend heavily on embedding quality. If the embedding model can't distinguish "cancel my order" from "cancel my subscription," your retrieval returns the wrong documents and the LLM gives the wrong answer. Retrieval quality is capped by embedding quality.

Known gap — numerical content: research evaluating 13 embedding models (Deng et al., 2025) found they perform significantly worse on queries involving numerical values, dates, or quantitative comparisons than on semantic similarity tasks — the core use case embeddings were designed for. Embedding models are highly reliable for semantic matching ("find documents about the same concept"); the reliability gap is specific to numerical and quantitative lookup. For use cases where numerical precision matters (financial documents, clinical data, scientific tables), hybrid search — combining dense embeddings with BM25 keyword matching — recovers meaningful accuracy.
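The paragraph does not say how dense and keyword results are combined. Reciprocal rank fusion (RRF) is one common, simple choice, and k=60 is the constant usually used with it. This is a minimal sketch that assumes a tiny in-memory BM25 with whitespace tokenization, typical k1=1.5 and b=0.75 settings, and a dense ranking that has already been computed. The documents and the dense ranking are made up.

```python
import math
from collections import Counter

def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank doc ids by BM25 score for a whitespace-tokenized query (Lucene-style idf)."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized.values()) / n_docs
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

def rrf(rankings, k=60):
    """Reciprocal rank fusion: each ranking adds 1 / (k + rank) to every doc it lists."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

docs = {
    "q3": "revenue for fiscal 2023 was 4.2 million",
    "q2": "revenue for fiscal 2022 was 3.9 million",
    "memo": "notes on revenue growth and budget planning",
}
dense = ["memo", "q3", "q2"]  # hypothetical: the dense model under-weights the exact year
keyword = bm25_rank("fiscal 2023 revenue", docs)
print(rrf([dense, keyword]))  # ['q3', 'memo', 'q2']
```

RRF uses only rank positions, so it avoids calibrating BM25 scores against cosine similarities. Here the exact-match "2023" lifts the right document to the top.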

Classification and clustering can use embeddings as features. An embedding model plus a simple classifier is often faster, cheaper, and more interpretable than a large generative model for many classification tasks.
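Here is a minimal sketch of the "embedding plus simple classifier" pattern. It uses a nearest-centroid classifier, one of the simplest options; logistic regression is another common choice. The 2-dimensional vectors are made-up stand-ins for real embeddings.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def train(labeled):
    """labeled: dict of label -> list of embedding vectors. Returns one centroid per label."""
    return {label: centroid(vecs) for label, vecs in labeled.items()}

def predict(centroids, vector):
    """Assign the label whose centroid is most cosine-similar to the input embedding."""
    return max(centroids, key=lambda label: cosine(centroids[label], vector))

# Hypothetical embeddings of labeled support tickets.
training = {
    "billing":   [[0.9, 0.1], [0.8, 0.3]],
    "technical": [[0.1, 0.9], [0.2, 0.8]],
}
model = train(training)
print(predict(model, [0.7, 0.2]))  # billing
```

The learned "model" is just one vector per label. You can inspect it, compare labels, and retrain it in milliseconds, which is part of the interpretability and cost advantage over prompting a generative model.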

Embedding drift: changing your embedding model makes existing vectors incompatible, even when the new model has the same dimension count, because each model learns its own space. All stored vectors must be regenerated. This is a real operational concern for production systems.
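One common safeguard is to record which model produced each stored vector and refuse to compare vectors across models. This is a minimal sketch. The `VectorRecord` class, the `check_compatible` helper, and the model names are all hypothetical, not any vector database's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VectorRecord:
    doc_id: str
    model_id: str  # hypothetical: name + version of the embedding model that produced it
    vector: tuple

def check_compatible(query_model_id, records):
    """Raise if any stored vector came from a different embedding model than the query."""
    stale = [r.doc_id for r in records if r.model_id != query_model_id]
    if stale:
        raise ValueError(f"re-embed required for {len(stale)} document(s): {stale}")

store = [
    VectorRecord("doc-1", "embed-model-v2", (0.1, 0.2)),
    VectorRecord("doc-2", "embed-model-v1", (0.3, 0.4)),
]
try:
    check_compatible("embed-model-v2", store)
except ValueError as err:
    print(err)  # re-embed required for 1 document(s): ['doc-2']
```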

Cross-modal embeddings: models like CLIP put images and text in the same vector space, enabling text queries to retrieve images and vice versa. This underlies multimodal search. CLIP-style image encoders are also a common building block of multimodal models, though the exact designs of proprietary models like GPT, Claude, and Gemini are not public.

PM Takeaway

Embedding quality is the hidden multiplier on every RAG system. When AI gives wrong answers, check what was retrieved before debugging the prompt. Better prompts don't fix a retrieval problem — better embeddings or better chunking does.
