
Embedding

An embedding is a numeric vector that represents the meaning of a piece of text, an image, or an audio clip. This entry covers the definition, the leading embedding models, and how embeddings power search.

By Kadin Nestler · May 28, 2026 · Updated May 28, 2026

How an embedding is produced

You pass a piece of text (or an image) through an embedding model — a neural network trained to map semantic similarity into geometric closeness. The output is a fixed-length vector, typically 384 to 4096 dimensions. Two texts about the same topic — "kitten" and "baby cat" — land near each other in that space even though they share no words. Two texts about different topics land far apart.
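To make "geometric closeness" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The text above does not name a specific metric, so treat that choice as an assumption. The vectors are hypothetical, hand-written 4-dimensional stand-ins, not output from any real model.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional vectors, written by hand for illustration.
# A real embedding model would output hundreds or thousands of dimensions.
kitten = [0.9, 0.8, 0.1, 0.0]
baby_cat = [0.85, 0.75, 0.15, 0.05]
invoice = [0.05, 0.1, 0.9, 0.8]

print(f"kitten vs baby cat: {cosine_similarity(kitten, baby_cat):.3f}")
print(f"kitten vs invoice:  {cosine_similarity(kitten, invoice):.3f}")
```

With these toy values, "kitten" and "baby cat" score close to 1.0 while "kitten" and "invoice" score far lower, mirroring the near/far behavior described above.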

Top embedding models in 2026

  • OpenAI text-embedding-3-large: 3072 dimensions, strong general-purpose quality.
  • OpenAI text-embedding-3-small: 1536 dimensions, cheaper and faster.
  • Cohere embed-v3: strong multilingual performance.
  • Voyage AI voyage-3: popular with teams building in the Anthropic ecosystem.
  • Open-source (bge-large, e5-mistral, nomic-embed): competitive on the MTEB leaderboard and hostable locally.
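Dimension count carries a direct storage cost. A rough sketch, assuming each dimension is stored as a 32-bit float and ignoring index overhead, metadata, and any compression or quantization a vector database might apply:

```python
BYTES_PER_FLOAT32 = 4

# Dimensions taken from the list above; the open-source models vary, so they are omitted.
MODEL_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}


def raw_storage_bytes(dimensions: int, num_vectors: int) -> int:
    """Raw float32 size of the vectors alone (no index, no metadata)."""
    return dimensions * BYTES_PER_FLOAT32 * num_vectors


for model, dims in MODEL_DIMENSIONS.items():
    per_vector = raw_storage_bytes(dims, 1)
    per_million_gb = raw_storage_bytes(dims, 1_000_000) / 1e9
    print(f"{model}: {per_vector:,} bytes/vector, {per_million_gb:.2f} GB per million vectors")
```

Under these assumptions a text-embedding-3-large vector takes 12,288 bytes and a text-embedding-3-small vector 6,144 bytes, so the larger model doubles raw storage for the same corpus.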

What embeddings power

  • Semantic search — find documents by meaning, not keyword.
  • RAG retrieval — fetch relevant context for LLM grounding.
  • Clustering — group similar customer tickets, products, or queries.
  • Recommendation — surface products similar to ones a user liked.
  • Deduplication — identify near-duplicate content across a corpus.
  • Classification — categorize new items by proximity to labeled examples.
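Three of the uses above (semantic search, deduplication, and classification) reduce to the same operation: comparing vectors. A minimal sketch, assuming cosine similarity as the metric, brute-force comparison in place of a vector database's approximate nearest-neighbor index, and hand-written 3-dimensional vectors in place of real model output. The document IDs, labels, and the 0.98 duplicate threshold are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], corpus: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Semantic search: the k corpus entries whose vectors are closest to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def near_duplicates(corpus: dict[str, list[float]], threshold: float = 0.98) -> list[tuple[str, str]]:
    """Deduplication: every pair whose similarity meets the (hypothetical) threshold."""
    ids = list(corpus)
    pairs = []
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if cosine_similarity(corpus[ids[i]], corpus[ids[j]]) >= threshold:
                pairs.append((ids[i], ids[j]))
    return pairs


def classify(item: list[float], labeled: list[tuple[list[float], str]]) -> str:
    """Classification: take the label of the single most similar example (1-nearest neighbor)."""
    best_vec, best_label = max(labeled, key=lambda pair: cosine_similarity(item, pair[0]))
    return best_label


# Hypothetical toy data standing in for real embeddings.
corpus = {
    "refund-policy": [0.9, 0.1, 0.0],
    "refund-policy-copy": [0.89, 0.11, 0.01],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-rate-limits": [0.0, 0.1, 0.95],
}
query = [0.8, 0.2, 0.05]  # imagine the embedding of "how do I get my money back?"

labeled = [
    ([0.95, 0.05, 0.0], "billing"),
    ([0.05, 0.95, 0.0], "logistics"),
    ([0.0, 0.05, 0.95], "technical"),
]
new_ticket = [0.1, 0.85, 0.2]

print("search:", top_k(query, corpus))
print("duplicates:", near_duplicates(corpus))
print("ticket label:", classify(new_ticket, labeled))
```

Brute force compares the query against every stored vector, which is workable for a few thousand items; at larger scale, that comparison is the job a vector database's approximate nearest-neighbor index takes over.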

Practical considerations

Pick the embedding model before the vector database: vectors produced by different models are not comparable, so switching models means re-embedding the entire corpus. Managed embedding APIs cost roughly $0.02 to $0.13 per million tokens in 2026. For private or regulated data, host an open-source model locally (Nomic, BGE) to avoid sending content to a third party. The MTEB (Massive Text Embedding Benchmark) leaderboard is the standard reference for comparing model quality; its original English benchmark spans 56 datasets across eight task types.
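The per-token price range makes the cost of a full embedding (or re-embedding) pass easy to estimate. A back-of-the-envelope sketch; the corpus size is hypothetical, and a real bill depends on the model's tokenizer and the vendor's current pricing.

```python
# Low and high ends of the per-million-token price range quoted above.
PRICE_PER_MILLION_TOKENS_LOW = 0.02
PRICE_PER_MILLION_TOKENS_HIGH = 0.13


def embedding_cost(total_tokens: int, price_per_million: float) -> float:
    """Dollar cost to embed `total_tokens` once at the given per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million


# Hypothetical corpus: 20,000 documents averaging 800 tokens each.
corpus_tokens = 20_000 * 800  # 16,000,000 tokens

low = embedding_cost(corpus_tokens, PRICE_PER_MILLION_TOKENS_LOW)
high = embedding_cost(corpus_tokens, PRICE_PER_MILLION_TOKENS_HIGH)
print(f"One full embedding pass: ${low:.2f} to ${high:.2f}")
```

Under these assumptions one pass over the 16-million-token corpus costs $0.32 to $2.08, and the figure scales linearly with corpus size.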

What it means for your business

Embeddings are invisible plumbing in most AI products, but the model and dimension choices determine whether your "AI search" actually finds the right document or returns garbage on edge cases.

Related terms

  • Vector Database — A vector database stores embeddings and finds similar items by approximate nearest-neighbor search. Definition, top vendors, and when you actually need one.
  • Retrieval-Augmented Generation (RAG) — RAG is the technique of fetching documents from a database and feeding them to an LLM before it answers. Definition, architecture, and SMB use cases.
  • Large Language Model (LLM) — A Large Language Model is a transformer-based neural network trained on trillions of tokens to predict the next token. Definition, key models, and business use.
  • AI Knowledge Base — An AI knowledge base is a structured corpus of documents an AI agent retrieves from to answer questions. Definition, architecture, and SMB setup tips.
  • AI Grounding — Grounding is the practice of tying AI outputs to verified source material. Definition, techniques, and why it is the primary defense against hallucination.