What is a Large Language Model (LLM)? Definition

What an LLM actually does

Strip the marketing away and an LLM does one thing: given a sequence of tokens (roughly, pieces of words), it assigns a probability to every possible next token. Sample one token from that distribution, append it to the sequence, and repeat. Everything else (chat, code, agents, RAG) is built on that primitive. The 2017 paper "Attention Is All You Need" introduced the transformer architecture that makes this approach work at scale.
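The predict, sample, append, repeat loop can be sketched in a few lines. This is a toy illustration, not how a production model is implemented: the TOY_LOGITS table, its scores, and the function names are all invented for the example, and the toy looks only at the last token, whereas a real LLM conditions on the whole sequence.

```python
import math
import random

# Hypothetical toy "model": maps the last token to raw scores (logits) for each
# candidate next token. A real LLM computes these scores with a transformer
# conditioned on the whole sequence, over a vocabulary of many thousands of tokens.
TOY_LOGITS = {
    "<start>": {"the": 2.0, "a": 1.0},
    "the": {"cat": 1.5, "dog": 1.2},
    "a": {"cat": 1.0, "dog": 1.0},
    "cat": {"sat": 2.0, "<end>": 0.5},
    "dog": {"sat": 1.0, "<end>": 1.0},
    "sat": {"<end>": 3.0},
}


def softmax(logits):
    """Turn raw scores into a probability distribution that sums to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def generate(max_tokens=10, seed=0):
    """Predict a distribution, sample one token, append it, and repeat."""
    rng = random.Random(seed)
    tokens = ["<start>"]
    for _ in range(max_tokens):
        probs = softmax(TOY_LOGITS[tokens[-1]])
        next_token = rng.choices(list(probs), weights=list(probs.values()))[0]
        if next_token == "<end>":
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the <start> marker


print(" ".join(generate()))
```

Changing the seed changes which tokens get sampled, which is why the same prompt can produce different completions from one run to the next.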

The frontier model landscape (May 2026)

Anthropic Claude — Opus 4.x, Sonnet 4.x, Haiku 4.x. Strong on long context, reasoning, and code.
OpenAI — GPT-4o, GPT-5 tier. Strong general-purpose model, broad API ecosystem.
Google — Gemini 2.x Pro/Flash. Long context (2M tokens), tight integration with Google Workspace.
Meta — Llama 4 family. Open-weights, hostable in air-gapped environments.
Mistral, DeepSeek, Qwen — strong open alternatives, often cheaper per token for narrow tasks.

Capabilities and limits

Capabilities: in-context learning, multi-step reasoning, code generation, structured extraction, multilingual fluency, tool use, and vision (in multimodal models).

Limits: hallucination, especially on facts outside the training data; no real-time information without retrieval; no persistent memory between sessions without a memory layer; and cost and latency that scale with token count.
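To make the last limit concrete, here is a rough cost estimate for a single request. The prices, token counts, and monthly volume are placeholder assumptions chosen for easy arithmetic, not any vendor's actual rates; the Pricing class and estimate_cost function are hypothetical names. Latency is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in USD; not any vendor's real rates."""
    input_per_million: float
    output_per_million: float


def estimate_cost(input_tokens, output_tokens, pricing):
    """Cost of one request: tokens in each direction times the per-token price."""
    return (
        input_tokens * pricing.input_per_million
        + output_tokens * pricing.output_per_million
    ) / 1_000_000


# Illustrative numbers only: a 2,000-token prompt and a 500-token reply.
example = Pricing(input_per_million=3.0, output_per_million=15.0)
per_request = estimate_cost(2_000, 500, example)
monthly = per_request * 10_000  # assumed volume of 10,000 requests per month
print(f"per request: ${per_request:.4f}, per month: ${monthly:.2f}")
```

Under these assumptions one request costs about $0.0135 and 10,000 requests cost about $135. Doubling the prompt length doubles the input portion of that bill, which is the sense in which cost scales with token count.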

How an SMB should think about model choice

Model choice is rarely the bottleneck; orchestration and integration are. Use a frontier model (Claude Sonnet, GPT-4o) for anything customer-facing where quality matters. Use a smaller, cheaper model (Claude Haiku, GPT-4o-mini, Llama 3.1 8B) for high-volume internal classification, summarization, or extraction tasks. Be model-agnostic by default: if every model call goes through one thin wrapper and your prompts are well structured, switching usually comes down to a one-line change.
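One way to keep switching costs that low is to route every call through a small table that maps tasks to models. The sketch below is one possible shape, not a prescribed design: the providers are stubs that echo their input so the code runs without API keys, and the task names and model identifiers are illustrative placeholders, not exact API model IDs. In practice each stub would wrap the vendor's own SDK call.

```python
from typing import Callable, Dict, Tuple

# A provider is any function that takes (model_name, prompt) and returns text.
# In real use each one would wrap a vendor SDK call; these stubs only echo so the
# sketch runs without network access or API keys.
Provider = Callable[[str, str], str]


def stub_provider(vendor: str) -> Provider:
    def call(model: str, prompt: str) -> str:
        return f"[{vendor}/{model}] {prompt}"
    return call


PROVIDERS: Dict[str, Provider] = {
    "anthropic": stub_provider("anthropic"),
    "openai": stub_provider("openai"),
}

# The one place where model choice lives. Task names and model identifiers are
# illustrative placeholders, not exact API model IDs.
ROUTES: Dict[str, Tuple[str, str]] = {
    "customer_reply": ("anthropic", "frontier-model"),
    "ticket_classification": ("openai", "small-model"),
}


def complete(task: str, prompt: str) -> str:
    """Look up the model assigned to a task and send the prompt to it."""
    vendor, model = ROUTES[task]
    return PROVIDERS[vendor](model, prompt)


print(complete("ticket_classification", "Label this ticket: 'refund not received'"))

# Switching models is a one-line change to the routing table.
ROUTES["ticket_classification"] = ("anthropic", "small-model")
print(complete("ticket_classification", "Label this ticket: 'refund not received'"))
```

The rest of the application only ever calls complete with a task name, so swapping the model behind a task does not touch business logic.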

What it means for your business

You do not choose an LLM the way you choose a software vendor. You choose two or three you can swap between, because price-performance shifts every quarter and lock-in to one model is a margin tax.
