The AI Tools Landscape

PM: Skim — 20 min

Orientation

The AI tools landscape changes faster than most other parts of the software ecosystem. This page maps the major categories with enough context to evaluate new entrants as they emerge. Treat specific product names as current examples, not permanent fixtures, and verify versions against provider documentation before making architectural decisions.

Frontier Model Providers

These labs develop and deploy the largest, most capable models. Use them when task quality matters most.

Provider	Models	Strengths
Anthropic	Claude Opus 4.8, Claude Sonnet 5	Reasoning, safety, long context, coding
OpenAI	GPT-5.5, GPT-5.4-mini	Breadth, structured outputs, vision
Google DeepMind	Gemini 3.5 Flash, Gemini 3.1 Pro	Multimodal, very long context (1M+ tokens), strong agentic performance

Claude Sonnet 5 (released June 2026) narrows the gap to the Opus tier on reasoning and agentic tasks at roughly one-fifth the cost, and is now the default tier for most production workloads.¹

Gemini 3.5 Pro — Google's next flagship (2M-token context, "Deep Think" reasoning) — is in limited preview with general availability expected July 2026; the table's Gemini 3.1 Pro entry remains current until then.²

Claude's top tier: Fable 5 and Mythos 5

Anthropic's Mythos-class models, its most capable tier and positioned above Opus, reached the public in June 2026 as Claude Fable 5, now generally available to enterprise customers and paid subscribers.³ A restricted variant, Claude Mythos 5 (the same underlying model with certain safeguards lifted), is available only to a small set of vetted cyberdefense and infrastructure partners through Project Glasswing. Reported agentic-coding results for Fable 5 are strong: it completes equivalent work with fewer tool calls than previous Opus-tier models.

Upcoming: GPT-5.6

OpenAI previewed the GPT-5.6 family in June 2026: Sol (flagship), Terra (balanced, ~2× cheaper than GPT-5.5), and Luna (low-cost).⁴ The preview is limited to a small set of vetted partners; general availability is expected within weeks. The Frontier and Fast/Cheap tables will be updated when GPT-5.6 reaches GA.

Fast/Cheap Models

These trade some quality for dramatically lower cost and latency. Many production systems use them as the default tier and escalate to frontier models only when needed.

Model	Provider	Notes
Claude Haiku 4.5	Anthropic	Extremely fast; strong at classification and extraction
GPT-5.4-mini	OpenAI	Strong all-rounder in the cheap tier
Gemini 3.1 Flash Lite	Google	Very fast; strong on structured extraction
DeepSeek-V4-Flash	DeepSeek (open; also via DeepSeek API)	Frontier-quality at a fraction of frontier pricing
Llama 3.1 8B	Meta (open)	Self-hostable; useful for privacy-sensitive workloads
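
The escalation pattern described at the start of this section (cheap tier by default, frontier tier only when needed) can be sketched as below. This is a minimal illustration, not a recommended implementation: the `Answer` type, the `confidence` field, the 0.8 threshold, and both stub models are hypothetical. Real systems would replace the stubs with provider SDK calls and choose their own escalation signal (a validator, a self-check, a classifier).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0; how this is estimated is left to the caller


# Hypothetical signature: a model call takes a prompt and returns an Answer.
ModelFn = Callable[[str], Answer]


def answer_with_escalation(
    prompt: str,
    cheap: ModelFn,
    frontier: ModelFn,
    threshold: float = 0.8,
) -> Tuple[str, Answer]:
    """Try the cheap tier first; call the frontier tier only on low confidence."""
    first = cheap(prompt)
    if first.confidence >= threshold:
        return "cheap", first
    return "frontier", frontier(prompt)


# Stub models so the sketch runs without any provider SDK.
def stub_cheap(prompt: str) -> Answer:
    # Assumption for the demo only: short prompts are "easy", long ones "hard".
    return Answer("cheap answer", 0.95 if len(prompt) < 40 else 0.4)


def stub_frontier(prompt: str) -> Answer:
    return Answer("frontier answer", 0.99)


if __name__ == "__main__":
    print(answer_with_escalation("Classify: spam or not?", stub_cheap, stub_frontier))
    print(answer_with_escalation("x" * 50, stub_cheap, stub_frontier))
```

Returning the tier name alongside the answer makes it easy to log how often escalation happens, which is the number that determines whether the cheap default actually saves money.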

Open and Self-Hosted Models

Open-weight models (weights publicly released, inference self-hosted) are the right choice for privacy-sensitive workloads, regulated industries, or high-volume applications where per-token costs become significant.

Llama (Meta): The most widely deployed open model family. The current generation is Llama 4 (e.g., Scout, with a very long context window, and Maverick); smaller Llama 3.1-class models such as the 8B remain popular for self-hosting. Available on all major inference platforms and hostable on-premises.

DeepSeek: Chinese lab releasing open-weight models that have matched or exceeded frontier closed models on coding and reasoning benchmarks. DeepSeek-V3 and DeepSeek-R1 brought significant attention to the cost efficiency of open models. DeepSeek-R1 also popularized RLVR (Reinforcement Learning from Verifiable Rewards), a training approach that elicits reasoning from automatically checkable reward signals rather than human-labeled reasoning traces, and a direction several labs are now exploring for reasoning model training. DeepSeek-V4 (released April 24, 2026, MIT license) ships in two variants, V4-Pro (1.6T total / 49B active parameters) and V4-Flash (284B total / 13B active), both with 1M-token context.⁵ After a permanent June 2026 price cut, V4-Pro runs roughly 12× cheaper than GPT-5.5 at comparable benchmark performance, still among the widest open-vs-closed cost gaps to date. An official V4 GA with time-of-day API pricing is expected mid-July 2026.

Mistral / Mixtral: French lab. The Mixtral 8×7B model demonstrated MoE architecture can match larger dense models at a fraction of the active-parameter cost; the current flagship is Mistral Large 3 (a ~675B-total / 41B-active MoE).
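
To make the total-versus-active distinction concrete, the short calculation below computes the fraction of parameters active per token, using only the figures quoted in this section (the Mistral Large 3 numbers are approximate, as stated above). The dictionary and function names are illustrative.

```python
# (total, active) parameter counts in billions, as quoted in this section.
MOE_MODELS = {
    "DeepSeek-V4-Pro": (1600, 49),
    "DeepSeek-V4-Flash": (284, 13),
    "Mistral Large 3": (675, 41),  # approximate figures per the text
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used to process a single token."""
    return active_b / total_b


if __name__ == "__main__":
    for name, (total, active) in MOE_MODELS.items():
        share = active_fraction(total, active)
        print(f"{name}: {share:.1%} of parameters active per token")
```

For these three models the active share works out to roughly 3 to 6 percent. Per-token compute scales with the active count, but all of the weights still have to be stored and served, so memory requirements for self-hosting follow the total count.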

Qwen (Alibaba): Strong multilingual coverage, especially Chinese. The Qwen 3.5 family (early 2026) is competitive with Western open models on coding, math, and long-context tasks.

Caveat: running open models requires real infrastructure and ML engineering capacity. Unless you have that, the cost savings over provider APIs rarely justify the operational overhead.

Embedding Models

Every RAG system needs an embedding model to convert text to vectors.

Model	Provider	Notes
text-embedding-3-small/large	OpenAI	Strong general-purpose; most widely deployed
embed-english-v3	Cohere	Strong on long documents
BAAI/bge-m3	BAAI (open)	Multilingual; good for non-English content
nomic-embed-text	Nomic (open)	Small, fast, self-hostable

Match the embedding model to your content: a monolingual model performs poorly on multilingual corpora.
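
As a rough illustration of what the vectors are used for, the sketch below ranks documents by cosine similarity to a query vector. The three-dimensional vectors and document IDs are invented toy data standing in for real embeddings, which typically have hundreds or thousands of dimensions and come from one of the models in the table.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], doc_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the IDs of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors; a real system would obtain these from an embedding model.
DOCS = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-reference": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    query = [0.8, 0.2, 0.0]  # pretend this is the embedded user question
    print(top_k(query, DOCS))
```

The query and the documents must be embedded with the same model; vectors from different embedding models are not comparable.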

Orchestration Frameworks

Higher-level abstractions for chains, agents, and RAG pipelines.

LangChain: First widely adopted framework. Comprehensive but complex; known for leaky abstractions. Good for prototyping; many teams write custom code in production.

LlamaIndex: Specialized in retrieval and RAG, and has expanded into full workflow orchestration via LlamaIndex Workflows — an event-driven architecture for complex, stateful agent pipelines. Stronger than LangChain for search-centric applications; increasingly competitive on multi-step agent workloads.

Vercel AI SDK: Designed for frontend/TypeScript. First-class streaming support. Good choice for Next.js applications.

Microsoft Agent Framework 1.0 (April 2026): Production-ready multi-agent SDK for .NET and Python, merging Microsoft's AutoGen and Semantic Kernel projects. AutoGen has transitioned to maintenance mode (bug fixes and security patches only) — new multi-agent projects should use Microsoft Agent Framework instead.

Direct SDK calls: For many production applications, the right answer is to call provider SDKs directly and implement lightweight orchestration. Frameworks add dependencies and upgrade complexity.
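
A sketch of what "lightweight orchestration" around a direct SDK call might look like: a small retry wrapper with exponential backoff. Everything here is an assumption for illustration. `TransientError` stands in for whatever rate-limit or timeout exception a given provider SDK raises, and `make_flaky` is a stub that simulates a provider so the example runs without network access.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Hypothetical stand-in for a provider's rate-limit or timeout error."""


def call_with_retry(
    call: Callable[[str], str],
    prompt: str,
    attempts: int = 3,
    base_delay: float = 0.5,
) -> str:
    """Call the model, retrying transient failures with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return call(prompt)
        except TransientError as err:
            last_error = err
            if attempt < attempts - 1:
                time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


def make_flaky(fail_times: int) -> Callable[[str], str]:
    """Build a stub 'provider' that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def call(prompt: str) -> str:
        state["calls"] += 1
        if state["calls"] <= fail_times:
            raise TransientError("simulated rate limit")
        return f"echo: {prompt}"

    return call


if __name__ == "__main__":
    print(call_with_retry(make_flaky(2), "hello", base_delay=0.0))
```

Because the wrapper only depends on a `str -> str` callable, swapping providers means writing one small adapter function rather than migrating a framework.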

Integration Standards

Model Context Protocol (MCP): Anthropic's open standard for connecting LLM applications to external tools and data sources. It defines a standard interface so that any MCP-compliant client (the host application driving the model) can use any MCP-compliant server (the tool or data source). Adoption is growing in editors, IDEs, and enterprise tools.

The protocol is maturing quickly: the specification scheduled for July 2026 makes MCP stateless at the transport layer, so servers scale on ordinary HTTP infrastructure without sticky sessions, and adds an Extensions framework (long-running Tasks, server-rendered MCP Apps) plus OAuth/OIDC-aligned authorization.⁶ Adoption is broad — public MCP registries index well over 16,000 servers. Security caveat: U.S. NSA/CISA guidance (June 2026) flags tool-poisoning and over-broad authorization as the main MCP risks to control.⁷
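
One way to limit over-broad tool access on the client side is an allowlist check before any tool call is sent. The sketch below builds an MCP `tools/call` request, which is a JSON-RPC 2.0 message, and refuses tools that are not explicitly allowed. The allowlist contents, the tool name `search_docs`, and the function name are hypothetical. The message shape reflects the published MCP specification as commonly documented; check it against the current spec version, particularly given the changes scheduled above.

```python
import json
from typing import Any, Dict

# Hypothetical allowlist: only tools named here may be invoked by this client.
ALLOWED_TOOLS = {"search_docs"}


def build_tool_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request, rejecting tools outside the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool {tool_name!r} is not on the allowlist")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    print(build_tool_call(1, "search_docs", {"query": "refund policy"}))
```

An allowlist addresses only the authorization side of the risk; it does not detect poisoned tool descriptions, which need separate review of what each server advertises.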

Function calling / tool use API: All major providers offer structured tool-use APIs. Schemas are provider-specific but semantically equivalent.
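
To show what "provider-specific but semantically equivalent" means in practice, the sketch below renders one tool definition in two shapes: the OpenAI Chat Completions `tools` entry and the Anthropic Messages API `tools` entry. The field names reflect those APIs as commonly documented at the time of writing and should be verified against current provider documentation. The `get_weather` tool and the helper names are hypothetical.

```python
from typing import Any, Dict

# One provider-neutral definition: a name, a description, and a JSON Schema.
WEATHER_TOOL = {
    "name": "get_weather",  # hypothetical tool
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def to_openai_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Shape used in the `tools` list of OpenAI Chat Completions requests."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }


def to_anthropic_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Shape used in the `tools` list of Anthropic Messages API requests."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


if __name__ == "__main__":
    print(to_openai_tool(WEATHER_TOOL))
    print(to_anthropic_tool(WEATHER_TOOL))
```

Both shapes carry the same JSON Schema; only the envelope differs, which is why a thin adapter layer is usually enough to stay portable across providers.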

Inference Infrastructure

When hosting open models or requiring lower latency than direct API calls:

Fireworks AI / Together AI: Hosted inference for open models; often cheaper and faster than model-lab APIs
Modal / Replicate: Serverless GPU compute for custom inference
Amazon Bedrock / Azure OpenAI: Enterprise access to major models inside cloud compliance frameworks
Ollama: Local inference for development and air-gapped environments

PM Takeaway

Avoid framework lock-in in production. LangChain and LlamaIndex are excellent for prototyping. Before deploying, evaluate whether you need the framework or whether direct SDK calls with your own lightweight wrappers are simpler to maintain and debug. Many production failures in agent systems trace back to framework behavior rather than model behavior.