The AI Tools Landscape
Orientation
The AI tools landscape changes faster than most other parts of the software ecosystem. This page maps the major categories as of mid-2025, with enough context to evaluate new entrants. Treat specific product names as current examples, not permanent fixtures.
Frontier Model Providers
These labs develop and deploy the largest, most capable models. Use them when task quality matters most.
| Provider | Models | Strengths |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Opus 4 | Reasoning, safety, long context, coding |
| OpenAI | GPT-4o, o3, o4-mini | Breadth, structured outputs, vision |
| Google DeepMind | Gemini 2.5 Pro, Gemini 2.5 Flash | Multimodal, very long context (1M+ tokens) |
Fast/Cheap Models
These trade some quality for dramatically lower cost and latency. Many production systems use them as the default tier and escalate to frontier models only when needed.
| Model | Provider | Notes |
|---|---|---|
| Claude 3.5 Haiku | Anthropic | Extremely fast; strong at classification and extraction |
| GPT-4o-mini | OpenAI | Strong all-rounder in the cheap tier |
| Gemini 2.0 Flash | Google | Very fast; strong on structured extraction |
| Llama 3.1 8B | Meta (open) | Self-hostable; useful for privacy-sensitive workloads |
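The default-tier-then-escalate pattern described above can be sketched as follows. This is a minimal illustration, not a recommended implementation: `cheap_model`, `frontier_model`, and `is_good_enough` are hypothetical callables you would supply, and the stand-in models here return canned strings instead of calling any provider.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TierResult:
    text: str
    tier: str  # which tier produced the final answer


def answer_with_escalation(
    prompt: str,
    cheap_model: Callable[[str], str],
    frontier_model: Callable[[str], str],
    is_good_enough: Callable[[str], bool],
) -> TierResult:
    """Try the cheap tier first; escalate only if the quality check fails."""
    draft = cheap_model(prompt)
    if is_good_enough(draft):
        return TierResult(text=draft, tier="cheap")
    return TierResult(text=frontier_model(prompt), tier="frontier")


# Stand-in models for illustration only (assumptions, not real SDK calls).
def fake_cheap(prompt: str) -> str:
    # Pretend the cheap model returns nothing on "hard" prompts.
    return "" if "hard" in prompt else "cheap answer"


def fake_frontier(prompt: str) -> str:
    return "frontier answer"


# Using bool as the quality check: an empty string counts as a failure.
easy = answer_with_escalation("easy question", fake_cheap, fake_frontier, bool)
hard = answer_with_escalation("hard question", fake_cheap, fake_frontier, bool)
print(easy.tier, hard.tier)
```

In a real system the quality check is the hard part: it might be a schema validation, a confidence score, or a rule specific to the task.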
Open and Self-Hosted Models
Open-weight models (weights publicly released, inference self-hosted) are the right choice for privacy-sensitive workloads, regulated industries, or high-volume applications where per-token costs become significant.
Llama 3 (Meta): The most widely deployed open model family. Strong performance at 8B and 70B scales. Available on all major inference platforms and hostable on-premises.
Mistral / Mixtral: French lab. The Mixtral 8×7B model demonstrated that a mixture-of-experts (MoE) architecture can match larger dense models at a fraction of the active-parameter cost.
Qwen (Alibaba): Strong multilingual coverage, especially Chinese.
Caveat: running open models requires real infrastructure and ML engineering capacity. Unless you have that, the cost savings over provider APIs rarely justify the operational overhead.
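A rough break-even calculation makes the caveat concrete. The sketch below is a simplification under stated assumptions: self-hosting is treated as a fixed monthly cost with no per-token cost, and every dollar figure is a made-up placeholder, not a real price.

```python
def monthly_api_cost(tokens_per_month: float, price_per_million: float) -> float:
    """API spend for a given monthly token volume."""
    return tokens_per_month / 1_000_000 * price_per_million


def break_even_tokens(fixed_monthly_cost: float, price_per_million: float) -> float:
    """Monthly token volume above which self-hosting beats the API.

    Assumes self-hosting cost is entirely fixed (GPUs plus engineering time).
    """
    return fixed_monthly_cost / price_per_million * 1_000_000


# Placeholder figures, chosen only to show the arithmetic.
FIXED_SELF_HOSTING_USD = 6000.0   # hypothetical GPUs + on-call engineering
API_PRICE_PER_MILLION_USD = 0.50  # hypothetical blended API price

threshold = break_even_tokens(FIXED_SELF_HOSTING_USD, API_PRICE_PER_MILLION_USD)
current_spend = monthly_api_cost(2_000_000_000, API_PRICE_PER_MILLION_USD)

print(f"Break-even volume: {threshold:,.0f} tokens/month")
print(f"API cost at 2B tokens/month: ${current_spend:,.2f}")
```

With these placeholder numbers, a workload of 2 billion tokens per month is well below the break-even volume, so the API remains cheaper.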
Embedding Models
Every RAG system needs an embedding model to convert text to vectors.
| Model | Provider | Notes |
|---|---|---|
| text-embedding-3-small/large | OpenAI | Strong general-purpose; most widely deployed |
| embed-english-v3 | Cohere | Strong on long documents |
| BAAI/bge-m3 | BAAI (open) | Multilingual; good for non-English content |
| nomic-embed-text | Nomic (open) | Small, fast, self-hostable |
Match the embedding model to your content: a monolingual model performs poorly on multilingual corpora.
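To show what "convert text to vectors" buys a RAG system, here is a toy retrieval sketch. The `toy_embed` function is a stand-in (a word count over a tiny hand-picked vocabulary), not a real embedding model; in practice you would replace it with a call to one of the models in the table. The cosine-similarity ranking is the part that carries over.

```python
import math
from collections import Counter


def toy_embed(text: str, vocab: list[str]) -> list[float]:
    """Stand-in embedding: count occurrences of each vocabulary word."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


vocab = ["refund", "invoice", "password", "login"]  # hypothetical vocabulary
docs = [
    "how to request a refund for an invoice",
    "reset your password after a failed login",
]
query = "forgot my password"

query_vec = toy_embed(query, vocab)
scores = [cosine(query_vec, toy_embed(doc, vocab)) for doc in docs]

# Retrieve the document whose vector is closest to the query vector.
best = docs[scores.index(max(scores))]
print(best)
```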
Orchestration Frameworks
Higher-level abstractions for chains, agents, and RAG pipelines.
LangChain: The first widely adopted framework. Comprehensive but complex; known for leaky abstractions. Good for prototyping; many teams replace it with custom code in production.
LlamaIndex: Specialized in retrieval and RAG. Stronger than LangChain for search-centric applications.
Vercel AI SDK: Designed for frontend/TypeScript. First-class streaming support. Good choice for Next.js applications.
Direct SDK calls: For many production applications, the right answer is to call provider SDKs directly and implement lightweight orchestration. Frameworks add dependencies and upgrade complexity.
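As an illustration of "lightweight orchestration", the sketch below wraps any model call in retry logic with exponential backoff. It is deliberately provider-agnostic: `call` is a hypothetical function you would implement with your provider's SDK, and the flaky stand-in exists only to exercise the retry path.

```python
import time
from typing import Callable


def call_with_retry(
    call: Callable[[str], str],
    prompt: str,
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call the model, retrying with exponential backoff on failure."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return call(prompt)
        except Exception as exc:  # real code should catch the SDK's transient errors only
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


# Stand-in for a provider SDK call: fails twice, then succeeds.
attempts = {"count": 0}


def flaky_model(prompt: str) -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated transient failure")
    return f"ok: {prompt}"


# Pass a no-op sleep so the demo runs instantly.
result = call_with_retry(flaky_model, "hi", sleep=lambda seconds: None)
print(result, attempts["count"])
```

A wrapper like this is small enough to read in full when debugging, which is the main argument for it over a framework-provided equivalent.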
Integration Standards
Model Context Protocol (MCP): Anthropic's open standard for connecting LLMs to external tools and data sources. Defines a standard interface so that any MCP client (the application hosting the model) can use any MCP server (which exposes tools and data). Gaining adoption in editors, IDEs, and enterprise tools.
Function calling / tool use API: All major providers offer structured tool-use APIs. Schemas are provider-specific but semantically equivalent.
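The "provider-specific but semantically equivalent" point can be seen by converting one neutral tool definition into two provider shapes. The sketch below reflects the Anthropic Messages API and OpenAI Chat Completions tool formats as commonly documented at the time of writing; verify the field names against each provider's current reference before relying on them. The `get_weather` tool and the neutral dictionary layout are hypothetical.

```python
import json

# Provider-neutral description of one tool; the JSON Schema is shared by both targets.
WEATHER_TOOL = {
    "name": "get_weather",  # hypothetical tool
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def to_anthropic(tool: dict) -> dict:
    """Anthropic-style shape: the JSON Schema goes under `input_schema`."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def to_openai(tool: dict) -> dict:
    """OpenAI-style shape: wrapped in a `function` object, schema under `parameters`."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }


anthropic_tool = to_anthropic(WEATHER_TOOL)
openai_tool = to_openai(WEATHER_TOOL)
print(json.dumps(anthropic_tool, indent=2))
print(json.dumps(openai_tool, indent=2))
```

Keeping a single neutral definition and converting at the edge makes it cheaper to switch providers or run the same tools against several.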
Inference Infrastructure
Options when you host open models or need lower latency than direct API calls provide:
- Fireworks AI / Together AI: Hosted inference for open models; often cheaper and faster than model-lab APIs
- Modal / Replicate: Serverless GPU compute for custom inference
- Amazon Bedrock / Azure OpenAI: Enterprise access to major models inside cloud compliance frameworks
- Ollama: Local inference for development and air-gapped environments
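For local development, a request to Ollama can be made with the standard library alone. This sketch assumes an Ollama server running on its default local port and a model tag of `llama3.1:8b` that has already been pulled; both are assumptions about your setup. Only the request is built at import time; `generate` sends it and needs the server to be running.

```python
import json
import urllib.request

# Assumed default local address of an Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a non-streaming generate request (model tag is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1:8b") -> str:
    """Send the request; requires a running Ollama server with the model pulled."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network.
request = build_request("Say hello in one word.")
print(request.full_url, request.get_method())
```

Calling `generate("Say hello in one word.")` would return the model's text once the server is up.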
Avoid framework lock-in in production. LangChain and LlamaIndex are excellent for prototyping. Before deploying, evaluate whether you need the framework or whether direct SDK calls with your own lightweight wrappers are simpler to maintain and debug. In practice, many production failures in agent systems trace back to framework behavior rather than model behavior.
Further Reading
- Agents and Tool Use: how integration standards like MCP work in practice
- Cost and Latency Tradeoffs: how to choose between tiers in production