
The AI Tools Landscape

PM: Skim (20 min)

Orientation

The AI tools landscape changes faster than almost any other part of the software ecosystem. This page maps the major categories as of mid-2025, with enough context to evaluate new entrants. Treat specific product names as current examples, not permanent fixtures.

Frontier Model Providers

These labs develop and deploy the largest, most capable models. Use them when task quality matters most.

| Provider | Models | Strengths |
|---|---|---|
| Anthropic | Claude Sonnet 4, Claude Opus 4 | Reasoning, safety, long context, coding |
| OpenAI | GPT-4o, o3, o4-mini | Breadth, structured outputs, vision |
| Google DeepMind | Gemini 2.5 Pro, Gemini 2.5 Flash | Multimodal, very long context (1M+ tokens) |

Fast/Cheap Models

These trade some quality for dramatically lower cost and latency. Many production systems use them as the default tier and escalate to frontier models only when needed.

| Model | Provider | Notes |
|---|---|---|
| Claude 3.5 Haiku | Anthropic | Extremely fast; strong at classification and extraction |
| GPT-4o-mini | OpenAI | Strong all-rounder in the cheap tier |
| Gemini 2.0 Flash | Google | Very fast; strong on structured extraction |
| Llama 3.1 8B | Meta (open) | Self-hostable; useful for privacy-sensitive workloads |
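The default-tier-then-escalate pattern can be sketched as a small router. This is a minimal sketch under stated assumptions: `cheap` and `frontier` are hypothetical stand-ins for real model calls, and each returns an answer plus a confidence score in [0, 1]. Where that score comes from (a classifier probability, a validation check, or a separate evaluator model) is an assumption you would have to supply.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical interface: a model call returns (answer_text, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class RoutedAnswer:
    text: str
    confidence: float
    tier: str  # "cheap" or "frontier"


def route(prompt: str, cheap: ModelFn, frontier: ModelFn, threshold: float = 0.8) -> RoutedAnswer:
    """Answer with the cheap tier; escalate to the frontier tier only when confidence is low."""
    text, confidence = cheap(prompt)
    if confidence >= threshold:
        return RoutedAnswer(text, confidence, "cheap")
    text, confidence = frontier(prompt)
    return RoutedAnswer(text, confidence, "frontier")


# Stub models for illustration only; real ones would call a provider SDK.
def fake_cheap(prompt: str) -> Tuple[str, float]:
    return ("positive", 0.95) if "great" in prompt else ("unclear", 0.40)


def fake_frontier(prompt: str) -> Tuple[str, float]:
    return ("negative", 0.90)


if __name__ == "__main__":
    print(route("The product is great", fake_cheap, fake_frontier))
    print(route("It arrived, I guess", fake_cheap, fake_frontier))
```

The threshold is the main design knob: raising it sends more traffic to the frontier tier, trading cost for quality.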

Open and Self-Hosted Models

Open-weight models (weights publicly released; inference can run on your own hardware or through a hosting provider) are often the right choice for privacy-sensitive workloads, regulated industries, or high-volume applications where per-token costs become significant.

Llama 3 (Meta): The most widely deployed open model family. Strong performance at 8B and 70B scales. Available on all major inference platforms and hostable on-premises.

Mistral / Mixtral: French lab. The Mixtral 8×7B model demonstrated that a mixture-of-experts (MoE) architecture can match larger dense models at a fraction of the active-parameter cost.
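The MoE idea can be illustrated with a toy top-2 router. In Mixtral 8×7B, a router scores eight experts per layer and each token is processed by only the two highest-scoring ones. That is why the model has about 47B total parameters but uses only about 13B active parameters per token. The sketch below is a deliberate simplification: inputs are scalars, the gate logits are passed in directly rather than computed by a learned layer, and the "experts" are trivial functions.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(
    x: float,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[float], float]],
    k: int = 2,
) -> Tuple[float, List[int]]:
    """Toy top-k mixture-of-experts layer: only k experts run for this input."""
    # Pick the k experts with the highest router scores.
    chosen = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Softmax over the chosen experts' scores only, so the weights sum to 1.
    weights = softmax([gate_logits[i] for i in chosen])
    output = sum(w * experts[i](x) for w, i in zip(weights, chosen))
    return output, chosen


# Eight toy experts; expert i multiplies its input by (i + 1).
EXPERTS = [lambda x, scale=i + 1: scale * x for i in range(8)]
GATE_LOGITS = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 0.3]

if __name__ == "__main__":
    out, used = moe_layer(1.0, GATE_LOGITS, EXPERTS)
    print(f"experts used: {used}, output: {out:.4f}")
```

Only the selected experts are evaluated, so compute per token scales with k rather than with the total number of experts. Memory still has to hold all of them.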

Qwen (Alibaba): Strong multilingual coverage, especially Chinese.

Caveat: running open models requires real infrastructure and ML engineering capacity. Unless you have that, the cost savings over provider APIs rarely justify the operational overhead.
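The cost trade-off in this caveat comes down to a break-even calculation. Here is a minimal sketch, assuming self-hosting has a fixed monthly cost (GPUs, on-call, ML engineering time) plus a smaller per-token marginal cost, while the API charges only per token. All numbers in the example are hypothetical placeholders, not real prices.

```python
import math


def breakeven_tokens_per_month(
    fixed_usd_per_month: float,
    api_usd_per_million: float,
    selfhost_usd_per_million: float,
) -> float:
    """Monthly token volume at which self-hosting and API costs are equal.

    API cost:       tokens / 1e6 * api_price
    Self-host cost: fixed + tokens / 1e6 * selfhost_price
    """
    margin = api_usd_per_million - selfhost_usd_per_million
    if margin <= 0:
        return math.inf  # self-hosting never catches up on per-token cost
    return fixed_usd_per_month / margin * 1_000_000


if __name__ == "__main__":
    # Hypothetical figures: $10k/month fixed, $0.75/M via API, $0.25/M marginal self-hosted.
    volume = breakeven_tokens_per_month(10_000, 0.75, 0.25)
    print(f"break-even at {volume:,.0f} tokens/month")  # prints: break-even at 20,000,000,000 tokens/month
```

Under these placeholder figures, self-hosting only pays off above about 20 billion tokens per month. Plug in your own quotes and staffing costs before drawing conclusions.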

Embedding Models

Every RAG system that retrieves by vector similarity needs an embedding model to convert text into vectors.

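| Model | Provider | Notes |
|---|---|---|
| text-embedding-3-small / text-embedding-3-large | OpenAI | Strong general-purpose; most widely deployed |
| embed-english-v3.0 | Cohere | Strong English retrieval; separate input types for queries and documents |
| BAAI/bge-m3 | BAAI (open) | Multilingual; good for non-English content |
| nomic-embed-text | Nomic (open) | Small, fast, self-hostable |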

Match the embedding model to your content: a monolingual model performs poorly on multilingual corpora.
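Once text is embedded, retrieval is a nearest-neighbour search, most commonly ranked by cosine similarity. The sketch below uses tiny hand-written 3-dimensional vectors as stand-ins for real embeddings, which typically have hundreds to thousands of dimensions. The document IDs and values are invented for illustration. In a real system the query and the documents must be embedded by the same model, since vectors from different models are not comparable.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k document IDs whose vectors are most similar to the query."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values, not produced by any real model).
INDEX = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-error-codes": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    query_vec = [0.8, 0.2, 0.0]  # pretend embedding of "how do I get my money back?"
    for doc_id, score in top_k(query_vec, INDEX):
        print(f"{doc_id}: {score:.3f}")
```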

Orchestration Frameworks

Higher-level abstractions for chains, agents, and RAG pipelines.

LangChain: First widely adopted framework. Comprehensive but complex; known for abstraction leakage. Good for prototyping; many teams write custom code in production.

LlamaIndex: Specialized in retrieval and RAG. Often a better fit than LangChain for search-centric applications.

Vercel AI SDK: Designed for frontend/TypeScript. First-class streaming support. Good choice for Next.js applications.

Direct SDK calls: For many production applications, the right answer is to call provider SDKs directly and implement lightweight orchestration. Frameworks add dependencies and upgrade complexity.
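A lightweight orchestration layer can be as small as a retry wrapper around the provider call. This sketch assumes a hypothetical `TransientError` that your code raises (or maps provider exceptions to) for rate limits and timeouts. The provider call itself is passed in as a zero-argument function, so the wrapper does not depend on any particular SDK. The official OpenAI and Anthropic Python SDKs also expose a `max_retries` client option, so check whether that already covers your needs before adding your own.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as rate limits or timeouts."""


def with_retries(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return call()
        except TransientError:
            if attempt >= max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delays of 0.5s, 1s, 2s, ... each stretched by up to 10% random jitter.
            sleep(base_delay * 2 ** (attempt - 1) * (1 + 0.1 * random.random()))
            attempt += 1


def make_flaky_call(failures: int) -> Callable[[], str]:
    """Stub standing in for a provider SDK call: fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def call() -> str:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("simulated rate limit")
        return "ok"

    return call


if __name__ == "__main__":
    print(with_retries(make_flaky_call(2), sleep=lambda _: None))  # prints "ok"
```

Injecting `sleep` keeps the wrapper testable without real delays. This kind of small, explicit code is easier to debug than a framework's hidden retry logic.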

Integration Standards

Model Context Protocol (MCP): Anthropic's open standard for connecting LLMs to external tools and data sources. It defines a standard client-server interface, so any MCP-compatible application (the client) can use the tools and data exposed by any MCP server. Adoption is growing in editors, IDEs, and enterprise tools.
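Under the hood, MCP messages are JSON-RPC 2.0. A client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool name and its arguments. The sketch below only builds those two messages. It omits the initialization handshake and the transport (stdio or HTTP) that a real client needs, and the `search_tickets` tool is hypothetical. In practice, the official MCP SDKs handle these details.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind an MCP client sends."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it exposes.
LIST_TOOLS = jsonrpc_request("tools/list")

# Invoke one tool by name (tool name and arguments are hypothetical).
CALL_TOOL = jsonrpc_request(
    "tools/call",
    {"name": "search_tickets", "arguments": {"query": "refund", "limit": 5}},
)

if __name__ == "__main__":
    print(LIST_TOOLS)
    print(CALL_TOOL)
```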

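Function calling / tool use API: All major providers offer structured tool-use APIs. Request formats are provider-specific, but each describes a tool's parameters with a JSON Schema-style definition, so tool definitions translate between providers with little effort.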
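To see what provider-specific but equivalent looks like in practice, here is one tool definition translated into two request formats. This is a sketch: the field names follow the OpenAI Chat Completions `tools` parameter and the Anthropic Messages API `tools` parameter as of mid-2025 and may change, and `get_order_status` is a hypothetical tool. The JSON Schema describing the arguments is shared unchanged; only the wrapper differs.

```python
import json
from typing import Any, Dict

# One provider-neutral tool definition (hypothetical tool); arguments use JSON Schema.
GET_ORDER_STATUS: Dict[str, Any] = {
    "name": "get_order_status",
    "description": "Look up the shipping status of an order by ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string", "description": "Order identifier"}},
        "required": ["order_id"],
    },
}


def to_openai_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the definition as an OpenAI Chat Completions `tools` entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters"],
        },
    }


def to_anthropic_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap the definition as an Anthropic Messages API `tools` entry."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


if __name__ == "__main__":
    print(json.dumps(to_openai_tool(GET_ORDER_STATUS), indent=2))
    print(json.dumps(to_anthropic_tool(GET_ORDER_STATUS), indent=2))
```

Keeping a single neutral definition and generating provider formats from it is one way to limit lock-in when switching or mixing providers.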

Inference Infrastructure

When hosting open models or requiring lower latency than direct API calls:

  • Fireworks AI / Together AI: Hosted inference for open models; often cheaper and faster than model-lab APIs
  • Modal / Replicate: Serverless GPU compute for custom inference
  • Amazon Bedrock / Azure OpenAI: Enterprise access to major models inside cloud compliance frameworks
  • Ollama: Local inference for development and air-gapped environments
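For local development with Ollama, a model can be called over its local HTTP API using nothing but the standard library. This sketch assumes an Ollama server on its default port (11434) and a model that has already been pulled. The tag `llama3.1:8b` is an example. It uses the non-streaming `/api/generate` endpoint, which returns the generated text in a `response` field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request; no network I/O happens here."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model pulled, e.g. `ollama pull llama3.1:8b`.
    print(generate("llama3.1:8b", "Summarize what an embedding model does in one sentence."))
```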
PM Takeaway

Avoid framework lock-in in production. LangChain and LlamaIndex are excellent for prototyping. Before deploying, evaluate whether you need the framework or whether direct SDK calls with your own lightweight wrappers would be simpler to maintain and debug. Many production failures in agent systems trace back to framework behavior rather than model behavior.

Further Reading

Anthropic, Anthropic Engineering Blog (2024)
Read: The blog post in full (non-technical). Skip: The technical spec (github.com/modelcontextprotocol) unless you are building integrations.
MCP is Anthropic's open standard for connecting LLMs to external tools and data sources, often described as the "USB-C for AI integrations."