AI for Developers

PM: Read in full — 15 min

The Developer AI Stack

AI is reshaping developers' workflows faster than those of any other technical role. Code generation, review, and explanation tools are now embedded in editors, terminals, and CI/CD pipelines. This page covers the high-signal use cases and where the tools still fall short.

High-Value Use Cases

Code Generation and Completion

AI coding assistants (GitHub Copilot, Claude Code, Cursor) generate code inline as you type or on demand. Highest-value scenarios:

Boilerplate and scaffolding: CRUD endpoints, schema migrations, test fixtures, configuration files — code that follows patterns the model has seen many times
Unfamiliar language or framework: generating idiomatic code in a language you use occasionally, where you know what you want but not the exact syntax
Standard algorithms: sorting, parsing, regex patterns, date handling

Where it degrades: novel business logic, cross-system integrations where context lives in files the model hasn't seen, and tasks requiring understanding of your specific data model.

Measured impact: Research on GitHub Copilot (Peng et al., 2023) found developers completed a controlled programming task 55.8% faster with AI assistance — and separate enterprise deployment data showed PR completion rates up roughly 26% across teams. The productivity gain took approximately 11 weeks to fully materialize as developers adapted their workflows to the tool. Teams using AI assistance also reported 56% higher unit test pass rates on first submission, suggesting the tool improves initial quality, not just speed.

Real-world gains are more modest than controlled-task studies suggest. A 2025 METR randomized study of experienced open-source developers working in their own repositories found that tasks took approximately 19% longer when AI tools were allowed, even though the developers believed the tools had sped them up. That is a slowdown, in sharp contrast to the 55.8% speedup measured on a single controlled task. Aggregate 2026 data puts real-world productivity gains in the 10–30% range for most teams,¹ with 84% of developers reporting they use AI tools regularly and AI accounting for approximately 41% of all code written.² The gap between controlled-study results and production results is consistent: context complexity, review overhead, and non-coding work all reduce the measurable impact.
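One way to see why task-level speedups shrink at the team level is an Amdahl's-law-style calculation: only the share of time spent writing code gets faster. The sketch below is illustrative only. The coding share and the task-level speedup are assumed inputs, not figures from the studies cited here, and `overall_speedup` is a hypothetical helper.

```python
def overall_speedup(coding_share: float, coding_speedup: float) -> float:
    """Overall speedup when only the coding share of the work gets faster.

    coding_share: fraction of total time spent writing code (0..1), assumed.
    coding_speedup: how many times faster coding becomes (2.0 = twice as fast).
    """
    if not 0.0 <= coding_share <= 1.0:
        raise ValueError("coding_share must be between 0 and 1")
    if coding_speedup <= 0:
        raise ValueError("coding_speedup must be positive")
    # Time left after the speedup, as a fraction of the original total time.
    remaining_time = (1.0 - coding_share) + coding_share / coding_speedup
    return 1.0 / remaining_time


if __name__ == "__main__":
    # Hypothetical team: 30% of time is hands-on coding, and AI doubles coding speed.
    gain = overall_speedup(coding_share=0.30, coding_speedup=2.0)
    print(f"Overall speedup: {gain:.2f}x")
```

With these assumed inputs, doubling coding speed yields an overall speedup of roughly 1.18x, because the other 70% of the work (review, meetings, debugging in production) is unchanged.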

Code Explanation and Understanding

LLMs excel at explaining what code does. Paste an unfamiliar function or a gnarly regex and ask for an explanation — the output is usually accurate and faster than tracing execution yourself.

Particularly valuable for: understanding legacy codebases, reviewing open-source library internals, and onboarding to an unfamiliar codebase.

Documentation Generation

Docstring generation is reliable and high-value. The model has seen thousands of well-documented functions and applies that pattern to your code. API documentation and README updates follow the same pattern.

Test Generation

Given a function signature and implementation, models generate unit tests covering common cases and edge cases. Works well for pure functions with well-defined input/output contracts.

Limitations:

Tests for code with complex state or database dependencies often need significant manual correction
The model derives tests from the code it sees: if the implementation is wrong, the generated tests may assert the buggy behavior as correct
Coverage isn't quality; generated tests tend to over-test happy paths and under-test failure modes
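The happy-path bias is easy to see with a small example. The sketch below uses a hypothetical `parse_port` function (not taken from any real codebase) and the standard `unittest` module. The first test is the kind a model generates most readily; the rest are failure-mode tests that a reviewer often has to request or add by hand.

```python
import unittest


def parse_port(value: str) -> int:
    """Parse a TCP port from a string; hypothetical example function."""
    port = int(value.strip())  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTests(unittest.TestCase):
    def test_valid_port(self):
        # Happy path: the case generated tests cover most readily.
        self.assertEqual(parse_port("8080"), 8080)

    def test_surrounding_whitespace(self):
        self.assertEqual(parse_port(" 443\n"), 443)

    def test_non_numeric_input_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("http")

    def test_out_of_range_ports_are_rejected(self):
        for bad in ("0", "65536", "-1"):
            with self.subTest(value=bad):
                with self.assertRaises(ValueError):
                    parse_port(bad)

    def test_empty_string_is_rejected(self):
        with self.assertRaises(ValueError):
            parse_port("")


if __name__ == "__main__":
    unittest.main()
```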

Code Review Assistance

LLMs identify common bug patterns, code smells, and security issues. Useful for:

Catching obvious issues before human review
Identifying OWASP-style vulnerabilities (SQL injection, XSS, path traversal)
Flagging missing error handling or resource cleanup

Not a replacement for human code review on architectural decisions, business logic correctness, or system-wide implications.
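As an illustration of the kind of issue an LLM reviewer tends to flag reliably, the sketch below contrasts string-built SQL with a parameterized query using Python's standard `sqlite3` module. The `users` table and both function names are hypothetical.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is interpolated into the SQL text,
    # so input like "' OR '1'='1" changes the meaning of the query.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the value is bound separately from the SQL text.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

    payload = "' OR '1'='1"
    print(len(find_user_unsafe(conn, payload)))  # every row matches
    print(len(find_user_safe(conn, payload)))    # no row has that literal name
```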

Debugging

Pasting an error message + stack trace + relevant code and asking "what's wrong?" is highly effective. The model has seen many error messages and can identify common root causes quickly.

Particularly useful for: cryptic framework errors, async/concurrency bugs, environment configuration issues.

Workflow Patterns

The diff review pattern: paste your git diff before committing and ask for a sanity check. Fast, low-cost, catches obvious issues.
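A minimal sketch of the diff review pattern as a pre-commit helper. It only gathers the staged diff and builds a prompt; sending the prompt to a model is left out because that step depends on your tool. The prompt wording, the `MAX_DIFF_CHARS` limit, and the function names are assumptions for illustration.

```python
import subprocess

MAX_DIFF_CHARS = 20_000  # assumed budget; tune to your model's context window


def staged_diff() -> str:
    """Return the diff of staged changes, as `git diff --staged` prints it."""
    result = subprocess.run(
        ["git", "diff", "--staged"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def build_review_prompt(diff: str, max_chars: int = MAX_DIFF_CHARS) -> str:
    """Wrap a diff in a short review request, truncating very large diffs."""
    if not diff.strip():
        raise ValueError("nothing staged to review")
    truncated = len(diff) > max_chars
    note = "\n[diff truncated]" if truncated else ""
    return (
        "Review this diff before I commit. List likely bugs, missing error "
        "handling, and security issues. Be brief.\n\n"
        f"{diff[:max_chars]}{note}"
    )


if __name__ == "__main__":
    try:
        print(build_review_prompt(staged_diff()))
    except (subprocess.CalledProcessError, FileNotFoundError, ValueError) as exc:
        print(f"No review prompt built: {exc}")
```

Truncating oversized diffs keeps the request cheap, but it also means the model reviews only part of the change, so large commits are better split before review.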

The rubber duck pattern: describe the problem you're trying to solve before asking for code. The act of writing the description surfaces ambiguity; the model's clarifying questions improve the solution.

The test-first pattern: write the test specification in comments or pseudocode, then ask the model to implement code that passes it. More reliable than asking for code and then tests.
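A small illustration of the test-first pattern. The tests are the specification you write first; the `slugify` implementation underneath stands in for what a model might return, and is a hypothetical example, not output from any particular tool.

```python
import re
import unittest


# Step 1: the specification, written before any implementation exists.
class SlugifySpec(unittest.TestCase):
    def test_lowercases_and_hyphenates(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    def test_collapses_punctuation_and_spaces(self):
        self.assertEqual(slugify("AI,  for   Developers!"), "ai-for-developers")

    def test_empty_input_gives_empty_slug(self):
        self.assertEqual(slugify(""), "")


# Step 2: an implementation requested with a prompt such as
# "write slugify so that SlugifySpec passes".
def slugify(text: str) -> str:
    """Lowercase text and join runs of letters and digits with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


if __name__ == "__main__":
    unittest.main()
```

Because the expected behavior is fixed before generation, a wrong implementation fails visibly instead of being confirmed by tests derived from it.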

The Agentic Shift

By 2026, AI coding assistants had evolved from inline suggestion to autonomous task delegation. GitHub Copilot agent mode (introduced in preview in February 2025), Cursor background agents (which run in isolated VMs), Replit Agent 3 (up to 200 minutes of autonomous runtime), and Google's Antigravity (an agent-first IDE from I/O 2026 that dispatches multiple parallel agents to plan, code, run, and self-verify against the browser) can autonomously write, test, and iterate across files with minimal human input.³⁴

This increases throughput but introduces a new verification challenge: AI-generated code now arrives faster than most teams can review it. The limiting factor is no longer generation speed — it is review and test capacity. Teams adopting agentic coding tools should:

Increase investment in automated testing and code review tooling
Define clear human-in-the-loop checkpoints for security-critical paths
Track code review throughput alongside code generation throughput
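One way to track review throughput against generation throughput is to compare pull requests opened with pull requests reviewed each week. The sketch below assumes you can export weekly counts from your code host. The `WeeklyCounts` record, the sample numbers, and this definition of backlog are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class WeeklyCounts:
    week: str
    prs_opened: int    # generation throughput (human and agent authored)
    prs_reviewed: int  # review throughput


def review_backlog(weeks: list[WeeklyCounts]) -> list[tuple[str, int]]:
    """Running count of opened-but-unreviewed PRs after each week."""
    backlog = 0
    trend = []
    for w in weeks:
        # Backlog grows when more PRs are opened than reviewed; never below zero.
        backlog = max(0, backlog + w.prs_opened - w.prs_reviewed)
        trend.append((w.week, backlog))
    return trend


if __name__ == "__main__":
    sample = [  # made-up numbers for illustration
        WeeklyCounts("W1", prs_opened=40, prs_reviewed=38),
        WeeklyCounts("W2", prs_opened=55, prs_reviewed=41),
        WeeklyCounts("W3", prs_opened=70, prs_reviewed=44),
    ]
    for week, backlog in review_backlog(sample):
        print(week, backlog)
```

A backlog that keeps rising week over week is the signal that generation has outpaced review capacity.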

Limitations

Hallucinated APIs: models confidently generate code using function names, package versions, or API signatures that don't exist. Always verify against documentation.
Context blindness: generated code looks reasonable but may be inconsistent with your actual data models, conventions, or constraints the model hasn't seen.
Security: do not assume generated code is secure. Security review is still required.
Licensing: code generated by AI tools trained on open-source code may have licensing implications in some jurisdictions. Check your organization's policy.
Vibe coding: a pattern where developers accept AI-generated code by feel, iterating on prompts without carefully reading each generated file. The term was coined by Andrej Karpathy in February 2025. Effective for quick prototypes; high-risk for production code, where subtle logic errors accumulate silently.
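For hallucinated APIs in Python specifically, a cheap first check is whether a dotted name actually resolves in the installed environment. A minimal sketch using the standard `importlib` module follows; `api_exists` is a hypothetical helper. Passing this check only shows that the name exists, not that generated code calls it with the right arguments, and note that it imports the module, which runs the module's top-level code.

```python
import importlib


def api_exists(dotted_name: str) -> bool:
    """Return True if 'package.module.attr' resolves in this environment."""
    parts = dotted_name.split(".")
    # Try the longest importable module prefix, then walk the rest as attributes.
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except (ImportError, ValueError):
            continue  # not a module at this prefix length; try a shorter one
        for attr in parts[split:]:
            if not hasattr(obj, attr):
                return False
            obj = getattr(obj, attr)
        return True
    return False


if __name__ == "__main__":
    print(api_exists("json.dumps"))      # real function in the standard library
    print(api_exists("json.to_string"))  # plausible-sounding name that does not exist
```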

PM Takeaway

Developer AI adoption succeeds when it reduces friction on specific, well-scoped tasks (tests, docs, boilerplate) and fails when positioned as a replacement for engineering judgment. The highest ROI is eliminating hand-writing of things the model can reliably produce — not replacing the thinking.

Footnotes

1. Second Talent. (2026). "84% of Developers Use AI Tools. Productivity Gains Are Only 10%." https://www.secondtalent.com/resources/ai-developer-productivity-tools-2026/
2. Index.dev. (2026). "Top 100 Developer Productivity Statistics with AI Tools 2026." https://www.index.dev/blog/developer-productivity-statistics-with-ai-tools
3. Faros AI. (2026). "Best AI Coding Agents for 2026: Real-World Developer Reviews." https://www.faros.ai/blog/best-ai-coding-agents-2026
4. Google Cloud. (2026). "Innovations from Google I/O 26 on Google Cloud." https://cloud.google.com/blog/products/ai-machine-learning/innovations-from-google-io-26-on-google-cloud
