
AI for Developers

PM: Read in full (15 min)

The Developer AI Stack

AI is reshaping developers' workflows faster than those of any other technical role. Code generation, review, and explanation tools are now embedded in editors, terminals, and CI/CD pipelines. This page covers the high-signal use cases and where the tools still fall short.

High-Value Use Cases

Code Generation and Completion

AI coding assistants (GitHub Copilot, Claude Code, Cursor) generate code inline as you type or on demand. Highest-value scenarios:

  • Boilerplate and scaffolding: CRUD endpoints, schema migrations, test fixtures, configuration files, i.e. code that follows patterns the model has seen many times
  • Unfamiliar language or framework: generating idiomatic code in a language you use occasionally, where you know what you want but not the exact syntax
  • Standard algorithms: sorting, parsing, regex patterns, date handling

Where it degrades: novel business logic, cross-system integrations where context lives in files the model hasn't seen, and tasks requiring understanding of your specific data model.

Code Explanation and Understanding

LLMs excel at explaining what code does. Paste an unfamiliar function or a gnarly regex and ask for an explanation; the output is usually accurate, and getting it is faster than tracing execution yourself.

Particularly valuable for: understanding legacy codebases, reviewing open-source library internals, and onboarding to an unfamiliar codebase.
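
As an illustration of what a useful explanation looks like, the sketch below takes a dense regex (a hypothetical example, not taken from any particular codebase) and restates it with `re.VERBOSE` comments, which is roughly the shape a good model explanation takes. Checking that both patterns agree on a few sample inputs is a cheap way to confirm the explanation matches the code.

```python
import re

# A dense pattern of the kind you might paste into a model (hypothetical example).
DENSE = re.compile(r"^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")

# The same pattern, annotated the way an explanation would break it down.
EXPLAINED = re.compile(
    r"""
    ^
    (\d{4})                  # four-digit year
    -
    (0[1-9]|1[0-2])          # month, 01 through 12
    -
    (0[1-9]|[12]\d|3[01])    # day, 01 through 31 (month length is not checked)
    $
    """,
    re.VERBOSE,
)

SAMPLES = ["2024-02-29", "2024-13-01", "2024-02-31", "24-02-01", "2024-00-10"]


def patterns_agree(samples):
    """Return True if both patterns accept and reject exactly the same inputs."""
    return all(bool(DENSE.match(s)) == bool(EXPLAINED.match(s)) for s in samples)
```

The annotated form also surfaces a limitation the dense form hides: the pattern accepts `2024-02-31`, because it validates digit ranges and not calendar dates.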

Documentation Generation

Docstring generation is reliable and high-value. The model has seen thousands of well-documented functions and applies that pattern to your code. API documentation and README updates follow the same pattern.
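
For a sense of the output, here is a minimal sketch of a docstring of the kind a model typically produces. The function `chunked` and the Google-style layout are assumptions chosen for illustration; your project's docstring convention may differ.

```python
def chunked(items: list, size: int) -> list[list]:
    """Split a list into consecutive chunks of at most `size` items.

    Args:
        items: The list to split. It is not modified.
        size: Maximum length of each chunk. Must be at least 1.

    Returns:
        A list of chunks in the original order; the last chunk may be shorter.

    Raises:
        ValueError: If `size` is less than 1.

    Example:
        >>> chunked([1, 2, 3, 4, 5], 2)
        [[1, 2], [3, 4], [5]]
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]
```

Including a doctest-style example gives you a mechanical check that the generated description matches the behavior: running `python -m doctest` on the file executes it.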

Test Generation

Given a function signature and implementation, models generate unit tests covering common cases and edge cases. Works well for pure functions with well-defined input/output contracts.

Limitations:

  • Tests for code with complex state or database dependencies often need significant manual correction
  • The model generates tests that match what it sees; if the implementation is wrong, the tests may encode the same bug
  • Coverage isn't quality; generated tests tend to over-test happy paths and under-test failure modes
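
The gap between happy-path and failure-mode coverage can be sketched as follows. The function `parse_port` is a hypothetical pure function chosen for illustration; the first test is the kind a generated suite usually includes, and the remaining ones are the kind worth checking for and adding by hand.

```python
import unittest


def parse_port(value: str) -> int:
    """Parse a TCP port number from text; raise ValueError if it is invalid."""
    port = int(value.strip())  # int() raises ValueError on non-numeric text
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTests(unittest.TestCase):
    # Happy path: generated tests usually cover this well.
    def test_valid_port(self):
        self.assertEqual(parse_port("8080"), 8080)
        self.assertEqual(parse_port(" 443 "), 443)

    # Boundaries: the smallest and largest accepted values.
    def test_boundaries(self):
        self.assertEqual(parse_port("1"), 1)
        self.assertEqual(parse_port("65535"), 65535)

    # Failure modes: often missing from a generated suite.
    def test_rejects_out_of_range(self):
        for bad in ("0", "65536", "-1"):
            with self.subTest(value=bad):
                with self.assertRaises(ValueError):
                    parse_port(bad)

    def test_rejects_non_numeric(self):
        for bad in ("", "http", "80.0"):
            with self.subTest(value=bad):
                with self.assertRaises(ValueError):
                    parse_port(bad)
```

Run it with `python -m unittest` from the directory containing the file.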

Code Review Assistance

LLMs identify common bug patterns, code smells, and security issues. Useful for:

  • Catching obvious issues before human review
  • Identifying OWASP-style vulnerabilities (SQL injection, XSS, path traversal)
  • Flagging missing error handling or resource cleanup

Not a replacement for human code review on architectural decisions, business logic correctness, or system-wide implications.
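
As a concrete instance of the vulnerability class mentioned above, the following sketch shows a SQL injection and its usual fix, using Python's standard `sqlite3` module with an in-memory database. The table, function names, and payload are assumptions made up for the example.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: user input is interpolated directly into the SQL text.
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the value is passed separately and never parsed as SQL.
    query = "SELECT name, role FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Classic payload: closes the string literal and adds an always-true condition.
PAYLOAD = "x' OR '1'='1"
```

With this payload, `find_user_unsafe` returns every row in the table, while `find_user_safe` returns no rows because no user has that literal name. String interpolation into a query is the pattern a review assistant is likely to flag.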

Debugging

Pasting an error message + stack trace + relevant code and asking "what's wrong?" is highly effective. The model has seen many error messages and can identify common root causes quickly.

Particularly useful for: cryptic framework errors, async/concurrency bugs, environment configuration issues.
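
One way to assemble that input consistently is sketched below. The helper `build_debug_prompt`, the prompt wording, and the failing `average` function are all assumptions for illustration; only the standard `traceback` module is used.

```python
import traceback


def build_debug_prompt(exc: BaseException, source: str) -> str:
    """Combine an exception's stack trace with the relevant source code."""
    trace = "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
    return (
        "What is wrong with this code? Identify the likely root cause.\n\n"
        f"Error and stack trace:\n{trace}\n"
        f"Relevant code:\n{source}"
    )


# Hypothetical failing function and its source text.
SNIPPET = "def average(values):\n    return sum(values) / len(values)\n"


def average(values):
    return sum(values) / len(values)


try:
    average([])  # fails: division by zero on an empty list
except ZeroDivisionError as exc:
    prompt = build_debug_prompt(exc, SNIPPET)
```

Sending the full trace rather than only the final error line gives the model the call path, which is often what distinguishes one root cause from another.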

Workflow Patterns

The diff review pattern: paste your git diff before committing and ask for a sanity check. Fast, low-cost, catches obvious issues.
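
A minimal sketch of the diff review pattern, assuming you send the resulting prompt to whichever assistant you use. The function names, the prompt wording, and the 20,000-character cap are illustrative assumptions; `git diff --cached` is the standard command for showing staged changes.

```python
import subprocess

REVIEW_INSTRUCTIONS = (
    "Review this diff before I commit. Flag bugs, missing error handling, "
    "and anything that looks unintentional. Be brief."
)


def staged_diff() -> str:
    """Return the staged changes as text (requires git and a repository)."""
    result = subprocess.run(
        ["git", "diff", "--cached"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def build_review_prompt(diff_text: str, max_chars: int = 20_000) -> str:
    """Wrap a diff in review instructions, truncating very large diffs."""
    if not diff_text.strip():
        raise ValueError("no staged changes to review")
    note = "\n[diff truncated]" if len(diff_text) > max_chars else ""
    return f"{REVIEW_INSTRUCTIONS}\n\n{diff_text[:max_chars]}{note}"
```

Typical use is `build_review_prompt(staged_diff())` just before committing.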

The rubber duck pattern: describe the problem you're trying to solve before asking for code. The act of writing the description surfaces ambiguity; the model's clarifying questions improve the solution.

The test-first pattern: write the test specification in comments or pseudocode, then ask the model to implement code that passes it. More reliable than asking for code and then tests.
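
The test-first pattern can be sketched like this. The `slugify` function and its specification are hypothetical: the spec is what you write first, and the implementation stands in for what you would ask the model to produce and then check against the spec.

```python
import re

# Step 1: the specification, written before any implementation exists.
# Each entry is (input, expected output).
SPEC = [
    ("Hello, World!", "hello-world"),
    ("  many   spaces  ", "many-spaces"),
    ("Already-slugged", "already-slugged"),
    ("", ""),
]


# Step 2: an implementation of the kind you would request from the model.
def slugify(text: str) -> str:
    """Lowercase the text and join runs of letters or digits with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words)


# Step 3: check the implementation against the spec.
def failing_cases(func, spec):
    """Return (input, expected, actual) for every spec entry that fails."""
    return [
        (given, expected, func(given))
        for given, expected in spec
        if func(given) != expected
    ]
```

An empty result from `failing_cases(slugify, SPEC)` means the implementation satisfies the spec; any entries returned can be pasted back to the model as concrete counterexamples.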

Limitations

  • Hallucinated APIs: models confidently generate code using function names, package versions, or API signatures that don't exist. Always verify against documentation.
  • Context blindness: generated code looks reasonable but may be inconsistent with your actual data models, conventions, or constraints the model hasn't seen.
  • Security: do not assume generated code is secure. Security review is still required.
  • Licensing: code generated by AI tools trained on open-source code may have licensing implications in some jurisdictions. Check your organization's policy.
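
For the hallucinated-API case, a quick existence check can run before you read the documentation. The helper below is a minimal sketch with a hypothetical name, `api_exists`; it only confirms that a module imports and that an attribute path resolves in the installed version, so it says nothing about signatures or behavior.

```python
import importlib


def api_exists(module_name: str, attr_path: str) -> bool:
    """Return True if the module imports and the dotted attribute path resolves."""
    try:
        obj = importlib.import_module(module_name)
    except ImportError:
        return False  # module is not installed or does not exist
    for part in attr_path.split("."):
        if not hasattr(obj, part):
            return False
        obj = getattr(obj, part)
    return True
```

For example, `api_exists("json", "dumps")` is true, while a plausible-sounding but nonexistent name such as `api_exists("json", "dump_pretty")` is false.
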
PM Takeaway

Developer AI adoption succeeds when it reduces friction on specific, well-scoped tasks (tests, docs, boilerplate) and fails when positioned as a replacement for engineering judgment. The highest ROI is eliminating hand-writing of things the model can reliably produce, not replacing the thinking.