Skip to main content

Prompt Engineering Concepts

PM: Read in full โ€” 20 min

Prompting Is Software Engineeringโ€‹

Prompts are the primary interface between your product and the model. A poorly structured prompt that "works in testing" will fail in production on edge cases. A well-structured prompt is explicit, testable, and maintainable.

System Prompts and User Turnsโ€‹

Modern chat models use a structured conversation format:

  • System prompt: Set by your application, not visible to end users. Establishes persona, constraints, output format, and context. Processed first; high influence on all subsequent behavior.
  • User turn: The user's message.
  • Assistant turn: The model's response, also included in multi-turn history.

The system prompt is your lever for consistent application behavior. Everything that should be consistent across all users belongs in the system prompt.

Few-Shot Promptingโ€‹

Providing examples of the desired input/output pattern in the prompt. This is often the most reliable way to specify complex formatting or nuanced classification.

Zero-shot: instruction only, no examples.
Few-shot: 3โ€“10 examples inline.

Examples override ambiguous instructions. If your instruction says "respond in JSON" but your examples show inconsistent structures, the model follows the examples. Make examples canonical.

Chain-of-Thought Promptingโ€‹

Asking the model to show its reasoning before giving a final answer dramatically improves performance on multi-step problems.

The key finding from Wei et al. (2022): adding "Let's think step by step" measurably improves accuracy on math and reasoning tasks. The model that explains its reasoning is more likely to get the answer right than the same model jumping directly to an answer.

Practical applications:

  • Before asking for a recommendation, ask for an analysis first
  • Before asking for a classification, ask for the criteria used
  • In agentic systems, require the model to plan before executing

The cost: chain-of-thought increases output token count, raising cost and latency.

Structured Outputsโ€‹

Asking the model to respond in a specific format โ€” JSON, Markdown table, XML โ€” improves downstream parsing reliability. Most providers now support JSON mode or structured output mode guaranteeing valid parseable JSON.

Best practices:

  • Always include an example of the exact structure you want, not just a description
  • When using JSON mode, also include the schema in the prompt โ€” the model doesn't see your code
  • Put format specification last in the system prompt, immediately before the boundary with the user turn

Temperature and Samplingโ€‹

Temperature controls the randomness of token selection:

  • Temperature 0: always picks the highest-probability token. Deterministic, conservative, slightly repetitive.
  • Temperature 0.7 (common default): balanced โ€” coherent with controlled variation.
  • Temperature 1.5+: creative and surprising; high risk of incoherence.

For accuracy-critical tasks (classification, extraction, QA): temperature 0 or near it.
For generative tasks (writing, brainstorming): 0.7โ€“1.0.
Never use high temperature for tasks with a single correct answer.

Context Managementโ€‹

Every token in the context window costs money and attention. Efficient prompts:

  • Put important instructions first and last: models attend more to the beginning and end of prompts
  • Trim conversation history for long multi-turn interactions: keep the last N turns
  • Summarize rather than truncate: preserve meaning when dropping old context
  • Use explicit delimiters: <documents>โ€ฆ</documents> and <instructions>โ€ฆ</instructions> help the model distinguish data from instructions

Prompt Injectionโ€‹

When users can influence the prompt โ€” especially in agentic systems โ€” prompt injection is a risk: a malicious input that attempts to override your instructions.

Mitigations:

  • Delimit user-controlled content with XML tags and instruct the model that content inside is data, not instructions
  • For high-stakes agentic actions, require explicit confirmation before execution
  • Validate that model output doesn't reference instructions it shouldn't know about
PM Takeaway

Treat prompts as first-class software artifacts: version-control them, evaluate them with a test set, and measure quality before and after changes. A prompt change has the same blast radius as a code change in a user-facing path.

Further Readingโ€‹

Wei, J. et al. โ€” NeurIPS 2022 (2022)
Read:Abstract, Figure 1, Figure 2.Skip:Sections 3โ€“5 (experiments, results tables).
Simply prompting the model to "think step by step" dramatically improves performance on multi-step reasoning โ€” the foundation for modern reasoning models.
Kojima, T. et al. โ€” NeurIPS 2022 (2022)
Read:Abstract.Skip:Everything else.
"Let's think step by step" โ€” a single phrase added to any prompt โ€” unlocks multi-step reasoning without any task-specific examples.