Prompt Engineering Concepts

PM: Read in full — 20 min

Prompting Is Software Engineering

Prompts are the primary interface between your product and the model. A poorly structured prompt that "works in testing" will fail in production on edge cases. A well-structured prompt is explicit, testable, and maintainable.

System Prompts and User Turns

Modern chat models use a structured conversation format:

System prompt: Set by your application, not visible to end users. Establishes persona, constraints, output format, and context. Processed first; high influence on all subsequent behavior.
User turn: The user's message.
Assistant turn: The model's response, also included in multi-turn history.

The system prompt is your lever for consistent application behavior. Everything that should be consistent across all users belongs in the system prompt.

Few-Shot Prompting

Providing examples of the desired input/output pattern in the prompt. This is often the most reliable way to specify complex formatting or nuanced classification.

Zero-shot: instruction only, no examples.
Few-shot: 3–10 examples inline.

Examples override ambiguous instructions. If your instruction says "respond in JSON" but your examples show inconsistent structures, the model follows the examples. Make examples canonical.

Chain-of-Thought Prompting

Asking the model to show its reasoning before giving a final answer dramatically improves performance on multi-step problems.

The key finding from Wei et al. (2022): adding "Let's think step by step" measurably improves accuracy on math and reasoning tasks. The model that explains its reasoning is more likely to get the answer right than the same model jumping directly to an answer.

Practical applications:

Before asking for a recommendation, ask for an analysis first
Before asking for a classification, ask for the criteria used
In agentic systems, require the model to plan before executing

The cost: chain-of-thought increases output token count, raising cost and latency.

What the research shows: Wei et al. (2022) found that chain-of-thought improves accuracy substantially on tasks with genuine multi-step structure — math, logical deduction, multi-hop reasoning — with gains that scale with model size and task difficulty. For tasks without multi-step structure (factual recall, simple classification), CoT provides no meaningful benefit and occasionally hurts by introducing distracting intermediate reasoning. Apply it selectively: enable CoT for tasks that actually require it, not uniformly.

Structured Outputs

Asking the model to respond in a specific format — JSON, Markdown table, XML — improves downstream parsing reliability. Most providers now support JSON mode or structured output mode guaranteeing valid parseable JSON.

Best practices:

Always include an example of the exact structure you want, not just a description
When using JSON mode, also include the schema in the prompt — the model doesn't see your code
Put format specification last in the system prompt, immediately before the boundary with the user turn

Temperature and Sampling

Temperature controls the randomness of token selection:

Temperature 0: always picks the highest-probability token. Deterministic, conservative, slightly repetitive.
Temperature 0.7 (common default): balanced — coherent with controlled variation.
Temperature 1.5+: creative and surprising; high risk of incoherence.

For accuracy-critical tasks (classification, extraction, QA): temperature 0 or near it.
For generative tasks (writing, brainstorming): 0.7–1.0.
Never use high temperature for tasks with a single correct answer.

Context Management

Every token in the context window costs money and attention. Efficient prompts:

Put important instructions first and last: models attend more to the beginning and end of prompts
Trim conversation history for long multi-turn interactions: keep the last N turns
Summarize rather than truncate: preserve meaning when dropping old context
Use explicit delimiters: <documents>…</documents> and <instructions>…</instructions> help the model distinguish data from instructions

Prompt Injection

When users can influence the prompt — especially in agentic systems — prompt injection is a risk: a malicious input that attempts to override your instructions.

Mitigations:

Delimit user-controlled content with XML tags and instruct the model that content inside is data, not instructions
For high-stakes agentic actions, require explicit confirmation before execution
Validate that model output doesn't reference instructions it shouldn't know about

PM Takeaway

Treat prompts as first-class software artifacts: version-control them, evaluate them with a test set, and measure quality before and after changes. A prompt change has the same blast radius as a code change in a user-facing path.

Prompting Is Software Engineering​

System Prompts and User Turns​

Few-Shot Prompting​

Chain-of-Thought Prompting​

Structured Outputs​

Temperature and Sampling​

Context Management​

Prompt Injection​

Further Reading​