
Skill Design: Give Your Agents SOPs, Not Just Prompts

Build Skills, not prompts!

Tom W.
Scout A. Team

How to encode reusable capabilities that compose into workflows.

Most of us have been there. You craft the perfect prompt, get great output, and save it somewhere — a notes file, a Notion page, a document called "good prompts." Next time you need that task, you copy-paste. It works. You feel efficient.

Then you want something slightly different. You tweak the prompt. It works okay. You save that version too.

Six months later, you have seventeen variations of the same prompt. Some are outdated. Most are redundant. All are fragile — change one word and the output breaks.

This isn't skill. It's prompt hoarding.

A skill is a reusable capability encoded for an agent. Not a prompt you copy. A defined, documented, versioned capability with clear scope, explicit instructions, examples, constraints, and quality criteria.

The difference between a prompt and a skill is the difference between a script and a Standard Operating Procedure. A script says "Do X, Y, Z." An SOP says "Here's the goal, here's the process, here's how you know you're done, here's what you don't do."

AWS calls them "Agent SOPs" — natural language workflows that encode how tasks should be done. Their framework positions SOPs as a middle ground between code-defined workflows (maximum control, minimum flexibility) and model-driven agents (maximum flexibility, minimum control).

The right altitude is between two failures. Too specific: hardcoded logic that breaks when circumstances change. Too vague: high-level guidance that assumes context the model doesn't have. Skills operate at this middle altitude — specific enough to guide behavior, flexible enough to handle variations.

Let's look at a real example: Code Review.

Bad: Just a prompt.

"Review this code for bugs and tell me what's wrong."

Problems: "Bugs" is undefined. "Review" has no scope. No output format specified. No quality criteria. No constraints. Every time you use it, you get different output. No learning. No consistency.

Good: A skill definition.
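The original definition isn't reproduced here, but based on the components this article describes (scope, instructions, examples, constraints, quality criteria), a sketch of one possible structure might look like the following. The class layout and field names are illustrative, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """A reusable capability: scope, process, examples, constraints, quality bar."""
    name: str
    scope: str                   # what the skill does and does not cover
    instructions: list[str]      # explicit, numbered process steps
    examples: list[str]          # samples of good output
    constraints: list[str]       # things the agent must never do
    quality_criteria: list[str]  # self-check before the agent finishes

code_review = Skill(
    name="Code Review",
    scope=("Identify bugs and security issues and suggest fixes. "
           "Does NOT architect new solutions or review test coverage."),
    instructions=[
        "1. Read the diff and note every potential bug or security issue.",
        "2. For each finding, record file:line, severity, and a concrete fix.",
        "3. Order findings by severity, highest first.",
    ],
    examples=[
        "Possible null dereference | api/client.py:42 | HIGH | guard with a None check.",
    ],
    constraints=[
        "Never rewrite the code wholesale.",
        "Never suggest architectural changes.",
        "Never mark an issue as critical without evidence.",
    ],
    quality_criteria=[
        "Every issue has a file:line location.",
        "Every issue has a suggested fix.",
        "Every severity rating is justified.",
    ],
)

def to_prompt(skill: Skill) -> str:
    """Render the skill definition as a system prompt for an agent."""
    return "\n".join([
        f"# Skill: {skill.name}",
        f"Scope: {skill.scope}",
        "Process:", *skill.instructions,
        "Good output looks like:", *skill.examples,
        "Never:", *(f"- {c}" for c in skill.constraints),
        "Before finishing, verify:", *(f"- {q}" for q in skill.quality_criteria),
    ])
```

The same structure works for any skill; only the field contents change.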

Now every code review follows the same process. Results are consistent. You can improve the skill over time by adjusting instructions, adding examples, or refining constraints.

The real power of skills shows up when you combine them.

Research Competitor + Summarize Findings + Extract Key Points + Format as Report

If each skill is well-defined, you can chain them: "Research our top three competitors. Summarize findings for each. Extract three key insights per competitor. Format as a comparison report."

Each skill knows what it receives, what it produces, and what quality means. This isn't magic. It's engineering.
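One way to wire such a chain can be sketched in Python. Each function below stands in for an agent call guided by the corresponding skill definition; the bodies are placeholders, and all names are illustrative:

```python
# Each skill is a function from input to output. In a real agent system,
# each body would invoke the agent with that skill's full definition.
def research_competitor(name: str) -> str:
    return f"raw notes on {name}"  # placeholder for an agent call

def summarize_findings(notes: str) -> str:
    return f"summary of ({notes})"

def extract_key_points(summary: str) -> str:
    return f"3 key insights from ({summary})"

def format_as_report(sections: list[str]) -> str:
    return "COMPARISON REPORT\n" + "\n".join(f"- {s}" for s in sections)

def competitor_report(competitors: list[str]) -> str:
    """Chain the skills: each step consumes the previous step's output."""
    per_competitor = [
        extract_key_points(summarize_findings(research_competitor(c)))
        for c in competitors
    ]
    return format_as_report(per_competitor)

report = competitor_report(["Acme", "Globex", "Initech"])
```

Because each skill has a defined input and output, the chain is just function composition.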

Generic agents come with generic skills. That's intentional. They're useful starting points, but they're not yours.

A generic agent has a "Summarize" skill. But what does "summarize" mean for your team?

Your industry has specific terminology. Your stakeholders have specific concerns. Your format follows specific conventions. Your audience has specific knowledge levels.

To get real value, you refine:

Generic: "Summarize this meeting"

Refined: "Summarize this meeting in our company's format: three sections (key decisions, action items with owners, open questions), bullets only, maximum 150 words per section, highlight blockers in red, format for Slack"

Now the skill is aligned with your workflow. It's not generic anymore. It's yours.
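Once refined, that specification can live in one place and be filled in per meeting rather than re-typed. A minimal sketch, with the format details lifted from the refined prompt above and everything else assumed:

```python
# Reusable template: the refined skill lives here once, not in a notes file.
MEETING_SUMMARY_SKILL = """\
Summarize this meeting in our company's format:
- Three sections: Key Decisions, Action Items (with owners), Open Questions
- Bullets only, maximum 150 words per section
- Highlight blockers in red
- Format for Slack

Transcript:
{transcript}
"""

def meeting_summary_prompt(transcript: str) -> str:
    """Fill the template for one meeting instead of re-typing the refined prompt."""
    return MEETING_SUMMARY_SKILL.format(transcript=transcript)
```

Changing the team's format now means editing one template, not seventeen saved prompts.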


To design a skill, start by identifying repeating tasks: weekly reports, daily summaries, recurring analyses, standard reviews. These are skill candidates.

Define the scope: what does this skill do and what doesn't it do? A Code Review skill identifies bugs and security issues, checks for common patterns, suggests fixes. It doesn't architect new solutions, review test coverage, or explain design decisions from scratch.

Document the process as numbered instructions. Be explicit. If you skip a step in documentation, the agent will skip it in execution.

Provide examples of good output. For code review, each finding should have: issue description, file:line location, severity, and a suggestion.

Set constraints: what should the agent never do? For Code Review: never rewrite code, never suggest architectural changes, never mark issues as critical without evidence. These constraints prevent scope creep.

Add quality criteria: how does the agent know when it's done well? "Before submitting, verify every issue has a location, every issue has a suggestion, severities are justified." The agent can now self-check.
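That self-check can even be made mechanical. Here is one way to validate review findings against the three criteria above; the finding fields (`location`, `suggestion`, `severity`, `evidence`) are assumptions, not a fixed schema:

```python
def passes_quality_check(findings: list[dict]) -> tuple[bool, list[str]]:
    """Verify each finding has a location, a suggestion, and a justified severity."""
    problems: list[str] = []
    for i, finding in enumerate(findings):
        if not finding.get("location"):
            problems.append(f"finding {i}: missing file:line location")
        if not finding.get("suggestion"):
            problems.append(f"finding {i}: missing suggested fix")
        if finding.get("severity") == "critical" and not finding.get("evidence"):
            problems.append(f"finding {i}: critical severity without evidence")
    return (len(problems) == 0, problems)

ok, problems = passes_quality_check([
    {"location": "api/client.py:42", "suggestion": "guard against None",
     "severity": "high"},
    {"location": "db/pool.py:7", "suggestion": "", "severity": "critical"},
])
# the second finding fails twice: no suggestion, and critical without evidence
```

An agent can run a check like this before submitting, or a human can run it on the agent's output.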

Once you've designed several skills, you have a library. A Code Review skill. A Summary skill. A Research skill. A Formatting skill.

When you need a workflow, you compose from the library: "Research [topic] → Summarize findings → Format for report → Review against style guide."

Each step pulls from your defined skills. Each skill knows its scope, constraints, and quality criteria.

This is how you scale AI usage: not by writing new prompts for every task, but by composing existing skills into new workflows.
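A skill library plus a workflow composer can be sketched as a name-to-skill registry. The lambdas below are placeholders for agent calls, and every name is illustrative:

```python
from typing import Callable

# The library maps skill names to callables; each callable would wrap an
# agent invocation guided by that skill's full definition.
SKILL_LIBRARY: dict[str, Callable[[str], str]] = {
    "research":    lambda topic: f"notes({topic})",
    "summarize":   lambda text: f"summary({text})",
    "format":      lambda text: f"report({text})",
    "style_check": lambda text: f"style_ok({text})",
}

def run_workflow(steps: list[str], initial_input: str) -> str:
    """Compose a workflow by piping each skill's output into the next."""
    value = initial_input
    for step in steps:
        value = SKILL_LIBRARY[step](value)
    return value

result = run_workflow(
    ["research", "summarize", "format", "style_check"], "pricing trends"
)
```

New workflows are just new step lists; no new prompts are written.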

Skill design maturity tends to progress through stages.

Novice: Uses generic prompts. Copies prompts without modification. Gets inconsistent results. Doesn't document process.

Developing: Modifies existing prompts. Creates basic skill definitions. Some documentation. Inconsistent quality.

Proficient: Designs new skills from scratch. Documents clear instructions and constraints. Reuses skills across tasks. Composes skills into workflows. Iterates based on output quality.

Expert: Creates skill libraries. Optimizes skills over time. Shares skills across teams. Defines new skills from patterns. Composes workflows automatically.

Take a task you do repeatedly. Write down what you currently ask the AI to do. Define the scope: what it does and doesn't cover. Break it into numbered steps. Add two or three examples of good output. List three constraints: what should never happen. Add quality criteria: when is it done well?

You've just written your first skill definition.

Prompting writes instructions for one task at a time. Skill design encodes instructions for a class of tasks, forever. The ROI compounds: the first use takes the same effort as a prompt; the second use is faster; by the tenth use you've identified edge cases and refined constraints; by the hundredth, the skill is battle-tested, shareable, and improvable.

Skills are infrastructure for AI work. Build them once. Use them always.

