Claude Code Skill Development: The 30K-Word Field Guide

Six months of breakage, 40+ production-grade Skills, and a 78,000-word internal spec compressed into one 30,000-word read. Beginner to designing your own toolchain — start here.

Five-layer Claude Code Skill architecture diagram from SKILL.md core to engineering documents, illustrating how scripts, prompts, data, and resources stack into a production workflow


You're looking for the most complete Claude Code Skill development guide on the open web?

This is it.


What's inside

30,000 words · 18 chapters · 5 modules · 1 full hands-on build

| Module | What you'll walk away with |
|---|---|
| Core concepts | What a Skill actually is, how SKILL.md works, the 200K-token survival rules, step-document orchestration |
| Execution layer | Why scripts cost zero context, the 6 ingredients of a working Prompt template, variable plumbing |
| Data layer | Run-directory isolation, checkpoint resume, the 3-layer config model, Schema-driven dynamic forms |
| Resource layer | Credential safety, killing magic strings, presets, HTML output templates |
| Engineering | setup / guide / changelog / troubleshoot — the 4 documents that make a Skill maintainable and shareable |

By the end you'll be able to:

  • Build your first runnable Skill from scratch in 20 minutes
  • Replace SubAgents with scripts and cut token consumption by ~80%
  • Wire up checkpoint resume so a compacted context doesn't lose your work
  • Turn private workflow knowledge into reusable software assets

Things people are actually building with this:

  • Content production: long-form blog posts, short-form social posts, newsletter drafts, all from one trigger
  • Data wrangling: PDF → Markdown at scale, video transcript extraction, multi-source aggregation
  • Auto-publishing: image generation, multi-platform fan-out, scheduled scrapers
  • Research: competitor briefings, deep-dive equity reports, Reddit signal mining
  • Dev assist: code review, PR drafting, doc generation

I've used this same spec to build 40+ Skills covering content, research, dev, and ops. This is not a beginner walkthrough. This is the production methodology.


How to read this

Who this is for: people who already use Claude Code's basics (chat, file operations, command execution) and want to design their own workflow Skills.

What you'll get: the complete design logic behind my Skill spec, plus the muscle memory to ship production Skills on your own — directory layout, script discipline, context budgeting, resume mechanics.

What you should already know: basic Claude Code use, comfort in a terminal, JSON (a structured data format) and Markdown (a lightweight markup language) syntax. Coding skill is not required, but knowing some Python will make a few sections smoother.

How this is different from other tutorials: most Claude Code guides teach you "how to use it." This one teaches you "how to build with it." You stop being an AI consumer and start being an AI workflow designer.

One thing that might rewire your reading: this article isn't only written for you — it's also written for your Agent. Drop the whole thing into Claude Code and it can use the architecture, naming rules, and script conventions inside as a direct reference for scaffolding your next Skill. In other words, this article is itself an Agent starter kit. For an even fuller manual, the unabridged 78,000-word internal spec is what I feed Claude when I want it to operate at full production quality.

Most people use AI like a chat box — type something in, get something out, react with "wow" or "meh."

A smaller group thinks differently. They want AI to deliver against their standard, their process, their rhythm. Reliably. Repeatedly.

If that second group sounds like you, welcome.

Skill architecture at a glance

Before the chapters, here's the global picture. The Skill spec I use is a five-layer architecture. Each layer solves a different problem:

| Layer | Components | Purpose |
|---|---|---|
| Layer 5 — Engineering | setup.md · guide.md · changelog.md · troubleshoot.md | Make it usable and maintainable by other people |
| Layer 4 — Resources | credentials/ · definitions/ · presets/ · templates/ | Safe, consistent, configurable |
| Layer 3 — Data | runs/ · state/ · config/ · params.schema.json | Where data lives, how a Skill recovers from a crash |
| Layer 2 — Execution | scripts/ · prompts/ · variable placeholders | Scripts do the labor, Prompts conduct the brainwork |
| Layer 1 — Core | SKILL.md · workflow/ · platform constraints | The skeleton: what a Skill is |

Bottom up: skeleton → muscle → blood → wardrobe → quality control. Each layer rests on the one below, but you don't need all five to start. The simplest Skill is just one file in Layer 1.


Skill development guide overview — isometric five-layer stack showing Core, Execution, Data, Resource, and Engineering

Table of contents

Part 1 — Core concepts (mandatory)

  1. What is a Skill?
  2. SKILL.md — the entry file
  3. Platform constraints
  4. Step documents

Part 2 — Execution layer (where the work happens)
5. Script spec
6. Prompt templates
7. Variable placeholders

Part 3 — Data layer (the foundation under runtime)
8. Run-data spec
9. Parameter config spec
10. Parameter Schema spec

Part 4 — Resource layer (the polish)
11. Credential management
12. Constant definitions
13. Preset configs
14. HTML templates

Part 5 — Engineering (the craft)
15. setup.md
16. guide.md
17. changelog.md
18. troubleshoot.md

Hands-on: build your first Skill from scratch
Appendix
Closing notes


Part 1 — Core concepts (mandatory)

Core concepts layer — isometric foundation slab with SKILL.md, constraints, step docs, and definition pillars

This part is the foundation of the whole guide. No matter what kind of Skill you set out to build, these four chapters are non-negotiable. Like building a house — fancy finishes don't matter if the foundation cracks.


1. What is a Skill? — Where the whole system starts

1.1 What the spec says

You know Claude Code? It's the CLI AI tool from Anthropic — you talk to it in your terminal and it writes code, runs analyses, and edits files for you.

A Skill is Claude's "skill plug-in" — you write a structured set of instructions, and Claude follows your rules to complete a specific kind of task. Skills don't only run inside Claude Code; they also work on Claude.ai (the web app) and through the API. Anthropic positions them as a cross-platform open standard.

Here's an analogy. If Claude Code is a new intern, then a Skill is the standard operating procedure you write for that intern. Good SOP, the intern ships work on their own. Bad SOP, the intern freezes.

Anthropic's official definition is sharper:

Skills are portable instruction sets that extend what Claude can do. Think of them as "recipes" — structured knowledge that Claude can follow to perform specific tasks consistently and well.

The Complete Guide to Building Skills for Claude, Anthropic, 2026

Every Skill is a workflow. They just differ in step count.

The simplest Skill is one file with one step.

A complex Skill can be a dozen steps deep, calling scripts, spinning up SubAgents (think of a SubAgent as Claude's intern's intern — a smaller AI that takes a chunk of the work off the main one), generating reports, completing an entire pipeline end to end.

What can a Skill actually do?

| Category | Examples |
|---|---|
| Content creation | Auto-write long-form posts, generate slide decks, translate articles |
| Social ops | Scrape platform data, draft posts, batch-generate notes |
| Dev assist | Code review, SEO audit, build automation |
| Data work | Bulk collection, scoring/analysis, format conversion |

1.2 Why it's designed this way

Why document-driven instead of code-driven?

In traditional development, you write code that tells a program what to do. Claude is a different beast — it understands natural language. You don't need to write code to direct it. You need to write a clear instruction document.

Document-driven design buys you four things:

  1. Zero coding floor — you can ship a Skill without writing code, as long as you can write down what should happen
  2. Maximum readability — anyone who opens the doc understands what the Skill does
  3. Easy to maintain — changing behavior means editing prose, not recompiling and redeploying
  4. Progressive enhancement — start with one file, add scripts, configs, and templates only when you need them

Anthropic boils the design philosophy down to three principles:

  1. Progressive Disclosure — load detail in tiers; Claude only reads the deeper material when it's needed
  2. Composability — every Skill runs on its own, but composes with other Skills and with MCP (Model Context Protocol — the standard interface that lets AI call external tools)
  3. Portability — works the same on Claude.ai, Claude Code, and the API

My spec sits on top of those three and extends them with script discipline, a data layer, and a resource layer — basically the engineering bits.

Why bother with a spec at all?

Building Skills without a spec is like building houses without a building code. Every Skill ends up with a different layout, nobody can read each other's work, and when something breaks, nobody knows where to look.

My spec answers three core questions:

| Question | The spec's answer |
|---|---|
| Where does a file go? | Fixed directory templates |
| How is each file written? | Standard templates and required fields per file type |
| How are files chained? | A workflow table that defines step order and data flow |

2. SKILL.md — the entry file

SKILL.md entry file — isometric file card with 16 frontmatter field badges and 500-line ceiling ruler

2.1 What the spec says

SKILL.md is the only entry file for any Skill. Claude Code uses it to discover and load your Skill.

Think of SKILL.md as the cover and table of contents of a book:

  • Cover (the Frontmatter — the metadata block at the top of the file): tells the system "what I'm called and what I can do"
  • Table of contents (the workflow table): tells Claude "in what order to execute which steps"
  • Body (everything below): tells Claude "when to trigger me, how to behave, what to reference"

Avoid time-bound instructions: Anthropic's best-practice docs explicitly warn against writing things like "if before this date, use the old method." Write the current method in the body and tuck legacy modes into a collapsed block. Otherwise the doc rots into misinformation.

The four-part naming convention

Every Skill in my system gets a four-part name: prefix-domain-object-action.

Take awp-social-xhs-creating:

| Part | Value | Meaning |
|---|---|---|
| prefix | awp | Fixed identifier (you'd swap in your own) |
| domain | social | Social domain |
| object | xhs | The platform being acted on |
| action | creating | The verb |

A few more examples:

| Name | Meaning |
|---|---|
| awp-content-ppt-generating | content domain + slide deck + generate |
| awp-dev-feature-designing | dev domain + feature + design |

I currently use 13 official domains: dev, doc, social, content, scrape, util, skill management, knowledge base, automation, video, github, seo, cms.

Action suffixes are always English -ing forms — -designing, -building, -reviewing, -creating, -collecting, -publishing, and so on. 15 standard actions total.

Why this much structure? Because Skill counts grow. With 5 Skills, ad-hoc names are fine. With 50, a missing convention is chaos. Four-part naming lets you tell from the name alone what domain a Skill is in, what it operates on, and what it does.

Vs. official: Anthropic's name field only requires kebab-case (lowercase + hyphens) and forbids prefixes like claude or anthropic. The four-part scheme is my own answer to managing 40+ Skills. If you only have a handful, a simple kebab-case name is plenty.
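Both constraints are mechanical enough to check in a few lines of Python. `is_valid_skill_name` is a hypothetical helper written for illustration; it is not part of Claude Code or the spec:

```python
import re

# Official constraint: kebab-case (lowercase letters, digits, hyphens), <= 64 chars.
OFFICIAL_NAME = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
RESERVED_PREFIXES = ("claude", "anthropic")  # forbidden per Anthropic's naming rules

def is_valid_skill_name(name: str, four_part: bool = False) -> bool:
    """Check a Skill name against the official rules; optionally enforce
    the stricter four-part prefix-domain-object-action convention."""
    if len(name) > 64 or not OFFICIAL_NAME.match(name):
        return False
    parts = name.split("-")
    if parts[0] in RESERVED_PREFIXES:
        return False
    return len(parts) >= 4 if four_part else True
```

Running something like this over your skills directory before a rename sweep catches drift early.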

Required Frontmatter fields

Each SKILL.md opens with a metadata block (Frontmatter — the section wrapped in three dashes). Two fields are mandatory:

| Field | Rules | Example |
|---|---|---|
| name | ≤ 64 characters, lowercase letters / digits / hyphens only | awp-social-xhs-creating |
| description | ≤ 1024 characters, describes what + when to trigger | "Deeply scrape Xiaohongshu creator data and auto-draft notes. Triggered when the user says 'write Xiaohongshu', 'draft a note', etc." |

Important: write description in the third person — "Extracts text from a PDF and produces a report," not "I can help you extract PDFs" or "You can use this to extract PDFs." The reason: description is injected into the system prompt, and first/second-person phrasing breaks Skill discovery and matching. This is explicit in Anthropic's guidance.
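Putting the two required fields together, a minimal SKILL.md could open like this (the name and trigger phrasing are illustrative, reusing the example from the table above):

```yaml
---
name: awp-social-xhs-creating
description: >
  Deeply scrapes Xiaohongshu creator data and auto-drafts notes.
  Triggered when the user says "write Xiaohongshu", "draft a note", etc.
---
```

Note the third-person phrasing in description, per the rule above.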

A handful of optional fields:

| Field | Notes |
|---|---|
| allowed-tools | Tools the Skill is allowed to call |
| model | Pin a specific model |
| context | Set to fork to run in an isolated environment |
| hooks | Lifecycle hooks (experimental) |
| user-invocable | Whether the Skill shows up in the slash-command menu |

Heads up on portability: allowed-tools is officially supported. model, context, hooks, and user-invocable are Claude Code CLI extensions and may not be recognized on Claude.ai or via the API. If you only run Skills inside Claude Code, use them freely. If you need to ship cross-platform, stick to the official fields.

Three-tier loading

This part is genuinely clever. Claude doesn't read all your files at once — it loads in tiers:

| Tier | When loaded | What's in it | Cost |
|---|---|---|---|
| L1 | Always | Just the name and description | ~100 tokens per Skill (a token is roughly a syllable; one CJK character is 1–2 tokens) |
| L2 | On trigger | The body of SKILL.md | Keep under 500 lines (Anthropic best practice) |
| L3+ | On demand | Step docs, references, etc. | Unlimited |

You don't carry an encyclopedia in your pocket. You remember the table of contents and open the right chapter when you need it.

What makes the tiering elegant: L1 burns ~100 tokens per Skill (a one-line summary) so Claude knows the Skill exists. L2 only loads when triggered. L3 only loads when you reach the step that needs it.

This three-tier loading lines up exactly with Anthropic's Progressive Disclosure principle — they list it as the first of the three foundational principles. My spec just adds the per-tier token math and recommended line counts.

The "one level deep" rule: Anthropic best practice says references inside SKILL.md should be at most one level deep. If a referenced file references another file (nested chain), Claude may only head -100 the second level, losing information. All reference files should link directly from SKILL.md, never form chains.

The workflow table

The workflow table is the heart of SKILL.md — it defines the execution sequence. Six columns:

| Step | Role | Executor | Doc | Input | Output |
|---|---|---|---|---|---|
| 01 | Initialize | Main Agent | step01-init.md | User trigger | state/ |
| 02 | Collect data | Script | step02-collect.md | User params | step02-collect/ |
| 03 | Analyze | SubAgent | step03-analyze.md | step02 output | step03-analyze/ |
| 04 | Generate output | Script | step04-output.md | step03 output | output/ |
What each column means: Step number, Role (a 2–6-word summary), Executor (who does it), Doc (where the instructions live), Input (what's needed), Output (what's produced).

One table strings the entire workflow together. What each step does, who does it, where data comes from, where results go — visible at a glance.

Checklist mode: Anthropic best practice recommends giving complex workflows a copyable progress checklist. Claude can paste the checklist into its reply and tick boxes as it goes. Better than a plain step list — both you and Claude always know how far along you are. Example:

Task Progress:
- [ ] Step 1: Initialize the run directory
- [ ] Step 2: Collect data
- [ ] Step 3: Score content
- [ ] Step 4: Generate report

Dynamic context injection

SKILL.md also supports a special syntax that runs a command before the document is sent to Claude, then injects the command output into the document.

If you build a code-review Skill, for example, you can write "fetch the diff for the current PR" inside the doc. The system runs that command first, splices the result into the doc, and what Claude sees is a fully contextualized brief.

Useful for PR review, environment detection — anywhere you need real-time info baked in.

Complexity ramp

Start minimal, grow on demand:

  1. Just SKILL.md — one file, the whole Skill
  2. Steps overflow → add a workflow/ directory, split out step docs
  3. Need to call APIs → add scripts/ and credentials/
  4. Need reference material → add reference/
  5. Need configuration → add config/

Progressive enhancement. No premature scaffolding.

Multi-mode Skills

When a Skill needs to support more than one execution mode (say, Clone mode and Timeline mode), the workflow folder gets sub-folders by mode:

| Folder | Purpose |
|---|---|
| workflow/clone/ | Clone-mode steps |
| workflow/timeline/ | Timeline-mode steps |
| workflow/shared/ | Steps shared between modes |

Each mode has its own workflow table. The user picks a mode at trigger time, and the Skill follows that mode's sequence.

2.2 Why it's designed this way

Why force flat steps?

The spec bans sub-step numbering (step02a, step02-1, that kind of thing). Sub-steps blur the flow — Claude reads "step02a" and isn't sure whether it's part of step02 or its own thing.

Flat numbering treats every step as one independent, complete unit of work. Like an assembly line — each station does one thing and hands off to the next.


3. Platform constraints — the hard limits you have to know

3.1 What the spec says

200,000 tokens is your basic life support. Every line of doc you write is competing for that budget.

You can be brilliant inside the laws of physics, but you can't break them. Platform constraints are Claude Code's laws of physics — design freely within them, never against them.

Skill install paths

| Scope | Path | Notes |
|---|---|---|
| User (default) | personal skills/ directory | Personal use, cross-project |
| Project | .claude/skills/ inside the repo | Team-shared, repo-specific |
| Enterprise | system-level path | Admin-deployed |

Precedence: enterprise > project > user.

How to deploy: in Claude Code, drop the folder into the skills directory; on Claude.ai, zip the Skill folder and upload. The filename SKILL.md must match exactly (case-sensitive).

Tool hard limits

| Tool | Key limit | Consequence |
|---|---|---|
| File read | ~25,000 tokens per call | Big files must be chunked |
| File edit | Must read before editing | Or you get an error |
| File write | Overwriting an existing file requires a prior read | Same |
| Shell command | 30,000-character output cap, default 2-min timeout | Long-running commands get killed |
| External tools | ~25,000-token output cap | Big payloads need pagination |
| SubAgents | Up to 10 concurrent, no nesting (spec recommends 2 per round) | Nesting fails |

Here's something I've noticed: people who genuinely understand the context window write Skills that run circles around people who don't.

Why? Because they know they're managing a scarce resource. Like good programmers understand memory, like good writers understand reader attention — good Skill designers understand the token window.

That awareness is most of the moat.

Context window — the constraint that matters most

Context window sizes vary by model tier and change between releases. At the time of writing, the sonnet tier offers up to a 1M-token window as a long-context option — roughly 500,000–750,000 English words, depending on language mix — while other tiers still default to a 200K window. Always check the official model docs for current limits — I use tier aliases (opus / sonnet / haiku) instead of pinned versions so the advice stays valid as new models ship.

Usable space is generous, but the system still reserves some for itself.

Context overflow = early conversation gets auto-compacted, and key details may go missing. Less of a red line at 1M than at 200K, but the discipline still pays off — frugal context management is a good habit at any window size.

Where your tokens go

| Source | Cost | How to control it |
|---|---|---|
| System instructions | ~5,000 (fixed) | Out of your control |
| SKILL.md | ~2,000–5,000 | Trim doc length |
| Step docs | ~1,000–3,000 each | Load on demand |
| File reads | ~100–25,000 each | Chunk |
| SubAgent returns | accumulates | Minimal returns + compact |
| Conversation history | accumulates | Periodic compaction |

Recommended token budget split:

| Use | Budget |
|---|---|
| System instructions | 5,000 |
| Skill docs | 10,000 |
| File reads | 30,000 |
| SubAgent returns | 50,000 |
| Conversation history | 85,000 |
| Total | 180,000 |

My rule of thumb: at 70% (~126,000) start compacting actively, at 85% (~153,000) force a compaction.
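Those thresholds are plain percentages of the usable budget. A quick sketch (a throwaway calculation, not part of any Skill tooling) makes the arithmetic explicit:

```python
USABLE_BUDGET = 180_000  # total from the budget table above

def compaction_thresholds(budget: int = USABLE_BUDGET) -> dict:
    """Token counts at which to start compacting and to force a compaction,
    per the 70% / 85% rule of thumb."""
    return {
        "start_compacting": round(budget * 0.70),
        "force_compaction": round(budget * 0.85),
    }
```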

Supported runtimes

| Runtime | Package manager | Banned |
|---|---|---|
| Python | uv (modern package manager) | pip / poetry / conda |
| Node.js | pnpm (high-performance npm alternative) | npm / yarn |
| Deno (newer JS runtime) | built-in | (none) |
| Bash (shell scripts) | (none) | (none) |

Banned command patterns: interactive editors (vim / nano / less), interactive operations (git rebase -i), interactive interpreters (Python REPL — the interactive command line), infinite loops. Claude Code doesn't support interactive input.

3.2 Why it's designed this way

Why does the spec recommend only 2 SubAgents per round?

Claude Code supports up to 10 concurrent SubAgents. My spec caps it at 2 per round. Why?

Picture six SubAgents finishing at once, each carrying ~10,000 tokens of execution history. That's 60,000 tokens injected into the main context in a single beat — a third of your usable space, gone.

Round-based is much safer:

  1. Round 1: SubAgents 1 and 2 finish → compact (release ~20,000 tokens)
  2. Round 2: SubAgents 3 and 4 finish → compact
  3. Round 3: SubAgents 5 and 6 finish → compact

Always under control, never blown out.
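The round discipline can be sketched as a tiny scheduler. `into_rounds`, `launch`, and `compact` here are hypothetical stand-ins for what the main Agent does, shown only to make the batching concrete:

```python
from typing import Callable, List

def into_rounds(tasks: list, per_round: int = 2) -> List[list]:
    """Split SubAgent tasks into rounds of at most per_round each."""
    return [tasks[i:i + per_round] for i in range(0, len(tasks), per_round)]

def run_rounds(tasks: list, launch: Callable, compact: Callable,
               per_round: int = 2) -> None:
    """Launch each round, then compact before the next round starts."""
    for batch in into_rounds(tasks, per_round):
        for task in batch:
            launch(task)   # SubAgents within one round may run concurrently
        compact()          # release this round's execution history from context
```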

Why ban pip / npm?

pip has no real lockfile (today's install and tomorrow's may differ); npm's node_modules bloats wildly. uv and pnpm are the modern replacements — faster, more reproducible.

Why must read precede edit / write?

Safety. It blocks you (and Claude) from accidentally clobbering a file you were about to edit. Forcing a prior read is like checking the original contract before redlining it.


4. Step documents — the soul of a workflow

Step documents — isometric staircase of numbered step cards with executor badges and flow arrows

4.1 What the spec says

Step documents live under workflow/. Each file maps to one step in the workflow.

Filename format: two-digit number + action verb, e.g. step01-init.md, step02-collect.md. Numbering starts at 01. Sub-step numbering is banned.

A step document has six sections:

  1. Title and metadata: step number, action, executor, where input comes from, where output goes
  2. Execution narrative: what to do, in plain prose
  3. Input file list: which files to read, from which step
  4. Output file list: what to produce, in what format
  5. Validation checkpoint: how to confirm the step is done correctly
  6. Next-step pointer: where to go after completion

The four executor types

| Executor | Best for | Context impact |
|---|---|---|
| Script | Deterministic ops (collection, batching, merging) | Zero |
| MCP tool (Model Context Protocol — external tool interface) | Web fetching | Medium |
| SubAgent | Anything needing AI judgment (eval, analysis, generation) | High |
| Main Agent | Light coordination, reading config | Cumulative |

MCP tool reference format: when referring to MCP tools inside a Skill, use the fully qualified name: ServerName:tool_name. For example BigQuery:bigquery_schema, GitHub:create_issue. Without the server prefix, Claude can't disambiguate when multiple MCP servers are loaded — this is explicit in Anthropic's best-practice guide.

If you can use a script, don't use a SubAgent. That's the first principle of Skill design.

Scripts cost zero context — Chapter 5 will do the math.

Four operating principles

Progressive disclosure — load one step, execute one step. No pre-reading the full doc set. Ten step docs read up front is 20,000 tokens of pure waste.

Minimal returns — when a SubAgent finishes, it returns one sentence: "done, processed 30 records, results at <path>." Never return file contents. The contents are already in the file. Echoing them in the return value is a duplicate.

Five-layer validation — file exists → format valid → fields complete → values in range → business rules pass. Layer by layer, like a physical exam.

Round-based scheduling — my spec caps SubAgents at 2 per round (the platform allows more, but capping is safer), with a compaction between rounds.

AskUserQuestion limits (the user-prompting interactive component)

| Item | Limit |
|---|---|
| Questions per call | 1–4 |
| Options per question | 2–4 |
| Header length | ≤ 12 characters |
| Custom option | The system always appends "Other" |

Put the recommended option first and tag its label with "(Recommended)" to nudge users.

Error recovery strategies

| Error type | Handling |
|---|---|
| Network timeout / 5xx | Exponential backoff (1s, 2s, 4s — max 3 tries) |
| Rate limited (429) | Wait the cooldown the server tells you, then retry |
| Invalid key / 401 / 403 | Stop, prompt the user to check credentials |
| Single batch failed | Skip, continue with other batches |
| Critical step failed | Stop, write a checkpoint |

4.2 Why it's designed this way

Why progressive disclosure?

Ten steps × 2,000 tokens each = 20,000 tokens of context burned just to "read ahead." Load on demand instead — read the current step, execute it, compact. Context only ever holds what the current step needs.

Why minimal returns?

The SubAgent's analysis is already written to disk. Repeating it in the return value means storing the same data twice — once in the file (durable), once in context (volatile, will overflow).

Why limit option counts?

UI limits. Options render as labeled chips, and beyond 4 the layout breaks. If you need more options, load them from a preset file and let the user use "Other" to customize.

A good step doc is like a good recipe: anyone who follows it ends up with the same dish.

Try this: open your ~/.claude/skills/ directory and look at how existing Skills wrote their SKILL.md.


Part 2 — Execution layer (where the work happens)

Execution layer — isometric factory floor with scripts, prompt templates, and variable placeholders stitched together

Real story. Not a hypothetical.

Last year I built a Xiaohongshu collection Skill — 6-step workflow, every step a SubAgent. By step 4, Claude had compacted half of the earlier conversation. The execution detail from steps 1–3 was gone. Variable names, paths, partial state — all of it. The downstream steps started misfiring.

I ran the math afterward. Six SubAgent steps, each injecting roughly 3,000 tokens of execution history into the main context — 18,000 tokens just from the SubAgent overhead. Plus the conversation accumulation. The window blew.

That crash taught me one thing: not every job needs the AI brain.

If Part 1 was the skeleton, Part 2 is the muscle. The question that organizes everything below: how many steps in your workflow actually need an AI to think?


5. Script spec — let machines do the labor

Script spec — isometric server rack with Python, Node, and Shell runtime boxes and HTTP cabling

5.1 What the spec says

One-line definition: scripts are the manual labor of a Skill. Anything deterministic, anything that doesn't need AI judgment, hand it to a script.

What does "deterministic" mean? Same input, same output, no thinking required.

Examples: pulling data from an API, splitting 150 records into 5 batches, converting JSON into Markdown, uploading a file to cloud storage. None of that needs Claude's brain. It needs Claude's hands — and ideally not even that.

If a Skill is a restaurant, SubAgents are the chefs (creativity, judgment) and scripts are the dishwashers and runners — no decisions, just reliable execution of a defined task.

Let me show you the contrast in numbers. Every time a SubAgent runs, its conversation history gets injected into the main context — typically a few thousand tokens. A script, by contrast, is a black box from the main conversation's point of view. No matter how much data it processes internally or how many APIs it hits, the main conversation only sees one return line:

"Done. 150 records processed. Results at step01-collect/data.json."

That line is roughly 50 tokens. Compared to the 3,000+ a SubAgent would have spent, that's a 60× reduction.

Standard directory structure

Scripts live under scripts/, organized by runtime:

| Directory | Purpose |
|---|---|
| scripts/python/ | Python scripts (primary language) |
| scripts/python/shared/ | Shared modules (run-directory helpers, etc.) |
| scripts/python/pyproject.toml | Dependency declaration (Python dependency manifest) |
| scripts/python/.venv/ | Virtualenv (auto-generated) |
| scripts/node/ | Node.js scripts (optional) |
| scripts/shell/ | Shell scripts (optional) |
| scripts/deno/ | Deno scripts (optional) |

Why a sub-folder per runtime? Because Python has its venv and pyproject; Node has its node_modules and package.json — mixing them creates dependency interference. Per-runtime folders keep dependencies isolated, like keeping different reagents in different cabinets.

Python is the default workhorse — its ecosystem covers API calls, data wrangling, and file ops most ergonomically. Other runtimes get added on demand. Don't create what you don't use.

Four key design points for a standard script

Every Python script in my system follows the same skeleton, like a fast-food kitchen running standardized prep. Whichever location, the same flow:

  1. The main function returns a status dict, not a printed mess. The return is a single clean status object, period.
  2. Error messages truncated to 100 characters. Stops a stack trace from blowing out the output.
  3. Paths come in as command-line args. The run directory is passed in from outside; the script hard-codes nothing.
  4. Exit codes are semantic. 0 = success, 1 = failure, so the main Agent knows what to do next.
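Here is a minimal sketch that follows all four rules. The collection logic is a stand-in, and the file names are illustrative rather than taken verbatim from the spec:

```python
import json
import sys
from pathlib import Path

def main(run_dir: str) -> dict:
    """Collect step: writes data.json under the run directory and
    returns a status dict (rule 1: one clean status object)."""
    try:
        out_dir = Path(run_dir) / "step01-collect"
        out_dir.mkdir(parents=True, exist_ok=True)
        records = [{"id": i} for i in range(150)]  # stand-in for real collection
        (out_dir / "data.json").write_text(json.dumps(records), encoding="utf-8")
        return {"ok": True, "count": len(records),
                "output": "step01-collect/data.json"}
    except Exception as e:
        return {"ok": False, "err": str(e)[:100]}  # rule 2: truncate to 100 chars

if __name__ == "__main__" and len(sys.argv) > 1:
    status = main(sys.argv[1])          # rule 3: run directory comes in as an argument
    print(json.dumps(status, ensure_ascii=False))
    sys.exit(0 if status["ok"] else 1)  # rule 4: semantic exit code
```

The main Agent would invoke it along the lines of `cd scripts/python && uv run collect.py <run-dir>` and parse the single JSON status line it prints.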

Return-value spec

The return value is the wire protocol between the script and the main Agent:

| Field | Required | Notes |
|---|---|---|
| ok | yes | Whether execution succeeded (boolean) |
| count | no | How many records processed |
| output | no | Relative path to the output file |
| total_batches | no | How many batches |
| uploaded_url | no | Upload destination |
| err | required on failure | Error description |

Notice: the return is a status summary, not the data itself.

Those 150 collected records are already on disk. The return value just says "I'm done, 150 of them, written to step01-collect/data.json." If you returned all 150 records inline, that's tens of thousands of tokens of context waste.

How the main Agent calls a script

The main Agent calls scripts via the command line. It first cds into the script directory (so the venv and dependency files are found), then runs the script with uv run (the modern Python package runner), passing the run directory as an argument.

Once the main Agent has the status back, the typical loop is:

  1. Parse the status, confirm success
  2. If needed, read the output file to inspect results
  3. Update the progress file
  4. Move to the next step
HTTP API call spec

Scripts very often hit external APIs. Standard practice:

  • Library choice: prefer the Python standard library (zero-dep). For more complex needs, httpx (modern HTTP client).
  • Timeouts: always set a timeout (30–120 seconds is the common range). A timeout-less request can hang forever.
  • Retries: 5xx errors → exponential backoff (1s, 2s, 4s), max 3 tries; 4xx → don't retry (the request itself is wrong).
  • Rate limiting: read the server's Retry-After header, wait the indicated duration.

What does exponential backoff mean? Wait 1s after the first failure, 2s after the second, 4s after the third. Doubling intervals avoid hammering a struggling server. Like knocking on a door — you knock, wait a bit, knock again, wait longer. Not knock-knock-knock-knock.
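A minimal sketch of that retry policy, with the retry decision separated from the actual HTTP call so it can be reasoned about on its own (the function names are mine, not from the spec):

```python
from typing import Optional, Tuple

def backoff_delays(max_tries: int = 3, base: float = 1.0) -> list:
    """Exponential backoff schedule: 1s, 2s, 4s for the default 3 tries."""
    return [base * (2 ** i) for i in range(max_tries)]

def retry_decision(status: int,
                   retry_after: Optional[str] = None) -> Tuple[bool, Optional[float]]:
    """Map an HTTP status code to (should_retry, wait_seconds_or_None)."""
    if status == 429:          # rate limited: obey the server's Retry-After header
        return True, float(retry_after) if retry_after else None
    if 500 <= status < 600:    # server error: retry with backoff
        return True, None
    return False, None         # 4xx and everything else: the request itself is wrong
```

A real call site would wrap the request (for example `httpx.get(url, timeout=30)`) in a loop over `backoff_delays()`, consulting `retry_decision` after each failure.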

Script vs. SubAgent vs. MCP — how to choose

| Dimension | Script | SubAgent | MCP tool |
|---|---|---|---|
| Context cost | Zero (one status line) | High (thousands of tokens) | Medium |
| Best for | Deterministic ops | AI judgment needed | Web fetching |
| Build cost | Medium (you write code) | Low (you write a Prompt) | Low (use a ready-made tool) |
| Rate-limit control | Precise (code-level) | None | Depends on the server |
| Debuggability | High (run locally, log freely) | Low (AI behavior is hard to predict) | Medium |

A simple decision: does this operation require thinking?

  • No → script
  • Yes (semantic understanding, judgment, creative writing) → SubAgent
  • Need fresh web information → MCP tool

Anthropic's guidance puts it well: many useful Skills run entirely on Claude's built-in capabilities — writing, analysis, code generation. MCP integration is optional and incremental. In other words: don't dismiss a Skill idea just because there's no MCP tool for it.

Hybrid execution — the core token-saving trick

In a typical 6-step workflow, only 1–2 steps actually need a SubAgent. The rest can be scripts. Let's run the numbers:

| Step | Executor | Token cost |
|---|---|---|
| Step 1: collect | Script | +50 |
| Step 2: batch | Script | +30 |
| Step 3: score content | SubAgent | +8,000 |
| Step 4: merge results | Script | +30 |
| Step 5: upload | Script | +50 |
| Step 6: notify | Script | +30 |
| **Total** | | **8,190** |

If everything were a SubAgent? 6 × 3,000 = 18,000 tokens. The hybrid mode saves roughly 55%.

When you add batches (say 6 batches of 6 steps = 36 operations) the gap goes vertical:

  • All-SubAgent: 36 × 3,000 = 108,000 tokens — over 60% of usable space, gone
  • Hybrid: 6 SubAgent calls + 30 script calls = 6 × 3,000 + 30 × 50 = 19,500 tokens

108,000 down to 19,500. 82% saved. That's the math behind "use a script if you can."
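The arithmetic, as a one-screen sanity check (the per-call costs are the rough figures used above):

```python
SUBAGENT_COST = 3_000  # rough tokens one SubAgent call leaves in the main context
SCRIPT_COST = 50       # rough tokens for one script status line

all_subagent = 36 * SUBAGENT_COST              # 108,000 tokens
hybrid = 6 * SUBAGENT_COST + 30 * SCRIPT_COST  # 19,500 tokens
saving = 1 - hybrid / all_subagent             # ~0.82
```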

A more visceral picture: imagine a 100-square whiteboard (the 200K window). A SubAgent draws a 3-square block (a fat marker). A script makes a tiny dot (a pencil tick). Six SubAgent steps fill 18 squares. Hybrid mode barely fills one. Same work, one whiteboard nearly full, the other nearly untouched.

Dependency management

Declare Python dependencies in pyproject.toml, manage them with uv.

Banned: requirements.txt (no real lockfile) and pip install (use uv sync).
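A minimal pyproject.toml for a Skill's script folder might look like this (the package name and version pins are illustrative):

```toml
[project]
name = "my-skill-scripts"   # hypothetical package name
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "httpx>=0.27",          # only if the stdlib isn't enough
]
```

`uv sync` installs from this file into the venv; `uv run <script>` then executes against it.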

Shared modules

When multiple scripts share logic, put it in scripts/python/shared/. The most common shared module exposes run-directory helpers:

| Function | Purpose |
|---|---|
| init_run_dir | Create a run directory |
| get_latest_run | Get the most recent run |
| complete_run | Mark a run finished |

Multi-mode script organization

When the Skill supports multiple modes, scripts split by mode too:

| Folder | Notes |
|---|---|
| scripts/python/clone/ | Clone-mode scripts |
| scripts/python/timeline/ | Timeline-mode scripts |
| scripts/python/shared/ | Shared modules |
| scripts/python/merge.py | Shared script (root-level) |

Rule: mode-specific scripts go in the mode folder; shared scripts go at the root or in shared/.

Banned
| Banned | Why |
|---|---|
| Calling LLMs from inside a script | Deterministic ops don't need AI; mixing one in destroys the "zero context" advantage |
| Heavy stdout logging | The main Agent captures stdout; heavy logging = context pollution |
| Decision-making inside a script | "Which path to take next" belongs to the Agent; scripts walk paths, they don't choose them |
| Hard-coded paths | All paths arrive as parameters; hard-coding means "works on my machine" |
| Calling external tools from scripts | External tools belong to the workflow layer; scripts are pure code |

Two more from Anthropic's best-practice guidance:

Solve, don't punt: when a script hits an error, it should handle it itself (create a default file, fall back to an alternative), not bail out and force Claude to guess.

No "magic constants": every config value (timeout, retry count, etc.) needs a comment explaining why that value. `TIMEOUT = 47` is bad — why 47? `TIMEOUT = 30  # HTTP requests usually finish under 30s` is good. Same gospel as "code is for humans first."

5.2 Why it's designed this way

The math is already on the table — 108,000 down to 19,500, 82% saved; minimal returns; pass paths, not contents. Not going to rederive it. Three operational "whys" worth a closer look:

Why uv run instead of plain python?

uv run activates the venv, installs deps, then runs the script. Plain python may use the system interpreter — which has none of your project deps, so the script fails with ModuleNotFoundError. uv run is the kind butler who sets the table for you before serving.

Three script potholes I've stepped in

Pothole 1: `__pycache__` serving stale code. I edited a script, ran it, and behavior didn't change — Python was running cached bytecode. Habit fix: `rm -rf __pycache__` while debugging.

Pothole 2: Script output flooded the context. Early on I print-debugged everything; the main Agent dutifully captured all of it and injected it into context. Fix: log to a file, only return the JSON status line on stdout.
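The fix in code form (a sketch; the path and log format are illustrative):

```python
import logging
from pathlib import Path

def setup_logging(run_dir: str) -> None:
    """Verbose logs go to a file under the run directory, never to stdout."""
    logging.basicConfig(
        filename=Path(run_dir) / "state" / "script.log",
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
        force=True,  # replace any handlers configured earlier in the process
    )

# After this, log as much as you like:
#   logging.debug("fetched page %d, %d records", page, len(records))
# and keep stdout for the single JSON status line the main Agent should see.
```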

Pothole 3: Forgot to set HTTP timeout, script froze. An API server got slow once and my script hung for 10 minutes before the system killed it — no timeout= parameter set. Fix: every HTTP call gets timeout=30 (or longer). I'd rather fail fast and retry than wait forever.

Scripts are the unsung heroes of a Skill — 80% of the work, 0% of the context.


6. Prompt templates — directing the SubAgent

Prompt templates — isometric director chair pointing at SubAgent with path-first speech bubble and folder

6.1 What the spec says

One-line definition: a Prompt template is the brief you write for a SubAgent — who you are, what to do, how to do it, how to report.

If scripts are the manual labor, SubAgents (sub-agents — Claude's "junior selves") are the brain workers — anything that needs semantic understanding, judgment, or creative writing. The Prompt template is your project brief to that brain worker.

A good brief lets an intern ship a project unsupervised. A bad brief leaves a PhD lost.

Location: reference/prompts/

Naming: semantic prefixes that telegraph intent at a glance:

| Prefix | Use | Example filename |
|---|---|---|
| batch- | Batch-processing tasks | prompt-batch-analysis.md |
| init- | First-pass generation (from zero) | prompt-init-persona.md |
| iterate- | Incremental updates (new data folded into old) | prompt-iterate-merge.md |
| final- | Final output | prompt-final-report.md |
| eval- | Scoring | prompt-eval-quality.md |
| merge- | Merge processing | prompt-merge-results.md |
| prepare- | Preparation phase | prompt-prepare-data.md |

Template doc structure

Each Prompt file isn't raw Prompt text — it's a complete document with two parts:

Metadata block: tells the developer which step this Prompt is used in and which parameters to launch the SubAgent with. Includes purpose, applicable step, SubAgent type, model, whether to background-run.

Prompt body: the actual instructions sent to the SubAgent.

The 6 ingredients of a working SubAgent Prompt

A complete Prompt looks like a military order. Six parts:

  1. Role: who you are ("You are a content quality reviewer")
  2. Run paths: where input lives, where output goes
  3. Steps: what to do, 1-2-3
  4. Standard: how to judge (scoring rubric, classification rules)
  5. Output format: what the result should look like
  6. Return format: only return a minimal status line
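Put together, a hypothetical content-scoring Prompt body showing all six parts (every path, filename, and threshold here is illustrative):

```markdown
# Role
You are a content quality reviewer.

# Run paths
- Input:  {run_dir}/step02-batch/batch-{batch_id}.json
- Output: {run_dir}/step03-score/scores-{batch_id}.json
- Rubric: {skill_dir}/reference/definitions/scoring-rubric.md

# Steps
1. Read the input file.
2. Score every record against the rubric.
3. Write all scores to the output path.

# Standard
Score 1-10 per the rubric; below 4 is marked "rejected".

# Output format
JSON array: [{"id": ..., "score": ..., "verdict": ...}]

# Return format
One line only: done | <count> records scored | <output path>
```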

I once made a painful mistake — I embedded 30 records of note data directly inside a Prompt. By the time the SubAgent finished, the main context had ballooned by 7,600 tokens. After 3 batches Claude said: "Approaching context limit."

That's when I made this rule iron:

Path-first principle — the most important Prompt rule

Pass paths, not contents.

The single most important rule in Prompt design.

Imagine asking a colleague to review a report. Would you paste the full PDF into Slack? Of course not — you'd say "the file's at this shared-drive path, take a look."

Same here. A Prompt should never embed large data blocks. Compare:

| Dimension | Pass contents (wrong) | Pass paths (right) |
|---|---|---|
| Main context | Bloats ~7,600 tokens per batch | Stays clean |
| SubAgent flexibility | Passive — data already injected | Active — reads what's needed |
| Maintainability | Template and content coupled | Decoupled |
| Token transit count | 3 (inject → process → echo) | 0 (data only flows inside the SubAgent) |

Pass-by-path is one short line (~20 tokens). The SubAgent reads the file with its own Read tool. The contents stay inside the SubAgent's context, never flow back to the main conversation.

It's "self-serve buffet" vs. "table service." Self-serve, you take what you want, no waste.

Allow-list design

When you set behavior boundaries for a SubAgent, prefer an allow-list (white-list) over a "do not" list (black-list):

Just list the tools the SubAgent is allowed to use — read, write, list directory, run command. Anything outside that list is forbidden by default.

Why allow-list beats deny-list:

Deny-list reads "you can't do X, can't do Y, can't do Z" — easy to leave gaps and the list grows forever. Allow-list reads "you can do A, B, C only" — short, sharp boundary.

A practical pothole: if you only allow read and write, the SubAgent can't list a directory (needs the list-directory tool) or run a script (needs the command tool). Read + write + list-directory + run-command is the battle-tested minimum kit.

Instruction freedom

Not every Prompt needs to micromanage. Match the freedom to the task's "fragility":

| Freedom | Use case | Style |
|---|---|---|
| High | Creative generation (many valid outputs) | Direction only |
| Medium | Eval/analysis (preferred framework) | Framework + room to vary |
| Low | File operations (one wrong move and it's over) | Exact commands |

Quick fragility check:

  • Output format strict? → fragile
  • Path operations precise? → fragile
  • Downstream depends on it? → fragile
  • Retry cost high? → fragile

Three or more "yes" → low freedom (exact commands). One or zero → high freedom (direction only).

Anthropic uses a great analogy — narrow bridge vs. open plain:

  • Narrow bridge (cliffs on both sides): one safe path → strict instructions, hard rails (low freedom). Example: a database migration that must run in exact order.
  • Open plain (no obstacles): many roads to Rome → give direction, trust Claude to find the best path (high freedom). Example: code review, where the best approach depends on context.
Iteration-style Prompts

When a task needs multiple rounds (new data folding into an existing analysis), use iteration mode. Core flow:

  • Round 1 (initial generation): start from scratch, create the first version
  • Round 2+ (incremental): fold new data into the old version, don't rewrite from zero

Folding has a priority order — new findings > consensus reinforcement > core points > edge details.

In plain English: brand-new insights (not in the prior version) get folded first; data that confirms existing claims comes second; edge details last.

Token budget grows elastically too — 3% of the baseline per round, capped at 60% total growth. A baseline of 4,000 tokens reaches 5,080 by round 10 and tops out at 6,400 from round 21. Like writing an essay — first draft 4,000 words, each revision can stretch a bit, but not balloon forever.
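Reading "3% per round" as 3% of the baseline added each round (the interpretation that matches the 5,080 and 6,400 figures), the budget rule is:

```python
BASELINE = 4_000   # tokens allowed for the round-1 output
GROWTH = 0.03      # each later round may add 3% of the baseline
CAP = 0.60         # total growth capped at 60%

def budget(round_no: int) -> int:
    step = round(BASELINE * GROWTH)    # 120 extra tokens per round
    cap = round(BASELINE * (1 + CAP))  # 6,400-token ceiling
    return min(BASELINE + step * (round_no - 1), cap)
```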

Hard bans
| Banned | Why |
|---|---|
| Embedding large data blocks in a Prompt | Pass a path, let the SubAgent read it |
| Variable arithmetic (e.g. version - 1) | No engine evaluates that; use a "latest pointer" file instead |
| Reading a directory directly | Triggers an error; list files first, then read each |

6.2 Why it's designed this way

The path-first math is already done — main-context burn drops from 7,600+ to under 100, and minimal returns let the SubAgent just say "done." Skipping the rederivation, jumping to the practical takeaways.

Three golden rules for writing Prompts

Rule 1: Paths first, instructions second. The first three lines of a Prompt should be: where is the input, where does the output go, where are the references. The SubAgent reads the paths and pulls the rest itself — 60× more efficient than stuffing contents in.

Rule 2: Use allow-lists instead of deny-lists. Don't write "don't do X, don't do Y" — you'll never finish the list. Write "you can use read, write, list-directory, and run-command." One line, sharp boundary.

Rule 3: The return value says three words: "done." Don't let the SubAgent re-narrate its analysis in the return. The result's already in a file. The return value only needs: success/failure + count + path. I've watched too many people wreck themselves here — SubAgent returns a wall of analysis, main context blows in one breath.

The whole secret of writing a good Prompt fits in five words: pass paths, not contents.


7. Variable placeholders — the glue that strings everything together

7.1 What the spec says

One-line definition: variable placeholders are the messengers between every file in a Skill — step docs reference the run directory, Prompt templates reference input paths, scripts receive parameters. Variables connect it all.

Think of a movie script. It says "the lead enters [LOCATION]." On set, [LOCATION] becomes "the coffee shop." Variable placeholders are the script's [LOCATION] — written as a placeholder, replaced at runtime.

Two variable systems

A Skill has two distinct variable systems — two languages, two purposes:

System Syntax Source When replaced Examples
Workflow variables Single curly braces Generated by the main Agent at runtime When executing a step run_dir, batch_id
Platform official variables Dollar prefix Injected by the platform when loading a Skill When SKILL.md loads ARGUMENTS

Don't mix them. Workflow variables are a spec convention — you write them in step docs and Prompts, the main Agent substitutes the real path during execution. Platform variables are a system feature — Claude Code itself does string substitution at load time.

Core workflow variables — quick reference

The ones you'll touch most when writing step docs and Prompts:

| Variable | Meaning | Example value |
|---|---|---|
| skill_dir | Skill install directory | The Skill's root |
| run_dir | Current run directory | skill_dir/runs/<this run>/ |
| batch_id | Batch number (starts at 1) | 1, 2, 3 |
| batch_count | Total batches | 6 |
| count | Items in this batch | 30 |
| input_path | Input file path | A step's output under run_dir |
| output_path | Output file path | run_dir/output/ |
| mode | Mode name (multi-mode Skills) | clone, timeline |
| keyword | Runtime keyword | claude-code |
| timestamp | Timestamp | 2026-01-23T10:30:00Z |

The two that matter most: skill_dir and run_dir.

  • skill_dir = the Skill's "home" — code, config, references all live here
  • run_dir = this run's "workspace" — all runtime data lives here

Their relationship is "factory" vs. "work order." The factory is fixed (skill_dir); each new order opens a new ticket (run_dir).
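Mechanically, the substitution is just string formatting. A sketch of what the main Agent effectively does when it instantiates a step doc or Prompt (the paths are hypothetical):

```python
template = (
    "Read {run_dir}/step02-batch/batch-{batch_id}.json, "
    "score each record, and write results to {run_dir}/step03-score/."
)

values = {
    "run_dir": "/skills/demo/runs/claude-code-20260123-103000",  # hypothetical
    "batch_id": 3,
}

prompt = template.format(**values)
# Every placeholder is now an absolute path or a concrete value.
```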

Platform official variables
| Variable | Meaning | Use case |
|---|---|---|
| ARGUMENTS | Args passed when invoking the Skill | Dynamic context injection |
| CLAUDE_SESSION_ID | The current session's unique ID | Log tracing, temp file naming |

ARGUMENTS is the workhorse. Build an Issue-review Skill, and a user typing /awp-issue-reviewer 42 makes ARGUMENTS resolve to 42. The doc becomes "review Issue #42" and the system can also auto-run a command to pull issue details.

The platform substitutes variables first, runs commands second, injects results into the doc, hands the whole thing to Claude. One line of config, real-time context auto-loaded.

keyword generation rules

keyword is the heart of run-directory naming. It's extracted from the user input and standardized.

| User input | keyword | Run directory |
|---|---|---|
| "Claude Code Tutorial" | claude-code | claude-code-20260123-103000/ |
| "@some-design-creator" | design-creator | design-creator-20260123-103000/ |
| "https://example.com/user/test" | test | test-20260123-103000/ |
| "React 19 Features" | react-19 | react-19-20260123-103000/ |

Standardization is a filter:

  1. Lowercase
  2. Spaces and special chars → hyphen
  3. Non-Latin scripts dropped (only ASCII letters/digits kept)
  4. URLs → keep last meaningful segment
  5. 32-char cap
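The mechanical part of that filter (everything except the URL rule and any semantic shortening) fits in a few lines; a sketch:

```python
import re

def standardize_keyword(raw: str, max_len: int = 32) -> str:
    """Lowercase, ASCII-only, hyphen-separated, capped at max_len chars."""
    s = raw.lower()
    s = "".join(ch for ch in s if ord(ch) < 128)   # drop non-Latin scripts
    s = re.sub(r"[^a-z0-9 \-]", "", s)             # strip special characters
    s = re.sub(r"[\s\-]+", "-", s).strip("-")      # runs of space/hyphen -> one hyphen
    return s[:max_len].rstrip("-")
```

Note that the spec's examples also shorten semantically ("Claude Code Tutorial" becomes claude-code, not claude-code-tutorial); that judgment call is the main Agent's, not this filter's.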

Multi-mode Skills add a mode prefix to the run directory: clone-design-creator-20260123-103000.

Substitution rules summary
| Rule | Notes |
|---|---|
| Workflow variables | Used in step docs and Prompt templates; substituted by the main Agent during execution |
| Platform variables | Used in SKILL.md; substituted by the platform on load |
| Paths must be absolute | Relative paths may not expand inside a SubAgent |
| No variable arithmetic | "version − 1" won't be evaluated |
| Don't pass content variables | Pass paths, not content |

7.2 Why it's designed this way

Why two variable systems?

Because they speak to different "readers."

Workflow variables are written for the main Agent. You write run_dir in a step doc; the main Agent reads context and substitutes the real path. That's a semantic convention.

Platform variables are written for the Claude Code platform. When a Skill is triggered, the platform does source-level string substitution. That's a system mechanism.

What if you mix them? Workflow variables in SKILL.md — the platform doesn't recognize them, no substitution. Platform variables in step docs — no platform pre-processing, Claude reads them as plain text. Each system only understands its own variables.

You may think "variable" sounds technical. You use variables every day. Your name is a variable, pointing at "you." Your phone number is a variable, pointing at your phone. Skill variables are the same — they happen to point at file paths.

Why ban variable arithmetic?

"version − 1" looks convenient — "I want to read the previous version." Reality is harsh: no engine parses that as math.

How do you read "the previous version"? Use a latest-pointer file.

Example: create feedback_latest.md, update it after each iteration to point at the newest version. The Prompt says "read feedback_latest.md under run_dir" — the SubAgent gets the latest version without ever knowing the version number.

Like the "new arrivals" shelf at a library — you don't memorize the latest call number; you walk to the shelf and pick up whatever's there.
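In code, the pointer update is one line of copying (a sketch; whether you copy content, write the path, or symlink is an implementation choice):

```python
from pathlib import Path

def update_latest(run_dir: str, newest_version: str) -> None:
    """Refresh feedback_latest.md so readers never need version arithmetic."""
    latest = Path(run_dir) / "feedback_latest.md"
    latest.write_text((Path(run_dir) / newest_version).read_text())
```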

Why must paths be absolute?

SubAgents run in isolated environments. Their "current directory" may not be what you assume. Relative paths can fail to expand in some environments.

Only absolute paths are deterministic — wherever they execute, they always point at the same file.

Like shipping a package — write "Leo's house" and the courier has no clue. Write the full address and you're golden.

Why standardize keyword?

Because keyword becomes a directory name, and directory names have hard rules:

  • No spaces (the shell splits paths on spaces)
  • No special characters (@ # / have meaning to the filesystem)
  • Case sensitivity differs (macOS default insensitive, Linux sensitive)
  • Length limits (the OS caps total path length)

Standardizing "Claude Code Tutorial" to claude-code gives you a clean, safe, cross-platform directory name. Keep the raw input in the progress file so you can show users the original when needed. Best of both.

Variables in motion across a real workflow

Concrete example. Trace one variable from birth to death:

Step 01 init → user enters "Claude Code Tutorial," keyword becomes claude-code, run_dir is generated, directory structure created, progress file written. run_dir is born.

Step 02 collect (script) → main Agent builds the command, passes run_dir as a script argument. Script runs, data lands in run_dir/step02-collect/. run_dir traveled to the script.

Step 03 evaluate (SubAgent) → main Agent builds the Prompt, fills run_dir and batch_id into file paths. SubAgent reads files, runs the eval, returns a status line. run_dir + batch_id traveled to the SubAgent.

Step 04 generate → final results land in run_dir/output/. The completion report references the output path.

See it? run_dir is a thread that ties init, script call, SubAgent Prompt, and final output together. Without that thread, every step is an island — script doesn't know where to write, SubAgent doesn't know where to read, final output has nowhere to go.

Variables are glue. Without them, every step is an island.

Try this: pick one task you currently run via SubAgent. Look at the steps. Which ones could be scripts?


Part 3 — Data layer (the foundation under runtime)

Data layer — isometric storage vault with runs, config, schema drawers and progress.json checkpoint tape

Quick question.

Your Skill has been running for 20 minutes. Data collection: done. Analysis: done. Scoring: done. Then Step 4 hits an API 429 (too many requests).

Start over? That's 20 minutes of work in the trash.

Worse — if you didn't persist intermediate files, you don't even know where to start over from.

This is not hypothetical. I hit it every month.

The data layer solves exactly this. It's not a nice-to-have. It's disaster recovery.


8. Run-data spec — the on-site record of every execution

8.1 What the spec says

Picture yourself as a detective. Every crime scene gets photos, video, notes. If you tossed every case's evidence into one box, querying any single case becomes a nightmare.

The run-data spec solves that. Every Skill execution is an independent "case file" — orderly logging and storage required.

The runs/ directory

All run data lives under runs/ at the Skill root. Each sub-folder is one independent run.

A typical Skill might have these:

| Run folder | Meaning |
|---|---|
| chatgpt-20260219-143052/ | The 14:30:52 run on 2026-02-19 |
| react-hooks-20260218-091530/ | Run from the day before |
| openai-api-20260217-200015/ | An older run |

Run-folder naming

Each name = keyword + timestamp.

The keyword half is extracted from user input via the standardization rules:

| Rule | Sample input | Extracted |
|---|---|---|
| Lowercase ASCII | "ChatGPT Tutorial" | chatgpt-tutorial |
| Strip special chars | "React.js & Vue!" | reactjs-vue |
| Drop non-Latin scripts | "Learn Python Basics 学习" | learn-python-basics |
| URL → key segment | "https://github.com/openai/gpt" | openai-gpt |
| 32-char cap | very long text… | truncated to 32 chars |

The other half is a second-precision timestamp (YYYYMMDD-HHMMSS), so even running the same keyword back-to-back doesn't collide.

Think of it as a tracking number: front half tells you who it's about, back half guarantees uniqueness.

Multi-mode Skills prepend the mode: clone-design-creator-20260123-103000.

Inside a run folder

Each run has fixed and dynamic sub-folders.

Fixed folders — only two, and required for any Skill:

| Folder | Purpose |
|---|---|
| state/ | Progress: "where am I" |
| output/ | Final output: what gets handed to the user |

Dynamic folders are defined by the workflow table, in stepNN-action/ form:

| Folder | Purpose |
|---|---|
| step01-fetch/ | Raw data from step 1 |
| step02-analyze/ | Intermediate results from step 2 |
| step03-generate/ | Drafts from step 3 |

Like an assembly line — each station has its own work-in-progress; only the final output ships from output/.

progress.json — the heartbeat of a run

state/progress.json is the most important file in the entire run-data spec. It's the live execution state. Key fields:

| Field | Notes |
|---|---|
| keyword / keyword_raw | Standardized keyword / original user input |
| created_at / updated_at | Created / last update timestamp |
| step | Which step is current |
| step_status | Per-step state: pending / running / done / failed |
| Resume hint | A memo for restoring state after a context compaction |

The resume hint is a clever bit — it records "which executor", "what constraints", "where to continue from." When a context compaction kicks in (the conversation got too long and had to free space), a fresh round can read this file and pick up where the last one stopped.

progress.json is the sticky note on your front door — "laundry's still spinning, milk in the fridge expires today."

You may think "progress file" sounds engineering-heavy. It's a JSON file recording three things: where you are, what worked, where to resume. That's it.
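A sketch of the two operations performed against that file (field names follow the spec; the merge-on-write behavior is one reasonable implementation):

```python
import json
from pathlib import Path

def save_progress(run_dir: str, **fields) -> None:
    """Merge new fields into state/progress.json, creating it if missing."""
    path = Path(run_dir) / "state" / "progress.json"
    data = json.loads(path.read_text()) if path.exists() else {}
    data.update(fields)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False))

def resume_point(run_dir: str) -> dict:
    """What a fresh Agent reads first after a context compaction."""
    return json.loads((Path(run_dir) / "state" / "progress.json").read_text())
```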

Two progress modes
| Mode | Use case | Analogy |
|---|---|---|
| Batch mode | Large data, processed in chunks | Moving 500 boxes, 100 at a time, "I've done 2 batches" |
| Item mode | Each item tracked independently | A teacher grading homework, status per student |

Batch mode records: total, batch size, completed batches, current batch number.

Item mode records: per-item state (pending, done, failed and retry count).

Cleanup policy
| Run state | Retention count | Retention time |
|---|---|---|
| Successful | Latest 5 | 7 days |
| Failed | Latest 10 | 30 days |
| Important (.keep marker file) | Forever | Forever |

Drop an empty .keep file in a run folder to mark it for permanent retention.

Resume flow

After a context compaction, the new Agent reads progress.json → checks the resume hint → continues from the breakpoint (full resume flow detailed in Chapter 18).

8.2 Why it's designed this way

Why one folder per run?

Isolation. If runs shared a folder, the second run would clobber the first run's intermediate files. Independent folders are full snapshots — replayable, individually deletable, mutually inert.

Why include keyword in the folder name?

Pure timestamps are unique but illegible. With 50 sub-folders, you can't tell which is which. The keyword is the tag on the folder.

Why are state/ and output/ the only fixed folders?

Minimum common subset. No matter what the Skill does, it needs to know "where am I" (state) and "what got produced" (output). Everything else is workflow-defined.

Why retain more failed runs?

Successful runs all look alike — 5 is plenty for reference. Failed runs are different in interesting ways — keeping more helps you spot patterns. Maybe every Step 3 failure is the same API timeout.

A real resume story

I once ran a Xiaohongshu collection Skill, 6-step workflow. At Step 4 (content scoring) the API returned 429. SubAgent retried 3 times, marked failed.

Without a progress file, the only option is "start over" — Step 1 init, Step 2 collection (150 notes, 5 minutes of waiting), Step 3 batching, all wasted.

But because progress.json recorded the breakpoint, in a fresh session I just said "resume the last run." Claude read the progress file, saw Steps 1–3 done and Step 4 failed at batch 3. It picked up at Step 4 batch 3 — not a second wasted.

That's what the data layer is for. Not a nice-to-have. Disaster recovery.

The progress file is your save point. Game Over → load → continue.


9. Parameter config spec — the three-layer config model

9.1 What the spec says

If you've used a camera, you already understand this model:

  • Press the shutter, dial the aperture by hand — per-shot manual settings
  • The camera has defaults — works without tweaking
  • Scene modes — landscape / portrait / night, predefined bundles for quick switching

Skill parameter config is the same three layers:

| Layer | Analogy | Source | Notes |
|---|---|---|---|
| L1 — Interactive | Adjust on the shutter | Asked at every run | Core params |
| L2 — Config | Camera defaults | Global default config file | Advanced params |
| L3 — Preset | Scene modes | Predefined option sets | Provides options for L1 |

L1 interactive — asked at every run

Collect core params via the interactive component, store in config.json under the run directory.

Typical fields: keyword, language, output format, creation timestamp.

Key principle: ask the minimum. 3–4 core params at most. If a param is selected the same way 90% of the time, it doesn't belong in L1 — push it to L2 as a default.

Picture walking into a coffee shop. The barista asks: "What kind of coffee? Large or medium?" Not: "What water temperature? Paper cup or ceramic?"

L2 config — global defaults

Lives at config/default.json, grouped by functional module, single level of nesting allowed.

Typical groups:

| Module | Params |
|---|---|
| api | timeout, retries, request interval |
| processing | batch size, max items |
| output | language, output format |

Core principle: sensible defaults — runs without modification, like a new laptop out of the box.

L3 preset — predefined option sets

Lives at reference/presets/. Provides options for L1 prompts.

A "markets" preset, for instance, contains "US English," "China Chinese," "Japan Japanese," with "US English" marked as default. The user sees a dropdown; the data behind it comes from the preset file.

Param precedence

Closer to the moment of execution wins:

L1 runtime > L2 default > script-internal default

Like CSS: inline > class selector > tag selector.

If the user says "set timeout to 60 seconds" right now, that's because they know this API will be slow today. That's more reliable than the default I set three months ago.
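The precedence rule as a merge (a sketch; the one-level nesting mirrors the L2 rule, and the defaults are illustrative):

```python
SCRIPT_DEFAULTS = {"api": {"timeout": 30, "retries": 3}}  # built into the script

def resolve_params(l2_defaults: dict, l1_runtime: dict) -> dict:
    """Closer to execution wins: L1 runtime > L2 config file > script default."""
    merged: dict = {}
    for layer in (SCRIPT_DEFAULTS, l2_defaults, l1_runtime):  # later layers win
        for module, params in layer.items():
            merged.setdefault(module, {}).update(params)
    return merged
```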

Domain config (optional)

config/domain.json holds business-logic config (scoring weights, content filtering rules) — separate from technical params. Because changing scoring weights is a product call; changing API timeout is a technical call. Different decision-makers, different change cadence.

9.2 Why it's designed this way

I designed a Skill once with all 12 params in L1. The user had to answer 12 questions every run. After the third use they said something I'll never forget: "Can you stop asking me so much? I just want to push a button."

After that day, "L1 ≤ 3-4 params" became iron law.

Why three layers?

Different params change at different rates. L1 changes every run (search keyword). L2 changes every few months (API timeout). L3 is fixed at release (which languages we support).

Mixing change rates is like throwing daily essentials and annual decorations into the same drawer.

Why minimize L1?

Every extra question is one more chance to annoy the user. 3–4 core params is the sweet spot validated in practice.

If you need 10 params to run, the abstraction is wrong — split into multiple Skills, or push more to L2.

Good defaults let 90% of users start with zero config, while 10% of power users tune freely.

Why only one level of nesting in L2?

One level is unambiguous — api.timeout is obvious.

Allow deep nesting and you get "api.retry.strategy.backoff.initial_delay" — dizzying. One level is the sweet spot between readability and expressiveness — keep grouping benefits, dodge the nesting maze.


10. Parameter Schema spec — the blueprint for dynamic forms

10.1 What the spec says

Filled out a government form? The boxes are pre-printed: name, ID, phone. Each box has format hints ("11-digit mobile only").

A parameter Schema is that form template — it doesn't contain the data, it defines what to fill, how to fill, what valid looks like.

Location: config/params.schema.json

Basic structure

A Schema file has a version, source identifier, and field list. Each field defines:

Property Required Notes
key yes Unique param identifier, supports dot path (e.g. processing.limit)
label yes Human-readable display name
type yes Data type (one of seven)
required no Whether mandatory
default no Default value
preset no Points to a preset file (when type is preset)
ui no Render hints (placeholder, helper text, min/max)
Dot path alignment

Schema key supports dot path, aligning with the nested structure of the default config file.

Write key: "processing.limit" and it maps to the limit field under the processing module in defaults. Like a mailing address — "Beijing.Chaoyang.Some Road" maps to the actual hierarchy.

With 20 params, a nested-Schema definition becomes vertigo. Dot path flattens the nesting — one dot expresses hierarchy, structure preserved, nesting hell avoided.
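Resolving such a key is a simple walk down the nesting (a sketch):

```python
def get_by_dot_path(config: dict, key: str, default=None):
    """Resolve a Schema key like "processing.limit" against nested config."""
    node = config
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```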

Seven data types
Type Notes Typical use
string Single-line text Keyword, name, URL
integer Whole number Count, page
number Float Ratio, weight
boolean Yes/No Toggle
text Multi-line text Prompt template
preset (preset reference) Points to a preset file Bridges Schema and presets
json (free structure) Complex data Escape hatch — fits anything

First five are basic types; preset is the bridge between Schema and presets; json is the escape hatch.

Why exactly 7? Fewer than 7 forces you into manual type conversion. More than 7 hits a learning cliff. Four basic scalars + one extended text + one reference + one escape hatch = the minimum set covering all common needs. Seven Lego pieces — looks simple, builds anything.

Core principle: Schema describes, doesn't supply values

Schema is the blueprint, not the bricks.

Schema says "this field is keyword, string, required" — it doesn't say keyword is "ChatGPT." Actual values come from the three-layer config system.

Runtime injection

The system injects params via two environment variables into scripts:

| Variable | Notes |
|---|---|
| SKILL_PARAMS_JSON | L1 raw — what the user provided this run |
| SKILL_PARAMS_RESOLVED | Merged — L1 + L2 + built-in defaults, layered |

99% of the time use the resolved version. Use raw only when you need to distinguish "user-chosen" from "system-default."
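Inside a script, picking up the injected params is two lines (the env var names are the spec's; the empty-object fallback is a defensive sketch):

```python
import json
import os

def load_params() -> dict:
    """Read the merged L1 + L2 + defaults parameter set injected by the system."""
    return json.loads(os.environ.get("SKILL_PARAMS_RESOLVED", "{}"))
```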

10.2 Why it's designed this way

Ever filled out a form and learned mid-way that the format was wrong? I have. Uploaded a PDF — the system says "JPG only." Re-upload, "file too large." Schema exists to kill that experience — tell the user the rules before filling, not after.

Why a separate Schema file?

With Schema, the system can auto-generate interactive forms, auto-validate params, auto-generate documentation. That's the power of declarative design — you say "what I need," the system handles "how to do it."

Why an array of fields, not an object?

Params have order — keyword first, then language, then count. JSON object keys are unordered in theory. Arrays are ordered by nature. Field order is the user-facing order.

Why separate Schema from values?

Changing a default shouldn't change the Schema. Schema is a "structural contract"; defaults are an "operational decision." They have different change cadences and different approval flows. Physical separation is the right design.

Seven types, seven Lego pieces — looks simple, builds anything.

Try this: add a progress.json to one of your Skills. The next time it dies mid-run, resume from the checkpoint.


Part 4 — Resource layer (the polish)

Resource layer — isometric toolbox with credentials, constants, presets, and HTML template compartments

One case I've seen: someone hard-coded an API key inside a Prompt, pushed the code to GitHub, and a scraper picked it up overnight. $3,000 in API quota burned in one night.

Another classic: three step docs each hard-coded "professional," "Professional," and "pro" — same concept, three spellings. The SubAgent treated them as three separate categories.

The resource layer kills these. Parts 1–3 made the Skill runnable. The resource layer makes it safely runnable, consistently runnable, gracefully runnable.

The resource layer at a glance — four resources, one table

Before diving into each, the global picture:

| Resource | Folder | For whom | Core role | One-line distinction |
|----------|--------|----------|-----------|----------------------|
| Credentials | credentials/ | Scripts | Safe API key storage | Keyring — opens doors |
| Constants | reference/definitions/ | System / developer | Kill hard-coding, unify the data dictionary | Menu's flavor categories |
| Presets | reference/presets/ | Users | Data source for interactive choices | Today's recommendations |
| Templates | reference/templates/ | Output layer | Turn data into pretty pages | The typesetter — makes things look good |

How they relate: credentials let scripts hit APIs to fetch data; constants define the legal values for that data; presets give users choice surfaces; templates turn the result into something nice to look at. Each has its own role; together they make a Skill feel "good to use."


11. Credential management — safety first

11.1 What the spec says

One-line definition: a credential file is a Skill's keyring — it stores all API keys (the access tokens for external services), tokens, and service account info, so scripts can call external services safely.

You may think: I have one API key, what does it matter where it lives? Answer: the wrong place can really hurt.

To enter a corporate building you need a badge. For a Skill to call an external API, it needs an API key. The credential file is where the badge lives — the spec ensures every badge sits in its assigned slot, not loose.

Location: credentials/

The only allowed format: JSON

Iron rule. Credential files are JSON only. No Markdown, no .env (environment-variable file), no YAML.

Why so strict? Scripts need to parse credentials reliably. JSON parses with the standard library of any language. If multiple formats were allowed, scripts would need branching: "if YAML do this, if .env do that."

One format unifies the world. All branches disappear.

Standard credential structure

Each credential file has these key fields:

| Field | Required | Notes |
|-------|----------|-------|
| schema_version | yes | Version, currently "1.0" |
| name | yes | Service identifier, lowercase (e.g. tikhub, openai) |
| kind | yes | Authentication type (table below) |
| auth | yes | Auth info (structure varies by kind) |
| status | no | Status flag (active / expired) |
| description | no | Service purpose |

kind types (auth types)

The kind field is the "model number" of a key — different model, different lock:

| kind | When | Typical services |
|------|------|------------------|
| api_key | Single-token call | OpenAI, platform open APIs, Brave |
| oauth1 | OAuth (open authorization protocol) 1.0a | Legacy social platform APIs |
| oauth2 | OAuth 2.0 | Google, GitHub Apps |
| username_password | Username + password login | Legacy web services |
| ssh_key | SSH (secure shell) + Token | GitHub SSH |
| multi_account | Account collection | Multiple accounts on the same service |
| reference | Pointer to external info (no secret) | Server lists, doc links |

Different kinds have different auth shapes, like different key teeth:

  • api_key (most common, simplest): one auth method + one token
  • oauth1 (e.g. legacy social APIs): API key + key secret + access token + access token secret + Bearer Token — five fields
  • oauth2 (e.g. Google): client ID + client secret + access token + refresh token + token endpoint URL
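Putting the field table and the api_key shape together, a credential file might look like this sketch (the auth sub-fields are illustrative, not a fixed spec):

```json
{
  "schema_version": "1.0",
  "name": "openai",
  "kind": "api_key",
  "auth": {
    "token": "sk-REPLACE-WITH-YOUR-KEY"
  },
  "status": "active",
  "description": "OpenAI API for text generation"
}
```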
Credential independence — every Skill self-contained

The most important design principle in credential management: each Skill keeps all credentials inside its own credentials/, no external paths.

What's "no external paths"? Your scripts must never reference a credential file outside the Skill folder.

Why? Because credential independence = Skill portability. Imagine sharing your Skill with a teammate — if the credential lives in your personal external folder, their machine doesn't have that folder, and the Skill won't run. With credentials inside the Skill, they only need to drop in their own API key.

Standard script credential read

Three steps: locate file → parse JSON → extract token. Concise, reliable, unambiguous.
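The three steps can be sketched like this, assuming an api_key-kind file whose token lives at auth.token (the helper name is mine):

```python
import json
from pathlib import Path

def load_token(skill_root, service):
    # 1. Locate the file inside the Skill's own credentials/ (no external paths)
    cred_path = Path(skill_root) / "credentials" / f"{service}.json"
    # 2. Parse JSON with the standard library
    cred = json.loads(cred_path.read_text(encoding="utf-8"))
    # 3. Extract the token (api_key kind assumed)
    return cred["auth"]["token"]
```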

Safety measures

| Measure | What it looks like |
|---------|--------------------|
| .gitignore excludes real keys | Credential files in the ignore list |
| Don't log secrets | Scripts never print tokens to output |
| Don't hard-code secrets | Read from file, never inline |
| Don't redact during review | When auditing a Skill, leave configured real keys alone |

11.2 Why it's designed this way

Why JSON only?

The answer hides in the "kill the branches" design philosophy.

If we allowed three formats — JSON, YAML, Markdown — that's 3× the parsing logic, 3× the edge cases, 3× the bug surface. YAML's indentation rules trip people; Markdown table parsing needs regex. JSON is the lingua franca — zero deps, zero ambiguity.

Why credential independence?

Thought experiment. Suppose your Skill depends on an external credential path:

| Scenario | What happens |
|----------|--------------|
| Share with a teammate | They don't have that path; Skill errors on import |
| Move to another machine | External path may not exist |
| External credential format changes | Your Skill's parser silently breaks |
| Debugging an error | Hard to tell if the bug is in the Skill or the external config |

With credential independence, all those risks vanish. The Skill is a self-contained "app" — drop it in, fill in the keys, run.

Why does auth structure vary by kind?

Because the underlying auth methods differ that much. Single-token only needs a token. OAuth 2.0 needs client ID + secret + access token + refresh token + token endpoint — five fields.

Force them into one shape and you either lack fields or have lots of empty ones — like using the same form for "key card number" and "bank account info." Naturally different formats.

The kind field is a type tag; the script reads it and knows which fields to look for. Type tags keep parsing crisp.


12. Constant definitions — kill the magic strings

12.1 What the spec says

One-line definition: a constant definition file is a Skill's data dictionary — collect hard-coded strings scattered across the code into one place, give each value a name and a description.

What's a "magic string"? A hard-coded value that just appears in code with no context.

Say your Skill judges tone style and the code has "professional", "casual", "friendly" sprinkled around. Where did those come from? What are all the valid values? Who defined them? If you wanted to add "humor", how many places would you change?

The constant-definition file is the antidote — every legal value lives in one place.

Location: reference/definitions/

Common types

| Type | Filename | Use |
|------|----------|-----|
| Format | format-definitions.json | Output formats |
| Tone | tone-definitions.json | Tone styles |
| Category | category-definitions.json | Category labels |
| Scoring | scoring-definitions.json | Scoring dimensions (with weights and scales) |
| Status | status-definitions.json | Status enums |

Standard structure

Each definition file has version, type, purpose, and a list of definition items. Each item has:

| Field | Required | Notes |
|-------|----------|-------|
| id | yes | Unique identifier, lowercase + hyphens |
| name | yes | Display name (e.g. "Professional") |
| description | recommended | Detailed explanation |
| example | optional | Example |
| default | optional | Whether default |
| weight | optional | Weight (for scoring definitions) |
| scale | optional | Range (for scoring definitions) |

Scoring definitions are the most complex type. Each item is more than a label — it's a complete evaluation standard, with weights (0–1, summing to 1 across all items) and scoring range. The SubAgent reads it and is ready to judge.
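A sketch of a scoring-definitions.json following the field table above; dimension names, weights, and scales are illustrative:

```json
{
  "version": "1.0",
  "type": "scoring",
  "purpose": "Content quality scoring dimensions",
  "items": [
    { "id": "readability", "name": "Readability", "weight": 0.4, "scale": [1, 10] },
    { "id": "accuracy", "name": "Accuracy", "weight": 0.6, "scale": [1, 10] }
  ]
}
```

Note the weights sum to 1 across the items, as the spec requires.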

Usage

In a Prompt, reference the path (don't pass content) — tell the SubAgent where the scoring rubric is and let it read it. Context stays clean.

12.2 Why it's designed this way

Why kill magic strings?

The name itself is telling — "magic" string. Like real magic, you don't know where it came from, why it's there, or what blows up if you change it.

Magic strings are the petri dish of code rot. Real scenario:

Your Skill uses tone in three places — Step 01 user choice, Step 03 Prompt template, Step 05 output formatting. Hard-coded everywhere. Now requirements change: rename "professional" to "formal". You search-replace across three files. Miss one and you've got a bug.

With a definition file: all three reference tone-definitions.json. Change once, global effect. Zero misses, zero inconsistency.

That's the single source of truth principle — define a concept once; everything else references that single definition.
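A sketch of a script validating against the single source of truth. The file content is inlined here so the example runs standalone; in a Skill it would be read from reference/definitions/tone-definitions.json:

```python
import json

tone_definitions = json.loads("""
{"items": [{"id": "professional", "name": "Professional"},
           {"id": "casual", "name": "Casual"}]}
""")

valid_tones = {item["id"] for item in tone_definitions["items"]}

def validate_tone(tone):
    # Every caller checks against the same definition; rename once, done.
    if tone not in valid_tones:
        raise ValueError(f"Unknown tone {tone!r}; valid: {sorted(valid_tones)}")
    return tone
```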

Why separate from presets?

Most-asked beginner question — definitions and presets look similar, why two folders?

One-line distinction: definitions are "internal system constants," presets are "user-facing option sets."

| Dimension | definitions | presets |
|-----------|-------------|---------|
| For whom | System and developers | Users |
| Typical use | Type checks in scripts, standards in Prompts | Source of interactive choices |
| Examples | Tone types, scoring dimensions, output formats | Persona presets, target markets, keyword sets |

Simple analogy: definitions are the menu's "cuisine categories" (Sichuan, Cantonese, Shandong); presets are "today's recommendations" (Kung Pao Chicken, White-Cut Chicken, Sweet & Sour Carp). Categories are internal logic; recommendations are user choices.

Change one place, the whole world updates — that's the power of single source of truth.


13. Preset configs — the data source for user choices

13.1 What the spec says

One-line definition: a preset file is a "menu" — when a Skill needs to ask "which one do you want?", the choices are loaded from preset files.

Picture walking into a tea shop. The clerk doesn't say "tell me anything" (decision paralysis); they hand you a menu: "Bubble milk tea, taro milk tea, Yang Zhi Gan Lu — today's pick is Yang Zhi." That menu is the preset file's job.

Location: reference/presets/

Common types

| Type | Filename | Use |
|------|----------|-----|
| Persona | persona.json | AI persona configs |
| Markets | markets.json | Target market list |
| Keywords | keywords.json | Predefined keyword sets |
| Templates | templates.json | Content templates |
| Topics | topics.json | Topic categories |

Standard structure

Base structure mirrors definitions (version + item list), but each preset item can carry arbitrary extension fields.

Because a preset is essentially a "config bundle" — each option is a group of params. A persona preset, beyond id and name, carries tone, vocabulary style, sentence preferences, and other detailed traits.
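A sketch of a persona preset; beyond id and name, the extension fields (tone, vocabulary, and so on) are illustrative:

```json
{
  "version": "1.0",
  "items": [
    {
      "id": "tech-blogger",
      "name": "Tech Blogger",
      "default": true,
      "tone": "casual",
      "vocabulary": "plain words, minimal jargon",
      "sentence_preference": "short sentences, active voice"
    }
  ]
}
```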

Relationship with user interaction — the main use

Presets exist mainly to feed user interactions. The init step in a workflow typically asks the user a few questions; the options shouldn't be hard-coded in the step doc — they should load from preset files.

Key constraints

| Constraint | Value | Why |
|------------|-------|-----|
| Option count | 3–6 | Too few is meaningless, too many is a maintenance burden |
| Interactive display | Up to 4 | UI limit, beyond which options don't render |
| Default option | At least one | Fallback when the user picks nothing |
| Hard cap | ≤ 20 | Past 20, both users and maintainers struggle |

Three-layer config relationship

Data flow: L3 preset → loaded as L1 options → user picks → written to run config.

13.2 Why it's designed this way

Why load options from a file instead of hard-coding?

Same single-source-of-truth principle. Suppose you hard-code four market options in a step doc, and later need to add a "Korea (Korean)" market: you find the doc, edit the options, ensure formatting. If multiple steps reference the market list, that's multiple edits.

From a preset file: edit once, every reference reflects it.

Why 3–6 options instead of more?

Choice overload is real. The classic experiment: a supermarket displaying 24 jam flavors got more tastings but fewer purchases; 6 flavors got fewer tastings but more purchases.

Same for Skills. 3–6 carefully picked options + an "Other" fallback beats 20 options every time.

Why at least one default?

Because "the user might not pick anything." Without a default, the Skill either errors on empty selection or silently uses the first option. Explicit default = predictable behavior. In the UI, default is usually marked "(Recommended)" to nudge a quick decision.

Six curated options beat twenty raw ones, every time.


14. HTML (web markup) templates — the pretty output

14.1 What the spec says

One-line definition: HTML templates are a Skill's typesetter — when you need to generate a polished HTML report, email, or card, the data drops into a template.

If scripts handle "calculate," templates handle "look." Raw data is structured info; templates turn it into a professional report with titles, tables, color schemes — the difference between an Excel sheet and a slide deck.

Location: reference/templates/

Typical directory layout

| File / Folder | Use |
|---------------|-----|
| report.html | Report template |
| email.html | Email template |
| card.html | Card template |
| shared/ | Shared style folder |
| shared/base.css | Base styles |
| shared/components.css | Component styles |

Variable placeholder syntax

Templates use double curly braces to mark substitution points:

| Syntax | Notes | Use |
|--------|-------|-----|
| Double-brace name | Simple variable substitution | Title, name |
| Double-brace dotted path | Nested fields | Username, nested data |
| List loop syntax | Iterate an array | List items |
| Conditional render | Show/hide on condition | Score highlighting |

Note: template double braces and step-doc single braces are different systems. Single braces are workflow vars (substituted by the main Agent). Double braces are template vars (substituted by the rendering engine).

Rendering options

Two ways, choose by need:

  • Simple substitution: for low-variable templates, just string replacement
  • Template engine rendering: for templates with loops and conditions, use a real templating engine
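The simple-substitution option can be sketched in a few lines; this handles plain and dotted placeholders but not loops or conditions, which need a real template engine:

```python
import re

def render(template, data):
    # Replace {{name}} and {{dotted.path}} placeholders with values from data.
    def lookup(match):
        value = data
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)

html = render("<h1>{{title}}</h1><p>{{user.name}}</p>",
              {"title": "Report", "user": {"name": "Ada"}})
# → "<h1>Report</h1><p>Ada</p>"
```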
Shared styles

Styles shared across templates go to shared/: base reset, fonts, containers, headings, paragraphs in base.css; cards, badges, tables, tags and other reusable components in components.css.

Inline styles win — important rule. If the template generates emails, external CSS files won't load in mail clients; only styles inlined on HTML tags work. Email templates must inline; web templates can reference external CSS.

Banned

| Banned | Why |
|--------|-----|
| External CDN (content delivery network) links | Offline environments can't reach them |
| JavaScript (web scripting language) logic | Templates display only; logic belongs in scripts |
| Hard-coded sensitive data | Inject as variables at render |

14.2 Why it's designed this way

Why a template instead of stringing HTML in scripts?

Stringing HTML inside scripts is like writing an essay with print — quotes, escapes, indents everywhere; changing one style means trawling a wall of code.

Separating templates lets a designer change HTML without touching code, and a programmer change logic without touching style. Crisp roles.

Why ban JavaScript?

Templates are a "pure presentation layer" — receive data, render the page, done. JavaScript turns a template into an "app."

All logic should live in scripts. Data gets prepared; the template just makes it look right. Same root as "a function does one thing."

Why ban external CDNs?

A Skill might run offline, on an internal network, or on a plane. If the template references external stylesheets, broken network = broken layout. Bundle everything in shared/. Renders perfectly offline.

Same family as credential independence — no external dependencies, everything self-contained.

Data carries the truth; the template makes it presentable. Different jobs.

Try this: scan your Skill code. Any hard-coded strings? Move them to definitions/.


Part 5 — Engineering (the craft)

Engineering layer — isometric workshop with wrench, guide scroll, changelog ledger, and error-triage diagram

A Skill I'd built half a year ago, run hundreds of times, dead stable. One day I switched laptops — and it wouldn't run.

The error was: ModuleNotFoundError: No module named 'httpx'

Took me 20 minutes to remember: this laptop didn't have the venv set up. If I'd written a setup.md back then, it would have taken 2 minutes.

That's the value of engineering. The previous parts made the Skill run and run well; this part makes it run reliably. Environment init lets others reproduce your environment. The user guide lets non-technical users get going. Version history records the trajectory. Troubleshooting leaves a paper trail when things break.

You may think this is the icing. Trust me, finishing the code is the start. Letting other people use it, maintain it, and debug it is what separates a hack from a craft.

Picture cooking a great dish. Without a recipe, no one can reproduce it. Without an ingredients list, the wrong ingredient ruins it. Without notes on pitfalls, the next cook trips over the same rake. The engineering docs are your recipe, ingredients list, and pitfall notes.


15. Environment init — writing setup.md

15.1 What the spec says

One-line definition: setup.md is a Skill's emergency manual — when a user hits a technical error, they open it and follow the recipe to fix the environment.

You may think setup.md is "writing nobody reads." Until the day a Skill you haven't touched in three months errors out and you fix it in two minutes by opening setup.md — that's when you'll thank yourself.

Note the keyword: "errors out." setup.md is not a usage tutorial, not a feature intro — it's a problem-driven repair guide.

How it differs from guide.md

| File | Role | For whom | When opened |
|------|------|----------|-------------|
| setup.md | Fix the machine | Technical users | When you hit ModuleNotFoundError, API 401 |
| guide.md | Teach usage | Non-technical users | When the question is "how do I use this thing" |

Memory aid: setup fixes the machine, guide teaches usage.

When do you need setup.md?
| Condition | Needed? |
|-----------|---------|
| No external deps (pure-doc Skill) | No |
| Has script deps | Yes |
| Needs API credentials | Yes |
| Has special environment requirements | Yes |

Simple test: if your Skill is just SKILL.md + workflow/, no scripts/ or credentials/, you don't need setup.md.

Standard chapter structure

A complete setup.md has five chapters:

1. Install location — where the Skill can be installed (user, project, enterprise) and the precedence.

2. Runtime environment — runtime requirements: Python version, package manager, etc.

3. Dependency install — installing the package manager, entering the script directory, installing deps, running scripts. Full step list.

4. Credential config — which credentials are required, file locations, mandatory or not, how to obtain them.

5. Error troubleshooting — the heart of setup.md. Table form: error class, symptom, possible cause, fix, verify.

Six-layer error classification

Troubleshooting uses a layered diagnostic model, bottom up:

| Layer | Class | Typical errors |
|-------|-------|----------------|
| L1 | Runtime | Python version too old, missing env var |
| L2 | Dependencies | Module not found, version conflict |
| L3 | Credentials | 401 unauthorized, key format wrong |
| L4 | Network | Connection timeout, rate limited |
| L5 | Path | File not found, permission denied |
| L6 | Progress | State lost, resume failed |

Diagnostic order: bottom up. Like fixing a computer — check power first (L1), then hardware (L2), then software (L3–L6). If the power cable's out, no point looking elsewhere.
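The bottom-up order can be sketched as a single diagnostic pass; the module and file names are illustrative, and the first failing layer is where you stop and fix:

```python
import importlib.util
import sys
from pathlib import Path

def diagnose():
    # Check the lowest layer first; a failure there explains everything above it.
    if sys.version_info < (3, 10):
        return "L1 runtime: Python too old"
    if importlib.util.find_spec("httpx") is None:
        return "L2 dependencies: httpx not installed"
    if not Path("credentials/openai.json").exists():
        return "L3 credentials: key file missing"
    return "L1-L3 clear; check network, paths, progress (L4-L6)"
```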

15.2 Why it's designed this way

Why split setup.md from guide.md?

Different readers, different problems.

A non-technical user opening setup.md and seeing terminal commands and error stacks is confused — they just want to know how to call the Skill. A technical user opening guide.md during an error sees "how to trigger" and "where output goes" — they wanted fix commands.

Splitting means each doc serves one audience. Higher info density, faster lookup.

Why layer the troubleshooting?

Errors have dependencies. Wrong runtime version (L1) → dependencies fail (L2) → no point looking at higher layers.

Layering imposes order — clear the lowest layer first, climb up. Saves time on phantom upper-layer hunts. Same idea as the OSI network model's layered debug: physical → link → network → application. Lower layer broken means upper layer broken.

Six-layer error classification is a body scan — bones to brain, layer by layer.


16. User guide — writing guide.md

16.1 What the spec says

One-line definition: guide.md is the Skill's product manual — for first-time non-technical users, simplest language, answers "what is this, how do I use it, where's the output."

When is it needed?
| Condition | Needed? |
|-----------|---------|
| Single-step Skill | No (SKILL.md is enough) |
| Workflow ≥ 3 steps | Recommended |
| SKILL.md > 300 lines | Recommended |
| Built for non-technical users | Mandatory |

Core chapters

A guide.md answers four questions:

1. What does this Skill do? Two or three sentences on function and value, from the user's POV — focus on "what you'll get." Crucially, add a "for example" — concrete scenarios beat abstract descriptions every time.

Example: "Type @some-design-creator and 10 minutes later you'll have a report covering their post cadence, top topics, and writing style."

2. How do you call it? Three call methods: slash command (recommended), natural-language trigger, parameterized invocation.

3. Inputs and outputs — table form: what's the input, where's the output file, what format.

4. Usage flow — numbered, one thing per step.

Term consistency — call them "Skills" everywhere (not "skills," not "plug-ins"); call it "invoke" everywhere (not "execute," not "run").

Examples must be runnable — give real values, copyable straight into the terminal.

16.2 Why it's designed this way

Why does guide.md only answer "what / how", not technical detail?

Because the audience is non-technical. They don't care how many steps you have inside, what scripts, or how the SubAgent Prompt is written. They care about three things: what does this do for me, how do I start it, where's the output.

Technical detail belongs in SKILL.md (for developers). Environment problems belong in setup.md (for ops). guide.md is "the manual a PM would write," not "the manual an engineer would write."

A good doc feels elegant to the smart and simple to the new.


17. Version history — writing changelog.md

17.1 What the spec says

One-line definition: changelog.md is the Skill's growth diary — every version's changes recorded, so anyone can trace the evolution.

Versioning: SemVer

Format: vX.Y.Z

| Position | Name | Triggered by | Example |
|----------|------|--------------|---------|
| X | Major | Incompatible architectural change | v2.0.0 (full rewrite) |
| Y | Minor | New features, backward compatible | v1.2.0 (new step) |
| Z | Patch | Bug fix | v1.2.1 (fix bug) |

Quick lookup

| What you did | Version bump |
|--------------|--------------|
| Added a workflow step | Minor +1 |
| Added a parameter (backward compatible) | Minor +1 |
| Removed / renamed a parameter | Major +1 |
| Bug fix | Patch +1 |
| Performance improvement | Patch +1 |
| Full rewrite | Major +1 |

See v1.2.0 → v1.3.0 and you know it's a new feature, backward compatible, safe to upgrade. See v1.3.0 → v2.0.0 and you know there are breaking changes: read the changelog before upgrading.

Version numbers carry meaning. Better than incrementing integers.

Change classes

Each version groups changes by type: New features, Improvements, Fixes, Changes, Removed.

Writing rules
| Rule | Notes |
|------|-------|
| Reverse chronological | Latest version on top |
| Explicit dates | Format YYYY-MM-DD |
| Clear classification | Group by the types above |
| User-facing | Describe impact, not implementation |
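Putting the rules together, a changelog entry might look like this sketch (versions, dates, and items are illustrative):

```markdown
## v1.3.0 — 2025-06-14

### New features
- Added Step 05: render an HTML report from the run's output

### Fixes
- Resume no longer loses the run directory after a context compaction
```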

17.2 Why it's designed this way

Why reverse chronological?

Because users care most about the latest version. They open the changelog to know "what just changed," not to read from the v1.0.0 origin story. Latest on top, one glance gets it. Like news headlines — the freshest is the headline.

Why describe in user-facing terms?

Compare:

  • Developer-facing (bad): "Refactored merge_results in batch_processor.py, moved time complexity from O(n²) to O(n log n)"
  • User-facing (good): "Optimized batch merge performance; large data set processing 50% faster"

Users don't care which function you touched or which algorithm you used — they care what it means for them.

Version numbers carry meaning — see v2.0.0, you know to be careful.


18. Troubleshooting — writing troubleshoot.md

18.1 What the spec says

One-line definition: troubleshoot.md is the Skill's repair manual — broader and more systematic than setup.md, covering every error a user might hit at runtime.

Difference vs. setup.md: setup.md focuses on "environment init" (install, configure); troubleshoot.md covers "runtime errors" (problems hit during execution). If setup.md is the renovation guide, troubleshoot.md is the daily repair manual.

Keep the six-layer classification from setup.md (L1 runtime → L6 progress, see Chapter 15) and list the runtime-specific errors here: context overflow, Agent stuck, Schema validation failure, pagination losing data, paths not expanded, corrupt progress file, etc.

"Skill loaded but Claude ignores instructions" (official diagnosis)

Anthropic's guidance specifically addresses "the Skill loaded but Claude isn't following instructions" with four common causes:

| Cause | Fix |
|-------|-----|
| Instructions too verbose | Stay concise, use bullets and numbers, push detail to references/ |
| Critical instructions buried | Put them at the top under a ## Critical heading |
| Vague language | Replace "make sure to validate" with a concrete checklist |
| Model laziness | Add "do not skip the validation step" in the user prompt (not SKILL.md) |

Pro tip: for critical validation, use a script instead of natural-language instructions — code is deterministic, language understanding isn't. This dovetails with my "use a script if you can" principle.

Five-layer validation

For critical outputs, validate layer by layer:

| Layer | Check | On failure |
|-------|-------|------------|
| 1 | File exists and non-empty | Retry |
| 2 | Format valid (parses) | Retry |
| 3 | Required fields present | Retry |
| 4 | Field values in range | Mark anomaly |
| 5 | Business rules pass | Mark failed |

Layers 1–3 can auto-retry — file missing or format wrong is usually transient. Layer 4 onward, the data itself may be the problem; auto-retry is pointless, mark and let a human look.
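A sketch of the five layers as one function; the required fields, ranges, and categories are illustrative. Returning the layer number and a verdict lets the caller decide between retrying and flagging for a human:

```python
import json
from pathlib import Path

def validate_output(path):
    p = Path(path)
    if not p.exists() or p.stat().st_size == 0:
        return 1, "retry"                        # layer 1: exists & non-empty
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return 2, "retry"                        # layer 2: format parses
    if not {"score", "category"} <= data.keys():
        return 3, "retry"                        # layer 3: required fields
    if not 0 <= data["score"] <= 100:
        return 4, "anomaly"                      # layer 4: values in range
    if data["category"] not in {"tech", "life"}:
        return 5, "failed"                       # layer 5: business rules
    return 0, "ok"
```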

Retry strategy

See the error-recovery table in Chapter 4 (exponential backoff, 429 wait, 4xx no retry). Add one runtime-specific rule: timeout error → bump the timeout, retry.

Cross-step recovery

When a Skill dies mid-step, how do you resume?

  1. Read the progress file state/progress.json
  2. Inspect each step's status
  3. All steps complete → workflow done
  4. Current step running → continue from current
  5. Current step failed → judge if retryable (timeout / rate limit yes; 401 / format error no, needs rollback)
  6. Earlier step failed → restart from that step

Context-rebuild on resume:

  1. Read the progress file (where am I)
  2. Check the resume hint (which executor)
  3. Read the corresponding step doc (how to do it)
  4. Verify earlier steps' output files exist (deps intact)
  5. Continue from breakpoint
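The recovery decision can be sketched as a scan over the progress file; the steps-list shape (a name plus status per step) is an assumption for illustration:

```python
import json
from pathlib import Path

def find_resume_point(progress_path):
    progress = json.loads(Path(progress_path).read_text(encoding="utf-8"))
    for step in progress["steps"]:
        if step["status"] != "completed":
            return step["name"]      # first unfinished step: resume here
    return None                      # all steps complete: workflow done
```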

Resume isn't a tech feature. It's respect for the user's time.

18.2 Why it's designed this way

Why five layers, not just generic error catching?

Generic catching only tells you "something failed." Five-layer validation is a body scan — bones, blood, heart, lungs, brain — pinpointed to a system.

When layer 3 (field completeness) fails, you know the file format is fine and parsing is fine, but some fields are missing — likely an upstream output template change. Compared to a vague "parse error," "missing fields: score, category" lands you on the bug instantly.

Why no retry on 4xx?

4xx means "client error" — your request itself is wrong. 401 = invalid key, 403 = no permission, 404 = resource doesn't exist. Retry the same invalid request, same result. Forever.

5xx means "server error" — server momentarily struggling. Wait and retry; the server may recover.
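The retry rules reduce to a small decision function; the attempt cap is an illustrative choice:

```python
def should_retry(status, attempt, max_attempts=3):
    # Give up after the cap regardless of status.
    if attempt >= max_attempts:
        return False
    if status == 429:                # rate limited: wait, then retry
        return True
    if 400 <= status < 500:          # other 4xx: the request itself is wrong
        return False
    return 500 <= status < 600       # 5xx: the server may recover
```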

Why cross-step recovery?

A 6-step workflow dies at Step 4 — network timeout, rate limit, context overflow, many causes. Without recovery, the user reruns from Step 1 — three steps of work wasted.

With a progress file and recovery flow, the rerun starts at Step 4. Earlier outputs are still on disk; no rework.

Try this: write a guide.md for your most-used Skill — in the language a non-technical friend would understand.


OK.

Eighteen chapters in. I've taken my Skill spec apart down to the bones. You should have an architecture diagram in your head now — five floors, skeleton to QC.

But I know what you're thinking:

"All this talk — when do I actually build something?"

Hold on. The next 20 minutes, you're going to ship your first Skill with your own hands.

And the moment it runs, you'll really get it: getting AI to do good work doesn't need black magic. It needs an instruction set that's clearly written.


Hands-on: build your first Skill from scratch

Hands-on first Skill — isometric 20-minute timer above five numbered sticky cards and three minimal documents

The 18 chapters covered the "what" and "why." This chapter is the "how" — five steps, real runnable Skill, from zero.

I picked an example simple enough to follow yet broad enough to touch the core: a one-shot article translation Skill. Input: a Markdown file path. Output: the translated version.

Step 1: define the requirement (1 min)

One-line spec: given a path to an English Markdown article, translate it to your target language, preserve original formatting, write the result to the run directory.

Executor analysis: translation needs semantic understanding — brain work → SubAgent. Whole Skill is 2 steps: init + translate. Simple enough for a first build.

Step 2: create the directory structure (2 min)

In your Claude Code skills directory, create:

```
awp-content-article-translating/
├── SKILL.md                          # Entry file (required)
└── workflow/
    ├── step01-init.md                # Init
    └── step02-translate.md           # Translate + output
```

Three files. No scripts, no credentials, no config — because this Skill doesn't need them. Remember progressive enhancement: if you don't need it, don't create it.

Step 3: write SKILL.md (5 min)

```markdown
---
name: awp-content-article-translating
description: Translates an English Markdown article to the target language, preserving original format and structure. Triggers on "translate article", "translate this", "convert to ZH/EN".
---

# Article Translation Skill

## Workflow

| Step | Role | Executor | Doc | Input | Output |
|------|------|----------|-----|-------|--------|
| 01 | Initialize | Main Agent | step01-init.md | User-provided file path | state/ |
| 02 | Translate output | SubAgent | step02-translate.md | Source file path | output/ |

## Execution rules

- Progressive disclosure: load one step, run one step
- SubAgent returns minimal status; translated text written to file
```

Notice the four-part name: awp (prefix) - content (domain) - article (object) - translating (action).

Frontmatter only has the two required fields: name and description. The description includes trigger conditions — when the user says "translate article," Claude knows to call this Skill.

The workflow table is 6 columns, 2 steps. Glanceable.

Step 4: write the step docs (10 min)

step01-init.md (init):

```markdown
# Step 01 — Initialize

- Executor: Main Agent
- Input: User-provided file path
- Output: state/progress.json

## What to do

1. Receive the user-provided Markdown file path
2. Verify file exists and is `.md`
3. Create the run directory under the Skill: runs/{keyword}-{timestamp}/
4. Create state/progress.json with the source path and current state
5. Create the output/ directory

## Validation

- [ ] Source file exists and is readable
- [ ] runs/ has the new run directory
- [ ] state/progress.json exists

## Next

→ step02-translate.md
```

step02-translate.md (translate + output):

# Step 02 — Translate output

- Executor: SubAgent (general-purpose)
- Input: source file path (from state/progress.json)
- Output: output/translated.md

## What to do

Launch a SubAgent with this Prompt:

> You are a professional translator.
>
> Read the file at {input_path} and translate it to the target language.
>
> Translation requirements:
> 1. Preserve all Markdown formatting (headings, lists, code blocks, links)
> 2. Keep technical terms in English with parenthetical translations
> 3. Natural prose, not "translation-ese"
>
> Write the result to {output_path}/translated.md.
>
> When done, return a single line: "Translation done. N paragraphs. Output: {output_path}/translated.md"

## Validation

- [ ] output/translated.md exists and is non-empty
- [ ] File is valid Markdown
- [ ] Heading hierarchy matches the source

## Next

End of workflow. Report the output path to the user.

Three key design points:

  1. Pass paths, not contents — the Prompt has {input_path}; the SubAgent reads the file itself
  2. Minimal returns — only one status line
  3. Validation checkpoints — explicit way to confirm the step worked
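Point 1 only works if every placeholder is filled before the Prompt reaches the SubAgent. A sketch of that variable plumbing; the template mirrors the step doc, and the fail-loudly helper is illustrative:

```python
import re

PROMPT_TEMPLATE = """You are a professional translator.

Read the file at {input_path} and translate it to the target language.

Write the result to {output_path}/translated.md."""

def render_prompt(template: str, variables: dict) -> str:
    """Fill {placeholder} slots; raise on anything left unresolved."""
    rendered = template
    for name, value in variables.items():
        rendered = rendered.replace("{" + name + "}", str(value))
    leftover = re.findall(r"\{[a-z_]+\}", rendered)
    if leftover:
        raise ValueError(f"Unresolved placeholders: {leftover}")
    return rendered
```

The leftover check is the important part: a SubAgent that receives a literal `{output_path}` will happily write to a directory named `{output_path}`.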

Step 5: run and verify (2 min)

In Claude Code:

/awp-content-article-translating ~/Documents/some-english-article.md

Or natural language:

Translate this article for me: ~/Documents/some-english-article.md

Claude follows the workflow table: Step 01 init → Step 02 launch SubAgent → return result path.

Verify the output: check runs/{keyword}-{timestamp}/output/translated.md for existence, format, and translation quality.

Common issues

| Problem | Cause | Fix |
|---------|-------|-----|
| Skill not in slash menu | Wrong directory or bad SKILL.md format | Confirm it's under ~/.claude/skills/, check frontmatter syntax |
| SubAgent didn't read the file | Path was relative | Switch to an absolute path |
| Translation result is empty | SubAgent generated text but didn't write the file | Check that the Prompt explicitly says "write to file" |
| Run directory not created | Step 01 init logic missing | Confirm Step 01 includes the directory-creation instruction |

Where to go from here

This 2-step Skill covers ~80% of the core: entry file, workflow table, step docs, variable placeholders, minimal returns.

Want to level up? Try these extensions:

  • Add a script: a Python script that counts source vs. translated word counts (Chapter 5)
  • Add config: let users pick the target language (Chapter 9)
  • Add presets: "academic," "conversational," "technical" translation presets (Chapter 13)
  • Add a template: generate a side-by-side bilingual HTML report (Chapter 14)

Each layer maps to a chapter you've already read. Flip back and you'll see: the spec isn't a cage. It's the scaffolding that helps you build better.

3 files, 20 minutes — your first Skill is alive.

Going deeper: Anthropic's five official design patterns

Anthropic's guide distills five validated Skill design patterns. When you graduate from beginner to designing more complex Skills, these are your reference frame:

| # | Pattern | When to use | Core technique |
|---|---------|-------------|----------------|
| 1 | Sequential workflow orchestration | Multi-step flows that must run in a specific order | Explicit step deps, per-step validation, rollback on failure |
| 2 | Multi-MCP coordination | Workflows spanning multiple external services | Stage separation, MCP-to-MCP data passing, centralized error handling |
| 3 | Iterative refinement | Output quality needs progressive improvement | Clear quality bars, validation scripts, knowing when to stop |
| 4 | Context-aware tool selection | Same goal, different tools depending on context | Decision trees, fallbacks, explain choice to user |
| 5 | Domain-expert intelligence | Skill provides expertise beyond tool use | Compliance up front, audit trails, domain rules embedded in logic |

The workflow design in my spec maps mainly to Pattern 1 (sequential) and Pattern 3 (iterative). If your Skill coordinates multiple MCP tools or embeds domain expertise, Patterns 2 and 5 will be your reference.

Detailed material: Anthropic's official guide, Chapter 5: Patterns and troubleshooting.

Going deeper: eval-driven development — test before you write

Anthropic best practice proposes a counterintuitive method: build evaluations first, write the doc second.

Most people: write a wall of doc → test → find problems → revise. Anthropic flips it:

  1. Identify the gap: don't write the Skill yet. Have Claude attempt your target task. Note where it fails, what context it lacked.
  2. Build evals: turn those failures into 3 test cases.
  3. Establish a baseline: record Claude's performance without the Skill.
  4. Write minimal instructions: just enough to pass the evals — not more, not less.
  5. Iterate: run evals, compare to baseline, refine.

This makes sure you're solving real problems, not imagined ones.
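A minimal shape for such an eval set, sketched in Python. The cases and the string-marker grader are stand-ins; in practice the graded outputs come from running Claude on the task with and without the Skill loaded:

```python
from dataclasses import dataclass, field

@dataclass
class EvalCase:
    name: str
    task: str
    must_contain: list[str] = field(default_factory=list)  # observable pass criteria

def grade(case: EvalCase, output: str) -> bool:
    """Pass iff every required marker appears in the output."""
    return all(marker in output for marker in case.must_contain)

def run_evals(cases: list[EvalCase], outputs: dict[str, str]) -> dict[str, bool]:
    """Score one run (e.g. baseline vs. with-Skill) against the eval set."""
    return {c.name: grade(c, outputs.get(c.name, "")) for c in cases}

CASES = [
    EvalCase("keeps-code-blocks", "Translate an article containing code", ["```"]),
    EvalCase("reports-output-path", "Translate and save to file", ["Output:"]),
]
```

Score the baseline run first, then each Skill revision against the same cases; the Skill doc is done when the deltas stop moving.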

My take: eval-driven development is gold for the "Claude should do this better but I can't articulate what's wrong" scenarios. Quantify the gap first, then patch it precisely.

Going deeper: Claude A/B iterative development

Anthropic's recommended Skill development rhythm is a dual-instance loop:

  • Claude A (designer): helps you design and refine the Skill doc
  • Claude B (user): loads the Skill and executes real tasks

Loop:

  1. Without the Skill, complete a task with Claude A normally. Note which context you keep re-providing.
  2. Have Claude A bundle that context into a Skill.
  3. Audit for brevity — strip what Claude already knows.
  4. Test with Claude B (fresh conversation + Skill loaded) on similar tasks.
  5. Watch where Claude B drifts; bring specific issues back to Claude A.
  6. Loop 4–5 until satisfied.

Why it works: Claude A understands the agent's needs; you bring domain expertise; Claude B exposes gaps via real use. Three-way complementarity, every iteration grounded in observation rather than guesswork.

Going deeper: cross-model testing

Anthropic best practice reminds you: Skills behave differently on different models. If you need to run on multiple models, test each:

| Model | Test focus |
|-------|------------|
| Haiku (fast, cheap) | Does the Skill provide enough guidance? Smaller models may need more explicit instruction. |
| Sonnet (balanced) | Is the Skill clear and efficient? |
| Opus (deep reasoning) | Does the Skill avoid over-explaining? Big models don't need hand-holding. |

A Skill perfect for Opus may be too sparse for Haiku. If your Skill needs to span models, the goal is instructions that work for all target models.


Appendix


Appendix A — directory structure quick reference

The three forms of Skill directory, minimal to complete. Build only what you need.

Minimum viable (one file):

| File | Notes |
|------|-------|
| SKILL.md | Entry file (the only required file) |

Typical light (3-step workflow):

| File / Folder | Notes |
|---------------|-------|
| SKILL.md | Entry file |
| workflow/step01-init.md | Step 1: init |
| workflow/step02-process.md | Step 2: process |
| workflow/step03-output.md | Step 3: output |

Typical full (with scripts and config):

| Folder / File | Use |
|---------------|-----|
| SKILL.md | Entry file (required) |
| workflow/ | Workflow step docs |
| workflow/step01-init.md | Init |
| workflow/step02-collect.md | Collect |
| workflow/step03-analyze.md | Analyze |
| workflow/step04-eval.md | Score |
| workflow/step05-output.md | Output |
| workflow/step06-report.md | Report |
| workflow/clone/ | Multi-mode: Clone steps |
| workflow/timeline/ | Multi-mode: Timeline steps |
| workflow/shared/ | Multi-mode: shared |
| scripts/python/ | Python scripts (primary) |
| scripts/python/collect.py | Collect script |
| scripts/python/batch.py | Batch script |
| scripts/python/merge.py | Merge script |
| scripts/python/shared/ | Shared modules |
| scripts/python/pyproject.toml | Dep manifest |
| scripts/node/ | Node scripts (optional) |
| scripts/shell/ | Shell scripts (optional) |
| config/default.json | Defaults (L2 config layer) |
| config/params.schema.json | Param Schema |
| credentials/ | Credentials |
| credentials/.gitignore | Ignore real keys |
| reference/definitions/ | Constant definitions |
| reference/presets/ | Preset configs |
| reference/prompts/ | Prompt templates |
| reference/templates/ | Output templates |
| reference/templates/shared/ | Shared styles |
| runs/ | Run-data folder (auto-generated) |
| runs/{keyword}-{timestamp}/state/ | State |
| runs/{keyword}-{timestamp}/output/ | Final output |
| runs/{keyword}-{timestamp}/batches/ | Batch data |
| docs/setup.md | Environment init |
| docs/guide.md | User guide |
| docs/changelog.md | Version history |
| docs/troubleshoot.md | Troubleshooting |
| .gitignore | Git ignore rules |

Appendix B — spec file index

The 17 spec files in my Skill methodology, split into core and supplemental.

Core specs (8)
| # | Name | One-liner |
|---|------|-----------|
| 1 | SKILL.md spec | Entry-file naming, structure, metadata, workflow-table conventions |
| 2 | Step doc spec | Step naming, executor selection, validation checkpoints |
| 3 | Script spec | Script directory, standard template, return value, API call rules |
| 4 | Platform constraints | Claude Code's hard limits: 200K default context, tool limits, runtimes |
| 5 | Run-data spec | Run directory layout, progress file, keyword extraction |
| 6 | Param config spec | Three-layer config (interactive / config / preset) |
| 7 | Param Schema spec | Param structure, field types, dot path, preset binding |
| 8 | Prompt template spec | SubAgent prompt structure, path-first, allow-lists |
Supplemental specs (9)

| # | Name | One-liner |
|---|------|-----------|
| 9 | Variable placeholders | Single source of truth for workflow and platform variables |
| 10 | Credential management | JSON-only credential format, auth types, independence principle |
| 11 | Constant definitions | Domain-constant org rules, kill magic strings |
| 12 | Preset configs | User-choice data source format and writing rules |
| 13 | HTML templates | Output template variable syntax, shared styles, render modes |
| 14 | Environment init | setup.md standard: runtime config, deps, six-layer error classes |
| 15 | User guide | guide.md standard: feature overview and use flow for non-technical readers |
| 16 | Version history | changelog.md standard: SemVer, change classes |
| 17 | Troubleshooting | Six-layer error classes, five-layer validation, retry strategy, cross-step recovery |
Best lookup path

Not sure which spec to read? Locate by what you're writing or changing:

| You're writing / changing | Read |
|---------------------------|------|
| SKILL.md entry file | SKILL.md spec |
| workflow/ step doc | Step doc spec |
| scripts/ | Script spec |
| reference/prompts/ | Prompt template spec |
| config/ | Param config + Param Schema spec |
| credentials/ | Credential management |
| reference/definitions/ | Constant definitions |
| reference/presets/ | Preset configs |
| reference/templates/ | HTML templates |
| docs/ | Corresponding doc spec |
| Not sure | Start from the spec index |

Closing notes

Closing notes — isometric action checklist clipboard and compass pointing to a four-lane roadmap of the 100 Skills series

I spent six months grinding this Skill spec. The whole point was to solve one problem: how do you get AI to complete complex tasks reliably, repeatably, predictably?

Now that the article is done, I want to talk about something bigger.

Why are we writing operating manuals for AI?

On the surface, to make AI complete tasks better. But underneath — we're encoding our own thinking into runnable instructions.

The act of writing a Skill is doing something genuinely scarce: turning tacit knowledge into explicit knowledge.

Your gut sense of "what makes a good article" becomes a quantifiable, repeatable, teachable workflow. Your experience of "what makes a good product" becomes a Skill anyone can run.

Naval said: code and media are leverage with zero marginal cost. Every Skill you build is one act of code leverage — write once, reuse infinitely. You're not using the AI; you're creating a digital twin that's faster than you.

If you could only build one Skill, what would it be?

There's an old Chinese military saying: "Build solid camps and fight stupid wars." (结硬寨,打呆仗 — Zeng Guofan's principle: instead of brilliant maneuvers, dig in deep, build defenses, win through patient discipline.) That's the spirit of this Skill spec, too. Don't chase fancy Agent architectures or showy multi-turn dialog. Just clearly write each step. Kill every branch you can. Put every piece of data exactly where it belongs.

Patient work is the smartest work.

The spec isn't a leash on creativity. The spec is the infrastructure for creativity.

What part of AI workflow is the hardest for you?


A whirlwind of the whole thing:

Part 1, I told you what a Skill is — an SOP you write for AI, document-driven not code-driven.

Part 2, I told you how a Skill executes — scripts do labor (zero context cost), SubAgents do brain work (only when thinking is required), variables string the parts together.

Part 3, I told you where data lives — one folder per run, the progress file as heartbeat, three-layer config separating params by change rate.

Part 4, I told you how to make a Skill good to use — safe credentials, magic-string-free constants, elegant user choices, beautiful output templates.

Part 5, I told you how to make a Skill reliable — environment init, user guide, version history, troubleshooting.

Code is the crystallization of thought. Architecture is the embodiment of philosophy. Every line of code is one more re-understanding of the world; every refactor is one more approximation of the essence.

If you read this far, here's what I want to say to you: you're already ahead of 99% of AI users.

Not because you're smarter than other people. Because you're willing to spend time understanding AI's "operating system" — instead of rolling dice in a chat box.

You're no longer "an AI user." You're "an AI workflow designer."

That identity shift decides whether your relationship with AI is "I sometimes use it" or "I systematically create value with it."

Action checklist

Reading without doing equals not reading. Four concrete next steps:

  1. Run the hands-on: go back to the Hands-on chapter, spend 20 minutes building your first translation Skill. Hands-on beats theory.
  2. Convert one repetitive task: think about something you do every day — weekly reports, note clean-up, format conversions. Pick one and turn it into a Skill using this spec. The best learning is solving real problems.
  3. Join AI Workflow Pro membership: if you want a full library of production Skills, configuration templates, and the unabridged 78,000-word spec, the AWP membership covers that — built around Claude Code, with complete source for video automation, content publishing, SEO, e-commerce, and more.
  4. Explore the official resources: visit Anthropic's Skills repo (GitHub: anthropics/skills) for official examples. Join the Claude Developers Discord for dev exchange. Anthropic has also published deep-dive blog posts on Skill design — from frontend optimization to agent equipment guides — worth a read.

Simplification is the highest form of complexity. Branches that can disappear are always more elegant than branches you can write correctly.

FAQ

Q: Can a non-coder build Skills?

Absolutely. The simplest Skill is one SKILL.md file — pure Markdown, no code involved. You only need scripts when you call APIs or process data, and Claude can write those for you. I've seen plenty of non-technical people build very effective pure-doc Skills.

Q: What's the difference between a Skill and an MCP?

MCP (Model Context Protocol) is the standard interface that lets AI call external tools — search engines, browsers, databases. A Skill is the operating manual that tells AI how to follow a fixed workflow to complete a task. They're complementary: Skill steps can call MCP tools. Analogy: MCP tools are the hammer and screwdriver; the Skill is the renovation manual that tells you when to use which.

Anthropic uses an even sharper analogy — the kitchen model: MCP tools are the oven and mixer (appliances), Skills are the recipes, Claude is the chef who can read recipes and use appliances.

Q: How many steps can a Skill have?

No hard cap, but I keep it to 6–8. Past 10 steps, ask whether it should split into multiple Skills. More steps = harder debugging, larger context cost. Remember: simplification is the highest form of complexity.

Q: How many Skills do you have right now?

At the time of writing, 40+ production Skills across content creation, social ops, video production, dev assist, data analysis. Each one ground through hundreds of real runs.

What's next

I'm working on a series — "Building 100 Skills, in the open" — covering four directions:

Skill infrastructure — spec guides, self-healing QA, Skill search engine
Content creation — full blog auto-publishing, viral optimization engine, AI illustration workflow, smart short-form video editing
Vertical industries — an 8.5-million-word legal library AI assistant, TikTok Shop product analysis system
Tool integration — turning n8n workflows into Skills

Each one a deep teardown of a production Skill — design rationale, architectural decisions, real pitfalls — with full prompts so you can clone the pattern.


Source pack: the AWP Skill Development Spec (18 files)

You can build a Skill from the 30,000 words above. But there's a real difference between reading the rules and owning the runnable spec your AI checks against.

The 18-file pack is what I drop into every new project's .claude/skills/ directory before I write a single line of Skill code. Claude reads it on demand and designs new Skills against the same constraints I use myself — I stop having to re-explain "wait, what was the rule for X again?" in every conversation.

Why bother downloading it instead of just re-reading this article?

  • Stop re-explaining. The article is for you to read once. The pack is for Claude to read every time. Drop it once and Claude designs Skills inside the lines automatically — for the next 50 Skills you build.
  • Saves ~6 hours per Skill. Real measurement, not marketing. That's the time I used to lose to "go look up the constraint, come back, re-explain it, re-prompt." Pack lives next to your project; the loop disappears.
  • Field-tested against 40+ shipped Skills. Every edge case I hit got folded back into the spec. It's the literal version I check my own work against before any release.
  • Updated for the 2026 Claude Code surface. Includes the recent additions a lot of older guides miss: file-patterns activation, shell selector, trigger-context, Hook if filtering with agent_id/agent_type, @-mentioned sub-agents, the ExitWorktree tool, plugin monitors, the 500-line SKILL.md ceiling, the YAML single-line description rule that prevents the indexer bug.
  • 18 files, all cross-referenced. Self-contained. No mystery dependencies. CLAUDE.md at the root tells Claude exactly which file to read for which question.
  • Cross-model verified. Same pack tested on Haiku / Sonnet / Opus — designed so a smaller model still has enough guidance and a larger model isn't drowned in over-explanation.
  • One-line install, one-line reference. Unzip into ~/.claude/skills/, add a single pointer in your project's CLAUDE.md, done.

If you've ever felt the difference between reading a recipe and handing the cookbook to someone who's actually going to cook tonight — that's the difference between this article and the pack.

What's in it (47K English words, 350 KB unzipped, 18 files):

| # | File | What it nails down |
|---|------|--------------------|
| 1 | CLAUDE.md | Spec index — Agent entry point |
| 2 | skill-development-spec.md | Design philosophy + organizing framework + reading paths |
| 3 | skill-md-spec.md | SKILL.md naming / directory / frontmatter / workflow definition (all 16 fields, including file-patterns, shell, trigger-context) |
| 4 | step-documents.md | stepNN-{action}.md structure, executor selection, parameter collection (free-text Q&A), brand-experience spec |
| 5 | context-management.md | Four-tier context acquisition (direct read / fixed path / MCP snippet / hybrid), eight knowledge dimensions, three loading modes |
| 6 | script-spec.md | Multi-runtime layout, HTTP API rules, MCP fully-qualified naming, "Solve, don't punt" |
| 7 | platform-constraints.md | Hard tool limits, model matrix, 21-event Hook matrix, Agent Teams, Worktree isolation |
| 8 | run-data-spec.md | runs/ directory, progress.json format, keyword rules, resume_hint recovery |
| 9 | parameter-config-spec.md | Three-layer config (L1/L2/L3), preset & constant definitions |
| 10 | prompt-template-spec.md | SubAgent Prompt structure, path-first principle, instruction-freedom design |
| 11 | variable-placeholders.md | Single source of truth for workflow + Claude Code official variables |
| 12 | credential-management.md | Markdown-only credentials, dual-mode loading (L0 env / L1 Skill / L2 KB) |
| 13 | html-template-spec.md | Output template structure, variable placeholders |
| 14 | environment-setup.md | setup.md standard: uv environment, dependencies, six-layer error classification |
| 15 | getting-started.md | guide.md standard: end-user onboarding template |
| 16 | troubleshooting.md | Six-layer error classification, five-layer validation, retry strategy, cross-step recovery |
| 17 | testing-spec.md | EDD, Claude A/B iteration, cross-model testing, full release checklist |
| 18 | multi-mode-spec.md | Eleven design patterns (P1–P11), anti-pattern checklist |

Your billing was not updated.