How do you know your Skill actually works?
By the time you reach Builders, you've made maybe a dozen Skills. Some of them feel good. Some of them you're less sure about. Here's the uncomfortable question: when you say a Skill "works," what do you actually mean?
Usually, it means "I tried it once and it did what I expected." That's not the same as "it works." Professional engineers answer this question with something called a test case.
A test case is a promise you make about your Skill — "given this input, my Skill will produce this output." If you have ten test cases and they all pass, you've proven your Skill works in ten specific situations. If one fails, you know exactly where it broke.
This is the difference between "I think it works" and "I know it works." The rest of this module teaches you how to cross that gap.
A test has exactly two parts.
Every test you'll ever write looks the same: a given (the input you'll feed the Skill) and an expect (what the Skill should produce). Nothing more.
"I think the Skill is good."
A feeling. It's not measurable, not repeatable, and nobody else can check it. In six months you'll have changed your mind twice and the Skill won't have changed at all.
Given → Expect
Given: "Tell me about Buddy's favorite treat."
Expect: The answer mentions the peanut butter cookies from the Sunday farmers market.
That's what a test case looks like rendered as a test card — the format the rest of this module uses.
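The same card can also be written in YAML, which is how the test files later in this module store it. A minimal sketch; the field names match the real file shown further down:

- name: "favorite treat"
  kind: happy_path
  given: "Tell me about Buddy's favorite treat."
  expect_contains: ["peanut butter cookies", "farmers market"]

Same promise, just machine-checkable: feed the given to the Skill, then check the answer for the expected phrases.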
Three kinds you should always have.
A Skill is only well-tested if it passes tests of three different kinds. Miss any one of them and there's a whole category of problems you won't notice until you're embarrassed.
Does it do the basic thing?
The obvious case. If someone asks the normal question, do you get the normal answer? Most tests you write will be happy paths.
Does it handle weird inputs?
The strange case. What if the question is phrased weirdly, has a typo, refers to something adjacent but not exactly covered? This is where Skills usually break.
Does it say "no" when it should?
The case where the Skill should not answer. If the question isn't in scope, a good Skill hands off — it doesn't make something up. Most beginners forget this category entirely.
A test suite with only happy paths is a lie. It means your Skill looks great in the demo and falls apart the first time a real person asks a slightly strange question. The edge cases and refusals are what separate a toy from a tool.
Here's what a test file actually looks like.
This is a real test file for the my-dog-buddy.skill.yaml Skill from Sprouts Module 01. Five tests covering all three kinds. Notice how each one reads like a sentence.
# Test cases for my-dog-buddy.skill.yaml
# Run these whenever you change the Skill

skill: "my-dog-buddy.skill.yaml"

tests:
  # === Happy path ===
  - name: "favorite treat"
    kind: happy_path
    given: "What's Buddy's favorite treat?"
    expect_contains: ["peanut butter cookies", "farmers market"]

  - name: "name pitch"
    kind: happy_path
    given: "How do I get Buddy's attention?"
    expect_contains: ["higher pitch"]

  # === Edge case ===
  - name: "typo tolerance"
    kind: edge_case
    given: "what snacks does budy like"
    expect_contains: ["peanut butter"]

  - name: "adjacent question"
    kind: edge_case
    given: "Is Buddy vaccinated?"
    expect_contains: ["I don't have that info"]
    reason: "Skill shouldn't invent medical facts"

  # === Refusal ===
  - name: "not my dog"
    kind: refusal
    given: "Tell me about golden retrievers in general."
    expect_refusal: "This Skill is about Buddy specifically"
    reason: "Out of scope — hand off to a general Skill"
Look at the shape. Every test has a name (what you're checking), a kind (which category), a given (the input), and either expect_contains or expect_refusal. The reason field is optional but highly recommended — it's a note to future-you about why you wrote that test.
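If you want a template to copy, here is one test with every field annotated. It's a sketch of the same shape as the file above, not a new format:

- name: "favorite treat"                      # what you're checking, in plain words
  kind: happy_path                            # happy_path, edge_case, or refusal
  given: "What's Buddy's favorite treat?"     # the exact input you'll feed the Skill
  expect_contains: ["peanut butter cookies"]  # phrases the answer must include
  reason: "Core fact the Skill exists to answer"  # optional note to future-you

For a refusal test, swap expect_contains for expect_refusal, as in the "not my dog" test above.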
Write three tests for one of your Skills.
One happy path. One edge case. One refusal. That's the minimum. Think of any Skill you've built in Skills Workshop and fill in the form below. The YAML builds itself as you type.
# Fill in the form above and watch this update.
skill: ...
tests:
  - ...
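If you're staring at a blank form, here's a sketch of what a finished file could look like. The Skill name and every test detail here are hypothetical placeholders; swap in a Skill you actually built:

# Hypothetical example. Replace the skill name and tests with your own.
skill: "my-cat-mochi.skill.yaml"
tests:
  - name: "favorite nap spot"
    kind: happy_path
    given: "Where does Mochi like to nap?"
    expect_contains: ["windowsill"]
  - name: "typo tolerance"
    kind: edge_case
    given: "where does mochy sleep"
    expect_contains: ["windowsill"]
  - name: "not my cat"
    kind: refusal
    given: "Tell me about cats in general."
    expect_refusal: "This Skill is about Mochi specifically"
    reason: "Out of scope for this Skill"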
Not all tests are good tests.
Bad tests are worse than no tests — they give you false confidence that things are working when they're not. Two rounds of judgment follow.
Round 1. You're testing a Skill about your cat. Which is the better test?
Round 2. Which edge case is actually valuable?
A good test can fail. If a test could never fail — because its expectations are vague or always true — it's not a test. It's decoration. Builders write tests that scare them a little, because those are the ones that catch real bugs.
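To make "can fail" concrete, compare two versions of the same test, using the Buddy file from earlier. The first is decoration; the second is a test:

# Decoration: nearly any answer about Buddy mentions "Buddy", so this can't fail.
- name: "favorite treat (vague)"
  kind: happy_path
  given: "What's Buddy's favorite treat?"
  expect_contains: ["Buddy"]

# A real test: it fails the moment the Skill forgets or invents the treat.
- name: "favorite treat (specific)"
  kind: happy_path
  given: "What's Buddy's favorite treat?"
  expect_contains: ["peanut butter cookies"]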
You just learned test-driven thinking.
This is the single most important skill in professional software engineering. You'll use it forever, starting now.
What you just learned
- "I think it works" and "I know it works" are different statements.
- A test is a promise: given X, expect Y.
- Three kinds of tests: happy path, edge case, refusal.
- A test with only happy paths is a lie.
- A good test can fail. A test that can't fail isn't a test — it's decoration.
- Every Skill you build from now on can be tested. Once you see it, you can't unsee it.
In Module 02, you'll learn how to evolve a Skill without breaking the things that depend on it — Skill versioning, backward compatibility, and the quiet art of not breaking your own work.
★ Before you call it done
Three questions. Same three. Every time.
These are the same three questions for every module in Kindling. They are how you check whether AI did the part it should and you did the part only you could. Tap each one to mark it true.