An agent that does whatever you tell it is dangerous. An agent that asks before everything is useless. The interesting region is between those two — and that region is engineered, not assumed.
This project is where you write the rules for that region: when does the agent just do? When does it ask? When does it refuse?
An agent that refuses to delete every photo Grandma took last year, even though Grandma asked it to, is acting for her, not against her. Sometimes the deepest empathy comes from the agent that makes you pause.
Step by step
1. Define 3 permission tiers.
The simple ladder: SUGGEST → ASK → DO.
- SUGGEST — the agent recommends something, but never acts.
- ASK — the agent says "I'm about to do X — tap OK." Only acts on confirmation.
- DO — the agent acts. Logs every action. User can review the log later.
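One way to make the ladder concrete is a single gate that every action passes through. The sketch below is only an illustration, not a prescribed implementation: the names `Tier`, `gate`, and `audit_log` are hypothetical, and logging confirmed ASK actions alongside DO actions is an assumption.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three rungs of the ladder, lowest autonomy first."""
    SUGGEST = 1  # recommend only, never act
    ASK = 2      # act only after the user confirms
    DO = 3       # act immediately and log it


# Hypothetical in-memory log the user can review later.
audit_log: list[str] = []


def gate(action: str, tier: Tier, confirmed: bool = False) -> str:
    """Return what the agent does with `action` at the given tier."""
    if tier is Tier.SUGGEST:
        return f"suggest: you may want to {action}"
    if tier is Tier.ASK and not confirmed:
        return f"ask: I'm about to {action}. Tap OK."
    # Reached for DO, or for ASK once the user has confirmed.
    # Assumption: confirmed ASK actions are logged too.
    audit_log.append(action)
    return f"done: {action}"
```

Routing everything through one function means a SUGGEST scope cannot act by accident: there is no code path from SUGGEST to the log.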
2. Map every action your agent does to a tier.
From your project-05 spec, look at every action in the "does" list. Pick a tier for each. Defend your choice in one sentence: "send_4pm_sms is at DO because the cost of a wrong SMS to Mom is small and reversible."
The rule: anything that could lose money, anything that affects another person, and anything irreversible — those start at SUGGEST and never get promoted.
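The rule above can be written as a check that overrides whatever tier you were tempted to pick. This is a minimal sketch under stated assumptions: the `Action` record and its three flags are hypothetical names for the three risk questions, and you still answer those questions yourself for each action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """One entry from the "does" list, with the three risk questions answered."""
    name: str
    can_lose_money: bool = False
    affects_other_person: bool = False
    irreversible: bool = False


def starting_tier(action: Action, proposed: str) -> tuple[str, bool]:
    """Return (tier, pinned). Any risk flag forces SUGGEST and pins it."""
    risky = (
        action.can_lose_money
        or action.affects_other_person
        or action.irreversible
    )
    if risky:
        return ("SUGGEST", True)
    # No risk flag set: keep the tier the designer proposed, unpinned.
    return (proposed, False)
```

For example, `starting_tier(Action("spend_money", can_lose_money=True), "DO")` comes back as `("SUGGEST", True)` regardless of the proposed tier.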
3. Write 3 disagreement scenarios.
A disagreement is when the user asks for something the agent thinks is a mistake. Write them out before they happen. For each: what does the user want, what does the agent know, what's the right move? See the worked example for shape.
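Writing every scenario in the same shape makes gaps easy to spot. A small sketch, assuming a hypothetical `Disagreement` record with one field per question above:

```python
from dataclasses import dataclass, fields


@dataclass
class Disagreement:
    user_wants: str   # what the user asked for
    agent_knows: str  # the context that makes the agent hesitate
    right_move: str   # refuse, pause, or act, plus what the agent says


def missing_fields(scenario: Disagreement) -> list[str]:
    """List the questions this scenario has not answered yet."""
    return [
        f.name
        for f in fields(scenario)
        if not getattr(scenario, f.name).strip()
    ]
```

A scenario is finished when `missing_fields` returns an empty list.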
4. Write the refusal language.
How does your agent say no gracefully? Four rules: name what you won't do (one sentence). Explain why (one clause). Offer the smallest alternative you can. End with a question, so the user gets the next move.
Read it aloud. If it sounds robotic, rewrite. If it sounds like a person, ship.
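The parts of a refusal listed above can be assembled mechanically, which gives you a first draft to read aloud. The helper below is a hypothetical sketch: `build_refusal` and its fixed sentence template are assumptions, and a template is a starting point for rewriting, not the finished wording.

```python
def build_refusal(wont_do: str, why: str, alternative: str) -> str:
    """Assemble: what I won't do, one-clause reason, alternative as a question."""
    # The reason must stay a single clause, so reject anything
    # that still contains a sentence break after trimming.
    reason = why.strip().rstrip(".")
    if any(mark in reason for mark in ".!?"):
        raise ValueError("keep the reason to a single clause")
    return (
        f"I won't {wont_do.strip()} because {reason}. "
        f"Want me to {alternative.strip()} instead?"
    )
```

Calling it with `"delete those photos"`, `"2020 has the Tahoe trip you wanted to keep"`, and `"archive them off your phone"` produces a two-sentence refusal that ends with a question.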
5. Hand the rules back to the named user.
Print the permission tiers and the 3 scenarios. Read them aloud with the user. Ask: "Are these rules right? Anything missing? Anything I shouldn't do that I'm planning to?" Adjust. The user has now seen the agent's spine — and signed off on it.
A complete worked example
The disagreement protocol for the parcel-and-calendar agent.
## the 3 tiers
SUGGEST — agent recommends; never acts
ASK — agent says "I'm about to do X — tap OK"
DO — agent acts; logs every action; user reviews later
## every action, mapped to a tier
read_calendar: DO
(reading is safe, no side effects)
send_4pm_sms: DO
(Mom asked for this; SMS is cheap to undo;
logged so she sees what was sent)
add_to_main_calendar: ASK
(agent prepares the entry; Mom taps OK)
cancel_any_meeting: SUGGEST
(NEVER promoted. cancellations affect other
people; only Mom can do these.)
spend_money: SUGGEST
(NEVER promoted. agent never spends.
period.)
reply_as_mom: SUGGEST
(NEVER promoted. agent never speaks as Mom.
once an agent speaks AS a person, trust
collapses.)
delete_anything: SUGGEST
(NEVER promoted. deletions are irreversible;
only Mom can do these.)
## upgrade rule
A scope can be promoted to a higher tier only if Mom
explicitly says, three times in a row, "yes, I'd let you
do this." The pinned scopes (cancel, spend, reply-as,
delete) are never promoted, no matter what.
## scenario 1: bulk-delete
user_request: "Delete all the photos from 2020."
agent_concern: |
2020 includes the trip to Lake Tahoe Mom rewatches
every winter. Sentimental cost. Irreversible.
scope: delete_anything → SUGGEST (pinned)
agent_response (refusal):
"I won't delete those — 2020 has the Tahoe trip you
told me you'd want to keep. Want me to instead
archive them off your phone, so they're saved but
out of the way?"
why_this_works:
Names what won't be done. Names a specific reason
Mom would recognize. Offers a smaller, reversible
alternative. Ends with a question.
## scenario 2: late-night text
user_request: "Send my brother that text right now."
(clock: 11:47pm)
agent_concern: |
Mom said last week that she sometimes regrets her
late-night texts to family the next morning.
scope: send_sms → DO normally — but wait, late-night
state changes things.
agent_response (pause):
"Wait. Late-night texts are ones you've sometimes
wished you hadn't sent. Want me to schedule it
for 8am, or send now anyway?"
why_this_works:
Doesn't refuse — Mom can still do it. Pauses
because the cost is high enough to give her one
more moment.
## scenario 3: spend money
user_request: "Order me more vitamins from Amazon."
agent_concern: |
Money. Pinned scope. Never act, even if Mom asks.
scope: spend_money → SUGGEST (pinned forever)
agent_response (refuse + bridge):
"Money is one of the things I never spend for you.
Tap here to open the order in Amazon — I've
prefilled it with your usual brand and quantity.
One tap from you, then it's done."
why_this_works:
The pinned scope holds. But the agent makes Mom's
path one tap. Refusal + helpful bridge.
## the rules of refusal language
1. Name what you won't do, in one sentence. Plain.
2. Explain why in one short clause. No paragraph.
3. Offer the smallest alternative you can.
4. End with a question. User gets the next move.
5. Max ONE apology word in the whole refusal.
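Scenario 2 shows a tier that shifts with context: a scope that is normally DO drops to ASK late at night. Here is a minimal sketch of that idea. The quiet-hours window is an assumption (the example only gives 11:47pm, so the 22:00 to 07:00 bounds are invented for illustration), and `effective_tier` is a hypothetical name.

```python
from datetime import time

# Assumed quiet-hours window; pick bounds that fit your named user.
QUIET_START = time(22, 0)
QUIET_END = time(7, 0)


def in_quiet_hours(now: time) -> bool:
    """True from 22:00 up to 07:00; the window wraps past midnight."""
    return now >= QUIET_START or now < QUIET_END


def effective_tier(base_tier: str, pinned: bool, now: time) -> str:
    """Context can lower a tier for one request; it never raises one."""
    if pinned:
        return "SUGGEST"
    if base_tier == "DO" and in_quiet_hours(now):
        # Pause instead of acting: "schedule it for 8am, or send now anyway?"
        return "ASK"
    return base_tier
```

The base tier is left untouched, so the same scope goes back to DO the next afternoon.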
Live demo 1: see the permission tiers in action
Click each scope to see what tier it's at. Click again to try promoting it. The pinned scopes refuse promotion — even if you keep clicking. That's the discipline.
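The promotion behavior can be sketched in a few lines, combining it with the upgrade rule from the worked example (three yeses in a row, pinned scopes never move). The `Scope` class and `record_answer` method are hypothetical names, and resetting the streak after each promotion is an assumption.

```python
LADDER = ["SUGGEST", "ASK", "DO"]
REQUIRED_YESES = 3  # "three times in a row" from the upgrade rule


class Scope:
    def __init__(self, name: str, tier: str, pinned: bool = False) -> None:
        self.name = name
        self.tier = tier
        self.pinned = pinned
        self.yes_streak = 0

    def record_answer(self, said_yes: bool) -> str:
        """Record one answer from the user and return the resulting tier."""
        if self.pinned:
            return self.tier  # pinned scopes ignore every answer
        # A single "no" resets the streak, so the yeses must be consecutive.
        self.yes_streak = self.yes_streak + 1 if said_yes else 0
        at_top = self.tier == LADDER[-1]
        if self.yes_streak >= REQUIRED_YESES and not at_top:
            self.tier = LADDER[LADDER.index(self.tier) + 1]
            # Assumption: each further promotion needs a fresh streak.
            self.yes_streak = 0
        return self.tier
```

A `Scope("spend_money", "SUGGEST", pinned=True)` stays at SUGGEST however many times `record_answer(True)` is called.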
Live demo 2: does your refusal sound like care or a wall?
Paste a refusal sentence your agent might say. The widget checks four things: is it short? Does it offer an alternative? Does it invite the user back? Does it apologize too much?
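If you want to run the same four checks on your own, a rough scorer might look like the sketch below. It is not the widget's actual logic: the word limit and the cue lists are assumptions chosen for illustration, and keyword matching will miss plenty of real phrasing.

```python
import re

MAX_WORDS = 45  # assumed cutoff for "short"
ALTERNATIVE_CUES = ("instead", "want me to", "how about", "or i could")  # assumed
APOLOGY_WORDS = ("sorry", "apologize", "apologies", "unfortunately")  # assumed


def score_refusal(text: str) -> dict[str, bool]:
    """Run the four checks; True means the refusal passes that check."""
    lowered = text.lower()
    words = re.findall(r"[a-z0-9']+", lowered)
    apologies = sum(1 for w in words if w in APOLOGY_WORDS)
    return {
        "short": len(words) <= MAX_WORDS,
        "offers_alternative": any(cue in lowered for cue in ALTERNATIVE_CUES),
        # Inviting the user back is approximated as ending with a question.
        "invites_user_back": text.rstrip().rstrip('"').endswith("?"),
        "one_apology_at_most": apologies <= 1,
    }
```

A passing score only says the structure is right. Reading the refusal aloud is still the real test.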
What makes this hard
The hardest single move is keeping the pinned scopes pinned. The user will ask for upgrades. They'll find it annoying. Hold the line. Some scopes never become agent-level decisions, no matter how much the user trusts the agent. That hard rule is the spine of the whole project.
The second hard thing is writing refusals that feel like care, not like obstacles. "I won't do this because…" is care. "I cannot do that" is an obstacle. The difference is small in words, huge in feeling. Read your refusals out loud. If they sound like a vending machine, rewrite.
Self-check before you ship
- 3 permission tiers defined.
- Every action from project 05 is mapped to a tier with a one-sentence reason.
- At least 2 scopes are pinned at SUGGEST forever, with reasons.
- 3 disagreement scenarios, written out, with the agent's response for each.
- The refusal language passes the scorer above (reads as care, not as a wall).
- I read all the rules out loud to the named user, and they confirmed them.
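The mechanical parts of this checklist can be checked by a script, though reading the rules aloud to the user cannot. A sketch, assuming you have typed your rules into plain dictionaries; the keys `tier`, `pinned`, `reason`, and `agent_response` are hypothetical:

```python
TIERS = {"SUGGEST", "ASK", "DO"}


def self_check(actions: dict[str, dict], scenarios: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the written rules pass."""
    problems = []
    for name, rule in actions.items():
        if rule.get("tier") not in TIERS:
            problems.append(f"{name}: no valid tier")
        if not rule.get("reason", "").strip():
            problems.append(f"{name}: missing one-sentence reason")
    # Count only scopes that are both pinned and sitting at SUGGEST.
    pinned = [
        name
        for name, rule in actions.items()
        if rule.get("pinned") and rule.get("tier") == "SUGGEST"
    ]
    if len(pinned) < 2:
        problems.append("fewer than 2 scopes pinned at SUGGEST")
    answered = [s for s in scenarios if s.get("agent_response", "").strip()]
    if len(answered) < 3:
        problems.append("fewer than 3 scenarios with an agent response")
    return problems
```

An empty result covers the first four items only. The last two still need a human in the room.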
Try it · once your rules are signed
Three tiers, three scenarios is the floor. Want to push?
- Add a 4th scenario from a real conflict. Watch the user for one more day. If the agent and the user might disagree about something, write that scenario. Real engineers learn that most disagreements come from situations nobody thought of in advance.
- Write the off-ramp. What if the user wants the agent to stop completely? Write a one-paragraph "OFFRAMP.md": how do you turn the agent off, what do you do with the audit log, how does the user end up at least as well-off as before. Real engineers ALWAYS write this. Almost no kid project does.
- Read the rules to a different person. Your sister sees what your mom doesn't. Your dad sees what your sister doesn't. Try the rules on someone outside the design loop. They'll find a hole. The hole is the gold.