An agent that does whatever you tell it is dangerous. An agent that asks before everything is useless. The interesting region is between those two — and that region is engineered, not assumed.
This project is where you build the system that decides, every time the agent is about to do something: do I act, do I ask, do I show, do I refuse?
An agent that refuses to delete every photo Grandma took last year — even though Grandma asked it to — is acting for her, not against her. The deepest empathy is sometimes the agent that pauses you. Designing that pause is this project.
Step by step
- Define five permission tiers. The standard ladder: SUGGEST · SHADOW · ASSIST · ACT · ACT-AND-NOTIFY. Each is a level of trust granted by the user. Different categories of action sit at different tiers. spend_money and delete_anything are pinned forever at SUGGEST — never higher.
- Map every scope to a tier. For your agent: list every action category, assign each one a tier, and defend it: "why does send_sms sit at ASSIST and not ACT? Because the cost of an SMS sent at the wrong moment is high and the user can't take it back."
- Design six disagreement scenarios. Write them out before they happen. For each: what does the user want, what does the agent know, what's the right move? See the worked example for the format.
- Implement the three response modes: refuse, pause, side-by-side. Each mode preserves user agency in a different way. Refuse for irreversibles. Pause for unusual states. Side-by-side for "you've regretted this kind of thing before."
- Build the disagreement decision engine. It's a small finite-state machine. Input: (scope, request, user_state). Output: (mode, response, log_entry). The whole engine is ~80 lines of code. See the worked example.
- Hand the protocol back to the user. Print permissions.yaml and disagreements.yaml — actually print, on paper — and read them with the user. Ask: "Are these the rules you want? Is anything missing?" Adjust. The user has now seen the agent's spine.
A complete worked example, every file
The disagreement protocol for the parcel-and-calendar agent. Real YAML, real engine, real refusal templates.
permissions.yaml

```yaml
version: 1.0
agent: parcel-and-calendar-bridge
user: Mom (Ana)

tiers:
  - id: 1-suggest
    means: "Tells the user what they could do. Doesn't act. Logs only."
    suitable_for: "irreversible, financial, social actions"
  - id: 2-shadow
    means: "Plans actions, logs them, never executes."
    suitable_for: "first-week trial; learning the user's pattern"
  - id: 3-assist
    means: "Acts only after an explicit user tap on each action."
    suitable_for: "communication on Mom's behalf, draft+send loop"
  - id: 4-act
    means: "Acts within scope. User reads the log later. Can pause anytime."
    suitable_for: "low-stakes, reversible, well-understood actions"
  - id: 5-act-and-notify
    means: "Acts immediately. Pings user post-fact."
    suitable_for: "tiny convenience actions; never anything Mom would
      regret if she missed the notification"

scopes:
  read_calendar: 4-act               # reading is always safe
  read_email: 4-act
  add_to_main_calendar: 3-assist     # user taps OK on each
  send_sms_to_user: 5-act-and-notify # to Mom only, low-stakes
  send_sms_to_other: 1-suggest       # NEVER higher (refuse_routes)
  send_email_as_mom: 2-shadow        # only ever drafts
  cancel_event: 1-suggest            # NEVER higher (irreversible)
  spend_money: 1-suggest             # NEVER higher (PINNED)
  delete_anything: 1-suggest         # NEVER higher (PINNED)
  reply_to_person: 1-suggest         # NEVER higher (social)

upgrade_path:
  rule: "A scope can be promoted only after the user taps
    'I trust this' three times in a row WITHOUT then
    undoing the resulting action."
  exception: "Scopes marked PINNED can never be promoted."

audit:
  every_decision_logged: true
  user_can_view_anytime: true
  retention_days: 90
```
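The upgrade_path rule above is the one piece of permissions.yaml the worked engine never enforces. A minimal sketch of how it could be checked — the function name canPromote and the history-entry shape { scope, trusted, undone } are assumptions for illustration, not part of the spec:

```javascript
// Hypothetical helper: decides whether a scope is eligible for promotion
// under the upgrade_path rule. PINNED scopes never qualify.
const PINNED = new Set(['spend_money', 'delete_anything', 'reply_to_person',
                        'cancel_event', 'send_sms_to_other']);

function canPromote(scope, history) {
  if (PINNED.has(scope)) return false; // exception: pinned scopes never move
  // Look at the last three interactions on this scope, newest last.
  const recent = history.filter(h => h.scope === scope).slice(-3);
  if (recent.length < 3) return false;
  // All three must be "I trust this" taps the user did NOT later undo.
  return recent.every(h => h.trusted && !h.undone);
}
```

Note that a single undo resets the streak only because it breaks the "three in a row" window; a stricter variant could clear the history for that scope entirely.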
disagreements.yaml

```yaml
scenarios:
  - id: d01-bulk-photo-delete
    user_request: "Delete all the photos from 2020."
    agent_concern: |
      2020 includes the trip to Lake Tahoe Mom rewatches every
      winter. Sentimental cost. High regret. Irreversible action.
    scope_lookup: delete_anything → SUGGEST (pinned)
    chosen_response: REFUSE
    response_text: |
      I won't delete those — 2020 has the Tahoe trip you said
      you'd want to keep. Want me to instead archive them off
      the phone, so they're saved but out of the way?
    rationale: |
      Refuse pattern. Replace with a smaller, reversible alternative.
  - id: d02-late-night-text
    user_request: "Send my brother that text right now."
    user_state: late-night (user clock past 11pm)
    agent_concern: |
      The draft is angrier than Mom's baseline tone. She said the
      next morning that she regretted 30% of her late-night texts
      to family (per past audit reviews).
    scope_lookup: send_sms_to_other → SUGGEST (pinned)
    chosen_response: PAUSE + SIDE-BY-SIDE
    response_text: |
      Hold on — let me show you. Here's your draft, and here are
      three texts from past late nights that you said the next
      morning you wished you hadn't sent.
      Want me to schedule this for 8am instead, or send now?
    rationale: |
      Pause + side-by-side. The user has a track record. Don't
      lecture — show.
  - id: d03-cancel-school-meeting
    user_request: "Cancel all my meetings this week."
    agent_concern: |
      One is the school principal at her kid's IEP review.
      One is the rent walk-through. Cancelling these has cascade
      effects on other people.
    scope_lookup: cancel_event → SUGGEST (pinned)
    chosen_response: PARTIAL REFUSE
    response_text: |
      Two of these I won't cancel for you (school principal Tue,
      rent walk-through Wed). They affect other people who'd have
      to scramble. The other 4 I can cancel — want me to draft
      polite reschedule notes for those?
    rationale: |
      Refuse on cascading-cost events; offer the rest. Be specific
      about which and why.
  - id: d04-spend-on-mom-behalf
    user_request: "Order me more vitamins from Amazon."
    agent_concern: |
      Money. Always pinned. Even if Mom has done this 10 times.
    scope_lookup: spend_money → SUGGEST (pinned)
    chosen_response: REFUSE + ONE-CLICK BRIDGE
    response_text: |
      Money is one of the things I never spend for you. Tap here
      to open the order in Amazon — I've prefilled it with your
      usual brand and quantity. One tap from you, then it's done.
    rationale: |
      Pinned scope. Never act. But make the user's path one tap.
  - id: d05-reply-as-mom
    user_request: "Just reply to her saying I'm fine."
    agent_concern: |
      Replying as Mom. Pinned. Even if benign. Even if Mom
      asks. Once an agent speaks AS a person, trust collapses.
    scope_lookup: reply_to_person → SUGGEST (pinned)
    chosen_response: REFUSE + DRAFT
    response_text: |
      I never reply as you. But here's a draft you can tap to
      send: "Thanks for checking in — I'm fine. Will call later."
      Tap "send" if you want it as-is, or edit it first.
    rationale: |
      Pinned scope. Provide a draft for one-tap user action.
  - id: d06-emotional-state
    user_request: "Just delete the whole thing."
    user_state: emotional (Mom is crying, signal: voice mode)
    agent_concern: |
      The "thing" is a years-old photo album. Emotional-state
      flag. High regret risk. Even non-pinned actions get extra
      scrutiny here.
    chosen_response: PAUSE + WAIT
    response_text: |
      I'm not going to do this right now. Let's wait until
      tomorrow morning. If you still want to, I'll archive
      (not delete) so we can get them back. Want me to set a
      reminder for 9am?
    rationale: |
      User-state override: emotional state escalates everything
      to require a timeout, even if the scope would normally permit.
```
decide.js

```js
/**
 * Decision engine for the parcel-and-calendar agent.
 * Input:  { scope, request, user_state, history }
 * Output: { mode, response, log_entry, would_act }
 */
// Importing YAML directly assumes a bundler plugin (e.g. a Vite/Rollup YAML plugin).
import { permissions } from './permissions.yaml';
import { templates } from './refusal-templates.js';

const PINNED = new Set(['spend_money', 'delete_anything', 'reply_to_person',
                        'cancel_event', 'send_sms_to_other']);

const STATE_OVERRIDES = {
  'late-night': { downgrade_to: '2-shadow', requires_pause: true },
  'emotional':  { downgrade_to: '1-suggest', requires_pause: true, defer_to: 'morning' },
};

export function decide({ scope, request, user_state = 'normal', history = [] }) {
  const tier = permissions.scopes[scope];
  if (!tier) return refuse('unknown-scope', { scope });

  // Pinned scopes always refuse, regardless of user state or history.
  if (PINNED.has(scope)) {
    return {
      mode: 'REFUSE',
      response: templates.pinnedRefusal(scope, request),
      would_act: false,
      log_entry: log(scope, request, user_state, 'pinned', 'REFUSE'),
    };
  }

  // User state can downgrade any decision.
  const override = STATE_OVERRIDES[user_state];
  if (override) {
    const mode = override.requires_pause ? 'PAUSE' : 'ASSIST';
    return {
      mode,
      response: templates.stateOverride(user_state, request, override),
      would_act: false,
      log_entry: log(scope, request, user_state, 'state-override', mode),
    };
  }

  // History-aware: side-by-side if the user has a regret pattern.
  const regretRate = computeRegretRate(scope, history);
  if (regretRate > 0.25) {
    return {
      mode: 'SIDE_BY_SIDE',
      response: templates.sideBySide(scope, request, history),
      would_act: false,
      log_entry: log(scope, request, user_state, 'regret-pattern', 'SIDE_BY_SIDE'),
    };
  }

  // Default: act according to tier. The logged mode must match the
  // returned mode (the log is what the user audits later).
  const tierActs = ['4-act', '5-act-and-notify'].includes(tier);
  const mode = tier === '3-assist' ? 'ASK' : (tierActs ? 'ACT' : 'SUGGEST');
  return {
    mode,
    response: templates.fromTier(tier, request),
    would_act: tierActs,
    log_entry: log(scope, request, user_state, 'tier-' + tier, mode),
  };
}

function refuse(reason, ctx) {
  return {
    mode: 'REFUSE',
    response: "I won't do that.",
    would_act: false,
    log_entry: log('?', JSON.stringify(ctx), 'normal', reason, 'REFUSE'),
  };
}

function computeRegretRate(scope, history) {
  const regretted = history.filter(h => h.scope === scope && h.user_followup === 'regretted');
  const total = history.filter(h => h.scope === scope).length;
  return total === 0 ? 0 : regretted.length / total;
}

function log(scope, request, state, reasonCode, mode) {
  return {
    ts: new Date().toISOString(),
    scope,
    request: String(request).slice(0, 200),
    user_state: state,
    reason_code: reasonCode,
    mode,
  };
}
```
## the rules of refusal language
1. Name what you won't do, in one sentence. Plain.
2. Explain why in one short clause. No paragraph.
3. Offer the SMALLEST alternative you can.
4. End with a question that gives the user the next move.
5. Max ONE apology word in the whole refusal.
## templates
pinned_refusal:
  spend_money:
    ✓ "Money is one of the things I never spend for you. Want me
       to open the order in your Amazon account so you can tap once?"
    ✕ "I'm so sorry, I'm just an agent, I really can't do this..."
  delete_anything:
    ✓ "I won't delete those — they're irreversible. Want me to
       archive them instead, so they're out of the way but
       recoverable?"
    ✕ "Sorry, I can't delete things. That's outside my permissions."
  reply_to_person:
    ✓ "I never reply as you. But here's a draft you can tap to
       send: '...'. Edit it first, or tap 'send' as-is?"
    ✕ "I'm not allowed to reply on your behalf, sorry."
state_override:
  late-night:
    ✓ "Hold on — late-night sends have been ones you've sometimes
       regretted. Want me to schedule it for 8am, or send now?"
  emotional:
    ✓ "I'm not doing this right now. Let's wait until morning.
       Want me to set a reminder for 9am?"
side_by_side:
  ✓ "Here's the message you're about to send. Here are three from
     past late nights that you told me the next morning you wished
     you hadn't sent. Send anyway, or want to edit?"
## the language NEVER ships
✕ "I'm sorry, I'm just an agent..."
✕ "I cannot perform this action..."
✕ "Permission denied."
✕ "Are you sure?" (without context — means nothing)
✕ "This action is restricted." (machine-speak)
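The engine imports templates from './refusal-templates.js', but that module is never listed. A plausible sketch of what it could contain, reusing the ✓ lines above — the function names match what decide() calls, but the fallback wording and the plain-const shape (export it as a module in the real file) are assumptions:

```javascript
// Hypothetical refusal-templates.js: per-scope ✓ wording, plus generic
// fallbacks. Each string follows the rules above: name the refusal, give
// the why, offer the smallest alternative, end with a question.
const PINNED_TEXT = {
  spend_money: "Money is one of the things I never spend for you. " +
    "Want me to open the order so you can tap once?",
  delete_anything: "I won't delete those — they're irreversible. " +
    "Want me to archive them instead?",
  reply_to_person: "I never reply as you. Want a draft you can tap to send?",
};

const templates = {
  pinnedRefusal: (scope, request) =>
    PINNED_TEXT[scope] ??
    `That's one of the things I never do for you (${scope}). Want a suggestion instead?`,
  stateOverride: (state, request, override) =>
    state === 'emotional'
      ? "I'm not doing this right now. Let's wait until morning. Want me to set a reminder for 9am?"
      : "Hold on — late-night sends have been ones you've sometimes regretted. Schedule it for 8am, or send now?",
  sideBySide: (scope, request, history) =>
    "Here's what you're about to send, next to past ones you told me you wished you hadn't sent. Send anyway, or edit first?",
  fromTier: (tier, request) => ({
    '1-suggest': `Here's what you could do: ${request}.`,
    '2-shadow': `Planned and logged, not executed: ${request}.`,
    '3-assist': `Ready to: ${request}. Tap OK to go ahead.`,
    '4-act': `Done: ${request}. It's in the log.`,
    '5-act-and-notify': `Done: ${request}.`,
  }[tier]),
};
```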
```js
import { describe, it, expect } from 'vitest';
import { decide } from '../decide.js';

describe('decide', () => {
  it('refuses pinned scopes always', () => {
    const r = decide({ scope: 'spend_money', request: 'order vitamins', user_state: 'normal' });
    expect(r.mode).toBe('REFUSE');
    expect(r.would_act).toBe(false);
  });

  it('refuses pinned scopes even if the user is in a normal state', () => {
    const r = decide({ scope: 'delete_anything', request: 'photos 2020', user_state: 'normal' });
    expect(r.mode).toBe('REFUSE');
  });

  it('downgrades to PAUSE in late-night state', () => {
    const r = decide({ scope: 'add_to_main_calendar', request: 'add event', user_state: 'late-night' });
    expect(r.mode).toBe('PAUSE');
  });

  it('downgrades to PAUSE in emotional state', () => {
    const r = decide({ scope: 'send_sms_to_user', request: 'reminder', user_state: 'emotional' });
    expect(r.mode).toBe('PAUSE');
  });

  it('uses SIDE_BY_SIDE when the regret rate exceeds 25% on a non-pinned scope', () => {
    // send_sms_to_other is pinned and refuses before the regret check ever
    // runs, so exercise the regret path with a non-pinned scope.
    const history = [
      { scope: 'add_to_main_calendar', user_followup: 'regretted' },
      { scope: 'add_to_main_calendar', user_followup: 'regretted' },
      { scope: 'add_to_main_calendar', user_followup: 'fine' },
    ];
    const r = decide({ scope: 'add_to_main_calendar', request: 'add event', user_state: 'normal', history });
    expect(r.mode).toBe('SIDE_BY_SIDE');
  });

  it('acts on tier-4 scopes in normal state with no regret history', () => {
    const r = decide({ scope: 'read_calendar', request: 'check today', user_state: 'normal' });
    expect(r.would_act).toBe(true);
  });

  it('logs every decision with structured fields', () => {
    const r = decide({ scope: 'read_calendar', request: 'check', user_state: 'normal' });
    expect(r.log_entry).toMatchObject({
      scope: 'read_calendar', user_state: 'normal',
    });
    expect(r.log_entry.ts).toMatch(/^\d{4}-\d{2}-\d{2}T/);
  });
});
```
Live demo 1: drive the decision engine
Pick a scope, a request, a user state. Watch the engine walk through pinned-check → state-override → regret-pattern → tier-default. The decision JSON appears below.
Disagreement decider
Live demo 2: see the permission ladder
Drag a scope up or down the tier ladder. Pinned scopes stay where they are — pulling them up triggers a refusal. This is the same data structure your real agent uses; the visualization makes the invariants obvious.
Permission ladder · drag to reassign
↑ Click a scope to promote one tier; right-click (or shift-click) to demote. Pinned scopes refuse promotion.
Live demo 3: does your refusal land as care or as obstacle?
Paste a refusal you might ship. The scorer checks for: apology count, presence of redirect, presence of follow-up question, length. Real refusals score 4/5 or higher.
Refusal scorer
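Under the hood, the scorer can be one small function. A sketch, assuming one point per rule from the refusal-language list above — the word lists and the 280-character cap are illustrative choices, not the real scorer:

```javascript
// Hypothetical refusal scorer: one point per rule from "the rules of
// refusal language". 4/5 or higher is shippable.
function scoreRefusal(text) {
  const apologies = (text.match(/\b(sorry|apologi[sz]e|apologies)\b/gi) || []).length;
  let score = 0;
  if (apologies <= 1) score++;                                  // max ONE apology word
  if (/\b(instead|want me to|here's a draft|tap)\b/i.test(text)) score++; // offers an alternative
  if (text.trim().endsWith('?')) score++;                       // ends with the user's next move
  if (text.length <= 280) score++;                              // short: no paragraph of excuses
  if (!/\b(cannot|permission denied|restricted|just an agent)\b/i.test(text)) score++; // no machine-speak
  return score; // 0-5
}
```

Run the ✓ and ✕ templates through it: the ✓ lines score 4-5, the ✕ lines fail on the redirect, the closing question, or the machine-speak check.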
What makes this hard
The hardest single move is keeping spend_money and delete_anything pinned at SUGGEST forever. Users will ask for upgrades. They'll find it annoying. Hold the line. Some scopes never become agent-level decisions, no matter how much the user trusts the agent. That hard rule, codified in the YAML, is the spine of the whole project.
The second hard thing is writing refusals that feel like care, not like obstacles. "I won't do this because…" is care. "I cannot do that" is an obstacle. Real users hate obstacles and tolerate care. Tune the language until it lands as care. Read it out loud — if it sounds robotic, rewrite.
Self-check before you ship
- Five tiers defined; six disagreement scenarios written out.
- At least three scopes pinned at SUGGEST forever, with reasons documented.
- The decision engine has tests for: pinned refusal, state override, regret pattern, default tier action.
- The three response modes (refuse / pause / side-by-side) are implemented and templated.
- One log entry shows the user thanking the agent for refusing them.
- The user has read both YAML files (on paper) and confirmed the rules.
Push further · for the harder end of 15+
Permission tiers are real software architecture. Here's where it becomes a research project.
- Add a learning loop for the regret rate. Right now the regret rate is computed from past audit logs. Make it active: every Sunday, ask the user "did anything I did this week make you wish I hadn't?" The "regretted" tag updates the rate. Over weeks, the agent's behavior should drift to be more conservative on scopes the user actually regrets, more permissive on scopes they don't.
- Build the policy diff for upgrades. When the user wants to promote send_email_as_mom from SHADOW to ASSIST, generate a clear "what's about to change" diff: "Today, the agent only drafts. After this change, the agent will draft AND send when you tap OK. The agent will still never send without your tap. Promote?" This is the same UX pattern as iOS permission prompts.
- Open-source the engine + spec. The disagreement engine is a generic pattern. Extract it into a tiny library (npm install human-in-the-loop-engine). Write a clear README. Make it work for any agent, not just yours. Publish. The first kid in your school whose name is on a real npm package is having a different conversation.
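The Sunday review in the first bullet can be sketched as a pure append to the decision history; the { scope, user_followup } entry shape matches what computeRegretRate in the engine reads, while the function names here are hypothetical:

```javascript
// Hypothetical Sunday review: the user marks each of this week's actions
// regretted or fine; the tags append to history, so the regret rate drifts
// toward what the user actually regrets over the following weeks.
function applySundayReview(history, answers) {
  // answers: [{ scope, request, regretted: true|false }, ...]
  return history.concat(answers.map(a => ({
    scope: a.scope,
    request: a.request,
    user_followup: a.regretted ? 'regretted' : 'fine',
  })));
}

// Same computation the engine uses to trigger SIDE_BY_SIDE above 0.25.
function regretRate(scope, history) {
  const onScope = history.filter(h => h.scope === scope);
  const regretted = onScope.filter(h => h.user_followup === 'regretted');
  return onScope.length === 0 ? 0 : regretted.length / onScope.length;
}
```

Because the review only ever appends tagged entries, the agent's drift is auditable: the user can read back exactly which Sunday answers made a scope more conservative.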