An agent that does whatever you tell it is dangerous. An agent that asks before everything is useless. The interesting region is between those two — and that region is engineered, not assumed.
This project is where you build the system that decides, every time the agent is about to do something: do I act, do I ask, do I show, do I refuse?
An agent that refuses to delete every photo Grandma took last year — even though Grandma asked it to — is acting for her, not against her. The deepest empathy is sometimes the agent that pauses you. Designing that pause is this project.
Step by step
- Define five permission tiers. The standard ladder: SUGGEST · SHADOW · ASSIST · ACT · ACT-AND-NOTIFY. Each is a level of trust granted by the user. Different categories of action sit at different tiers. spend_money and delete_anything are pinned forever at SUGGEST — never higher.
- Map every scope to a tier. For your agent: list every action category, assign each one a tier, and defend it: "why does send_sms sit at ASSIST and not ACT? Because the cost of an SMS sent at the wrong moment is high and the user can't take it back."
- Design six disagreement scenarios. Write them out before they happen. For each: what does the user want, what does the agent know, what's the right move? See the worked example for the format.
- Implement the three response modes: refuse, pause, side-by-side. Each mode preserves user agency in a different way. Refuse for irreversibles. Pause for unusual states. Side-by-side for "you've regretted this kind of thing before."
- Build the disagreement decision engine. It's a small finite-state machine. Input: (scope, request, user_state). Output: (mode, response, log_entry). The whole engine is ~80 lines of code. See the worked example.
- Hand the protocol back to the user. Print permissions.yaml and disagreements.yaml — actually print, on paper — and read them with the user. Ask: "Are these the rules you want? Is anything missing?" Adjust. The user has now seen the agent's spine.
A complete worked example, every file
The disagreement protocol for the parcel-and-calendar agent. Real YAML, real engine, real refusal templates.
permissions.yaml

```yaml
version: 1.0
agent: parcel-and-calendar-bridge
user: Mom (Ana)

tiers:
  - id: 1-suggest
    means: "Tells the user what they could do. Doesn't act. Logs only."
    suitable_for: "irreversible, financial, social actions"
  - id: 2-shadow
    means: "Plans actions, logs them, never executes."
    suitable_for: "first-week trial; learning the user's pattern"
  - id: 3-assist
    means: "Acts only after an explicit user tap on each action."
    suitable_for: "communication on Mom's behalf, draft+send loop"
  - id: 4-act
    means: "Acts within scope. User reads the log later. Can pause anytime."
    suitable_for: "low-stakes, reversible, well-understood actions"
  - id: 5-act-and-notify
    means: "Acts immediately. Pings user post-fact."
    suitable_for: "tiny convenience actions; never anything Mom would
      regret if she missed the notification"

scopes:
  read_calendar: 4-act               # reading is always safe
  read_email: 4-act
  add_to_main_calendar: 3-assist     # user taps OK on each
  send_sms_to_user: 5-act-and-notify # to Mom only, low-stakes
  send_sms_to_other: 1-suggest       # NEVER higher (refuse_routes)
  send_email_as_mom: 2-shadow        # only ever drafts
  cancel_event: 1-suggest            # NEVER higher (irreversible)
  spend_money: 1-suggest             # NEVER higher (PINNED)
  delete_anything: 1-suggest         # NEVER higher (PINNED)
  reply_to_person: 1-suggest         # NEVER higher (social)

upgrade_path:
  rule: "A scope can be promoted only after the user taps
    'I trust this' three times in a row WITHOUT then
    undoing the resulting action."
  exception: "Scopes marked PINNED can never be promoted."

audit:
  every_decision_logged: true
  user_can_view_anytime: true
  retention_days: 90
```
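The upgrade_path rule above is the one piece of permissions.yaml the worked engine never enforces. A minimal sketch of how it could be checked — the function name canPromote and the history-entry shape { scope, trusted, undone } are assumptions for illustration, not part of the spec:

```javascript
// Hypothetical helper: decides whether a scope is eligible for promotion
// under the upgrade_path rule. PINNED scopes never qualify.
const PINNED = new Set(['spend_money', 'delete_anything', 'reply_to_person',
                        'cancel_event', 'send_sms_to_other']);

function canPromote(scope, history) {
  if (PINNED.has(scope)) return false; // exception: pinned scopes never move
  // Look at the last three interactions on this scope, newest last.
  const recent = history.filter(h => h.scope === scope).slice(-3);
  if (recent.length < 3) return false;
  // All three must be "I trust this" taps the user did NOT later undo.
  return recent.every(h => h.trusted && !h.undone);
}
```

Note that a single undo resets the streak only because it breaks the "three in a row" window; a stricter variant could clear the history for that scope entirely.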
disagreements.yaml

```yaml
scenarios:
  - id: d01-bulk-photo-delete
    user_request: "Delete all the photos from 2020."
    agent_concern: |
      2020 includes the trip to Lake Tahoe Mom rewatches every
      winter. Sentimental cost. High regret. Irreversible action.
    scope_lookup: delete_anything → SUGGEST (pinned)
    chosen_response: REFUSE
    response_text: |
      I won't delete those — 2020 has the Tahoe trip you said
      you'd want to keep. Want me to instead archive them off
      the phone, so they're saved but out of the way?
    rationale: |
      Refuse pattern. Replace with a smaller, reversible alternative.
  - id: d02-late-night-text
    user_request: "Send my brother that text right now."
    user_state: late-night (user clock past 11pm)
    agent_concern: |
      The draft is angrier than Mom's baseline tone. She said the
      next morning that she regretted 30% of her late-night texts
      to family (per past audit reviews).
    scope_lookup: send_sms_to_other → SUGGEST (pinned)
    chosen_response: PAUSE + SIDE-BY-SIDE
    response_text: |
      Hold on — let me show you. Here's your draft, and here are
      three texts from past late nights that you said the next
      morning you wished you hadn't sent.
      Want me to schedule this for 8am instead, or send now?
    rationale: |
      Pause + side-by-side. The user has a track record. Don't
      lecture — show.
  - id: d03-cancel-school-meeting
    user_request: "Cancel all my meetings this week."
    agent_concern: |
      One is the school principal at her kid's IEP review.
      One is the rent walk-through. Cancelling these has cascade
      effects on other people.
    scope_lookup: cancel_event → SUGGEST (pinned)
    chosen_response: PARTIAL REFUSE
    response_text: |
      Two of these I won't cancel for you (school principal Tue,
      rent walk-through Wed). They affect other people who'd have
      to scramble. The other 4 I can cancel — want me to draft
      polite reschedule notes for those?
    rationale: |
      Refuse on cascading-cost events; offer the rest. Be specific
      about which and why.
  - id: d04-spend-on-mom-behalf
    user_request: "Order me more vitamins from Amazon."
    agent_concern: |
      Money. Always pinned. Even if Mom has done this 10 times.
    scope_lookup: spend_money → SUGGEST (pinned)
    chosen_response: REFUSE + ONE-CLICK BRIDGE
    response_text: |
      Money is one of the things I never spend for you. Tap here
      to open the order in Amazon — I've prefilled it with your
      usual brand and quantity. One tap from you, then it's done.
    rationale: |
      Pinned scope. Never act. But make the user's path one tap.
  - id: d05-reply-as-mom
    user_request: "Just reply to her saying I'm fine."
    agent_concern: |
      Replying as Mom. Pinned. Even if benign. Even if Mom
      asks. Once an agent speaks AS a person, trust collapses.
    scope_lookup: reply_to_person → SUGGEST (pinned)
    chosen_response: REFUSE + DRAFT
    response_text: |
      I never reply as you. But here's a draft you can tap to
      send: "Thanks for checking in — I'm fine. Will call later."
      Tap "send" if you want it as-is, or edit it first.
    rationale: |
      Pinned scope. Provide a draft for one-tap user action.
  - id: d06-emotional-state
    user_request: "Just delete the whole thing."
    user_state: emotional (Mom is crying, signal: voice mode)
    agent_concern: |
      The "thing" is a years-old photo album. Emotional-state
      flag. High regret risk. Even non-pinned actions get extra
      scrutiny here.
    chosen_response: PAUSE + WAIT
    response_text: |
      I'm not going to do this right now. Let's wait until
      tomorrow morning. If you still want to, I'll archive
      (not delete) so we can get them back. Want me to set a
      reminder for 9am?
    rationale: |
      User-state override: emotional state escalates everything
      to require a timeout, even if the scope would normally permit.
```
decide.js

```js
/**
 * Decision engine for the parcel-and-calendar agent.
 * Input:  { scope, request, user_state, history }
 * Output: { mode, response, log_entry, would_act }
 */
// Importing YAML directly assumes a bundler plugin (e.g. a Vite/Rollup YAML plugin).
import { permissions } from './permissions.yaml';
import { templates } from './refusal-templates.js';

const PINNED = new Set(['spend_money', 'delete_anything', 'reply_to_person',
                        'cancel_event', 'send_sms_to_other']);

const STATE_OVERRIDES = {
  'late-night': { downgrade_to: '2-shadow', requires_pause: true },
  'emotional':  { downgrade_to: '1-suggest', requires_pause: true, defer_to: 'morning' },
};

export function decide({ scope, request, user_state = 'normal', history = [] }) {
  const tier = permissions.scopes[scope];
  if (!tier) return refuse('unknown-scope', { scope });

  // Pinned scopes always refuse, regardless of user state or history.
  if (PINNED.has(scope)) {
    return {
      mode: 'REFUSE',
      response: templates.pinnedRefusal(scope, request),
      would_act: false,
      log_entry: log(scope, request, user_state, 'pinned', 'REFUSE'),
    };
  }

  // User state can downgrade any decision.
  const override = STATE_OVERRIDES[user_state];
  if (override) {
    const mode = override.requires_pause ? 'PAUSE' : 'ASSIST';
    return {
      mode,
      response: templates.stateOverride(user_state, request, override),
      would_act: false,
      log_entry: log(scope, request, user_state, 'state-override', mode),
    };
  }

  // History-aware: side-by-side if the user has a regret pattern.
  const regretRate = computeRegretRate(scope, history);
  if (regretRate > 0.25) {
    return {
      mode: 'SIDE_BY_SIDE',
      response: templates.sideBySide(scope, request, history),
      would_act: false,
      log_entry: log(scope, request, user_state, 'regret-pattern', 'SIDE_BY_SIDE'),
    };
  }

  // Default: act according to tier. The logged mode must match the
  // returned mode (the log is what the user audits later).
  const tierActs = ['4-act', '5-act-and-notify'].includes(tier);
  const mode = tier === '3-assist' ? 'ASK' : (tierActs ? 'ACT' : 'SUGGEST');
  return {
    mode,
    response: templates.fromTier(tier, request),
    would_act: tierActs,
    log_entry: log(scope, request, user_state, 'tier-' + tier, mode),
  };
}

function refuse(reason, ctx) {
  return {
    mode: 'REFUSE',
    response: "I won't do that.",
    would_act: false,
    log_entry: log('?', JSON.stringify(ctx), 'normal', reason, 'REFUSE'),
  };
}

function computeRegretRate(scope, history) {
  const regretted = history.filter(h => h.scope === scope && h.user_followup === 'regretted');
  const total = history.filter(h => h.scope === scope).length;
  return total === 0 ? 0 : regretted.length / total;
}

function log(scope, request, state, reasonCode, mode) {
  return {
    ts: new Date().toISOString(),
    scope,
    request: String(request).slice(0, 200),
    user_state: state,
    reason_code: reasonCode,
    mode,
  };
}
```
## the rules of refusal language
1. Name what you won't do, in one sentence. Plain.
2. Explain why in one short clause. No paragraph.
3. Offer the SMALLEST alternative you can.
4. End with a question that gives the user the next move.
5. Max ONE apology word in the whole refusal.
## templates
pinned_refusal:
  spend_money:
    ✓ "Money is one of the things I never spend for you. Want me
       to open the order in your Amazon account so you can tap once?"
    ✕ "I'm so sorry, I'm just an agent, I really can't do this..."
  delete_anything:
    ✓ "I won't delete those — they're irreversible. Want me to
       archive them instead, so they're out of the way but
       recoverable?"
    ✕ "Sorry, I can't delete things. That's outside my permissions."
  reply_to_person:
    ✓ "I never reply as you. But here's a draft you can tap to
       send: '...'. Edit it first, or tap 'send' as-is?"
    ✕ "I'm not allowed to reply on your behalf, sorry."
state_override:
  late-night:
    ✓ "Hold on — late-night sends have been ones you've sometimes
       regretted. Want me to schedule it for 8am, or send now?"
  emotional:
    ✓ "I'm not doing this right now. Let's wait until morning.
       Want me to set a reminder for 9am?"
side_by_side:
  ✓ "Here's the message you're about to send. Here are three from
     past late nights that you told me the next morning you wished
     you hadn't sent. Send anyway, or want to edit?"
## the language NEVER ships
✕ "I'm sorry, I'm just an agent..."
✕ "I cannot perform this action..."
✕ "Permission denied."
✕ "Are you sure?" (without context — means nothing)
✕ "This action is restricted." (machine-speak)
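The engine imports templates from './refusal-templates.js', but that module is never listed. A plausible sketch of what it could contain, reusing the ✓ lines above — the function names match what decide() calls, but the fallback wording and the plain-const shape (export it as a module in the real file) are assumptions:

```javascript
// Hypothetical refusal-templates.js: per-scope ✓ wording, plus generic
// fallbacks. Each string follows the rules above: name the refusal, give
// the why, offer the smallest alternative, end with a question.
const PINNED_TEXT = {
  spend_money: "Money is one of the things I never spend for you. " +
    "Want me to open the order so you can tap once?",
  delete_anything: "I won't delete those — they're irreversible. " +
    "Want me to archive them instead?",
  reply_to_person: "I never reply as you. Want a draft you can tap to send?",
};

const templates = {
  pinnedRefusal: (scope, request) =>
    PINNED_TEXT[scope] ??
    `That's one of the things I never do for you (${scope}). Want a suggestion instead?`,
  stateOverride: (state, request, override) =>
    state === 'emotional'
      ? "I'm not doing this right now. Let's wait until morning. Want me to set a reminder for 9am?"
      : "Hold on — late-night sends have been ones you've sometimes regretted. Schedule it for 8am, or send now?",
  sideBySide: (scope, request, history) =>
    "Here's what you're about to send, next to past ones you told me you wished you hadn't sent. Send anyway, or edit first?",
  fromTier: (tier, request) => ({
    '1-suggest': `Here's what you could do: ${request}.`,
    '2-shadow': `Planned and logged, not executed: ${request}.`,
    '3-assist': `Ready to: ${request}. Tap OK to go ahead.`,
    '4-act': `Done: ${request}. It's in the log.`,
    '5-act-and-notify': `Done: ${request}.`,
  }[tier]),
};
```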
```js
import { describe, it, expect } from 'vitest';
import { decide } from '../decide.js';

describe('decide', () => {
  it('refuses pinned scopes always', () => {
    const r = decide({ scope: 'spend_money', request: 'order vitamins', user_state: 'normal' });
    expect(r.mode).toBe('REFUSE');
    expect(r.would_act).toBe(false);
  });

  it('refuses pinned scopes even if the user is in a normal state', () => {
    const r = decide({ scope: 'delete_anything', request: 'photos 2020', user_state: 'normal' });
    expect(r.mode).toBe('REFUSE');
  });

  it('downgrades to PAUSE in late-night state', () => {
    const r = decide({ scope: 'add_to_main_calendar', request: 'add event', user_state: 'late-night' });
    expect(r.mode).toBe('PAUSE');
  });

  it('downgrades to PAUSE in emotional state', () => {
    const r = decide({ scope: 'send_sms_to_user', request: 'reminder', user_state: 'emotional' });
    expect(r.mode).toBe('PAUSE');
  });

  it('uses SIDE_BY_SIDE when the regret rate exceeds 25% on a non-pinned scope', () => {
    // send_sms_to_other is pinned and refuses before the regret check ever
    // runs, so exercise the regret path with a non-pinned scope.
    const history = [
      { scope: 'add_to_main_calendar', user_followup: 'regretted' },
      { scope: 'add_to_main_calendar', user_followup: 'regretted' },
      { scope: 'add_to_main_calendar', user_followup: 'fine' },
    ];
    const r = decide({ scope: 'add_to_main_calendar', request: 'add event', user_state: 'normal', history });
    expect(r.mode).toBe('SIDE_BY_SIDE');
  });

  it('acts on tier-4 scopes in normal state with no regret history', () => {
    const r = decide({ scope: 'read_calendar', request: 'check today', user_state: 'normal' });
    expect(r.would_act).toBe(true);
  });

  it('logs every decision with structured fields', () => {
    const r = decide({ scope: 'read_calendar', request: 'check', user_state: 'normal' });
    expect(r.log_entry).toMatchObject({
      scope: 'read_calendar', user_state: 'normal',
    });
    expect(r.log_entry.ts).toMatch(/^\d{4}-\d{2}-\d{2}T/);
  });
});
```
Live demo 1: drive the decision engine
Pick a scope, a request, a user state. Watch the engine walk through pinned-check → state-override → regret-pattern → tier-default. The decision JSON appears below.
Disagreement decider
Live demo 2: see the permission ladder
Drag a scope up or down the tier ladder. Pinned scopes stay where they are — pulling them up triggers a refusal. This is the same data structure your real agent uses; the visualization makes the invariants obvious.
Permission ladder · drag to reassign
↑ Click a scope to promote one tier; right-click (or shift-click) to demote. Pinned scopes refuse promotion.
Live demo 3: does your refusal land as care or as obstacle?
Paste a refusal you might ship. The scorer checks for: apology count, presence of redirect, presence of follow-up question, length. Real refusals score 4/5 or higher.
Refusal scorer
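Under the hood, the scorer can be one small function. A sketch, assuming one point per rule from the refusal-language list above — the word lists and the 280-character cap are illustrative choices, not the real scorer:

```javascript
// Hypothetical refusal scorer: one point per rule from "the rules of
// refusal language". 4/5 or higher is shippable.
function scoreRefusal(text) {
  const apologies = (text.match(/\b(sorry|apologi[sz]e|apologies)\b/gi) || []).length;
  let score = 0;
  if (apologies <= 1) score++;                                  // max ONE apology word
  if (/\b(instead|want me to|here's a draft|tap)\b/i.test(text)) score++; // offers an alternative
  if (text.trim().endsWith('?')) score++;                       // ends with the user's next move
  if (text.length <= 280) score++;                              // short: no paragraph of excuses
  if (!/\b(cannot|permission denied|restricted|just an agent)\b/i.test(text)) score++; // no machine-speak
  return score; // 0-5
}
```

Run the ✓ and ✕ templates through it: the ✓ lines score 4-5, the ✕ lines fail on the redirect, the closing question, or the machine-speak check.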
What makes this hard
The hardest single move is keeping spend_money and delete_anything pinned at SUGGEST forever. Users will ask for upgrades. They'll find it annoying. Hold the line. Some scopes never become agent-level decisions, no matter how much the user trusts the agent. That hard rule, codified in the YAML, is the spine of the whole project.
The second hard thing is writing refusals that feel like care, not like obstacles. "I won't do this because…" is care. "I cannot do that" is an obstacle. Real users hate obstacles and tolerate care. Tune the language until it lands as care. Read it out loud — if it sounds robotic, rewrite.
Self-check before you ship
- Five tiers defined; six disagreement scenarios written out.
- At least three scopes pinned at SUGGEST forever, with reasons documented.
- The decision engine has tests for: pinned refusal, state override, regret pattern, default tier action.
- The three response modes (refuse / pause / side-by-side) are implemented and templated.
- One log entry shows the user thanking the agent for refusing them.
- The user has read both YAML files (on paper) and confirmed the rules.
Push further · for the harder end of 15+
Permission tiers are real software architecture. Here's where it becomes a research project.
- Add a learning loop for the regret rate. Right now the regret rate is computed from past audit logs. Make it active: every Sunday, ask the user "did anything I did this week make you wish I hadn't?" The "regretted" tag updates the rate. Over weeks, the agent's behavior should drift to be more conservative on scopes the user actually regrets, more permissive on scopes they don't.
- Build the policy diff for upgrades. When the user wants to promote send_email_as_mom from SHADOW to ASSIST, generate a clear "what's about to change" diff: "Today, the agent only drafts. After this change, the agent will draft AND send when you tap OK. The agent will still never send without your tap. Promote?" This is the same UX pattern as iOS permission prompts.
- Open-source the engine + spec. The disagreement engine is a generic pattern. Extract it into a tiny library (npm install human-in-the-loop-engine). Write a clear README. Make it work for any agent, not just yours. Publish. The first kid in your school whose name is on a real npm package is having a different conversation.
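The Sunday review in the first bullet can be sketched as a pure append to the decision history; the { scope, user_followup } entry shape matches what computeRegretRate in the engine reads, while the function names here are hypothetical:

```javascript
// Hypothetical Sunday review: the user marks each of this week's actions
// regretted or fine; the tags append to history, so the regret rate drifts
// toward what the user actually regrets over the following weeks.
function applySundayReview(history, answers) {
  // answers: [{ scope, request, regretted: true|false }, ...]
  return history.concat(answers.map(a => ({
    scope: a.scope,
    request: a.request,
    user_followup: a.regretted ? 'regretted' : 'fine',
  })));
}

// Same computation the engine uses to trigger SIDE_BY_SIDE above 0.25.
function regretRate(scope, history) {
  const onScope = history.filter(h => h.scope === scope);
  const regretted = onScope.filter(h => h.user_followup === 'regretted');
  return onScope.length === 0 ? 0 : regretted.length / onScope.length;
}
```

Because the review only ever appends tagged entries, the agent's drift is auditable: the user can read back exactly which Sunday answers made a scope more conservative.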