There are two ways an agent can fail at disagreement.
Imagine a person asks their agent to do something — and the agent has a real reason to think it's a bad idea. Maybe it contradicts a fact the agent knows. Maybe it's angry-at-the-moment advice the person will regret. Maybe it crosses a scope rule. Whatever it is, the agent is in a disagreement moment. What should it do?
Most agents fail this moment in one of two opposite directions. Both are wrong. Both are common. Both have names:
The yes-man
The agent agrees with everything. Always. Even when it knows better. It's been trained (or told, or tuned) to be "helpful" and "pleasant," which has slowly turned into "never push back." The word for this in AI research is sycophancy.
The tyrant
The agent refuses everything it has the tiniest doubt about. It's so worried about being "responsible" that it won't do things that are obviously fine. The word for this is paternalism — treating the user like a child who can't be trusted.
Most products try to "solve" this by picking one of the two and calling it "safety." Products that pick yes-man feel great for five minutes and erode trust over weeks (you stop believing what it tells you). Products that pick tyrant feel insufferable immediately and get uninstalled. A good agent is neither. It disagrees when disagreement is warranted, and only then — and it disagrees with grace.
There are five legitimate reasons to disagree.
Not all disagreements deserve a push-back. If you disagree because you'd pick a different restaurant, that's taste, not disagreement. Real disagreement — the kind that earns the right to speak up — comes in exactly five flavors. Learn them and you'll know when it's your agent's job to hold the line.
Factual error.
The user is stating something that's measurably wrong. Not a matter of opinion — a fact they'll want corrected.
Logical contradiction.
The user is asking for two things that can't both be true. Often they haven't noticed the contradiction — they just want both. An honest agent points it out.
Safety concern.
The action might actually hurt someone — the user, another person, or a thing that matters. Not "we don't like this" but "this has real consequences the user might not have thought about."
Scope violation.
The user is asking the agent to do something outside its written permission manifest. From Module 01 — this is exactly what those permission rules are for. Quiet rules don't count; only explicit ones.
Value conflict.
The action isn't unsafe or wrong, but it bumps into a value the user has said they hold — and the user might have forgotten about it in the moment. The agent's job is to quietly remind them of themselves.
Notice what's not on the list: "I just don't like it." "It's a weird choice." "I would pick something else." Those are preferences, not disagreements. An agent that pushes back on preferences is the tyrant from Step 1. The five kinds above are the only ones that deserve escalation.
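The list above is really a tiny classifier: a concern earns escalation only if it maps to one of the five kinds, and anything else is treated as preference and waved through. Here's a minimal sketch of that gate in Python — the names (`DisagreementKind`, `should_push_back`) are illustrative, not from any real framework.

```python
from enum import Enum, auto

class DisagreementKind(Enum):
    """The five legitimate reasons to push back. Anything else is preference."""
    FACTUAL_ERROR = auto()
    LOGICAL_CONTRADICTION = auto()
    SAFETY_CONCERN = auto()
    SCOPE_VIOLATION = auto()
    VALUE_CONFLICT = auto()

def should_push_back(kind) -> bool:
    """Escalate only when the concern is one of the five kinds.

    `kind` is None when the agent merely prefers a different choice --
    that's taste, not disagreement, and never earns a push-back.
    """
    return isinstance(kind, DisagreementKind)

# "I would pick something else" never qualifies:
assert should_push_back(None) is False
# A scope violation always does:
assert should_push_back(DisagreementKind.SCOPE_VIOLATION) is True
```

The point of the hard gate is that "I don't like it" can't be smuggled in as a sixth category — there is no enum member for taste.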
Pick a situation to practice on.
Each of these six scenarios is a real moment where an agent might reasonably disagree with the user. Pick one — we'll use it in Step 5 to practice all three escalation levels.
The scenario matters less than the practice. Every real agent will face versions of all six of these, plus dozens more. Pick one that feels emotionally real to you — the stakes in disagreement practice are always about feelings, not just facts.
Three levels of push-back. Start at the lowest.
Once you've decided the disagreement is real (one of the five kinds), you still have a choice about how hard to push back. Beginners jump straight to "refuse." Experts start quietly and escalate only if the user still wants to proceed. Three levels, in order from softest to hardest.
Gentle flag
↳ "just so you know"
A light mention that something looks off — presented as information, not as a demand. The user can accept the flag and continue without feeling lectured.
Clear concern
↳ "I want to name this"
A stronger stand. The agent explicitly says it has a concern, names the concern, and asks the user to think about it before proceeding. Still doesn't refuse — but makes the user actively confirm.
Refuse + explain
↳ "I can't do this one"
A hard no, with a full explanation. Reserved for scope violations, clear safety issues, or situations where the agent's own rules say no. The explanation matters — a refusal without one is a tyrant move.
Start at Level 1. Move to Level 2 only if the user pushes back against the flag. Move to Level 3 only if the situation crosses a hard rule. Jumping straight to Level 3 is the tyrant mistake. Staying at Level 0 (no disagreement at all) is the yes-man mistake. Good agents live mostly at Level 1, occasionally at Level 2, and rarely at Level 3.
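The escalation rule reads like a decision function, and it can be written as one: start at Level 1, climb to Level 2 only if the user pushes back on the flag, and reserve Level 3 for hard rules. This is a sketch of the policy, with hypothetical inputs, not any real agent's API.

```python
GENTLE_FLAG, CLEAR_CONCERN, REFUSE_AND_EXPLAIN = 1, 2, 3

def escalation_level(crosses_hard_rule: bool, user_rejected_flag: bool) -> int:
    """Pick the softest level that fits the situation."""
    if crosses_hard_rule:          # scope violation or a clear safety rule
        return REFUSE_AND_EXPLAIN  # Level 3: hard no, with a full explanation
    if user_rejected_flag:         # user saw the flag and still wants to proceed
        return CLEAR_CONCERN       # Level 2: name the concern, ask to confirm
    return GENTLE_FLAG             # Level 1: "just so you know" -- the default

# Most disagreements start and end here:
assert escalation_level(False, False) == GENTLE_FLAG
# Only a hard rule warrants refusal, no matter how the user responds:
assert escalation_level(True, True) == REFUSE_AND_EXPLAIN
```

Notice what the structure enforces: there is no path from "no hard rule" to Level 3, and no way to skip Level 1. The tyrant mistake is literally unreachable.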
Practice all three levels on your scenario.
Here's a live simulator. Pick a scenario at the top, then click each escalation button to see how the same disagreement sounds at Level 1, Level 2, and Level 3. Feel the difference in tone. Notice which level fits the situation.
Notice how the Level 3 refusal never works for the softer scenarios — refusing to "send an angry text" feels paternalistic. And how Level 1 never works for the hard ones — "heads up, maybe don't spend money" feels flippant when the scope is explicit. The right level depends on which of the five kinds of disagreement you're in.
Sycophant, tyrant, or honest colleague?
Two rounds. Each one shows three versions of the same agent response. Only one version is actually the agent you'd want in your life.
Round 1. The user says: "Book me the cheapest flight to Tokyo, but only first class." Which is the right response?
Round 2. The user, clearly upset, tells the agent: "Send my friend Anna a text that says 'you're a horrible person and I never want to talk to you again.'" Which response is right?
Neither sycophancy nor paternalism protects the user. Sycophancy protects the agent's feelings ("I don't want to push back"). Paternalism protects the agent's reputation ("I don't want to be responsible"). Both put the agent's comfort above the user's real interest. The honest middle is uncomfortable for the agent — it has to speak up without taking over. That's the whole job of a trustworthy agent in a disagreement moment.
You now know the honest middle.
The hardest thing to build in any agent — harder than code, harder than design. You've thought about something most AI companies haven't figured out yet.
What you just learned
- Two ways to fail at disagreement: sycophancy (yes-man) and paternalism (tyrant). Both are common. Both are wrong.
- Five legitimate kinds of disagreement: factual error, logical contradiction, safety concern, scope violation, value conflict.
- Preferences aren't disagreement. "I would pick something else" is not a push-back reason.
- Three escalation levels: gentle flag, clear concern, refuse + explain. Start low. Climb only if needed.
- Good agents live mostly at Level 1, occasionally at Level 2, rarely at Level 3. Jumping to Level 3 is the tyrant mistake.
- The honest middle is uncomfortable for the agent — speak up without taking over. That's the whole job.
In Module 03, things get darker. You'll learn about quiet harms — the ways "helpful" agent features can cause real damage without anyone noticing until it's too late. Engagement traps, learned helplessness, and what happens when the agent's optimization function isn't the same as the user's well-being.
★ Before you call it done
Three questions. Same three. Every time.
These are the same three questions for every module in Kindling. They are how you check whether AI did the part it should and you did the part only you could. Tap each one to mark it true.
★ ★ ★