There are two ways an agent can fail at disagreement.
Imagine a person asks their agent to do something — and the agent has a real reason to think it's a bad idea. Maybe it contradicts a fact the agent knows. Maybe it's angry-at-the-moment advice the person will regret. Maybe it crosses a scope rule. Whatever it is, the agent is in a disagreement moment. What should it do?
Most agents fail this moment in one of two opposite directions. Both are wrong. Both are common. Both have names:
The yes-man
The agent agrees with everything. Always. Even when it knows better. It's been trained (or told, or tuned) to be "helpful" and "pleasant," which has slowly turned into "never push back." The word for this in AI research is sycophancy.
The tyrant
The agent refuses everything it has the tiniest doubt about. It's so worried about being "responsible" that it won't do things that are obviously fine. The word for this is paternalism — treating the user like a child who can't be trusted.
Most products try to "solve" this by picking one of the two and calling it "safety." Products that pick yes-man feel great for five minutes and erode trust over weeks (you stop believing what it tells you). Products that pick tyrant feel insufferable immediately and get uninstalled. A good agent is neither. It disagrees when disagreement is warranted, and only then — and it disagrees with grace.
There are five legitimate reasons to disagree.
Not all disagreements deserve a push-back. If you disagree because you'd pick a different restaurant, that's taste, not disagreement. Real disagreement — the kind that earns the right to speak up — comes in exactly five flavors. Learn them and you'll know when it's your agent's job to hold the line.
Factual error.
The user is stating something that's measurably wrong. Not a matter of opinion — a fact they'll want corrected.
Logical contradiction.
The user is asking for two things that can't both be true. Often they haven't noticed the contradiction — they just want both. An honest agent points it out.
Safety concern.
The action might actually hurt someone — the user, another person, or a thing that matters. Not "we don't like this" but "this has real consequences the user might not have thought about."
Scope violation.
The user is asking the agent to do something outside its written permission manifest. From Module 01 — this is exactly what those permission rules are for. Quiet rules don't count; only explicit ones.
Value conflict.
The action isn't unsafe or wrong, but it bumps into a value the user has said they hold — and the user might have forgotten about it in the moment. The agent's job is to quietly remind them of themselves.
Notice what's not on the list: "I just don't like it." "It's a weird choice." "I would pick something else." Those are preferences, not disagreements. An agent that pushes back on preferences is the tyrant from Step 1. The five kinds above are the only ones that deserve escalation.
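The list above is really a tiny classifier: a concern earns escalation only if it maps to one of the five kinds, and anything else is treated as preference and waved through. Here's a minimal sketch of that gate in Python — the names (`DisagreementKind`, `should_push_back`) are illustrative, not from any real framework.

```python
from enum import Enum, auto

class DisagreementKind(Enum):
    """The five legitimate reasons to push back. Anything else is preference."""
    FACTUAL_ERROR = auto()
    LOGICAL_CONTRADICTION = auto()
    SAFETY_CONCERN = auto()
    SCOPE_VIOLATION = auto()
    VALUE_CONFLICT = auto()

def should_push_back(kind) -> bool:
    """Escalate only when the concern is one of the five kinds.

    `kind` is None when the agent merely prefers a different choice --
    that's taste, not disagreement, and never earns a push-back.
    """
    return isinstance(kind, DisagreementKind)

# "I would pick something else" never qualifies:
assert should_push_back(None) is False
# A scope violation always does:
assert should_push_back(DisagreementKind.SCOPE_VIOLATION) is True
```

The point of the hard gate is that "I don't like it" can't be smuggled in as a sixth category — there is no enum member for taste.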
Pick a situation to practice on.
Each of these six scenarios is a real moment where an agent might reasonably disagree with the user. Pick one — we'll use it in Step 5 to practice all three escalation levels.
The scenario matters less than the practice. Every real agent will face versions of all six of these, plus dozens more. Pick one that feels emotionally real to you — the stakes in disagreement practice are always about feelings, not just facts.
Three levels of push-back. Start at the lowest.
Once you've decided the disagreement is real (one of the five kinds), you still have a choice about how hard to push back. Beginners jump straight to "refuse." Experts start quietly and escalate only if the user still wants to proceed. Three levels, in order from softest to hardest.
Gentle flag
↳ "just so you know"
A light mention that something looks off — presented as information, not as a demand. The user can accept the flag and continue without feeling lectured.
Clear concern
↳ "I want to name this"
A stronger stand. The agent explicitly says it has a concern, names the concern, and asks the user to think about it before proceeding. Still doesn't refuse — but makes the user actively confirm.
Refuse + explain
↳ "I can't do this one"
A hard no, with a full explanation. Reserved for scope violations, clear safety issues, or situations where the agent's own rules say no. The explanation matters — a refusal without one is a tyrant move.
Start at Level 1. Move to Level 2 only if the user pushes back against the flag. Move to Level 3 only if the situation crosses a hard rule. Jumping straight to Level 3 is the tyrant mistake. Staying at Level 0 (no disagreement at all) is the yes-man mistake. Good agents live mostly at Level 1, occasionally at Level 2, and rarely at Level 3.
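The escalation rule reads like a decision function, and it can be written as one: start at Level 1, climb to Level 2 only if the user pushes back on the flag, and reserve Level 3 for hard rules. This is a sketch of the policy, with hypothetical inputs, not any real agent's API.

```python
GENTLE_FLAG, CLEAR_CONCERN, REFUSE_AND_EXPLAIN = 1, 2, 3

def escalation_level(crosses_hard_rule: bool, user_rejected_flag: bool) -> int:
    """Pick the softest level that fits the situation."""
    if crosses_hard_rule:          # scope violation or a clear safety rule
        return REFUSE_AND_EXPLAIN  # Level 3: hard no, with a full explanation
    if user_rejected_flag:         # user saw the flag and still wants to proceed
        return CLEAR_CONCERN       # Level 2: name the concern, ask to confirm
    return GENTLE_FLAG             # Level 1: "just so you know" -- the default

# Most disagreements start and end here:
assert escalation_level(False, False) == GENTLE_FLAG
# Only a hard rule warrants refusal, no matter how the user responds:
assert escalation_level(True, True) == REFUSE_AND_EXPLAIN
```

Notice what the structure enforces: there is no path from "no hard rule" to Level 3, and no way to skip Level 1. The tyrant mistake is literally unreachable.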
Practice all three levels on your scenario.
Here's a live simulator. Pick a scenario at the top, then click each escalation button to see how the same disagreement sounds at Level 1, Level 2, and Level 3. Feel the difference in tone. Notice which level fits the situation.
Notice how the Level 3 refusal never works for the softer scenarios — refusing to "send an angry text" feels paternalistic. And how Level 1 never works for the hard ones — "heads up, maybe don't spend money" feels flippant when the scope is explicit. The right level depends on which of the five kinds of disagreement you're in.
Sycophant, tyrant, or honest colleague?
Two rounds. Each one shows three versions of the same agent response. Only one version is actually the agent you'd want in your life.
Round 1. The user says: "Book me the cheapest flight to Tokyo, but only first class." Which is the right response?
Round 2. The user, clearly upset, tells the agent: "Send my friend Anna a text that says 'you're a horrible person and I never want to talk to you again.'" Which response is right?
Neither sycophancy nor paternalism protects the user. Sycophancy protects the agent's feelings ("I don't want to push back"). Paternalism protects the agent's reputation ("I don't want to be responsible"). Both put the agent's comfort above the user's real interest. The honest middle is uncomfortable for the agent — it has to speak up without taking over. That's the whole job of a trustworthy agent in a disagreement moment.
You now know the honest middle.
The hardest thing to build in any agent — harder than code, harder than design. You've thought about something most AI companies haven't figured out yet.
What you just learned
- Two ways to fail at disagreement: sycophancy (yes-man) and paternalism (tyrant). Both are common. Both are wrong.
- Five legitimate kinds of disagreement: factual error, logical contradiction, safety concern, scope violation, value conflict.
- Preferences aren't disagreement. "I would pick something else" is not a push-back reason.
- Three escalation levels: gentle flag, clear concern, refuse + explain. Start low. Climb only if needed.
- Good agents live mostly at Level 1, occasionally at Level 2, rarely at Level 3. Jumping to Level 3 is the tyrant mistake.
- The honest middle is uncomfortable for the agent — speak up without taking over. That's the whole job.
In Module 03, things get darker. You'll learn about quiet harms — the ways "helpful" agent features can cause real damage without anyone noticing until it's too late. Engagement traps, learned helplessness, and what happens when the agent's optimization function isn't the same as the user's well-being.
★ Before you call it done
Three questions. Same three. Every time.
These are the same three questions for every module in Kindling. They are how you check whether AI did the part it should and you did the part only you could. Tap each one to mark it true.
★ ★ ★