Most agents are built from someone's imagination of "users." That's why most agents are bad. Agent Lab refuses that move on day one: your agent is designed for one person, after watching them, not for an imagined audience.
The deeper move is restraint. Real engineers know it: before you build, you watch. Before you write code, you write down what you actually saw. The watching IS the project — the spec just records what watching taught you.
An agent is a system that takes action on someone's behalf: schedules things, sends messages, picks options. Not a chatbot. The action is what makes the design decisions matter. An agent that mis-schedules your grandmother is a bigger problem than a chatbot that gets a fact wrong.
Step by step
-
Pick the person. Get permission.
This isn't optional. Watching someone in order to build for them is empathetic. Watching someone without telling them is creepy. Tell the person what you're doing. Ask them what they'd want help with. The answer often surprises you.
Possible people: a grandparent who keeps missing reminders. A younger sibling who can't stay on top of homework. A parent who keeps losing parcel deliveries. A friend who never reads the group chat in time.
-
Watch for 3 days. Take field notes in a strict format.
3 days. Maybe 15 minutes a day. You're not measuring; you're noticing. Field notes have a strict shape: what they did, when, what came before it, what they said about it, what surprised you.
The "what surprised you" line is the most important. It's the part where your assumption from Day 1 collides with what you saw on Day 3. The collision is the gold.
-
Write the agent spec.
Spec has 3 short sections: does, does NOT, audit. The "does NOT" list is the hardest and most important. If you can't write 3 things the agent will not do, you haven't thought hard enough. (Hint: it should never spend money, never delete things, never speak as the user.)
-
Read the spec back to the person you watched.
Out loud. With them sitting next to you. Ask: "Are these the right things? Is anything missing? Is anything I'm planning that you wouldn't want?" Edit. Re-read.
This is consent. Real engineers do this with real users before they build. You're doing it before you'd build, even at the design stage.
-
Submit the spec. (No code yet.)
That's the project. The spec is the deliverable. Designing carefully and stopping there is harder than writing code in haste. Real product managers make their living from this exact discipline.
A complete worked example
The full parcel-and-calendar-bridge design for "Mom (Ana)." Field notes from 3 days, then the spec.
subject: Mom (Ana)
goal: figure out where the calendar pain actually is
period: Mon-Wed, ~15 min/day
day_1:
observation: |
9:02am — got Amazon parcel notification "ready for pickup"
11:15am — read it again, said "ok later"
5:48pm — re-read the notification. asked me where it was.
6:10pm — drove 12 minutes to Whole Foods locker. it was closed.
her_words: |
"I keep doing this. The notification doesn't survive my morning."
what_surprised_me: |
The pain isn't 'too many notifications' — it's 'wrong moment'.
Morning notifications die by 4pm. She needs them at 4pm.
day_2:
observation: |
8:30am — RSVP'd to a school PTA meeting via the school portal
8:32am — closed the tab
7:11pm — asked me "wait, did I say yes to that thing?"
her_words: |
"It's the calendar that already lives somewhere else that
I always lose track of."
what_surprised_me: |
She has 4 calendars: iPhone, Google, school portal, doctor.
None talk. She doesn't trust any one of them as the truth.
day_3:
observation: |
Mid-morning fine. Afternoon she missed two reminders for
medication refill. She had set them — they fired — she
dismissed them mid-task and forgot.
her_words: |
"I always think I'll come back to it. Then I never do."
what_surprised_me: |
She doesn't ignore reminders. She dismisses while busy
and the brain skips the 'come back to it' step.
## summary going into the spec
Real pain (in order):
1. Notifications fire at the wrong moment (the 4pm parcel problem)
2. The 4 calendars don't talk
3. Dismissed reminders die — need a second-chance digest
Pain we will NOT solve:
- merging the 4 calendars (out of scope)
- voice interaction (she doesn't want it)
- replacing paper sticky notes (paper wins, leave it alone)
name: parcel-and-calendar-bridge
built_for: Mom (Ana)
based_on: 3-day field notes
what it does:
- watch incoming parcel notifications
- send Mom ONE SMS at 4pm with the day's deliverable items
- if she snoozes a reminder, send ONE follow-up 30 min later
- log every action so she can see what it did
what it does NOT:
- cancel any meeting (ever)
- reply to anyone on Mom's behalf (ever)
- spend money (ever)
- send more than one follow-up per dose (ever)
- merge the 4 calendars into one (paper wins)
- run between 9pm and 6:30am (ever)
how it shows its work:
- every proposed action goes to a log Mom can read
- log has timestamp, source, what it did, why
- Mom can read it on her phone at any time
permission level (where to start):
SUGGEST ONLY for the first week.
The agent shows what it WOULD have done.
Mom reads the log with me on Sunday and decides
whether to promote it to "actually act" next week.
questions I haven't decided yet:
- what happens if Mom is traveling (different time zones)?
- what happens if she's sick and shouldn't be reminded
of low-priority things?
→ bring these to her on Sunday review and let her decide.
Live demo 1: write a day of field notes
Use this widget every day during your 3-day watch. The structured fields force you to capture the four required pieces: observation, their words, what surprised you, tag.
Field-notes templater
Live demo 2: sketch your agent spec
Once you've got 3 days of notes, write the spec. The widget forces you to fill in BOTH a "does" list AND a "does NOT" list. The "does NOT" list must have at least 3 items — that's the discipline.
Agent-spec sketcher
What makes this hard
The hardest single discipline is the 3 days of just watching. You'll want to start writing the spec on day 1. Don't. Your assumption from Monday will be wrong by Wednesday. That correction is the project. Without the 3 days, you'd write the spec for an imaginary version of Mom.
The second hard thing is the does NOT list. It feels limiting. It IS limiting — and that's the feature. An agent with a long does-NOT list is a trustworthy agent. An agent with a short does-NOT list is a scary one.
The third — most subtle — is reading the spec back. You'll feel awkward. Mom will feel awkward. Do it anyway. The 90 seconds of "okay, does this all sound right?" prevents months of an agent doing things the user didn't actually want.
Self-check before you ship
- Field notes from 3 days, in the prescribed shape (observation / their words / what surprised me).
- The agent spec has both a DOES list and a DOES NOT list — and the DOES NOT list has at least 3 items.
- I can describe one specific thing I assumed Day 1 that I unlearned by Day 3.
- I read the spec aloud to the named person, and they confirmed it.
- The spec includes "questions I haven't decided yet" — at least one — that I want their input on.
Try it · once your spec is signed
A signed spec is the deliverable. Want to push?
- Watch a 4th day, blind. Don't tell the user, but on day 4, just take notes again. (Get their permission first that this is okay.) Compare day 4's notes against your spec. Did the spec already cover what happened? If not, what did you miss?
- Run a "shadow" diary for a week. Pretend the agent is running. Each day, write down what it WOULD have done. Show the diary to the user weekly. They get to review every imaginary action before any real one. This is exactly how engineers ship safety-sensitive AI.
- Bring the spec to one of the next two projects. The Agent Lab finale (Project 06) builds on this exact spec. You'll be glad you did this carefully.