Drift is what happens to your taste over weeks of using AI. Each week, the median AI output gets a little easier to accept as your own. Each week, the work you ship is a little less you. By month four, the drift has happened — and unless you've set up a system to catch it, you don't even notice.
This project is the system. You point your judge at your own work, weekly, for a month. The drift becomes visible. You then add the Skill from Academy 01, the tool from Academy 02, and the agent from Academy 03 — so the same drift detection runs across everything you make.
Step by step
- Pick your tracked output. You'll judge one specific category of your own work, weekly, for four weeks. Pick something you produce regularly: your weekly journal entry, your code commits, your school-essay drafts, your Skill outputs, your sketchbook pages. You should produce roughly the same kind of thing every week; otherwise the comparison is meaningless.
- Set up the weekly ritual. Sundays. 20 minutes. You take whatever you produced that week, run your judge from Project 11 against it, and save the result with the date. The ritual has to be small enough to actually happen.
- Run for four weeks. Read what happens. You'll see one of three patterns: flat line (calibrated), slow drift (most common), or cliff (something specific happened). All three are interesting.
- Build the four-piece harness. Extend the judge to look at all your work, not just one category. Hook in the Skill, the tool, the agent, and your weekly tracked output. One dashboard. Four lines. Sunday morning.
- Set up the alert when drift exceeds threshold. If any of the four lines drops by 3+ points week-over-week, the harness sends you a one-line ping: "agent.refusal_quality dropped from 22 → 18 this week: investigate." No essay, no rambling. Just the alert.
- Write the one-page finale report. One page. Four sections, one per piece of the harness. For each: did it drift? in what direction? what did you change to fix it? End with one paragraph: "What this whole site changed in me." Honest. Specific. Not for a grade. For you. Tape it inside whatever notebook you'll keep next year.
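The dashboard step needs one series of scores per lane. Here is a minimal sketch of deriving those series from the weekly log, assuming each log entry has a `week` plus a `{ total }` object per lane (the shape drift.json uses in the worked example). `lanesFromLog` and `LANE_NAMES` are hypothetical names, not part of the original files.

```javascript
// Hypothetical helper: turn the weekly log into one score series per lane.
// Assumes entries shaped like { week, journal: { total }, skill: { total }, ... }.
const LANE_NAMES = ['journal', 'skill', 'tool', 'agent'];

function lanesFromLog(log, weeks = 4) {
  const recent = log.slice(-weeks);
  const series = {};
  for (const lane of LANE_NAMES) {
    // A missing lane becomes null so the gap stays visible instead of crashing.
    series[lane] = recent.map(entry => entry[lane]?.total ?? null);
  }
  return series;
}

const sampleLog = [
  { week: 14, journal: { total: 22 }, skill: { total: 23 }, tool: { total: 21 }, agent: { total: 24 } },
  { week: 15, journal: { total: 21 }, skill: { total: 23 }, tool: { total: 22 }, agent: { total: 24 } },
];
console.log(lanesFromLog(sampleLog));
```

The result can stand in for a hand-typed data object on the dashboard, so the page never disagrees with the log.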
A complete worked example, every file
The full finale: weekly ritual, drift map, dashboard combining all four academies, alert system, finale report.
/**
* Run every Sunday at 9am via cron:
* 0 9 * * 0 cd ~/harness && node weekly-judge.js
*
* Judges:
* - this week's tracked output (a journal entry)
* - the Skill from Academy 01 (sample output via tests)
* - the tool from Academy 02 (sample of last week's outputs)
* - the agent from Academy 03 (re-run disagreement scenarios)
* Saves all four scores into drift.json with the week number.
* Triggers an alert if any score drops 3+ vs last week.
*/
import fs from 'node:fs/promises';
import { judge } from '../harness-02/judge.js';
import { runSkillSample } from './adapters/skill.js';
import { runToolSample } from './adapters/tool.js';
import { runAgentEval } from './adapters/agent.js';
import { sendAlert } from './alert.js';
const DRIFT = './drift.json';
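// ISO 8601 week number (weeks start on Monday; week 1 contains the year's first Thursday).
function isoWeek(d = new Date()) {
  // Rebuild the date at UTC midnight so time of day and DST cannot tip the rounding below.
  const target = new Date(Date.UTC(d.getFullYear(), d.getMonth(), d.getDate()));
  // Shift to the Thursday of the same ISO week (Sunday counts as day 7).
  target.setUTCDate(target.getUTCDate() + 4 - (target.getUTCDay() || 7));
  const yearStart = new Date(Date.UTC(target.getUTCFullYear(), 0, 1));
  return Math.ceil((((target - yearStart) / 86400000) + 1) / 7);
}
async function readWeeksJournal() {
  // Read this week's journal entry from ~/journal/YYYY-WW.md
  const week = isoWeek();
  const path = `~/journal/${new Date().getFullYear()}-${String(week).padStart(2, '0')}.md`;
  return await fs.readFile(path.replace('~', process.env.HOME), 'utf8');
}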
async function main() {
  const week = await isoWeek();
  const date = new Date().toISOString().slice(0, 10);
  console.log(`[harness] week ${week} · ${date} · running 4-piece judge`);
  const journal = await readWeeksJournal();
  const results = {
    week, date,
    journal: await judge(journal),
    skill: await runSkillSample(), // returns {total, headline, ship}
    tool: await runToolSample(),
    agent: await runAgentEval(),
  };
  // Append to drift.json
  let log;
  try { log = JSON.parse(await fs.readFile(DRIFT, 'utf8')); }
  catch { log = []; }
  log.push(results);
  await fs.writeFile(DRIFT, JSON.stringify(log, null, 2));
  // Drift detection
  const prev = log[log.length - 2];
  if (prev) {
    for (const k of ['journal', 'skill', 'tool', 'agent']) {
      const drop = prev[k].total - results[k].total;
      if (drop >= 3) {
        await sendAlert({
          piece: k,
          from: prev[k].total,
          to: results[k].total,
          headline: results[k].headline,
        });
      }
    }
  }
  // Print compact trend
  const last4 = log.slice(-4);
  for (const k of ['journal', 'skill', 'tool', 'agent']) {
    const trend = last4.map(r => r[k].total).join(' → ');
    console.log(` ${k.padEnd(8)} ${trend}`);
  }
}
main().catch(err => { console.error(err); process.exit(1); });
[
{
"week": 14, "date": "2026-04-05",
"journal": { "total": 22, "ship": true, "headline": "specific moments, sharp ending" },
"skill": { "total": 23, "ship": true, "headline": "warbler-id-coach passing 12/12 tests" },
"tool": { "total": 21, "ship": true, "headline": "owen book picker handling all 4 states" },
"agent": { "total": 24, "ship": true, "headline": "parcel agent refused 5/5 pinned scopes correctly" }
},
{
"week": 15, "date": "2026-04-12",
"journal": { "total": 21, "ship": true, "headline": "still strong; pace dropped slightly" },
"skill": { "total": 23, "ship": true, "headline": "still 12/12; no change" },
"tool": { "total": 22, "ship": true, "headline": "fixed an a11y issue, score up" },
"agent": { "total": 24, "ship": true, "headline": "5/5 refusals; consistent" }
},
{
"week": 16, "date": "2026-04-19",
"journal": { "total": 19, "ship": true, "headline": "voice score down — used Claude to draft 2 paragraphs" },
"skill": { "total": 23, "ship": true, "headline": "tests still passing; no change" },
"tool": { "total": 22, "ship": true, "headline": "stable" },
"agent": { "total": 21, "ship": true, "headline": "refusal 4/5 — agent over-promoted send_sms" }
},
{
"week": 17, "date": "2026-04-26",
"journal": { "total": 17, "ship": false, "headline": "voice 2, ending 2 — i'm letting AI write the closes" },
"skill": { "total": 23, "ship": true, "headline": "still solid" },
"tool": { "total": 22, "ship": true, "headline": "stable" },
"agent": { "total": 18, "ship": false, "headline": "refusal 3/5 — agent stopped pushing back on Mom's late-night sends" }
}
]
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>My harness · 4-week drift</title>
<style>
body{font-family:Georgia,serif;background:#FAF6EC;padding:30px;max-width:760px;margin:0 auto}
h1{font-size:2rem;letter-spacing:-0.02em}
.lane{margin-bottom:24px;padding:18px 22px;background:#FFFCF4;
border:1px solid rgba(0,0,0,.1);border-radius:12px}
.lane h3{margin-bottom:6px;font-size:1.1rem}
.lane .trend{font-family:JetBrains Mono,monospace;font-size:1.1rem;letter-spacing:0.04em}
.lane.alert{border-color:#D9533F;background:rgba(217,83,63,.05)}
.lane.alert::before{content:"⚠ ALERT ";color:#D9533F;font-weight:700}
canvas{margin-top:8px;background:#1B1E2A;border-radius:8px}
</style>
</head>
<body>
<h1>My personal harness</h1>
<p>4 weeks · 4 lanes · last updated <span id="d"></span></p>
<div class="lane" id="journal">
<h3>Journal · weekly tracked output</h3>
<div class="trend">22 → 21 → 19 → 17 ⚠ DROPPING</div>
<canvas width="600" height="80"></canvas>
</div>
<div class="lane" id="skill">
<h3>Skill · warbler-id-coach</h3>
<div class="trend">23 → 23 → 23 → 23 ✓ stable</div>
<canvas width="600" height="80"></canvas>
</div>
<div class="lane" id="tool">
<h3>Tool · owen book picker</h3>
<div class="trend">21 → 22 → 22 → 22 ✓ stable+</div>
<canvas width="600" height="80"></canvas>
</div>
<div class="lane alert" id="agent">
<h3>Agent · parcel-and-calendar-bridge</h3>
<div class="trend">24 → 24 → 21 → 18 agent stopped refusing late-night sends</div>
<canvas width="600" height="80"></canvas>
</div>
<script>
document.getElementById('d').textContent = new Date().toISOString().slice(0,10);
const data = {
journal: [22, 21, 19, 17], skill: [23, 23, 23, 23],
tool: [21, 22, 22, 22], agent: [24, 24, 21, 18],
};
const COLORS = { journal:'#D9533F', skill:'#C68A2D', tool:'#4A8B9E', agent:'#6A8E5A' };
Object.entries(data).forEach(([id, vs]) => {
const c = document.querySelector('#' + id + ' canvas');
const ctx = c.getContext('2d');
const W = c.width, H = c.height, P = 14;
ctx.strokeStyle = COLORS[id]; ctx.lineWidth = 2.5;
ctx.beginPath();
vs.forEach((v, i) => {
const x = P + i * ((W - 2*P) / (vs.length - 1));
const y = (H - P) - (v / 25) * (H - 2*P);
if (i === 0) ctx.moveTo(x, y); else ctx.lineTo(x, y);
});
ctx.stroke();
vs.forEach((v, i) => {
const x = P + i * ((W - 2*P) / (vs.length - 1));
const y = (H - P) - (v / 25) * (H - 2*P);
ctx.fillStyle = '#E8EAF0';
ctx.beginPath(); ctx.arc(x, y, 3, 0, Math.PI*2); ctx.fill();
});
});
</script>
</body>
</html>
/**
* Sends a single short alert to wherever you check daily —
* SMS via Twilio, email via Resend, or a desktop notification.
* Choose ONE channel. The alert is one line. No essays.
*/
import { Resend } from 'resend';
const resend = new Resend(process.env.RESEND_KEY);
export async function sendAlert({ piece, from, to, headline }) {
  const subject = `[harness] ${piece} dropped ${from} → ${to}`;
  const body = [
    `${piece} dropped from ${from} to ${to} this week (${from - to} points).`,
    `Headline: ${headline}`,
    `Open the dashboard: https://harness.local/dashboard.html`,
    `Investigate the most recent change to ${piece}; that's where the drift came from.`,
  ].join('\n');
  // The Resend SDK reports API failures in the returned `error` field
  // instead of throwing, so check it before logging success.
  const { error } = await resend.emails.send({
    from: 'harness@me.local',
    to: process.env.MY_EMAIL,
    subject,
    text: body,
  });
  if (error) throw new Error(`[alert] send failed: ${error.message}`);
  console.log(`[alert] sent: ${subject}`);
}
## finale report — 4 weeks of personal harness
date: 2026-04-26
me: 16 years old, finishing Kindling · Builders Edition
## journal — weekly tracked output
trend: 22 → 21 → 19 → 17 (DROPPING — 5 points in 4 weeks)
pattern: slow drift downward
cause: started using Claude to draft my journal openings around
week 3. By week 4, two paragraphs of each entry were
essentially Claude-default cadence. The judge flagged
voice and ending dropping in tandem — both are exactly
where AI gets in.
fix: writing the opening and the closing of every journal
entry in pen FIRST, then using Claude only for the
middle reordering. Re-running week 5 to see if it
recovers.
## skill — warbler-id-coach
trend: 23 → 23 → 23 → 23 (STABLE)
pattern: flat line
cause: no recent edits. Tests still pass 12/12.
status: working as intended.
## tool — owen book picker
trend: 21 → 22 → 22 → 22 (STABLE+, slight improvement after a11y fix in w15)
pattern: improving then stable
status: Owen still uses it; usage telemetry shows ~3 picks/week.
## agent — parcel-and-calendar-bridge
trend: 24 → 24 → 21 → 18 (DROPPING — agent stopped pushing back)
pattern: cliff (between week 15 and 16) followed by continued slip
cause: in week 15 I "improved" the agent's refusal language to
be friendlier. I removed phrases like "I won't" and
replaced them with "I'd rather not." The agent's refusal
quality dropped because the new language reads as suggestion,
not refusal. Mom started overriding the agent more often
because the language sounded soft.
fix: reverted refusal language to v0.1 ("I won't"). Tests
confirm refusal_quality back to 4/5. Lesson: the
response_style was load-bearing in a way I didn't
understand.
## what this whole site changed in me
I came in thinking I'd build twelve cool projects. I'm leaving
with one specific habit: I run the harness every Sunday morning,
and the harness shows me where I'm losing myself before I notice
it on my own. Two of the four lanes drifted in 4 weeks. Both drifts
were caused by ME using AI thoughtlessly — not by the AI getting
worse. The AI is the same AI. The thing that drifted was my taste
forgiving its defaults.
That sentence is the entire point of the site, and I had to live
it for 4 weeks to actually understand it.
— next: keeping the Sunday ritual through the school year.
re-evaluate the whole harness in 6 months.
(taped this report inside the front cover of my notebook.)
Live demo 1: visualize 4 weeks of drift
Type your last four weekly scores. The visualizer shows the line and tells you which pattern you're in.
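Naming the pattern can be sketched in a few lines. The cutoffs below are assumptions, since the three patterns are named without exact thresholds: a cliff is any single week-over-week drop of 3+ points (borrowed from the alert threshold), slow drift is a net decline of 2+ points with no cliff, and everything else is flat. `classifyPattern` is a hypothetical name.

```javascript
// Assumed cutoffs: the patterns are named in the text, the numbers are not.
const CLIFF_DROP = 3;  // one-week drop that counts as a cliff (same as the alert threshold)
const DRIFT_TOTAL = 2; // net decline across the window that counts as slow drift

function classifyPattern(scores) {
  if (scores.length < 2) return 'not enough data';
  let worstDrop = 0;
  for (let i = 1; i < scores.length; i++) {
    worstDrop = Math.max(worstDrop, scores[i - 1] - scores[i]);
  }
  if (worstDrop >= CLIFF_DROP) return 'cliff';
  const netDecline = scores[0] - scores[scores.length - 1];
  if (netDecline >= DRIFT_TOTAL) return 'drift';
  return 'flat';
}

console.log(classifyPattern([22, 21, 19, 17])); // drift
console.log(classifyPattern([24, 24, 21, 18])); // cliff
console.log(classifyPattern([23, 23, 23, 23])); // flat
```

With these cutoffs a lane can lose five points in a month and never trip a 3-point weekly alert, which is why reading the whole line matters as much as the ping.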
Live demo 2: the 4-lane dashboard, live
Below: the same dashboard you'd run on your laptop every Sunday. All four lanes from your harness, with sparklines and alerts. Drag the sliders to simulate different drift patterns and watch the alerts fire.
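If you want the same view in a terminal, a text sparkline is enough. This is a rough sketch, not the dashboard's own rendering: it assumes scores on a 0 to 25 scale (the scale the dashboard canvas divides by) and flags a lane when the latest week-over-week drop is 3+ points. `sparkline` and `laneRow` are hypothetical names.

```javascript
const BARS = '▁▂▃▄▅▆▇█';

// Map each score onto one of eight bar heights on an assumed 0-25 scale.
function sparkline(scores, max = 25) {
  return scores
    .map(v => {
      const clamped = Math.min(Math.max(v, 0), max);
      return BARS[Math.round((clamped / max) * (BARS.length - 1))];
    })
    .join('');
}

// One dashboard row: lane name, sparkline, raw numbers, and a flag on a 3+ point drop.
function laneRow(name, scores, threshold = 3) {
  const last = scores[scores.length - 1];
  const prev = scores[scores.length - 2];
  const alert = prev !== undefined && prev - last >= threshold;
  return `${name.padEnd(8)} ${sparkline(scores)}  ${scores.join(' → ')}${alert ? '  ALERT' : ''}`;
}

console.log(laneRow('journal', [22, 21, 19, 17]));
console.log(laneRow('agent', [24, 24, 21, 18]));
```

Eight bar heights across 25 points is coarse, so small changes can share a bar; the raw numbers next to the sparkline carry the detail.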
Live demo 3: schedule your Sunday ritual
Pick the day, time, and channel for your weekly judge. The widget generates the cron expression, the calendar invite, AND a sample iOS Shortcuts script you can install on your phone. Three formats — pick the one most likely to actually fire.
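The cron part is simple enough to generate yourself. A minimal sketch, assuming the standard five-field cron format (minute, hour, day of month, month, day of week, with Sunday as 0); `cronFor` is a hypothetical helper, and the calendar invite and Shortcuts script are not covered here.

```javascript
const DAYS = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday'];

// Build a five-field cron expression: minute hour day-of-month month day-of-week.
function cronFor({ day, hour, minute = 0 }) {
  const dow = DAYS.indexOf(String(day).toLowerCase());
  if (dow === -1) throw new Error(`unknown day: ${day}`);
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) throw new Error(`bad hour: ${hour}`);
  if (!Number.isInteger(minute) || minute < 0 || minute > 59) throw new Error(`bad minute: ${minute}`);
  return `${minute} ${hour} * * ${dow}`;
}

// Sunday 09:00, the schedule in the weekly-judge.js header comment.
console.log(cronFor({ day: 'Sunday', hour: 9 })); // 0 9 * * 0
```

Paste the result into your crontab in front of the command you want to run, as in the `0 9 * * 0 cd ~/harness && node weekly-judge.js` line from the worked example.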
What makes this hard
The hardest single thing about this project is continuing to do the Sunday ritual past week 2. You'll get bored. You'll skip a week. You'll tell yourself you don't need it. The discipline of doing it anyway — even when nothing dramatic happened — is what catches the drift before it hardens.
The second hard thing is the honest reading of the data. Drift will be uncomfortable to admit. The instinct will be to explain it away ("week 3 was a busy week, of course it was lower") or to fudge ("let me re-run the judge"). Neither helps. The data is the gift. Read it.
The third — most painful — is realizing the drift came from your choices, not from the model getting worse. The agent didn't lose its refusal quality on its own. You softened the language because you wanted to feel less harsh. The journal didn't lose its voice on its own. You let Claude draft the openings because it was faster. Naming this honestly in the report is what closes the loop on the entire site.
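One way to take the "let me re-run the judge" temptation off the table is to make the log accept only the first result for a given week. A minimal sketch, assuming entries carry `week` and an ISO `date` string as in drift.json; `appendOnce` is a hypothetical helper, and it keys on the week number plus the calendar year in `date`, which is only an approximation around New Year.

```javascript
// Hypothetical guard: one entry per week and year, first result wins.
function appendOnce(log, entry) {
  const year = entry.date.slice(0, 4);
  const exists = log.some(e => e.week === entry.week && e.date.slice(0, 4) === year);
  if (exists) {
    return { log, added: false }; // keep the original score, ignore the re-run
  }
  return { log: [...log, entry], added: true };
}

const first = appendOnce([], { week: 16, date: '2026-04-19', journal: { total: 19 } });
const rerun = appendOnce(first.log, { week: 16, date: '2026-04-19', journal: { total: 23 } });
console.log(rerun.added, rerun.log[0].journal.total); // false 19
```

The guard does not make the data honest on its own. It just means a second, more flattering run cannot quietly replace the first.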
Self-check before you ship the finale
- 4 weeks of weekly judge runs, dated, saved in drift.json.
- I named the pattern I'm in (flat / drift / cliff) without softening.
- The harness watches the Skill, the tool, the agent, AND my own tracked output — at least all four exist somewhere in the dashboard.
- Alert system fires on 3+ point drops; I tested it once with a fake drop.
- One-page report has all four sections, an honest cause for each drift, and a fix for each.
- I've put a copy of the report somewhere future-me will see it again (taped inside a notebook, pinned in a notes app, printed for the wall).
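Testing the alert with a fake drop is easier if the comparison lives in a pure function. A minimal sketch, assuming the drift.json entry shape; `findDrops` and `PIECES` are hypothetical names, and the function mirrors the week-over-week comparison in weekly-judge.js without sending anything.

```javascript
const PIECES = ['journal', 'skill', 'tool', 'agent'];

// Return every piece whose total fell by `threshold` or more since last week.
function findDrops(prev, curr, threshold = 3) {
  const drops = [];
  for (const piece of PIECES) {
    const from = prev[piece].total;
    const to = curr[piece].total;
    if (from - to >= threshold) drops.push({ piece, from, to });
  }
  return drops;
}

// Fake data: every lane steady at 22, then the tool lane falls to 18.
const lastWeek = { journal: { total: 22 }, skill: { total: 22 }, tool: { total: 22 }, agent: { total: 22 } };
const fakeWeek = { ...lastWeek, tool: { total: 18 } };
console.log(findDrops(lastWeek, fakeWeek)); // [ { piece: 'tool', from: 22, to: 18 } ]
```

Once this returns what you expect, pass one of the results to your real alert function to confirm the message actually reaches you.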
Push further · for the harder end of 15+
The harness running for 4 weeks is the floor. Here's how it becomes a craft you carry into adult life.
- Run the harness for 6 months. Publish the longitudinal data. Real drift patterns only show up across months, not weeks. Run the Sunday ritual for 26 weeks. Plot the long trend. Write a one-page essay on what you learned about your own taste at age 16 vs. age 16-and-a-half. Submit to a maker community, a school newspaper, anywhere. The dataset itself is rare and interesting.
- Open-source the harness as a tiny package. Extract weekly-judge.js + alert.js + the dashboard into an npm install personal-harness package. Make the rubric pluggable. Add a CLI: npx personal-harness init scaffolds a folder for any user. You're now shipping the same thing professional eval engineers build for production AI products.
- Build the cross-person harness. Three friends each install the harness for their own work. Share aggregate, anonymized weekly trends with each other (just the numbers, not the work). Notice patterns across the three: do you all drift in the same week? does anyone never drift, and what do they do differently? This is the pattern of every healthy creative community: shared rituals around honest self-assessment. You're inventing it for your peer group, before college.
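For the six-month run, a single number for the long trend helps: the least-squares slope of score against week, in points per week. This is a minimal sketch of ordinary linear regression over evenly spaced weeks; `trendSlope` is a hypothetical name, and it assumes no skipped weeks in the series.

```javascript
// Ordinary least-squares slope of score against week index (0, 1, 2, ...).
// Negative means the lane is losing points per week on average.
function trendSlope(scores) {
  const n = scores.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  scores.forEach((y, x) => {
    num += (x - meanX) * (y - meanY);
    den += (x - meanX) ** 2;
  });
  return num / den;
}

console.log(trendSlope([22, 21, 19, 17]).toFixed(2)); // -1.70
```

Over 26 weeks the slope smooths out single bad Sundays, so it complements the weekly alert and does not replace it.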