by Leila · Harness Studio, Module 02
The work
Leila built a harness that reads a short piece of writing and scores it on how much it sounds like something an actual child wrote. She trained it on samples from her own classmates. Then she turned it on three different AI models pretending to be kids and, in the part she says was the hardest, on three of her own recent essays.
The headline finding isn't about AI. It's about her. One of her own essays, written last month, scored lower than Model A, one of the three AI models. She left that finding in the report.
The thing itself · Leila's judge, running on 6 samples
The finding Leila left in the report
"My own essay from last month scored lower than Model A. I think it's because I've been writing what I thought my teacher wanted to read instead of what I noticed. The judge can't tell the difference between that and a chatbot. I think my teacher can't either. I'm going to fix my writing before I fix the judge."