by Jae ยท Harness Studio, Module 02
The work
Jae wanted to build an eval for AI-generated animal drawings โ a judge that could look at any image and label it "cute" or "creepy" with some confidence. She ran it on a few hundred generated animals. The judge got "creepy" right almost every time. It struggled with "cute." When she dug in, she realized the problem wasn't the judge. It was the category. Creepy has rules. (Too many teeth, uncanny eyes, wrong limb proportions.) Cute doesn't, really.
The thing itself ยท four animals, judged
Jae's headline finding
"Creepy has a checklist. Cute has a vibe. My judge agrees that specimen #4 is creepy because #4 breaks four specific rules. My judge says specimen #1 is cute because... well, it has to guess. I think this is a big deal. It might mean some things are judgeable and some things aren't, and I can't tell the difference by looking at them. I'm going to keep going."