Treemarks
Our grading position

Grade against the work

Grade signal is fading. Curving it back is the wrong fix. Grade against the work, and reveal the signal a real assessment already contains.

A grade should track what a student demonstrably earned, measured against a clear standard and applied identically to everyone. It should not reflect where they happened to land relative to whoever else enrolled, and it should not be a default "A" handed out because rationing credit by hand was too hard. That principle is now breaking in both directions at once.

The signal is collapsing from three directions

Inflation has made the A meaningless, AI is making take-home work converge, and grading at scale across many TAs is measurably inconsistent. Each force, on its own, erodes what a grade can tell anyone.

Three forces killing grade signal: inflation, AI, scale
Inflation (Harvard: 84% of students get an A/A‑; the share of employers screening on GPA fell from 73% to 37%), AI (+13 percentage points of A grades in AI‑exposed courses; LLMs homogenize written work), and scale (the same work drew grades from A+ to F across graders, with inter‑grader agreement of κ≈0.2). Sources in the notes.
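The κ in the caption is an inter-rater agreement statistic: 1 means two graders agree perfectly, 0 means they agree no more often than chance would predict. The caption does not say which variant was used; assuming plain (unweighted) Cohen's kappa, a minimal sketch of the calculation looks like this. The grade lists are made-up toy data for illustration, not the study's data.

```python
from collections import Counter

def cohens_kappa(grader_a, grader_b):
    """Unweighted Cohen's kappa for two graders grading the same items."""
    if len(grader_a) != len(grader_b) or not grader_a:
        raise ValueError("need two equal-length, non-empty lists of grades")
    n = len(grader_a)
    # Observed agreement: fraction of items on which the two graders match.
    p_observed = sum(a == b for a, b in zip(grader_a, grader_b)) / n
    # Chance agreement: probability of matching if each grader drew grades
    # independently at their own observed frequencies.
    freq_a, freq_b = Counter(grader_a), Counter(grader_b)
    p_chance = sum(freq_a[g] * freq_b[g] for g in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both graders gave every item the same single grade
    return (p_observed - p_chance) / (1 - p_chance)

# Toy data: two graders, six scripts (illustrative only).
grader_a = ["A", "A", "B", "C", "B", "A"]
grader_b = ["A", "B", "B", "C", "A", "C"]
print(round(cohens_kappa(grader_a, grader_b), 3))  # 0.25
```

The two toy graders match on half the scripts, but chance alone would produce a third of those matches, so κ lands at 0.25. For ordered letter grades, a weighted kappa that gives partial agreement to near-misses (A versus A‑) is a common alternative.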

The fixes schools reach for are crude, and they know it

Faced with dead signal, departments curve down (Harvard's A‑cap; Princeton's grade deflation, repealed after backlash) or bolt on a rank statistic. Curving down does a specific harm: a student who demonstrably meets the standard is marked down to a B+ because the quota is full, and so receives less than they earned. Roughly 85% of Harvard's students oppose the cap for this reason. And the choice is live: departments are weighing whether to curve down right now, which makes this the moment to put a third option on the table.

The false choice: flat/inflated vs curved-down vs earned
The false choice, and the third way. Everyone‑gets‑an‑A carries no signal; curving down punishes competent students; grading against the work gives each student what they earned, applied identically.
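The three options can be made concrete with a small sketch. Everything here is hypothetical: the cut-offs, the quota, the roster, and the function names are assumptions for illustration, not any university's actual policy or Treemarks' implementation. The quota is assumed to cover the whole A band (A and A‑) and to drop overflow to B+, mirroring the example in the paragraph on curving down.

```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    points: float  # rubric points earned against the published standard (0-100)

# Hypothetical letter-grade cut-offs, checked from the top down.
CUTOFFS = [(93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (0, "below B")]
A_BAND = {"A", "A-"}

def criterion_grade(points):
    """Grade against the work: the same fixed cut-offs apply to every student."""
    for threshold, letter in CUTOFFS:
        if points >= threshold:
            return letter
    return CUTOFFS[-1][1]

def flat_grades(students):
    """Everyone gets an A: the grade carries no information about the work."""
    return {s.name: "A" for s in students}

def earned_grades(students):
    """Criterion-referenced: each student gets exactly what the work earned."""
    return {s.name: criterion_grade(s.points) for s in students}

def capped_grades(students, a_quota):
    """Curve down: at most `a_quota` students keep an A-band grade.

    Students are ranked by points; anyone beyond the quota who earned an
    A-band grade is marked down to B+. Ties keep input order (Python's sort
    is stable), an arbitrary choice that a real policy would have to settle.
    """
    earned = earned_grades(students)
    ranked = sorted(students, key=lambda s: s.points, reverse=True)
    result = {}
    a_given = 0
    for s in ranked:
        grade = earned[s.name]
        if grade in A_BAND:
            if a_given < a_quota:
                a_given += 1
            else:
                grade = "B+"  # met the standard, but the quota is full
        result[s.name] = grade
    return result

roster = [Student("s1", 97), Student("s2", 95), Student("s3", 91), Student("s4", 85)]
print(flat_grades(roster))               # {'s1': 'A', 's2': 'A', 's3': 'A', 's4': 'A'}
print(capped_grades(roster, a_quota=1))  # {'s1': 'A', 's2': 'B+', 's3': 'B+', 's4': 'B'}
print(earned_grades(roster))             # {'s1': 'A', 's2': 'A', 's3': 'A-', 's4': 'B'}
```

With this toy roster, s2 (95) and s3 (91) both clear the assumed A-band cut-offs, yet under a one-A quota both fall to B+. The criterion-referenced grades change only when the work changes.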

The honest diagnosis: signal lives in the assessment, not the grading

Here is the line we hold even when it costs us a flashier claim. You cannot grade signal into an easy or AI‑trivialized assignment. A task everyone passes has near‑zero power to discriminate, no matter how finely you score it. Granular, consistent grading reveals, records, and audits the signal a discriminating assessment contains; it does not create signal that isn't there. That is why the real answer has two halves, discriminating assessment and granular criterion grading, and why proctored, process‑revealing assessment is returning (Princeton mandates proctoring from July 2026). AI can't homogenize a handwritten in‑class exam.
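The claim that a task everyone passes cannot discriminate can be checked with a classical test-theory statistic, the upper-lower discrimination index: the mean item score of the top scorers minus that of the bottom scorers. This is a standard textbook measure chosen here for illustration, not necessarily what Treemarks computes; the 27% group size is a common convention, and the scores are toy data.

```python
def discrimination_index(item_scores, total_scores, group_fraction=0.27):
    """Upper-lower discrimination index for one exam item.

    item_scores: each student's score on the item as a fraction of its
        maximum (0.0 to 1.0), so partial credit at any granularity is allowed.
    total_scores: each student's total exam score, in the same order.
    Returns D = mean item score of the top group minus that of the bottom
    group; D near 0 means the item does not separate strong from weak students.
    """
    n = len(item_scores)
    if n != len(total_scores) or n < 2:
        raise ValueError("need matching score lists for at least two students")
    # Size of each comparison group, capped so the groups never overlap.
    k = max(1, min(n // 2, round(n * group_fraction)))
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower

# Toy data for eight students (illustrative only).
totals = [40, 55, 60, 70, 75, 85, 90, 98]
everyone_passes = [1.0] * 8
discriminating = [0.0, 0.25, 0.5, 1.0, 1.0, 1.0, 1.0, 1.0]
print(discrimination_index(everyone_passes, totals))  # 0.0
print(discrimination_index(discriminating, totals))   # 0.875
```

However finely `item_scores` is graded, if every student earns the same fraction of an item, D is exactly zero: the granularity has nothing to separate.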

Why we leave essays to people

The same line draws our hard boundary. An argue‑a‑position essay is written for a reader: the implicit contract of the assignment is that a human audience is being persuaded, and whether the argument lands is a judgment only that audience can make. Point a model at it and you quietly change what the assignment measures. So we grade where an external anchor decides correctness (governing equations, boundary conditions, units, a defined method), and we leave argumentative and open‑ended writing to people. This is the verification‑versus‑interpretation boundary we drew early, anticipating exactly this concern rather than discovering it after a bad demo.

The proof, honestly bounded

The real proof: a genuinely hard exam. We graded an actual final exam against the instructor's 18‑part, 131‑tier rubric. The signal was in the assessment, and our engine revealed it consistently: it awarded partial credit when the method was right and the number was wrong, with no false zero recorded on the exam.
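Partial credit on a tiered rubric can be sketched as an ordered set of tiers, each tied to what the work demonstrates. The tiers, point values, 1% tolerance, and the `score_part` function below are hypothetical, and whether the method is correct is simply passed in as a boolean rather than judged from real work. The sketch shows only why a wrong final number need not produce a zero.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    points: float
    description: str

# Hypothetical tiers for one rubric part (not the instructor's actual rubric).
FULL = Tier(10.0, "correct method and final value within tolerance")
METHOD_ONLY = Tier(7.0, "correct method, final value wrong or missing")
EQUATION_ONLY = Tier(3.0, "correct governing equation, method not carried through")
NO_CREDIT = Tier(0.0, "no relevant work")

def score_part(method_correct, equation_correct, answer, reference, rel_tol=0.01):
    """Return the highest tier the student's work supports.

    method_correct / equation_correct: judgments about the work, assumed to be
        made elsewhere and passed in here as booleans.
    answer: the student's final numeric value, or None if they gave none.
    reference: the known correct value (the external anchor).
    """
    answer_ok = answer is not None and math.isclose(answer, reference, rel_tol=rel_tol)
    if method_correct and answer_ok:
        return FULL
    if method_correct:
        # Right method, wrong number: partial credit instead of a false zero.
        return METHOD_ONLY
    # A correct number without a correct method does not earn full credit.
    if equation_correct:
        return EQUATION_ONLY
    return NO_CREDIT

print(score_part(True, True, 12.4, 10.2).points)   # 7.0
print(score_part(True, True, 10.25, 10.2).points)  # 10.0
```

Ordering the checks from the top tier down means each student lands on the highest tier their work supports, which is what keeps an arithmetic slip from erasing a correct method.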

What real signal looks like: a hard exam's distribution, 43 to 99.5
A differentiating exam from a real class, with scores ranging from 43 to 99.5. Granular grading reveals what each student demonstrated; it can't manufacture a spread the exam doesn't contain. Students are anonymized.

The honest, modest case: a lenient exam. Re‑grading a near‑ceiling, proctored midterm, we reproduced the instructor's leniency and added only modest resolution, because a near‑ceiling exam has little signal to reveal. That isn't a flaw in the grading; it's the thesis, demonstrated. Grading reveals signal; the assessment has to contain it.

A grade should reflect what a student demonstrably earned, applied identically to everyone. You shouldn't have to choose between "everyone gets an A" and "we cap A's by fiat."

We're bold on the principle and humble on the prescription: your grading policy is yours. We provide the half that never scaled by hand: consistent, criterion‑referenced, granular grading that makes earned signal legible, auditable, and fair. We pair it with assessment‑design work so that the signal is there to reveal. Grading + assessment design. Signal restored, fairly.