For TAs
Are we going to be replaced?
It is the first thing every TA asks about a grading tool, and it is the right question. Treemarks was built by a teaching assistant who has graded more than a thousand submissions across Stanford engineering courses, so here is the honest answer, with the numbers behind it.
The answer is no.
Treemarks doesn’t run without a person. A human in the loop is essential, both to handle the edge cases of rubric design and to take responsibility for the grade that gets posted.
The pressure on TA roles is real, and it predates AI. This is a tool that lets you do the job faster and more thoroughly.
Graduate fellowships that fund many TAs have been cut sharply since 2025, and the TAs who remain carry more students. Being available to those students matters more than ever.
Why now
More to grade, fewer to grade it.
Over two decades, the degrees U.S. colleges confer have grown far faster than the full-time faculty who teach and grade them. By 2020, each faculty member accounted for about 16% more graduates than in 2000. The work each grader carries has grown.
We built Treemarks to equip TAs with the tools to grade better and faster, so they can focus on teaching.
Sources: NCES (degrees and faculty counts) · Nature (2025) on the graduate-fellowship cut.
How you would use it
A real grading week, with Leaf.
Leaf is an AI agent that lives in the chat your course already uses. There is no new app to learn. The conversation is the whole interface.
You hand it the work
DM Leaf a PDF of submissions, or say “pull PS4 from Canvas and grade it in my style.” That message is the whole setup.
It does the first pass
It builds the key and rubric from the course materials, grades every submission with a checkable reason, and writes each student feedback worth reading.
You make the calls
Leaf brings you the few decisions that need a human, shows what each one changes, and waits. You approve, and grades write back to Canvas or Gradescope.
If you credit it
Your corrections stick
One call, applied to the whole class.
Move one rubric tier and every student re-scores the same way, including the ones who would never have emailed to argue. You are not clicking through 118 submissions. You are making the handful of decisions that actually need your judgment, and Leaf carries them across the cohort consistently.
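One way to picture this: if each submission stores the rubric tier it landed in rather than a raw point value, changing what a tier is worth re-scores the whole class in one step. The sketch below is a minimal illustration under that assumption, not a description of how Treemarks is implemented. The names `Submission`, `score`, and `rescore_all`, the student IDs, and all point values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rubric shape: rubric item -> tier name -> points.
Rubric = dict[str, dict[str, float]]


@dataclass
class Submission:
    student: str
    tiers: dict[str, str]  # rubric item -> tier the grader assigned


def score(sub: Submission, rubric: Rubric) -> float:
    """Total points for one submission under the current rubric."""
    return sum(rubric[item][tier] for item, tier in sub.tiers.items())


def rescore_all(subs: list[Submission], rubric: Rubric) -> dict[str, float]:
    """Apply the same rubric to every submission in the cohort."""
    return {s.student: score(s, rubric) for s in subs}


# Made-up values for illustration only.
rubric: Rubric = {"Q3": {"full": 10.0, "partial": 5.0, "none": 0.0}}
subs = [
    Submission("student_01", {"Q3": "partial"}),
    Submission("student_02", {"Q3": "full"}),
    Submission("student_03", {"Q3": "partial"}),
]

before = rescore_all(subs, rubric)

# One decision: the "partial" tier on Q3 is now worth 7 points.
rubric["Q3"]["partial"] = 7.0
after = rescore_all(subs, rubric)

for student in before:
    print(student, before[student], "->", after[student])
```

Because scores are derived from tier assignments, every student in the adjusted tier moves together, and students in other tiers are untouched.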
On the work itself
Every student gets feedback they can act on.
Not a bare score. A specific note on the student’s own work, tied to the rubric line, with the fix. It is the feedback you would write for every student, if you had the hours.
Student 14 · Q3, step 4
Q̇ = ṁ·cₚ·ΔT = (2.4)(1.0)(40) = +96 kW
Illustrative example
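The arithmetic on the student's line can be checked in a few lines. This sketch assumes units the example does not state: mass flow in kg/s, specific heat in kJ/(kg·K), and temperature change in K, so that the product comes out in kW. The variable names are hypothetical.

```python
import math

# Assumed units (not given in the example): kg/s, kJ/(kg·K), K.
m_dot = 2.4    # mass flow rate
c_p = 1.0      # specific heat at constant pressure
delta_t = 40   # temperature change

# Heat rate: Q_dot = m_dot * c_p * delta_T, in kW under the assumed units.
q_dot_kw = m_dot * c_p * delta_t

# Compare with a tolerance rather than exact float equality.
print(math.isclose(q_dot_kw, 96.0))
```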
The apprenticeship
“But grading is how TAs learn to teach.”
Automate the entry rung and you deskill the pipeline that produces the next generation of instructors.
What’s true: the apprenticeship is real, and it matters.
But look at what grading as practiced actually is: value-hunting under fatigue. Grading does not teach TAs; solving does. What builds teaching judgment is seeing the whole error landscape and working the genuinely hard calls. That is what you do here: reviewing rubrics, deciding the escalated cases, and crediting valid alternative methods, the ones that get docked today when a tired grader pattern-matches against the official solution. You spend your time on the interesting ten percent instead of the routine ninety.
You see the patterns
A per-question map of where the class went wrong, in aggregate, instead of one paper at a time.
You make the judgment calls
Alternative methods, partial credit, the ambiguous prompt. The decisions that teach you to teach.
You get the hours back
The recovered time goes to office hours and one-on-ones, the highest-leverage teaching there is.
Want Leaf on your course?
Pilots run on one course for a term. Start one, or send this to whoever runs the class. You keep the final say on every grade.