AI grading for engineering courses
The teaching assistant’s assistant.
Treemarks is an AI agent that helps you design and grade assessments for engineering courses.
Leaf works in the chat your course staff already uses: hand it a problem set, or let it pull one from Canvas. It grades in your style, writes feedback worth reading, and shows you where the class needs help. You approve before anything posts.
Built by a Stanford teaching assistant, applied in live courses.
Course knowledge graph
What one grade hides
+ 8 more concepts, each traced to its module
A grade is one number. Treemarks connects scores to class concepts, so you know exactly what to revisit.
Illustrative
What grading costs today
The challenges of grading at scale.
Teachers face a false choice: grade everything by hand and fall behind, or hand it to tools that flatten real engineering work into checkboxes. Treemarks is the third option. It carries the load, and the judgment stays with you.
A dozen TAs, a dozen standards
Grades drift between graders, and grading fatigue drifts them again over a long stack. The same work can earn a different score depending on who marks it, and when.
Feedback arrives too late to matter
Students wait weeks for a graded problem set, and by the time it comes back the class has moved on. In real classrooms, faster feedback has generally proven better for learning.
Public chatbots leak private data
To save time, graders paste whole submissions into public chatbots. That can expose private, FERPA-protected data, with no audit trail.
Do the math: a 120-student course can mean 600+ submissions a term. At ~8 minutes each, that’s 80+ hours of marking a term, before a single regrade. (Illustrative.)
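The arithmetic above can be checked in a few lines. This is a minimal sketch using only the illustrative figures from the text (600 submissions, about 8 minutes each); the function name `marking_hours` is made up for this example, and your own course's numbers will differ.

```python
def marking_hours(submissions: int, minutes_each: float) -> float:
    """Total marking time in hours for a term, before any regrades."""
    return submissions * minutes_each / 60


# Illustrative figures from the text: 600 submissions at ~8 minutes each.
hours = marking_hours(600, 8)
print(f"{hours:.0f} hours of marking per term")  # 80 hours of marking per term
```

Swap in your own submission count and per-submission time to estimate the load for a specific course.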
Who it’s for
Built for everyone the gradebook touches.
For teachers
See where your class needs help, safely.
The diagnostic an overloaded TA never writes up, student data that stays protected, and your final say on every grade. Plus the hard questions, answered.
The case for teachers
For students
Is a robot grading me?
A person is accountable for your grade, you get real feedback instead of a number, and you are graded on your reasoning, not your handwriting. Your rights, in plain terms.
What this means for you
For TAs
Are we going to be replaced?
The honest answer, with the numbers, from a teaching assistant who built this. Plus how you would actually use Leaf in a grading week.
The case for TAs
How it works
Meet Leaf. It grades. It checks in. You decide.
What a real week looks like. The conversation is the whole interface, with no new dashboard to learn.
It notices the deadline
PS4 closes, and Leaf already knows. It pulls the submissions from Canvas and starts grading, or you can just DM it a PDF.
It grades the work
It builds the key and rubric from your materials, grades every submission, and writes each student feedback worth reading.
It DMs you
Leaf reports back, surfaces the calls that need you, and asks how you want them handled. Approve, and grades write back to Canvas or Gradescope.
Private by design. A local model strips names before any cloud call, so the grader sees “Student 14,” never a name. The channel carries the conversation and aggregates; named grades stay in an auditable record behind the link, and student work never trains anyone else’s model.
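To make the idea of stripping names before a cloud call concrete, here is a minimal sketch. It is not Treemarks' implementation: the text describes a local model doing this step, while the sketch below stands in a plain roster lookup with a regular expression. The function names `build_aliases` and `strip_names` and the sample roster are hypothetical.

```python
import re


def build_aliases(roster):
    """Map each roster name to a stable alias such as 'Student 14'."""
    return {name: f"Student {i}" for i, name in enumerate(roster, start=1)}


def strip_names(text, aliases):
    """Replace every roster name in text with its alias (whole words, case-insensitive)."""
    if not aliases:
        return text
    # Try longer names first so a full name wins over a shorter overlapping one.
    names = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(n) for n in names) + r")\b",
        re.IGNORECASE,
    )
    lookup = {name.lower(): alias for name, alias in aliases.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)


# Hypothetical roster and submission text, for illustration only.
roster = ["Ada Lovelace", "Alan Turing"]
aliases = build_aliases(roster)
redacted = strip_names("Submitted by Ada Lovelace. Partner: alan turing.", aliases)
print(redacted)  # Submitted by Student 1. Partner: Student 2.
```

A roster lookup like this misses nicknames, misspellings, and handwritten names, which is presumably why a model is used for the real step. The point of the sketch is only the ordering: the redaction runs locally, and only the redacted text would leave the machine.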
The evidence
We did the testing.
Treemarks was built inside real courses, graded against real instructors, and measured rather than asserted.
- A full term, graded end to end
- Ten problem sets, two midterms, and a final. Every grade was approved by a human before it posted.
- Measured against your own graders
- Before Treemarks is trusted on a course, it grades blind against your own TAs and shows you the gap. It has matched an in-house TA team on the large majority of sub-parts on a real midterm, and closely matched an instructor in a different department. Its grading principles are versioned and selected through blind A/B testing, not chosen by vibes.
- Real engineering work, and an honest edge
- Handwritten exams, spreadsheets, derivations, graded where physics rather than the reader defines the answer. Where the only anchor is the reader, like argue-a-position essays, we leave it to people, and we say so.
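One simple way to express "the gap" in a blind comparison is an agreement rate over sub-parts. The sketch below is an assumption about how such a number could be computed, not the metric Treemarks reports; the function name `agreement_rate`, the `tolerance` parameter, and the scores are all invented for illustration.

```python
def agreement_rate(ai_scores, ta_scores, tolerance=0.0):
    """Fraction of sub-parts where the two scores differ by at most `tolerance` points."""
    if len(ai_scores) != len(ta_scores):
        raise ValueError("both graders must score the same sub-parts")
    if not ai_scores:
        raise ValueError("need at least one sub-part to compare")
    matches = sum(abs(a - t) <= tolerance for a, t in zip(ai_scores, ta_scores))
    return matches / len(ai_scores)


# Made-up scores for five sub-parts; not real results.
ai = [4, 3, 5, 2, 5]
ta = [4, 3, 4, 2, 5]
rate = agreement_rate(ai, ta)
print(rate)  # 0.8
```

Raising `tolerance` to 1 would count near-misses as agreement, which matters when rubrics allow partial credit in small steps.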
Why now
Why this is possible now.
Instructors already grade with AI
Most do it by pasting student work, names and all, into public chatbots. The privacy problem isn’t coming; it’s already here.
Enrollments up, TA budgets down
Leaner teams, more submissions, the same deadlines. Something has to give, and today it’s feedback quality.
AI can finally do the work
Models can follow derivations, spreadsheets, and reasoning well enough to grade them, and agents can now carry a whole job, not just speed up one step of it. Grading is exactly that kind of job.
The Treemarks network
Join the Treemarks network.
We take a few courses each term as design partners. Leaf joins the chat your staff already uses; you get a written evidence packet at term’s end and a real say in the roadmap. Every course makes the next one sharper: the method and the evidence compound across the network.
- One course, one term
- We agree on what success looks like up front
- Confidential by default: your course never becomes a logo
- You keep the final say on every grade
We read every request personally.