What evaluators do
Evaluators assess the quality of AI-generated educational content by:

- Measuring key dimensions of text for pedagogical alignment
- Identifying areas for improvement
When to use evaluators
Whether you’re testing, refining, or scaling, evaluators help you do it better and faster. Here are four ways you can use them effectively.

| Use case | Examples | Implementation |
|---|---|---|
| Optimize your product | You are building a vocabulary-focused feature – you want higher vocabulary difficulty and simpler sentence structure. You are creating read-aloud support and want to deprioritize vocabulary complexity. | Set targets for vocabulary and sentence structure against grade-level appropriateness. Run the Sentence Structure Evaluator and Vocabulary Evaluator on your LLM outputs to confirm that they stay in acceptable ranges. |
| Monitor consistency | Your AI output starts to vary unexpectedly after model drift or small system updates. | Run regular regression tests on your LLM outputs and compare scores over time to ensure stable behavior. |
| Select the right model | You need to compare new models on quality, speed, and cost before switching. | Create a gold set with expected scores for key parameters (e.g., grade level, topic, text type). Use evaluators as a standardized benchmark to monitor drift from your baseline. |
| Build trust with your users | Districts and educators ask for evidence that your AI-generated content is high-quality and aligned with learning principles. | Share your evaluation process and results so stakeholders can see the rigor behind your system and trust that your outputs remain consistent and research-aligned. |
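
The first two rows of the table (checking that scores stay in acceptable ranges, and comparing scores over time) can be sketched roughly as follows. This is a minimal illustration, not the Learning Commons API: the dimension names, the numeric scale, the target ranges, and the tolerance are all assumptions, and the scores are hard-coded where a real pipeline would call the evaluators on LLM outputs.

```python
from statistics import mean

# Hypothetical target ranges per dimension (illustrative values only).
TARGETS = {
    "vocabulary": (3.0, 5.0),
    "sentence_structure": (1.0, 3.0),
}


def out_of_range(scores, targets):
    """Return the dimensions whose score falls outside its (low, high) target."""
    return {
        dim: score
        for dim, score in scores.items()
        if dim in targets and not (targets[dim][0] <= score <= targets[dim][1])
    }


def drifted(baseline_runs, current_runs, tolerance=0.5):
    """Return dimensions whose mean score moved more than `tolerance` from baseline."""
    result = {}
    for dim in baseline_runs[0]:
        base = mean(run[dim] for run in baseline_runs)
        now = mean(run[dim] for run in current_runs)
        if abs(now - base) > tolerance:
            result[dim] = round(now - base, 2)
    return result


# Assumed evaluator scores for two earlier runs and two recent runs.
baseline = [
    {"vocabulary": 4.0, "sentence_structure": 2.0},
    {"vocabulary": 4.2, "sentence_structure": 2.2},
]
current = [
    {"vocabulary": 4.1, "sentence_structure": 3.4},
    {"vocabulary": 4.3, "sentence_structure": 3.2},
]

print(out_of_range(current[0], TARGETS))  # {'sentence_structure': 3.4}
print(drifted(baseline, current))         # {'sentence_structure': 1.2}
```

Keeping the range check and the drift check separate lets the same stored scores serve both product tuning and regression monitoring.
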
Our approach
Learning Commons collaborates closely with pedagogical experts to define, test, and build our evaluators. We follow a research-informed process to develop evaluators that are firmly anchored in learning science:

- We build alongside experts in learning science and rubric development (e.g., Student Achievement Partners, CAST, and Achievement Network (ANet)).
- We translate expert insight into ground-truth datasets that reflect real teaching and learning principles.
- We develop, validate, and ship software that evaluates text the way an expert would.
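
One simple way to picture the validation step is to measure how often an evaluator's label matches the expert label on a ground-truth set. The sketch below only illustrates that idea and is not a description of how Learning Commons validates its evaluators. The function name, the label vocabulary, and the data are hypothetical.

```python
def agreement_rate(expert_labels, evaluator_labels):
    """Fraction of items where the evaluator's label equals the expert's label."""
    if len(expert_labels) != len(evaluator_labels):
        raise ValueError("label lists must have the same length")
    if not expert_labels:
        raise ValueError("label lists must not be empty")
    matches = sum(e == p for e, p in zip(expert_labels, evaluator_labels))
    return matches / len(expert_labels)


# Hypothetical ground-truth labels from experts and labels from an evaluator.
expert = ["slightly complex", "moderately complex", "very complex", "moderately complex"]
evaluator = ["slightly complex", "moderately complex", "moderately complex", "moderately complex"]

print(agreement_rate(expert, evaluator))  # 0.75
```

Exact-match agreement is the bluntest possible measure. A real validation would likely also account for how far apart disagreeing labels are and for agreement expected by chance.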