Running the evaluator

Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.

To run this evaluator, you’ll need:

The system prompt & user prompt.
The formatting instructions.
The text to evaluate.
A model and temperature setting. In our testing, the following combination was the most accurate:
- Suggested model: Gemini-2.5-pro (specifically Gemini-2.5-pro-preview-06-05)
- Temperature: 0.25
- GOOGLE_API_KEY (see Set up your environment)
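The settings above can be collected in one place before any call is made. The following is a minimal sketch, not the evaluator's own code: the names EvaluatorConfig and load_api_key are hypothetical, and the lowercase model string is an assumed API spelling of the suggested model. It makes no network calls.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluatorConfig:
    """Suggested settings from this section (class and field names are illustrative)."""

    # Assumption: lowercase API spelling of the suggested model name.
    model: str = "gemini-2.5-pro-preview-06-05"
    temperature: float = 0.25


def load_api_key(env=None) -> str:
    """Return GOOGLE_API_KEY from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; see Set up your environment."
        )
    return key
```

Failing early on a missing key keeps the error separate from any later model or prompt problem.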

Personal data: Use only anonymous, general-purpose, informational text. Never use student information, real-world personal data, or any content subject to privacy regulations such as FERPA, COPPA, HIPAA, or GDPR.
Passage length: The evaluator was built and optimized for passages of up to 1,200 words. It has not been tested on longer text.
Text type: The evaluator was developed using the CLEAR corpus and Common Core Appendix B, and it performs best on informational text. It may not be suitable for poetry and similar text types.
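A passage can be checked against the 1,200-word limit before it is sent to the evaluator. This is a minimal sketch under one assumption: words are counted by splitting on whitespace, which may differ slightly from how the limit was measured. The function names are hypothetical.

```python
MAX_WORDS = 1200  # longest passage length the evaluator was tested on


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of 'word')."""
    return len(text.split())


def within_tested_length(text: str, max_words: int = MAX_WORDS) -> bool:
    """Return True if the passage is non-empty and at most max_words long."""
    return 0 < count_words(text) <= max_words
```

A passage that fails this check can still be submitted, but the result falls outside the range the evaluator was tested on.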

The evaluator is designed to analyze the text’s quantitative, qualitative, and background knowledge dimensions in one step.

The evaluator will return the following:

Target recommended grade band (e.g., Grade 4-5).
An alternate grade band and required scaffolding to make the text suitable for that level.
A rationale that includes an assessment of the qualitative, quantitative, and background knowledge requirements.

Run multiple times. We recommend running the evaluator 3 times per passage and using a voting mechanism to increase consistency.
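One possible voting mechanism is a simple majority vote over the recommended grade bands from each run. The sketch below is an assumption, since this section does not prescribe a specific voting rule: the function name vote_grade_band is hypothetical, and it expects the grade bands as already-normalized strings, one per run.

```python
from collections import Counter
from typing import Optional, Sequence


def vote_grade_band(bands: Sequence[str]) -> Optional[str]:
    """Return the most frequent grade band, or None if the top spot is tied."""
    if not bands:
        raise ValueError("at least one evaluator result is required")
    counts = Counter(bands)
    top_count = max(counts.values())
    winners = [band for band, n in counts.items() if n == top_count]
    # A tie (for example, three runs with three different bands) has no
    # majority; the caller can rerun the evaluator or review manually.
    return winners[0] if len(winners) == 1 else None
```

For example, the results "Grade 4-5", "Grade 4-5", and "Grade 6-8" resolve to "Grade 4-5", while three different bands return None so the disagreement stays visible instead of being resolved arbitrarily.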