Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.

Baseline AI evaluator comparison

We tested this evaluator against a baseline that a time- and resource-strapped edtech company might create to measure grade-level appropriateness. The baseline’s performance varied greatly depending on the model and temperature setting, and achieved an average accuracy of ~50% across all configurations tested.

Accuracy

Gla Baseline Ai Evaluator Comparison Bar Chart Sv
The Grade Level Appropriateness Evaluator is over 58% more accurate than the naive LLM baseline.

Performance

Baseline Ai Evaluator Comparison Grade Band Chart Sv
Grade band source of truthCorrect prediction
(target or alternative grade band)
Total recordsAccuracy
K-155100%
2-38989%
4-51111100%
6-8101567%
9-10192286%
11-CCR172471%
Overall708681%
This table gives insight into accuracy per grade level as well as where the evaluation is most likely to make mistakes. For example, the validation dataset included 22 texts for Grades 9-10 in the Common Core exemplar texts. The evaluator labeled 19 of them as correct, which leads to the accuracy score for that grade band as 86%.

Baseline prompt

This is the prompt we used as a baseline to simulate what a time- and resource-strapped EdTech organization might use when building an evaluator to determine grade level appropriateness.

System prompt

You are an expert in English literature education for K-12.

Your job is to help evaluate the grade-level appropriateness of a given text.

You will be given a text and you should determine which grade level the text is appropriate for (grade levels include: K-1, 2-3, 4-5, 6-8, 9-10, 11-CCR)

IMPORTANT: You should pay attention to the vocabulary used, topics of the text, and readability of the text.

Please first reason out loud about the vocabulary complexity of the text and then provide an answer between grade level options: K-1, 2-3, 4-5, 6-8, 9-10, 11-CCR.

User prompt

Read the following text and tell me the appropriate grade band for the text.

Here is the text:\n
[BEGIN TEXT]
{text}
[END TEXT]\n

In your response, provide your answer from one the following grade level options based on your assessment:  K-1, 2-3, 4-5, 6-8, 9-10, 11-CCR.

{format_instructions}