> ## Documentation Index
> Fetch the complete documentation index at: https://docs.learningcommons.org/llms.txt
> Use this file to discover all available pages before exploring further.

# About this evaluator

> Reference documentation for the Grade Level Appropriateness Evaluator.

export const EarlyAccessCallout = ({children}) => <div className="eyebrow-callout not-prose rounded-xl border border-gray-200/80 p-5 dark:border-white/10" style={{
  marginBottom: "1rem",
  borderRadius: "4px"
}}>
    <div className="mb-3">
      <Badge color="green" size="md" icon="flask">
        Early access
      </Badge>
    </div>
    <div className="callout-body text-[15px] leading-relaxed text-gray-700 dark:text-gray-300">{children}</div>
    <style>{`.callout-body a { text-decoration: underline; text-decoration-color: #178251; }`}</style>
  </div>;

[Evaluator last updated September 23, 2025.](#evaluator-release-history)

<EarlyAccessCallout>
  This functionality is actively evolving. Changes may occur as we expand capabilities and improve accuracy and reliability. Email [support@learningcommons.org](mailto:support@learningcommons.org) ↗ with your feedback or issues.
</EarlyAccessCallout>

## At a glance

|                      |                    |
| :------------------- | :----------------- |
| **Input type**       | Informational text |
| **Supported grades** | K–12               |

The Grade Level Appropriateness Evaluator assesses whether AI-generated text is suitable for independent reading at a specified grade band. The evaluator considers:

* Flesch-Kincaid grade level
* Word count
* Text structure (complexity of organization, connections between ideas, role of text features etc.)
* Language features (vocabulary, sentence complexity, use of figurative or abstract language)
* Purpose (how explicitly stated and how concrete or abstract)
* Knowledge demands (discipline-specific knowledge required, references, and allusions)
* Student background knowledge (what a student at a given grade level would already know)

## Model and prompt

For instructions on running the evaluator, see [Running an evaluator](/evaluators/using-evaluators/running-evaluators).

|                     |                                                                                                                   |
| :------------------ | :---------------------------------------------------------------------------------------------------------------- |
| **Model used**      | Gemini-2.5-pro (gemini-2.5-pro-preview-06-05)                                                                     |
| **Temperature**     | 0.25                                                                                                              |
| **Prompts**         | [View prompts](https://github.com/learning-commons-org/evaluators/blob/main/evals/prompts/gla_prompts.py) ↗       |
| **Python Notebook** | [View Notebook](https://github.com/learning-commons-org/evaluators/blob/main/evals/grade_level_evaluator.ipynb) ↗ |

<Note>
  Other configurations will produce different results and may have lower accuracy.
</Note>

## Inputs

| Requirement            | Supported                                             | Required |
| :--------------------- | :---------------------------------------------------- | :------- |
| **Target grade level** | Enables grade context evaluation                      | Yes      |
| **Text type**          | Informational text<br />Optimal length \< 1,200 words | Yes      |

## Output

| Field                  | Description                                                                    |
| :--------------------- | :----------------------------------------------------------------------------- |
| **Grade**              | The text is appropriate for independent reading within this grade band         |
| **Reasoning**          | A synopsis of the reasoning used by the evaluator to arrive at the grade level |
| **Alternative grade**  | The text is appropriate for supported reading (e.g., read-aloud).              |
| **Scaffolding needed** | The supports needed for supported reading (e.g., pre-teaching of vocabulary).  |

## Interpreting results

| Output                                 | What it means                                                                                                                                                        | How to use it                                                                                                                                    |
| :------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------- |
| Grade + Reasoning                      | The grade band where a student can read the text independently, with a breakdown of why — quantitative score, qualitative features, and assumed background knowledge | Validate that your LLM prompts produce grade-appropriate content; aggregate reasoning across runs to diagnose and fix systemic complexity issues |
| Alternative grade + Scaffolding needed | A lower grade band where the text can still work with targeted support                                                                                               | Surface scaffolding suggestions (e.g., vocabulary pre-teaching, read-aloud) to help educators adapt content for mixed classrooms                 |

Use grade and reasoning together to evaluate and improve the complexity of your AI-generated content. Use the alternative grade and scaffolding recommendation together to help educators adapt that content for a wider range of learners.

## Accuracy and validation

<Note>
  This evaluator is provided as Early access. Comprehensive accuracy measures are not yet available. Validation testing is ongoing.
</Note>

We assessed performance against an expert-annotated dataset of Common Core exemplar texts.

| Metric                                                                                                                                        | Result                                                                                                                                                                    |
| :-------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| <Tooltip tip="How accurately the evaluator determines grade level appropriateness compared to expert annotations.">Overall accuracy</Tooltip> | 81% (70 correct out of 86 texts)                                                                                                                                          |
| <Tooltip tip="How the evaluator's accuracy compares to a simple, unrefined grade-level appropriateness prompt.">Baseline comparison</Tooltip> | 58% more accurate than a naive LLM baseline                                                                                                                               |
| Dataset source                                                                                                                                | [CLEAR Corpus](https://www.commonlit.org/blog/introducing-the-clear-corpus-an-open-dataset-to-advance-research-28ff8cfea84a/) ↗ and Common Core Appendix B exemplar texts |

## Evaluator release history

| Date               | Changed        |
| ------------------ | -------------- |
| September 23, 2025 | First release. |
