
Early access
This functionality is actively evolving. Changes may occur as we expand capabilities and improve accuracy and reliability. Email support@learningcommons.org with your feedback or issues.

At a glance

Input type: Informational text
Supported grades: K–12
Most recent version: 1.0.0
The Grade Level Appropriateness Evaluator assesses whether AI-generated text is suitable for independent reading at a specified grade band. The evaluator considers:
  • Flesch-Kincaid grade level
  • Word count
  • Text structure (complexity of organization, connections between ideas, role of text features, etc.)
  • Language features (vocabulary, sentence complexity, use of figurative or abstract language)
  • Purpose (how explicitly stated and how concrete or abstract)
  • Knowledge demands (discipline-specific knowledge required, references, and allusions)
  • Student background knowledge (what a student at a given grade level would already know)
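The first two factors are quantitative and can be approximated locally. The sketch below applies the standard Flesch-Kincaid grade level formula, 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic and the function names are illustrative assumptions; this is not the evaluator's own implementation, so its numbers may differ from the evaluator's.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, discounting a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> dict:
    """Return word count, sentence count, and Flesch-Kincaid grade level."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade level formula.
    fk_grade = (
        0.39 * len(words) / len(sentences)
        + 11.8 * syllables / len(words)
        - 15.59
    )
    return {
        "word_count": len(words),
        "sentence_count": len(sentences),
        "fk_grade": round(fk_grade, 2),
    }


if __name__ == "__main__":
    print(text_stats("The cat sat on the mat. The dog ran."))
```

Very short, simple passages can score below zero; the formula is a readability index, not a bounded grade scale.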

Model and prompt

For instructions on running the evaluator, see Running an evaluator.
Model used: Gemini-2.5-pro (gemini-2.5-pro-preview-06-05)
Temperature: 0.25
Prompts: View prompts
Python notebook: View Notebook
Other configurations will produce different results and may have lower accuracy.
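Because results depend on the configuration, it can help to pin the documented settings in code and flag any deviation before a run. The following is a minimal sketch: the model ID and temperature come from this page, while `EvaluatorConfig` and `config_drift` are hypothetical names that are not part of any published SDK.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvaluatorConfig:
    # Defaults mirror the model and temperature documented above.
    model: str = "gemini-2.5-pro-preview-06-05"
    temperature: float = 0.25


DOCUMENTED_CONFIG = EvaluatorConfig()


def config_drift(actual: EvaluatorConfig) -> dict:
    """Return {field: (documented, actual)} for every setting that differs."""
    drift = {}
    for f in fields(EvaluatorConfig):
        expected = getattr(DOCUMENTED_CONFIG, f.name)
        found = getattr(actual, f.name)
        if expected != found:
            drift[f.name] = (expected, found)
    return drift


if __name__ == "__main__":
    # A run with a higher temperature is reported as drift.
    print(config_drift(EvaluatorConfig(temperature=0.7)))
```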

Inputs

Target grade level
  • Supported: Enables grade context evaluation
  • Required: Yes
Text type
  • Supported: Informational text; optimal length < 1,200 words
  • Required: Yes
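A caller might check these requirements before sending text to the evaluator. The sketch below assumes the target grade is passed as `"K"` or a number from 1 to 12, and it treats the 1,200-word figure as a soft limit that produces a warning, since the page describes it as an optimal length rather than a hard cap. All names are illustrative.

```python
MAX_OPTIMAL_WORDS = 1200  # documented optimal length is under 1,200 words


def normalize_grade(grade) -> int:
    """Map 'K' to 0 and 1..12 (int or numeric string) to itself; raise otherwise."""
    if isinstance(grade, str) and grade.strip().upper() == "K":
        return 0
    try:
        value = int(grade)
    except (TypeError, ValueError):
        raise ValueError(f"unsupported grade: {grade!r}") from None
    if not 1 <= value <= 12:
        raise ValueError(f"grade must be K-12, got {grade!r}")
    return value


def check_inputs(text: str, target_grade) -> list:
    """Validate inputs; return a list of non-fatal warnings."""
    normalize_grade(target_grade)  # raises on an unsupported grade
    word_count = len(text.split())
    if word_count == 0:
        raise ValueError("text is empty")
    warnings = []
    if word_count >= MAX_OPTIMAL_WORDS:
        warnings.append(
            f"text has {word_count} words; optimal length is under {MAX_OPTIMAL_WORDS}"
        )
    return warnings


if __name__ == "__main__":
    print(check_inputs("Plants make food from sunlight.", "3"))
```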

Output

Grade: The text is appropriate for independent reading within this grade band
Reasoning: A synopsis of the reasoning used by the evaluator to arrive at the grade level
Alternative grade: The text is appropriate for supported reading (e.g., read-aloud).
Scaffolding needed: The supports needed for supported reading (e.g., pre-teaching of vocabulary).
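For downstream code, the four output fields can be held in a small typed record. This is a sketch under assumptions: the snake_case key names and the presence of all four fields in every response are guesses for illustration, so check the notebook for the actual response shape.

```python
from dataclasses import dataclass

# Hypothetical key names; the real response may label fields differently.
REQUIRED_KEYS = ("grade", "reasoning", "alternative_grade", "scaffolding_needed")


@dataclass
class GradeEvaluation:
    grade: str               # grade band for independent reading
    reasoning: str           # synopsis of how the grade was reached
    alternative_grade: str   # grade band for supported reading
    scaffolding_needed: str  # supports needed for supported reading


def parse_evaluation(raw: dict) -> GradeEvaluation:
    """Build a GradeEvaluation, failing loudly if a field is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in raw]
    if missing:
        raise KeyError(f"missing output fields: {missing}")
    return GradeEvaluation(**{key: str(raw[key]) for key in REQUIRED_KEYS})
```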

Interpreting results

Grade + Reasoning
  • What it means: The grade band where a student can read the text independently, with a breakdown of why: quantitative score, qualitative features, and assumed background knowledge
  • How to use it: Validate that your LLM prompts produce grade-appropriate content; aggregate reasoning across runs to diagnose and fix systemic complexity issues
Alternative grade + Scaffolding needed
  • What it means: A lower grade band where the text can still work with targeted support
  • How to use it: Surface scaffolding suggestions (e.g., vocabulary pre-teaching, read-aloud) to help educators adapt content for mixed classrooms
Use grade and reasoning together to evaluate and improve the complexity of your AI-generated content. Use the alternative grade and scaffolding recommendation together to help educators adapt that content for a wider range of learners.
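Aggregating across runs could look like the sketch below. It assumes each run is stored as a dict with hypothetical keys `target` (the grade band you asked your LLM to write for), `grade` (the band the evaluator returned), and `reasoning`; the band labels in the example are made up.

```python
from collections import Counter


def summarize_runs(runs: list) -> dict:
    """Summarize how often the evaluated grade band matched the target band."""
    if not runs:
        raise ValueError("no runs to summarize")
    misses = [run for run in runs if run["grade"] != run["target"]]
    return {
        # Share of runs where the evaluator's band equals the target band.
        "match_rate": (len(runs) - len(misses)) / len(runs),
        # Which (target, evaluated) mismatches occur most often.
        "miss_counts": Counter((run["target"], run["grade"]) for run in misses),
        # Reasoning text for mismatches, to review for recurring causes.
        "miss_reasoning": [run["reasoning"] for run in misses],
    }


if __name__ == "__main__":
    example = [
        {"target": "3-5", "grade": "3-5", "reasoning": "Short sentences."},
        {"target": "3-5", "grade": "6-8", "reasoning": "Dense vocabulary."},
    ]
    print(summarize_runs(example))
```

A mismatch pair that recurs, together with similar reasoning text, points to a prompt-level issue rather than a one-off generation.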

Accuracy and validation

This evaluator is provided as Early access. Comprehensive accuracy measures are not yet available. Validation testing is ongoing.
We assessed performance against an expert-annotated dataset of Common Core exemplar texts.
Accuracy: 81% (70 correct out of 86 texts)
Comparison to baseline: 58% more accurate than a naive LLM baseline
Dataset source: CLEAR Corpus and Common Core Appendix B exemplar texts
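The reported accuracy follows directly from the counts, as the short check below shows. The baseline's own accuracy is not derived here, because the page does not say whether the 58% figure is a relative improvement or a difference in percentage points.

```python
correct, total = 70, 86  # counts reported above

accuracy = correct / total
print(f"{accuracy:.1%}")  # 70 / 86 is about 0.814, reported as 81%
```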

Version history