Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.
The Grade Level Appropriateness Evaluator was developed specifically for AI-generated informational texts for grades K–11, and its performance should be understood within that scope. The limitations below reflect the evaluator’s validated testing conditions, known performance boundaries, and intended usage guidelines.
  • The evaluator is calibrated for a specific configuration of model settings and prompt formats. Its reliability has not been established outside that configuration, and alternate settings or formats may produce inconsistent results.
  • It may not perform reliably on texts targeting grade levels outside the intended K–11 range.
  • The GLA validation dataset is drawn from Common Core Appendix B, with larger sample sizes from Grade 2 onward; performance estimates are correspondingly more precise for those grades.
  • The evaluator was tested on Common Core Appendix B exemplars, with additional expert validation on CLEAR Corpus texts. Performance has not been formally evaluated on passages longer than 1,200 words or on texts outside these datasets.
  • Some variability is inherent in LLM outputs. Occasional inconsistencies in sentence labeling or complexity scoring may occur between runs.
  • The evaluator is intended for exploratory use only. It is not validated for formal instructional placement, assessment, or other high-stakes educational decisions.
  • Results should be interpreted with human judgment, especially when informing curriculum development, educational interventions, or product design.
  • The evaluator is intended for de-identified, general-purpose text inputs only. Users should not submit student information, personally identifiable data, or any content subject to privacy regulations such as FERPA, COPPA, HIPAA, or GDPR.
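Several of the limits above (the K–11 range, the 1,200-word tested bound, and the de-identified-input requirement) can be screened for before a text is ever submitted. The sketch below is a hypothetical pre-flight check, not part of the evaluator itself: the grade encoding, word-count threshold handling, and regex PII patterns are illustrative assumptions, and a naive regex scan is not a substitute for FERPA, COPPA, HIPAA, or GDPR compliance review.

```python
import re

# Hypothetical pre-flight check for text submitted to the evaluator.
# The 1,200-word ceiling mirrors the tested upper bound noted above;
# grades are encoded as integers with K = 0. All names and patterns
# here are illustrative assumptions.

MAX_TESTED_WORDS = 1200
SUPPORTED_GRADES = range(0, 12)  # K (0) through grade 11

PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),             # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),       # email address
    re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), # US phone-like number
]

def preflight(text: str, target_grade: int) -> list[str]:
    """Return a list of warnings; an empty list means no issues were found."""
    warnings = []
    if target_grade not in SUPPORTED_GRADES:
        warnings.append(
            f"grade {target_grade} is outside the validated K-11 range"
        )
    word_count = len(text.split())
    if word_count > MAX_TESTED_WORDS:
        warnings.append(
            f"{word_count} words exceeds the tested {MAX_TESTED_WORDS}-word bound"
        )
    if any(p.search(text) for p in PII_PATTERNS):
        warnings.append("text appears to contain PII; de-identify before submitting")
    return warnings
```

A caller would run this screen first and hold back any text that produces warnings, keeping submissions within the evaluator's tested conditions.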