Early Release
This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.
- The evaluator is calibrated for use within a defined configuration environment. Its reliability has not been established outside that environment, and using it with other model settings or prompt formats may produce inconsistent results.
- It may not perform reliably on texts aimed at grade levels outside the intended range of K–11.
- The GLA validation dataset is drawn from Common Core Appendix B and contains more samples from Grade 2 onward. As a result, performance estimates may be more precise for the grades with larger sample sizes.
- The evaluator was tested on Common Core Appendix B exemplars, with additional expert validation on CLEAR Corpus texts. Performance has not been formally tested on passages longer than 1,200 words or on texts outside these two datasets.
- Some variability is inherent in LLM outputs. Occasional inconsistencies in sentence labeling or complexity scoring may occur between runs.
- The evaluator is intended for exploratory use only. It is not validated for formal instructional placement, assessment, or other high-stakes educational decisions.
- Results should be interpreted with human judgment, especially when informing curriculum development, educational interventions, or product design.
- The evaluator is intended for de-identified, general-purpose text inputs only. Users should not submit student information, personally identifiable data, or any content subject to privacy regulations such as FERPA, COPPA, HIPAA, or GDPR.
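
Two of the limitations above, the untested range beyond 1,200 words and the run-to-run variability of LLM outputs, can be partly guarded against on the caller's side. The following is a minimal sketch, not part of the evaluator itself: `exceeds_tested_length`, `majority_label`, and the `evaluate` callable are hypothetical names introduced for illustration, and the whitespace-based word count is an assumption about how passage length might be measured.

```python
from collections import Counter
from typing import Callable, Tuple

# Assumption: passages above this word count fall outside the tested range.
MAX_TESTED_WORDS = 1200


def exceeds_tested_length(text: str, limit: int = MAX_TESTED_WORDS) -> bool:
    """Return True when the whitespace-delimited word count is above the limit."""
    return len(text.split()) > limit


def majority_label(
    evaluate: Callable[[str], str], text: str, runs: int = 5
) -> Tuple[str, float]:
    """Call a (hypothetical) evaluator several times on the same text.

    Returns the most frequent label and the fraction of runs that agreed
    with it, so that low agreement can be flagged for human review.
    """
    labels = [evaluate(text) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / runs
```

A caller might skip or flag any passage for which `exceeds_tested_length` returns `True`, and route results with a low agreement fraction to a human reviewer. Neither check validates the output for high-stakes decisions.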