Early Release
This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.
- The evaluator is calibrated for a defined configuration. Its reliability has not been established outside that configuration, and use with alternate model settings or prompt formats may lead to inconsistent results.
- It may not perform reliably on texts aimed at grade levels below or above the intended grades 3-4.
- The evaluator was tested on passages of roughly 200 words. Performance has not been established on significantly longer or shorter passages, or on non-informational genres.
- This evaluator addresses only the vocabulary dimension of text complexity.
- The underlying SCASS rubric was modified for machine compatibility. While this improves consistency, it may introduce interpretive drift relative to educator practice.
- Some variability is inherent in LLM outputs. To reduce it, evaluate each text input 3 times and aggregate the outputs by simple majority vote.
- The evaluator is intended for exploratory use only. It is not validated for formal instructional placement, assessment, or other high-stakes educational decisions.
- Results should be interpreted with human judgment, especially when informing curriculum development, educational interventions, or product design.
- The evaluator is intended for de-identified, general-purpose text inputs only. Users should not submit student information, real-world personal data, or any content subject to privacy regulations such as FERPA, COPPA, HIPAA, or GDPR.
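
The three-run majority vote recommended in the list above can be sketched as follows. This is a minimal illustration, not the evaluator's own code: the function name `majority_label`, the `evaluate` callable, and the label strings are all hypothetical stand-ins. Returning `None` when no label wins a strict majority (for example, three different outputs) is an assumption about how to handle ties, which the guidance above does not specify.

```python
from collections import Counter
from typing import Callable, Optional


def majority_label(
    text: str,
    evaluate: Callable[[str], str],  # hypothetical: one evaluator run, returns a label
    runs: int = 3,
) -> Optional[str]:
    """Run the evaluator `runs` times and return the strict-majority label.

    Returns None when no label is produced by more than half of the runs,
    so the caller can route the text to human review (an assumption).
    """
    labels = [evaluate(text) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label if count > runs / 2 else None


# Usage with a stub in place of the real evaluator; labels are placeholders.
canned = iter(["moderately complex", "slightly complex", "moderately complex"])
result = majority_label("Sample passage.", lambda _text: next(canned))
print(result)  # prints "moderately complex"
```

Keeping the no-majority case explicit avoids silently picking an arbitrary label, which fits the guidance that results be interpreted with human judgment.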