Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.
The Vocabulary Evaluator was developed mainly for informational texts for grades 3–4, and its performance should be understood within those parameters. The limitations below reflect the evaluator’s validated testing conditions, known performance boundaries, and intended usage guidelines, and are provided to support responsible application by partners.
  • The evaluator is calibrated for use within a defined configuration environment. Its reliability has not been established outside that configuration, and using alternate model settings or prompt formats may lead to inconsistent results.
  • It may not perform reliably for texts aimed at grade levels below or above the intended grades 3–4.
  • The evaluator was tested on passages of roughly 200 words. Performance has not been established on significantly longer or shorter passages, or on non-informational genres.
  • This evaluator addresses only the vocabulary dimension of text complexity.
  • The underlying SCASS rubric was modified for machine compatibility. While this improves consistency, it may introduce interpretive drift relative to educator practice.
  • Some variability is inherent in LLM outputs. To reduce it, each text input should be evaluated three times, with the outputs aggregated by simple majority vote.
  • The evaluator is intended for exploratory use only. It is not validated for formal instructional placement, assessment, or other high-stakes educational decisions.
  • Results should be interpreted with human judgment, especially when informing curriculum development, educational interventions, or product design.
  • The evaluator is intended for de-identified, general-purpose text inputs only. Users should not submit student information, real-world personal data, or any content subject to privacy regulations such as FERPA, COPPA, HIPAA, or GDPR.
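The three-run majority vote recommended above can be sketched as follows. This is a minimal illustration, not the evaluator's own code: `evaluate_once` is a hypothetical stand-in for a single evaluator call, and the label strings in the usage note are illustrative placeholders. The guidance does not say what to do when all three runs disagree, so this sketch assumes such cases return `None` and are set aside for human review.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by more than half of the runs, or None if no label is."""
    if not labels:
        raise ValueError("at least one label is required")
    label, count = Counter(labels).most_common(1)[0]
    # Strict majority: with 3 runs, at least 2 must agree.
    # Assumption: no majority -> None, so the text can be flagged for human review.
    return label if count > len(labels) / 2 else None


def evaluate_with_vote(
    text: str,
    evaluate_once: Callable[[str], str],  # hypothetical single-run evaluator call
    runs: int = 3,
) -> Optional[str]:
    """Evaluate the same text `runs` times and aggregate the labels by majority vote."""
    labels = [evaluate_once(text) for _ in range(runs)]
    return majority_vote(labels)
```

For example, `majority_vote(["Moderately complex", "Very complex", "Moderately complex"])` returns `"Moderately complex"`, while three different labels return `None`. Requiring a strict majority, rather than taking whichever label appears first, keeps an unresolved three-way split visible instead of silently picking one label.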