Limitations

Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.

The Sentence Structure evaluator was developed specifically for informational texts for grades 3–4, and its performance should be understood within those parameters. These limitations reflect the evaluator’s validated testing conditions, known performance boundaries, and intended usage guidelines.

The evaluator is calibrated for use within a defined configuration environment. Its performance reliability has not been established outside these boundaries, and use with alternate model settings or prompt formats may lead to inconsistent results.
It may not perform reliably on texts written for grade levels below or above the intended grades 3–4.
Validation focused on informational passages of roughly 100–200 words, so performance estimates are most precise within this range. Performance has not been formally validated on shorter (under 100 words) or longer (over 200 words) texts.
This evaluator addresses only the sentence structure dimension of text complexity.
Some variability is inherent in LLM outputs. Occasional inconsistencies in labeling or scoring may occur across runs.
The evaluator is intended for exploratory use only. It is not validated for formal instructional placement, assessment, or other high-stakes educational decisions.
Results should be interpreted with human judgment, especially when informing curriculum development, educational interventions, or product design.
The evaluator is intended for deidentified, general-purpose text inputs only. Users should not submit student information, real-world personal data, or any content subject to privacy regulations such as FERPA, COPPA, HIPAA, or GDPR.
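
Two of the limitations above, the validated length range and run-to-run variability, can be partly guarded against on the caller's side. The following is a minimal sketch under stated assumptions, not part of the evaluator itself: the function names, the whitespace-based word count, and the example labels are all hypothetical, and the 100–200 word bounds are taken from the validation range described above.

```python
from collections import Counter

# Bounds taken from the validated passage length described above.
VALIDATED_MIN_WORDS = 100
VALIDATED_MAX_WORDS = 200


def word_count(text: str) -> int:
    """Count whitespace-separated tokens (an assumed, simple definition of 'word')."""
    return len(text.split())


def in_validated_length_range(text: str) -> bool:
    """Return True if the passage length falls inside the validated range."""
    return VALIDATED_MIN_WORDS <= word_count(text) <= VALIDATED_MAX_WORDS


def majority_label(labels: list[str]) -> tuple[str, float]:
    """Summarize labels from repeated runs on the same text.

    Returns the most common label and the fraction of runs that agreed
    with it. Ties are resolved in favor of the label seen first.
    """
    if not labels:
        raise ValueError("at least one label is required")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical labels collected from three runs on one passage.
label, agreement = majority_label(["moderate", "moderate", "complex"])
```

A low agreement fraction, or a passage outside the validated length range, is a signal to lean more heavily on human judgment when interpreting the result.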
