Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.

The problem we’re addressing

The texts we give students matter. Research shows that students who consistently engage with complex texts are more likely to succeed in college and beyond. The ability to navigate challenging language, unfamiliar vocabulary, layered sentence structures, and nuanced ideas prepares students not just for academic milestones but for the real-world literacy demands of adulthood.

The stakes are high. When students are systematically given simplified or “leveled” texts instead of grade-level ones, their growth stalls. They miss the chance to grapple with the language, ideas, and knowledge-building opportunities that fuel literacy development and readiness for college and career. These inequities fall hardest on historically marginalized students, widening gaps in opportunity and outcomes. Yet despite their importance, complex texts are often absent from classrooms.

Quantitative measures of text complexity (like Lexile or Flesch-Kincaid) are useful but limited: they can place Diary of a Wimpy Kid and Fahrenheit 451 in the same band, even though their ideas, themes, and instructional value are worlds apart. What really matters are the qualitative dimensions of text complexity: structure, purpose, vocabulary, knowledge demands, and language features. These signal where students may struggle and where rich instructional opportunities lie.

Curriculum publishers and assessment creators face mounting pressure to ensure that texts progress in rigor across grades, but accurately evaluating those qualitative features is labor-intensive. And as AI-generated texts enter classrooms and supplemental resources, companies risk producing content that looks grade-appropriate on the surface but fails to meet the deeper demands of literacy development.

What we’re building

Individual SCASS rubric dimension evaluations

Instead of giving a single complexity score, we can help evaluate generated texts across the specific dimensions that matter:
  • Text structure
  • Language features
  • Vocabulary
  • Purpose
  • Knowledge demands
Because they are anchored in the SCASS rubric, our evaluators surface actionable insights into why a text is complex, why it may not be complex enough, and how best to scaffold it for students. This gives you the fine-grained data you need to ensure the quality of generated texts.
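To make this concrete, here is a minimal sketch, in Python, of the kind of per-dimension result such an evaluator could return. The dimension names follow the SCASS rubric, but the field names, rating labels, and helper function are illustrative assumptions rather than the evaluators’ actual output schema.

```python
from dataclasses import dataclass, field

# Illustrative only: names, rating labels, and structure are assumptions,
# not the actual output schema of the evaluators.
SCASS_DIMENSIONS = [
    "text_structure",
    "language_features",
    "vocabulary",
    "purpose",
    "knowledge_demands",
]

@dataclass
class DimensionResult:
    dimension: str   # one of SCASS_DIMENSIONS
    rating: str      # e.g. "slightly_complex" through "very_complex"
    rationale: str   # why the text earned this rating
    scaffolds: list = field(default_factory=list)  # suggested instructional supports

def summarize(results):
    """Collapse a list of DimensionResult into a dimension -> rating lookup."""
    return {r.dimension: r.rating for r in results}

# A hypothetical result for one dimension of a generated passage.
example = DimensionResult(
    dimension="knowledge_demands",
    rating="very_complex",
    rationale="Assumes prior familiarity with 20th-century censorship debates.",
    scaffolds=["Pre-teach historical context", "Add a guided anticipation question"],
)
print(summarize([example]))  # {'knowledge_demands': 'very_complex'}
```

A structured, per-dimension result like this is what makes the feedback actionable: it shows which dimensions fall short of a target grade band and carries scaffolding suggestions alongside the rating.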

Expert-annotated benchmark datasets

As part of developing the literacy evaluators, we are building rigorously annotated datasets in collaboration with nationally recognized literacy experts and practitioners. These datasets serve as both the foundation for evaluator accuracy and as a shared resource for the field. By codifying expert judgment on qualitative text complexity, we are creating a replicable, transparent benchmark that can guide product development, support research, and advance best practices.

Current literacy evaluators

  • Grade level appropriateness: Checks that the generated text is appropriate for independent reading at the specified grade level. It also provides an alternative grade level for the text and the scaffolds required for assisted reading.
  • Sentence structure: Breaks the evaluation of sentence structure into two distinct stages, first labeling the sentences and then assigning a complexity rating (see the sketch after this list).
  • Vocabulary: Checks the vocabulary complexity of AI-generated texts.
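As an illustration of the two-stage approach described for the sentence structure evaluator, the sketch below first labels each sentence and then assigns an overall complexity rating from the label distribution. The labels, the naive clause-marker heuristic, and the rating thresholds are assumptions made for illustration only; they are not the evaluator’s actual method.

```python
import re

# Stage 1: label each sentence (naive keyword heuristic, for illustration only).
def label_sentence(sentence: str) -> str:
    markers = re.findall(r"\b(and|but|because|although|which|that|while)\b", sentence.lower())
    if len(markers) >= 2:
        return "complex"
    if len(markers) == 1:
        return "compound"
    return "simple"

# Stage 2: assign an overall rating from the share of complex sentences.
def rate_passage(text: str) -> str:
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    labels = [label_sentence(s) for s in sentences]
    share_complex = labels.count("complex") / max(len(labels), 1)
    if share_complex > 0.5:
        return "very complex"
    if share_complex > 0.2:
        return "moderately complex"
    return "slightly complex"

passage = (
    "The firemen burned the books. Although Montag obeyed at first, "
    "he began to question the rules that governed his world."
)
print(rate_passage(passage))  # "moderately complex"
```

Separating labeling from rating mirrors the evaluator’s staged design: the sentence-level labels explain where complexity comes from, while the final rating summarizes it for the passage as a whole.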