
Early access
This functionality is actively evolving. Changes may occur as we expand capabilities and improve accuracy and reliability. Email support@learningcommons.org with your feedback or issues.

The problem we’re addressing

The texts we give students matter. Research shows that students who consistently engage with complex texts are more likely to succeed in college and beyond. The ability to navigate challenging language, unfamiliar vocabulary, layered sentence structures, and nuanced ideas prepares students not just for academic milestones but for the real-world literacy demands of adulthood.

The stakes are high. When students are systematically given simplified or “leveled” texts instead of grade-level ones, their growth stalls. They miss the chance to grapple with the language, ideas, and knowledge-building opportunities that fuel literacy development and readiness for college and career. These inequities fall hardest on historically marginalized students, widening gaps in opportunity and outcomes.

Yet despite their importance, complex texts are often absent from classrooms. Quantitative measures of text complexity (like Lexile or Flesch-Kincaid) are useful but limited: they can place Diary of a Wimpy Kid and Fahrenheit 451 in the same band, even though their ideas, themes, and instructional value are worlds apart. What really matters are the qualitative dimensions of text complexity—structure, purpose, vocabulary, knowledge demands, and language features. These signal where students may struggle and where rich instructional opportunities lie.

Curriculum publishers and assessment creators face mounting pressure to ensure that texts become more rigorous across grades, but accurately evaluating those qualitative features is labor-intensive. And as AI-generated texts enter classrooms and supplemental resources, companies risk producing content that looks grade-appropriate on the surface but fails to meet the deeper demands of literacy development.
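To make the limitation concrete: the Flesch-Kincaid grade mentioned above is a simple formula over surface features (sentence length and syllables per word), which is exactly why it cannot see ideas, themes, or knowledge demands. A minimal sketch, using a rough heuristic syllable counter rather than a dictionary-accurate one:

```python
import re

def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels,
    with a small correction for a silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1 and not word.endswith(("le", "ee")):
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

A short, simple sentence scores low and a dense academic one scores high, but two texts with similar sentence and word lengths will score the same grade regardless of how different their themes and knowledge demands are.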

What we’re building

Individual Qualitative Text Complexity rubric (SAP) dimension evaluations

Instead of giving a single complexity score, we can help evaluate generated texts across the specific dimensions that matter:
  • Text structure
  • Language features
  • Vocabulary
  • Purpose
  • Knowledge demands
Because they are anchored in the Qualitative Text Complexity rubric (SAP), our evaluators surface actionable insights into why a text is or is not complex enough and how best to scaffold it for students. This gives you the fine-grained data you need to ensure the quality of generated texts.
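To illustrate the kind of fine-grained, per-dimension output described above, here is a hypothetical sketch of a result structure. The class and field names are illustrative assumptions, not the product's actual schema:

```python
from dataclasses import dataclass

@dataclass
class DimensionEvaluation:
    """One rubric dimension's result for a generated text.
    All names here are hypothetical, for illustration only."""
    dimension: str        # e.g. "vocabulary", "text_structure"
    rating: str           # e.g. "slightly_complex" ... "very_complex"
    rationale: str        # why the text earned this rating
    scaffolds: list[str]  # suggested supports for assisted reading
```

A single text would yield one such record per dimension (text structure, language features, vocabulary, purpose, knowledge demands), so a publisher can see not just that a passage misses its target but which dimension is off and what scaffolding would help.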

Expert-annotated benchmark datasets

As part of developing the literacy evaluators, we are building rigorously annotated datasets in collaboration with nationally recognized literacy experts and practitioners. These datasets serve as both the foundation for evaluator accuracy and as a shared resource for the field. By codifying expert judgment on qualitative text complexity, we are creating a replicable, transparent benchmark that can guide product development, support research, and advance best practices.

Current literacy evaluators

  • Grade Level Appropriateness (Early access): Checks that the generated text is appropriate for independent reading at the specified grade level. It also provides an alternative grade level for the text and the scaffolds required for assisted reading.
  • Sentence Structure (Early access): Breaks the evaluation of sentence structure into two distinct stages, first labeling the sentence features and then assigning a complexity rating.
  • Vocabulary (Early access): Checks the vocabulary complexity of AI-generated texts.
  • Subject Matter Knowledge (Early access): Assesses the subject-matter complexity of passages.
  • Conventionality (Early access): Assesses how directly a text communicates its meaning.