Literacy

About the dataset

The Literacy dataset contains high-quality text complexity annotations, produced by literacy and education experts, for the CommonLit Ease of Readability (CLEAR) Corpus. The Corpus was produced by CommonLit in collaboration with Georgia State University and released in December 2021. It comprises nearly 5,000 publicly available excerpts, each mapped against dimensions including Flesch-Kincaid and Easiness (a Bradley-Terry coefficient based on teacher ratings of the text). We expanded the dataset by scoring a subset of rows against dimensions in the Student Achievement Partners (SAP) Qualitative Text Complexity Rubric for Informational Text. Our Literacy evaluators use this dataset as a benchmark when assessing AI-generated content. We hope edtech developers can use it to complement their own literacy work as well.

Our initial release in September 2025 focuses on sentence structure and vocabulary for Grades 3 and 4, but we plan to expand to all grades and all dimensions of text complexity assessed through SAP’s Qualitative Text Complexity Rubric.

Our process

Our process for producing annotated data is as follows:

1. Filter the corpus to an approximate Grade 3-4 range using Flesch-Kincaid Grade Level.
2. Partner with literacy experts from SAP and Achievement Network (ANet) to score texts against dimensions in SAP’s Rubric.
3. With SAP and ANet, establish a gold set of ~80 examples per grade with representation across the 4 tiers of text complexity (Slightly, Moderately, Very, and Exceedingly complex).
4. Use the gold set to test and qualify a cohort of educators with 2+ years of experience teaching English Language Arts at the corresponding grade level.
5. Produce 200+ rows (50+ per complexity tier), calibrating annotator scores using the Dawid-Skene method.
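
To give a feel for the calibration step above, here is a minimal sketch of Dawid-Skene label aggregation in plain Python. It is an illustration under stated assumptions, not the pipeline used to build this dataset: the function name `dawid_skene`, the toy labels, the smoothing constant, and the fixed iteration count are all hypothetical. The method treats each text's true tier as hidden, estimates a confusion matrix per annotator, and alternates between the two estimates with expectation-maximization.

```python
TIERS = ["slightly complex", "moderately complex", "very complex", "exceedingly complex"]


def dawid_skene(labels, n_classes, n_iter=50, smoothing=0.01):
    """Aggregate labels given as (item, annotator, class_index) triples.

    Returns {item: [posterior probability of each class]}.
    `smoothing` is an assumed pseudo-count that keeps probabilities non-zero.
    """
    items = sorted({item for item, _, _ in labels})
    annotators = sorted({annotator for _, annotator, _ in labels})

    # Initialise each item's posterior from its (smoothed) vote shares.
    post = {item: [smoothing] * n_classes for item in items}
    for item, _, k in labels:
        post[item][k] += 1.0
    for item in items:
        total = sum(post[item])
        post[item] = [p / total for p in post[item]]

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator,
        # where conf[a][t][k] estimates P(annotator a says k | true class t).
        prior = [sum(post[item][t] for item in items) / len(items)
                 for t in range(n_classes)]
        conf = {a: [[smoothing] * n_classes for _ in range(n_classes)]
                for a in annotators}
        for item, a, k in labels:
            for t in range(n_classes):
                conf[a][t][k] += post[item][t]
        for a in annotators:
            for t in range(n_classes):
                row_total = sum(conf[a][t])
                conf[a][t] = [c / row_total for c in conf[a][t]]

        # E-step: recompute the posterior over each item's true class.
        new_post = {item: list(prior) for item in items}
        for item, a, k in labels:
            for t in range(n_classes):
                new_post[item][t] *= conf[a][t][k]
        for item in items:
            total = sum(new_post[item])
            new_post[item] = [p / total for p in new_post[item]]
        post = new_post

    return post


# Toy example: annotators "a" and "b" agree; "c" disagrees on two texts.
toy_labels = [
    ("t1", "a", 0), ("t1", "b", 0), ("t1", "c", 1),
    ("t2", "a", 1), ("t2", "b", 1), ("t2", "c", 1),
    ("t3", "a", 2), ("t3", "b", 2), ("t3", "c", 2),
    ("t4", "a", 3), ("t4", "b", 3), ("t4", "c", 0),
]
posteriors = dawid_skene(toy_labels, n_classes=len(TIERS))
consensus = {item: TIERS[max(range(len(TIERS)), key=lambda t: probs[t])]
             for item, probs in posteriors.items()}
```

Unlike a plain majority vote, the posterior also indicates how borderline each text is, which is useful when annotator coverage per item is small.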

[Figure: flowchart showing a high-level overview of the annotation process]

Finally, we package the dataset by mapping multiple dimensions of text complexity to the clear_id column. This results in an annotated dataset that can be easily merged with the corpus.
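
As a rough sketch of that merge, the snippet below joins annotation rows to corpus rows on `clear_id` using only the standard library. The rows, the `title` field, and the helper name `merge_on_clear_id` are hypothetical, and the identifier column in your copy of the corpus may be named differently. Because one text can be scored for more than one grade, the join is many-to-one: several annotation rows may attach to the same corpus row.

```python
# Hypothetical sample rows; real files will have more columns and other names.
annotations = [
    {"uid": "u1", "clear_id": 101, "grade": 3, "sentence_score": "slightly complex"},
    {"uid": "u2", "clear_id": 101, "grade": 4, "sentence_score": "slightly complex"},
    {"uid": "u3", "clear_id": 205, "grade": 3, "sentence_score": "very complex"},
]
corpus = [
    {"clear_id": 101, "title": "Example text A"},
    {"clear_id": 205, "title": "Example text B"},
]


def merge_on_clear_id(annotation_rows, corpus_rows, key="clear_id"):
    """Left-join annotation rows onto corpus rows (many annotations to one text)."""
    corpus_by_id = {}
    for row in corpus_rows:
        if row[key] in corpus_by_id:
            # The corpus side is expected to have one row per identifier.
            raise ValueError(f"duplicate {key} in corpus: {row[key]}")
        corpus_by_id[row[key]] = row

    merged = []
    for row in annotation_rows:
        corpus_fields = corpus_by_id.get(row[key], {})
        # Annotation values win if both sides share a column name.
        merged.append({**corpus_fields, **row})
    return merged


merged = merge_on_clear_id(annotations, corpus)
```

With pandas, the equivalent would be `annotations_df.merge(corpus_df, on="clear_id", how="left", validate="many_to_one")`, assuming both frames expose the identifier under that name.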

Columns

Last updated September 23, 2025. This list will be updated as we incorporate more text complexity dimensions.

Column	Definition
UID	Unique identifier for each row.
Clear ID	Identifier for the text, taken from the corpus. This is not a unique identifier, as some texts were scored for multiple grades.
Grade	Grade level for which the text is scored. For example, if Grade is 3 and the sentence structure complexity score is Slightly Complex, then the text is Slightly Complex for a third-grade student (see the overall project documentation for specific assumptions).
Flesch-Kincaid	Flesch-Kincaid Grade Level score for the text, as provided in the corpus.
Text	Text from the corpus that was annotated.
Sentence Score	Overall annotator rating for sentence structure complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See the technical docs for additional details on these categories.
Sentence Score Rationale	Annotators’ explanations for their sentence structure score.
Vocabulary Score	Overall annotator rating for vocabulary complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See the overall project documentation for additional details on these categories.
Vocabulary Score Rationale	Annotators’ explanations for their vocabulary score.
Tier 2 Words	Words that are commonly used in academic settings, are more complex than colloquial or everyday language, and often have multiple meanings. Example: “There are eight planets in the Solar System. From closest to farthest from the Sun, they are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.” Most planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune) are Tier 2 words for Grade 3.
Tier 3 Words	Words that are limited to a specific domain or that are so rare that an avid reader would likely not encounter them in a lifetime. Domain-specific example: enzyme. Rare, unconventional example: abecedarian.
Archaic Words	Words, or uses of words, that are not common in modern conversational language. Example: “The jury retired to deliberate on their verdict.” Using “retire” to mean “withdraw to a private place” is an archaic use.
Other Complex Words	Additional complex words for students of the grade level, as identified by annotators.
Background Knowledge Assumption	LLM-generated information on the background knowledge that students of a particular grade are likely to have about a topic. Information was provided to annotators during the scoring process. See the overall project documentation for detailed methodology on how this was generated.

A value of Not Scored in a given column means that the text was not annotated for that column.
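
When working with the score columns in code, it can help to convert the four tiers to an ordinal rank and to treat Not Scored as missing. The sketch below is one way to do that; the helper name `score_to_rank` and the 1-4 ranking are illustrative, and the exact spelling and capitalization of the stored values are assumptions, which is why the input is normalized first.

```python
# Ordinal rank for each complexity tier (1 = least complex).
TIER_RANK = {
    "slightly complex": 1,
    "moderately complex": 2,
    "very complex": 3,
    "exceedingly complex": 4,
}


def score_to_rank(value):
    """Map a score cell to 1-4, or None when the text was not scored."""
    cleaned = value.strip().lower()
    if cleaned == "not scored":
        return None
    # Any other unexpected value raises KeyError rather than being guessed at.
    return TIER_RANK[cleaned]


ranks = [score_to_rank(v) for v in ["Slightly Complex", "Not Scored", "very complex"]]
```

Keeping unscored cells as `None` rather than 0 avoids accidentally averaging them in as if they were a fifth, lowest tier.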

Limitations

Annotator coverage per item is limited; reported precision and agreement reflect this coverage.
- Additional annotations in future updates are expected to improve statistical reliability by increasing confidence, reducing variance, and stabilizing borderline cases.
Flesch–Kincaid grade estimates (based on sentence and word length) are heuristic and do not capture qualitative factors (e.g., conceptual difficulty, vocabulary sophistication, thematic maturity).
- They are recorded as metadata and may be referenced in the evaluator prompt as one of multiple signals.
- They are not the sole determinant of evaluator labels.
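
To make the Flesch-Kincaid limitation concrete, the sketch below computes the grade level with the standard published formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The tokenization and the vowel-group syllable counter are crude assumptions for illustration, so results will not exactly match the values stored in the corpus. Note that the inputs are only counts of sentences, words, and syllables; nothing in the calculation looks at meaning.

```python
import re


def count_syllables(word):
    """Crude heuristic: count runs of vowels, with a minimum of one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level from sentence, word, and syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


short_text = "The cat sat. The dog ran."
long_text = "The remarkable cat deliberately sat beside the extraordinarily patient dog."

short_grade = flesch_kincaid_grade(short_text)
long_grade = flesch_kincaid_grade(long_text)
```

The second text scores far higher purely because its single sentence is longer and its words have more syllables, which is why the estimate is treated as one signal among several rather than as a label.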
