About the dataset

The Literacy dataset provides text-complexity annotations for the CLEAR (CommonLit Ease of Readability) Corpus by literacy experts and qualified educators. It is the benchmark data that Evaluators use to assess literacy levels in AI-generated text. We are sharing it as a new resource for the learning science community to help address the need for more high-quality text-complexity datasets and to complement existing work in this area.

The CLEAR Corpus was produced by CommonLit in collaboration with Georgia State University and released in December 2021. It comprises nearly 5,000 publicly available excerpts, each mapped against dimensions including Flesch-Kincaid and BT Easiness (Bradley-Terry coefficient based on teacher ratings of the text).

We expanded the dataset by scoring a subset of rows for text complexity dimensions found in Student Achievement Partners’ Qualitative Text Complexity rubric (SAP). Our initial release in September 2025 focuses on Grades 3 and 4 across sentence structure and vocabulary, but we plan to expand to all grades and all dimensions of text complexity assessed through SAP’s Qualitative Text Complexity rubric.

Thanks to Student Achievement Partners and to Achievement Network for their contributions in helping us assemble this annotated data.

Our process

Our process for producing annotated data is as follows:
  • Filter the CLEAR corpus to an approximate Grade 3-4 range using the Flesch-Kincaid Grade Level.
  • Partner with literacy experts from SAP (Student Achievement Partners) and ANet (Achievement Network) to score against text complexity dimensions on SAP’s Qualitative Text Complexity rubric.
  • With SAP and ANet, establish a gold set of ~80 examples per grade with representation across the four tiers of text complexity: slightly, moderately, very, and exceedingly complex.
  • Use the gold set to test and qualify a cohort of educators with at least 2 years of experience teaching ELA at the corresponding grade level.
  • Produce a minimum of 200 rows (at 50 per complexity tier), calibrating annotator scores using the Dawid-Skene method.
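The calibration step in the list above names the Dawid-Skene method. The following is a minimal sketch of that technique, not the project's actual implementation: it runs expectation-maximization over per-annotator confusion matrices to estimate a consensus tier for each text. The function name `dawid_skene`, the `smoothing` parameter, the iteration count, and the toy votes are all assumptions made for illustration.

```python
from collections import defaultdict

TIERS = ["slightly complex", "moderately complex",
         "very complex", "exceedingly complex"]

def dawid_skene(labels, classes, n_iter=50, smoothing=0.01):
    """Estimate per-item posteriors over the true class.

    labels: list of (item_id, annotator_id, label) triples.
    Returns {item_id: {class: probability}}.
    """
    items = sorted({item for item, _, _ in labels})
    annotators = sorted({ann for _, ann, _ in labels})
    by_item = defaultdict(list)
    for item, ann, lab in labels:
        by_item[item].append((ann, lab))

    # Initialise posteriors from vote shares (a soft majority vote).
    post = {}
    for item in items:
        votes = by_item[item]
        post[item] = {c: sum(1 for _, lab in votes if lab == c) / len(votes)
                      for c in classes}

    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices,
        # conf[annotator][true_class][given_label].
        prior = {c: sum(post[i][c] for i in items) / len(items) for c in classes}
        conf = {a: {c: {lab: smoothing for lab in classes} for c in classes}
                for a in annotators}
        for item in items:
            for ann, lab in by_item[item]:
                for c in classes:
                    conf[ann][c][lab] += post[item][c]
        for ann in annotators:
            for c in classes:
                total = sum(conf[ann][c].values())
                for lab in classes:
                    conf[ann][c][lab] /= total

        # E-step: recompute each item's posterior given the current estimates.
        for item in items:
            scores = {}
            for c in classes:
                p = prior[c]
                for ann, lab in by_item[item]:
                    p *= conf[ann][c][lab]
                scores[c] = p
            z = sum(scores.values())
            post[item] = {c: scores[c] / z for c in classes}
    return post

# Toy votes (invented for illustration): annotators A and B agree, C is noisier.
votes = [
    ("t1", "A", "slightly complex"), ("t1", "B", "slightly complex"), ("t1", "C", "slightly complex"),
    ("t2", "A", "very complex"), ("t2", "B", "very complex"), ("t2", "C", "slightly complex"),
    ("t3", "A", "moderately complex"), ("t3", "B", "moderately complex"), ("t3", "C", "moderately complex"),
    ("t4", "A", "very complex"), ("t4", "B", "very complex"), ("t4", "C", "very complex"),
]
posteriors = dawid_skene(votes, TIERS)
consensus = {item: max(p, key=p.get) for item, p in posteriors.items()}
```

Unlike a plain majority vote, this approach weights each annotator by an estimated reliability, which matters most for texts that sit on the border between two tiers.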
[Figure: flowchart showing a high-level overview of the annotation process]
Finally, we package the dataset by mapping multiple dimensions of text complexity to the clear_id column. This results in an annotated dataset that can be easily merged with the CLEAR corpus.
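As a rough illustration of that merge, the sketch below joins annotation rows onto CLEAR rows by a shared ID using only the Python standard library. The column headers (`clear_id`, `bt_easiness`, `uid`, `grade`, `sentence_score`), the helper names, and every value in the two inline CSV snippets are made-up placeholders; the real files may use different headers.

```python
import csv
import io

# Toy stand-ins for the two files; IDs and values are invented placeholders.
CLEAR_CSV = """clear_id,bt_easiness
c1,0.10
c2,0.20
"""
ANNOTATIONS_CSV = """uid,clear_id,grade,sentence_score
u1,c1,3,slightly complex
u2,c1,4,slightly complex
u3,c2,3,moderately complex
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(text)))

def merge_on_clear_id(annotations, clear_rows, key="clear_id"):
    """Left-join annotation rows onto CLEAR rows.

    One CLEAR text can match several annotation rows, because the same
    text may be scored for more than one grade.
    """
    clear_by_id = {row[key]: row for row in clear_rows}
    merged = []
    for ann in annotations:
        extra = clear_by_id.get(ann[key], {})
        merged.append({**extra, **ann})  # annotation fields win on conflict
    return merged

merged = merge_on_clear_id(read_rows(ANNOTATIONS_CSV), read_rows(CLEAR_CSV))
```

Because the ID is not unique in the annotated data, the result has one row per annotation row rather than one row per CLEAR text.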

Columns

Last updated September 23, 2025. This list will be updated as we incorporate more text complexity dimensions.
| Column | Definition |
| --- | --- |
| UID | Unique identifier for each row. |
| Clear ID | Identifier for texts based on the CLEAR corpus. This is not a unique identifier, as some texts were scored for multiple grades. |
| Grade | Grade level for which the text is scored. For example, if Grade = 3 and the sentence structure complexity score is Slightly Complex, then the text is Slightly Complex for a third-grade student (see overall project documentation for specific assumptions). |
| Flesch-Kincaid | Flesch-Kincaid Grade Level score for the text, taken from the CLEAR corpus. |
| Text | Text from the CLEAR corpus that was annotated. |
| Sentence Score | Overall annotator rating for sentence structure complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See the technical docs for additional details on these categories. |
| Sentence Score Rationale | Annotators’ explanations for their sentence structure score. |
| Vocabulary Score | Overall annotator rating for vocabulary complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See overall project documentation for additional details on these categories. |
| Vocabulary Score Rationale | Annotators’ explanations for their vocabulary score. |
| Tier 2 Words | Words that are commonly used in academic settings, are more complex than colloquial or everyday language, and often have multiple meanings. Example: “There are eight planets in the Solar System. From closest to farthest from the Sun, they are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.” Most planets (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune) are Tier 2 words for Grade 3. |
| Tier 3 Words | Words that are limited to a specific domain or that are so rare that an avid reader would likely not encounter them in a lifetime. Domain-specific example: enzyme. Rare, unconventional example: abecedarian. |
| Archaic Words | Words, or uses of words, that are not commonly used in modern conversational language. Example: “The jury retired to deliberate on their verdict.” Using “retire” to mean “withdrawing to a private place” is an archaic use. |
| Other Complex Words | Additional complex words for students of the grade level, as identified by annotators. |
| Background Knowledge Assumption | LLM-generated information on the background knowledge that students of a particular grade are likely to have about a topic. This information was provided to annotators during the scoring process. See the overall project documentation for detailed methodology on how this was generated. |
A value of Not Scored in a given column means that the text was not annotated for that column.
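When working with the score columns, it can help to turn the four tier labels into ordered integers and to treat Not Scored as missing. The sketch below shows one way to do that. The dictionary keys used for the rows and the exact spelling and casing of the labels in the released file are assumptions, so the helper normalizes case and whitespace before looking a label up.

```python
# Ordinal encoding of the four complexity tiers, lowest to highest.
TIER_ORDER = {
    "slightly complex": 1,
    "moderately complex": 2,
    "very complex": 3,
    "exceedingly complex": 4,
}

def encode_score(value):
    """Map a tier label to 1-4, or None when the text was not annotated."""
    label = value.strip().lower()
    if label == "not scored":
        return None
    return TIER_ORDER[label]  # raises KeyError on an unexpected label

# Hypothetical rows; the key names are placeholders for the real headers.
rows = [
    {"uid": "u1", "sentence_score": "Slightly Complex", "vocabulary_score": "Not Scored"},
    {"uid": "u2", "sentence_score": "very complex", "vocabulary_score": "moderately complex"},
]

encoded = [
    {
        "uid": row["uid"],
        "sentence": encode_score(row["sentence_score"]),
        "vocabulary": encode_score(row["vocabulary_score"]),
    }
    for row in rows
]

# Keep only rows that were annotated on both dimensions.
fully_scored = [r for r in encoded if None not in (r["sentence"], r["vocabulary"])]
```

Raising on an unknown label, rather than silently mapping it to missing, makes it easier to notice if a future release adds or renames a category.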

Limitations

  • Annotator coverage per item is limited; reported precision and agreement reflect this coverage.
    • Additional annotations in future updates are expected to improve statistical reliability by increasing confidence, reducing variance, and stabilizing borderline cases.
  • Flesch–Kincaid grade estimates (based on sentence and word length) are heuristic and do not capture qualitative factors (e.g., conceptual difficulty, vocabulary sophistication, thematic maturity).
    • They are recorded as metadata and may be referenced in the evaluator prompt as one of multiple signals.
    • They are not the sole determinant of evaluator labels.
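For reference, the standard Flesch-Kincaid Grade Level formula is shown below. It depends only on average sentence length and average syllables per word, which is why it cannot reflect the qualitative factors listed above. How words, sentences, and syllables were counted for the values in the CLEAR corpus is not specified here, so treat those counting details as an open assumption.

```latex
\mathrm{FKGL} = 0.39 \cdot \frac{\text{total words}}{\text{total sentences}}
              + 11.8 \cdot \frac{\text{total syllables}}{\text{total words}}
              - 15.59
```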