Early Release
This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.
About the dataset
This dataset provides text complexity annotations of the CLEAR corpus by literacy experts and qualified educators. It is the benchmark data our Evaluators use to assess literacy levels in AI-generated text. We are sharing it as a new resource for the learning science community to help fill the need for more high-quality text complexity datasets and to complement existing work in this area.

The CLEAR (CommonLit Ease of Readability) Corpus was produced by CommonLit in collaboration with Georgia State University and released in December 2021. It comprises nearly 5,000 publicly available excerpts, each mapped against dimensions including Flesch-Kincaid Grade Level and BT Easiness (a Bradley-Terry coefficient based on teacher ratings of the texts).

Learning Commons is expanding the dataset by scoring a subset of rows for text complexity dimensions found in the SCASS Rubric for Informational Text from Student Achievement Partners. Our initial release in September 2025 focuses on Grades 3 and 4 across sentence structure and vocabulary, but we plan to expand to all grades and all dimensions of text complexity assessed through the SCASS rubric. Thanks to Student Achievement Partners and Achievement Network for their contributions in helping us assemble this annotated data.
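Flesch-Kincaid Grade Level is a surface-level readability measure based only on sentence length and word length in syllables. As context for how the corpus was filtered, here is a minimal, illustrative sketch in Python; the vowel-group syllable counter and the grade band in the usage comment are simplifying assumptions and will not exactly reproduce the scores shipped with the CLEAR corpus.

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable count: contiguous vowel groups (approximation only)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level:
    0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / max(1, len(sentences)))
            + 11.8 * (syllables / max(1, len(words)))
            - 15.59)

# Illustrative usage: keep excerpts in an approximate grade 3-4 band
# (the 2.5-4.5 thresholds are placeholders, not our exact cutoffs).
# grade_3_4 = [t for t in excerpts if 2.5 <= flesch_kincaid_grade(t) <= 4.5]
```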
Creating the annotated data

Our process for producing annotated data is as follows:

- Filter the CLEAR corpus to an approximate grade 3-4 range using Flesch-Kincaid Grade Level.
- Partner with literacy experts from SAP (Student Achievement Partners) and ANet (Achievement Network) to score against text complexity dimensions on the SCASS rubric.
- With SAP and ANet, establish a gold set of ~80 examples per grade with representation across the four tiers of text complexity: slightly, moderately, very, and exceedingly complex.
- Use the gold set to test and qualify a cohort of educators with a minimum of two years of experience teaching ELA at the corresponding grade level.
- Produce a minimum of 200 rows (50 per complexity tier), calibrating annotator scores using the Dawid-Skene method; a minimal sketch of this aggregation step follows this list.
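Dawid-Skene aggregates labels from multiple annotators with expectation-maximization: it alternates between estimating a confusion matrix for each annotator and a posterior distribution over the true tier for each text. The sketch below is a minimal illustration of that calibration step, not our production pipeline; the input format and the four-tier class encoding are assumptions.

```python
import numpy as np

TIERS = ["slightly", "moderately", "very", "exceedingly"]  # class indices 0-3

def dawid_skene(labels, n_classes=4, n_iter=50, eps=1e-8):
    """labels: dict mapping (item_id, annotator_id) -> class index."""
    items = sorted({i for i, _ in labels})
    annotators = sorted({a for _, a in labels})
    item_idx = {i: k for k, i in enumerate(items)}
    ann_idx = {a: k for k, a in enumerate(annotators)}

    # counts[i, a, o] = 1 if annotator a labeled item i with class o
    counts = np.zeros((len(items), len(annotators), n_classes))
    for (i, a), c in labels.items():
        counts[item_idx[i], ann_idx[a], c] = 1

    # Initialize the posterior over true classes with per-item label frequencies.
    T = counts.sum(axis=1)
    T /= T.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per annotator.
        priors = T.mean(axis=0)
        confusion = np.einsum("ic,iao->cao", T, counts).transpose(1, 0, 2)
        confusion /= confusion.sum(axis=2, keepdims=True) + eps

        # E-step: recompute the posterior over true classes for each item.
        log_T = np.log(priors + eps) + np.einsum(
            "iao,aco->ic", counts, np.log(confusion + eps))
        log_T -= log_T.max(axis=1, keepdims=True)
        T = np.exp(log_T)
        T /= T.sum(axis=1, keepdims=True)

    return items, T  # posterior probability of each tier for each item
```

Each text's calibrated tier is then the arg max of its posterior row.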
Our annotation process
Columns in our annotated dataset
The following columns can be found in our dataset. Note that these columns refer to the annotated dataset as of September 23, 2025; this list will be updated as additional dimensions of text complexity are incorporated.

- UID: Unique identifier for each row.
- Clear ID: Identifier for texts based on the CLEAR corpus. This is not a unique identifier, as some texts were scored for multiple grades.
- Grade: Grade level for which the text is scored. For example, if Grade = 3 and the Sentence Score is Slightly Complex, then the text is Slightly Complex for a third-grade student (see overall project documentation for specific assumptions).
- Flesch Kincaid: Flesch-Kincaid Grade Level score for the text, as provided in the CLEAR corpus.
- Text: Text from the CLEAR corpus that was annotated.
- Sentence Score: Overall annotator rating for sentence structure complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See the technical docs for additional details on these categories.
- Sentence Score Rationale: Annotators’ explanations for their sentence structure score.
- Vocabulary Score: Overall annotator rating for vocabulary complexity. Takes the values slightly complex, moderately complex, very complex, and exceedingly complex. See overall project documentation for additional details on these categories.
- Vocabulary Score Rationale: Annotators’ explanations for their vocabulary score.
- Tier 2 Words: Tier 2 words identified by annotators.
- Tier 3 Words: Tier 3 words identified by annotators.
- Archaic Words: Archaic words identified by annotators.
- Other Complex Words: Additional complex words for students of the grade level, as identified by annotators.
- Background Knowledge Assumption: LLM-generated information on the background knowledge that students of a particular grade are likely to have about a topic. Information was provided to annotators as part of the scoring process. See the overall project documentation for detailed methodology on how this was generated.
Missing data code
- Not Scored: Text was not annotated for this column.
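As a minimal sketch of working with this schema (the file name is a placeholder, and the column headers assume the September 23, 2025 release as listed above), the snippet below loads the dataset, treats Not Scored as missing, and pulls the Grade 3 rows that have a sentence structure annotation.

```python
import pandas as pd

# File name is a placeholder for wherever you store the annotated CSV.
df = pd.read_csv("clear_text_complexity_annotations.csv",
                 na_values=["Not Scored"])

# Rows with a sentence structure annotation for Grade 3.
grade3 = df[(df["Grade"] == 3) & df["Sentence Score"].notna()]

# Distribution across the four complexity tiers.
print(grade3["Sentence Score"].value_counts())
```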
Definitions
Term | Meaning | Example |
---|---|---|
Tier 2 Words | Words that are commonly used in academic settings, are more complex than colloquial (everyday) language, and often have multiple meanings | For Grade 3 text: “There are eight planets in the Solar System. From closest to farthest from the Sun, they are: Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune.” Most planet names (Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune) are tier 2 words. |
Tier 3 Words | Words that are limited to a specific domain or that are so rare that an avid reader would likely not encounter them in a lifetime. | Domain-specific example: enzyme Rare unconventional example: abecedarian |
Archaic Words | Words, or uses of words, that are not commonly used in modern conversational language. | “The jury retired to deliberate on their verdict.” The use of “retire” to mean withdrawing to a private place is an archaic use. |