| Language features | Conventionality | Vocabulary | Sentence structure |
|---|---|---|---|
| Slightly complex | Explicit, literal, straightforward, easy to understand. | Contemporary, familiar, conversational language. | Mainly simple sentences. |
| Moderately complex | Largely explicit and easy to understand, with some occasions for more complex meaning. | Mostly contemporary, familiar, conversational; rarely overly academic. | Primarily simple and compound sentences, with some complex structures. |
| Very complex | Fairly complex; contains some abstract, ironic, and/or figurative language. | Fairly complex language that is sometimes unfamiliar, archaic, subject-specific, or overly academic. | Many complex sentences with several subordinate phrases or clauses and transition words. |
| Exceedingly complex | Dense and complex; contains considerable abstract, ironic, and/or figurative language. | Complex, generally unfamiliar, archaic, subject-specific, or overly academic language; may be ambiguous or purposefully misleading. | Mainly complex sentences with several subordinate clauses or phrases and transition words; sentences often contain multiple concepts. |
Creating a machine-readable rubric
The Qualitative Text Complexity rubric (SAP) provides a great starting point; however, the definitions of each dimension leave room for interpretation because of qualifiers such as "fairly," "mostly," and "overly." In addition, educators weigh several considerations when they use the rubric that the original rubric does not explicitly specify. For example, educators know that if the content of a text helps a student understand a vocabulary word, that unfamiliar word contributes less to the text's complexity. Through several rounds of co-design with 6 experts from Student Achievement Partners (SAP) ↗ and Achievement Network (ANET) ↗ who support student literacy development, we created a framing for each level of complexity that is more compatible with both human and machine evaluators. This framing allows the evaluator to work consistently and accurately.

Adapted from the Qualitative Text Complexity rubric ↗ from Student Achievement Partners (SAP)

Expert annotation
We then created a full, reliable benchmark dataset based on more than 580 text passages from the CLEAR corpus ↗, which were in turn annotated for sentence structure. The dataset consists of informational topics with a Flesch–Kincaid grade level lower than 9. This allowed us to capture texts approximately appropriate for students in grades 3 and 4, while also including a few more difficult texts with "Very Complex" and "Exceedingly Complex" ratings. The dataset is composed of two parts:

- Expert-annotated data:
- ~180 rows were annotated by at least two pedagogical experts from SAP and ANET.
- If the two experts provided different scores, a third expert would also provide a score.
- In total, we worked with eight pedagogical experts from SAP and ANET, all of whom had prior experience in literacy or curriculum development.
- Educator-annotated data:
- ~400 texts were annotated by educators who had passed a pre-test.
- Each text was scored independently by three educators.
- Educators were also given some "honeypot" texts to score: texts on which two or more experts had agreed on the score. We used these texts to track each educator's agreement with the experts.
- We used the Dawid–Skene model, which estimates each annotator's reliability, to aggregate the educators' scores into a calibrated final score for each text.
- We worked with 21 educators.
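The Flesch–Kincaid screen used to select passages can be sketched as follows. This is a minimal illustration, not the tool we actually used; in particular, the syllable counter is a naive vowel-group heuristic rather than a real syllabifier:

```python
import re

# Standard Flesch-Kincaid grade-level formula:
# 0.39 * (words per sentence) + 11.8 * (syllables per word) - 15.59

def count_syllables(word):
    # Naive heuristic: count runs of vowels; good enough for a sketch.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)

# Keep only passages below grade 9, as in the dataset construction.
def below_grade_9(text):
    return fk_grade(text) < 9
```

A short sentence of one-syllable words scores well below grade 9, while dense polysyllabic prose scores far above it, which is the behavior the filter relies on.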
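The Dawid–Skene aggregation step can be sketched as below. This is a minimal EM implementation for illustration only, not our production pipeline, and the data shapes (a dict of per-text, per-educator scores) are assumptions:

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """Aggregate noisy annotations with the Dawid-Skene EM model.

    labels: {item_id: {annotator_id: class_index}}
    Returns (item_ids, posterior), where posterior[i, k] is the
    estimated probability that item i has true class k.
    """
    items = sorted(labels)
    annotators = sorted({a for d in labels.values() for a in d})
    I, A, K = len(items), len(annotators), n_classes
    L = -np.ones((I, A), dtype=int)  # observed label, -1 = missing
    for i, it in enumerate(items):
        for a, ann in enumerate(annotators):
            if ann in labels[it]:
                L[i, a] = labels[it][ann]
    # Initialize posteriors with per-item vote proportions.
    T = np.zeros((I, K))
    for i in range(I):
        for a in range(A):
            if L[i, a] >= 0:
                T[i, L[i, a]] += 1
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class priors and per-annotator confusion matrices.
        p = T.mean(axis=0)
        pi = np.full((A, K, K), 1e-6)  # smoothing avoids log(0)
        for a in range(A):
            for i in range(I):
                if L[i, a] >= 0:
                    pi[a, :, L[i, a]] += T[i]
        pi /= pi.sum(axis=2, keepdims=True)
        # E-step: recompute label posteriors given estimated reliability.
        log_t = np.tile(np.log(p), (I, 1))
        for i in range(I):
            for a in range(A):
                if L[i, a] >= 0:
                    log_t[i] += np.log(pi[a, :, L[i, a]])
        T = np.exp(log_t - log_t.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return items, T

# Toy usage: three texts, three hypothetical educators, binary scores.
votes = {"t1": {"e1": 0, "e2": 0, "e3": 1},
         "t2": {"e1": 1, "e2": 1, "e3": 1},
         "t3": {"e1": 0, "e2": 0, "e3": 0}}
ids, post = dawid_skene(votes, n_classes=2)
final = {t: int(post[i].argmax()) for i, t in enumerate(ids)}
```

Unlike plain majority vote, the model down-weights annotators whose confusion matrices show frequent disagreement with the inferred true labels, which is what makes it useful for calibrating educators against the expert-agreed honeypots.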