Language features

| | Conventionality | Vocabulary | Sentence structure |
|---|---|---|---|
| Slightly complex | Explicit, literal, straightforward, easy to understand. | Contemporary, familiar, conversational language. | Mainly simple sentences. |
| Moderately complex | Largely explicit and easy to understand, with some occasions for more complex meaning. | Mostly contemporary, familiar, conversational; rarely overly academic. | Primarily simple and compound sentences, with some complex structures. |
| Very complex | Fairly complex; contains some abstract, ironic, and/or figurative language. | Fairly complex language that is sometimes unfamiliar, archaic, subject-specific, or overly academic. | Many complex sentences with several subordinate phrases or clauses and transition words. |
| Exceedingly complex | Dense and complex; contains considerable abstract, ironic, and/or figurative language. | Complex, generally unfamiliar, archaic, subject-specific, or overly academic language; may be ambiguous or purposefully misleading. | Mainly complex sentences with several subordinate clauses or phrases and transition words; sentences often contain multiple concepts. |
Creating a machine-readable rubric
The SCASS rubric provides a great starting point; however, the definitions of each dimension leave room for interpretation because of words like "fairly," "mostly," and "overly." In addition, educators consider several things when they use the rubric that aren't explicitly specified in it. For example, educators know that if the content of a text helps a student understand a vocabulary word, that reduces some of the complexity of an unfamiliar word. Through several rounds of co-design with 6 experts from Student Achievement Partners (SAP) and Achievement Network (ANET) who support student literacy development, we created a framing for each level of complexity that works for both human and machine evaluators. This framing allows the evaluator to work consistently and accurately.

| | 1 (Slightly complex) | 2 (Moderately complex) | 3 (Very complex) | 4 (Exceedingly complex) |
|---|---|---|---|---|
| Rubric | Vocabulary that is almost entirely not complex: contemporary, conversational, and/or familiar. A very low proportion of complex words (archaic, subject-specific, academic) is OK, i.e., it doesn't need to be 0. | Vocabulary that is mostly not complex: contemporary, conversational, and/or familiar. A low proportion of complex words (archaic, subject-specific, academic) is OK. | Vocabulary that is often complex: unfamiliar, archaic, subject-specific, and/or overly academic. | Vocabulary that is mostly complex: unfamiliar, archaic, subject-specific, and/or overly academic. May be ambiguous or purposefully misleading. |
| Summary | Overall, vocabulary is easy to understand and does not impede comprehension of the bulk of the text (including the main idea and supporting claims). | Overall, vocabulary generally allows students to comprehend the bulk of the text with little difficulty, though there may be occasional pauses for clarification. | Overall, vocabulary often presents challenges that may slow down comprehension but does not completely block comprehension of the bulk of the text. | Overall, vocabulary is so complex that it makes comprehension of the bulk of the text very challenging and requires careful effort to interpret. |
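To make this framing concrete, here is a minimal sketch of how the vocabulary levels could be encoded for an automated (e.g., LLM-based) evaluator. The dictionary structure, function name, and prompt wording are illustrative assumptions, not the exact rubric text or grader we used.

```python
# A machine-readable encoding of the four vocabulary complexity levels.
# The wording here is abbreviated and illustrative.
VOCABULARY_RUBRIC = {
    1: "Slightly complex: vocabulary is almost entirely contemporary, conversational, "
       "and/or familiar; a very low proportion of complex words is acceptable.",
    2: "Moderately complex: vocabulary is mostly contemporary, conversational, and/or "
       "familiar; a low proportion of complex words is acceptable.",
    3: "Very complex: vocabulary is often unfamiliar, archaic, subject-specific, "
       "and/or overly academic.",
    4: "Exceedingly complex: vocabulary is mostly unfamiliar, archaic, subject-specific, "
       "and/or overly academic; may be ambiguous or purposefully misleading.",
}


def build_eval_prompt(passage: str) -> str:
    """Assemble a grading prompt that asks the evaluator for a single 1-4 score."""
    levels = "\n".join(f"{score}: {desc}" for score, desc in VOCABULARY_RUBRIC.items())
    return (
        "Rate the vocabulary complexity of the passage on a 1-4 scale.\n"
        f"{levels}\n\n"
        f"Passage:\n{passage}\n\n"
        "Answer with a single digit from 1 to 4."
    )
```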
Human annotation
We then created a full, reliable benchmark dataset based on more than 580 text passages from the CLEAR corpus, which were in turn annotated for sentence structure. The dataset consists of informational texts with a Flesch–Kincaid grade level below 9. This allowed us to capture texts that are approximately appropriate for students in grades 3 and 4, while also including a few more difficult texts with "Very Complex" and "Exceedingly Complex" ratings (a sketch of this readability filter appears after the list below). The dataset is composed of two parts:

- Expert-annotated data:
  - ~180 rows were annotated by at least two pedagogical experts from SAP and ANET.
  - If the two experts provided different scores, a third expert also provided a score.
  - In total, we worked with eight pedagogical experts from SAP and ANET, all of whom had prior experience in literacy or curriculum development.
- Educator-annotated data:
  - ~400 texts were annotated by educators who had passed a pre-test.
  - Each text was annotated independently by three educators.
  - Educators were also given some "honeypot" texts to score: texts for which two or more experts agreed on the score. We used these texts to track each educator's agreement with the experts.
  - We used the Dawid-Skene model to calibrate the final scores from the educator annotations (a minimal aggregation sketch appears after this list).
  - In total, we worked with 21 educators.
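To make the readability cutoff concrete, here is a minimal sketch of the filtering step, assuming the CLEAR passages are exported to a CSV with an excerpt column; the file name, column name, and the use of the textstat package are illustrative assumptions rather than our exact pipeline.

```python
# pip install textstat pandas
import pandas as pd
import textstat

# Hypothetical export of the CLEAR corpus; path and column names are illustrative.
corpus = pd.read_csv("clear_corpus.csv")

# Flesch-Kincaid grade level:
#   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
corpus["fk_grade"] = corpus["Excerpt"].apply(textstat.flesch_kincaid_grade)

# Keep passages below grade 9: roughly right for grade 3-4 readers,
# while still admitting some harder texts for the upper rubric levels.
candidates = corpus[corpus["fk_grade"] < 9]
print(f"{len(candidates)} of {len(corpus)} passages retained")
```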
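For the score calibration itself, below is a minimal sketch using the Dawid-Skene implementation in the crowd-kit library; the DataFrame layout and identifiers are illustrative assumptions, not our production tooling.

```python
# pip install crowd-kit pandas
import pandas as pd
from crowdkit.aggregation import DawidSkene

# One row per (text, educator) pair; labels are the 1-4 complexity scores.
annotations = pd.DataFrame(
    [
        {"task": "text_001", "worker": "educator_07", "label": 2},
        {"task": "text_001", "worker": "educator_12", "label": 3},
        {"task": "text_001", "worker": "educator_19", "label": 2},
        # ... one row for each of the ~400 texts x 3 educators
    ]
)

# Dawid-Skene jointly estimates each educator's reliability (a per-rater
# confusion matrix) and the most likely score for every text, down-weighting
# raters whose labels are less consistent with the inferred true scores.
final_scores = DawidSkene(n_iter=100).fit_predict(annotations)
print(final_scores.head())  # pandas Series of aggregated scores, indexed by task
```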