Early Release

This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.

Original SCASS qualitative sentence structure rubric

We started with the SCASS rubric and modified it to be more usable for annotation. Then, after data collection, we performed extensive statistical analysis to identify the key sentence features that impact annotators’ scores. This resulted in a machine-compatible rubric, which we used to build our final evaluator.
Language features
Slightly complex
  • Conventionality: Explicit, literal, straightforward, easy to understand.
  • Vocabulary: Contemporary, familiar, conversational language.
  • Sentence structure: Mainly simple sentences.
Moderately complex
  • Conventionality: Largely explicit and easy to understand, with some occasions for more complex meaning.
  • Vocabulary: Mostly contemporary, familiar, conversational; rarely overly academic.
  • Sentence structure: Primarily simple and compound sentences, with some complex structures.
Very complex
  • Conventionality: Fairly complex; contains some abstract, ironic, and/or figurative language.
  • Vocabulary: Fairly complex language that is sometimes unfamiliar, archaic, subject-specific, or overly academic.
  • Sentence structure: Many complex sentences with several subordinate phrases or clauses and transition words.
Exceedingly complex
  • Conventionality: Dense and complex; contains considerable abstract, ironic, and/or figurative language.
  • Vocabulary: Complex, generally unfamiliar, archaic, subject-specific, or overly academic language; may be ambiguous or purposefully misleading.
  • Sentence structure: Mainly complex sentences with several subordinate clauses or phrases and transition words; sentences often contain multiple concepts.
Taken from the SCASS rubric for qualitative text complexity from Student Achievement Partners (SAP).

Modified SCASS rubric with assumptions for annotation

With substantial input from experts, we created an adapted rubric with accompanying assumptions for annotators to use. Annotators provided their scores based on this rubric.

Modified rubric

  • Slightly complex: Mainly simple sentences; few sentences contain multiple concepts.
  • Moderately complex: Primarily simple and compound sentences, with some complex (or compound-complex) constructions; some sentences contain multiple concepts.
  • Very complex: Many complex (or compound-complex) constructions with several subordinate phrases or clauses and transition words; many sentences contain multiple concepts.
  • Exceedingly complex: Mainly complex (or compound-complex) constructions with several subordinate clauses, phrases, and transition words; most sentences contain multiple concepts.
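The four sentence types named in the modified rubric follow the standard grammatical definitions based on clause counts. The minimal sketch below encodes those definitions and summarizes a passage's mix of types. It assumes clause counts are already available (for example, from a syntactic parser or a human annotator); the function names are hypothetical and not part of the evaluator.

```python
from collections import Counter

SENTENCE_TYPES = ("simple", "compound", "complex", "compound-complex")


def sentence_type(independent_clauses: int, dependent_clauses: int) -> str:
    """Classify a sentence by its clause counts (standard grammatical definitions)."""
    if independent_clauses < 1:
        raise ValueError("a sentence needs at least one independent clause")
    if dependent_clauses == 0:
        # No subordination: one independent clause is simple, two or more is compound.
        return "simple" if independent_clauses == 1 else "compound"
    # With subordination: one independent clause is complex, two or more is compound-complex.
    return "complex" if independent_clauses == 1 else "compound-complex"


def type_distribution(clause_counts):
    """Percentage of sentences of each type, given (independent, dependent) pairs."""
    counts = Counter(sentence_type(i, d) for i, d in clause_counts)
    total = len(clause_counts)
    return {t: 100 * counts[t] / total for t in SENTENCE_TYPES}
```

For example, a passage with one sentence of each type yields 25% for every category.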

Annotation assumptions

  • Student assumption: The student is on grade level and proficient in all core content areas, including reading fluency, comprehension, science, and social studies. The student is moving through a common progression of topics, is fluent in English, and is in the middle of the academic year.
  • Text assumption: The text is for independent reading/work, without direct instruction.
  • What to score vs. ignore: When scoring sentence structure complexity, please ignore all other text features, such as vocabulary, background knowledge, and topic. A text may be less readable for a student because of its vocabulary and background knowledge but still consist mostly of simple sentences. In this case, its sentence structure is still Slightly Complex.
  • Sentence structure complexity is relative: Sentence structure complexity is judged relative to the student's grade level.

Human annotation

We created a reliable benchmark dataset of 500+ text passages from the CLEAR corpus, each annotated for sentence structure complexity. The dataset consists of informational texts with a Flesch–Kincaid grade level below 9. This allowed us to capture texts appropriate for students in grades 3 and 4, while also including a few more difficult texts rated “Very Complex” or “Exceedingly Complex.” The dataset is composed of two parts:
  • Expert-annotated data
    • ~80 texts (160 rows of data) were annotated by at least two pedagogical experts from SAP and ANET.
    • If the two experts disagreed, a third expert also scored the text.
    • In total, we worked with eight pedagogical experts from SAP and ANET, all of whom had prior experience in literacy or curriculum development.
  • Educator-annotated data
    • ~400 texts (800 rows of data) were annotated by three educators who had passed a pre-test.
    • Educators were also given some “honeypot” texts to score. These are texts on which at least two experts agreed, and we used them to track each educator’s agreement with the experts.
    • We used the Dawid-Skene model to combine the educators’ scores into a final score for each text, accounting for each educator’s estimated reliability.
    • In total, we worked with 21 educators.
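For readers unfamiliar with Dawid-Skene, the sketch below shows the standard expectation-maximization (EM) procedure in plain Python: it starts from a majority vote, then alternates between estimating each annotator's confusion matrix (how often they give label l when the true label is k) and re-estimating each text's label. This is a generic illustration, not the configuration used for this dataset; the function name, the smoothing constant, the iteration count, and the encoding of the four complexity levels as 0 to 3 are all assumptions.

```python
import math
from collections import defaultdict


def dawid_skene(annotations, n_classes, n_iter=50, smoothing=0.01):
    """Estimate true labels from (item_id, annotator_id, label) triples via EM."""
    by_item = defaultdict(list)
    for item, annotator, label in annotations:
        by_item[item].append((annotator, label))
    items = sorted(by_item)
    annotators = sorted({a for _, a, _ in annotations})

    # Initialise per-item class posteriors with a soft majority vote.
    post = {}
    for i in items:
        counts = [0.0] * n_classes
        for _, label in by_item[i]:
            counts[label] += 1
        post[i] = [c / sum(counts) for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and smoothed per-annotator confusion matrices.
        prior = [sum(post[i][k] for i in items) / len(items) for k in range(n_classes)]
        conf = {a: [[smoothing] * n_classes for _ in range(n_classes)] for a in annotators}
        for i in items:
            for a, label in by_item[i]:
                for k in range(n_classes):
                    conf[a][k][label] += post[i][k]
        for a in annotators:
            for k in range(n_classes):
                row_sum = sum(conf[a][k])
                conf[a][k] = [v / row_sum for v in conf[a][k]]

        # E-step: recompute posteriors in log space for numerical stability.
        for i in items:
            logp = [math.log(max(prior[k], 1e-12)) for k in range(n_classes)]
            for a, label in by_item[i]:
                for k in range(n_classes):
                    logp[k] += math.log(conf[a][k][label])
            top = max(logp)
            weights = [math.exp(v - top) for v in logp]
            z = sum(weights)
            post[i] = [w / z for w in weights]

    labels = {i: max(range(n_classes), key=lambda k: post[i][k]) for i in items}
    return labels, post
```

Unlike a plain majority vote, this down-weights annotators whose estimated confusion matrices show frequent disagreement with the consensus.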

Machine-compatible rubric development

After we finalized the benchmark dataset, we conducted extensive data analysis:
  • We calculated an F-statistic for each of 30+ sentence features, which allowed us to identify the features most strongly associated with a text’s sentence structure complexity.
  • We used tree-based models to identify the thresholds that made a text fall within a particular category (e.g., average words per sentence < 12 for “Slightly Complex”).
This analysis led to the development of our machine-compatible rubric.
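The two analysis steps above can be sketched in plain Python. This is a minimal illustration under two assumptions: the F-statistic is taken to be the one-way ANOVA F of a feature across the complexity labels, and the tree-based thresholds are approximated by a single Gini-impurity split (a decision stump) on one feature. The function names are hypothetical; scikit-learn's `f_classif` and `DecisionTreeClassifier` provide fuller versions of the same ideas.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic for one numeric feature across label groups."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    # Assumes at least two groups and non-zero within-group variance.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def gini(flags):
    """Gini impurity of a list of booleans (True = target category)."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return 2 * p * (1 - p)


def best_threshold(values, is_target):
    """Decision stump: the cut point on one feature with the lowest weighted Gini."""
    pairs = sorted(zip(values, is_target))
    best_impurity, best_cut = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [flag for _, flag in pairs[:i]]
        right = [flag for _, flag in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if impurity < best_impurity:
            best_impurity = impurity
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
    return best_cut
```

A feature with a large F separates the categories well; the stump then suggests where to draw a cut such as "average words per sentence < 12".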
Machine-compatible rubric

Grade 3
  • Slightly complex: The text consists of simple, straightforward language and sentence structures. The text is likely slightly complex if it meets at least two of the following criteria:
    • Sentence type: Primarily simple sentences (typically >60% simple sentences).
    • Sentence length: Short sentences (typically <12 average words per sentence).
    • Subordination: Very low use of subordinate clauses (typically <25% of sentences have subordinate clauses).
  • Moderately complex: The text shows a mix of simple and more complex sentences, introducing some variety in structure without being overly demanding. If the text is not slightly complex, consider whether it is moderately complex based on the following ranges:
    • Sentence type: Balanced mix of sentence types (typically 40% to 60% simple sentences).
    • Sentence length: Medium-length sentences (typically 12 to 16 average words per sentence).
    • Subordination: Moderate use of subordinate clauses (typically 25% to 45% of sentences have subordinate clauses).
  • Very complex: The text features more elaborate sentences with multiple clauses and ideas, requiring more effort from the reader to parse. Consider whether the text is very complex based on the following ranges:
    • Sentence type: Most sentences are complex (<40% of sentences are simple sentences).
    • Sentence length: Longer sentences (typically 16 to 19 average words per sentence).
    • Subordination: High use of subordinate clauses (typically >45% of sentences have subordinate clauses).
  • Exceedingly complex: The text is dense with very long, intricate sentences and a high degree of subordination, making it exceptionally challenging for this grade level. The text is likely exceedingly complex if it meets at least two of the following criteria, including at least one from the structural density group:
    • Structural density:
      • Subordination: >50% of sentences have subordinate clauses.
      • Multiple subordination: >12% of sentences have more than one subordinate clause.
      • Syntactic complexity: >15% of sentences are compound-complex.
    • Length:
      • Sentence length: Very long sentences (typically >19 average words per sentence).
      • High concentration of very long sentences: >15% of sentences have 30 or more words.

Grade 4
  • Slightly complex: The text consists of simple, straightforward language and sentence structures. The text is likely slightly complex if it meets at least two of the following criteria:
    • Sentence type: Primarily simple sentences (typically >55% simple sentences).
    • Sentence length: Short to medium sentences (typically <13 average words per sentence).
    • Subordination: Very low use of subordinate clauses (typically <30% of sentences have subordinate clauses).
  • Moderately complex: The text shows a mix of simple and more complex sentences, introducing some variety in structure without being overly demanding. If the text is not slightly complex, consider whether it is moderately complex based on the following ranges:
    • Sentence type: Balanced mix of sentence types (typically 40% to 55% simple sentences).
    • Sentence length: Medium-length sentences (typically 13 to 17 average words per sentence).
    • Subordination: Moderate use of subordinate clauses (typically 30% to 50% of sentences have subordinate clauses).
  • Very complex: The text features more elaborate sentences with multiple clauses and ideas, requiring more effort from the reader to parse. Consider whether the text is very complex based on the following ranges:
    • Sentence type: Most sentences are complex (<40% of sentences are simple sentences).
    • Sentence length: Longer sentences (typically 17 to 22 average words per sentence).
    • Subordination: High use of subordinate clauses (typically >50% of sentences have subordinate clauses).
    • Multiple subordination: >8% of sentences have more than one subordinate clause.
  • Exceedingly complex: The text is dense with very long, intricate sentences and a high degree of subordination, making it exceptionally challenging for this grade level. The text is likely exceedingly complex if it meets at least two of the following criteria, including at least one from the structural density group:
    • Structural density:
      • Subordination: >60% of sentences have subordinate clauses.
      • Multiple subordination: >15% of sentences have more than one subordinate clause.
      • Syntactic complexity: >20% of sentences are compound-complex.
    • Length:
      • Sentence length: Very long sentences (typically >22 average words per sentence).
      • High concentration of very long sentences: >15% of sentences have 30 or more words.
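To show how the Grade 3 and Grade 4 criteria could be applied mechanically, here is a hypothetical rule-based reading of the rubric in Python. It is not the evaluator itself, which is prompt-based, and several choices are assumptions the rubric leaves open: per-sentence word and clause counts are given; the "at least two criteria" rule is also applied to the moderately complex ranges; ranges are inclusive; tiers are checked from exceedingly complex first, then slightly, then moderately; and any text matching none of those is treated as very complex.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    words: int        # word count
    independent: int  # number of independent clauses
    dependent: int    # number of subordinate (dependent) clauses


# Cut-offs transcribed from the rubric (percent of sentences, or average words per sentence).
RUBRIC = {
    3: {"slight": (60, 12, 25), "moderate": ((40, 60), (12, 16), (25, 45)),
        "dense": (50, 12, 15), "very_long_avg": 19},
    4: {"slight": (55, 13, 30), "moderate": ((40, 55), (13, 17), (30, 50)),
        "dense": (60, 15, 20), "very_long_avg": 22},
}
LONG_SENTENCE_SHARE = 15  # both grades: >15% of sentences have 30 or more words


def features(sentences):
    n = len(sentences)

    def pct(pred):
        return 100 * sum(1 for s in sentences if pred(s)) / n

    return {
        "simple": pct(lambda s: s.independent == 1 and s.dependent == 0),
        "avg_words": sum(s.words for s in sentences) / n,
        "subordinate": pct(lambda s: s.dependent >= 1),
        "multi_subordinate": pct(lambda s: s.dependent > 1),
        "compound_complex": pct(lambda s: s.independent >= 2 and s.dependent >= 1),
        "long": pct(lambda s: s.words >= 30),
    }


def classify(sentences, grade):
    f, r = features(sentences), RUBRIC[grade]
    sub_cut, multi_cut, cc_cut = r["dense"]
    density = [f["subordinate"] > sub_cut,
               f["multi_subordinate"] > multi_cut,
               f["compound_complex"] > cc_cut]
    length = [f["avg_words"] > r["very_long_avg"], f["long"] > LONG_SENTENCE_SHARE]
    # Exceedingly complex: at least two criteria, at least one from structural density.
    if any(density) and sum(density) + sum(length) >= 2:
        return "Exceedingly complex"
    simple_cut, length_cut, sub_low = r["slight"]
    slight = [f["simple"] > simple_cut, f["avg_words"] < length_cut,
              f["subordinate"] < sub_low]
    if sum(slight) >= 2:
        return "Slightly complex"
    # Assumption: "at least two of three" also applies to the moderate ranges.
    observed = (f["simple"], f["avg_words"], f["subordinate"])
    if sum(lo <= x <= hi for x, (lo, hi) in zip(observed, r["moderate"])) >= 2:
        return "Moderately complex"
    # Assumption: anything left over is treated as very complex.
    return "Very complex"
```

A rule table like this is easy to audit, but it ignores the "multiple concepts" dimension of the annotation rubric, which is one reason a prompt-based evaluator may still be preferable.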

Evaluator creation

Finally, we developed the Sentence Structure Evaluator. We tested different combinations of prompts, models, and temperature settings, looked for patterns in the errors, and tuned our approach accordingly. In total, we ran over 150 experiments across 4 models, more than 20 prompts, and a range of temperature settings to define this evaluator.
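The experiment sweep described above can be pictured as a grid search. The sketch below is an illustration under stated assumptions, not the team's actual harness: `evaluate` is a hypothetical callable standing in for a call to the evaluator with a given prompt, model, and temperature; exact-match accuracy against the benchmark labels is an assumed metric; and every mismatch is kept so that error patterns can be inspected.

```python
from itertools import product


def sweep(evaluate, prompts, models, temperatures, benchmark):
    """Score every (prompt, model, temperature) combination against gold labels.

    evaluate:  hypothetical callable(text, prompt, model, temperature) -> label
    benchmark: list of (text, gold_label) pairs
    """
    results = []
    for prompt, model, temperature in product(prompts, models, temperatures):
        predictions = [evaluate(text, prompt, model, temperature) for text, _ in benchmark]
        # Keep each mismatch as (text, gold, predicted) for error analysis.
        errors = [
            (text, gold, pred)
            for (text, gold), pred in zip(benchmark, predictions)
            if pred != gold
        ]
        results.append({
            "prompt": prompt,
            "model": model,
            "temperature": temperature,
            "accuracy": 1 - len(errors) / len(benchmark),
            "errors": errors,
        })
    # Best-scoring configurations first.
    return sorted(results, key=lambda r: r["accuracy"], reverse=True)
```

Because the labels are ordinal, a real sweep might also track how far off each wrong prediction is, not just whether it matches.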