Evaluator last updated September 23, 2025.

At a glance

Input type: Informational text
Supported grades: 3–12
This evaluator gives developers fine-grained vocabulary insights that help ensure texts use words that align with grade-level expectations and support growth in academic language. It:
  • Estimates the background knowledge a student at the target grade level is likely to have.
  • Identifies complex words in the text (Tier 2, Tier 3, archaic, and other complex words).
  • Evaluates overall vocabulary complexity relative to that background knowledge estimate.

Model and prompt

For instructions on running the evaluator, see Quickstart. This evaluator runs as a two-step process. Each step uses a different model.
Step 1: Background knowledge
Model used: GPT-4o
Temperature: 0
Step 2: Vocabulary complexity
Model used (Grades 3–4): Gemini-2.5-pro
Model used (Grades 5–12): GPT-4.1
Temperature: 0
Prompts: View prompts ↗
Notebook: View notebook ↗
Other configurations will produce different results and may have lower accuracy.
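The grade-dependent model choice described above can be captured in a small lookup. This is a minimal sketch, not the evaluator's actual code: the `StepConfig` type, the `select_models` function, the dictionary keys, and the lowercase model identifier strings are all assumptions made for illustration. The linked notebook is the authority on the exact identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepConfig:
    """Model settings for one evaluator step (hypothetical type)."""
    model: str
    temperature: float


def select_models(grade: int) -> dict[str, StepConfig]:
    """Return the documented model configuration for a target grade."""
    if not 3 <= grade <= 12:
        raise ValueError("Supported grades are 3-12")
    # Step 2 uses one model for Grades 3-4 and another for Grades 5-12.
    # The identifier strings are assumed spellings, not confirmed API names.
    step2_model = "gemini-2.5-pro" if grade <= 4 else "gpt-4.1"
    return {
        "background_knowledge": StepConfig(model="gpt-4o", temperature=0),
        "vocabulary_complexity": StepConfig(model=step2_model, temperature=0),
    }
```

Keeping both steps in one lookup makes it harder to drift from the documented configuration by accident.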

Inputs

Requirement | Supported | Required
Target grade level | Enables grade context evaluation | Yes
Text type | Informational text (optimal length: 130–205 words) | Yes
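Before calling the evaluator, it may help to check inputs against the ranges listed above. The sketch below is illustrative only: `check_inputs` is a hypothetical helper, and counting words by splitting on whitespace is an assumption, since the page does not say how the evaluator measures length.

```python
def check_inputs(text: str, grade: int) -> list[str]:
    """Return warnings for inputs outside the documented ranges."""
    warnings = []
    if not 3 <= grade <= 12:
        warnings.append(f"grade {grade} is outside the supported range 3-12")
    # Whitespace-delimited word count: an approximation, because the
    # evaluator's own counting method is not documented.
    n_words = len(text.split())
    if not 130 <= n_words <= 205:
        warnings.append(f"text has {n_words} words; optimal length is 130-205")
    return warnings
```

The length check produces a warning rather than an error because 130–205 words is described as optimal, not as a hard limit.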

Output

Field | Description
Complex words | List of Tier 2, Tier 3, archaic, and other complex words in the text.
Complexity score | Vocabulary complexity level based on the rubric.
Reasoning | Explanation of the complexity rating in the context of the target grade level.

Interpreting results

This evaluator returns one of the following ratings, along with a list of complex words and reasoning that you can use to decide on next steps. Complexity ratings are relative to the target grade level you provide.
Rating | Meaning
Slightly complex | The text uses everyday, familiar vocabulary with few academic or domain-specific terms.
Moderately complex | The text includes a mix of familiar and academic vocabulary, with some Tier 2 or Tier 3 terms that may require support.
Very complex | The text relies heavily on Tier 2 and Tier 3 vocabulary with limited contextual scaffolding.
Exceedingly complex | The text uses dense academic and domain-specific vocabulary that is likely to be inaccessible without significant support.
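Because the four ratings are ordered, a caller can represent them as an ordered enum and compare them against a threshold. The following is a minimal sketch under stated assumptions: the `Rating` and `VocabularyResult` types, the field names, and the `needs_review` threshold are hypothetical and are not part of the evaluator's documented response format.

```python
from dataclasses import dataclass
from enum import IntEnum


class Rating(IntEnum):
    """The four documented ratings, ordered from least to most complex."""
    SLIGHTLY_COMPLEX = 1
    MODERATELY_COMPLEX = 2
    VERY_COMPLEX = 3
    EXCEEDINGLY_COMPLEX = 4


@dataclass
class VocabularyResult:
    """Container mirroring the three output fields (names are assumed)."""
    complex_words: list[str]
    complexity_score: Rating
    reasoning: str


def parse_rating(label: str) -> Rating:
    """Map a label such as 'Very complex' to its Rating member."""
    return Rating[label.strip().upper().replace(" ", "_")]


def needs_review(result: VocabularyResult,
                 threshold: Rating = Rating.VERY_COMPLEX) -> bool:
    """Flag results at or above a caller-chosen threshold (an assumption)."""
    return result.complexity_score >= threshold
```

For example, `needs_review(VocabularyResult(["erosion"], parse_rating("Very complex"), "..."))` flags the text for a closer look, while a "Moderately complex" result does not. Where to set the threshold is a judgment call that depends on how much vocabulary support the students will have.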

Rubric

Qualitative Text Complexity rubric (SAP)

Rating | Criteria
Slightly complex | Vocabulary that is almost entirely not complex: contemporary, conversational, and/or familiar. A very low proportion of complex words (archaic, subject-specific, academic) is OK, i.e., it doesn't need to be 0.
Moderately complex | Vocabulary that is mostly not complex: contemporary, conversational, and/or familiar. A low proportion of complex words (archaic, subject-specific, academic) is OK.
Very complex | Vocabulary that is often complex: unfamiliar, archaic, subject-specific, and/or overly academic.
Exceedingly complex | Vocabulary that is mostly complex: unfamiliar, archaic, subject-specific, and/or overly academic. May be ambiguous or purposefully misleading.

Summary (final score)

Based on SAP's ↗ Qualitative Text Complexity Rubric for Informational Text ↗.
Rating | Summary
Slightly complex | Overall, vocabulary is easy to understand and does not impede comprehension of the bulk of the text (including main idea and supporting claims). 1–2 quick pauses for processing by the student are OK here.
Moderately complex | Overall, vocabulary generally allows students to comprehend the bulk of the text with little difficulty, though there may be occasional pauses for clarification. Several quick pauses or occasional prolonged pauses may occur.
Very complex | Overall, vocabulary often presents challenges that may slow down comprehension, but does not completely block the comprehension of the bulk of the text.
Exceedingly complex | Overall, vocabulary is so complex that it makes comprehension of the bulk of the text very challenging and requires careful effort to interpret.

Accuracy and validation

This evaluator is provided as Early access. Comprehensive accuracy measures are not yet available. Validation testing is ongoing.
Accuracy has been most extensively validated on Grades 3–4. We assessed performance against an expert-annotated dataset of 580+ texts. For more information, see Accuracy.

Grade 3-4 accuracy

Metric | Result
Accuracy | 52% against the validation dataset
Relative improvement | 33% more accurate (relative) than a naive LLM baseline
Dataset source | CLEAR Corpus ↗
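The two figures above can be related by simple arithmetic. The sketch below assumes that "33% more accurate (relative)" means accuracy = baseline × (1 + 0.33). That reading is an inference, not something the page states, and the published figures may be rounded, so the result is only a rough estimate of the baseline.

```python
def implied_baseline(accuracy: float, relative_gain: float) -> float:
    """Solve accuracy = baseline * (1 + relative_gain) for baseline."""
    return accuracy / (1.0 + relative_gain)


# Figures taken from the table; the formula itself is an assumption.
baseline = implied_baseline(accuracy=0.52, relative_gain=0.33)
print(f"Implied baseline accuracy: {baseline:.1%}")
```

Under that assumption, the naive LLM baseline would sit at roughly 39% on the same dataset.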

Evaluator release history

Date | Changed
September 23, 2025 | First release.