Vocabulary

Evaluator last updated September 23, 2025.

Overview

The Vocabulary evaluator gives developers fine-grained vocabulary insights that help ensure texts use words that align with grade-level expectations and support growth in academic language. It does the following:

Estimates the background knowledge that a student at the target grade level is likely to have
Identifies complex words in the text (Tier 2, Tier 3, archaic, etc.)
Evaluates overall vocabulary complexity relative to the background knowledge estimate

At a glance


Input type	Informational text
Supported grades	3–12
Rubric	SAP's Qualitative Text Complexity Rubric for Informational Text

The evaluator was built and validated using the model and temperature below (other configurations will produce different results and may have lower accuracy):


Model used	GPT-4o (Step 1); Gemini-2.5-pro (Step 2, Grades 3–4); GPT-4.1 (Step 2, Grades 5–12)
Temperature	0
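
The model assignment in the table above can be captured as a small lookup. The following is a minimal sketch and not part of the SDK: the function name `model_for_step` and the lowercase identifier strings are assumptions for illustration, and only the step and grade mapping comes from the table.

```python
TEMPERATURE = 0  # temperature the evaluator was built and validated with


def model_for_step(step: int, grade: int) -> str:
    """Return the model listed in the table for a step and target grade.

    Hypothetical helper: the identifier strings are illustrative and may not
    match the exact names a provider API expects.
    """
    if not 3 <= grade <= 12:
        raise ValueError("supported grades are 3-12")
    if step == 1:
        return "gpt-4o"
    if step == 2:
        # Step 2 uses a different model for Grades 3-4 than for Grades 5-12.
        return "gemini-2.5-pro" if grade <= 4 else "gpt-4.1"
    raise ValueError("step must be 1 or 2")
```

Keeping the mapping in one place makes it easier to notice when a run deviates from the validated configuration.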

Getting started

Follow the Quickstart to start using this evaluator:

Access method
Evaluators Playground	View in the Learning Commons Platform
SDK	Python and TypeScript
Python notebook	View in GitHub
Prompts	View in GitHub

Inputs

Input	Description	Required
Target grade level	Sets the grade context for the evaluation (supported grades: 3–12)	Yes
Text type	Informational text. Optimal length: 130–205 words	Yes
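
A quick pre-check of the inputs before calling the evaluator might look like the sketch below. The function name `check_inputs` is hypothetical and not part of the SDK. Counting words by splitting on whitespace is an assumption, since the documentation does not say how length is measured. The sketch treats the grade range as a hard requirement and the length range only as a recommendation.

```python
SUPPORTED_GRADES = range(3, 13)  # grades 3-12
OPTIMAL_WORDS = (130, 205)       # optimal length from the Inputs table


def check_inputs(text: str, grade: int) -> list[str]:
    """Return a list of warnings; raise ValueError for unusable inputs."""
    if grade not in SUPPORTED_GRADES:
        raise ValueError(f"grade {grade} is outside the supported range 3-12")
    if not text.strip():
        raise ValueError("text is required")

    warnings = []
    word_count = len(text.split())  # assumption: whitespace-delimited words
    low, high = OPTIMAL_WORDS
    if not low <= word_count <= high:
        warnings.append(
            f"text has {word_count} words; optimal length is {low}-{high}"
        )
    return warnings
```

An empty list means the inputs fall within the documented ranges. A length warning does not mean the evaluator will reject the text.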

Output

Field	Description
Complex words	List of Tier 2, Tier 3, archaic, and other complex words in the text
Complexity score	One of four ratings: Slightly complex, Moderately complex, Very complex, or Exceedingly complex
Reasoning	Explanation of the complexity rating in the context of the target grade level.

The four complexity ratings are defined as follows:

Rating	Summary	Rubric criteria
Slightly complex	Uses everyday, familiar vocabulary with few academic or domain-specific terms.	Almost entirely contemporary and conversational; very low proportion of complex words (archaic, subject-specific, academic); easy to understand and does not impede comprehension of the bulk of the text (1–2 quick pauses for processing by the student may occur)
Moderately complex	Includes a mix of familiar and academic vocabulary, with some Tier 2 or Tier 3 terms that may require support.	Mostly contemporary and conversational; low proportion of complex words (archaic, subject-specific, academic); generally allows students to comprehend the bulk of the text with little difficulty, though there may be occasional pauses for clarification (several quick pauses or occasional prolonged pauses may occur)
Very complex	Relies heavily on Tier 2 and Tier 3 vocabulary with limited contextual scaffolding.	Often unfamiliar, archaic, subject-specific, and/or overly academic; often presents challenges that may slow down comprehension, but does not completely block the comprehension of the bulk of the text
Exceedingly complex	Uses dense academic and domain-specific vocabulary that is likely to be inaccessible without significant support.	Mostly unfamiliar, archaic, subject-specific, and/or overly academic; may be ambiguous or purposefully misleading; makes comprehension of the bulk of the text very challenging and requires careful effort to interpret
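
For downstream code, the three output fields can be modelled as a typed record. This is a sketch only: the names `VocabularyResult`, `Complexity`, `parse_result`, and the dictionary keys are assumptions derived from the field labels in the Output table, and the actual SDK response shape may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Complexity(Enum):
    """The four ratings a complexity score can take."""
    SLIGHTLY = "Slightly complex"
    MODERATELY = "Moderately complex"
    VERY = "Very complex"
    EXCEEDINGLY = "Exceedingly complex"


@dataclass
class VocabularyResult:
    complex_words: list[str]      # Tier 2, Tier 3, archaic, and other complex words
    complexity_score: Complexity  # overall rating for the target grade
    reasoning: str                # explanation of the rating


def parse_result(raw: dict) -> VocabularyResult:
    """Build a VocabularyResult from a dict using the assumed key names."""
    return VocabularyResult(
        complex_words=list(raw["complex_words"]),
        # Enum lookup by value raises ValueError for an unknown rating label.
        complexity_score=Complexity(raw["complexity_score"]),
        reasoning=raw["reasoning"],
    )
```

Parsing the rating into an enum surfaces unexpected labels early instead of letting them pass through as free-form strings.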

Accuracy and validation

This evaluator is provided as Early access. Comprehensive accuracy measures are not yet available. Validation testing is ongoing.

We assessed performance against an expert-annotated dataset of 580+ texts from the CLEAR Corpus. Accuracy has been most extensively validated on Grades 3–4.

Grades 3–4 accuracy

Metric	Description	Result
Expert agreement	The percentage of evaluated examples where at least one expert agreed with the evaluator’s rating during review testing.	52% against the validation dataset
Baseline comparison	How the evaluator’s accuracy compares to a simple, unrefined prompt.	33% more accurate (relative) than a naive LLM baseline
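
The two metrics above can be illustrated with a short sketch. It follows the descriptions in the table, not the actual validation code, which is not shown here. The function names are hypothetical, and the numbers in the usage note are illustrative values, not results from the validation dataset.

```python
def expert_agreement(evaluator_ratings, expert_ratings):
    """Share of examples where at least one expert matches the evaluator.

    evaluator_ratings: one rating label per example.
    expert_ratings: for each example, the list of labels given by experts.
    """
    if not evaluator_ratings or len(evaluator_ratings) != len(expert_ratings):
        raise ValueError("need one non-empty, equally sized entry per example")
    hits = sum(
        rating in experts
        for rating, experts in zip(evaluator_ratings, expert_ratings)
    )
    return hits / len(evaluator_ratings)


def relative_improvement(accuracy, baseline_accuracy):
    """Relative gain over a baseline: (accuracy - baseline) / baseline."""
    if baseline_accuracy <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (accuracy - baseline_accuracy) / baseline_accuracy
```

With illustrative values, an accuracy of 0.40 against a baseline of 0.30 is a relative improvement of about 33%, although the absolute difference is only 10 percentage points.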

For more information on our validation process, see Accuracy.

Evaluator release history

Date	Changed
September 23, 2025	First release
