
# About this evaluator

> Reference documentation for the Vocabulary Evaluator.

export const EvalGreenLgBadge = ({children}) => <div style={{
  marginBottom: "8px"
}}>
    <Badge color="gray" size="xs">
      {children != null && children !== "" ? children : null}
    </Badge>
  </div>;

export const EarlyAccessCallout = ({children}) => <div className="eyebrow-callout not-prose rounded-xl border border-gray-200/80 p-5 dark:border-white/10" style={{
  marginBottom: "1rem",
  borderRadius: "4px"
}}>
    <div className="mb-3">
      <Badge color="green" size="md" icon="flask">
        Early access
      </Badge>
    </div>
    <div className="callout-body text-[15px] leading-relaxed text-gray-700 dark:text-gray-300">{children}</div>
    <style>{`.callout-body a { text-decoration: underline; text-decoration-color: #178251; }`}</style>
  </div>;

[Evaluator last updated September 23, 2025.](#evaluator-release-history)

<EarlyAccessCallout>
  This functionality is actively evolving. Changes may occur as we expand capabilities and improve accuracy and reliability. Email [support@learningcommons.org](mailto:support@learningcommons.org) ↗ with your feedback or issues.
</EarlyAccessCallout>

## At a glance

|                      |                    |
| :------------------- | :----------------- |
| **Input type**       | Informational text |
| **Supported grades** | 3–12               |

This evaluator gives developers fine-grained vocabulary insights that help ensure texts use words that align with grade-level expectations and support growth in academic language. It:

* Estimates the background knowledge a student at the target grade level is likely to have.
* Identifies complex words in the text (<Tooltip tip="General academic words that appear across subject areas, more common in writing than speech (e.g., vary, factors, determine). Students may encounter them but are unlikely to have fully mastered them at lower grade levels.">Tier 2</Tooltip>, <Tooltip tip="Domain-specific or subject-matter words that are rare outside a particular field (e.g., precipitation, latitude, equator). Students are unlikely to know these without prior instruction in that subject.">Tier 3</Tooltip>, <Tooltip tip="Words that are outdated or no longer in common use in contemporary writing, which may be unfamiliar to students regardless of grade level.">archaic</Tooltip>, and other complex words).
* Evaluates overall vocabulary complexity relative to that background knowledge estimate.

## Model and prompt

For instructions on running the evaluator, see [Running an evaluator](/evaluators/using-evaluators/running-evaluators).

This evaluator runs in two steps. Each step uses a different model, and the Step 2 model depends on the target grade level.

| Step 1: Background knowledge |        |
| ---------------------------- | ------ |
| Model used                   | GPT-4o |
| Temperature                  | 0      |

| Step 2: Vocabulary complexity |                                                                                                                  |
| :---------------------------- | :--------------------------------------------------------------------------------------------------------------- |
| Model used (Grades 3–4)       | Gemini-2.5-pro                                                                                                   |
| Model used (Grades 5–12)      | GPT-4.1                                                                                                          |
| Temperature                   | 0                                                                                                                |
| **Prompts**                   | [View prompts](https://github.com/learning-commons-org/evaluators/tree/main/evals/prompts/vocabulary) ↗          |
| **Notebook**                  | [View notebook](https://github.com/learning-commons-org/evaluators/blob/main/evals/vocabulary_evaluator.ipynb) ↗ |

<Note>
  Other configurations will produce different results and may have lower accuracy.
</Note>
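
The grade-dependent configuration in the tables above can be sketched as a small lookup. This is a minimal illustration, not the evaluator's implementation: `StepConfig`, `select_step_configs`, the step keys, and the lowercase model identifier strings are assumptions made for the example. See the linked notebook for the actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepConfig:
    """Model settings for one step (hypothetical container)."""
    model: str
    temperature: float


def select_step_configs(grade: int) -> dict[str, StepConfig]:
    """Return the documented model settings for each step at a target grade."""
    if not 3 <= grade <= 12:
        raise ValueError(f"Supported grades are 3-12, got {grade}")
    # Step 2 uses one model for Grades 3-4 and another for Grades 5-12.
    # The identifier strings are assumed spellings of the documented models.
    step2_model = "gemini-2.5-pro" if grade <= 4 else "gpt-4.1"
    return {
        "background_knowledge": StepConfig(model="gpt-4o", temperature=0),
        "vocabulary_complexity": StepConfig(model=step2_model, temperature=0),
    }


if __name__ == "__main__":
    for grade in (3, 5, 12):
        print(grade, select_step_configs(grade))
```

Keeping the selection in one function makes it harder to pair a grade with a configuration that has not been validated.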

## Inputs

| Input                  | Supported values                                              | Required |
| :--------------------- | :------------------------------------------------------------ | :------- |
| **Target grade level** | Grades 3–12. Sets the grade that complexity is rated against. | Yes      |
| **Text type**          | Informational text<br />Optimal length: 130–205 words         | Yes      |
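
A pre-flight check on these inputs might look like the sketch below. It assumes that words are counted by splitting on whitespace and that a text outside the optimal length range should produce a warning rather than an error. Neither behavior is specified by the evaluator, and `check_inputs` is a hypothetical helper.

```python
OPTIMAL_MIN_WORDS = 130
OPTIMAL_MAX_WORDS = 205


def check_inputs(text: str, grade: int) -> list[str]:
    """Raise on unusable inputs; return warnings for non-optimal ones."""
    if not 3 <= grade <= 12:
        raise ValueError(f"Supported grades are 3-12, got {grade}")
    # Whitespace splitting is an assumed approximation of word count.
    word_count = len(text.split())
    if word_count == 0:
        raise ValueError("Text is empty")
    warnings = []
    if not OPTIMAL_MIN_WORDS <= word_count <= OPTIMAL_MAX_WORDS:
        warnings.append(
            f"Text has {word_count} words; optimal length is "
            f"{OPTIMAL_MIN_WORDS}-{OPTIMAL_MAX_WORDS} words"
        )
    return warnings


if __name__ == "__main__":
    print(check_inputs("word " * 150, grade=4))
    print(check_inputs("word " * 10, grade=4))
```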

## Output

| Field            | Description                                                                                                     |
| :--------------- | :-------------------------------------------------------------------------------------------------------------- |
| Complex words    | List of Tier 2, Tier 3, archaic, and other complex words in the text.                                           |
| Complexity score | Vocabulary complexity level based on the [rubric](/evaluators/literacy-evaluators/vocabulary-evaluator/rubric). |
| Reasoning        | Explanation of the complexity rating in the context of the target grade level.                                  |
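
If you handle the output in code, a typed container keeps the three fields together. The sketch below assumes the result arrives as a dictionary with the keys `complex_words`, `complexity_score`, and `reasoning`, and that complex words form a flat list of strings. These names and shapes are illustrative guesses, so check the notebook for the actual output format.

```python
from dataclasses import dataclass


@dataclass
class VocabularyResult:
    """Hypothetical container for the three documented output fields."""
    complex_words: list[str]
    complexity_score: str
    reasoning: str


def parse_result(payload: dict) -> VocabularyResult:
    """Build a VocabularyResult from a dict, failing loudly on missing fields."""
    required = ("complex_words", "complexity_score", "reasoning")
    missing = [key for key in required if key not in payload]
    if missing:
        raise KeyError(f"Missing fields: {missing}")
    return VocabularyResult(
        complex_words=[str(word) for word in payload["complex_words"]],
        complexity_score=str(payload["complexity_score"]),
        reasoning=str(payload["reasoning"]),
    )


if __name__ == "__main__":
    # Illustrative payload only; not real evaluator output.
    example = {
        "complex_words": ["precipitation", "latitude", "vary"],
        "complexity_score": "Moderately complex",
        "reasoning": "Several domain-specific terms may need support.",
    }
    print(parse_result(example))
```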

## Interpreting results

This evaluator returns one of the following ratings, together with a list of complex words and the reasoning behind the rating, so you can decide how to act on the result. Ratings are relative to the target grade level you provide.

| Rating              | Meaning                                                                                                                    |
| :------------------ | :------------------------------------------------------------------------------------------------------------------------- |
| Slightly complex    | The text uses everyday, familiar vocabulary with few academic or domain-specific terms.                                    |
| Moderately complex  | The text includes a mix of familiar and academic vocabulary, with some Tier 2 or Tier 3 terms that may require support.    |
| Very complex        | The text relies heavily on Tier 2 and Tier 3 vocabulary with limited contextual scaffolding.                               |
| Exceedingly complex | The text uses dense academic and domain-specific vocabulary that is likely to be inaccessible without significant support. |
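
Because the four ratings are ordered, one way to act on them is to compare a rating against a threshold you choose. The sketch below is one possible policy rather than a recommendation from the evaluator: the `needs_support` helper and its default threshold are assumptions made for illustration.

```python
# Ratings from the table above, ordered from least to most complex.
RATING_ORDER = (
    "Slightly complex",
    "Moderately complex",
    "Very complex",
    "Exceedingly complex",
)


def rating_rank(rating: str) -> int:
    """Return the position of a rating on the ordered scale (0 = least complex)."""
    try:
        return RATING_ORDER.index(rating)
    except ValueError:
        raise ValueError(f"Unknown rating: {rating!r}") from None


def needs_support(rating: str, threshold: str = "Very complex") -> bool:
    """Flag texts rated at or above a caller-chosen threshold (hypothetical policy)."""
    return rating_rank(rating) >= rating_rank(threshold)


if __name__ == "__main__":
    for rating in RATING_ORDER:
        print(rating, needs_support(rating))
```

The right threshold depends on your use case. For example, a tool that selects independent reading may flag texts at a lower rating than one that selects texts for guided instruction.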

## Accuracy and validation

<Note>
  This evaluator is provided as Early access. \
  Comprehensive accuracy measures are not yet available. Validation testing is ongoing.
</Note>

Accuracy has been most extensively validated on Grades 3–4. We assessed performance against an expert-annotated dataset of 580+ texts. For more information, see [Accuracy](/evaluators/using-evaluators/accuracy).

### Grades 3–4 accuracy

| Metric                                                                                                                                                             | Result                                                                                                                          |
| :----------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------ |
| <Tooltip tip="The percentage of evaluated examples where at least one expert agreed with the evaluator's rating during review testing.">Expert agreement</Tooltip> | 52% against the validation dataset                                                                                              |
| <Tooltip tip="How the evaluator's accuracy compares to a simple, unrefined prompt.">Baseline comparison</Tooltip>                                                  | 33% more accurate (relative) than a naive LLM baseline                                                                          |
| Dataset source                                                                                                                                                     | [CLEAR Corpus](https://www.commonlit.org/blog/introducing-the-clear-corpus-an-open-dataset-to-advance-research-28ff8cfea84a/) ↗ |
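
To make the two metrics concrete, the sketch below computes expert agreement as defined in the tooltip (the share of examples where at least one expert matches the evaluator's rating) and a relative improvement over a baseline. It assumes relative improvement means `(accuracy - baseline) / baseline`. The function names and the toy data are illustrative and are not drawn from the validation dataset.

```python
def expert_agreement(examples: list[tuple[str, list[str]]]) -> float:
    """Fraction of examples where at least one expert rating matches the evaluator's.

    Each example is a pair: (evaluator_rating, expert_ratings).
    """
    if not examples:
        raise ValueError("No examples provided")
    agreed = sum(
        1 for evaluator_rating, expert_ratings in examples
        if evaluator_rating in expert_ratings
    )
    return agreed / len(examples)


def relative_improvement(accuracy: float, baseline: float) -> float:
    """Relative gain over a baseline (assumed definition)."""
    if baseline <= 0:
        raise ValueError("Baseline must be positive")
    return (accuracy - baseline) / baseline


if __name__ == "__main__":
    # Toy data for illustration only.
    toy = [
        ("Very complex", ["Very complex", "Moderately complex"]),
        ("Slightly complex", ["Moderately complex"]),
        ("Moderately complex", ["Moderately complex"]),
        ("Exceedingly complex", ["Very complex", "Very complex"]),
    ]
    print(f"Expert agreement: {expert_agreement(toy):.0%}")
    print(f"Relative improvement: {relative_improvement(0.6, 0.5):.0%}")
```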

## Evaluator release history

| Date               | Change         |
| ------------------ | -------------- |
| September 23, 2025 | First release. |
