# Math Alignment

> Reference documentation for the Math Alignment evaluator.

export const EarlyAccessCallout = ({children}) => <div className="eyebrow-callout not-prose rounded-xl border border-gray-200/80 p-5 dark:border-white/10" style={{
  marginBottom: "1rem",
  borderRadius: "4px"
}}>
    <div className="mb-3">
      <Badge color="green" size="md" icon="flask">
        Early access
      </Badge>
    </div>
    <div className="callout-body text-[15px] leading-relaxed text-gray-700 dark:text-gray-300">{children}</div>
    <style>{`.callout-body a { text-decoration: underline; text-decoration-color: #178251; }`}</style>
  </div>;

[Evaluator last updated June 22, 2026.](#evaluator-release-history)

<EarlyAccessCallout>
  This functionality is actively evolving. Changes may occur as we expand
  capabilities and improve accuracy and reliability. Email
  [support@learningcommons.org](mailto:support@learningcommons.org) ↗ with your
  feedback or issues.
</EarlyAccessCallout>

The Math Alignment evaluator checks a math question against each of a standard's individual [learning components](/knowledge-graph/graph-reference/learning-components), not just against the standard's label.

It reports which of a standard's components the question actually measures.

<Note>
  The Math Alignment evaluator judges whether a question is the *right math* for
  a standard. The Math Visual Correctness evaluator (coming soon) judges
  whether a math visual is *mathematically correct*.
</Note>

## At a glance

|                       |                                          |
| --------------------- | ---------------------------------------- |
| **Input type**        | Math assessment question                 |
| **Supported subject** | Math only                                |
| **Supported grades**  | K–12                                     |
| **Jurisdictions**     | Multi-State/CCSS, all 50 states, plus DC |

## Model and prompt

For instructions on running the evaluator, see [Quickstart](/evaluators/getting-started/quickstart).

|                 |                                                                                                                        |
| :-------------- | :--------------------------------------------------------------------------------------------------------------------- |
| **Model used**  | claude-haiku-4-5-20251001                                                                                              |
| **Temperature** | 0 (fixed)                                                                                                              |
| **Prompts**     | [View prompts](https://github.com/learning-commons-org/evaluators/tree/main/evals/standards/math-question-alignment) ↗ |

The prompt was optimized with [GEPA (Genetic-Pareto)](https://gepa-ai.github.io/gepa/) ↗ via the [DSPy framework](https://dspy.ai/) ↗ and is tuned for the model and temperature listed above. Results may vary with other models or parameters.

## Inputs

<Note>
  Inputs must be de-identified. Do not submit student PII or any regulated or
  sensitive personal information.
</Note>

| Input             | Description                                                   | Required |
| :---------------- | :------------------------------------------------------------ | :------- |
| **Math question** | Full assessment prompt/instructions students see, K–12 math   | Yes      |
| **Jurisdiction**  | Multi-State/CCSS, a state adoption, or DC                     | Yes      |
| **Standard(s)**   | One or more standard codes from the selected jurisdiction     | Yes      |
| **Grade**         | Filters the standards list                                    | No       |
| **Coarse filter** | Pre-screens relevance before full evaluation, useful at scale | No       |

The evaluator supports 3 modes:

| Mode                    | Description                                                                                               |
| :---------------------- | :-------------------------------------------------------------------------------------------------------- |
| **Single check**        | One question against one standard                                                                         |
| **Batch evaluation**    | Set of question-standard pairs (e.g. tagging validation), run as a full question × standard cross-product |
| **By-grade evaluation** | Question bank against every standard for a grade, for whole-grade coverage analysis                       |
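
The cross-product that batch mode runs can be sketched as follows. This is a minimal illustration, not the evaluator's implementation: the `build_pairs` helper and the question IDs are hypothetical, and the standard codes are only sample values.

```python
from itertools import product


def build_pairs(question_ids, standard_codes):
    """Return every (question, standard) combination, in input order."""
    return list(product(question_ids, standard_codes))


questions = ["q1", "q2"]              # hypothetical question IDs
standards = ["3.MD.C.7", "3.MD.C.5"]  # sample standard codes

pairs = build_pairs(questions, standards)
print(len(pairs))  # 2 questions x 2 standards = 4 evaluations
```

Because the number of evaluations grows as questions × standards, the optional coarse filter becomes more useful as either list gets longer.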

<Tip>
  Use **Batch** and **By-grade evaluation** to surface which standards (and
  which learning components) are fully, partially, or not covered by a given
  question bank.
</Tip>
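
One way to build that coverage view from a set of results is sketched below. It rests on assumptions that this page does not confirm: a component counts as covered when at least one question aligns to it, each result for a given standard lists its components in the same order, and a coarse-filtered result contributes no coverage. The `coverage_by_standard` helper and the sample data are hypothetical.

```python
def coverage_by_standard(results):
    """Summarize, per standard, whether a question bank covers its components."""
    covered = {}
    for result in results:
        flags = covered.setdefault(result["statementCode"], [])
        if result.get("coarseFiltered"):
            continue  # assumption: filtered-out results carry no verdicts
        for i, component in enumerate(result.get("learningComponents", [])):
            if i == len(flags):
                flags.append(False)
            # Assumption: one aligned question is enough to cover a component.
            flags[i] = flags[i] or component["aligned"]

    summary = {}
    for code, flags in covered.items():
        if flags and all(flags):
            summary[code] = "fully covered"
        elif any(flags):
            summary[code] = "partially covered"
        else:
            summary[code] = "not covered"
    return summary


# Made-up results: two questions against one standard, plus one filtered-out standard.
results = [
    {"statementCode": "3.MD.C.7", "coarseFiltered": False,
     "learningComponents": [{"aligned": a} for a in (True, False, False, False)]},
    {"statementCode": "3.MD.C.7", "coarseFiltered": False,
     "learningComponents": [{"aligned": a} for a in (False, True, False, False)]},
    {"statementCode": "3.MD.C.5", "coarseFiltered": True},
]

summary = coverage_by_standard(results)
print(summary)
```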

## Output

<Note>
  The evaluator reduces alignment to a binary judgment (plus rationale) per learning component, and is not validated for grading, assessment, or placement decisions.

  Treat outputs as directional signals, and keep a human in the loop – especially for borderline cases.
</Note>

| Field                              | Description                                                                                        |
| ---------------------------------- | -------------------------------------------------------------------------------------------------- |
| `statementCode`                    | The standard evaluated.                                                                            |
| `learningComponents`               | One entry per learning component of the standard.                                                  |
| `learningComponents[].description` | Learning component description, sourced from the Knowledge Graph.                                  |
| `learningComponents[].reasoning`   | What the component requires, what the question actually asks, and why that is or is not alignment. |
| `learningComponents[].aligned`     | `true` if aligned, `false` if not.                                                                 |
| `learningComponents[].feedback`    | Revision guidance when not aligned; brief confirmation when aligned.                               |
| `alignedCount`                     | Count of learning components the question aligns to.                                               |
| `totalCount`                       | Total learning components for the standard.                                                        |
| `coarseFiltered`                   | `true` if the standard was excluded by the coarse filter (batch/by-grade modes only).              |
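
As a rough sketch of how a consumer might read these fields, the snippet below parses a made-up result and checks that the summary counts agree with the per-component verdicts. Only the field names come from the table above; the values, the exact JSON envelope, and the `check_counts` helper are assumptions for illustration.

```python
import json

# Made-up result for illustration; placeholder strings stand in for real text.
raw = """
{
  "statementCode": "3.MD.C.7",
  "learningComponents": [
    {"description": "(component 1)", "reasoning": "(rationale)", "aligned": true,  "feedback": "(confirmation)"},
    {"description": "(component 2)", "reasoning": "(rationale)", "aligned": false, "feedback": "(revision guidance)"},
    {"description": "(component 3)", "reasoning": "(rationale)", "aligned": false, "feedback": "(revision guidance)"},
    {"description": "(component 4)", "reasoning": "(rationale)", "aligned": false, "feedback": "(revision guidance)"}
  ],
  "alignedCount": 1,
  "totalCount": 4,
  "coarseFiltered": false
}
"""


def check_counts(result):
    """Return True when the summary counts match the per-component verdicts."""
    components = result["learningComponents"]
    aligned = sum(1 for c in components if c["aligned"])
    return result["alignedCount"] == aligned and result["totalCount"] == len(components)


result = json.loads(raw)
print(check_counts(result))

# List the components that still need revision.
needs_revision = [c["description"] for c in result["learningComponents"] if not c["aligned"]]
print(needs_revision)
```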

## Interpreting results

The evaluator assesses standards alignment based on how many of a standard's learning components a question meets:

| Level                 | Meaning                                                        |
| :-------------------- | :------------------------------------------------------------- |
| **Fully aligned**     | The question meets every learning component of the standard.   |
| **Partially aligned** | The question meets some, but not all, learning components.     |
| **Not aligned**       | The question meets none of the standard's learning components. |
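
The mapping from counts to levels can be written as a small helper. This is a sketch built on the `alignedCount` and `totalCount` output fields; the `alignment_level` function is hypothetical, and rejecting a standard with zero components is an assumption, since this page does not say how that case is reported.

```python
def alignment_level(aligned_count, total_count):
    """Map component counts to the alignment level described in the table."""
    if total_count <= 0:
        # Assumption: a standard with no components cannot be classified.
        raise ValueError("total_count must be positive")
    if not 0 <= aligned_count <= total_count:
        raise ValueError("aligned_count must be between 0 and total_count")
    if aligned_count == total_count:
        return "Fully aligned"
    if aligned_count == 0:
        return "Not aligned"
    return "Partially aligned"


# A question that meets 1 of a standard's 4 learning components.
print(alignment_level(1, 4))  # Partially aligned
```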

<Note>
  **Example**: A question that asks students to find the area of a rectangle by
  multiplying its side lengths is commonly tagged to Common Core 3.MD.C.7.
  However, that question meets only one of the 4 learning components that make up
  that standard.

  At the parent-code level the question looks aligned; at the learning-component
  level, it covers a quarter of the standard.
</Note>

## Accuracy and validation

<Note>
  This evaluator is provided as **Early access**. Comprehensive accuracy
  measures are still evolving, and validation testing is ongoing.
</Note>

The prompt was optimized using [GEPA](https://gepa-ai.github.io/gepa/) ↗ via [DSPy](https://dspy.ai/) ↗ on a stratified split of 2,011 question-learning component pairs (724 train / 482 validation / 805 test). These were drawn from [Illustrative Mathematics v.360](https://accessim.org/) ↗ cool-down questions and annotated by 3 human experts.

The GEPA-optimized prompt was evaluated on the held-out test split, and separately reviewed by math experts on a sniff test of brand-new questions.

| Metric                                    | Result                                               |
| ----------------------------------------- | ---------------------------------------------------- |
| Accuracy (held-out test set)              | 73% against expert-annotated pairs                   |
| Sniff test pass rate (answer + reasoning) | 78% (29 of 37 sampled outputs passed; threshold 60%) |
| Dataset source                            | Illustrative Mathematics 360 (CC-BY-NC 4.0)          |
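
The reported figures can be re-derived from the counts given in this section. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
# Dataset split reported above.
train, validation, test = 724, 482, 805
total = train + validation + test
print(total)  # 2011

# Share of each split, rounded to whole percentages.
shares = [f"{n / total:.0%}" for n in (train, validation, test)]
print(shares)  # ['36%', '24%', '40%']

# Sniff test: 29 of 37 sampled outputs passed, against a 60% threshold.
passed, sampled, threshold = 29, 37, 0.60
pass_rate = passed / sampled
print(f"{pass_rate:.0%}")  # 78%
print(pass_rate >= threshold)  # True
```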

<Note>
  The evaluator was designed and validated on Illustrative Mathematics 360 cool-down questions, which are not evenly distributed across grades K–12, so performance may vary by grade. The evaluation set also may not fully represent the range of math inputs developers could submit, so brand-new or unusual inputs carry more risk of poor results.
</Note>

## Evaluator release history

| Date          | Changes       |
| ------------- | ------------- |
| June 30, 2026 | First release |
