Batch evaluator - AI Developer Tools for Education

v0.4.0

What you’ll do

Evaluate a batch of texts from a CSV file with all literacy evaluators. Results are written in both CSV and HTML formats.

What you’ll need

Install the SDK globally

  npm install -g @learning-commons/evaluators

Create a CSV file with the text you want to evaluate
- Must contain 50 or fewer input rows (unless using the --bypass-row-limit option)
- Must have text and grade columns
- May include additional columns (will be preserved as-is in the output)
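
The requirements above can be checked before a run. The following is a minimal sketch, not part of the SDK: check_batch_csv is a hypothetical helper, and it assumes a simple CSV where the header is on the first line and no field contains a line break (a quoted multi-line field would be miscounted).

```shell
# Pre-flight check for a batch input CSV (illustrative helper, not part of the SDK).
# Assumes the header is the first line and no field spans multiple lines.
check_batch_csv() {
  local csv_path="$1"
  local max_rows="${2:-50}"   # default row limit stated in this guide

  [ -f "$csv_path" ] || { echo "missing file: $csv_path" >&2; return 1; }

  # Split the header on commas and look for the two required column names.
  local header col
  header="$(head -n 1 "$csv_path" | tr -d '\r"')"
  for col in text grade; do
    printf '%s\n' "$header" | tr ',' '\n' | grep -qx "$col" \
      || { echo "missing required column: $col" >&2; return 1; }
  done

  # Count non-empty lines after the header as input rows.
  local rows
  rows="$(tail -n +2 "$csv_path" | grep -c '[^[:space:]]' || true)"
  if [ "$rows" -gt "$max_rows" ]; then
    echo "$rows rows exceeds the limit of $max_rows" >&2
    return 1
  fi

  echo "ok: $rows rows"
}
```

Call it as check_batch_csv input.csv before starting a run. Pass a larger second argument (for example, check_batch_csv input.csv 200) if you plan to use --bypass-row-limit.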

example.csv

text,grade
"The cat sat on the mat.",3
"Photosynthesis is the process by which plants convert sunlight into energy.",5
"The mitochondria are the powerhouse of the cell.",8

Running the batch evaluator

Run the batch evaluator using npx from any directory:

# Pass the CSV path with the required API key(s) + output directory
npx evaluators-batch input.csv \
  --google-api-key $GOOGLE_API_KEY \
  --openai-api-key $OPENAI_API_KEY \
  --output-dir ./batch-results

npx evaluators-batch --help # Lists all options
npx evaluators-batch --version # Prints SDK version

Interactive prompts

If you omit a required input, the CLI prompts you interactively for the CSV file path, API keys, and/or output directory.

Before starting evaluations, the CLI always shows a confirmation prompt. This intentional safety checkpoint helps prevent accidentally starting an expensive run.

Options

Pass options to override the batch evaluator’s defaults:

evaluators-batch input.csv \
  --concurrency 5 \
  --max-retries 3 \
  --model anthropic:claude-opus-4-8 \
  --no-telemetry

Option	Default	Description
`<csv-path>` v0.7.0		Positional argument for the input CSV file path
`--help` v0.7.0		Lists all flags and usage information
`--version` v0.7.0		Prints the SDK version
`--google-api-key <key>` v0.7.0	`GOOGLE_API_KEY` environment variable	Google API key
`--openai-api-key <key>` v0.7.0	`OPENAI_API_KEY` environment variable	OpenAI API key
`--anthropic-api-key <key>` v0.7.0	`ANTHROPIC_API_KEY` environment variable	Anthropic API key
`--model <provider:model>` v0.7.0	Evaluator’s default provider and model	Global model override for an evaluator (e.g., `--model anthropic:claude-opus-4-8`). When set, only the API key for the `--model` provider is required.
`--output-dir <path>` v0.7.0	Timestamped folder	Output directory path
`--concurrency <n>`	`3`	Number of evaluations to run in parallel. If your provider and model allow higher rate limits, raise this value for faster execution.
`--max-retries <n>`	`2`	Number of times to retry a failed evaluation
`--no-telemetry`		Disables telemetry data collection
`--bypass-row-limit` v0.6.0		Evaluates a CSV file with more than 50 rows
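
Because a --model override needs only one provider's API key, it can help to confirm that key is available before you start a run. This is a hedged sketch rather than SDK behavior: key_var_for_model and has_key_for_model are hypothetical helpers, and the "google" and "openai" provider prefixes are assumptions (only "anthropic" appears in this guide).

```shell
# Map the provider prefix of a --model value to the environment variable
# this guide lists for that provider's API key.
# The "google" and "openai" prefixes are assumptions; only "anthropic" appears in this guide.
key_var_for_model() {
  local provider="${1%%:*}"   # text before the first colon
  case "$provider" in
    anthropic) echo "ANTHROPIC_API_KEY" ;;
    openai)    echo "OPENAI_API_KEY" ;;
    google)    echo "GOOGLE_API_KEY" ;;
    *)         echo "unknown provider: $provider" >&2; return 1 ;;
  esac
}

# Succeed only when the key variable for the given model is set and non-empty.
has_key_for_model() {
  local var
  var="$(key_var_for_model "$1")" || return 1
  eval "[ -n \"\${$var:-}\" ]"
}
```

With ANTHROPIC_API_KEY set in your shell, you could guard a run like this:

  has_key_for_model anthropic:claude-opus-4-8 && evaluators-batch input.csv --model anthropic:claude-opus-4-8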

When the run completes, the CLI prints the full path to the HTML report so you can open it directly.

Results

You’ll see a real-time display of the batch evaluator’s progress:

Processing evaluations...
████████████░░░░░░░░ 60% (30/50)
  ✓ grade-level-appropriateness: 6/10 successful
  ✓ subject-matter-knowledge: 6/10 successful
  ✓ vocabulary: 6/10 successful
  ✓ sentence-structure: 6/10 successful
  ⏳ conventionality: 6/10 successful

⏱  Elapsed: 2m 15s | Estimated remaining: 1m 30s

The batch evaluator generates two files in your output directory:

batch-results-2024-02-07_14-30-22/
├── results.csv
└── results.html

results.csv

- Spreadsheet-compatible format
- Original CSV columns preserved
- New CSV columns for each evaluator:
  - {evaluator}_score
  - {evaluator}_reasoning
  - {evaluator}_status
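
To see which evaluators contributed columns to a given results.csv, you can read its header. This is a small illustrative sketch: status_columns is a hypothetical helper, and it assumes column names contain no commas or quotes, so the header can be split on commas.

```shell
# List the per-evaluator status columns found in a results.csv header.
# Assumes column names contain no commas, so splitting the header on commas is safe.
status_columns() {
  head -n 1 "$1" | tr -d '\r"' | tr ',' '\n' | grep '_status$'
}
```

Running status_columns results.csv prints one {evaluator}_status column name per line.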

results.html

- Summary dashboard with grade-level distribution and text complexity charts
- Scores and reasoning for each evaluator

If an evaluation still fails after all retries, only the affected rows fail. The batch evaluator skips those rows, continues with the rest, and reports the failures in the results with an error status.

Graceful shutdown

If you press Ctrl+C during evaluation:

- In-flight evaluations finish processing
- Pending tasks are cancelled
- Completed results are saved to results-partial.* files to preserve progress

⚠️  Shutdown requested. Saving partial results...
   (Press Ctrl+C again to force quit)

✓ Saved 15 results to:
  ./batch-results-2024-02-07_14-30-22/
    ├── results-partial.csv
    └── results-partial.html

If you press Ctrl+C twice to force quit immediately, you may lose in-flight results.
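
If you script around the batch evaluator, the file names above are enough to tell a finished run from an interrupted one. The sketch below is illustrative only: run_state is a hypothetical helper, and it assumes a completed run writes results.csv while an interrupted run leaves results-partial.csv in the same output directory.

```shell
# Report whether an output directory holds complete or partial results.
# File names follow this guide; the helper itself is not part of the SDK.
run_state() {
  local dir="$1"
  if [ -f "$dir/results.csv" ]; then
    echo "complete"
  elif [ -f "$dir/results-partial.csv" ]; then
    echo "partial"
  else
    echo "none"
  fi
}
```

For example, run_state ./batch-results prints "partial" when only results-partial.csv is present.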
