Batch evaluator

v0.4.0

What you’ll do

Evaluate a batch of texts from a CSV file using all literacy evaluators. Results are written in both CSV and HTML formats.

What you’ll need

Install the SDK globally

  npm install -g @learning-commons/evaluators

Create a CSV file with the text you want to evaluate
- Must contain 50 or fewer input rows (unless you use the --bypass-row-limit option)
- Must have text and grade columns
- May include additional columns (will be preserved as-is in the output)

example.csv

text,grade
"The cat sat on the mat.",3
"Photosynthesis is the process by which plants convert sunlight into energy.",5
"The mitochondria are the powerhouse of the cell.",8
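Before starting a run, it can help to check a file against the requirements listed above. The following is a minimal Python sketch and is not part of the SDK: the function name `check_batch_csv` and its `row_limit` parameter are hypothetical, and it assumes the header row uses the exact lowercase names `text` and `grade`.

```python
import csv
import io


def check_batch_csv(csv_text, row_limit=50):
    """Return a list of problems found in the CSV text (empty if none).

    Hypothetical helper for illustration; not part of the SDK.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []  # reading fieldnames consumes the header row
    problems = []
    for required in ("text", "grade"):
        if required not in columns:
            problems.append(f"missing column: {required}")
    row_count = sum(1 for _ in reader)  # counts data rows only
    if row_count > row_limit:
        problems.append(f"{row_count} rows exceed the limit of {row_limit}")
    return problems


example = '''text,grade
"The cat sat on the mat.",3
"Photosynthesis is the process by which plants convert sunlight into energy.",5
'''
print(check_batch_csv(example))  # prints []
```

To check a file on disk, read it first, for example with `open(path, newline="", encoding="utf-8").read()`, and pass the resulting string to the function.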

Running the batch evaluator

Run the batch evaluator using npx from any directory:

npx evaluators-batch

You will be prompted for the following information:

CSV file path
Google and OpenAI API keys
- Paste them directly into the terminal window when prompted
- Alternatively, provide as environment variables (GOOGLE_API_KEY and OPENAI_API_KEY, by default)
Output directory
- Defaults to a folder in the current directory with a human-readable timestamp (e.g. batch-results-2024-02-07_14-30-22/)
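If you prefer to supply the keys as environment variables, a quick pre-flight check can confirm they are set before you launch the tool. This is a minimal sketch using the default variable names listed above; the helper name `missing_api_keys` is hypothetical and not part of the SDK.

```python
import os

# Default variable names taken from the list above.
REQUIRED_KEYS = ("GOOGLE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]


# Demonstration with a made-up environment mapping.
print(missing_api_keys({"GOOGLE_API_KEY": "placeholder"}))  # prints ['OPENAI_API_KEY']
```

Calling `missing_api_keys()` with no argument inspects the real process environment.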

Options

Pass in options to override the batch evaluator’s defaults:

evaluators-batch --concurrency 5 --max-retries 3 --no-telemetry

Option	Default	Description
`--concurrency <n>`	`3`	Number of evaluations to run in parallel. If you have higher rate limits with your provider and model, you can raise this value for faster execution
`--max-retries <n>`	`2`	Number of times to retry a failed evaluation
`--no-telemetry`	Telemetry is enabled	Disable telemetry data collection
`--bypass-row-limit` v0.6.0	`true`	Evaluates a CSV file with more than 50 rows
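To make the interaction between `--concurrency` and `--max-retries` concrete, here is a minimal Python sketch of the general pattern: a semaphore caps how many evaluations run at once, and each evaluation gets one initial attempt plus up to `max_retries` further attempts. It illustrates the idea only and is not the batch evaluator's actual implementation; `run_with_limits`, `make_flaky`, and the `"success"`/`"error"` labels are assumptions made for this example.

```python
import asyncio


async def run_with_limits(tasks, concurrency=3, max_retries=2):
    """Run zero-argument coroutine functions with a parallelism cap and retries.

    Returns one (status, value) pair per task, in input order.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(task):
        last_error = None
        async with semaphore:  # at most `concurrency` tasks hold the semaphore
            for _attempt in range(max_retries + 1):  # first try plus retries
                try:
                    return ("success", await task())
                except Exception as exc:
                    last_error = exc
        return ("error", str(last_error))

    return await asyncio.gather(*(run_one(t) for t in tasks))


def make_flaky(failures):
    """Build a fake evaluation that fails `failures` times, then succeeds."""
    state = {"left": failures}

    async def evaluate():
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("simulated failure")
        return "ok"

    return evaluate


results = asyncio.run(run_with_limits([make_flaky(0), make_flaky(2), make_flaky(3)]))
print(results)
# prints [('success', 'ok'), ('success', 'ok'), ('error', 'simulated failure')]
```

With the default of 2 retries, a task that fails twice still succeeds on its third attempt, while a task that fails three times is reported as an error without stopping the others.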

Results

You’ll see a real-time display of the batch evaluator’s progress:

Processing evaluations...
████████████░░░░░░░░ 60% (30/50)
  ✓ grade-level-appropriateness: 6/10 successful
  ✓ subject-matter-knowledge: 6/10 successful
  ✓ vocabulary: 6/10 successful
  ✓ sentence-structure: 6/10 successful
  ⏳ conventionality: 6/10 successful

⏱  Elapsed: 2m 15s | Estimated remaining: 1m 30s

The batch evaluator generates two files in your output directory:

batch-results-2024-02-07_14-30-22/
├── results.csv
└── results.html

results.csv

Spreadsheet-compatible format
Original CSV columns preserved
New CSV columns for each evaluator
- {evaluator}_score
- {evaluator}_reasoning
- {evaluator}_status

results.html

Summary dashboard with grade-level distribution and text complexity charts
Scores and reasoning for each evaluator

If an evaluation still fails after all retries, only the affected row errors out. The batch evaluator continues with the remaining rows and reports each failure in the results with an error status.
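Because each evaluator adds a `{evaluator}_status` column, you can tally outcomes from results.csv with a few lines of Python. The sketch below makes no assumption about which status values the tool writes; it simply counts whatever appears. The helper name `status_counts` is hypothetical, and the sample data, including the `vocabulary_` column names and the `success`/`error` values, is made up for illustration.

```python
import csv
import io
from collections import Counter


def status_counts(results_csv_text):
    """Count status values for every column whose name ends in `_status`."""
    reader = csv.DictReader(io.StringIO(results_csv_text))
    counts = {}
    for row in reader:
        for column, value in row.items():
            # `column` is None for stray extra fields, so guard against it.
            if column and column.endswith("_status"):
                counts.setdefault(column, Counter())[value] += 1
    return counts


# Made-up sample; real column names and status values may differ.
sample = """text,grade,vocabulary_score,vocabulary_status
"The cat sat on the mat.",3,2,success
"Short text.",5,,error
"""
print(status_counts(sample))
```

For a real run, read the file with `open("results.csv", newline="", encoding="utf-8").read()` and pass the contents to `status_counts`.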

Graceful shutdown

If you press Ctrl+C during evaluation:

In-flight evaluations finish processing
Pending tasks are cancelled
Completed results are saved to results-partial.* files to preserve progress

⚠️  Shutdown requested. Saving partial results...
   (Press Ctrl+C again to force quit)

✓ Saved 15 results to:
  ./batch-results-2024-02-07_14-30-22/
    ├── results-partial.csv
    └── results-partial.html

If you press Ctrl+C a second time to force quit immediately, in-flight results may be lost.
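The two-stage Ctrl+C behavior follows a common pattern, sketched below in Python. This is an illustration under simplifying assumptions (tasks run one at a time, and the interrupt is simulated by calling the handler directly), not the batch evaluator's own code; `ShutdownController` and `run_until_shutdown` are hypothetical names.

```python
class ShutdownController:
    """First interrupt requests a graceful stop; a second one forces exit."""

    def __init__(self):
        self.requested = False

    def handle_sigint(self, signum=None, frame=None):
        if self.requested:
            # Second Ctrl+C: stop immediately, abandoning in-flight work.
            raise KeyboardInterrupt("force quit")
        self.requested = True


def run_until_shutdown(tasks, controller):
    """Run tasks in order, skipping the rest once shutdown is requested."""
    completed = []
    for task in tasks:
        if controller.requested:
            break  # pending tasks are cancelled
        completed.append(task())  # a task already running still finishes
    return completed


controller = ShutdownController()


def interrupted_task():
    controller.handle_sigint()  # simulate Ctrl+C arriving mid-task
    return "second"


done = run_until_shutdown(
    [lambda: "first", interrupted_task, lambda: "third"], controller
)
print(done)  # prints ['first', 'second']
```

In a real program the handler would be registered with `signal.signal(signal.SIGINT, controller.handle_sigint)`, and the completed results would be written out as partial files before exiting.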
