v0.4.0
What you’ll do
Evaluate a batch of text from a CSV file using all literacy evaluators. Results are output in both CSV and HTML formats.
What you’ll need
- Install the SDK globally
- Create a CSV file with the text you want to evaluate
  - Must contain 50 or fewer input rows (unless using the --bypass-row-limit option)
  - Must have text and grade columns
  - May include additional columns (will be preserved as-is in the output)
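Before running the batch, it can help to check a CSV against the requirements above. The following is a minimal sketch, not part of the SDK: the function name check_csv and the sample data are hypothetical, and it only checks the column names and row limit stated above.

```python
import csv
import io

MAX_ROWS = 50  # row limit stated above; --bypass-row-limit lifts it
REQUIRED_COLUMNS = {"text", "grade"}


def check_csv(handle, max_rows=MAX_ROWS):
    """Return a list of problems found in the CSV; an empty list means it looks OK."""
    reader = csv.DictReader(handle)
    problems = []
    # The header row must contain both required columns; extra columns are fine.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        problems.append("missing columns: " + ", ".join(sorted(missing)))
    # Count data rows (the header is not counted).
    row_count = sum(1 for _ in reader)
    if row_count > max_rows:
        problems.append(f"{row_count} rows exceeds the limit of {max_rows}")
    return problems


# Hypothetical sample data, for illustration only.
sample = io.StringIO("text,grade,source\nThe cat sat on the mat.,2,demo\n")
print(check_csv(sample))  # []
```

To check a file on disk, pass an open file handle instead of the in-memory sample, for example `open("example.csv", newline="")`.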
example.csv
Running the batch evaluator
Run the batch evaluator using npx from any directory:
- CSV file path
- Google and OpenAI API keys
  - Copy and paste directly in the terminal window
  - Alternatively, provide as environment variables (GOOGLE_API_KEY and OPENAI_API_KEY, by default)
- Output directory
  - Defaults to a folder in the current directory with a human-readable timestamp (e.g. batch-results-2024-02-07_14-30-22/)
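As a rough illustration of two of the inputs above, the sketch below checks whether the default environment variables are set and builds a folder name in the same timestamp pattern as the example. The helper names are hypothetical and the naming logic is inferred from the single example shown, so treat it as an approximation rather than the SDK’s actual behaviour.

```python
import os
from datetime import datetime

KEY_NAMES = ("GOOGLE_API_KEY", "OPENAI_API_KEY")  # default names listed above


def missing_keys(env=os.environ):
    """Return the default API key variables that are unset or empty."""
    return [name for name in KEY_NAMES if not env.get(name)]


def default_output_dir(now):
    """Build a folder name matching the timestamp pattern in the example above."""
    return now.strftime("batch-results-%Y-%m-%d_%H-%M-%S") + "/"


# A plain dict stands in for the real environment here; the value is a placeholder.
print(missing_keys({"GOOGLE_API_KEY": "placeholder"}))  # ['OPENAI_API_KEY']
print(default_output_dir(datetime(2024, 2, 7, 14, 30, 22)))  # batch-results-2024-02-07_14-30-22/
```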
Options
Pass in options to override the batch evaluator’s defaults:

| Option | Default | Description |
|---|---|---|
| --concurrency <n> | 3 | Number of evaluations to run in parallel. If your provider and model allow higher rate limits, you can raise this value for faster execution |
| --max-retries <n> | 2 | Number of times to retry a failed evaluation |
| --no-telemetry | Telemetry is enabled | Disable telemetry data collection |
| --bypass-row-limit v0.6.0 | true | Evaluates a CSV file with more than 50 rows |
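To make the concurrency and retry options more concrete, here is a conceptual sketch of bounded parallelism with retries. It is not the batch evaluator’s implementation: run_batch, run_with_retries, and fake_evaluate are hypothetical names, and the sketch assumes that a retry count of 2 means up to three attempts in total.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_retries(task, item, max_retries=2):
    """Call task(item); on failure, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return {"status": "ok", "result": task(item), "attempts": attempt + 1}
        except Exception as exc:
            last_error = exc
    # Every attempt failed: report the error instead of raising,
    # so one bad row does not stop the rest of the batch.
    return {"status": "error", "result": str(last_error), "attempts": max_retries + 1}


def run_batch(task, items, concurrency=3, max_retries=2):
    """Evaluate items with at most `concurrency` tasks in flight, keeping input order."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda item: run_with_retries(task, item, max_retries), items))


def fake_evaluate(text):
    # Stand-in for a real evaluator call; always fails on empty text.
    if not text:
        raise ValueError("empty text")
    return len(text.split())


results = run_batch(fake_evaluate, ["one two", "", "three four five"])
print([r["status"] for r in results])  # ['ok', 'error', 'ok']
```

Raising the worker count shortens wall-clock time only while the provider’s rate limits keep up; past that point, extra parallelism mostly produces more retries.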
Results
You’ll see a real-time display of the batch evaluator’s progress:
results.csv
- Spreadsheet-compatible format
- Original CSV columns preserved
- New CSV columns for each evaluator
  - {evaluator}_score
  - {evaluator}_reasoning
  - {evaluator}_status
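Because the per-evaluator columns follow a fixed suffix pattern, the evaluator names can be recovered from the header of results.csv. The sketch below shows one way to do that; the helper name evaluator_names and the evaluator name "vocab" in the sample are made up for illustration.

```python
import csv
import io

SUFFIXES = ("_score", "_reasoning", "_status")


def evaluator_names(fieldnames):
    """Return evaluator names that have all three result columns in the header."""
    found = {}
    for column in fieldnames:
        for suffix in SUFFIXES:
            if column.endswith(suffix):
                # Strip the suffix to get the evaluator name and record which suffix was seen.
                found.setdefault(column[: -len(suffix)], set()).add(suffix)
    return sorted(name for name, seen in found.items() if seen == set(SUFFIXES))


# Hypothetical header: "vocab" is a made-up evaluator name.
sample = io.StringIO(
    "text,grade,vocab_score,vocab_reasoning,vocab_status\n"
    "Some text,3,placeholder,placeholder,placeholder\n"
)
reader = csv.DictReader(sample)
print(evaluator_names(reader.fieldnames))  # ['vocab']
```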
results.html
- Summary dashboard with grade-level distribution and text complexity charts
- Scores and reasoning for each evaluator
If an evaluation still fails after all retries, only that row is affected.
The batch evaluator skips the row, continues with the rest of the batch, and
reports the failure in the results with an error status.
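After a run, the status columns can be scanned to list the rows that need another look. This is a sketch only: the exact value written to a status column is not specified here, so the default "error" below is an assumption that you should adjust to match your own results.csv. The evaluator name and the other status value in the sample are also made up.

```python
import csv
import io


def failed_rows(handle, error_value="error"):
    """Yield (row_number, evaluator) for each *_status cell equal to error_value."""
    reader = csv.DictReader(handle)
    for row_number, row in enumerate(reader, start=1):
        for column, value in row.items():
            # `column` can be None if a row has more cells than the header.
            if column and column.endswith("_status") and value == error_value:
                yield row_number, column[: -len("_status")]


# Hypothetical results: the evaluator name and status values are made up.
sample = io.StringIO(
    "text,vocab_score,vocab_reasoning,vocab_status\n"
    "First passage,placeholder,placeholder,done\n"
    "Second passage,,,error\n"
)
print(list(failed_rows(sample)))  # [(2, 'vocab')]
```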
Graceful shutdown
If you press Ctrl+C during evaluation:
- In-flight evaluations finish processing
- Pending tasks are cancelled
- Completed results are saved to results-partial.* files to preserve progress
Press Ctrl+C twice to force quit immediately. You may lose in-flight results.
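The two-stage shutdown described above can be pictured with a small toy model. This is not the batch evaluator’s code: the BatchRun class and its methods are hypothetical, and interrupts are simulated with a method call rather than a real signal handler.

```python
from collections import deque


class BatchRun:
    """Toy model of a batch with the two-stage Ctrl+C behaviour described above."""

    def __init__(self, tasks, concurrency=3):
        self.pending = deque(tasks)
        self.in_flight = []
        self.completed = []
        self.cancelled = []
        self.lost = []
        self.concurrency = concurrency
        self.interrupts = 0

    def start_next(self):
        """Move pending tasks into flight, up to the concurrency limit."""
        while self.pending and len(self.in_flight) < self.concurrency:
            self.in_flight.append(self.pending.popleft())

    def interrupt(self):
        """First call cancels pending tasks; a second call also drops in-flight ones."""
        self.interrupts += 1
        self.cancelled.extend(self.pending)
        self.pending.clear()
        if self.interrupts >= 2:
            self.lost.extend(self.in_flight)
            self.in_flight.clear()

    def finish_in_flight(self):
        """Complete whatever is currently in flight."""
        self.completed.extend(self.in_flight)
        self.in_flight.clear()


run = BatchRun(["a", "b", "c", "d", "e"])
run.start_next()        # a, b, c are in flight; d, e are pending
run.interrupt()         # first Ctrl+C: d and e are cancelled
run.finish_in_flight()  # a, b, c still complete and could be saved as partial results
print(run.completed, run.cancelled)  # ['a', 'b', 'c'] ['d', 'e']
```

Calling interrupt() a second time before finish_in_flight() moves the in-flight tasks to lost, which mirrors the warning about force quitting.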