Early Release
This evaluator reflects early-stage work. We’re continuously improving its accuracy and reliability.
Requirements
To run the Vocabulary evaluator, you’ll need the following:
- System and user prompts that create a baseline assumption for a student’s background knowledge.
- System and user prompts that measure vocabulary complexity.
- A model and temperature setting for each stage of the evaluator:
  - Stage 1 (background knowledge): Model: GPT-4o, Temp: 0
  - Stage 2 (vocabulary complexity): Model: Gemini-2.5-pro, Temp: 0
Running the evaluator
1. Make an API call to GPT-4o using the background knowledge prompts. This returns a string representation of the background knowledge that students in the target grade level are likely to have.
2. Using the background knowledge produced in step 1, make another API call to Gemini-2.5-pro using the complexity prompts.
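The two calls described above can be sketched as a small chaining function. This is a minimal illustration, not the evaluator's actual code: the function name, the `prompts` dictionary keys, and the `{grade_level}`, `{passage}`, and `{background_knowledge}` placeholders are all hypothetical, and the two model callables stand in for whatever client code you use to reach GPT-4o and Gemini-2.5-pro at temperature 0.

```python
from typing import Callable, Dict

# Hypothetical signature: (system_prompt, user_prompt) -> response text.
# Each callable is assumed to wrap one model at temperature 0.
ModelCall = Callable[[str, str], str]


def run_vocabulary_evaluator(
    passage: str,
    grade_level: int,
    call_background_model: ModelCall,   # e.g. a wrapper around GPT-4o
    call_complexity_model: ModelCall,   # e.g. a wrapper around Gemini-2.5-pro
    prompts: Dict[str, str],            # hypothetical prompt templates
) -> str:
    """Run both stages and return the complexity model's raw response."""
    # Stage 1: ask for the background knowledge that students in the
    # target grade level are likely to have.
    background = call_background_model(
        prompts["background_system"],
        prompts["background_user"].format(
            grade_level=grade_level, passage=passage
        ),
    )

    # Stage 2: pass the stage 1 output into the complexity prompt.
    return call_complexity_model(
        prompts["complexity_system"],
        prompts["complexity_user"].format(
            grade_level=grade_level,
            passage=passage,
            background_knowledge=background,
        ),
    )
```

Passing the model calls in as parameters keeps the sketch independent of any particular SDK and makes it easy to substitute stubs when testing the flow.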
Choosing test passages
We tested with texts that range between 130 and 205 words. We recommend evaluating passages that fall within this range.
Run multiple times
We recommend running the evaluator 3 times for each text passage to smooth over any errors the LLM might make.
If you are working with text intended for multiple grade levels, you may want to run the evaluator a few times using different grade levels.
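As a rough sketch of these recommendations, the helpers below check that a passage falls inside the tested word range and run an evaluation function three times. The names are hypothetical, `evaluate` stands in for whatever function runs the evaluator once and returns a label, and combining the runs by majority vote is an assumption: the guidance above does not prescribe how to reconcile differing results.

```python
from collections import Counter
from typing import Callable, List, Tuple

MIN_WORDS, MAX_WORDS = 130, 205   # word range the evaluator was tested with
RUNS_PER_PASSAGE = 3              # recommended number of runs per passage


def in_tested_range(passage: str) -> bool:
    """Return True if the whitespace-delimited word count is 130 to 205."""
    return MIN_WORDS <= len(passage.split()) <= MAX_WORDS


def evaluate_repeatedly(
    passage: str,
    grade_level: int,
    evaluate: Callable[[str, int], str],  # hypothetical single-run evaluator
    runs: int = RUNS_PER_PASSAGE,
) -> Tuple[str, List[str]]:
    """Run the evaluator several times; return (most common label, all labels).

    Majority vote is an assumed aggregation rule. If every run disagrees,
    the first run's label is returned, so inspect the full list as well.
    """
    results = [evaluate(passage, grade_level) for _ in range(runs)]
    label, _count = Counter(results).most_common(1)[0]
    return label, results
```

For text intended for several grade levels, the same helper can be called once per grade level, for example `{g: evaluate_repeatedly(passage, g, evaluate) for g in (3, 4, 5)}`.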