What you need
Before you begin, make sure you have:
- An API key from the model provider.
- A Python workspace, the Evaluators Playground, or the appropriate SDK.
- The text you want to evaluate.
- Required context (if applicable): Inputs such as grade level or intended audience.
What you’ll do
STEP 1: Choose the content to evaluate
- Select the content you want to evaluate.
Make sure the content:
- Matches the evaluator’s intended content type (e.g., informational text or conversation output)
- Falls within documented length and format constraints
- Does not include personal or sensitive data
- Prepare the inputs required by the evaluator. Refer to the evaluator’s documentation page for the exact requirements. These usually include:
- The content to evaluate
- Any required contextual parameters, like the intended grade level
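The preparation above amounts to collecting the content and its contextual parameters into a single payload. A minimal sketch, assuming hypothetical field names (`content`, `grade_level`) — the actual parameter names and schema come from the evaluator’s documentation page:

```python
from typing import Optional

def build_evaluator_inputs(content: str, grade_level: Optional[str] = None) -> dict:
    """Collect the content to evaluate plus any required contextual parameters.

    The field names here are illustrative, not a fixed schema; use the
    parameters listed on the evaluator's documentation page.
    """
    if not content.strip():
        raise ValueError("Content to evaluate must not be empty.")
    inputs = {"content": content}
    if grade_level is not None:
        inputs["grade_level"] = grade_level
    return inputs

payload = build_evaluator_inputs(
    "Photosynthesis converts light into chemical energy.",
    grade_level="5",
)
```

Validating the payload before running the evaluator makes missing or empty inputs fail fast, rather than producing a confusing evaluator result.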
STEP 2: Run the evaluator
- Run the evaluator using:
- The provided prompts
- The recommended LLM model listed on the evaluator’s documentation page
When creating or validating your prompts, we recommend running the prompt three times and aggregating the results with a simple majority rule to improve accuracy. When building the prompt into your production code, running it once is usually sufficient.
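The run-three-times-and-aggregate recommendation can be sketched as follows. Here `run_evaluator` is a placeholder for however you invoke the evaluator (Playground, SDK, or a direct API call) with the provided prompt and the recommended model — substitute your own call:

```python
from collections import Counter

def run_evaluator(payload: dict) -> str:
    # Placeholder verdict; a real implementation would send the provided
    # prompt plus the payload to the recommended model and parse its response.
    return "pass"

def evaluate_with_majority(payload: dict, runs: int = 3, evaluator=run_evaluator) -> str:
    """Run the evaluator several times and return the most common verdict."""
    verdicts = [evaluator(payload) for _ in range(runs)]
    return Counter(verdicts).most_common(1)[0][0]
```

A simple majority over three runs smooths out occasional inconsistent verdicts from the model; for production, calling the evaluator once (`runs=1`) avoids the extra cost.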
STEP 3: Review the results
- Review the evaluator output based on the interpretation guidelines on the evaluator’s documentation page.
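When the evaluator returns a numeric score, reviewing the output usually means mapping it to the interpretation bands on the evaluator’s documentation page. A hypothetical sketch — the thresholds and labels below are made up for illustration; use the ones the evaluator actually documents:

```python
def interpret_score(score: float) -> str:
    """Map a 0-1 evaluator score to an interpretation band.

    Thresholds here are illustrative assumptions; substitute the
    interpretation guidelines from the evaluator's documentation page.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("Expected a score between 0 and 1.")
    if score >= 0.8:
        return "meets criteria"
    if score >= 0.5:
        return "borderline - review manually"
    return "does not meet criteria"
```

Keeping the thresholds in one function makes it easy to update them when the evaluator’s guidelines change, without touching the rest of your pipeline.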