What you need
Before you begin, make sure you have:

- An API key from the model provider
- Access to a workspace
- The text or conversational output you want to evaluate
- Any required context (if applicable), such as the grade level or intended audience the evaluator expects
What you’ll do
STEP 1: Choose the content to evaluate

- Select the content you want to evaluate. Make sure the content:
  - Matches the evaluator's intended content type (e.g. informational text or conversation output)
  - Falls within the documented length and format constraints
  - Does not include personal or sensitive data
- Prepare the inputs the evaluator requires. Refer to the evaluator's documentation page for the full list. This usually includes:
  - The content to evaluate
  - Any required contextual parameters, such as the intended grade level
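The preparation step above can be sketched as follows. The field names (`content`, `grade_level`) and the 4,000-character limit are assumptions for illustration; substitute the inputs and constraints listed on your evaluator's documentation page.

```python
# Minimal sketch: validate the content and package it with its required
# context. The 4,000-character limit is an assumed documented constraint.
MAX_CHARS = 4000

def build_evaluator_inputs(content: str, grade_level: str) -> dict:
    """Validate the content and bundle it with its contextual parameters."""
    if not content.strip():
        raise ValueError("Content is empty.")
    if len(content) > MAX_CHARS:
        raise ValueError(f"Content exceeds the {MAX_CHARS}-character limit.")
    return {
        "content": content,          # the text to evaluate
        "grade_level": grade_level,  # required contextual parameter (assumed)
    }

inputs = build_evaluator_inputs(
    "Photosynthesis converts light into chemical energy.", "Grade 5"
)
print(inputs["grade_level"])
```

Checking the documented constraints before you call the model keeps invalid content from producing misleading evaluation results.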
STEP 2: Run the evaluator
- Run the evaluator using:
  - The provided prompts
  - The recommended LLM model

When you are creating or validating your prompts, we recommend running the prompt 3 times and aggregating the results using a simple majority rule to improve accuracy. Once you build your prompt into your production code, run it once per evaluation.
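The 3-run majority-rule aggregation can be sketched as below. The `call_evaluator` function is a stand-in for however you invoke the recommended model with the provided prompt; only the voting logic is shown concretely.

```python
# Sketch of running the evaluator prompt 3 times and aggregating the
# resulting labels with a simple majority rule.
from collections import Counter

def call_evaluator(prompt: str, inputs: dict) -> str:
    """Placeholder: replace with your model provider's API call."""
    raise NotImplementedError("Wire this up to your provider's SDK.")

def majority_vote(labels: list[str]) -> str:
    """Return the most common label across the runs."""
    return Counter(labels).most_common(1)[0][0]

# During prompt creation/validation, run 3 times and aggregate:
#   labels = [call_evaluator(prompt, inputs) for _ in range(3)]
#   result = majority_vote(labels)
print(majority_vote(["pass", "fail", "pass"]))  # -> pass
```

A majority over 3 runs smooths out run-to-run variance in the model's judgments; a single run is cheaper and is usually sufficient once the prompt is validated and running in production.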
STEP 3: Review the results
- Review the evaluator's output against the interpretation guidelines on its documentation page.
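Applying the interpretation guidelines can look like the sketch below. The 1–4 scale and the "3 or above is acceptable" threshold are assumptions for illustration; use the actual scale and thresholds from your evaluator's documentation page.

```python
# Sketch of mapping an evaluator score to an interpretation. The 1-4
# scale and the threshold of 3 are assumed, not taken from any specific
# evaluator's documentation.
def interpret(score: int, threshold: int = 3) -> str:
    """Translate a numeric evaluator score into a human-readable verdict."""
    if not 1 <= score <= 4:
        raise ValueError("Score outside the expected 1-4 range.")
    return "meets expectations" if score >= threshold else "needs revision"

print(interpret(4))  # -> meets expectations
print(interpret(2))  # -> needs revision
```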