All evaluators follow this workflow, even though their inputs and outputs may differ. For evaluator-specific setup, inputs, prompts, and interpretation, see the documentation for the evaluator you want to run.

What you need

Before you begin, make sure you have:
  • An API key from the model provider
  • A workspace to run the evaluator in
  • The text or conversational output you want to evaluate
  • Required context (if applicable): inputs such as grade level or intended audience that the evaluator requires
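
To illustrate the first prerequisite, the API key is typically exported as an environment variable and read at runtime rather than hard-coded; the variable name in this sketch is hypothetical, so use whatever your provider and deployment environment expect:

```python
import os

# Hypothetical variable name; substitute the one your model provider
# documents. Reading the key from the environment keeps secrets out of
# source control.
API_KEY = os.environ["MODEL_PROVIDER_API_KEY"]
```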

What you’ll do

STEP 1: Choose the content to evaluate

  1. Select the content you want to evaluate. Make sure the content:
    • Matches the evaluator’s intended content type (e.g. informational text or conversation output)
    • Falls within documented length and format constraints
    • Does not include personal or sensitive data
  2. Prepare the inputs the evaluator requires; the evaluator's documentation page lists them. These usually include:
    • The content to evaluate
    • Any required contextual parameters, such as the intended grade level (one way to assemble these is sketched after this list)
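
As a minimal sketch of assembling those inputs: the field names below (content, grade_level) are illustrative, not part of any evaluator's documented schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EvaluatorInput:
    """Illustrative container for one evaluation request.

    Field names are hypothetical; substitute the parameter names
    documented for your evaluator.
    """
    content: str                       # the text or conversation to evaluate
    grade_level: Optional[str] = None  # example of a contextual parameter

example = EvaluatorInput(
    content="Photosynthesis is the process by which plants...",
    grade_level="Grade 5",
)
```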

STEP 2: Run the evaluator

  1. Run the evaluator using the prompts and the recommended model listed on the evaluator's documentation page.
When creating or validating your prompts, we recommend running the prompt three times and aggregating the results with a simple majority rule to improve accuracy; a sketch of this aggregation follows. Once the prompt is built into your production code, running it once is sufficient.
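
A minimal sketch of that majority-rule aggregation, assuming the evaluator returns a single categorical label per run; call_evaluator is a placeholder for however you send the prompt to the recommended model:

```python
from collections import Counter

def call_evaluator(content: str) -> str:
    """Placeholder: send the evaluator prompt plus `content` to the
    recommended model and return its label (e.g. "pass" or "fail")."""
    raise NotImplementedError

def evaluate_with_majority(content: str, runs: int = 3) -> str:
    """Run the evaluator `runs` times and return the most common label."""
    labels = [call_evaluator(content) for _ in range(runs)]
    return Counter(labels).most_common(1)[0][0]
```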

STEP 3: Review the results

  1. Review the evaluator's output using the interpretation guidelines on its documentation page.
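
If the evaluator's guidelines define a numeric threshold, part of this review can be automated; the score range and threshold below are assumptions for illustration, not values from any evaluator's documentation:

```python
def interpret(score: float, threshold: float = 0.7) -> str:
    # Assumption: the evaluator emits a score in [0, 1] and its
    # documentation defines the passing bar; 0.7 is illustrative only.
    return "meets bar" if score >= threshold else "needs review"
```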