Learn more about the core concepts of our AI testing platform.

Agents

An AI system that takes actions and makes decisions

  • Target Agent The AI agent you want to test
  • Maihem Agent An AI agent that simulates interactions with your Target Agent

Workflow

A sequence of workflow steps

  • Workflow step An operation with defined input and output formats

Environment

The context in which the codebase is executed (production, dev, local, etc.)

  • Revision A unique label for the state of the codebase

Interaction

A particular realized sequence of transactions between a target agent and a user

  • Conversation A sequence of messages between a target agent and a user

Message

A unit of text sent by the user or the target agent. When a target agent generates a message, Maihem collects:

  • Trace A particular sequence of spans
  • Span A particular realization of a workflow step
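The trace/span relationship above can be sketched as simple data structures. This is purely illustrative: the class and field names below are assumptions for the sake of example, not the actual Maihem SDK.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: names are assumptions, not the Maihem API.
# A span records one realization of a workflow step; a trace is the
# ordered sequence of spans produced while the target agent
# generated a single message.

@dataclass
class Span:
    workflow_step: str   # name of the workflow step this span realizes
    input: str           # input received by the step
    output: str          # output produced by the step

@dataclass
class Trace:
    message_id: str
    spans: List[Span] = field(default_factory=list)

# One message whose generation involved two workflow steps
trace = Trace(
    message_id="msg-001",
    spans=[
        Span("retrieve_context", "user question", "retrieved docs"),
        Span("generate_answer", "retrieved docs", "final answer"),
    ],
)
print(len(trace.spans))  # 2
```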

Metric

A quantitative measure used for tracking and comparing performance

  • Criteria A statement that can be falsified with certainty – used to flag failures

Evaluator

A connector between workflow steps and metrics that maps the required inputs and outputs

Evaluation

A judgement on a criteria using a metric. It contains:

  • Score A numerical value against the metric
  • Is failed A boolean value with a pass/fail judgement
  • Explanation A string with the details behind the judgement
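The three fields of an evaluation can be sketched as a small record. This is a hypothetical sketch: the names and the pass threshold are assumptions for illustration, not the Maihem API.

```python
from dataclasses import dataclass

# Illustrative only: names and threshold are assumptions, not the Maihem API.
@dataclass
class Evaluation:
    score: float        # numerical value against the metric
    is_failed: bool     # pass/fail judgement
    explanation: str    # details behind the judgement

THRESHOLD = 0.7  # hypothetical pass threshold for the metric

def evaluate(score: float, explanation: str) -> Evaluation:
    # The criteria is falsified (failure flagged) when the score
    # falls below the threshold.
    return Evaluation(score=score, is_failed=score < THRESHOLD,
                      explanation=explanation)

result = evaluate(0.45, "Answer omitted the required disclaimer")
print(result.is_failed)  # True: score below threshold
```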

Test

A configuration of a procedure used to evaluate a workflow or workflow step of a target agent. It maps to a specific evaluator and can be conducted with:

  • Dataset Uploaded data with inputs for the workflow step, and optional expected outputs (ground truth)
  • Maihem agents Simulated and dynamic inputs for the workflow step, and optional expected outputs (ground truth)
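The two ways of conducting a test can be sketched as a single configuration object. This is an illustrative sketch only: every name below is an assumption, not the actual Maihem API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative only: names are assumptions, not the Maihem API.
# A test maps a workflow step to a specific evaluator; its inputs come
# either from an uploaded dataset or from simulated Maihem agents.

@dataclass
class DatasetRow:
    input: str
    expected_output: Optional[str] = None  # optional ground truth

@dataclass
class TestConfig:
    name: str
    evaluator: str                  # the specific evaluator this test maps to
    workflow_step: str              # workflow step of the target agent under test
    dataset: Optional[List[DatasetRow]] = None  # None: inputs simulated by Maihem agents

# Conducted with a dataset: fixed inputs plus expected outputs
dataset_test = TestConfig(
    name="faq-accuracy",
    evaluator="answer_correctness",
    workflow_step="generate_answer",
    dataset=[DatasetRow("What is your refund policy?", "30-day refunds")],
)

# Conducted with Maihem agents: inputs generated dynamically
agent_test = TestConfig(
    name="adversarial-probing",
    evaluator="safety",
    workflow_step="generate_answer",
)
print(dataset_test.dataset is None, agent_test.dataset is None)  # False True
```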

Test Run

A particular execution of a test, used to ensure and compare quality levels among different versions and environments of the target agent. It contains:

  • Interactions
  • Evaluations
  • Detected failures