Learn more about the core concepts of our AI testing platform.

Agents

An AI system that takes actions and makes decisions

  • Target Agent The AI agent you want to test
  • Maihem Agent An AI agent that simulates interactions with your Target Agent

Workflow

A sequence of workflow steps

  • Workflow step An operation with defined input and output formats

Environment

The context in which the codebase is executed (production, dev, local, etc.)

  • Revision A unique label for the state of the codebase

Interaction

A particular realized sequence of transactions between a target agent and a user

  • Conversation A sequence of messages between a target agent and a user

Message

A unit of text sent by the user or the target agent. When a target agent generates a message, Maihem collects:

  • Trace A particular sequence of spans
  • Span A particular realization of a workflow step
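The trace/span relationship above can be sketched as simple data structures. This is purely illustrative: the class and field names below are assumptions for the sake of example, not the actual Maihem SDK.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative only: names are assumptions, not the Maihem API.
# A span records one realization of a workflow step; a trace is the
# ordered sequence of spans produced while the target agent
# generated a single message.

@dataclass
class Span:
    workflow_step: str   # name of the workflow step this span realizes
    input: str           # input received by the step
    output: str          # output produced by the step

@dataclass
class Trace:
    message_id: str
    spans: List[Span] = field(default_factory=list)

# One message whose generation involved two workflow steps
trace = Trace(
    message_id="msg-001",
    spans=[
        Span("retrieve_context", "user question", "retrieved docs"),
        Span("generate_answer", "retrieved docs", "final answer"),
    ],
)
print(len(trace.spans))  # 2
```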

Metric

A quantitative measure used for tracking and comparing performance

  • Criteria A statement that can be falsified with certainty – used to flag failures

Evaluator

A connector between workflow steps and metrics that maps the required inputs and outputs

Evaluation

A judgement on a criteria using a metric. It contains:

  • Score A numerical value against the metric
  • Is failed A boolean value with a pass/fail judgement
  • Explanation A string with the details behind the judgement
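The three fields of an evaluation can be sketched as a small record. This is a hypothetical sketch: the names and the pass threshold are assumptions for illustration, not the Maihem API.

```python
from dataclasses import dataclass

# Illustrative only: names and threshold are assumptions, not the Maihem API.
@dataclass
class Evaluation:
    score: float        # numerical value against the metric
    is_failed: bool     # pass/fail judgement
    explanation: str    # details behind the judgement

THRESHOLD = 0.7  # hypothetical pass threshold for the metric

def evaluate(score: float, explanation: str) -> Evaluation:
    # The criteria is falsified (failure flagged) when the score
    # falls below the threshold.
    return Evaluation(score=score, is_failed=score < THRESHOLD,
                      explanation=explanation)

result = evaluate(0.45, "Answer omitted the required disclaimer")
print(result.is_failed)  # True: score below threshold
```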

Test

A configuration of a procedure used to evaluate a workflow or workflow step of a target agent. It maps to a specific evaluator and can be conducted with:

  • Dataset Uploaded data with inputs for the workflow step, and optional expected outputs (ground truth)
  • Maihem agents Simulated and dynamic inputs for the workflow step, and optional expected outputs (ground truth)
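The two ways of conducting a test can be sketched as a single configuration object. This is an illustrative sketch only: every name below is an assumption, not the actual Maihem API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative only: names are assumptions, not the Maihem API.
# A test maps a workflow step to a specific evaluator; its inputs come
# either from an uploaded dataset or from simulated Maihem agents.

@dataclass
class DatasetRow:
    input: str
    expected_output: Optional[str] = None  # optional ground truth

@dataclass
class TestConfig:
    name: str
    evaluator: str                  # the specific evaluator this test maps to
    workflow_step: str              # workflow step of the target agent under test
    dataset: Optional[List[DatasetRow]] = None  # None: inputs simulated by Maihem agents

# Conducted with a dataset: fixed inputs plus expected outputs
dataset_test = TestConfig(
    name="faq-accuracy",
    evaluator="answer_correctness",
    workflow_step="generate_answer",
    dataset=[DatasetRow("What is your refund policy?", "30-day refunds")],
)

# Conducted with Maihem agents: inputs generated dynamically
agent_test = TestConfig(
    name="adversarial-probing",
    evaluator="safety",
    workflow_step="generate_answer",
)
print(dataset_test.dataset is None, agent_test.dataset is None)  # False True
```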

Test Run

A particular execution of a test, used to ensure and compare quality levels among different versions and environments of the target agent. It contains:

  • Interactions
  • Evaluations
  • Detected failures