Core concepts
Learn more about the core concepts of our AI testing platform.
Agents
An AI system that takes actions and makes decisions
Target Agent
The AI agent you want to test
Maihem Agent
An AI agent that simulates interactions with your Target Agent
Workflow
A sequence of workflow steps
Workflow step
An operation with defined input and output formats
Environment
The context in which the codebase is executed (production, dev, local, etc.)
Revision
A unique label for the state of the codebase
Interaction
A particular realized sequence of transactions between a target agent and a user
Conversation
A sequence of messages between a target agent and a user
Message
A corpus of text sent by the user or the target agent. When a target agent generates a message, Maihem collects:
Trace
A particular sequence of spans
Span
A particular realization of a workflow step
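The workflow, workflow step, trace, and span concepts above can be sketched as a simple data model. This is a minimal illustration, not the platform's actual API; all class and field names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowStep:
    """An operation with defined input and output formats (illustrative)."""
    name: str

@dataclass
class Span:
    """A particular realization of a workflow step."""
    step: WorkflowStep
    input: str
    output: str

@dataclass
class Trace:
    """A particular sequence of spans."""
    spans: list[Span] = field(default_factory=list)

# One trace realizes a two-step workflow: retrieve context, then generate a reply.
retrieve = WorkflowStep(name="retrieve")
generate = WorkflowStep(name="generate")
trace = Trace(spans=[
    Span(step=retrieve, input="user question", output="context docs"),
    Span(step=generate, input="context docs", output="agent reply"),
])
```

The point of the sketch is the containment relationship: a workflow is made of steps (templates), while a trace is made of spans (realized executions of those steps).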
Metric
A quantitative measure used for tracking and comparing performance
Criteria
A statement that can be falsified with certainty, used to flag failures
Evaluator
A connector between workflow steps and metrics that maps required inputs and outputs
Evaluation
A qualitative judgement on a criterion using a metric. It contains:
Score
A numerical value against the metric
Is failed
A boolean value with a pass/fail judgement
Explanation
A string with details behind the judgement
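The three fields of an evaluation can be summarized in a small sketch. The field names (`score`, `is_failed`, `explanation`) mirror the terms above but are assumptions about shape, not the platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    """Hypothetical shape of one evaluation result (field names assumed)."""
    score: float       # numerical value against the metric
    is_failed: bool    # boolean pass/fail judgement
    explanation: str   # details behind the judgement

e = Evaluation(
    score=0.92,
    is_failed=False,
    explanation="Response matched the expected output.",
)
```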
Test
A configuration of a procedure, used to evaluate a workflow or workflow step of a target agent. It maps to a specific evaluator. A test can be conducted with:
Dataset
Uploaded data with inputs for the workflow step, and optional expected outputs (ground truth)
Maihem agents
Simulated and dynamic inputs for the workflow step, and optional expected outputs (ground truth)
Test Run
A particular execution of a test, used to ensure and compare quality across different versions and environments of the target agent. It contains:
Interactions
Evaluations
Detected failures
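A test run as described above is a container tying one execution to a revision and an environment. A minimal sketch, assuming illustrative field names not confirmed by the platform:

```python
from dataclasses import dataclass, field

@dataclass
class TestRun:
    """Hypothetical container for one execution of a test (field names assumed)."""
    test_name: str
    revision: str       # unique label for the state of the codebase under test
    environment: str    # e.g. "production", "dev", "local"
    interactions: list = field(default_factory=list)
    evaluations: list = field(default_factory=list)
    detected_failures: list = field(default_factory=list)

run = TestRun(test_name="checkout-flow", revision="v1.2.0", environment="dev")
run.evaluations.append({"score": 0.8, "is_failed": False})
```

Pinning each run to a revision and environment is what makes quality comparable across versions of the target agent.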