Core concepts
Evaluation
An Evaluation is an assessment of the performance of a Target Agent on a Test Run.
There is an Evaluation for each Metric defined in a Test.
An Evaluation can apply to one of three levels:
- Conversation
- Message
- Sentence
An Evaluation consists of the following elements:
- Result: Denotes whether the evaluation passed or failed.
- Score: A number between 0 (lowest performance) and 1 (highest performance).
- Explanation: The reason behind a Score and Result, provided in natural language.
- Confidence: A number between 0 (lowest confidence) and 1 (highest confidence) that the Score is accurate.
- Classification: TBC
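
The levels and elements listed above can be pictured as a small record type. The following is only an illustrative sketch, not the product's actual schema or API: the class names (`Level`, `Result`, `Evaluation`), the field names, and the `metric` field are all assumptions made for this example. Classification is marked TBC in this document, so it is left as an optional, unset field.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Level(Enum):
    """The three levels an Evaluation can apply to (hypothetical names)."""
    CONVERSATION = "conversation"
    MESSAGE = "message"
    SENTENCE = "sentence"


class Result(Enum):
    """Whether the evaluation passed or failed (hypothetical names)."""
    PASSED = "passed"
    FAILED = "failed"


@dataclass(frozen=True)
class Evaluation:
    """Hypothetical model of one Evaluation for one Metric in a Test Run."""
    metric: str        # assumed: name of the Metric this Evaluation belongs to
    level: Level       # Conversation, Message, or Sentence
    result: Result     # passed or failed
    score: float       # 0 (lowest performance) to 1 (highest performance)
    explanation: str   # natural-language reason behind the Score and Result
    confidence: float  # 0 (lowest) to 1 (highest) confidence that the Score is accurate
    classification: Optional[str] = None  # TBC in the source; left unset here

    def __post_init__(self) -> None:
        # Score and Confidence are both defined on the closed range [0, 1].
        for name in ("score", "confidence"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


# Usage: a message-level Evaluation for a made-up "helpfulness" Metric.
example = Evaluation(
    metric="helpfulness",
    level=Level.MESSAGE,
    result=Result.PASSED,
    score=0.9,
    explanation="The reply answered the user's question directly.",
    confidence=0.8,
)
```

Validating the 0 to 1 range at construction time is a design choice of this sketch; it simply makes the bounds stated in the list explicit.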