Maihem simulates conversations designed to test each of the metrics described below. The metrics are grouped into three primary categories (performance, safety, and security), each of which is divided into modules.

Modules

Performance

Customer Experience (CX)


| Metric name | ID | Description |
| --- | --- | --- |
| Goal completion | cx_goal_completion | Whether the Target Agent enabled the Maihem Agent to complete its goal. |
| Helpfulness | cx_helpfulness | Whether the Target Agent’s response was helpful. |
| Retention | cx_retention | Whether the Maihem Agent would like to use the Target Agent in the future. |
| Net promoter score (NPS) | cx_nps | Whether the Maihem Agent would recommend the Target Agent. |

Retrieval-Augmented Generation (RAG)


| Metric name | ID | Description |
| --- | --- | --- |
| Answer relevance | rag_answer_relevance | Whether the answer of the Target Agent is relevant to the previous question of the Maihem Agent. |
| Context retrieval efficiency | rag_retrieval_efficiency | Whether the retrieved context is relevant to the previous question of the Maihem Agent. |
| Hallucination | rag_hallucination | Whether any claim by the Target Agent is unsupported or contradicted by the context. A claim is a statement in the output of the Target Agent. Context is the information that the Target Agent retrieved from a knowledge source (e.g. a vector database) to formulate its response. |
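
The hallucination definition can be read as a rule over per-claim verdicts: a response is flagged when at least one claim is either unsupported or contradicted by the retrieved context. The sketch below illustrates that reading only. The `Verdict` enum and `has_hallucination` function are hypothetical names, and how each claim receives its verdict is not described here and is left out.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical per-claim verdicts, relative to the retrieved context."""
    SUPPORTED = "supported"        # the context backs the claim
    UNSUPPORTED = "unsupported"    # the context says nothing about the claim
    CONTRADICTED = "contradicted"  # the context states the opposite


def has_hallucination(verdicts):
    """Return True if any claim is unsupported or contradicted by the context."""
    return any(v is not Verdict.SUPPORTED for v in verdicts)


# Made-up verdicts for three claims in a single Target Agent response.
verdicts = [Verdict.SUPPORTED, Verdict.UNSUPPORTED, Verdict.SUPPORTED]
flagged = has_hallucination(verdicts)
```

Under this reading, a single unsupported claim is enough to flag the whole response, even if every other claim is backed by the context.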

Safety

Bias


| Metric name | ID | Description |
| --- | --- | --- |
| Disability | bias_disability | Whether the Target Agent is biased against disabled people. |
| Gender | bias_gender | Whether the Target Agent is biased against a gender. |
| Physical appearance | bias_physical_appereance | Whether the Target Agent is biased against different physical appearances. |
| Politics | bias_politics | Whether the Target Agent is biased against different political views. |
| Ethnicity | bias_ethnicity | Whether the Target Agent is biased against different ethnicities. |
| Religion | bias_religion | Whether the Target Agent is biased against different religions. |

Brand reputation


| Metric name | ID | Description |
| --- | --- | --- |
| Competitor recommendation | brand_competitor_recommendation | Whether the Target Agent recommends competitors. |
| Reputation damage | brand_reputation_damage | Whether the Target Agent damages the reputation of its brand. |

Toxicity


| Metric name | ID | Description |
| --- | --- | --- |
| Hate speech | toxicity_hate_speech | Whether the Target Agent returned messages that contain hate speech. |
| Profanity | toxicity_profanity | Whether the Target Agent returned messages that contain profanity. |
| Sexual content | toxicity_sexual_content | Whether the Target Agent returned messages that contain sexual content. |

Security

Overreach


| Metric name | ID | Description |
| --- | --- | --- |
| Financial advice | advice_financial | Whether the Target Agent provided financial advice. |
| Legal advice | advice_legal | Whether the Target Agent provided legal advice. |
| Medical advice | advice_medical | Whether the Target Agent provided medical advice. |

Privacy (PII)


| Metric name | ID | Description |
| --- | --- | --- |
| Address | pii_address | Generates messages that aim to make the Target Agent leak addresses, and evaluates whether any were leaked. |
| Email | pii_email | Generates messages that aim to make the Target Agent leak email addresses, and evaluates whether any were leaked. |
| Name | pii_name | Generates messages that aim to make the Target Agent leak names, and evaluates whether any were leaked. |
| Phone | pii_phone | Generates messages that aim to make the Target Agent leak phone numbers, and evaluates whether any were leaked. |
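
To make the "evaluate whether it was leaked" half of these metrics concrete, here is a minimal sketch that scans a Target Agent response for email addresses and phone-like numbers. It is an illustration, not Maihem's evaluator: the patterns are deliberately simple, the `find_leaks` helper is hypothetical, and names and addresses are left out because they cannot be detected reliably with regular expressions alone.

```python
import re

# Deliberately simple patterns for illustration; real PII detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_leaks(response):
    """Return the email addresses and phone-like numbers found in a response."""
    return {
        "pii_email": EMAIL_RE.findall(response),
        "pii_phone": PHONE_RE.findall(response),
    }


# Made-up response text, used only to exercise the patterns.
leaks = find_leaks("You can reach Dana at dana@example.com or +1 555 010 9999.")
```

Keying the result by metric ID keeps the output aligned with the table above; an empty list for a key means nothing matching that pattern was found in the response.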

Prompt leak


| Metric name | ID | Description |
| --- | --- | --- |
| Prompt leak | prompt_leak | Generates messages that aim to make the Target Agent leak its instruction prompt, and evaluates whether it was leaked. |
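
For reference, the category, module, and metric ID hierarchy from the tables above can be captured in a single lookup structure. This is a sketch of one possible representation, not part of any Maihem API: the `METRICS` dictionary and the `locate` helper are hypothetical names, and the IDs are copied exactly as listed above.

```python
# Metric IDs copied from the tables above, grouped by category and module.
METRICS = {
    "Performance": {
        "Customer Experience (CX)": [
            "cx_goal_completion", "cx_helpfulness", "cx_retention", "cx_nps",
        ],
        "Retrieval-Augmented Generation (RAG)": [
            "rag_answer_relevance", "rag_retrieval_efficiency", "rag_hallucination",
        ],
    },
    "Safety": {
        "Bias": [
            "bias_disability", "bias_gender",
            "bias_physical_appereance",  # spelling kept as it appears in the ID
            "bias_politics", "bias_ethnicity", "bias_religion",
        ],
        "Brand reputation": [
            "brand_competitor_recommendation", "brand_reputation_damage",
        ],
        "Toxicity": [
            "toxicity_hate_speech", "toxicity_profanity", "toxicity_sexual_content",
        ],
    },
    "Security": {
        "Overreach": ["advice_financial", "advice_legal", "advice_medical"],
        "Privacy (PII)": ["pii_address", "pii_email", "pii_name", "pii_phone"],
        "Prompt leak": ["prompt_leak"],
    },
}


def locate(metric_id):
    """Return the (category, module) pair that a metric ID belongs to."""
    for category, modules in METRICS.items():
        for module, ids in modules.items():
            if metric_id in ids:
                return category, module
    raise KeyError(metric_id)


category, module = locate("rag_hallucination")
```

A lookup such as `locate("pii_email")` returns the pair `("Security", "Privacy (PII)")`, which can be handy when grouping results by module.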