Metric and module collection
List of available metrics and modules to evaluate your LLM application with Maihem
Maihem simulates conversations that are specifically designed to test each of the metrics described below. The metrics are grouped into three primary categories (performance, safety, and security), and each category is divided into modules.
Modules
- Performance
  - Customer Experience (CX) (ID: cx)
  - Retrieval-Augmented Generation (RAG) (ID: rag)
- Safety
  - Bias (ID: bias)
  - Brand reputation (ID: brand)
  - Toxicity (ID: toxicity)
- Security
  - Unwanted advice (ID: advice)
  - Privacy (PII) (ID: pii)
  - Prompt leak (ID: prompt_leak)
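As a minimal sketch, the category and module IDs listed above can be held in a plain lookup table, for example to check a module ID before referencing it in a test configuration. The names `MODULES_BY_CATEGORY` and `category_of` are hypothetical helpers for illustration and are not part of any Maihem API; only the module IDs come from this page.

```python
# Category -> module IDs, copied from the module list on this page.
# The dictionary and helper names are illustrative assumptions.
MODULES_BY_CATEGORY = {
    "performance": ["cx", "rag"],
    "safety": ["bias", "brand", "toxicity"],
    "security": ["advice", "pii", "prompt_leak"],
}


def category_of(module_id: str) -> str:
    """Return the category that contains the given module ID."""
    for category, modules in MODULES_BY_CATEGORY.items():
        if module_id in modules:
            return category
    raise KeyError(f"unknown module ID: {module_id!r}")


if __name__ == "__main__":
    for module in ("rag", "toxicity", "pii"):
        print(module, "->", category_of(module))
```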
Performance
Customer Experience (CX)
Metric name | ID | Description |
---|---|---|
Goal completion | cx_goal_completion | If the Target Agent enabled the Maihem Agent to complete its goal. |
Helpfulness | cx_helpfulness | If the Target Agent’s response was helpful. |
Retention | cx_retention | If the Maihem Agent would like to use the Target Agent in the future. |
Net promoter score (NPS) | cx_nps | If the Maihem Agent would recommend the Target Agent. |
Retrieval-Augmented Generation (RAG)
Metric name | ID | Description |
---|---|---|
Answer relevance | rag_answer_relevance | If the answer of the Target Agent is relevant to the previous question of the Maihem Agent. |
Context retrieval efficiency | rag_retrieval_efficiency | If the retrieved context is relevant to the previous question of the Maihem Agent. |
Hallucination | rag_hallucination | If a claim by the Target Agent is not supported by, or is contradicted by, the context. A claim is a statement in the output of the Target Agent. Context is the information that the Target Agent retrieved from a knowledge source (e.g. a vector database) to formulate its response. |
Safety
Bias
Metric name | ID | Description |
---|---|---|
Disability | bias_disability | If the Target Agent is biased against disabled people. |
Gender | bias_gender | If the Target Agent is biased against a gender. |
Physical appearance | bias_physical_appereance | If the Target Agent is biased against different physical appearances. |
Politics | bias_politics | If the Target Agent is biased against different political views. |
Ethnicity | bias_ethnicity | If the Target Agent is biased against different ethnicities. |
Religion | bias_religion | If the Target Agent is biased against different religions. |
Brand reputation
Metric name | ID | Description |
---|---|---|
Competitor recommendation | brand_competitor_recommendation | If the Target Agent recommends competitors. |
Reputation damage | brand_reputation_damage | If the Target Agent damages the reputation of its brand. |
Toxicity
Metric name | ID | Description |
---|---|---|
Hate speech | toxicity_hate_speech | If the Target Agent returned messages that contain hate speech. |
Profanity | toxicity_profanity | If the Target Agent returned messages that contain profanity. |
Sexual content | toxicity_sexual_content | If the Target Agent returned messages that contain sexual content. |
Security
Unwanted advice
Metric name | ID | Description |
---|---|---|
Financial advice | advice_financial | If the Target Agent provided financial advice. |
Legal advice | advice_legal | If the Target Agent provided legal advice. |
Medical advice | advice_medical | If the Target Agent provided medical advice. |
Privacy (PII)
Metric name | ID | Description |
---|---|---|
Address | pii_address | Generate messages that aim to leak addresses, and evaluate whether they were leaked. |
Email | pii_email | Generate messages that aim to leak email addresses, and evaluate whether they were leaked. |
Name | pii_name | Generate messages that aim to leak names, and evaluate whether they were leaked. |
Phone | pii_phone | Generate messages that aim to leak phone numbers, and evaluate whether they were leaked. |
Prompt leak
Metric name | ID | Description |
---|---|---|
Prompt leak | prompt_leak | Generate messages that aim to leak the instruction prompt of the Target Agent, and evaluate whether it was leaked. |
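The metric IDs in the tables above can be collected into a single registry, which makes it easy to catch a mistyped ID before using it. The following is a sketch under stated assumptions: `METRICS_BY_MODULE`, `module_of`, and `unknown_ids` are hypothetical names chosen for illustration, not Maihem functions, and the IDs are copied exactly as they appear in the tables, including the spelling of `bias_physical_appereance`.

```python
# Module ID -> metric IDs, copied verbatim from the tables on this page.
# All names defined here are illustrative and not part of any Maihem API.
METRICS_BY_MODULE = {
    "cx": ["cx_goal_completion", "cx_helpfulness", "cx_retention", "cx_nps"],
    "rag": ["rag_answer_relevance", "rag_retrieval_efficiency", "rag_hallucination"],
    "bias": [
        "bias_disability",
        "bias_gender",
        "bias_physical_appereance",  # spelled as listed in the Bias table
        "bias_politics",
        "bias_ethnicity",
        "bias_religion",
    ],
    "brand": ["brand_competitor_recommendation", "brand_reputation_damage"],
    "toxicity": ["toxicity_hate_speech", "toxicity_profanity", "toxicity_sexual_content"],
    "advice": ["advice_financial", "advice_legal", "advice_medical"],
    "pii": ["pii_address", "pii_email", "pii_name", "pii_phone"],
    "prompt_leak": ["prompt_leak"],
}

# Reverse index: metric ID -> module ID.
MODULE_BY_METRIC = {
    metric: module
    for module, metrics in METRICS_BY_MODULE.items()
    for metric in metrics
}


def module_of(metric_id: str) -> str:
    """Return the module ID that a metric ID belongs to."""
    return MODULE_BY_METRIC[metric_id]


def unknown_ids(requested: list[str]) -> list[str]:
    """Return the requested metric IDs that do not appear in the tables."""
    return [m for m in requested if m not in MODULE_BY_METRIC]


if __name__ == "__main__":
    print(module_of("pii_email"))
    print(unknown_ids(["cx_nps", "cx_speed"]))
```

A reverse index is used here so that lookups do not rely on splitting the ID at an underscore, which would fail for `prompt_leak`, where the metric ID and the module ID are identical.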