List of evaluators and metrics - Maihem

Intent classification

Metrics	Requires ground truth	Explanation
Precision	Yes	Fraction of predicted intents that are correct
Recall	Yes	Fraction of actual intents that were correctly identified
F1	Yes	Harmonic mean of precision and recall for intent classification
Triggering accuracy	Yes	Binary classification of whether an intent is present or not
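
As an illustration of the metrics above, the following is a minimal sketch of per-intent precision, recall and F1, plus a triggering-accuracy check. It is not Maihem's implementation: the function names, the single-label assumption (one intent per turn) and the `"none"` placeholder for "no intent detected" are all assumptions made for this example.

```python
from collections import Counter

def per_label_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores

def triggering_accuracy(y_true, y_pred, no_intent="none"):
    """Share of turns where 'some intent fired' agrees with the ground truth.

    `no_intent` is a hypothetical placeholder label for 'nothing detected'.
    """
    hits = sum((t != no_intent) == (p != no_intent) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Hypothetical labels, for illustration only.
y_true = ["refund", "refund", "cancel", "none"]
y_pred = ["refund", "cancel", "cancel", "refund"]
scores = per_label_prf(y_true, y_pred)
trigger_acc = triggering_accuracy(y_true, y_pred)
```

Per-label scores are reported here rather than a single aggregate, since a single number can hide an intent that is rarely predicted correctly.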

Routing

Metrics	Requires ground truth	Explanation
Precision	Yes	Fraction of predicted routes that are correct
Recall	Yes	Fraction of actual routes that were correctly identified
F1	Yes	Harmonic mean of precision and recall for routing
Distribution	No	Distribution of the various identified routes over time
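
The distribution metric needs no ground truth because it only counts what was observed. One possible way to compute it is sketched below, assuming routing events are available as `(ISO timestamp, route)` pairs and that a daily bucket is the desired granularity. Both the input shape and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime

def route_distribution(events):
    """events: iterable of (iso_timestamp, route). Returns {date: {route: share}}."""
    counts = defaultdict(Counter)
    for timestamp, route in events:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        counts[day][route] += 1
    # Normalise each day's counts into shares that sum to 1.
    return {
        day: {route: n / sum(c.values()) for route, n in c.items()}
        for day, c in sorted(counts.items())
    }

# Hypothetical events, for illustration only.
events = [
    ("2024-05-01T09:00:00", "billing"),
    ("2024-05-01T10:30:00", "billing"),
    ("2024-05-01T11:00:00", "support"),
    ("2024-05-02T08:15:00", "support"),
]
distribution = route_distribution(events)
```

A shift in these shares from one day to the next can flag drift in user traffic or in the router itself, even when no labels are available.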

Entity recognition

Metrics	Requires ground truth	Explanation
Precision	Yes	Fraction of recognized entities that are correct
Recall	Yes	Fraction of actual entities that were correctly recognized
F1	Yes	Harmonic mean of precision and recall for entity recognition
Triggering accuracy	Yes	Binary classification of whether an entity is present or not
Distribution	No	Distribution of the identified entities over time
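
Unlike intent classification, an utterance can contain several entities, so precision and recall are usually counted over entity sets. The sketch below assumes exact matching on `(entity_type, value)` pairs and micro-averaging across utterances. Those choices, and the function name, are assumptions for illustration and not necessarily how Maihem scores entities.

```python
def entity_prf(gold, predicted):
    """gold, predicted: lists (one item per utterance) of sets of (type, value) pairs."""
    true_positives = sum(len(g & p) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical annotations, for illustration only.
gold = [{("city", "Paris"), ("date", "Monday")}, set()]
predicted = [{("city", "Paris")}, {("city", "Rome")}]
precision, recall, f1 = entity_prf(gold, predicted)
```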

Rephrasing

Metrics	Requires ground truth	Explanation
Semantic similarity	Yes	Retention of original meaning in the rephrased text
Semantic overhead	Yes	Addition of unnecessary or extraneous information
Semantic loss	Yes	Key detail omission in the rephrased text

Document retrieval

Metrics	Requires ground truth	Explanation
Average relevance	No	Overall pertinence of retrieved items to the query
Precision@k	Optional	Proportion of relevant items among the top k retrieved results
Recall@k	Optional	Fraction of all relevant items present in the top k results
MRR	Optional	Average reciprocal rank of the first relevant result
nDCG	Optional	Ranking quality based on positions of relevant items
Redundancy among chunks	No	Redundancy or similarity among retrieved content chunks
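
The ranking metrics in this table have standard textbook definitions, sketched below for binary relevance (a document is either relevant or not). The function names and the document IDs are made up for the example, and graded relevance, which nDCG also supports, is left out for brevity.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Proportion of the top k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(queries):
    """queries: list of (ranked, relevant) pairs. MRR averages over queries."""
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal else 0.0

# Hypothetical query result, for illustration only.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
p3 = precision_at_k(ranked, relevant, 3)
r3 = recall_at_k(ranked, relevant, 3)
rr = reciprocal_rank(ranked, relevant)
ndcg3 = ndcg_at_k(ranked, relevant, 3)
```

All four functions need a set of relevant documents per query, which is why these metrics can only be computed when ground truth is supplied.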

Reranking

Metrics	Requires ground truth	Explanation
Average relevance	No	Overall relevance of reranked results
Precision@k	Optional	Accuracy of the top k reranked items
Recall@k	Optional	Fraction of relevant items in the top k reranked results
MRR	Optional	Rank position of the first relevant reranked result
nDCG	Optional	Ranking quality based on the position of relevant items
Redundancy among chunks	No	Redundancy among reranked items to ensure diversity
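
Redundancy among chunks can be estimated without ground truth by comparing the chunks with each other. The sketch below uses mean pairwise Jaccard similarity over lowercased word sets as a cheap stand-in. This is an assumption for illustration: an embedding-based similarity would be the more usual choice in practice, and the source does not say which measure Maihem uses.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0

def redundancy(chunks):
    """Mean pairwise similarity: 0 means fully distinct chunks, 1 means identical."""
    pairs = list(combinations(chunks, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical chunks, for illustration only.
chunks = ["the cat sat", "the cat ran", "dogs bark loudly"]
score = redundancy(chunks)
```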

Filtering

Metrics	Requires ground truth	Explanation
Average relevance	No	Overall relevance of the filtered content
Precision@k	Optional	Proportion of relevant items in the top k filtered results
Recall@k	Optional	Fraction of all relevant items captured in the top k results
MRR	Optional	Reciprocal rank of the first relevant filtered result
nDCG	Optional	Ranking quality of the filtered items
Redundancy among chunks	No	Similarity among filtered chunks to avoid redundancy

Summarization

Metrics	Requires ground truth	Explanation
Semantic similarity	Yes	Retention of original meaning in the summary
Overlap	Yes	Extent of shared content between the summary and the source
Summarization rate	Yes	Degree of condensation from the original text to the summary
ROUGE Scores	Yes	Lemmatized overlap between original passage and summary
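
To make the overlap-based metrics concrete, here is a rough sketch of ROUGE-1 (unigram overlap) and a summarization rate. Several simplifications are assumed: tokens are split on whitespace and lowercased with no lemmatization, the original passage is treated as the reference, and the summarization rate is defined as `1 - len(summary) / len(source)` in words. That last formula is a guess at what "degree of condensation" means and is not taken from the source.

```python
from collections import Counter

def rouge_1(reference, candidate):
    """Unigram-overlap precision, recall and F1 (no stemming or lemmatization)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def summarization_rate(source, summary):
    """Assumed definition: share of the source's word count removed by the summary."""
    n_source = len(source.split())
    return 1 - len(summary.split()) / n_source if n_source else 0.0

# Hypothetical texts, for illustration only.
source = "the quick brown fox jumps over the lazy dog"
summary = "the fox jumps"
precision, recall, f1 = rouge_1(source, summary)
rate = summarization_rate(source, summary)
```

A production setup would more likely rely on an established ROUGE package with stemming or lemmatization than on this hand-rolled version.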

Answer generation

Metrics	Requires ground truth	Explanation
Answer relevance and completeness	Yes	Pertinence and usefulness of the answer to the question
Hallucination	Yes	Presence of unsupported details or logical inference within the answer
Context usage	Yes	Effectiveness of the answer in incorporating the provided context
Tone	Yes	Appropriateness and consistency of the answer’s tone

Post-processing

Metrics	Requires ground truth	Explanation
Link validation	Yes	Accuracy and functionality of hyperlinks in the output
Citation count	Yes	Number of references provided to support the answer
Formatting accuracy	Yes	Adherence to formatting guidelines
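
Part of post-processing evaluation can be done with simple text checks. The sketch below extracts hyperlinks, tests that they are syntactically well formed, and counts distinct numeric citation markers such as `[1]`. The citation format, the regular expressions and the function names are all assumptions for this example. Checking that a link actually resolves would additionally require an HTTP request, which is omitted here.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>]+")
CITATION_RE = re.compile(r"\[(\d+)\]")  # assumes citations look like [1], [2], ...

def extract_links(text):
    """Find http(s) URLs and strip trailing sentence punctuation."""
    return [url.rstrip(".,;:") for url in URL_RE.findall(text)]

def is_well_formed(url):
    """Syntactic check only: scheme and host must be present."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def citation_count(text):
    """Number of distinct numeric citation markers in the text."""
    return len(set(CITATION_RE.findall(text)))

# Hypothetical model output, for illustration only.
answer = "See the guide [1] and the FAQ [2] at https://example.com/faq. Repeated [1]."
links = extract_links(answer)
all_valid = all(is_well_formed(link) for link in links)
n_citations = citation_count(answer)
```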
