Intent classification

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Precision | Yes | Fraction of predicted intents that are correct |
| Recall | Yes | Fraction of actual intents that were correctly identified |
| F1 | Yes | Harmonic mean of precision and recall for intent classification |
| Triggering accuracy | Yes | Binary classification of whether there is an intent or not |
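As an illustration of the precision, recall, and F1 definitions above, here is a minimal sketch that computes them per intent from two parallel label lists. The function name `per_intent_scores` and the example labels are hypothetical; in practice a library such as scikit-learn would usually be used instead.

```python
from collections import Counter


def per_intent_scores(y_true, y_pred):
    """Return {intent: (precision, recall, f1)} from parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # this intent was predicted, but wrongly
            fn[truth] += 1  # the actual intent was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical labels, for illustration only.
y_true = ["billing", "billing", "support", "support"]
y_pred = ["billing", "support", "support", "support"]
scores = per_intent_scores(y_true, y_pred)
```

The per-intent values can then be averaged (macro average) or weighted by class frequency, depending on how much rare intents should count.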

Routing

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Precision | Yes | Fraction of predicted routes that are correct |
| Recall | Yes | Fraction of actual routes that were correctly identified |
| F1 | Yes | Harmonic mean of precision and recall for routing |
| Distribution | No | Distribution of the various identified routes over time |
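The distribution metric needs no ground truth, only a log of routing decisions. A minimal sketch, assuming each decision is stored as a `(period, route)` pair (a hypothetical log format, with made-up period and route names):

```python
from collections import Counter, defaultdict


def route_distribution(events):
    """events: iterable of (period, route). Returns {period: {route: share}}."""
    counts = defaultdict(Counter)
    for period, route in events:
        counts[period][route] += 1
    return {
        period: {route: n / sum(c.values()) for route, n in c.items()}
        for period, c in counts.items()
    }


# Hypothetical routing log, for illustration only.
events = [
    ("week1", "faq"),
    ("week1", "faq"),
    ("week1", "agent"),
    ("week2", "agent"),
]
dist = route_distribution(events)
```

Comparing the shares between periods is one way to spot drift in how requests are routed.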

Entity recognition

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Precision | Yes | Fraction of recognized entities that are correct |
| Recall | Yes | Fraction of actual entities that were correctly recognized |
| F1 | Yes | Harmonic mean of precision and recall for entity recognition |
| Triggering accuracy | Yes | Binary classification of whether there is an entity or not |
| Distribution | No | Distribution of the identified entities over time |
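Unlike intent classification, an utterance can contain several entities or none, so the counts are usually taken over sets. The sketch below assumes exact matching on `(entity_type, text)` pairs and micro-averaging across utterances. Both are assumptions, and partial-span matching would need a different comparison. The name `entity_scores` and the example data are hypothetical.

```python
def entity_scores(gold, predicted):
    """gold / predicted: one set of (entity_type, text) pairs per utterance."""
    tp = fp = fn = trigger_hits = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # entities found and correct
        fp += len(p - g)   # entities predicted but not in the gold set
        fn += len(g - p)   # gold entities that were missed
        # Triggering: did the system agree on "any entity at all"?
        trigger_hits += int(bool(g) == bool(p))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    pr_sum = precision + recall
    f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
    triggering = trigger_hits / len(gold) if gold else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "triggering_accuracy": triggering,
    }


# Hypothetical annotations, for illustration only.
gold = [{("city", "Paris")}, set(), {("date", "Monday"), ("city", "Rome")}]
pred = [{("city", "Paris")}, {("city", "Nice")}, {("date", "Monday")}]
result = entity_scores(gold, pred)
```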

Rephrasing

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Semantic similarity | Yes | Retention of original meaning in the rephrased text |
| Semantic overhead | Yes | Addition of unnecessary or extraneous information |
| Semantic loss | Yes | Omission of key details in the rephrased text |

Document retrieval

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Average relevance | No | Overall pertinence of retrieved items to the query |
| Precision@k | Optional | Proportion of relevant items among the top k retrieved results |
| Recall@k | Optional | Fraction of all relevant items present in the top k results |
| MRR | Optional | Average reciprocal rank of the first relevant result |
| nDCG | Optional | Ranking quality based on positions of relevant items |
| Redundancy among chunks | No | Redundancy or similarity among retrieved content chunks |
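The ranking metrics in this table can be sketched in a few lines. The version below assumes binary relevance (a document is either relevant or not) and a known set of relevant document IDs per query. Graded relevance would change the nDCG gain term. All function names and document IDs are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Share of the top-k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k


def recall_at_k(ranked, relevant, k):
    """Share of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant result, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(queries):
    """queries: list of (ranked, relevant) pairs. MRR averages over queries."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def ndcg_at_k(ranked, relevant, k):
    """Binary-gain nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(i + 1)
        for i, doc in enumerate(ranked[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, ideal_hits + 1))
    return dcg / ideal if ideal else 0.0


# Hypothetical ranking and relevance judgments, for illustration only.
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
p3 = precision_at_k(ranked, relevant, 3)
r3 = recall_at_k(ranked, relevant, 3)
rr = reciprocal_rank(ranked, relevant)
ndcg3 = ndcg_at_k(ranked, relevant, 3)
```

The same functions apply unchanged to reranked or filtered result lists, since they only depend on the final ordering.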

Reranking

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Average relevance | No | Overall relevance of reranked results |
| Precision@k | Optional | Accuracy of the top k reranked items |
| Recall@k | Optional | Fraction of relevant items in the top k reranked results |
| MRR | Optional | Rank position of the first relevant reranked result |
| nDCG | Optional | Ranking quality based on the position of relevant items |
| Redundancy among chunks | No | Redundancy among reranked items to ensure diversity |
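Redundancy among chunks can be estimated without ground truth by comparing the chunks with each other. The sketch below uses mean pairwise Jaccard similarity over lowercase word sets as a simple lexical stand-in. This choice is an assumption, and cosine similarity between embeddings is a common alternative. The names `jaccard` and `redundancy` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap between two text chunks, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def redundancy(chunks):
    """Mean pairwise Jaccard similarity; 0 means no shared vocabulary."""
    pairs = list(combinations(chunks, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical chunks, for illustration only.
chunks = ["the cat sat", "the cat ran", "dogs bark loudly"]
score = redundancy(chunks)
```

A lower score suggests a more diverse set of chunks. What counts as "too redundant" depends on the corpus and has to be calibrated.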

Filtering

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Average relevance | No | Overall relevance of the filtered content |
| Precision@k | Optional | Proportion of relevant items in the top k filtered results |
| Recall@k | Optional | Fraction of all relevant items captured in the top k results |
| MRR | Optional | Reciprocal rank of the first relevant filtered result |
| nDCG | Optional | Ranking quality of the filtered items |
| Redundancy among chunks | No | Similarity among filtered chunks to avoid redundancy |

Summarization

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Semantic similarity | Yes | Retention of original meaning in the summary |
| Overlap | Yes | Extent of shared content between the summary and the source |
| Summarization rate | Yes | Degree of condensation from the original text to the summary |
| ROUGE scores | Yes | Lemmatized overlap between the original passage and the summary |
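Two of these metrics are simple enough to sketch directly. The code below assumes that summarization rate is defined as one minus the ratio of summary length to source length in whitespace tokens. That definition is an assumption, since the table does not give one. It also computes a simplified ROUGE-1 (unigram overlap with clipped counts) with no lemmatization or stemming. Established ROUGE packages handle those details and are preferable for reported results.

```python
from collections import Counter


def tokens(text):
    """Naive lowercase whitespace tokenizer (no lemmatization)."""
    return text.lower().split()


def summarization_rate(source, summary):
    """Assumed definition: 1 - len(summary) / len(source), in tokens."""
    n = len(tokens(source))
    return 1 - len(tokens(summary)) / n if n else 0.0


def rouge1(reference, candidate):
    """Simplified ROUGE-1: returns (precision, recall) on unigram counts."""
    ref, cand = Counter(tokens(reference)), Counter(tokens(candidate))
    overlap = sum((ref & cand).values())  # clipped unigram matches
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    return precision, recall


# Hypothetical texts, for illustration only.
source = "the quick brown fox jumps over the lazy dog"
summary = "the fox jumps"
rate = summarization_rate(source, summary)
precision, recall = rouge1(source, summary)
```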

Answer generation

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Answer relevance and completeness | Yes | Pertinence and usefulness of the answer to the question |
| Hallucination | Yes | Presence of unsupported details or logical inference within the answer |
| Context usage | Yes | Effectiveness of the answer in incorporating the provided context |
| Tone | Yes | Appropriateness and consistency of the answer’s tone |

Post-processing

| Metric | Requires ground truth | Explanation |
| --- | --- | --- |
| Link validation | Yes | Accuracy and functionality of hyperlinks in the output |
| Citation count | Yes | Number of references provided to support the answer |
| Formatting accuracy | Yes | Adherence to formatting guidelines |
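Part of the post-processing checks can be automated with the standard library. The sketch below assumes citations are written as bracketed numbers such as `[1]` and only checks that links are syntactically well formed. Both are assumptions. Verifying that a link actually resolves would require an HTTP request, which is left out here. All names and the example answer are hypothetical.

```python
import re
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s)\]>]+")
CITATION_RE = re.compile(r"\[(\d+)\]")  # assumed citation style: [1], [2], ...


def extract_links(text):
    """Find http(s) URLs and strip trailing sentence punctuation."""
    return [url.rstrip(".,;:") for url in URL_RE.findall(text)]


def is_well_formed(url):
    """Syntactic check only: scheme is http(s) and a host is present."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def citation_count(text):
    """Number of distinct bracketed citation markers in the text."""
    return len(set(CITATION_RE.findall(text)))


# Hypothetical generated answer, for illustration only.
answer = (
    "See the guide [1] and the FAQ [2] at https://example.com/faq. "
    "More details are in [1]."
)
links = extract_links(answer)
all_valid = all(is_well_formed(link) for link in links)
n_citations = citation_count(answer)
```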