Reference
List of evaluators and metrics
Intent classification
Metrics | Requires ground truth | Explanation |
---|---|---|
Precision | Yes | Fraction of predicted intents that are correct |
Recall | Yes | Fraction of actual intents that were correctly identified |
F1 | Yes | Harmonic mean of precision and recall for intent classification |
Triggering accuracy | Yes | Binary classification of whether an intent is present or not |
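
The precision, recall, F1, and triggering accuracy rows above can be illustrated with a minimal sketch. It assumes single-label predictions and a hypothetical `"none"` label meaning "no intent detected"; neither assumption comes from this reference, and an actual evaluator may compute these values differently.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-label precision, recall, and F1 for single-label predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def triggering_accuracy(y_true, y_pred, none_label="none"):
    """Share of examples where 'something was detected' matches the ground truth.

    `none_label` is a hypothetical placeholder for "nothing detected".
    """
    if not y_true:
        return 0.0
    hits = sum(
        1 for t, p in zip(y_true, y_pred)
        if (t != none_label) == (p != none_label)
    )
    return hits / len(y_true)
```

The same two functions apply unchanged to routing and entity labels, since they only compare predicted labels against ground-truth labels.
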
Routing
Metrics | Requires ground truth | Explanation |
---|---|---|
Precision | Yes | Fraction of predicted routes that are correct |
Recall | Yes | Fraction of actual routes that were correctly identified |
F1 | Yes | Harmonic mean of precision and recall for routing |
Distribution | No | Distribution of the various identified routes over time |
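
Because the distribution metric needs no ground truth, it can be computed from logged predictions alone. The sketch below assumes events arrive as `(timestamp, route)` pairs and that a daily bucket is the desired granularity; both are illustrative choices, not something this reference specifies.

```python
from collections import Counter, defaultdict
from datetime import date, datetime


def route_distribution_by_day(events):
    """Return {day: {route: share}} from (datetime, route) pairs."""
    counts = defaultdict(Counter)
    for timestamp, route in events:
        counts[timestamp.date()][route] += 1
    distribution = {}
    for day, counter in counts.items():
        total = sum(counter.values())
        distribution[day] = {route: n / total for route, n in counter.items()}
    return distribution
```

Comparing the per-day shares over time is one way to spot drift in how requests are routed.
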
Entity recognition
Metrics | Requires ground truth | Explanation |
---|---|---|
Precision | Yes | Fraction of recognized entities that are correct |
Recall | Yes | Fraction of actual entities that were correctly recognized |
F1 | Yes | Harmonic mean of precision and recall for entity recognition |
Triggering accuracy | Yes | Binary classification of whether an entity is present or not |
Distribution | No | Distribution of the identified entities over time |
Rephrasing
Metrics | Requires ground truth | Explanation |
---|---|---|
Semantic similarity | Yes | Retention of original meaning in the rephrased text |
Semantic overhead | Yes | Addition of unnecessary or extraneous information |
Semantic loss | Yes | Key detail omission in the rephrased text |
Document retrieval
Metrics | Requires ground truth | Explanation |
---|---|---|
Average relevance | No | Overall pertinence of retrieved items to the query |
Precision@k | Optional | Proportion of relevant items among the top k retrieved results |
Recall@k | Optional | Fraction of all relevant items present in the top k results |
MRR | Optional | Average reciprocal rank of the first relevant result |
nDCG | Optional | Ranking quality based on positions of relevant items |
Redundancy among chunks | No | Redundancy or similarity among retrieved content chunks |
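
The ground-truth-based ranking metrics in this table follow standard definitions, sketched below for a single query. The sketch assumes binary relevance (a document is either relevant or not) and document IDs as plain strings; graded relevance would change the nDCG gain term.

```python
import math


def precision_at_k(retrieved, relevant, k):
    """Proportion of the top k retrieved items that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / k


def recall_at_k(retrieved, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in retrieved[:k] if doc in relevant) / len(relevant)


def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item, or 0.0 if none is retrieved."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ret, rel) for ret, rel in runs) / len(runs)


def ndcg_at_k(retrieved, relevant, k):
    """Binary-relevance nDCG: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc in enumerate(retrieved[:k], start=1)
        if doc in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0
```

The same functions can score reranked or filtered lists, since they depend only on the final ordering and the set of relevant IDs.
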
Reranking
Metrics | Requires ground truth | Explanation |
---|---|---|
Average relevance | No | Overall relevance of reranked results |
Precision@k | Optional | Proportion of relevant items among the top k reranked results |
Recall@k | Optional | Fraction of relevant items in the top k reranked results |
MRR | Optional | Average reciprocal rank of the first relevant reranked result |
nDCG | Optional | Ranking quality based on the position of relevant items |
Redundancy among chunks | No | Redundancy among reranked items to ensure diversity |
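
This reference does not say how redundancy among chunks is measured. One simple possibility, shown here purely as an assumption, is the mean pairwise Jaccard similarity over whitespace tokens; an embedding-based similarity would be a natural alternative.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-set overlap between two strings, in [0, 1]."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def redundancy(chunks):
    """Mean pairwise Jaccard similarity; 0.0 when fewer than two chunks."""
    pairs = list(combinations(chunks, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 indicates near-duplicate chunks, while a value near 0 indicates diverse content.
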
Filtering
Metrics | Requires ground truth | Explanation |
---|---|---|
Average relevance | No | Overall relevance of the filtered content |
Precision@k | Optional | Proportion of relevant items in the top k filtered results |
Recall@k | Optional | Fraction of all relevant items captured in the top k results |
MRR | Optional | Reciprocal rank of the first relevant filtered result |
nDCG | Optional | Ranking quality of the filtered items |
Redundancy among chunks | No | Similarity among filtered chunks to avoid redundancy |
Summarization
Metrics | Requires ground truth | Explanation |
---|---|---|
Semantic similarity | Yes | Retention of original meaning in the summary |
Overlap | Yes | Extent of shared content between the summary and the source |
Summarization rate | Yes | Degree of condensation from the original text to the summary |
ROUGE Scores | Yes | Lemmatized overlap between original passage and summary |
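
As a rough illustration of the last two rows, the sketch below computes a summarization rate and a ROUGE-1 score. It assumes whitespace tokenization, skips the lemmatization step mentioned in the table, and defines the summarization rate as one minus the summary-to-source token ratio; that formula is an assumption, not a definition taken from this reference.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (no lemmatization in this sketch)."""
    return text.lower().split()


def summarization_rate(source, summary):
    """Assumed definition: 1 - len(summary tokens) / len(source tokens)."""
    source_tokens = tokenize(source)
    if not source_tokens:
        return 0.0
    return 1.0 - len(tokenize(summary)) / len(source_tokens)


def rouge_1(reference, candidate):
    """Unigram-overlap precision, recall, and F1 with clipped counts."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

Here the original passage plays the role of the reference and the summary is the candidate, matching the table's description of overlap between the two.
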
Answer generation
Metrics | Requires ground truth | Explanation |
---|---|---|
Answer relevance and completeness | Yes | Pertinence and usefulness of the answer to the question |
Hallucination | Yes | Presence of unsupported details or logical inference within the answer |
Context usage | Yes | Effectiveness of the answer in incorporating the provided context |
Tone | Yes | Appropriateness and consistency of the answer’s tone |
Post-processing
Metrics | Requires ground truth | Explanation |
---|---|---|
Link validation | Yes | Accuracy and functionality of hyperlinks in the output |
Citation count | Yes | Number of references provided to support the answer |
Formatting accuracy | Yes | Adherence to formatting guidelines |
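
Two of the post-processing checks can be approximated with the standard library. The sketch below assumes citations are written as bracketed numbers such as `[1]` and checks only that links are syntactically well formed; both are assumptions, and verifying that a link actually resolves would additionally require an HTTP request.

```python
import re
from urllib.parse import urlparse

# Hypothetical conventions: http(s) links and numeric bracket citations.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")
CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def extract_links(text):
    """Return all http(s) URLs found in the text, in order of appearance."""
    return URL_PATTERN.findall(text)


def is_well_formed(url):
    """Syntactic check only: http(s) scheme and a non-empty host."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def citation_count(text):
    """Number of distinct numeric citations such as [1] or [2]."""
    return len(set(CITATION_PATTERN.findall(text)))
```
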