1

Create target agent (if you haven't already)

from maihem import Maihem

maihem_client = Maihem()

maihem_client.create_target_agent(
    name="rag_financial_assistant",
    label="POC stock market agent", # Optional
    role="AI Financial Assistant",
    description="An AI assistant that provides information and summaries from financial documents.",
    language="en" # Optional, ISO 639 language code; default is "en" (English)
)
2

Add documents to generate questions from them

Maihem supports documents in the following formats: pdf, txt, docx, md.

Move all the documents to the same folder.

documents_path = "/path/to/folder/with/documents"
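
Before creating the test, it can help to confirm that every file in the folder has one of the supported extensions. The following is a minimal sketch using only the Python standard library. The helper name `split_by_support` is hypothetical and not part of the Maihem SDK, and the sketch assumes that only files at the top level of the folder matter.

```python
from pathlib import Path
from typing import List, Tuple

# Extensions taken from the list of supported formats above
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".md"}


def split_by_support(folder: str) -> Tuple[List[str], List[str]]:
    """Return (supported, unsupported) file names found directly in `folder`."""
    supported, unsupported = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # Skip subfolders; this sketch only checks the top level
        if path.suffix.lower() in SUPPORTED_EXTENSIONS:
            supported.append(path.name)
        else:
            unsupported.append(path.name)
    return supported, unsupported
```

Calling `split_by_support(documents_path)` returns two lists, so any file in the second list can be converted or removed before the test is created.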
3

Create a test

Create a RAG test by using the RAG module. A module defines the scope of a test.

A RAG test generates a random set of questions from your documents, uses them to test your target agent, and evaluates the following metrics:

  • Answer relevance
  • Context retrieval efficiency
  • Hallucinations

See more detailed documentation on metrics and modules.

from maihem import Maihem

maihem_client = Maihem()

maihem_client.create_test(
    name="rag_test_1",
    label="RAG test #1", # Optional
    target_agent_name="rag_financial_assistant",
    initiating_agent="maihem", # or "target"
    modules=["rag"],
    documents_path=documents_path, # Path to folder with documents
    number_conversations=50,
    conversation_turns_max=5 # Optional, default is 10
)
4

Connect target agent with wrapper function

Modify the following function to wrap your target agent:

from typing import Tuple, List, Dict, Optional

def wrapper_function(
    conversation_id: str, # Keep track of different conversations with conversation_id
    maihem_agent_message: Optional[str], # The message from Maihem (None if the target agent initiates)
    conversation_history: Dict # Auxiliary dictionary to store conversation history (if needed)
) -> Tuple[str, List[str]]:
    """Callable wrapper function to wrap your target agent to be tested."""
    
    # Replace with the message from your target agent
    target_agent_message = "Hi, how can I help you?"

    # If target initiates conversation, first maihem_agent_message is None

    # (Optional) add messages to conversation history
    # setdefault creates the list on the first turn of a conversation, avoiding a KeyError
    history = conversation_history.setdefault(conversation_id, [])
    history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})
    
    # List of retrieved contexts for RAG evaluations
    contexts = ["Context_1", "Context_2"] 
    
    return target_agent_message, contexts
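
To make the template more concrete, here is a hedged sketch of a wrapper around a toy retrieval agent. `KNOWLEDGE_BASE`, `retrieve_contexts`, and `generate_answer` are hypothetical stand-ins for your own retrieval and generation code. Only the wrapper's signature and return value follow the template above, and the sketch assumes `conversation_history` is a dictionary of message lists keyed by `conversation_id`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory corpus standing in for your document index
KNOWLEDGE_BASE = [
    "Fund X was created in 2005.",
    "Fund Y invests in European equities.",
]


def _tokens(text: str) -> set:
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip("?.,").lower() for word in text.split()}


def retrieve_contexts(question: str) -> List[str]:
    """Toy retriever: keep passages sharing at least two words with the question."""
    question_tokens = _tokens(question)
    return [p for p in KNOWLEDGE_BASE if len(question_tokens & _tokens(p)) >= 2]


def generate_answer(contexts: List[str]) -> str:
    """Toy generator: answer with the first retrieved passage, if any."""
    return contexts[0] if contexts else "Sorry, I could not find this information."


def wrapper_function(
    conversation_id: str,
    maihem_agent_message: Optional[str],
    conversation_history: Dict,
) -> Tuple[str, List[str]]:
    """Wrap the toy agent so it returns a message plus its retrieved contexts."""
    history = conversation_history.setdefault(conversation_id, [])
    if maihem_agent_message is None:
        # The target agent opens the conversation, so there is nothing to retrieve yet
        target_agent_message, contexts = "Hi, how can I help you?", []
    else:
        history.append({"role": "maihem", "content": maihem_agent_message})
        contexts = retrieve_contexts(maihem_agent_message)
        target_agent_message = generate_answer(contexts)
    history.append({"role": "target", "content": target_agent_message})
    return target_agent_message, contexts
```

Calling the wrapper locally with a sample question, for example `wrapper_function("c1", "When was Fund X created?", {})`, is a quick way to check that it returns both a message and a list of contexts before running a full test.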
5

Run the test

A test run will generate:

  • Simulated conversations between your target agent and Maihem
  • Evaluations of the conversations
  • A list of detected failures

maihem_client.run_test(
    name="modelX_prompt2.5_28-11-2024",
    label="Model X Prompt v2.5 (28/Nov/2024)", # Optional
    test_name="rag_test_1",
    wrapper_function=wrapper_function, # your wrapper function
    concurrent_conversations=10 # Optional
)
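
The test run name in the example encodes the model, the prompt version, and the date. If you want to generate such names consistently, a small helper along these lines may be useful. This is only a sketch: `build_run_name` is a hypothetical helper, and the naming pattern is inferred from the example above rather than required by Maihem.

```python
from datetime import date
from typing import Tuple


def build_run_name(model: str, prompt_version: str, run_date: date) -> Tuple[str, str]:
    """Return (name, label) in the style of the example test run above."""
    name = f"model{model}_prompt{prompt_version}_{run_date:%d-%m-%Y}"
    # %b yields an abbreviated month name, which depends on the active locale
    label = f"Model {model} Prompt v{prompt_version} ({run_date:%d/%b/%Y})"
    return name, label
```

The returned values can then be passed as `name` and `label` to `maihem_client.run_test`.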
6

See test run results

View the results in your Maihem account, or retrieve them programmatically:

test_run_results = maihem_client.get_test_run_results(
    test_name="rag_test_1",
    test_run_name="modelX_prompt2.5_28-11-2024"
)

test_run_results contains the following information:

test_run_results.result = "failed"
test_run_results.score = 82.5
test_run_results.conversations[0].messages = [
    {
        "role": "maihem",
        "content": "When was Fund X created?"
    },
    {
        "role": "target",
        "content": "Sorry, I could not find this information.",
        "evaluation": {
            "is_failure": True,
            "explanation": "Hallucination detected. Fund X was created in 2005."
        }
    }
]

test_run_failures = [
]
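
To scan a conversation for failed turns yourself, you can walk through its messages and keep the ones whose evaluation is flagged. The sketch below assumes the messages are plain dictionaries shaped like the example above; the objects returned by the SDK may differ, and `collect_failures` is a hypothetical helper, not a Maihem function.

```python
from typing import Dict, List


def collect_failures(messages: List[Dict]) -> List[Dict]:
    """Pair each failed target message with the Maihem message that preceded it."""
    failures = []
    previous_maihem_message = None
    for message in messages:
        if message["role"] == "maihem":
            previous_maihem_message = message["content"]
            continue
        # Target messages may or may not carry an evaluation
        evaluation = message.get("evaluation", {})
        if evaluation.get("is_failure"):
            failures.append({
                "question": previous_maihem_message,
                "answer": message["content"],
                "explanation": evaluation.get("explanation", ""),
            })
    return failures


# Sample data copied from the example conversation above
sample_messages = [
    {"role": "maihem", "content": "When was Fund X created?"},
    {
        "role": "target",
        "content": "Sorry, I could not find this information.",
        "evaluation": {
            "is_failure": True,
            "explanation": "Hallucination detected. Fund X was created in 2005.",
        },
    },
]

failed_turns = collect_failures(sample_messages)
```

Each entry in `failed_turns` keeps the question, the answer, and the evaluator's explanation together, which makes it easier to review failures one by one.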