1. Create target agent (if you haven't already)

from maihem import Maihem

maihem_client = Maihem()

maihem_client.create_target_agent(
    name="history_tutor",
    label="AI History Tutor", # Optional
    role="AI tutor",
    description="A history tutor that helps students prepare for exams with practice questions and summaries",
    language="en" # Optional, default is "en" (English), follow ISO 639
)
2. Create a test with 'cx' module

Create a test using the Customer Experience (CX) module. A module defines the scope of a test.

A test simulates conversations with personas to evaluate the customer experience of your target agent using these metrics:

  • Helpfulness
  • Goal completion
  • Retention
  • Net promoter score (NPS)

Guide the simulated conversations using prompts:

  • maihem_behavior_prompt guides the behavior of the simulated personas
  • maihem_goal_prompt describes the goal of the simulated personas
  • maihem_population_prompt describes the desired population of simulated personas

from maihem import Maihem

maihem_client = Maihem()

maihem_client.create_test(
    name="cx_test_1",
    label="Customer experience Test #1", # Optional
    target_agent_name="history_tutor",
    initiating_agent="maihem", # or "target"
    modules=["cx"],
    maihem_behavior_prompt="Request quizz-style questions and deep dives in bullet points for identified gaps",
    maihem_goal_prompt="Prepare for exam tomorrow on the Industrial Revolution",
    maihem_population_prompt="High school students, some care about school and want to ace the exam, some don't care and just want to pass",
    number_conversations=20,
    conversation_turns_max=5,
)
3. Connect target agent with wrapper function

Modify the following function to wrap your target agent:

from typing import Dict, List, Optional, Tuple

def wrapper_function(
    conversation_id: str, # Keep track of different conversations with conversation_id
    maihem_agent_message: Optional[str], # The message from Maihem (None on the first turn if the target agent initiates the conversation)
    conversation_history: Dict # Auxiliary dictionary to store conversation history (if needed)
) -> Tuple[str, List[str]]:
    """Callable wrapper function to wrap your target agent to be tested."""
    
    # Replace with the message from your target agent
    target_agent_message = "Hi, how can I help you?"

    # (Optional) add messages to the conversation history;
    # setdefault ensures a list exists the first time a conversation_id is seen
    history = conversation_history.setdefault(conversation_id, [])
    history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})
    
    # List of retrieved contexts for RAG evaluations
    contexts = ["Context_1", "Context_2"] 
    
    return target_agent_message, contexts
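
For illustration, here is a minimal sketch of a filled-in wrapper. It assumes a hypothetical history_tutor_agent module whose chat() call returns a reply string plus a list of retrieved contexts; swap in your own agent's API and history handling:

from typing import Dict, List, Optional, Tuple

from my_agent import history_tutor_agent  # Hypothetical module; replace with your own agent

def wrapper_function(
    conversation_id: str,
    maihem_agent_message: Optional[str],
    conversation_history: Dict
) -> Tuple[str, List[str]]:
    """Example wrapper that forwards Maihem's message to a custom agent."""
    history = conversation_history.setdefault(conversation_id, [])

    # Call your agent; if the target agent initiates, maihem_agent_message is None
    target_agent_message, contexts = history_tutor_agent.chat(
        message=maihem_agent_message,
        history=history
    )

    history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})

    # Return the reply and the retrieved contexts (for RAG evaluations) to Maihem
    return target_agent_message, contexts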
4. Run the test

A test run will generate:

  • Simulated conversations between your target agent and Maihem
  • Evaluations of the conversations
  • A list of detected failures

from wrapper_function import wrapper_function

test_run_result = maihem_client.run_test(
    name="modelX_prompt2.5_27-11-2024",
    label="Model X Prompt v2.5 (27/Nov/2024)", # Optional
    test_name="cx_test_1",
    wrapper_function=wrapper_function,
    concurrent_conversations=10 # Optional
)
5. See test run results

See the results in your Maihem account.

Or retrieve the test run results programmatically:

test_run_result = maihem_client.get_test_run_result(
    test_name="cx_test_1",
    test_run_name="modelX_prompt2.5_27-11-2024"
)

print(test_run_result)

test_run_result contains the following information:

test_run_result.result = "failed"
test_run_result.score = 82.5
test_run_result.conversations[0].messages = [
    {
        "role": "maihem",
        "content": "Do you think I'm well prepared for my test tomorrow?"
    },
    {
        "role": "target",
        "content": "I can't answer that, I don't have that information",
        "evaluation": {
            "is_failure": True,
            "explanation": "Goal not completed. The persona want to what are the topics she needs to still review for the test."
        }
    }
]

test_run_failures = []
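
To inspect the results in code, here is a minimal sketch that relies only on the fields shown above (result, score, conversations, and each message's optional evaluation dict); adjust it to the exact shape of your results object:

# Print the overall outcome and every message flagged as a failure
print(f"Overall result: {test_run_result.result} (score: {test_run_result.score})")

for conversation in test_run_result.conversations:
    for message in conversation.messages:
        evaluation = message.get("evaluation")
        if evaluation and evaluation.get("is_failure"):
            print(f"[{message['role']}] {message['content']}")
            print(f"  Failure: {evaluation['explanation']}")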