Create target agent (if you haven't already)

from maihem import Maihem

maihem_client = Maihem()

    label="AI History Tutor", # Optional
    role="AI tutor",
    description="A history tutor that helps students prepare for exams with practice questions and summaries",
    language="en" # Optional, default is "en" (English), follow ISO 639

Create a test with 'cx' module

Create a test using the Customer Experience (CX) module. A module defines the scope of a test.

A test will simulate conversations with personas to test the customer experience of your target agent using these metrics:

  • Helpfulness
  • Goal completion
  • Retention
  • Net promoter score (NPS)

Guide the simulated conversations using prompts:

  • maihem_behavior_prompt guides the behavior of the simulated personas
  • maihem_goal_prompt describes the goal of the simulated personas
  • maihem_population_prompt describes the desired population of simulated personas

from maihem import Maihem

maihem_client = Maihem()

    label="Customer experience Test #1", # Optional
    initiating_agent="maihem", # or "target"
    maihem_behavior_prompt="Request quiz-style questions and deep dives in bullet points for identified gaps",
    maihem_goal_prompt="Prepare for exam tomorrow on the Industrial Revolution",
    maihem_population_prompt="High school students, some care about school and want to ace the exam, some don't care and just want to pass",

Connect target agent with wrapper function

Modify the following function to wrap your target agent:

from typing import Dict, List, Optional, Tuple

def wrapper_function(
    conversation_id: str, # Keep track of different conversations with conversation_id
    maihem_agent_message: Optional[str], # The message from Maihem (None if the target initiates)
    conversation_history: Dict # Auxiliary dictionary to store conversation history (if needed)
) -> Tuple[str, List[str]]:
    """Callable wrapper function to wrap your target agent to be tested."""
    # If the target initiates the conversation, the first maihem_agent_message is None

    # Replace with the message from your target agent
    target_agent_message = "Hi, how can I help you?"

    # (Optional) add messages to the conversation history
    history = conversation_history.setdefault(conversation_id, []) # Avoids a KeyError on the first turn
    history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})

    # List of retrieved contexts for RAG evaluations
    contexts = ["Context_1", "Context_2"]
    return target_agent_message, contexts
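
As a minimal sketch of how the wrapper above might be adapted, the following example wraps a stand-in agent. `EchoTutor` and its `reply` method are hypothetical placeholders for your own agent and are not part of Maihem. The sketch also assumes you want to skip logging the Maihem message when it is None, which happens when the target agent opens the conversation.

```python
from typing import Dict, List, Optional, Tuple


class EchoTutor:
    """Hypothetical stand-in for your real target agent (not part of Maihem)."""

    def reply(self, message: Optional[str]) -> str:
        # No incoming message means the target agent opens the conversation.
        if message is None:
            return "Hi, how can I help you?"
        return f"You asked: {message}"


agent = EchoTutor()


def wrapper_function(
    conversation_id: str,
    maihem_agent_message: Optional[str],
    conversation_history: Dict,
) -> Tuple[str, List[str]]:
    """Forward the Maihem message to the agent and return its reply and contexts."""
    target_agent_message = agent.reply(maihem_agent_message)

    # Create the per-conversation list on first use to avoid a KeyError.
    history = conversation_history.setdefault(conversation_id, [])
    if maihem_agent_message is not None:
        history.append({"role": "maihem", "content": maihem_agent_message})
    history.append({"role": "target", "content": target_agent_message})

    # This stand-in agent retrieves nothing, so the context list is empty.
    contexts: List[str] = []
    return target_agent_message, contexts
```

Calling the wrapper directly with a plain dictionary, first with None and then with a message, is a quick way to check that the history is recorded as expected before starting a test run.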

Run the test

A test run will generate:

  • Simulated conversations between your target agent and Maihem
  • Evaluations of the conversations
  • A list of detected failures

from wrapper_function import wrapper_function

test_run_result = maihem_client.run_test(
    label="Model X Prompt v2.5 (27/Nov/2024)", # Optional
    concurrent_conversations=10 # Optional
)

See test run results

See the results in your Maihem account.

Or retrieve the test results programmatically:

test_run_result = maihem_client.get_test_run_result(


test_run_result contains the following information:

test_run_result.result = "failed"
test_run_result.score = 82.5
test_run_result.conversations[0].messages = [
    {
        "role": "maihem",
        "content": "Do you think I'm well prepared for my test tomorrow?"
    },
    {
        "role": "target",
        "content": "I can't answer that, I don't have that information",
        "evaluation": {
            "is_failure": True,
            "explanation": "Goal not completed. The persona wants to know which topics she still needs to review for the test."
        }
    }
]

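To illustrate how the per-message evaluations might be inspected, here is a small sketch that filters a message list for flagged failures. It assumes the messages are plain dictionaries with the `role`, `content`, and `evaluation` keys shown in the sample; the objects actually returned by Maihem may differ, and `failed_messages` is a hypothetical helper, not a Maihem function.

```python
from typing import Dict, List

# Sample data mirroring the message shape in the sample (plain dicts are an assumption).
messages = [
    {"role": "maihem", "content": "Do you think I'm well prepared for my test tomorrow?"},
    {
        "role": "target",
        "content": "I can't answer that, I don't have that information",
        "evaluation": {"is_failure": True, "explanation": "Goal not completed."},
    },
]


def failed_messages(messages: List[Dict]) -> List[Dict]:
    """Return the messages whose evaluation is flagged as a failure."""
    return [m for m in messages if m.get("evaluation", {}).get("is_failure", False)]


# Print the role and explanation of every flagged message.
for m in failed_messages(messages):
    print(m["role"], "-", m["evaluation"]["explanation"])
```

Messages without an `evaluation` key, such as those sent by the simulated persona, are skipped rather than treated as failures.
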
test_run_failures = [