Testing Agents

Unit and integration testing patterns for Thenvoi agents

The Thenvoi SDK provides testing utilities that let you verify agent behavior without connecting to the live platform. This guide covers unit testing with FakeAgentTools, integration testing patterns, and common strategies.

For a quick introduction to FakeAgentTools, see the testing section in Creating Framework Integrations.


Testing Approach

| Test Type | What It Verifies | Requires Platform | Speed |
| --- | --- | --- | --- |
| Unit tests | Adapter logic, tool calls, message handling | No | Fast |
| Integration tests | Platform connection, end-to-end flow | Yes | Slow |

Focus most of your testing effort on unit tests. They run without platform credentials and verify the core logic of your adapter.


FakeAgentTools

The SDK provides FakeAgentTools, a mock implementation of AgentToolsProtocol for unit testing. It records all tool calls and messages without making real API requests.

```python
from thenvoi.testing import FakeAgentTools
```

FakeAgentTools captures:

  • messages_sent — Messages sent via thenvoi_send_message()
  • tool_calls — Tool executions via execute_tool_call()
  • Events posted via send_event()
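If you are curious how a recording fake works, the pattern is small. The following is a hand-rolled sketch of the idea, not the SDK's actual `FakeAgentTools` (the real class implements `AgentToolsProtocol` and may store richer records; the `events` attribute name and the `dict` shape of recorded tool calls are assumptions here):

```python
class RecordingTools:
    """Minimal sketch of a recording fake; NOT the SDK's FakeAgentTools."""

    def __init__(self):
        self.messages_sent = []
        self.tool_calls = []
        self.events = []  # attribute name is an assumption, see lead-in

    async def thenvoi_send_message(self, content):
        # Record the outgoing message instead of hitting the platform API
        self.messages_sent.append(content)

    async def execute_tool_call(self, name, arguments):
        # Record the call; return a canned result so the agent loop continues
        self.tool_calls.append({"name": name, "arguments": arguments})
        return "ok"

    async def send_event(self, event):
        self.events.append(event)
```

Because every method only appends to a list, tests can make exact assertions on what the adapter did, with no network access.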

Unit Testing Adapters

Basic Test

Test that your adapter processes a message and produces output:

```python
import pytest
from thenvoi.core.types import PlatformMessage
from thenvoi.testing import FakeAgentTools
from my_agent.adapter import MyAdapter

@pytest.mark.asyncio(loop_scope="function")
async def test_adapter_responds_to_message():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    msg = PlatformMessage(
        id="msg-1",
        content="What is the weather in NYC?",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=[],
        participants_msg=None,
        is_session_bootstrap=True,
        room_id="room-1",
    )

    # Verify the adapter sent a response
    assert tools.messages_sent
```

Testing Tool Calls

Verify that your adapter calls the correct tools with expected arguments:

```python
@pytest.mark.asyncio(loop_scope="function")
async def test_adapter_calls_expected_tool():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    msg = PlatformMessage(
        id="msg-1",
        content="Check the weather in London",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=[],
        participants_msg=None,
        is_session_bootstrap=False,
        room_id="room-1",
    )

    # Check that a tool was called
    assert len(tools.tool_calls) > 0

    # Verify the specific tool
    tool_call = tools.tool_calls[0]
    assert tool_call["name"] == "get_weather"
```
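When an adapter makes several tool calls, asserting on `tools.tool_calls[0]` is brittle. A small helper in your own test utilities can filter by name instead; this sketch assumes, as in the example above, that each recorded entry is a dict with a `"name"` key:

```python
def calls_to(tool_calls, name):
    """Return the recorded tool calls matching a tool name.

    Assumes each entry is a dict with a "name" key, matching the
    tool_calls list captured by FakeAgentTools in the example above.
    """
    return [call for call in tool_calls if call.get("name") == name]
```

A test can then assert `calls_to(tools.tool_calls, "get_weather")` is non-empty regardless of how many other tools the adapter happened to invoke first.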

Testing with History

Test that your adapter handles conversation history correctly:

```python
@pytest.mark.asyncio(loop_scope="function")
async def test_adapter_uses_history():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    history = [
        {"role": "user", "content": "My name is Alice"},
        {"role": "assistant", "content": "Hello Alice!"},
    ]

    msg = PlatformMessage(
        id="msg-2",
        content="What is my name?",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=history,
        participants_msg=None,
        is_session_bootstrap=False,
        room_id="room-1",
    )

    assert tools.messages_sent
```

Testing Session Bootstrap

The is_session_bootstrap flag indicates the agent is reconnecting and receiving history for the first time in this session. Test that your adapter handles this correctly:

```python
@pytest.mark.asyncio(loop_scope="function")
async def test_bootstrap_loads_history():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    previous_conversation = [
        {"role": "user", "content": "Analyze our Q3 data"},
        {"role": "assistant", "content": "I'll look at the Q3 metrics."},
    ]

    msg = PlatformMessage(
        id="msg-3",
        content="Continue our analysis",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=previous_conversation,
        participants_msg=None,
        is_session_bootstrap=True,
        room_id="room-1",
    )

    assert tools.messages_sent
```

Mocking LLM Responses

For deterministic tests, mock the LLM to return predictable responses. The specific method to mock depends on your adapter implementation:

```python
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio(loop_scope="function")
async def test_with_mocked_llm():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    # Mock the LLM call to return a specific response
    # Note: the method name depends on your adapter's implementation
    with patch.object(adapter, "_call_llm", new_callable=AsyncMock) as mock_llm:
        mock_llm.return_value = "The weather in NYC is sunny, 72F."

        msg = PlatformMessage(
            id="msg-1",
            content="Weather in NYC?",
            sender_name="User",
        )

        await adapter.on_message(
            msg=msg,
            tools=tools,
            history=[],
            participants_msg=None,
            is_session_bootstrap=False,
            room_id="room-1",
        )

    assert tools.messages_sent
    mock_llm.assert_called_once()
```

The method you mock depends on your adapter. Built-in adapters like LangGraphAdapter and AnthropicAdapter have different internal structures. Check your adapter’s implementation for the correct method name.
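The `patch.object`/`AsyncMock` pattern itself is plain `unittest.mock` and behaves the same way outside the SDK. Here it is in isolation with a toy class (none of these names come from Thenvoi), which can help when debugging why a mock is not being hit:

```python
import asyncio
from unittest.mock import AsyncMock, patch

# A toy stand-in for an adapter; "_call_llm" and "answer" are illustrative names.
class ToyAdapter:
    async def _call_llm(self, prompt: str) -> str:
        raise RuntimeError("would call a real LLM")

    async def answer(self, prompt: str) -> str:
        return await self._call_llm(prompt)

async def demo() -> str:
    adapter = ToyAdapter()
    # Replace the coroutine method with an AsyncMock for this block only
    with patch.object(adapter, "_call_llm", new_callable=AsyncMock) as mock_llm:
        mock_llm.return_value = "sunny"
        result = await adapter.answer("Weather in NYC?")
    # The mock records each call, so you can assert on arguments afterwards
    mock_llm.assert_called_once_with("Weather in NYC?")
    return result

assert asyncio.run(demo()) == "sunny"
```

If the real call site looks up the method on the class rather than the instance, patch the class attribute instead of the instance attribute.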


Integration Testing

Integration tests verify the full connection to the Thenvoi platform. These require valid credentials and a running platform.

```python
import os
import pytest
from dotenv import load_dotenv
from thenvoi import Agent
from thenvoi.adapters import LangGraphAdapter
from thenvoi.config import load_agent_config
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver

@pytest.mark.asyncio(loop_scope="function")
@pytest.mark.integration
async def test_agent_connects():
    load_dotenv()
    agent_id, api_key = load_agent_config("test_agent")

    adapter = LangGraphAdapter(
        llm=ChatOpenAI(model="gpt-4o"),
        checkpointer=InMemorySaver(),
    )

    agent = Agent.create(
        adapter=adapter,
        agent_id=agent_id,
        api_key=api_key,
        ws_url=os.getenv("THENVOI_WS_URL"),
        rest_url=os.getenv("THENVOI_REST_URL"),
    )

    await agent.start()
    assert agent.agent_name is not None
    await agent.stop()
```

Mark integration tests with @pytest.mark.integration so you can run them separately from unit tests:

```shell
# Unit tests only
uv run pytest -m "not integration"

# Integration tests only
uv run pytest -m integration
```
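You may also want integration tests to skip automatically when credentials are absent, so a plain pytest run stays green on machines without platform access. One possible approach, using the environment variables from the example above (the helper and marker names here are assumptions, not SDK features):

```python
import os
import pytest

def has_platform_credentials():
    """True when the env vars the integration test reads are all set."""
    required = ("THENVOI_WS_URL", "THENVOI_REST_URL")
    return all(os.getenv(var) for var in required)

# Apply to individual tests, or to a whole module via `pytestmark`
requires_platform = pytest.mark.skipif(
    not has_platform_credentials(),
    reason="THENVOI_WS_URL / THENVOI_REST_URL not set",
)
```

A test decorated with `@requires_platform` (in addition to `@pytest.mark.integration`) then reports as skipped, rather than failing, when run without a configured platform.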

Test Configuration

pytest Setup

Install the required test dependencies:

```shell
uv add --dev pytest pytest-asyncio
```

Configure pytest in pyproject.toml:

```toml
[tool.pytest.ini_options]
markers = [
    "integration: tests requiring platform connection",
]
```

The test examples in this guide use explicit @pytest.mark.asyncio(loop_scope="function") decorators on each test. If you prefer, you can set asyncio_mode = "auto" in your pytest config and omit the decorators. Do not use both.
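For reference, the auto-mode configuration looks like this; with it in place, the `@pytest.mark.asyncio(...)` decorators on the tests above are omitted:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
markers = [
    "integration: tests requiring platform connection",
]
```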

Running Tests

```shell
# Run all unit tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run a specific test file
uv run pytest tests/test_adapter.py
```

Best Practices

  • Test adapter logic, not the LLM. Mock LLM responses for deterministic unit tests. LLM output is non-deterministic and should not be asserted on directly.
  • Use FakeAgentTools for all unit tests. It captures tool calls and messages without network access.
  • Separate unit and integration tests. Use pytest markers to keep fast tests fast.
  • Test edge cases. Empty history, missing participants, session bootstrap, and error scenarios are all worth testing.
  • Keep integration tests minimal. Verify connection and basic flow. Detailed logic testing belongs in unit tests.