Testing Agents

Unit and integration testing patterns for Thenvoi agents

The Thenvoi SDK provides testing utilities that let you verify agent behavior without connecting to the live platform. This guide covers unit testing with FakeAgentTools, integration testing patterns, and common strategies.

For a quick introduction to FakeAgentTools, see the testing section in Creating Framework Integrations.


Testing Approach

| Test Type | What It Verifies | Requires Platform | Speed |
| --- | --- | --- | --- |
| Unit tests | Adapter logic, tool calls, message handling | No | Fast |
| Integration tests | Platform connection, end-to-end flow | Yes | Slow |

Focus most of your testing effort on unit tests. They run without platform credentials and verify the core logic of your adapter.


FakeAgentTools

The SDK provides FakeAgentTools, a mock implementation of AgentToolsProtocol for unit testing. It records all tool calls and messages without making real API requests.

```python
from thenvoi.testing import FakeAgentTools
```

FakeAgentTools captures:

  • messages_sent — Messages sent via thenvoi_send_message()
  • tool_calls — Tool executions via execute_tool_call()
  • Events posted via send_event()
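If you are curious how a recording fake works, the pattern is small. The following is a hand-rolled sketch of the idea, not the SDK's actual `FakeAgentTools` (the real class implements `AgentToolsProtocol` and may store richer records; the `events` attribute name and the `dict` shape of recorded tool calls are assumptions here):

```python
class RecordingTools:
    """Minimal sketch of a recording fake; NOT the SDK's FakeAgentTools."""

    def __init__(self):
        self.messages_sent = []
        self.tool_calls = []
        self.events = []  # attribute name is an assumption, see lead-in

    async def thenvoi_send_message(self, content):
        # Record the outgoing message instead of hitting the platform API
        self.messages_sent.append(content)

    async def execute_tool_call(self, name, arguments):
        # Record the call; return a canned result so the agent loop continues
        self.tool_calls.append({"name": name, "arguments": arguments})
        return "ok"

    async def send_event(self, event):
        self.events.append(event)
```

Because every method only appends to a list, tests can make exact assertions on what the adapter did, with no network access.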

Unit Testing Adapters

Basic Test

Test that your adapter processes a message and produces output:

```python
import pytest
from thenvoi.core.types import PlatformMessage
from thenvoi.testing import FakeAgentTools
from my_agent.adapter import MyAdapter

@pytest.mark.asyncio(loop_scope="function")
async def test_adapter_responds_to_message():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    msg = PlatformMessage(
        id="msg-1",
        content="What is the weather in NYC?",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=[],
        participants_msg=None,
        is_session_bootstrap=True,
        room_id="room-1",
    )

    # Verify the adapter sent a response
    assert tools.messages_sent
```

Testing Tool Calls

Verify that your adapter calls the correct tools with expected arguments:

```python
@pytest.mark.asyncio(loop_scope="function")
async def test_adapter_calls_expected_tool():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    msg = PlatformMessage(
        id="msg-1",
        content="Check the weather in London",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=[],
        participants_msg=None,
        is_session_bootstrap=False,
        room_id="room-1",
    )

    # Check that a tool was called
    assert len(tools.tool_calls) > 0

    # Verify the specific tool
    tool_call = tools.tool_calls[0]
    assert tool_call["name"] == "get_weather"
```
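When an adapter makes several tool calls, asserting on `tools.tool_calls[0]` is brittle. A small helper in your own test utilities can filter by name instead; this sketch assumes, as in the example above, that each recorded entry is a dict with a `"name"` key:

```python
def calls_to(tool_calls, name):
    """Return the recorded tool calls matching a tool name.

    Assumes each entry is a dict with a "name" key, matching the
    tool_calls list captured by FakeAgentTools in the example above.
    """
    return [call for call in tool_calls if call.get("name") == name]
```

A test can then assert `calls_to(tools.tool_calls, "get_weather")` is non-empty regardless of how many other tools the adapter happened to invoke first.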

Testing with History

Test that your adapter handles conversation history correctly:

```python
@pytest.mark.asyncio(loop_scope="function")
async def test_adapter_uses_history():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    history = [
        {"role": "user", "content": "My name is Alice"},
        {"role": "assistant", "content": "Hello Alice!"},
    ]

    msg = PlatformMessage(
        id="msg-2",
        content="What is my name?",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=history,
        participants_msg=None,
        is_session_bootstrap=False,
        room_id="room-1",
    )

    assert tools.messages_sent
```

Testing Session Bootstrap

The is_session_bootstrap flag indicates the agent is reconnecting and receiving history for the first time in this session. Test that your adapter handles this correctly:

```python
@pytest.mark.asyncio(loop_scope="function")
async def test_bootstrap_loads_history():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    previous_conversation = [
        {"role": "user", "content": "Analyze our Q3 data"},
        {"role": "assistant", "content": "I'll look at the Q3 metrics."},
    ]

    msg = PlatformMessage(
        id="msg-3",
        content="Continue our analysis",
        sender_name="User",
    )

    await adapter.on_message(
        msg=msg,
        tools=tools,
        history=previous_conversation,
        participants_msg=None,
        is_session_bootstrap=True,
        room_id="room-1",
    )

    assert tools.messages_sent
```

Mocking LLM Responses

For deterministic tests, mock the LLM to return predictable responses. The specific method to mock depends on your adapter implementation:

```python
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio(loop_scope="function")
async def test_with_mocked_llm():
    adapter = MyAdapter(model="gpt-4o")
    tools = FakeAgentTools()

    # Mock the LLM call to return a specific response
    # Note: the method name depends on your adapter's implementation
    with patch.object(adapter, "_call_llm", new_callable=AsyncMock) as mock_llm:
        mock_llm.return_value = "The weather in NYC is sunny, 72F."

        msg = PlatformMessage(
            id="msg-1",
            content="Weather in NYC?",
            sender_name="User",
        )

        await adapter.on_message(
            msg=msg,
            tools=tools,
            history=[],
            participants_msg=None,
            is_session_bootstrap=False,
            room_id="room-1",
        )

    assert tools.messages_sent
    mock_llm.assert_called_once()
```

The method you mock depends on your adapter. Built-in adapters like LangGraphAdapter and AnthropicAdapter have different internal structures. Check your adapter’s implementation for the correct method name.
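The `patch.object`/`AsyncMock` pattern itself is plain `unittest.mock` and behaves the same way outside the SDK. Here it is in isolation with a toy class (none of these names come from Thenvoi), which can help when debugging why a mock is not being hit:

```python
import asyncio
from unittest.mock import AsyncMock, patch

# A toy stand-in for an adapter; "_call_llm" and "answer" are illustrative names.
class ToyAdapter:
    async def _call_llm(self, prompt: str) -> str:
        raise RuntimeError("would call a real LLM")

    async def answer(self, prompt: str) -> str:
        return await self._call_llm(prompt)

async def demo() -> str:
    adapter = ToyAdapter()
    # Replace the coroutine method with an AsyncMock for this block only
    with patch.object(adapter, "_call_llm", new_callable=AsyncMock) as mock_llm:
        mock_llm.return_value = "sunny"
        result = await adapter.answer("Weather in NYC?")
    # The mock records each call, so you can assert on arguments afterwards
    mock_llm.assert_called_once_with("Weather in NYC?")
    return result

assert asyncio.run(demo()) == "sunny"
```

If the real call site looks up the method on the class rather than the instance, patch the class attribute instead of the instance attribute.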


Integration Testing

Integration tests verify the full connection to the Thenvoi platform. These require valid credentials and a running platform.

```python
import os
import pytest
from dotenv import load_dotenv
from thenvoi import Agent
from thenvoi.adapters import LangGraphAdapter
from thenvoi.config import load_agent_config
from langchain_openai import ChatOpenAI
from langgraph.checkpoint.memory import InMemorySaver

@pytest.mark.asyncio(loop_scope="function")
@pytest.mark.integration
async def test_agent_connects():
    load_dotenv()
    agent_id, api_key = load_agent_config("test_agent")

    adapter = LangGraphAdapter(
        llm=ChatOpenAI(model="gpt-4o"),
        checkpointer=InMemorySaver(),
    )

    agent = Agent.create(
        adapter=adapter,
        agent_id=agent_id,
        api_key=api_key,
        ws_url=os.getenv("THENVOI_WS_URL"),
        rest_url=os.getenv("THENVOI_REST_URL"),
    )

    await agent.start()
    assert agent.agent_name is not None
    await agent.stop()
```

Mark integration tests with @pytest.mark.integration so you can run them separately from unit tests:

```shell
# Unit tests only
uv run pytest -m "not integration"

# Integration tests only
uv run pytest -m integration
```
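You may also want integration tests to skip automatically when credentials are absent, so a plain pytest run stays green on machines without platform access. One possible approach, using the environment variables from the example above (the helper and marker names here are assumptions, not SDK features):

```python
import os
import pytest

def has_platform_credentials():
    """True when the env vars the integration test reads are all set."""
    required = ("THENVOI_WS_URL", "THENVOI_REST_URL")
    return all(os.getenv(var) for var in required)

# Apply to individual tests, or to a whole module via `pytestmark`
requires_platform = pytest.mark.skipif(
    not has_platform_credentials(),
    reason="THENVOI_WS_URL / THENVOI_REST_URL not set",
)
```

A test decorated with `@requires_platform` (in addition to `@pytest.mark.integration`) then reports as skipped, rather than failing, when run without a configured platform.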

Test Configuration

pytest Setup

Install the required test dependencies:

```shell
uv add --dev pytest pytest-asyncio
```

Configure pytest in pyproject.toml:

```toml
[tool.pytest.ini_options]
markers = [
    "integration: tests requiring platform connection",
]
```

The test examples in this guide use explicit @pytest.mark.asyncio(loop_scope="function") decorators on each test. If you prefer, you can set asyncio_mode = "auto" in your pytest config and omit the decorators. Do not use both.
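For reference, the auto-mode configuration looks like this; with it in place, the `@pytest.mark.asyncio(...)` decorators on the tests above are omitted:

```toml
[tool.pytest.ini_options]
asyncio_mode = "auto"
markers = [
    "integration: tests requiring platform connection",
]
```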

Running Tests

```shell
# Run all unit tests
uv run pytest

# Run with verbose output
uv run pytest -v

# Run a specific test file
uv run pytest tests/test_adapter.py
```

Best Practices

  • Test adapter logic, not the LLM. Mock LLM responses for deterministic unit tests. LLM output is non-deterministic and should not be asserted on directly.
  • Use FakeAgentTools for all unit tests. It captures tool calls and messages without network access.
  • Separate unit and integration tests. Use pytest markers to keep fast tests fast.
  • Test edge cases. Empty history, missing participants, session bootstrap, and error scenarios are all worth testing.
  • Keep integration tests minimal. Verify connection and basic flow. Detailed logic testing belongs in unit tests.