CLI Test Guide

This document describes the CLI test infrastructure, how tests are organized into tiers, and how to run them.

Test Tiers

Tier 1: Core Functionality (No External Dependencies)

Tests that run with cargo test and require no external services, API keys, or Docker.

Location: tests/cli/environment_tests.rs

What's tested:

Config file loading (valid, invalid, missing)
YAML parsing and validation (syntax errors, duplicate keys, tabs)
Edge cases (empty fields, large inputs, concurrent loading)
Non-interactive mode (all commands work via flags, no hanging prompts)
Environment variation (NO_COLOR, quiet/verbose modes, formatter behavior)
Full user journey (template generation → config load → output formatting)

Run:

cargo test cli::environment_tests::

Tier 2: Docker-Gated Service Tests

Tests that require Docker services (Redis, MinIO) to be running. Skipped automatically when services are unavailable.

Location: tests/integration/cli_real_services_test.rs

What's tested:

Redis connectivity and health checks
MinIO connectivity and health checks
Service unavailability detection
Connection error handling

Prerequisites:

make services-up   # Start Redis, MinIO, MySQL via Docker Compose

Run:

cargo test --test lib cli_real_services -- --ignored

Skip message: Tests print a clear message when Docker services are not available.
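
One way such a skip could work is a short TCP probe before the test body runs. The sketch below is an assumption about the mechanism, not the project's actual helper: `service_available` and `skip_unless_service` are hypothetical names, and `127.0.0.1:6379` is simply Redis's default port.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Returns true if something accepts TCP connections at `addr` within `timeout`.
fn service_available(addr: &str, timeout: Duration) -> bool {
    match addr.parse::<SocketAddr>() {
        Ok(sock) => TcpStream::connect_timeout(&sock, timeout).is_ok(),
        Err(_) => false, // not a literal ip:port pair, so treat it as unavailable
    }
}

/// Prints a skip message and returns false when the service is down.
fn skip_unless_service(name: &str, addr: &str) -> bool {
    if service_available(addr, Duration::from_millis(200)) {
        return true;
    }
    println!("SKIPPED: {name} not reachable at {addr}. Run `make services-up` first.");
    false
}

#[test]
#[ignore] // Tier 2: only runs with `-- --ignored`
fn redis_health_check() {
    if !skip_unless_service("Redis", "127.0.0.1:6379") {
        return; // passes trivially instead of failing when Docker is down
    }
    // ... real Redis assertions would go here ...
}
```

Returning early keeps the test green on machines without Docker, at the cost that a skipped test is reported as `ok`; the printed message is the only signal that nothing was exercised.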

Tier 3: API-Key-Gated Provider Tests

Tests that require real LLM API keys. They are gated behind the integration-tests feature flag and marked #[ignore].

Location: tests/integration/cli_real_providers_test.rs

What's tested:

OpenAI provider connection and streaming
Anthropic provider connection
DeepSeek provider connection
End-to-end agent config with real providers

Prerequisites:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export DEEPSEEK_API_KEY="sk-..."

Run:

cargo test --features integration-tests --test lib cli_real_providers -- --ignored

Tier 4: Live LLM API Integration Tests

Direct adapter-level tests that make real API calls to LLM providers. They validate the low-level integration of the OpenAI, DeepSeek, and Anthropic adapters with their respective APIs. Because they incur API costs, run them sparingly.

Location: tests/integration/llm_live_api_tests.rs

Feature Flag: live-api-tests

What's tested:

Each provider (OpenAI, DeepSeek, Anthropic) has 4 dedicated tests:

Basic completion - Validates generate() method with real API
Streaming completion - Validates generate_stream() method with chunked responses
Error handling - Tests invalid model detection and error mapping
Capabilities - Validates provider capabilities reporting

Total: 12 tests (4 per provider × 3 providers)

Test Characteristics:

All tests are marked with #[ignore] - they don't run by default
Tests skip gracefully if API keys are not present
Each test makes a real API call (costs apply)
Validates response structure, token usage, and finish reasons
Tests both success and error paths
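
The gating and graceful-skip behaviour described above could be combined roughly as follows. This is a sketch under assumptions: `api_key_or_skip` is a hypothetical helper, and the real tests may structure the check differently. Only the feature name, the `#[ignore]` attribute, the environment variable, and the skip wording come from this guide.

```rust
use std::env;

/// Returns the key stored in `var`, or prints a skip notice and returns None.
fn api_key_or_skip(var: &str, provider: &str) -> Option<String> {
    // An empty or whitespace-only value is treated the same as a missing one.
    let key = env::var(var).ok().filter(|k| !k.trim().is_empty());
    if key.is_none() {
        println!(
            "SKIPPED: {provider} API key not found. Set {var} environment variable \
             to run {provider} live API tests."
        );
    }
    key
}

#[cfg(feature = "live-api-tests")] // compiled only with `--features live-api-tests`
#[test]
#[ignore] // makes real API calls, so opt in with `-- --ignored`
fn test_openai_basic_completion() {
    let _key = match api_key_or_skip("OPENAI_API_KEY", "OpenAI") {
        Some(key) => key,
        None => return, // reported as `ok`, with the skip notice in the output
    };
    // ... build the adapter with `_key` and call generate() here ...
}
```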

Prerequisites:

# Set one or more API keys
export OPENAI_API_KEY="sk-..."
export DEEPSEEK_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

Run all live API tests:

cargo test --features live-api-tests -- --ignored

Run specific provider tests:

# OpenAI only (4 tests)
cargo test --features live-api-tests test_openai -- --ignored

# DeepSeek only (4 tests)
cargo test --features live-api-tests test_deepseek -- --ignored

# Anthropic only (4 tests)
cargo test --features live-api-tests test_anthropic -- --ignored

Example output when API key is missing:

test test_openai_basic_completion ... ok (SKIPPED: OpenAI API key not found. Set OPENAI_API_KEY environment variable to run OpenAI live API tests.)

Example output when test passes:

test test_openai_basic_completion ... ok
✓ OpenAI basic completion: Hello from OpenAI

Cost Considerations:

Each test makes 1 API call (except error handling tests, which may fail fast)
Use small prompts (< 100 tokens) to minimize costs
Recommended models: gpt-3.5-turbo, deepseek-chat, claude-3-5-sonnet-20241022
Estimated cost per full test run: < $0.10 USD

When to run these tests:

Before releasing a new version
After modifying adapter implementations
When troubleshooting provider-specific issues
For validating API key configuration during setup
Not recommended in CI/CD pipelines (use mocks instead)

Running Tests

Quick Check (Tier 1 only — no dependencies)

cargo test cli::environment_tests::

All CLI Tests (Tier 1)

cargo test --test lib cli::

With Docker Services (Tier 1 + 2)

make services-up
cargo test --test lib cli:: -- --include-ignored

Full Suite (Tier 1 + 2 + 3)

make services-up
export OPENAI_API_KEY="sk-..."
cargo test --features integration-tests --test lib -- --include-ignored

Test Counts

Tier	Count	Gate
Tier 1 (Core)	45	None
Tier 2 (Docker)	6	`#[ignore]` + service check
Tier 3 (API keys)	5	`integration-tests` feature + `#[ignore]` + env var
Tier 4 (Live API)	12	`live-api-tests` feature + `#[ignore]` + env var

CI/CD Notes

Tier 1 tests run in every CI pipeline with no setup required
Non-interactive safety: All Tier 1 tests verify that CLI operations never block on stdin. The ensure_tty() guard detects non-TTY environments (CI runners) and returns a clear ValidationError instead of hanging
NO_COLOR: Formatters respect the NO_COLOR environment variable. Set NO_COLOR=1 in CI to suppress ANSI escape codes
Line buffering: All output uses println!/eprintln! which flush per-line — safe for CI log capture
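
A minimal sketch of the two guards mentioned above, assuming nothing about the real signatures: `ValidationError` here is a stand-in type, `ensure_tty_with`, `colors_enabled`, and `style_error` are hypothetical names, and the actual `ensure_tty()` may differ. The color rule follows the NO_COLOR convention that any non-empty value disables color.

```rust
use std::io::{self, IsTerminal};

/// Stand-in for the project's ValidationError (the real type is not shown in this guide).
#[derive(Debug, PartialEq)]
struct ValidationError(String);

/// Core of the guard, with the TTY check passed in so it can be unit tested.
fn ensure_tty_with(stdin_is_tty: bool) -> Result<(), ValidationError> {
    if stdin_is_tty {
        Ok(())
    } else {
        Err(ValidationError(
            "interactive prompt requires a TTY; pass the value via a flag".to_string(),
        ))
    }
}

/// Guard as a caller would use it: fail fast instead of blocking on stdin.
fn ensure_tty() -> Result<(), ValidationError> {
    ensure_tty_with(io::stdin().is_terminal())
}

/// NO_COLOR convention: a set, non-empty value disables color.
fn colors_enabled(no_color: Option<&str>) -> bool {
    !matches!(no_color, Some(v) if !v.is_empty())
}

/// Wraps `msg` in red ANSI codes only when color is enabled.
fn style_error(msg: &str, color: bool) -> String {
    if color {
        format!("\x1b[31m{msg}\x1b[0m")
    } else {
        msg.to_string()
    }
}
```

Passing the TTY state and the NO_COLOR value in as arguments keeps the logic testable without touching the process environment.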

Mock Infrastructure for Testing

MockLlmAdapter

The MockLlmAdapter provides a test double for LLM providers, enabling Tier 1 tests without API keys.

Location: tests/helpers/mock_llm_adapter.rs

Features:

Configurable responses: Queue pre-defined text, tool calls, streaming, or errors
Invocation recording: Capture all LLM calls for test assertions
Tool call simulation: Return function calls to test arsenal integration
Error injection: Simulate API failures, timeouts, rate limits

Example usage:

use std::sync::Arc;

use serde_json::json;
use tests::helpers::mock_llm_adapter::MockLlmAdapter;

// Queue responses in the order the LLM should return them.
let mock = MockLlmAdapter::new()
    .add_response("First response")
    .add_tool_call("web_search", json!({"query": "test"}))
    .add_response("Final answer");

// Use mock in PaladinExecutionService
let service = PaladinExecutionService::new(
    Arc::new(mock.clone()) as Arc<dyn LlmPort>,
    None,
    Arc::new(ArsenalRegistry::new()),
);

// Execute and assert. This runs inside an async test that returns a Result;
// `paladin` is a Paladin built earlier in the test.
let result = service.execute(&paladin, "test input").await?;
// One invocation per queued response.
assert_eq!(mock.invocations().len(), 3);
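
To illustrate how response queuing and invocation recording can fit together, here is a standard-library-only sketch. It is not the real MockLlmAdapter: `SketchLlmMock`, `Scripted`, and the synchronous `generate` signature are assumptions made for the example.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// One scripted reply: plain text or an injected failure.
#[derive(Debug)]
enum Scripted {
    Text(String),
    Error(String),
}

/// Hypothetical mock. Clones share the same queue and call log,
/// so a test can keep one handle and give another to the code under test.
#[derive(Clone, Default)]
struct SketchLlmMock {
    queue: Arc<Mutex<VecDeque<Scripted>>>,
    calls: Arc<Mutex<Vec<String>>>,
}

impl SketchLlmMock {
    fn new() -> Self {
        Self::default()
    }

    /// Queues a successful text response.
    fn add_response(self, text: &str) -> Self {
        self.queue.lock().unwrap().push_back(Scripted::Text(text.to_string()));
        self
    }

    /// Queues an error, e.g. to simulate a rate limit.
    fn add_error(self, message: &str) -> Self {
        self.queue.lock().unwrap().push_back(Scripted::Error(message.to_string()));
        self
    }

    /// Records the prompt, then returns the next scripted reply in order.
    fn generate(&self, prompt: &str) -> Result<String, String> {
        self.calls.lock().unwrap().push(prompt.to_string());
        match self.queue.lock().unwrap().pop_front() {
            Some(Scripted::Text(text)) => Ok(text),
            Some(Scripted::Error(err)) => Err(err),
            None => Err("mock queue exhausted".to_string()),
        }
    }

    /// Returns every prompt seen so far, in call order.
    fn invocations(&self) -> Vec<String> {
        self.calls.lock().unwrap().clone()
    }
}
```

Sharing state through `Arc<Mutex<...>>` is what makes the `mock.clone()` pattern in the example above workable: assertions on the original handle see calls made through the clone.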

MockArsenalPort

The MockArsenalPort provides in-process tool mocking for testing arsenal integration.

Location: tests/helpers/mock_arsenal_adapter.rs

Features:

Tool registration: Add mock tools with schemas
Response configuration: Set success responses or errors
Invocation tracking: Verify tool calls with arguments
Error simulation: Test tool failure scenarios

Example usage:

use std::sync::Arc;

use serde_json::json;
use tests::helpers::mock_arsenal_adapter::MockArsenalPort;

// Register a mock tool with its JSON schema and a canned success response.
let mock = MockArsenalPort::new()
    .add_tool("calculator", "Perform calculations", json!({
        "type": "object",
        "properties": {
            "expression": {"type": "string"}
        }
    }))
    .set_response("calculator", Ok(json!({"result": 42})));

// Use in PaladinExecutionService via ArsenalRegistry
let mut registry = ArsenalRegistry::new();
registry.register("mock_server", Arc::new(mock.clone()))?;

// Execute the Paladin with this registry (omitted here), then assert
// that the tool was called exactly once.
assert_eq!(mock.call_count("calculator"), 1);
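
The invocation tracking and error simulation listed above can be pictured with a small stand-in. Everything here is hypothetical (`SketchToolMock`, string-typed arguments instead of JSON values, and `last_args`); it only shows the bookkeeping idea, not the MockArsenalPort API.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical tool mock: canned responses plus a log of every call.
#[derive(Default)]
struct SketchToolMock {
    responses: HashMap<String, Result<String, String>>,
    calls: RefCell<Vec<(String, String)>>, // (tool name, raw arguments)
}

impl SketchToolMock {
    fn new() -> Self {
        Self::default()
    }

    /// Sets the canned result (success or error) for one tool.
    fn set_response(mut self, tool: &str, response: Result<&str, &str>) -> Self {
        let owned = response.map(str::to_string).map_err(str::to_string);
        self.responses.insert(tool.to_string(), owned);
        self
    }

    /// Logs the call, then returns the canned result or an unknown-tool error.
    fn call(&self, tool: &str, args: &str) -> Result<String, String> {
        self.calls.borrow_mut().push((tool.to_string(), args.to_string()));
        match self.responses.get(tool) {
            Some(response) => response.clone(),
            None => Err(format!("unknown tool: {tool}")),
        }
    }

    /// Number of calls made to `tool`.
    fn call_count(&self, tool: &str) -> usize {
        self.calls.borrow().iter().filter(|(name, _)| name.as_str() == tool).count()
    }

    /// Arguments of the most recent call to `tool`, if any.
    fn last_args(&self, tool: &str) -> Option<String> {
        self.calls
            .borrow()
            .iter()
            .rev()
            .find(|(name, _)| name.as_str() == tool)
            .map(|(_, args)| args.clone())
    }
}
```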

MockPaladinPort

The MockPaladinPort enables Battalion testing without full Paladin execution.

Location: tests/helpers/mock_paladin_port.rs

Features:

Result configuration: Set expected Paladin outputs
Error simulation: Test error propagation in Battalions
Execution tracking: Verify execution order and count
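
As a rough illustration of what execution tracking and error propagation checks look like, the sketch below runs hypothetical stand-in Paladins in sequence and stops at the first failure. `SketchPaladin` and `run_sequential` are invented for this example and do not reflect the MockPaladinPort or Formation APIs.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a Paladin: returns a fixed result and logs each run.
struct SketchPaladin<'a> {
    name: &'a str,
    result: Result<&'a str, &'a str>,
    log: &'a RefCell<Vec<String>>,
}

impl<'a> SketchPaladin<'a> {
    /// Records that this step ran, then returns its configured result.
    fn execute(&self, _input: &str) -> Result<String, String> {
        self.log.borrow_mut().push(self.name.to_string());
        self.result.map(str::to_string).map_err(str::to_string)
    }
}

/// Runs steps in order, feeding each output to the next; stops at the first error.
fn run_sequential(steps: &[SketchPaladin], input: &str) -> Result<String, String> {
    let mut current = input.to_string();
    for step in steps {
        current = step.execute(&current)?;
    }
    Ok(current)
}
```

A test can then assert both the returned error and the shared log, which shows that steps after the failing one never ran.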

Test Coverage

Current Test Statistics (as of Epic 23 completion)

Category	Tests	Coverage
Garrison Configuration	9	In-memory, SQLite, validation
Arsenal Configuration	8	STDIO, SSE, tool registration
Error Handling	14	Config errors, execution errors
Paladin Execution	6	Basic, with garrison, with arsenal
Formation Execution	4	Sequential flow, error propagation
Phalanx Execution	5	Parallel execution, aggregation
Tool Integration	8	LLM → Arsenal → result loop
Mock Infrastructure	9	MockArsenalPort unit tests
Scheduler	21	Unit + integration tests
Total CLI Tests	84	All CI-ready with mocks

Tool Integration Tests

Location: tests/cli/tool_integration_test.rs

Tests the complete LLM ↔ Arsenal ↔ Paladin tool call loop:

Core flow tests (2):
- test_tool_call_basic_flow: LLM function call → Arsenal execution → result
- test_tool_call_result_fed_back_to_llm: Tool result returned to LLM for synthesis

Error handling tests (4):
- test_tool_call_no_arsenal_available: Graceful handling when Arsenal not configured
- test_tool_call_unknown_tool: Tool not in registry
- test_tool_call_invalid_arguments: Malformed JSON arguments
- test_tool_call_execution_error: Tool invocation failure

Advanced tests (2):
- test_multiple_sequential_tool_calls: Chain of tool calls
- test_tool_call_with_garrison: Tools + memory integration
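
The loop these tests exercise can be sketched with closures standing in for the LLM and the Arsenal. This is a simplified model under assumptions: `LlmTurn` and `run_tool_loop` are hypothetical names, arguments are plain strings, and reporting tool failures back to the LLM (rather than aborting) is a design choice made for the sketch, not a statement about the real executor.

```rust
/// What the (mock) LLM can return on each turn.
enum LlmTurn {
    ToolCall { name: String, args: String },
    Text(String),
}

/// Drives the loop: ask the LLM, run any requested tool, feed the result back.
fn run_tool_loop<L, T>(
    mut llm: L,
    mut tool: T,
    input: &str,
    max_turns: usize,
) -> Result<String, String>
where
    L: FnMut(&str) -> LlmTurn,
    T: FnMut(&str, &str) -> Result<String, String>,
{
    let mut context = input.to_string();
    for _ in 0..max_turns {
        match llm(&context) {
            // Plain text ends the loop: this is the final answer.
            LlmTurn::Text(answer) => return Ok(answer),
            LlmTurn::ToolCall { name, args } => {
                // Assumption: a failing tool is reported to the LLM, not fatal.
                let outcome = match tool(&name, &args) {
                    Ok(result) => format!("tool {name} returned {result}"),
                    Err(err) => format!("tool {name} failed: {err}"),
                };
                context.push('\n');
                context.push_str(&outcome);
            }
        }
    }
    Err("tool loop did not finish within max_turns".to_string())
}
```

The `max_turns` bound keeps a misbehaving mock (or model) from looping forever, which matters for deterministic tests.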

Adding New Tests

Pure logic / config tests → Add to tests/cli/environment_tests.rs (Tier 1)
Requires Docker services → Add to tests/integration/cli_real_services_test.rs with #[ignore]
Requires API keys → Add to tests/integration/cli_real_providers_test.rs with feature gate + #[ignore]
Tool integration → Add to tests/cli/tool_integration_test.rs using MockLlmAdapter + MockArsenalPort
Battalion orchestration → Use MockPaladinPort in Formation/Phalanx/Campaign tests
CLI output formatting → Add snapshot tests to tests/cli/ (see CLI Snapshot Testing)
Live LLM adapter tests → Add to tests/integration/llm_live_api_tests.rs with #[cfg(feature = "live-api-tests")] and #[ignore]
Always run cargo test cli::environment_tests:: after changes to verify Tier 1 passes

CLI Snapshot Testing

CLI snapshot testing ensures output consistency across code changes using the insta library.

Overview

Location: tests/cli/

Test Files:

table_output_test.rs - Table formatting with comfy-table
progress_output_test.rs - Progress indicators and bars
error_output_test.rs - Error messages and styled output
help_output_test.rs - Help text and documentation

Snapshot Location: tests/cli/snapshots/

Running Snapshot Tests

# Run all CLI snapshot tests
cargo test --test cli

# Review new/changed snapshots
cargo insta review

# Accept all new snapshots
cargo insta accept

# Reject all pending snapshots
cargo insta reject

Writing Snapshot Tests

Snapshot tests capture CLI output and compare against saved baselines:

use paladin::application::cli::formatters::table::TableFormatter;

#[test]
fn test_execution_summary() {
    let mut table = TableFormatter::new();
    table
        .set_header(vec!["Agent", "Status", "Time"])
        .add_row(vec!["DataAnalyzer", "Success", "1.2s"]);

    let output = table.render();

    // Compare against saved snapshot
    insta::assert_snapshot!("execution_summary", output);
}

First Run: Creates tests/cli/snapshots/cli__table_output_test__execution_summary.snap

Subsequent Runs: Compares output against snapshot, fails if different
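
The create-then-compare behaviour can be pictured with a few lines of standard-library code. This is only a conceptual sketch: insta itself does more (it stores a metadata header and keeps pending snapshots for `cargo insta review`), and `check_snapshot` and `SnapshotOutcome` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

#[derive(Debug, PartialEq)]
enum SnapshotOutcome {
    Created,
    Matched,
    Mismatch { expected: String, actual: String },
}

/// First run writes the baseline; later runs compare against it.
fn check_snapshot(path: &Path, actual: &str) -> io::Result<SnapshotOutcome> {
    if !path.exists() {
        // No baseline yet: create the directory and store the current output.
        if let Some(dir) = path.parent() {
            fs::create_dir_all(dir)?;
        }
        fs::write(path, actual)?;
        return Ok(SnapshotOutcome::Created);
    }
    let expected = fs::read_to_string(path)?;
    if expected == actual {
        Ok(SnapshotOutcome::Matched)
    } else {
        Ok(SnapshotOutcome::Mismatch { expected, actual: actual.to_string() })
    }
}
```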

Best Practices

Disable colors in tests:
```
NO_COLOR=1 cargo test --test cli
```

Use descriptive snapshot names:

insta::assert_snapshot!("table_with_styled_cells", output);  // Good
insta::assert_snapshot!("test1", output);                     // Bad

Test edge cases:
- Empty tables
- Long content requiring truncation
- Unicode/special characters
- Multi-line output

Review snapshots carefully:
- Verify output is correct before accepting
- Use cargo insta review for interactive approval
- Inspect snapshot files in tests/cli/snapshots/

Group related tests:
- Table tests → table_output_test.rs
- Error tests → error_output_test.rs
- Keep test files focused and organized
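
The truncation and Unicode edge cases are worth testing because byte-based slicing breaks on multi-byte characters. The helper below is a hypothetical illustration (`truncate_cell` is not part of the formatter API, and this is not comfy-table's algorithm); it counts characters rather than bytes and marks the cut with an ellipsis.

```rust
/// Truncates `text` to at most `max_chars` characters, ending with '…' when cut.
/// Counts `char`s, not display columns, so wide glyphs may still overflow a cell.
fn truncate_cell(text: &str, max_chars: usize) -> String {
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    if max_chars == 0 {
        return String::new();
    }
    // Reserve one character for the ellipsis.
    let mut out: String = text.chars().take(max_chars - 1).collect();
    out.push('…');
    out
}
```

For example, `truncate_cell("DataAnalyzer", 6)` yields `DataA…`.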

Snapshot File Format

Snapshots are stored as .snap files:

---
source: tests/cli/table_output_test.rs
expression: output
---
┌────────┬─────────┬──────┐
│ Agent  ┆ Status  ┆ Time │
╞════════╪═════════╪══════╡
│ DataA… ┆ Success ┆ 1.2s │
└────────┴─────────┴──────┘

Fields:

source: Test file location
expression: Rust expression being tested
Content: Actual snapshot data
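
To make the layout concrete, the following sketch splits a snapshot file into the fields listed above. It is a simplified reader written for illustration, assuming a `---` delimited header with simple `key: value` lines; `Snapshot` and `parse_snapshot` are hypothetical names, and insta's own parser handles a richer header.

```rust
#[derive(Debug, PartialEq)]
struct Snapshot {
    source: Option<String>,
    expression: Option<String>,
    content: String,
}

/// Splits a snapshot into header fields and content; None if no header is found.
fn parse_snapshot(raw: &str) -> Option<Snapshot> {
    // The header sits between the first two `---` lines.
    let rest = raw.strip_prefix("---\n")?;
    let (header, content) = rest.split_once("\n---\n")?;
    let mut snap = Snapshot {
        source: None,
        expression: None,
        content: content.to_string(),
    };
    for line in header.lines() {
        if let Some((key, value)) = line.split_once(':') {
            match key.trim() {
                "source" => snap.source = Some(value.trim().to_string()),
                "expression" => snap.expression = Some(value.trim().to_string()),
                _ => {} // ignore fields this sketch does not model
            }
        }
    }
    Some(snap)
}
```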

CI/CD Integration

Snapshot tests run automatically in CI:

# .github/workflows/test.yml
- name: Run snapshot tests
  run: NO_COLOR=1 cargo test --test cli

- name: Check for pending snapshots
  run: cargo insta test --test cli --check

Note: CI will fail if snapshots need review. Use cargo insta accept locally and commit changes.

Example Test Categories

Table Output Tests (8 tests)

Simple tables
Long content
Styled cells (success/error/warning/info)
Empty tables
Single column
Numeric data
Special characters
Battalion results

Progress Output Tests (8 tests)

Default progress bar template
Custom template
Different totals
Message variations
Progress states (0%, 25%, 50%, 75%, 100%)
Builder pattern
Batch operations
File size formatting

Error Output Tests (15 tests)

Error message styles
Warning message styles
Info message styles
Success message styles
Link styles
Header rendering
Section rendering
Box message rendering
Key-value formatting
Emoji fallback
Separator lines
Quiet/verbose mode flags
Combined error scenarios
Multi-line error formatting

Help Output Tests (12 tests)

Basic command help
Command help with examples
Subcommand lists
Option groups
Help header
Usage examples section
Error help messages
Feature flags help
Environment variables help
Configuration help
Troubleshooting help
Version output

Total Snapshot Tests: 43

Writing Tests with Mocks

Best Practices

Use MockLlmAdapter for LLM tests:
- Queue expected responses in order
- Verify invocations after execution
- Test both success and error paths

Use MockArsenalPort for tool tests:
- Register tools with realistic schemas
- Configure responses for each tool
- Verify tool call arguments

Keep tests deterministic:
- No random values in mocks
- Use fixed response sequences
- Assert exact invocation counts

Test error scenarios:
- LLM errors: rate limits, timeouts, invalid responses
- Tool errors: execution failures, timeouts, unknown tools
- Config errors: invalid YAML, missing fields, type mismatches

Verify integration points:
- Garrison is queried for context
- Arsenal is called with correct arguments
- CircuitBreaker tracks failures
- Results are formatted correctly

Last updated: February 14, 2026
Epic: 23 - CLI, Config & Infrastructure Completion
