HTTP Service Host

Run one long-lived process that keeps several distinct agents resident behind an HTTP API, so external clients can invoke them and many requests run concurrently. This is the closest topology to "a running instance you hit."

Paladin ships this out of the box. The paladin-server binary (built with the web-server feature) serves a complete agent API: execution, streaming, async jobs, discovery, runtime registration, health/readiness probes, authentication, and an OpenAPI-documented /v1 surface. You configure it rather than composing the endpoints yourself. (You can still embed the same routes in your own axum app; see "Embedding in your own app".)

When to choose it

Choose it when an external client needs request/response access to your agents and a single in-process call won't do.
Look elsewhere when you only call agents from your own code (embedded library), need scale-out or backpressure (queue / worker), or need hard per-agent process isolation (sidecar).

The shipped server

The agent API is served under a /v1 version prefix; operational and docs endpoints are unversioned.

Method & path	Description
`POST /v1/agents/{id}/execute`	Run an agent, return the full result as JSON
`POST /v1/agents/{id}/execute/stream`	Run an agent, stream tokens as SSE (`chunk` … `done`)
`POST /v1/agents/{id}/jobs`	Enqueue an async run; returns a `job_id`
`GET /v1/agents/{id}/jobs/{job_id}`	Poll a job (`running` → `completed`/`failed`/`timed_out`)
`GET /v1/agents` · `GET /v1/agents/{id}`	Discover registered agents
`POST /v1/agents` · `DELETE /v1/agents/{id}`	Register / deregister at runtime (admin)
`GET /health` · `GET /ready`	Liveness / readiness probes (unauthenticated)
`GET /openapi.json` · `GET /docs`	OpenAPI 3.1 spec + Swagger UI
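
The job states in the table suggest a simple client-side polling loop: keep asking until the status leaves `running`. The following is a minimal sketch, not Paladin's client API. `JobStatus`, `poll_until_done`, and the `fetch_status` closure are hypothetical names; the closure stands in for a real `GET /v1/agents/{id}/jobs/{job_id}` call, and the way the status string is carried in the response body is not specified here.

```rust
/// Job states as listed in the endpoint table: `running`, then one of
/// `completed`, `failed`, or `timed_out`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Running,
    Completed,
    Failed,
    TimedOut,
}

impl JobStatus {
    /// Parse a status string. Unknown strings yield `None` so a client does
    /// not silently misread a state it was not written for.
    pub fn parse(s: &str) -> Option<Self> {
        match s {
            "running" => Some(Self::Running),
            "completed" => Some(Self::Completed),
            "failed" => Some(Self::Failed),
            "timed_out" => Some(Self::TimedOut),
            _ => None,
        }
    }

    /// Every state except `running` is terminal, so polling can stop there.
    pub fn is_terminal(self) -> bool {
        !matches!(self, Self::Running)
    }
}

/// Poll until a terminal state is seen or `max_polls` attempts are used up.
/// `fetch_status` is a hypothetical stand-in for the HTTP poll request.
pub fn poll_until_done<F>(mut fetch_status: F, max_polls: usize) -> Option<JobStatus>
where
    F: FnMut() -> Option<JobStatus>,
{
    for _ in 0..max_polls {
        match fetch_status() {
            Some(status) if status.is_terminal() => return Some(status),
            _ => {} // still running, or unrecognised: try again
        }
    }
    None
}
```

A real client would sleep between attempts (ideally with backoff) and treat `None` as "gave up waiting" rather than as a failure of the job itself.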

Every error is a structured envelope { "error": { "code", "message", "details" } }, and every response carries an x-request-id header. Each run is bounded by a timeout (server default, per-agent, or per-request). On expiry the work is cancelled, and the client sees either a 504 or a terminal error SSE event.
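
To make the streaming contract concrete, here is a rough sketch of how a client might consume the `chunk` … `done` sequence and notice a terminal `error` event. It handles only the `event:` and `data:` fields of the standard SSE wire format. Two things are assumptions rather than documented behaviour: that each `chunk` event's data is plain text to append, and that the terminal error event is named `error`. The real payloads may be JSON, and `SseEvent`, `StreamOutcome`, `parse_sse`, and `collect_stream` are hypothetical names.

```rust
/// One server-sent event: its `event:` name and its joined `data:` lines.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SseEvent {
    pub name: String,
    pub data: String,
}

/// Split a fully received SSE body into events.
/// Simplification: an event is emitted when a blank line follows any
/// `event:` or `data:` field; comments and other fields are ignored.
pub fn parse_sse(body: &str) -> Vec<SseEvent> {
    let mut events = Vec::new();
    let mut name = String::new();
    let mut data: Vec<&str> = Vec::new();

    for line in body.lines() {
        if line.is_empty() {
            // A blank line terminates the current event.
            if !name.is_empty() || !data.is_empty() {
                let event_name = if name.is_empty() {
                    "message".to_string() // SSE default event name
                } else {
                    std::mem::take(&mut name)
                };
                events.push(SseEvent { name: event_name, data: data.join("\n") });
                data.clear();
            }
        } else if let Some(rest) = line.strip_prefix("event:") {
            name = rest.trim_start().to_string();
        } else if let Some(rest) = line.strip_prefix("data:") {
            // The format allows one optional space after the colon.
            data.push(rest.strip_prefix(' ').unwrap_or(rest));
        }
    }
    events
}

/// How a streamed run ended, from the client's point of view.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum StreamOutcome {
    Done(String),       // `done` seen; accumulated chunk text
    Error(String),      // `error` seen; the event's data
    Incomplete(String), // stream ended without a terminal event
}

/// Fold events into an outcome. ASSUMPTION: chunk data is plain text.
pub fn collect_stream(events: &[SseEvent]) -> StreamOutcome {
    let mut text = String::new();
    for ev in events {
        match ev.name.as_str() {
            "chunk" => text.push_str(&ev.data),
            "done" => return StreamOutcome::Done(text),
            "error" => return StreamOutcome::Error(ev.data.clone()),
            _ => {} // ignore event names this sketch does not know
        }
    }
    StreamOutcome::Incomplete(text)
}
```

Treating a stream that ends without `done` or `error` as `Incomplete` keeps a dropped connection distinguishable from a clean finish.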

Request flow

sequenceDiagram
    participant Client
    participant Server as paladin-server
    participant Service as PaladinExecutionService
    participant Agent as Paladin
    Client->>Server: POST /v1/agents/{id}/execute  (X-API-Key / Bearer)
    Server->>Server: authenticate + authorize (allowed_roles)
    Server->>Service: execute(agent, input)
    Service->>Agent: run (LLM + tools + memory)
    Agent-->>Service: PaladinResult
    Service-->>Server: output
    Server-->>Client: 200 JSON { output, … }

Configuring the host

Agents and host settings come from config.yml (see config.example.yml). A minimal shape:

server:
  host: "0.0.0.0"
  port: 8080

http:
  auth:
    enabled: true                  # fail-closed: the server refuses to start with no credentials
    api_keys:
      - { key: "${PALADIN_API_KEY_CI}", name: "ci", role: "admin" }
  docs:
    enabled: true                  # GET /openapi.json + Swagger UI at /docs

agents:
  - id: "researcher"
    model: "gpt-4"
    system_prompt: "You research topics thoroughly."
    allowed_roles: ["admin", "user"]   # empty ⇒ any authenticated caller

Authentication & authorization

Auth is enabled by default and fail-closed: with no credentials configured, the server refuses to start (set http.auth.enabled: false for trusted/dev use). Callers present an API key (X-API-Key) or a JWT (Authorization: Bearer); each key or token maps to a role. Per-agent allowed_roles gate invocation, and runtime registration and deregistration require an admin role. /health, /ready, /openapi.json, and /docs are always reachable without a credential.
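
The rules above can be restated as three small predicates. This is an illustrative model of the documented behaviour with auth enabled, not the server's actual code. The function names are hypothetical, and it assumes that roles are compared as exact, case-sensitive strings and that the admin role is literally named "admin", as in the sample config.

```rust
/// ASSUMPTION: the admin role is the literal string "admin".
pub const ADMIN_ROLE: &str = "admin";

/// May this caller invoke an agent?
/// `caller_role` is `None` when no valid API key or JWT was presented.
/// An empty `allowed_roles` list admits any authenticated caller.
pub fn may_invoke(allowed_roles: &[&str], caller_role: Option<&str>) -> bool {
    match caller_role {
        None => false, // no credential: always rejected
        Some(_) if allowed_roles.is_empty() => true,
        Some(role) => allowed_roles.contains(&role),
    }
}

/// Runtime register / deregister is reserved for the admin role.
pub fn may_manage_agents(caller_role: Option<&str>) -> bool {
    caller_role == Some(ADMIN_ROLE)
}

/// Paths that stay reachable without a credential.
pub fn is_public_path(path: &str) -> bool {
    matches!(path, "/health" | "/ready" | "/openapi.json" | "/docs")
}
```

For the sample `researcher` agent with `allowed_roles: ["admin", "user"]`, `may_invoke(&["admin", "user"], Some("user"))` is true, while a caller with no credential is rejected regardless of the list.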

Running it

Binary:

PALADIN_CONFIG=./config.yml \
OPENAI_API_KEY=sk-... PALADIN_API_KEY_CI=sk-... \
cargo run --bin paladin-server --features web-server

Docker (Dockerfile.server):

make docker-build-server
docker run --rm -p 8080:8080 \
  -e OPENAI_API_KEY=sk-... -e PALADIN_API_KEY_CI=sk-... paladin-server:latest
# or: docker compose -f docker/docker-compose.server.yml up --build

Kubernetes (k8s/server/) — Deployment + Service + ConfigMap with liveness /health and readiness /ready probes:

kubectl apply -f k8s/namespace.yaml
kubectl apply -f k8s/server/secret.yaml -f k8s/server/

Versioning

The agent API is versioned under /v1: only additive, backward-compatible changes are made within it; breaking changes ship under a new prefix (/v2). The /openapi.json contract is generated from the handlers and guarded against drift.
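
On the client side, one way to take advantage of this policy is to keep the version prefix in a single place, so that moving to a later prefix is a one-line change. The helper names below are hypothetical, and the sketch assumes agent and job ids are already URL-safe (no percent-encoding is applied).

```rust
/// Version prefix for the agent API; operational endpoints do not use it.
pub const API_PREFIX: &str = "/v1";

/// Unversioned liveness probe path.
pub const HEALTH_PATH: &str = "/health";

/// `POST` target for a buffered run.
pub fn execute_path(agent_id: &str) -> String {
    format!("{}/agents/{}/execute", API_PREFIX, agent_id)
}

/// `POST` target for a streamed (SSE) run.
pub fn stream_path(agent_id: &str) -> String {
    format!("{}/agents/{}/execute/stream", API_PREFIX, agent_id)
}

/// `POST` target for enqueueing an async job.
pub fn jobs_path(agent_id: &str) -> String {
    format!("{}/agents/{}/jobs", API_PREFIX, agent_id)
}

/// `GET` target for polling one job.
pub fn job_path(agent_id: &str, job_id: &str) -> String {
    format!("{}/agents/{}/jobs/{}", API_PREFIX, agent_id, job_id)
}
```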

Embedding in your own app

You can also mount the agent registry and the shipped agent router inside an existing axum app instead of running the binary. cargo check compiles this example in full, so it can't drift from the API:

use std::sync::Arc;
use std::time::Duration;

use paladin::MockLlmAdapter;
use paladin::application::services::paladin::paladin_builder::PaladinBuilder;
use paladin::application::services::paladin::paladin_execution_service::PaladinExecutionService;
use paladin::infrastructure::resilience::circuit_breaker::CircuitBreaker;
use paladin::infrastructure::web::{
    AgentApiState, AgentRegistry, HttpLayersConfig, agent_router, with_http_layers,
};
use paladin_ports::output::llm_port::LlmPort;
use paladin_ports::output::paladin_executor_port::PaladinExecutorPort;
use paladin_ports::output::streaming_executor_port::StreamingExecutorPort;

/// Build a resident agent registry and serve Paladin's shipped agent API, `/v1/agents/…`
/// (buffered, streaming, async jobs, discovery, registration) plus `/health` and `/ready`,
/// inside your own `axum` process. This is the same router the `paladin-server` binary uses,
/// so the endpoints are provided for you rather than hand-written.
pub async fn serve_agents() -> Result<(), Box<dyn std::error::Error>> {
    let llm: Arc<dyn LlmPort> = Arc::new(MockLlmAdapter::new());
    let breaker = Arc::new(CircuitBreaker::new(5, 2, Duration::from_secs(30)));

    // One execution service backs both the buffered and streaming handles.
    let service = Arc::new(PaladinExecutionService::new(
        llm.clone(),
        breaker,
        None,
        None,
    ));
    let executor: Arc<dyn PaladinExecutorPort> = service.clone();
    let streamer: Arc<dyn StreamingExecutorPort> = service;

    let paladin = PaladinBuilder::new(llm)
        .name("researcher")
        .system_prompt("You research topics thoroughly.")
        .build()
        .await?;

    // Resident agents, keyed by id, shared across concurrent requests.
    let registry = AgentRegistry::new();
    registry.insert_with_streaming("researcher", Arc::new(paladin), executor, Some(streamer));

    // `agent_router` mounts the agent API under `/v1` plus the unversioned health probes;
    // `with_http_layers` adds the cross-cutting layers (request-id, CORS, body limit, timeout,
    // rate limit). Auth is open here (the library default); `paladin-server` enables it from
    // config. To also serve the OpenAPI spec + Swagger UI, merge `openapi::docs_router`.
    let state = AgentApiState::new(Arc::new(registry));
    let app = with_http_layers(agent_router(state), &HttpLayersConfig::default());

    let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
