HTTP Service Host

Run one long-lived process that keeps several distinct agents resident behind an HTTP API, so external clients can invoke them and many requests run concurrently. This is the closest topology to "a running instance you hit."

The example below is compiled code pulled from the paladin-doc-examples crate via mdBook {{#include}}, so it matches the current axum + Paladin API.

Paladin ships no agent-execution endpoint. The web crate's create_app_router wires a user-management / auth REST API (/users/register, /users/login, user CRUD) — it does not run agents. The agent endpoint is yours to compose: an axum handler over a shared agent registry that calls PaladinExecutionService. That is exactly what the example below does.

When to choose it

  • Choose it when an external client needs request/response access to your agents, and a single in-process call won't do.
  • Look elsewhere when you only call agents from your own code (embedded library), when you need scale-out or backpressure (queue / worker), or when you need hard per-agent process isolation (sidecar).

Request flow

sequenceDiagram
    participant Client
    participant Handler as axum handler
    participant Service as PaladinExecutionService
    participant Agent as Paladin
    Client->>Handler: POST /agents/{id}/execute
    Handler->>Service: execute(agent, input)
    Service->>Agent: run (LLM + tools + memory)
    Agent-->>Service: PaladinResult
    Service-->>Handler: output
    Handler-->>Client: 200 JSON { output }
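
On the wire, the flow above is a single JSON POST. As a rough sketch of what a client sends, the following std-only Rust builds the raw HTTP/1.1 request for the `/agents/{id}/execute` route. The `input` field name comes from the `ExecuteRequest` struct in the example on this page; `json_escape`, `build_execute_request`, and `send` are hypothetical helper names, the agent id is assumed to be URL-safe, and a real client would more likely use an HTTP library and `serde_json`.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escape a string for use inside a JSON string literal.
/// Minimal on purpose (assumption: real code would use serde_json).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the raw request for `POST /agents/{id}/execute`.
/// Assumes `agent_id` needs no percent-encoding.
fn build_execute_request(host: &str, agent_id: &str, input: &str) -> String {
    let body = format!("{{\"input\":\"{}\"}}", json_escape(input));
    format!(
        "POST /agents/{agent_id}/execute HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {body}",
        len = body.len(), // byte length, as Content-Length requires
    )
}

/// Send the request and return the raw response (status line, headers, body).
/// `Connection: close` lets us simply read until the server hangs up.
fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With the host from the example running, `send("127.0.0.1:3000", &build_execute_request("127.0.0.1:3000", "researcher", "hello"))` would return the raw response, whose JSON body carries the `output` field.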

Example: agents behind Axum

The handler looks an agent up by id in the shared registry and runs it. cargo check compiles this in full, including the axum::serve bind, so any drift from the real API shows up as a build failure:

use std::collections::HashMap;
use std::sync::Arc;
use std::time::Duration;

use axum::extract::{Path, State};
use axum::http::StatusCode;
use axum::routing::post;
use axum::{Json, Router};
use serde::{Deserialize, Serialize};

use paladin::MockLlmAdapter;
use paladin::application::services::paladin::paladin_execution_service::PaladinExecutionService;
use paladin::infrastructure::resilience::circuit_breaker::CircuitBreaker;
use paladin::prelude::*; // PaladinBuilder, LlmPort, Paladin, PaladinResult

/// Shared state: a registry of distinct agents, each with its own execution
/// service, all resident in this one long-running process.
#[derive(Clone)]
struct AppState {
    agents: Arc<HashMap<String, (Paladin, Arc<PaladinExecutionService>)>>,
}

#[derive(Deserialize)]
struct ExecuteRequest {
    input: String,
}

#[derive(Serialize)]
struct ExecuteResponse {
    output: String,
}

/// `POST /agents/{id}/execute` — look the agent up by id and run it. This handler
/// is **yours to write**: Paladin ships no agent-execution endpoint
/// (`paladin-web::create_app_router` is a separate user/auth API, not an agent
/// runner), so you compose `axum` + `PaladinExecutionService` yourself.
async fn execute_agent(
    State(state): State<AppState>,
    Path(id): Path<String>,
    Json(req): Json<ExecuteRequest>,
) -> Result<Json<ExecuteResponse>, StatusCode> {
    let (paladin, service) = state.agents.get(&id).ok_or(StatusCode::NOT_FOUND)?;
    let result: PaladinResult = service
        .execute(paladin, &req.input)
        .await
        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
    Ok(Json(ExecuteResponse {
        output: result.output,
    }))
}

/// Wire the agent registry into an `axum` router.
fn agent_router(state: AppState) -> Router {
    Router::new()
        .route("/agents/{id}/execute", post(execute_agent))
        .with_state(state)
}

/// Build a couple of distinct agents and serve them over HTTP. Concurrent
/// requests share the registry and run on the `tokio` runtime.
pub async fn serve_agents() -> Result<(), Box<dyn std::error::Error>> {
    let llm: Arc<dyn LlmPort> = Arc::new(MockLlmAdapter::new());
    let breaker = Arc::new(CircuitBreaker::new(5, 2, Duration::from_secs(30)));

    let mut agents = HashMap::new();
    for (name, prompt) in [
        ("researcher", "You research topics thoroughly."),
        ("summarizer", "You write concise summaries."),
    ] {
        let agent = PaladinBuilder::new(llm.clone())
            .name(name)
            .system_prompt(prompt)
            .build()
            .await?;
        let service = Arc::new(PaladinExecutionService::new(
            llm.clone(),
            breaker.clone(),
            None,
            None,
        ));
        agents.insert(name.to_string(), (agent, service));
    }

    let state = AppState {
        agents: Arc::new(agents),
    };
    let app = agent_router(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
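
The handler's control flow can be exercised without axum or Paladin. The sketch below is a std-only model of it, not Paladin's API: plain functions stand in for agents, `dispatch` mirrors the lookup, run, and map-to-status steps of `execute_agent`, and OS threads stand in for concurrent requests on the tokio runtime. `AgentFn`, `Registry`, `dispatch`, `demo_registry`, and `run_concurrently` are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Stand-in for an agent: any thread-safe function from input to output.
type AgentFn = Box<dyn Fn(&str) -> Result<String, String> + Send + Sync>;

/// Read-only registry shared by every request, like `AppState::agents`.
type Registry = Arc<HashMap<String, AgentFn>>;

/// Mirror of the handler's control flow, returning (status code, body):
/// unknown id -> 404, agent failure -> 500, success -> 200 with the output.
fn dispatch(registry: &Registry, id: &str, input: &str) -> (u16, String) {
    let Some(agent) = registry.get(id) else {
        return (404, String::new());
    };
    match agent(input) {
        Ok(output) => (200, output),
        Err(_) => (500, String::new()),
    }
}

// Two toy agents: one that succeeds and one that always fails.
fn echo(input: &str) -> Result<String, String> {
    Ok(format!("echo: {input}"))
}

fn broken(_input: &str) -> Result<String, String> {
    Err("boom".to_string())
}

fn demo_registry() -> Registry {
    let mut agents: HashMap<String, AgentFn> = HashMap::new();
    agents.insert("echo".to_string(), Box::new(echo));
    agents.insert("broken".to_string(), Box::new(broken));
    Arc::new(agents)
}

/// Run several (id, input) "requests" at once against the same registry.
/// Results come back in request order because handles are joined in order.
fn run_concurrently(
    registry: &Registry,
    requests: Vec<(String, String)>,
) -> Vec<(u16, String)> {
    let handles: Vec<_> = requests
        .into_iter()
        .map(|(id, input)| {
            let registry = Arc::clone(registry); // cheap pointer copy, no lock
            thread::spawn(move || dispatch(&registry, &id, &input))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Because the registry is built once and never mutated afterwards, an `Arc<HashMap<..>>` is enough and no mutex is needed. A host that adds or removes agents at runtime would need a different structure.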

Configuring the host

Host and per-agent settings typically come from your config.yml rather than being hard-coded. A minimal shape:

host:
  bind_address: "0.0.0.0:3000"
agents:
  - id: "researcher"
    model: "gpt-4"
    system_prompt: "You research topics thoroughly."
  - id: "summarizer"
    model: "gpt-4"
    system_prompt: "You write concise summaries."
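
One way to catch configuration mistakes before binding the listener is to validate the loaded settings up front. The sketch below assumes the YAML shape above has already been deserialized into plain structs (for example with serde, which is omitted so the snippet stays std-only); `Config`, `HostSection`, `AgentConfig`, `sample_config`, and `validate` are hypothetical names, not part of Paladin.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Mirrors the `host:` section of the YAML shape.
#[derive(Debug, Clone)]
struct HostSection {
    bind_address: String,
}

/// Mirrors one entry under `agents:`.
#[derive(Debug, Clone)]
struct AgentConfig {
    id: String,
    model: String,
    system_prompt: String,
}

#[derive(Debug, Clone)]
struct Config {
    host: HostSection,
    agents: Vec<AgentConfig>,
}

/// Hand-built equivalent of the YAML example (a loader would produce this).
fn sample_config() -> Config {
    Config {
        host: HostSection { bind_address: "0.0.0.0:3000".to_string() },
        agents: vec![
            AgentConfig {
                id: "researcher".to_string(),
                model: "gpt-4".to_string(),
                system_prompt: "You research topics thoroughly.".to_string(),
            },
            AgentConfig {
                id: "summarizer".to_string(),
                model: "gpt-4".to_string(),
                system_prompt: "You write concise summaries.".to_string(),
            },
        ],
    }
}

/// Check the config and return the parsed socket address to bind.
/// Rejects an unparsable address, an empty agent list, and empty or
/// duplicate ids (ids become URL path segments and registry keys).
fn validate(config: &Config) -> Result<SocketAddr, String> {
    let addr = config
        .host
        .bind_address
        .parse::<SocketAddr>()
        .map_err(|e| format!("invalid bind_address {:?}: {e}", config.host.bind_address))?;
    if config.agents.is_empty() {
        return Err("no agents configured".to_string());
    }
    let mut seen = HashSet::new();
    for agent in &config.agents {
        if agent.id.is_empty() {
            return Err("agent id must not be empty".to_string());
        }
        if !seen.insert(agent.id.as_str()) {
            return Err(format!("duplicate agent id {:?}", agent.id));
        }
    }
    Ok(addr)
}
```

The returned `SocketAddr` can be handed to the listener bind in place of the hard-coded address, and the validated agent entries can drive the loop that builds the registry.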

See also

  • The bundled user/auth routes (paladin-web) a real service often also needs — Crate Map & Feature Flags.
  • Running the same agent host in a separate process, called over the network — Sidecar.
