HTTP Service Host
Run one long-lived process that keeps several distinct agents resident behind an HTTP API, so external clients can invoke them and many requests run concurrently. This is the closest topology to "a running instance you hit."
The example below is compiled code pulled from the
paladin-doc-examples crate via mdBook {{#include}}, so it matches the current axum + Paladin API.
Paladin ships no agent-execution endpoint. The web crate's
create_app_router wires a user-management / auth REST API (/users/register, /users/login, user CRUD); it does not run agents. The agent endpoint is yours to compose: an axum handler over a shared agent registry that calls PaladinExecutionService. That is exactly what the example below does.
When to choose it
- Choose it when an external client needs request/response access to your agents, and a single in-process call won't do.
- Look elsewhere when you only call agents from your own code (embedded library), or you need scale-out / backpressure (queue / worker), or hard per-agent process isolation (sidecar).
Request flow
sequenceDiagram
participant Client
participant Handler as axum handler
participant Service as PaladinExecutionService
participant Agent as Paladin
Client->>Handler: POST /agents/{id}/execute
Handler->>Service: execute(agent, input)
Service->>Agent: run (LLM + tools + memory)
Agent-->>Service: PaladinResult
Service-->>Handler: output
Handler-->>Client: 200 JSON { output }
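To make the first arrow in the flow concrete, here is a minimal sketch of the request a client would put on the wire, using only the Rust standard library. The `{"input": ...}` body follows the `ExecuteRequest` shape used by the example handler on this page; the helper names `json_escape` and `build_execute_request`, and the host value passed in, are illustrative assumptions rather than anything Paladin provides. A real client would normally use an HTTP library instead of formatting the request by hand.

```rust
/// Escape a string so it can sit inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be written as \u00XX in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the raw HTTP/1.1 text for `POST /agents/{id}/execute`.
/// Assumption: `agent_id` is already URL-safe (no spaces or slashes),
/// so it is placed in the path without percent-encoding.
fn build_execute_request(host: &str, agent_id: &str, input: &str) -> String {
    let body = format!("{{\"input\":\"{}\"}}", json_escape(input));
    format!(
        "POST /agents/{agent_id}/execute HTTP/1.1\r\n\
         Host: {host}\r\n\
         Content-Type: application/json\r\n\
         Content-Length: {len}\r\n\
         Connection: close\r\n\
         \r\n\
         {body}",
        len = body.len(),
    )
}
```

Calling `build_execute_request("localhost:3000", "researcher", "hello")` yields a request whose path selects the agent and whose JSON body carries the input; on success the handler is expected to answer with a JSON object containing an `output` field.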
Example: agents behind Axum
The handler looks an agent up by id in the shared registry and runs it. cargo check
compiles this in full — including the axum::serve bind — so it can never drift from the
real API:
    use std::collections::HashMap;
    use std::sync::Arc;
    use std::time::Duration;

    use axum::extract::{Path, State};
    use axum::http::StatusCode;
    use axum::routing::post;
    use axum::{Json, Router};
    use serde::{Deserialize, Serialize};

    use paladin::MockLlmAdapter;
    use paladin::application::services::paladin::paladin_execution_service::PaladinExecutionService;
    use paladin::infrastructure::resilience::circuit_breaker::CircuitBreaker;
    use paladin::prelude::*; // PaladinBuilder, LlmPort, Paladin, PaladinResult

    /// Shared state: a registry of distinct agents, each with its own execution
    /// service, all resident in this one long-running process.
    #[derive(Clone)]
    struct AppState {
        agents: Arc<HashMap<String, (Paladin, Arc<PaladinExecutionService>)>>,
    }

    #[derive(Deserialize)]
    struct ExecuteRequest {
        input: String,
    }

    #[derive(Serialize)]
    struct ExecuteResponse {
        output: String,
    }

    /// `POST /agents/{id}/execute` — look the agent up by id and run it. This handler
    /// is **yours to write**: Paladin ships no agent-execution endpoint
    /// (`paladin-web::create_app_router` is a separate user/auth API, not an agent
    /// runner), so you compose `axum` + `PaladinExecutionService` yourself.
    async fn execute_agent(
        State(state): State<AppState>,
        Path(id): Path<String>,
        Json(req): Json<ExecuteRequest>,
    ) -> Result<Json<ExecuteResponse>, StatusCode> {
        let (paladin, service) = state.agents.get(&id).ok_or(StatusCode::NOT_FOUND)?;
        let result: PaladinResult = service
            .execute(paladin, &req.input)
            .await
            .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
        Ok(Json(ExecuteResponse {
            output: result.output,
        }))
    }

    /// Wire the agent registry into an `axum` router.
    fn agent_router(state: AppState) -> Router {
        Router::new()
            .route("/agents/{id}/execute", post(execute_agent))
            .with_state(state)
    }

    /// Build a couple of distinct agents and serve them over HTTP. Concurrent
    /// requests share the registry and run on the `tokio` runtime.
    pub async fn serve_agents() -> Result<(), Box<dyn std::error::Error>> {
        let llm: Arc<dyn LlmPort> = Arc::new(MockLlmAdapter::new());
        let breaker = Arc::new(CircuitBreaker::new(5, 2, Duration::from_secs(30)));

        let mut agents = HashMap::new();
        for (name, prompt) in [
            ("researcher", "You research topics thoroughly."),
            ("summarizer", "You write concise summaries."),
        ] {
            let agent = PaladinBuilder::new(llm.clone())
                .name(name)
                .system_prompt(prompt)
                .build()
                .await?;
            let service = Arc::new(PaladinExecutionService::new(
                llm.clone(),
                breaker.clone(),
                None,
                None,
            ));
            agents.insert(name.to_string(), (agent, service));
        }

        let state = AppState {
            agents: Arc::new(agents),
        };
        let app = agent_router(state);

        let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await?;
        axum::serve(listener, app).await?;
        Ok(())
    }
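The registry pattern in the example does not depend on axum or Paladin. As a rough std-only sketch, the following shows why an `Arc<HashMap<..>>` needs no lock here: the map is built once at startup and only read afterwards, so every concurrent request can hold its own cheap clone of the `Arc`. `StubAgent`, `Outcome`, `dispatch`, and `dispatch_concurrently` are hypothetical stand-ins (for `Paladin`, the HTTP status, and the handler), and OS threads stand in for tokio tasks.

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for a resident agent (not the real `Paladin` type).
struct StubAgent {
    system_prompt: String,
}

/// Stand-in for the handler's result: agent ran, or unknown id (the 404 case).
#[derive(Debug, PartialEq)]
enum Outcome {
    Ran(String),
    NotFound,
}

type Registry = Arc<HashMap<String, StubAgent>>;

/// Build the registry once at startup; it is never mutated afterwards.
fn build_registry() -> Registry {
    let mut agents = HashMap::new();
    for (id, prompt) in [
        ("researcher", "You research topics thoroughly."),
        ("summarizer", "You write concise summaries."),
    ] {
        agents.insert(id.to_string(), StubAgent { system_prompt: prompt.to_string() });
    }
    Arc::new(agents)
}

/// Mirror of the handler's lookup step: unknown ids map to `NotFound`.
fn dispatch(registry: &Registry, id: &str, input: &str) -> Outcome {
    match registry.get(id) {
        Some(agent) => Outcome::Ran(format!("[{}] {}", agent.system_prompt, input)),
        None => Outcome::NotFound,
    }
}

/// Run one lookup per id concurrently, each thread owning an `Arc` clone.
/// Results come back in the same order as `ids`.
fn dispatch_concurrently(registry: &Registry, ids: &[&str]) -> Vec<Outcome> {
    let handles: Vec<_> = ids
        .iter()
        .map(|id| {
            let registry = Arc::clone(registry);
            let id = id.to_string();
            thread::spawn(move || dispatch(&registry, &id, "hello"))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

If agents ever need to be added or removed while the host is running, this read-only design no longer fits and some form of synchronized map would be required; that is outside what the example on this page covers.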
Configuring the host
Host and per-agent settings typically come from your config.yml rather than being
hard-coded. A minimal shape:
host:
  bind_address: "0.0.0.0:3000"
agents:
  - id: "researcher"
    model: "gpt-4"
    system_prompt: "You research topics thoroughly."
  - id: "summarizer"
    model: "gpt-4"
    system_prompt: "You write concise summaries."
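Before building agents from a config like this, it can help to validate it once at startup. The sketch below assumes the file has already been deserialized into plain structs (for example by a YAML library, not shown) and that `agents` is a top-level list next to `host`. `HostConfig`, `AgentConfig`, `Config`, `sample_config`, and `validate` are hypothetical names, not Paladin types. It checks that `bind_address` parses as a socket address and that agent ids are unique and usable as the `{id}` path segment.

```rust
use std::collections::HashSet;
use std::net::SocketAddr;

/// Hypothetical mirror of the `host:` section.
struct HostConfig {
    bind_address: String,
}

/// Hypothetical mirror of one entry under `agents:`.
#[allow(dead_code)] // `model` and `system_prompt` are carried but not validated here
struct AgentConfig {
    id: String,
    model: String,
    system_prompt: String,
}

struct Config {
    host: HostConfig,
    agents: Vec<AgentConfig>,
}

/// The same values as the YAML shape above, written out by hand.
fn sample_config() -> Config {
    let agent = |id: &str, prompt: &str| AgentConfig {
        id: id.to_string(),
        model: "gpt-4".to_string(),
        system_prompt: prompt.to_string(),
    };
    Config {
        host: HostConfig { bind_address: "0.0.0.0:3000".to_string() },
        agents: vec![
            agent("researcher", "You research topics thoroughly."),
            agent("summarizer", "You write concise summaries."),
        ],
    }
}

/// Return the parsed bind address, or a message describing the first problem.
fn validate(config: &Config) -> Result<SocketAddr, String> {
    let addr: SocketAddr = config
        .host
        .bind_address
        .parse()
        .map_err(|e| format!("invalid bind_address {:?}: {e}", config.host.bind_address))?;

    let mut seen = HashSet::new();
    for agent in &config.agents {
        // The id ends up in the URL path, so reject empty ids and slashes.
        if agent.id.is_empty() || agent.id.contains('/') {
            return Err(format!("agent id {:?} is not usable as a path segment", agent.id));
        }
        // `insert` returns false when the id was already present.
        if !seen.insert(agent.id.as_str()) {
            return Err(format!("duplicate agent id {:?}", agent.id));
        }
    }
    Ok(addr)
}
```

Failing fast here means a typo in the config surfaces as a startup error rather than as a 404 on the first request.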
See also
- The bundled user/auth routes (paladin-web) a real service often also needs: Crate Map & Feature Flags.
- Running the same agent host in a separate process, called over the network: Sidecar.
← Back to Choosing a topology