Content Processing
The paladin-content crate (crates/paladin-content/) ingests content from external sources,
runs it through aggregation/analysis use cases, hands it to a Paladin agent for AI enrichment,
and delivers the result. This guide covers the ingestion adapters, the processing
use cases, the content → agent bridge, and delivery — documenting only what is wired
into the compiled crate today.
Every code example targets the current v0.5.0 workspace. The substantive examples are real, compiled code pulled from the
paladin-doc-examples crate via mdBook {{#include}} (a few illustrative fragments are marked rust,ignore). The API forms are verified against crates/paladin-content/src/.
Feature flags. Content processing lives behind the root
content-processing feature, which enables paladin-content. Within the crate, news-api enables the News API fetcher and llm enables LLM-powered analysis. See the Crate Map for the full flag table.
Table of Contents
- Content Ingestion Sources
- Aggregation and the Processing Pipeline
- Content → Agent Bridge
- Content Delivery
- Capabilities and Limitations
- See Also
Content Ingestion Sources
Every fetcher produces a ContentItem
(paladin_core::platform::container::content::ContentItem), the common currency of the
pipeline. Sources are constructed and configured programmatically (there is no dedicated
content: section in config.yml yet — see Limitations).
PDF / documents — PdfExtractor
PdfExtractor parses a PDF (from a path or raw bytes) into a Document. DocumentAdapter
wraps document parsing for the pipeline.
```rust
use paladin_content::adapters::document::pdf_extractor::PdfExtractor;
use std::path::Path;

/// Extract a PDF (from a path or raw bytes) into a `Document`.
pub fn ingest_pdf() -> Result<(), Box<dyn std::error::Error>> {
    let extractor = PdfExtractor::new();
    let _document = extractor.extract(Path::new("./reports/q3-earnings.pdf"))?;

    // Or from bytes already in memory:
    // let document = extractor.extract_bytes(&pdf_bytes)?;
    Ok(())
}
```
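When PDFs arrive from untrusted sources, a cheap pre-check before calling extract_bytes can reject non-PDF input early. The std-only sketch below is not part of paladin-content: it assumes only that PDF files begin with the %PDF- header, and the helper names looks_like_pdf and load_pdf_bytes are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: strict check that the buffer starts with the `%PDF-`
/// header. (Some PDF readers tolerate leading junk; this sketch does not.)
pub fn looks_like_pdf(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}

/// Hypothetical helper: read a file and reject it before extraction if it does
/// not look like a PDF. The returned bytes could then go to `extract_bytes`.
pub fn load_pdf_bytes(path: &Path) -> io::Result<Vec<u8>> {
    let bytes = fs::read(path)?;
    if !looks_like_pdf(&bytes) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("{} does not start with a %PDF- header", path.display()),
        ));
    }
    Ok(bytes)
}
```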
HTTP endpoints — HttpContentFetcher
HttpContentFetcher fetches a URL and returns a ContentItem. It implements the
ContentFetchingService trait, so it can be driven directly or through the FetchContent
use case.
```rust
use paladin_content::adapters::input::http_content_fetcher::HttpContentFetcher;
use paladin_content::services::content_fetching_service::{ContentFetchingService, FetchContent};

/// Fetch a URL into a `ContentItem`, directly and via the `FetchContent` use case.
pub fn ingest_http() -> Result<(), Box<dyn std::error::Error>> {
    let fetcher = HttpContentFetcher::new();

    // Direct use (needs the `ContentFetchingService` trait in scope):
    let _item = fetcher.fetch_content("https://example.com/article")?;

    // Or wrapped in the use case (same trait, swappable adapter):
    let fetch = FetchContent::new(HttpContentFetcher::new());
    let _item = fetch.execute("https://example.com/article")?;
    Ok(())
}
```
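FetchContent is generic over ContentFetchingService so the transport can be swapped, for example for an in-memory stub in tests. The std-only sketch below mirrors that shape with simplified, hypothetical stand-ins (Item, Fetcher, FetchUseCase, StubFetcher); the real trait's signature and error type live in crates/paladin-content/src/ and may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `ContentItem`: just the source URL and body.
#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub url: String,
    pub body: String,
}

/// Hypothetical stand-in for `ContentFetchingService`.
pub trait Fetcher {
    fn fetch_content(&self, url: &str) -> Result<Item, String>;
}

/// Use case generic over the fetcher, mirroring the `FetchContent<T>` shape.
pub struct FetchUseCase<T: Fetcher> {
    fetcher: T,
}

impl<T: Fetcher> FetchUseCase<T> {
    pub fn new(fetcher: T) -> Self {
        Self { fetcher }
    }

    /// Delegates to whichever adapter was injected; no I/O of its own.
    pub fn execute(&self, url: &str) -> Result<Item, String> {
        self.fetcher.fetch_content(url)
    }
}

/// Test double: serves canned bodies from memory instead of the network.
pub struct StubFetcher {
    pages: HashMap<String, String>,
}

impl StubFetcher {
    pub fn new(pages: &[(&str, &str)]) -> Self {
        let pages = pages
            .iter()
            .map(|(u, b)| (u.to_string(), b.to_string()))
            .collect();
        Self { pages }
    }
}

impl Fetcher for StubFetcher {
    fn fetch_content(&self, url: &str) -> Result<Item, String> {
        self.pages
            .get(url)
            .map(|body| Item { url: url.to_string(), body: body.clone() })
            .ok_or_else(|| format!("no stubbed page for {url}"))
    }
}
```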
News / feeds — NewsApiFetcher (feature news-api)
NewsApiFetcher polls a News API endpoint. It takes an API key and reuses an
HttpContentFetcher for transport.
```rust
use paladin_content::adapters::input::http_content_fetcher::HttpContentFetcher;
use paladin_content::adapters::input::news_api_fetcher::NewsApiFetcher;

/// Construct a News API fetcher (feature `news-api`).
pub fn ingest_news() {
    let _fetcher = NewsApiFetcher::new("YOUR_NEWS_API_KEY".to_string())
        .with_content_fetcher(HttpContentFetcher::new());
}
```
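with_content_fetcher is a builder-style injection point: the fetcher owns a transport but lets callers replace it. The std-only sketch below illustrates that pattern with hypothetical names (Transport, NoNetwork, NewsFetcherSketch, EchoTransport), a placeholder endpoint, and an assumed header-based key; it does not reproduce NewsApiFetcher's fields or request format.

```rust
/// Hypothetical transport abstraction (the role `HttpContentFetcher` plays).
pub trait Transport {
    fn get(&self, url: &str, headers: &[(&str, &str)]) -> Result<String, String>;
}

/// Default placeholder; a real transport would perform the HTTP request.
pub struct NoNetwork;

impl Transport for NoNetwork {
    fn get(&self, url: &str, _headers: &[(&str, &str)]) -> Result<String, String> {
        Err(format!("no network transport configured for {url}"))
    }
}

/// Builder-style fetcher: owns an API key and a boxed, replaceable transport.
pub struct NewsFetcherSketch {
    api_key: String,
    transport: Box<dyn Transport>,
}

impl NewsFetcherSketch {
    pub fn new(api_key: String) -> Self {
        Self { api_key, transport: Box::new(NoNetwork) }
    }

    /// Analogous to the `with_content_fetcher(...)` call: swap the transport.
    pub fn with_transport(mut self, transport: impl Transport + 'static) -> Self {
        self.transport = Box::new(transport);
        self
    }

    /// Placeholder endpoint; the query is not URL-encoded in this sketch, and
    /// the header name is an assumption about the transport contract.
    pub fn search(&self, query: &str) -> Result<String, String> {
        let url = format!("https://news.example/v2/everything?q={query}");
        self.transport.get(&url, &[("X-Api-Key", self.api_key.as_str())])
    }
}

/// Test transport that echoes the request instead of calling the network.
pub struct EchoTransport;

impl Transport for EchoTransport {
    fn get(&self, url: &str, headers: &[(&str, &str)]) -> Result<String, String> {
        let header_list: Vec<String> =
            headers.iter().map(|(k, v)| format!("{k}: {v}")).collect();
        Ok(format!("GET {url} [{}]", header_list.join(", ")))
    }
}
```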
Files — FileContentFetcher
For local ingestion and testing, FileContentFetcher reads a file from disk and infers its
content type from the extension. Unlike the HTTP fetcher, it implements ContentIngestionPort
(paladin_ports::input): its fetch_content takes a ContentItem describing the source path
and returns a populated ContentItem. (It is an internal #[doc(hidden)] adapter; the primary
documented ingestion paths are HTTP, PDF, and the News API above.)
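FileContentFetcher infers the content type from the file extension. The std-only mapping below illustrates that idea; the extension table, the case-insensitive matching, and the generic fallback are assumptions, not the adapter's actual rules, and infer_content_type is a hypothetical name.

```rust
use std::path::Path;

/// Hypothetical extension -> MIME type table. Matching is case-insensitive;
/// unknown or missing extensions fall back to a generic binary type.
pub fn infer_content_type(path: &Path) -> &'static str {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("txt") => "text/plain",
        Some("md") | Some("markdown") => "text/markdown",
        Some("html") | Some("htm") => "text/html",
        Some("json") => "application/json",
        Some("pdf") => "application/pdf",
        _ => "application/octet-stream",
    }
}
```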
Aggregation and the Processing Pipeline
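Once items are fetched, the use cases combine and analyze them. FetchContent, AggregateContent, and AnalyzeContent are each generic over a service trait, so their adapters are swappable; ContentSummarizer and LlmContentAnalyzer are concrete types.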
| Stage | Use case / type | Trait | What it does |
|---|---|---|---|
| Fetch | FetchContent<T> | ContentFetchingService | URL → ContentItem |
| Aggregate | AggregateContent<T> | ContentListService | Combine many sources into one JSON view |
| Summarize | ContentSummarizer | — | Brief/detailed summaries, keyword extraction |
| Analyze | AnalyzeContent<T> | ContentAnalysisService | Run an analysis over a ContentItem |
| Analyze (AI) | LlmContentAnalyzer | — (feature llm) | LLM enrichment — see next section |
```mermaid
flowchart LR
    src[(Sources: PDF / HTTP / News / File)] --> fetch[FetchContent]
    fetch --> agg[AggregateContent]
    agg --> sum[ContentSummarizer]
    sum --> ai[LlmContentAnalyzer]
    ai --> deliver[DeliverContentUseCase]
    deliver --> out[(Destinations)]
```
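The flowchart is a linear chain in which each stage consumes the previous stage's output. The sketch below wires hypothetical stand-in functions in that order, using plain strings instead of ContentItem and JSON values; it shows the data flow only, not the crate's types, async calls, or error handling.

```rust
/// Hypothetical stage stand-ins; each consumes the previous stage's output.
fn fetch(source: &str) -> String {
    format!("body of {source}")
}

fn aggregate(bodies: Vec<String>) -> String {
    bodies.join("\n")
}

fn summarize(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

fn enrich(summary: &str) -> String {
    // Stand-in for the LLM step; a real pipeline would call an analyzer here.
    format!("[enriched] {summary}")
}

fn deliver(payload: &str, outbox: &mut Vec<String>) {
    outbox.push(payload.to_string());
}

/// Runs Sources -> Fetch -> Aggregate -> Summarize -> Enrich -> Deliver.
pub fn run_pipeline(sources: &[&str], max_chars: usize, outbox: &mut Vec<String>) {
    let bodies: Vec<String> = sources.iter().map(|s| fetch(s)).collect();
    let combined = aggregate(bodies);
    let summary = summarize(&combined, max_chars);
    let enriched = enrich(&summary);
    deliver(&enriched, outbox);
}
```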
Aggregation
AggregateContent wraps a ContentListService and merges a vector of JSON values into a single
aggregated value — useful for collapsing multiple fetched sources before analysis.
```rust
use paladin_content::services::content_aggregator_service::AggregateContent;

/// Merge JSON from several sources into one aggregated value.
pub fn aggregate() {
    // `MockListService` implements the `ContentListService` trait; it is a
    // test double defined outside this excerpt.
    let aggregator = AggregateContent::new(MockListService);
    let source_a = serde_json::json!({ "title": "A" });
    let source_b = serde_json::json!({ "title": "B" });
    let _aggregated = aggregator.execute(vec![source_a, source_b]);
}
```
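AggregateContent delegates the merge itself to its ContentListService. As one possible merge strategy (an assumption, not the crate's documented behavior), the sketch below counts the sources and keys every field by its source index so same-named fields never collide. String maps stand in for serde_json values to keep it std-only, BTreeMap keeps the output order deterministic, and Aggregated and aggregate_sources are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Hypothetical aggregate: how many sources were merged, plus every field
/// keyed as "<source index>.<field>" so same-named fields do not collide.
#[derive(Debug, PartialEq)]
pub struct Aggregated {
    pub source_count: usize,
    pub fields: BTreeMap<String, String>,
}

pub fn aggregate_sources(sources: Vec<BTreeMap<String, String>>) -> Aggregated {
    let source_count = sources.len();
    let mut fields = BTreeMap::new();
    for (i, source) in sources.into_iter().enumerate() {
        for (key, value) in source {
            fields.insert(format!("{i}.{key}"), value);
        }
    }
    Aggregated { source_count, fields }
}
```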
Summarization
ContentSummarizer produces summaries and keywords without an LLM call (deterministic
text processing), returning a ContentSummary plus ContentMetadata.
```rust
use paladin_content::services::content_summarizer_service::ContentSummarizer;

/// Summarize a `ContentItem` and extract keywords (no LLM call).
pub fn summarize() {
    // `text_content_item` builds a text `ContentItem`; it is a helper
    // defined outside this excerpt.
    let item = text_content_item("A long article body about quarterly earnings...");
    let summarizer = ContentSummarizer::new();
    let _summary = summarizer.summarize_content(&item, 500); // max 500 chars
    let _keywords = summarizer.extract_keywords(&item);
}
```
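To make "deterministic text processing" concrete, the sketch below shows two common techniques of that kind, not necessarily the ones ContentSummarizer uses. The first truncates to a character budget at a word boundary, counting chars rather than bytes so multi-byte UTF-8 is never split. The second ranks keywords by frequency after dropping stopwords. The function names and the small stopword list are hypothetical.

```rust
use std::collections::HashMap;

/// Truncate to at most `max_chars` characters, backing off to the last whole
/// word when the cut would land inside a word.
pub fn truncate_summary(text: &str, max_chars: usize) -> String {
    let mut chars = text.chars();
    let cut: String = chars.by_ref().take(max_chars).collect();
    // Cut ended at end of text or just before whitespace: no word was split.
    if chars.next().map_or(true, char::is_whitespace) {
        return cut.trim_end().to_string();
    }
    match cut.rfind(char::is_whitespace) {
        Some(idx) => cut[..idx].trim_end().to_string(),
        None => cut, // a single long word: hard cut
    }
}

/// Hypothetical stopword list; real lists are much longer.
const STOPWORDS: &[&str] = &["a", "an", "and", "the", "of", "in", "on", "to", "is", "for", "about"];

/// Return the `n` most frequent non-stopword terms (ties broken alphabetically).
pub fn top_keywords(text: &str, n: usize) -> Vec<String> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
        .map(str::to_lowercase)
        .filter(|w| !STOPWORDS.contains(&w.as_str()))
    {
        *counts.entry(word).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(n).map(|(w, _)| w).collect()
}
```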
Content → Agent Bridge
The llm feature enables LlmContentAnalyzer, which passes a ContentItem plus a prompt to a
Paladin LLM analysis service for AI enrichment. This is the seam where the content pipeline meets
the agent layer.
LlmContentAnalyzer::analyze_with_prompt_async takes an LlmContentAnalysisInput
(prompt: PromptItem, content: ContentItem) and an LlmContentAnalysisConfig
(model, retries, timeout, max_content_length), and returns the analysis as JSON.
```rust
use std::sync::Arc;

use paladin_content::services::content_llm_analysis_service::{
    LlmContentAnalysisConfig, LlmContentAnalysisInput, LlmContentAnalyzer,
};
use paladin_llm::llm_analysis_service::LlmAnalysisService;
use paladin_llm::mock::MockLlmAdapter;
use paladin_ports::output::llm_port::LlmPort;

/// Pass content + a prompt to a Paladin LLM service for AI enrichment.
pub async fn content_to_agent() -> Result<(), Box<dyn std::error::Error>> {
    // In production this is a real provider (e.g. OpenAIAdapter); here a mock.
    let llm: Arc<dyn LlmPort> =
        Arc::new(MockLlmAdapter::new().with_response("{\"summary\":\"...\"}"));
    let llm_service = Arc::new(LlmAnalysisService::new(llm));
    let analyzer = LlmContentAnalyzer::new(llm_service);

    // `text_prompt_item` and `text_content_item` are helpers defined outside this excerpt.
    let input = LlmContentAnalysisInput {
        prompt: text_prompt_item("Summarize the key risks in this article."),
        content: text_content_item("Latest article body..."),
    };
    let config = LlmContentAnalysisConfig::default(); // gpt-3.5-turbo, 3 retries, 30s timeout

    let analysis = analyzer
        .analyze_with_prompt_async(&input, &config)
        .await
        .map_err(|e| -> Box<dyn std::error::Error> { e.into() })?;
    println!("{}", serde_json::to_string_pretty(&analysis)?);
    Ok(())
}
```
Use the async method (analyze_with_prompt_async). The sync analyze_with_prompt is a compatibility stub that returns an error directing callers to the async path.
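LlmContentAnalysisConfig carries a max_content_length and a retry count. The sketch below shows one plausible way such settings are applied; it is an assumption, not LlmContentAnalyzer's internals. It clips the content before building the prompt, then retries a fallible call up to the configured number of extra attempts. It is synchronous and std-only: the closure stands in for the provider call, the names RetryConfig, build_prompt, and call_with_retries are hypothetical, and timeouts and backoff are omitted.

```rust
/// Hypothetical subset of the analysis config.
pub struct RetryConfig {
    pub max_retries: u32,
    pub max_content_length: usize,
}

/// Clip content to `max_content_length` characters and append it to the prompt.
pub fn build_prompt(prompt: &str, content: &str, cfg: &RetryConfig) -> String {
    let clipped: String = content.chars().take(cfg.max_content_length).collect();
    format!("{prompt}\n\n---\n{clipped}")
}

/// Call `op` once, then retry up to `max_retries` more times on error.
/// Returns the first success, or the last error if every attempt fails.
pub fn call_with_retries<T, E>(
    cfg: &RetryConfig,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt >= cfg.max_retries => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

A production version would add backoff between attempts and enforce the timeout, both of which depend on the async runtime in use.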
For richer agent interactions — an agent that triggers a workflow, or a workflow step that invokes a full Paladin agent loop — see the Agent ↔ Orchestrator Bridge.
Content Delivery
DeliverContentUseCase sends processed content to a destination through the
ContentDeliveryService port (paladin_ports::output::content_delivery_port). It takes a
DeliveryRequest and returns a DeliveryResponse (with a DeliveryStatus).
```rust
use paladin_content::services::content_delivery_service::DeliverContentUseCase;
use paladin_ports::output::content_delivery_port::{
    ContentPayload, DeliveryMethod, DeliveryPriority, DeliveryRequest,
};

/// Deliver processed content through a `ContentDeliveryService`.
pub fn deliver() -> Result<(), Box<dyn std::error::Error>> {
    // `MockDeliveryAdapter` (a `ContentDeliveryService` test double) and
    // `text_content_item` are defined outside this excerpt.
    let delivery = DeliverContentUseCase::new(MockDeliveryAdapter);
    let request = DeliveryRequest {
        recipient_id: "ops-team".to_string(),
        delivery_method: DeliveryMethod::Email {
            to: "ops@example.com".to_string(),
            subject: "Daily digest".to_string(),
        },
        content_payload: ContentPayload::SingleItem(text_content_item("Digest body...")),
        priority: DeliveryPriority::Normal,
        scheduled_time: None,
        metadata: None,
    };
    let response = delivery.execute(request)?;
    println!("delivery status: {:?}", response.status);
    Ok(())
}
```
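The example relies on a MockDeliveryAdapter test double. The sketch below shows the general shape of such a double with simplified, hypothetical types (Method, Request, Status, DeliveryPort, InMemoryDelivery): it routes on the delivery method, rejects an obviously bad address, and records what would have been sent. The real port's method names and request fields are in paladin_ports::output::content_delivery_port and will differ.

```rust
use std::cell::RefCell;

/// Hypothetical, trimmed-down mirrors of the port's request types.
#[derive(Debug, Clone)]
pub enum Method {
    Email { to: String, subject: String },
    Webhook { url: String },
}

#[derive(Debug, Clone)]
pub struct Request {
    pub recipient_id: String,
    pub method: Method,
    pub body: String,
}

#[derive(Debug, PartialEq)]
pub enum Status {
    Delivered,
    Rejected(String),
}

pub trait DeliveryPort {
    fn deliver(&self, request: Request) -> Status;
}

/// Test double: records a one-line description of each accepted delivery.
#[derive(Default)]
pub struct InMemoryDelivery {
    pub sent: RefCell<Vec<String>>,
}

impl DeliveryPort for InMemoryDelivery {
    fn deliver(&self, request: Request) -> Status {
        let line = match &request.method {
            Method::Email { to, .. } if !to.contains('@') => {
                return Status::Rejected(format!("invalid address {to}"));
            }
            Method::Email { to, subject } => format!("email {to}: {subject}"),
            Method::Webhook { url } => format!("POST {url}"),
        };
        self.sent.borrow_mut().push(format!(
            "{} <- {} ({} bytes)",
            request.recipient_id,
            line,
            request.body.len()
        ));
        Status::Delivered
    }
}
```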
For push/email/system notification of delivered content, wire the delivery adapter to the
notification adapters (paladin-notifications) or fire a notification through the orchestrator
bridge — see the bridge recipes.
Capabilities and Limitations
The crate's manifest declares some features whose adapters are not yet implemented in v0.5.0. To keep this guide honest:
| Capability | Status |
|---|---|
| PDF extraction (PdfExtractor) | ✅ Implemented |
| HTTP fetching (HttpContentFetcher) | ✅ Implemented |
| News API ingestion (NewsApiFetcher, feature news-api) | ✅ Implemented |
| File / local ingestion | ✅ Implemented |
| Aggregation, summarization, analysis use cases | ✅ Implemented |
| LLM content analysis (LlmContentAnalyzer, feature llm) | ✅ Implemented |
| Content delivery (DeliverContentUseCase) | ✅ Implemented |
| Web scraping (web-scraping feature) | ⚠️ Feature/dep declared, no adapter yet |
| RSS/Atom feeds (rss feature) | ⚠️ Feature/dep declared, no adapter yet |
| Filtering & deduplication (content_filtering_service) | ⚠️ Module present but disabled (not compiled) |
For web-scraping and RSS today, fetch the raw resource with HttpContentFetcher and parse it in
your own adapter. Filtering/dedup must likewise be done in caller code until the
content_filtering_service module is completed and re-enabled.
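Until content_filtering_service is re-enabled, deduplication lives in caller code. A minimal std-only sketch keys each item by a normalized URL and keeps the first occurrence in input order. The normalization rules (drop the fragment and trailing slashes, lowercase only the scheme and host) and the (url, title) tuple standing in for ContentItem are assumptions to tune to your sources; normalize_url and dedup_by_url are hypothetical names.

```rust
use std::collections::HashSet;

/// Normalize a URL for duplicate detection: drop any `#fragment` and trailing
/// `/`, and lowercase the scheme and host (paths stay case-sensitive).
pub fn normalize_url(url: &str) -> String {
    let url = url.trim();
    let url = url.split('#').next().unwrap_or(url).trim_end_matches('/');
    // Without a scheme there is no host to isolate; compare as-is.
    let Some(scheme_end) = url.find("://") else {
        return url.to_string();
    };
    let host_start = scheme_end + 3;
    // The host ends at the first '/' or '?', or at the end of the string.
    let path_start = url[host_start..]
        .find(|c: char| c == '/' || c == '?')
        .map_or(url.len(), |offset| host_start + offset);
    format!("{}{}", url[..path_start].to_ascii_lowercase(), &url[path_start..])
}

/// Keep the first item for each normalized URL, preserving input order.
pub fn dedup_by_url(items: Vec<(String, String)>) -> Vec<(String, String)> {
    let mut seen = HashSet::new();
    items
        .into_iter()
        .filter(|(url, _title)| seen.insert(normalize_url(url)))
        .collect()
}
```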
See Also
- Agent ↔ Orchestrator Bridge — end-to-end recipes combining content ingestion with agent analysis and notification.
- Orchestration — running the analysis Paladin inside a Battalion workflow.
- Paladin Agents — building the Paladin that performs the AI enrichment.
- Crate Map: paladin-content exports and feature flags.