Content Processing

The paladin-content crate (crates/paladin-content/) ingests content from external sources, runs it through aggregation/analysis use cases, hands it to a Paladin agent for AI enrichment, and delivers the result. This guide covers the ingestion adapters, the processing use cases, the content → agent bridge, and delivery — documenting only what is wired into the compiled crate today.

Every code example targets the current v0.5.0 workspace. The substantive examples are real, compiled code pulled from the paladin-doc-examples crate via mdBook {{#include}}; a few illustrative fragments are marked rust,ignore and are not compiled. The API forms are verified against crates/paladin-content/src/.

Feature flags. Content processing lives behind the root content-processing feature, which enables paladin-content. Within the crate, news-api enables the News API fetcher and llm enables LLM-powered analysis. See the Crate Map for the full flag table.


Table of Contents

  1. Content Ingestion Sources
  2. Aggregation and the Processing Pipeline
  3. Content → Agent Bridge
  4. Content Delivery
  5. Capabilities and Limitations
  6. See Also

Content Ingestion Sources

Every fetcher produces a ContentItem (paladin_core::platform::container::content::ContentItem), the common currency of the pipeline. Sources are constructed and configured programmatically (there is no dedicated content: section in config.yml yet — see Limitations).

PDF / documents — PdfExtractor

PdfExtractor parses a PDF (from a path or raw bytes) into a Document. DocumentAdapter wraps document parsing for the pipeline.

use paladin_content::adapters::document::pdf_extractor::PdfExtractor;
use std::path::Path;

/// Extract a PDF (from a path or raw bytes) into a `Document`.
pub fn ingest_pdf() -> Result<(), Box<dyn std::error::Error>> {
    let extractor = PdfExtractor::new();
    let document = extractor.extract(Path::new("./reports/q3-earnings.pdf"))?;
    // Or from bytes already in memory:
    // let document = extractor.extract_bytes(&pdf_bytes)?;
    Ok(())
}

HTTP endpoints — HttpContentFetcher

HttpContentFetcher fetches a URL and returns a ContentItem. It implements the ContentFetchingService trait, so it can be driven directly or through the FetchContent use case.

use paladin_content::adapters::input::http_content_fetcher::HttpContentFetcher;
use paladin_content::services::content_fetching_service::{ContentFetchingService, FetchContent};

/// Fetch a URL into a `ContentItem`, directly and via the `FetchContent` use case.
pub fn ingest_http() -> Result<(), Box<dyn std::error::Error>> {
    let fetcher = HttpContentFetcher::new();
    // Direct use:
    let item = fetcher.fetch_content("https://example.com/article")?;

    // Or wrapped in the use case (same trait, swappable adapter):
    let fetch = FetchContent::new(HttpContentFetcher::new());
    let item = fetch.execute("https://example.com/article")?;
    Ok(())
}

News / feeds — NewsApiFetcher (feature news-api)

NewsApiFetcher polls a News API endpoint. It takes an API key and reuses an HttpContentFetcher for transport.

// HttpContentFetcher must be imported too: it is passed to with_content_fetcher.
use paladin_content::adapters::input::http_content_fetcher::HttpContentFetcher;
use paladin_content::adapters::input::news_api_fetcher::NewsApiFetcher;

/// Construct a News API fetcher (feature `news-api`).
pub fn ingest_news() {
    let fetcher = NewsApiFetcher::new("YOUR_NEWS_API_KEY".to_string())
        .with_content_fetcher(HttpContentFetcher::new());
}

Files — FileContentFetcher

For local ingestion and testing, FileContentFetcher reads a file from disk and infers its content type from the extension. Unlike the HTTP fetcher, it implements ContentIngestionPort (paladin_ports::input): its fetch_content takes a ContentItem describing the source path and returns a populated ContentItem. (It is an internal #[doc(hidden)] adapter; the primary documented ingestion paths are HTTP, PDF, and the News API above.)
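Extension-based type inference is easy to picture with a small example. The sketch below is a standalone, std-only illustration, not FileContentFetcher's code: the infer_content_type function and its extension-to-MIME mapping are hypothetical, and the crate's actual mapping may cover different extensions or return a different type.

```rust
use std::path::Path;

/// Hypothetical stand-in for FileContentFetcher's extension-based inference;
/// the real mapping in paladin-content may differ.
fn infer_content_type(path: &Path) -> &'static str {
    // Lowercase the extension so "REPORT.PDF" is still treated as a PDF.
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_ascii_lowercase());

    match ext.as_deref() {
        Some("txt") => "text/plain",
        Some("md") => "text/markdown",
        Some("html") | Some("htm") => "text/html",
        Some("json") => "application/json",
        Some("pdf") => "application/pdf",
        // No extension or an unrecognized one: treat as opaque bytes.
        _ => "application/octet-stream",
    }
}

fn main() {
    for name in ["notes.md", "REPORT.PDF", "data.json", "README"] {
        println!("{name} -> {}", infer_content_type(Path::new(name)));
    }
}
```

Falling back to a generic byte-stream type keeps unknown files flowing through the pipeline instead of failing ingestion outright.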


Aggregation and the Processing Pipeline

Once items are fetched, the use cases combine and analyze them. Each use case is generic over a trait, so adapters are swappable.

| Stage | Use case / type | Trait | What it does |
|---|---|---|---|
| Fetch | FetchContent<T> | ContentFetchingService | URL → ContentItem |
| Aggregate | AggregateContent<T> | ContentListService | Combine many sources into one JSON view |
| Summarize | ContentSummarizer | | Brief/detailed summaries, keyword extraction |
| Analyze | AnalyzeContent<T> | ContentAnalysisService | Run an analysis over a ContentItem |
| Analyze (AI) | LlmContentAnalyzer | (feature llm) | LLM enrichment; see Content → Agent Bridge |
flowchart LR
    src[(Sources: PDF / HTTP / News / File)] --> fetch[FetchContent]
    fetch --> agg[AggregateContent]
    agg --> sum[ContentSummarizer]
    sum --> ai[LlmContentAnalyzer]
    ai --> deliver[DeliverContentUseCase]
    deliver --> out[(Destinations)]
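The stages above share one shape: a use case struct owns an adapter through a trait bound and delegates execute to it, so the adapter can be swapped without touching the use case. The std-only sketch below mirrors that pattern under stated assumptions. ContentSource, FetchUseCase, and StaticSource are hypothetical names, and the String-based signatures are simplified stand-ins for the crate's real ContentItem-based traits.

```rust
/// Hypothetical, simplified trait standing in for ContentFetchingService.
trait ContentSource {
    fn fetch(&self, location: &str) -> Result<String, String>;
}

/// Use case that owns any adapter implementing `ContentSource`.
struct FetchUseCase<T: ContentSource> {
    source: T,
}

impl<T: ContentSource> FetchUseCase<T> {
    fn new(source: T) -> Self {
        Self { source }
    }

    /// Delegates to the adapter; the use case never knows which one it holds.
    fn execute(&self, location: &str) -> Result<String, String> {
        self.source.fetch(location)
    }
}

/// In-memory adapter, the kind of test double swapped in for HTTP in tests.
struct StaticSource;

impl ContentSource for StaticSource {
    fn fetch(&self, location: &str) -> Result<String, String> {
        if location.is_empty() {
            Err("empty location".to_string())
        } else {
            Ok(format!("body of {location}"))
        }
    }
}

fn main() {
    let use_case = FetchUseCase::new(StaticSource);
    println!("{:?}", use_case.execute("https://example.com/article"));
}
```

Because the use case is generic rather than holding a trait object, each adapter choice is resolved at compile time; the trade-off is that the adapter type appears in the use case's type.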

Aggregation

AggregateContent wraps a ContentListService and merges a vector of JSON values into a single aggregated value — useful for collapsing multiple fetched sources before analysis.

use paladin_content::services::content_aggregator_service::AggregateContent;

/// Merge JSON from several sources into one aggregated value.
pub fn aggregate() {
    // `MockListService` is a test double implementing the `ContentListService`
    // trait, defined alongside this example (not shown).
    let aggregator = AggregateContent::new(MockListService);
    let source_a = serde_json::json!({ "title": "A" });
    let source_b = serde_json::json!({ "title": "B" });
    let aggregated = aggregator.execute(vec![source_a, source_b]);
}

Summarization

ContentSummarizer produces summaries and keywords without an LLM call (deterministic text processing), returning a ContentSummary plus ContentMetadata.
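To make "deterministic text processing" concrete, here is a std-only sketch of the two operations: character-bounded truncation and frequency-ranked keywords. It is not ContentSummarizer's algorithm; brief_summary, extract_keywords, the stop-word list, and the four-letter minimum are all assumptions chosen for illustration.

```rust
use std::collections::HashMap;

/// Truncate to at most `max_chars` characters (not bytes), so multi-byte
/// UTF-8 text is never cut mid-character.
fn brief_summary(text: &str, max_chars: usize) -> String {
    text.chars().take(max_chars).collect()
}

/// Naive keyword extraction: split on non-alphanumerics, lowercase, drop
/// words of three letters or fewer and a tiny stop-word list, then rank by
/// frequency with ties broken alphabetically so output is deterministic.
fn extract_keywords(text: &str, top_n: usize) -> Vec<String> {
    const STOP_WORDS: [&str; 6] = ["with", "about", "this", "that", "from", "were"];
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .map(|w| w.to_lowercase())
        .filter(|w| w.chars().count() > 3 && !STOP_WORDS.contains(&w.as_str()))
    {
        *counts.entry(word).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.into_iter().take(top_n).map(|(word, _)| word).collect()
}

fn main() {
    let body = "Quarterly earnings rose. Earnings growth beat forecasts, and earnings guidance rose again.";
    println!("{}", brief_summary(body, 18));
    println!("{:?}", extract_keywords(body, 3));
}
```

The explicit alphabetical tie-break matters: HashMap iteration order is randomized in Rust, so without it two runs over the same text could return different keywords.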

use paladin_content::services::content_summarizer_service::ContentSummarizer;

/// Summarize a `ContentItem` and extract keywords (no LLM call).
pub fn summarize() {
    // `text_content_item` is a helper defined alongside this example (not
    // shown) that builds a text `ContentItem`.
    let item = text_content_item("A long article body about quarterly earnings...");
    let summarizer = ContentSummarizer::new();
    let summary = summarizer.summarize_content(&item, 500); // max 500 chars
    let keywords = summarizer.extract_keywords(&item);
}

Content → Agent Bridge

The llm feature enables LlmContentAnalyzer, which passes a ContentItem plus a prompt to a Paladin LLM analysis service for AI enrichment. This is the seam where the content pipeline meets the agent layer.

LlmContentAnalyzer::analyze_with_prompt_async takes an LlmContentAnalysisInput (prompt: PromptItem, content: ContentItem) and an LlmContentAnalysisConfig (model, retries, timeout, max_content_length), and returns the analysis as JSON.

use paladin_content::services::content_llm_analysis_service::{
    LlmContentAnalysisConfig, LlmContentAnalysisInput, LlmContentAnalyzer,
};
use paladin_llm::llm_analysis_service::LlmAnalysisService;
use paladin_llm::mock::MockLlmAdapter;
use paladin_ports::output::llm_port::LlmPort;
use std::sync::Arc;

/// Pass content + a prompt to a Paladin LLM service for AI enrichment.
pub async fn content_to_agent() -> Result<(), Box<dyn std::error::Error>> {
    // In production this is a real provider (e.g. OpenAIAdapter); here a mock.
    let llm: Arc<dyn LlmPort> =
        Arc::new(MockLlmAdapter::new().with_response("{\"summary\":\"...\"}"));
    let llm_service = Arc::new(LlmAnalysisService::new(llm));

    let analyzer = LlmContentAnalyzer::new(llm_service);
    // `text_prompt_item` / `text_content_item` are helpers defined alongside
    // this example (not shown) that build a PromptItem / ContentItem from text.
    let input = LlmContentAnalysisInput {
        prompt: text_prompt_item("Summarize the key risks in this article."),
        content: text_content_item("Latest article body..."),
    };
    let config = LlmContentAnalysisConfig::default(); // gpt-3.5-turbo, 3 retries, 30s timeout

    let analysis = analyzer
        .analyze_with_prompt_async(&input, &config)
        .await
        .map_err(|e| -> Box<dyn std::error::Error> { e.into() })?;
    println!("{}", serde_json::to_string_pretty(&analysis)?);
    Ok(())
}

Use the async method (analyze_with_prompt_async). The sync analyze_with_prompt is a compatibility stub that returns an error directing callers to the async path.

For richer agent interactions — an agent that triggers a workflow, or a workflow step that invokes a full Paladin agent loop — see the Agent ↔ Orchestrator Bridge.


Content Delivery

DeliverContentUseCase sends processed content to a destination through the ContentDeliveryService port (paladin_ports::output::content_delivery_port). It takes a DeliveryRequest and returns a DeliveryResponse (with a DeliveryStatus).

use paladin_content::services::content_delivery_service::DeliverContentUseCase;
use paladin_ports::output::content_delivery_port::{
    ContentPayload, DeliveryMethod, DeliveryPriority, DeliveryRequest,
};

/// Deliver processed content through a `ContentDeliveryService`.
pub fn deliver() -> Result<(), Box<dyn std::error::Error>> {
    // `MockDeliveryAdapter` is a test double implementing `ContentDeliveryService`,
    // defined alongside this example (not shown); use a real adapter in production.
    let delivery = DeliverContentUseCase::new(MockDeliveryAdapter);

    let request = DeliveryRequest {
        recipient_id: "ops-team".to_string(),
        delivery_method: DeliveryMethod::Email {
            to: "ops@example.com".to_string(),
            subject: "Daily digest".to_string(),
        },
        // `text_content_item` builds a text `ContentItem` (helper, not shown).
        content_payload: ContentPayload::SingleItem(text_content_item("Digest body...")),
        priority: DeliveryPriority::Normal,
        scheduled_time: None,
        metadata: None,
    };

    let response = delivery.execute(request)?;
    println!("delivery status: {:?}", response.status);
    Ok(())
}

For push/email/system notification of delivered content, wire the delivery adapter to the notification adapters (paladin-notifications) or fire a notification through the orchestrator bridge — see the bridge recipes.


Capabilities and Limitations

The crate's manifest declares some features whose adapters are not yet implemented in v0.5.0. To keep this guide honest:

| Capability | Status |
|---|---|
| PDF extraction (PdfExtractor) | ✅ Implemented |
| HTTP fetching (HttpContentFetcher) | ✅ Implemented |
| News API ingestion (NewsApiFetcher, feature news-api) | ✅ Implemented |
| File / local ingestion | ✅ Implemented |
| Aggregation, summarization, analysis use cases | ✅ Implemented |
| LLM content analysis (LlmContentAnalyzer, feature llm) | ✅ Implemented |
| Content delivery (DeliverContentUseCase) | ✅ Implemented |
| Web scraping (web-scraping feature) | ⚠️ Feature/dep declared, no adapter yet |
| RSS/Atom feeds (rss feature) | ⚠️ Feature/dep declared, no adapter yet |
| Filtering & deduplication (content_filtering_service) | ⚠️ Module present but disabled (not compiled) |

For web-scraping and RSS today, fetch the raw resource with HttpContentFetcher and parse it in your own adapter. Filtering/dedup must likewise be done in caller code until the content_filtering_service module is completed and re-enabled.
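Until content_filtering_service is re-enabled, caller-side filtering can be as small as the std-only sketch below, which drops items with blank bodies and keeps the first item per normalized URL. The Item struct is a hypothetical stand-in for ContentItem, and filter_and_dedup and normalize_url are hypothetical helpers; normalization here only strips fragments and trailing slashes, so production code would want a real URL parser.

```rust
use std::collections::HashSet;

/// Hypothetical minimal item standing in for the fields of a real ContentItem.
#[derive(Debug)]
struct Item {
    url: String,
    body: String,
}

/// Normalize a URL for comparison by dropping any `#fragment` and trailing
/// slashes. A simplification: it does not lowercase hosts or sort query params.
fn normalize_url(url: &str) -> String {
    let without_fragment = url.split('#').next().unwrap_or(url);
    without_fragment.trim_end_matches('/').to_string()
}

/// Drop items whose body is blank, then keep only the first item seen for
/// each normalized URL, preserving input order.
fn filter_and_dedup(items: Vec<Item>) -> Vec<Item> {
    let mut seen: HashSet<String> = HashSet::new();
    items
        .into_iter()
        .filter(|item| !item.body.trim().is_empty())
        // `insert` returns false when the key was already present.
        .filter(|item| seen.insert(normalize_url(&item.url)))
        .collect()
}

fn main() {
    let items = vec![
        Item { url: "https://example.com/a/".into(), body: "first copy".into() },
        Item { url: "https://example.com/a#top".into(), body: "second copy".into() },
        Item { url: "https://example.com/b".into(), body: "   ".into() },
        Item { url: "https://example.com/c".into(), body: "third".into() },
    ];
    for item in filter_and_dedup(items) {
        println!("{} -> {}", item.url, item.body);
    }
}
```

A single pass with a HashSet of normalized keys keeps the work linear and preserves arrival order, which matters when items are already sorted by recency.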


See Also

  • Agent ↔ Orchestrator Bridge — end-to-end recipes combining content ingestion with agent analysis and notification.
  • Orchestration — running the analysis Paladin inside a Battalion workflow.
  • Paladin Agents — building the Paladin that performs the AI enrichment.
  • Crate Map: paladin-content exports and feature flags.