2026-06-12 · 8 min read

How we built a governed multi-agent research pipeline

A practical walkthrough of the orchestration patterns, sub-agent lifecycle management, and governance primitives that underpin our production research workflows.

Tags: orchestration, multi-agent, governance, capital-markets

Most agentic AI content stops at the demo. A single agent calls a tool, produces output, and the architecture diagram is complete. Production systems are different. When you need a research pipeline that ingests filings, surfaces alternative data, synthesises analyst context, and delivers a risk-scored output to a portfolio manager, the hard parts are not the LLM calls. They are orchestration, state, failure recovery, and governance.

This post describes how we approach these problems at Applied Agentic, drawing on patterns we have shipped for capital markets clients.

The orchestration problem

A research pipeline is not a linear chain. It is a directed graph with conditional branching, parallel fan-out, and human checkpoints. Consider the quant workflow we demonstrate as Case Study 1:

Data ingestion — pull SEC filings, alternative data feeds, and market data in parallel
Extraction — parse unstructured documents into structured signals
Synthesis — cross-reference signals against a knowledge base
Risk scoring — compute tail-risk metrics, attribution, and factor exposure
Delivery — present results to a human for review before any downstream action
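
The five stages above can be sketched as a small asyncio program. This is an illustration under stated assumptions, not our production code: every function name here (fetch_source, extract, synthesise, score_risk, run_pipeline) is hypothetical, and the bodies are stubs that stand in for real connectors and models.

```python
import asyncio

# Hypothetical stand-in for a real connector; returns a stub document.
async def fetch_source(name: str) -> dict:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"source": name, "text": f"raw {name} payload"}

def extract(doc: dict) -> dict:
    # Placeholder: turn an unstructured document into a structured signal.
    return {"source": doc["source"], "signal": len(doc["text"])}

def synthesise(signals: list) -> dict:
    # Placeholder: cross-reference signals (here, just index them by source).
    return {s["source"]: s["signal"] for s in signals}

def score_risk(context: dict) -> dict:
    # Placeholder metric, not a real tail-risk computation.
    return {"inputs": context, "risk_score": sum(context.values())}

async def run_pipeline(sources: list) -> dict:
    # Fan-out: ingest all sources concurrently, then fan back in.
    docs = await asyncio.gather(*(fetch_source(s) for s in sources))
    signals = [extract(d) for d in docs]
    report = score_risk(synthesise(signals))
    # Delivery stops at a human: nothing downstream runs without review.
    report["status"] = "awaiting_review"
    return report
```

Calling asyncio.run(run_pipeline(["sec_filings", "alt_data", "market_data"])) returns a report whose status is "awaiting_review". The point of the sketch is the shape: concurrent ingestion, sequential processing, and a terminal state that waits for a person.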

Each step may spawn sub-tasks. A filing analysis might delegate entity resolution to a specialised agent while the parent continues parsing other documents. This is where most frameworks either force you into a rigid sequential model or give you unconstrained flexibility that is impossible to govern.

What we chose and why

We evaluated several orchestration approaches in early 2026. The landscape shifted significantly: one major open-source framework entered maintenance mode, while newer entrants promised enterprise governance features but lacked production maturity.

Our decision matrix prioritised three properties:

Criterion	Why it matters
Checkpoint persistence	Pipeline recovery after timeout or failure
Human-in-the-loop primitives	Approval gates at decision points
Interoperability	Ability to call external tools via standard protocols

We settled on a role-based orchestration layer paired with a data framework for knowledge retrieval. The orchestration layer provides sequential, parallel, and conditional task flows with built-in checkpointing. The data layer handles document parsing, hybrid search, and retrieval-augmented generation. Neither alone is sufficient. Together they cover the full pipeline.

Sub-agent lifecycle management

When a parent agent delegates a sub-task, the sub-agent runs in its own session. This creates two operational challenges:

The session tracking problem

Each sub-agent gets a unique session identifier. The parent can query status (running, succeeded, failed, timed out) and access the full transcript. But the original task description is not stored as a structured field — it is embedded in the system prompt of the transcript. The execution plan is not stored at all. Progress percentages do not exist.

For a single agent, this is manageable. For a pipeline with dozens of concurrent sub-agents across multiple research tracks, you need structured tracking.

What we built

We added a persistent task registry that records:

Field	Purpose
Task identifier	Unique reference for cross-session lookup
Parent session link	Traceability back to the originating pipeline
Planned phases	Structured decomposition of the sub-task
Completed phases	Progress tracking per phase
Status	Current state (queued, running, succeeded, failed, timed out)
Transcript path	Full session log for audit

This registry survives process restarts. If a sub-agent times out, the system knows which phases completed and which need re-execution. The pipeline can resume from the last checkpoint rather than restarting from scratch.
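
A registry with the fields listed above could look something like the following. This is a minimal sketch that assumes SQLite as the backing store; the class names (TaskRecord, TaskRegistry), the table layout, and the remaining_phases helper are illustrative choices, not a description of our actual schema.

```python
import json
import sqlite3
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskRecord:
    task_id: str
    parent_session: str
    planned_phases: List[str]
    completed_phases: List[str] = field(default_factory=list)
    status: str = "queued"  # queued | running | succeeded | failed | timed_out
    transcript_path: str = ""

class TaskRegistry:
    """Hypothetical registry; a file-backed database survives restarts."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS tasks ("
            "task_id TEXT PRIMARY KEY, parent_session TEXT, planned TEXT, "
            "completed TEXT, status TEXT, transcript_path TEXT)"
        )
        self.conn.commit()

    def save(self, rec: TaskRecord) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO tasks VALUES (?, ?, ?, ?, ?, ?)",
                (rec.task_id, rec.parent_session, json.dumps(rec.planned_phases),
                 json.dumps(rec.completed_phases), rec.status, rec.transcript_path),
            )

    def load(self, task_id: str) -> Optional[TaskRecord]:
        row = self.conn.execute(
            "SELECT task_id, parent_session, planned, completed, status, "
            "transcript_path FROM tasks WHERE task_id = ?", (task_id,)
        ).fetchone()
        if row is None:
            return None
        return TaskRecord(row[0], row[1], json.loads(row[2]),
                          json.loads(row[3]), row[4], row[5])

    def remaining_phases(self, task_id: str) -> List[str]:
        # Phases that were planned but never recorded as completed.
        rec = self.load(task_id)
        if rec is None:
            return []
        return [p for p in rec.planned_phases if p not in rec.completed_phases]
```

After a restart, opening the same database file and calling remaining_phases for a timed-out task yields exactly the work that still needs to run.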

The timeout problem

Sub-agents have execution time limits. When a task exceeds its window, the work is lost unless you have explicit checkpointing. Our approach: decompose sub-tasks into phases, and have each phase write progress to the registry before proceeding to the next. On timeout, the pipeline reads the registry, identifies incomplete phases, and re-spawns only the missing work.
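
The phase-and-resume approach can be illustrated in a few lines. In this sketch, assumed for illustration only, run_phases is a hypothetical helper, the completed list stands in for the persistent registry, and budget_s is the sub-agent's time limit in seconds.

```python
import asyncio

async def run_phases(phases, completed, budget_s):
    """Run phases in order, skipping those already completed.

    phases:    list of (name, coroutine function) pairs.
    completed: mutable list standing in for the persistent registry.
    budget_s:  total time allowed for this attempt, in seconds.
    Returns "succeeded" or "timed_out".
    """
    async def _run():
        for name, fn in phases:
            if name in completed:
                continue  # finished in an earlier attempt; do not re-run
            await fn()
            completed.append(name)  # checkpoint before the next phase starts

    try:
        await asyncio.wait_for(_run(), timeout=budget_s)
        return "succeeded"
    except asyncio.TimeoutError:
        # The in-flight phase is cancelled; earlier checkpoints are kept.
        return "timed_out"
```

Calling run_phases a second time with the same completed list re-spawns only the missing work. One consequence of this design is that a phase interrupted midway restarts from its beginning, so each phase should be safe to run more than once.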

This is not elegant. It is reliable. In production, reliability wins.

Governance primitives

For regulated industries, the governance layer is not optional. It is the product. Two patterns matter:

Human-in-the-loop gates

Certain decisions require human approval before execution. The pipeline does not proceed until a human reviewer explicitly approves or rejects the recommendation, or requests a modification. This is implemented as a blocking checkpoint in the task flow:

The agent completes its analysis and presents a recommendation
The pipeline pauses and notifies the human reviewer
The reviewer approves, rejects, or requests modification
The pipeline resumes with the reviewer’s decision as input

This is straightforward to implement. The harder part is deciding which decisions require human gates and which can be automated. Our default: any action with financial, legal, or client-facing consequences requires a gate.
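
A gate with that default policy might be sketched as follows. The names here (Decision, Recommendation, Review, requires_gate, gate) and the category labels are assumptions made for the example; ask_reviewer is a hypothetical callback that blocks until a person responds, for instance by waiting on a review queue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set

class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"

# Assumed labels for the consequences that always require a human.
GATED_CATEGORIES = {"financial", "legal", "client_facing"}

@dataclass
class Recommendation:
    summary: str
    categories: Set[str]

@dataclass
class Review:
    decision: Decision
    note: str = ""

def requires_gate(rec: Recommendation) -> bool:
    # Gate anything that touches at least one gated category.
    return bool(rec.categories & GATED_CATEGORIES)

def gate(rec: Recommendation,
         ask_reviewer: Callable[[Recommendation], Review]) -> Review:
    if not requires_gate(rec):
        return Review(Decision.APPROVE, "auto-approved: no gated category")
    # Blocks here; the pipeline resumes with the reviewer's decision as input.
    return ask_reviewer(rec)
```

Keeping the policy in one function (requires_gate) makes the question of which decisions need a human explicit and reviewable, rather than scattering it across agent prompts.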

Audit trails

Every agent action, tool call, and decision point is logged. Not just the final output — the full reasoning chain. This serves two purposes:

Debugging — when a pipeline produces an unexpected result, you can trace exactly where the reasoning diverged
Compliance — regulators can inspect the decision process, not just the outcome

The transcript files we maintain provide this audit trail automatically. The structured task registry adds a layer of machine-readable metadata on top.
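
One simple way to keep such a trail machine-readable is an append-only JSON Lines file, one event per line. The sketch below assumes that format; log_event, read_trail, and the field names (ts, session_id, kind, payload) are illustrative rather than our actual log schema.

```python
import json
import time
from pathlib import Path

def log_event(path, session_id, kind, payload):
    """Append one audit event (action, tool call, or decision) to the trail."""
    record = {
        "ts": time.time(),         # wall-clock timestamp, epoch seconds
        "session_id": session_id,  # which agent session produced the event
        "kind": kind,              # e.g. "tool_call", "reasoning", "decision"
        "payload": payload,        # event details; must be JSON-serialisable
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
    return record

def read_trail(path, session_id):
    """Return all events for one session, in the order they were written."""
    events = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            if rec["session_id"] == session_id:
                events.append(rec)
    return events
```

Because events are only ever appended, replaying read_trail for a session reconstructs the sequence of steps that led to an output, which is what both debugging and a compliance review need.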

Lessons learned

After shipping several production pipelines, a few patterns have proven durable:

Decompose aggressively. A sub-task that does too much is a sub-task that will time out. Break work into phases that complete within minutes, not hours.

Checkpoint everything. If a phase completes but the next one fails, you should not re-run the completed phase. Write progress after every meaningful step.

Governance is architecture. Retrofitting human-in-the-loop into a pipeline that was designed for full automation is painful. Build the gates in from the start.

Monitor sub-agent health. Without active monitoring, failed sub-agents silently lose work. A periodic health check that compares expected vs. actual sub-agent status catches failures before they compound.
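
A periodic check of that kind can be quite small. In the sketch below, health_check is a hypothetical function, and the heartbeat timestamp and the 300-second staleness threshold are assumptions added for the example; the comparison of expected against actual status is the part taken from the lesson above.

```python
def health_check(expected_ids, actual, now, stale_after_s=300.0):
    """Compare the sub-agents we expect against what is actually recorded.

    expected_ids:  task identifiers the pipeline believes it spawned.
    actual:        task_id -> (status, last_heartbeat in epoch seconds).
    now:           current time in epoch seconds.
    Returns task_id -> problem for every sub-agent that needs attention.
    """
    problems = {}
    for task_id in expected_ids:
        if task_id not in actual:
            problems[task_id] = "missing"  # spawned but never registered
            continue
        status, last_heartbeat = actual[task_id]
        if status in ("failed", "timed_out"):
            problems[task_id] = status
        elif status == "running" and now - last_heartbeat > stale_after_s:
            problems[task_id] = "stalled"  # claims to run but has gone quiet
    return problems
```

Running this on a schedule and feeding the result into the resume logic means a failed or silent sub-agent is noticed within one check interval instead of at the end of the pipeline.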

What this enables

This infrastructure powers the quant research pipeline we demonstrate on our case studies page. The same patterns apply to any multi-step research workflow: regulatory filing analysis, competitive intelligence gathering, due diligence pipelines, and knowledge synthesis across heterogeneous data sources.

The governance primitives are particularly relevant for firms operating under regulatory oversight. When your AI system makes or informs decisions, you need to demonstrate not just what it decided, but how and why — with human checkpoints at the points that matter.

If you are evaluating how agentic AI fits into a regulated research workflow, we are happy to walk through our approach. Get in touch.