How we built a governed multi-agent research pipeline
A practical walkthrough of the orchestration patterns, sub-agent lifecycle management, and governance primitives that underpin our production research workflows.
Most agentic AI content stops at the demo. A single agent calls a tool, produces output, and the architecture diagram is complete. Production systems are different. When you need a research pipeline that ingests filings, surfaces alternative data, synthesises analyst context, and delivers a risk-scored output to a portfolio manager, the hard parts are not the LLM calls. They are orchestration, state, failure recovery, and governance.
This post describes how we approach these problems at Applied Agentic, drawing on patterns we have shipped for capital markets clients.
The orchestration problem
A research pipeline is not a linear chain. It is a directed graph with conditional branching, parallel fan-out, and human checkpoints. Consider the quant workflow we demonstrate as Case Study 1:
- Data ingestion — pull SEC filings, alternative data feeds, and market data in parallel
- Extraction — parse unstructured documents into structured signals
- Synthesis — cross-reference signals against a knowledge base
- Risk scoring — compute tail-risk metrics, attribution, and factor exposure
- Delivery — present results to a human for review before any downstream action
Each step may spawn sub-tasks. A filing analysis might delegate entity resolution to a specialised agent while the parent continues parsing other documents. This is where most frameworks either force you into a rigid sequential model or give you unconstrained flexibility that is impossible to govern.
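To make the shape of that graph concrete, here is a minimal sketch using Python's standard `asyncio`. It is not our production orchestration layer: every function name (`ingest`, `resolve_entities`, `extract`, `synthesise`, `score_risk`, `run_pipeline`) is hypothetical, the bodies are placeholders, and conditional branching is left out for brevity. It only illustrates parallel fan-out at ingestion, delegation of entity resolution to concurrent sub-tasks, and a final state that waits for a human.

```python
import asyncio

# All stage functions below are hypothetical stand-ins for real data and model calls.

async def ingest(source: str) -> dict:
    await asyncio.sleep(0)  # placeholder for network I/O
    return {"source": source, "docs": [f"{source}-doc"]}

async def resolve_entities(doc: str) -> str:
    await asyncio.sleep(0)  # placeholder for a delegated sub-agent call
    return f"entities({doc})"

async def extract(batch: dict) -> list:
    # Delegate entity resolution for every document in the batch concurrently.
    return await asyncio.gather(*(resolve_entities(d) for d in batch["docs"]))

def synthesise(signals: list) -> dict:
    return {"signals": sorted(signals)}

def score_risk(synthesis: dict) -> dict:
    # Placeholder metric; a real scorer would compute tail risk and factor exposure.
    return {**synthesis, "risk_score": len(synthesis["signals"])}

async def run_pipeline(sources: list) -> dict:
    batches = await asyncio.gather(*(ingest(s) for s in sources))     # parallel fan-out
    extracted = await asyncio.gather(*(extract(b) for b in batches))  # per-batch extraction
    signals = [s for group in extracted for s in group]
    report = score_risk(synthesise(signals))
    report["status"] = "awaiting_review"  # delivery pauses for a human reviewer
    return report

result = asyncio.run(run_pipeline(["sec_filings", "alt_data", "market_data"]))
print(result["status"], result["risk_score"])
```

The point of the sketch is the structure rather than the stage logic: the three sources are fetched concurrently, and nothing downstream of `run_pipeline` happens until a reviewer acts on the `awaiting_review` state.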
What we chose and why
We evaluated several orchestration approaches in early 2026. The landscape shifted significantly: one major open-source framework entered maintenance mode, while newer entrants promised enterprise governance features but lacked production maturity.
Our decision matrix prioritised three properties:
| Criterion | Why it matters |
|---|---|
| Checkpoint persistence | Pipeline recovery after timeout or failure |
| Human-in-the-loop primitives | Approval gates at decision points |
| Interoperability | Ability to call external tools via standard protocols |
We settled on a role-based orchestration layer paired with a data framework for knowledge retrieval. The orchestration layer provides sequential, parallel, and conditional task flows with built-in checkpointing. The data layer handles document parsing, hybrid search, and retrieval-augmented generation. Neither alone is sufficient. Together they cover the full pipeline.
Sub-agent lifecycle management
When a parent agent delegates a sub-task, the sub-agent runs in its own session. This creates several operational challenges:
The session tracking problem
Each sub-agent gets a unique session identifier. The parent can query status (running, succeeded, failed, timed out) and access the full transcript. But the original task description is not stored as a structured field — it is embedded in the system prompt of the transcript. The execution plan is not stored at all. Progress percentages do not exist.
For a single agent, this is manageable. For a pipeline with dozens of concurrent sub-agents across multiple research tracks, you need structured tracking.
What we built
We added a persistent task registry that records:
| Field | Purpose |
|---|---|
| Task identifier | Unique reference for cross-session lookup |
| Parent session link | Traceability back to the originating pipeline |
| Planned phases | Structured decomposition of the sub-task |
| Completed phases | Progress tracking per phase |
| Status | Current state (queued, running, succeeded, failed, timed out) |
| Transcript path | Full session log for audit |
This registry survives process restarts. If a sub-agent times out, the system knows which phases completed and which need re-execution. The pipeline can resume from the last checkpoint rather than restarting from scratch.
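As an illustration of what such a registry could look like, the sketch below stores the fields from the table in SQLite via Python's standard `sqlite3` module. The storage engine, the `TaskRegistry` class, its method names, and the column names are all assumptions made for this example, not a description of our actual implementation.

```python
import json
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    task_id TEXT PRIMARY KEY,
    parent_session TEXT NOT NULL,
    planned_phases TEXT NOT NULL,
    completed_phases TEXT NOT NULL,
    status TEXT NOT NULL,
    transcript_path TEXT
)
"""

class TaskRegistry:
    """Hypothetical registry; phase lists are stored as JSON text."""

    def __init__(self, path: str):
        self.db = sqlite3.connect(path)
        self.db.execute(SCHEMA)
        self.db.commit()

    def register(self, task_id, parent_session, planned_phases, transcript_path=None):
        self.db.execute(
            "INSERT INTO tasks VALUES (?, ?, ?, ?, ?, ?)",
            (task_id, parent_session, json.dumps(planned_phases), "[]", "queued", transcript_path),
        )
        self.db.commit()

    def get(self, task_id: str) -> dict:
        row = self.db.execute(
            "SELECT parent_session, planned_phases, completed_phases, status, transcript_path "
            "FROM tasks WHERE task_id = ?",
            (task_id,),
        ).fetchone()
        if row is None:
            raise KeyError(task_id)
        return {
            "task_id": task_id,
            "parent_session": row[0],
            "planned_phases": json.loads(row[1]),
            "completed_phases": json.loads(row[2]),
            "status": row[3],
            "transcript_path": row[4],
        }

    def set_status(self, task_id: str, status: str) -> None:
        self.db.execute("UPDATE tasks SET status = ? WHERE task_id = ?", (status, task_id))
        self.db.commit()

    def complete_phase(self, task_id: str, phase: str) -> None:
        done = self.get(task_id)["completed_phases"]
        if phase not in done:
            done.append(phase)
        self.db.execute(
            "UPDATE tasks SET completed_phases = ? WHERE task_id = ?",
            (json.dumps(done), task_id),
        )
        self.db.commit()

    def remaining_phases(self, task_id: str) -> list:
        task = self.get(task_id)
        return [p for p in task["planned_phases"] if p not in task["completed_phases"]]

# Usage: write progress, "restart" by reopening the file, then read what is left.
path = os.path.join(tempfile.mkdtemp(), "registry.db")
registry = TaskRegistry(path)
registry.register("task-001", "session-abc", ["fetch", "parse", "score"])
registry.set_status("task-001", "running")
registry.complete_phase("task-001", "fetch")
registry.db.close()  # simulate the process going away

reopened = TaskRegistry(path)
remaining = reopened.remaining_phases("task-001")
print(remaining)
```

Committing after every write is deliberately conservative: it trades throughput for the guarantee that a completed phase is on disk before the next one starts.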
The timeout problem
Sub-agents have execution time limits. When a task exceeds its window, the work is lost unless you have explicit checkpointing. Our approach: decompose sub-tasks into phases, and have each phase write progress to the registry before proceeding to the next. On timeout, the pipeline reads the registry, identifies incomplete phases, and re-spawns only the missing work.
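The resume logic can be sketched in a few lines. In this example the registry is stood in for by a plain Python list, the timeout is simulated by raising `TimeoutError` from inside a phase (in a real system the sub-agent would be terminated externally), and the names `run_phases`, `fetch`, `parse`, and `score` are hypothetical.

```python
from typing import Callable, Dict, List

def run_phases(phases: Dict[str, Callable[[], None]], completed: List[str]) -> None:
    """Run every phase not yet recorded as complete, recording progress after each."""
    for name, fn in phases.items():
        if name in completed:
            continue  # finished in an earlier attempt, do not re-run
        fn()
        completed.append(name)  # a real system would write to the persistent registry here

calls: List[str] = []
parse_attempts = {"count": 0}

def fetch() -> None:
    calls.append("fetch")

def parse() -> None:
    parse_attempts["count"] += 1
    if parse_attempts["count"] == 1:
        raise TimeoutError("parse exceeded its window")  # simulated timeout on first try
    calls.append("parse")

def score() -> None:
    calls.append("score")

phases = {"fetch": fetch, "parse": parse, "score": score}
completed: List[str] = []

for attempt in range(3):  # bounded retries rather than looping forever
    try:
        run_phases(phases, completed)
        break
    except TimeoutError:
        continue  # re-spawn: only phases missing from `completed` will run

print(calls)
```

On the second attempt `fetch` is skipped because it was recorded before the timeout, so each phase's side effects happen once.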
This is not elegant. It is reliable. In production, reliability wins.
Governance primitives
For regulated industries, the governance layer is not optional. It is the product. Two patterns matter:
Human-in-the-loop gates
Certain decisions require human approval before execution. The pipeline does not proceed until a human reviewer explicitly approves or rejects. This is implemented as a blocking checkpoint in the task flow:
- The agent completes its analysis and presents a recommendation
- The pipeline pauses and notifies the human reviewer
- The reviewer approves, rejects, or requests modification
- The pipeline resumes with the reviewer’s decision as input
This is straightforward to implement. The harder part is deciding which decisions require human gates and which can be automated. Our default: any action with financial, legal, or client-facing consequences requires a gate.
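A minimal sketch of such a gate follows, assuming the reviewer is reachable through a blocking callback. The names `Verdict`, `Decision`, `requires_gate`, `run_gated_step`, and the `ask_reviewer` parameter are hypothetical; a production system would more likely persist the pending request and resume on an external event than hold a function call open.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Set

class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    MODIFY = "modify"

@dataclass
class Decision:
    verdict: Verdict
    note: str = ""

# Default policy from the text: these kinds of consequence always need a human.
GATED_CONSEQUENCES = {"financial", "legal", "client_facing"}

def requires_gate(consequences: Set[str]) -> bool:
    return bool(consequences & GATED_CONSEQUENCES)

def run_gated_step(
    recommendation: dict,
    consequences: Set[str],
    ask_reviewer: Callable[[dict], Decision],
) -> dict:
    if not requires_gate(consequences):
        return {"action": "execute", "payload": recommendation, "reviewed": False}
    decision = ask_reviewer(recommendation)  # blocks until the reviewer responds
    if decision.verdict is Verdict.APPROVE:
        action = "execute"
    elif decision.verdict is Verdict.MODIFY:
        action = "revise"
    else:
        action = "halt"
    return {"action": action, "payload": recommendation, "note": decision.note, "reviewed": True}

# Usage with a stand-in reviewer that always asks for changes.
def cautious_reviewer(rec: dict) -> Decision:
    return Decision(Verdict.MODIFY, note="tighten the position limit")

gated = run_gated_step({"trade": "reduce exposure"}, {"financial"}, cautious_reviewer)
ungated = run_gated_step({"summary": "internal note"}, {"internal"}, cautious_reviewer)
print(gated["action"], ungated["action"])
```

Keeping the policy in one set (`GATED_CONSEQUENCES`) makes the question of which decisions need a gate a reviewable piece of configuration instead of logic scattered through the pipeline.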
Audit trails
Every agent action, tool call, and decision point is logged. Not just the final output — the full reasoning chain. This serves two purposes:
- Debugging — when a pipeline produces an unexpected result, you can trace exactly where the reasoning diverged
- Compliance — regulators can inspect the decision process, not just the outcome
The transcript files we maintain provide this audit trail automatically. The structured task registry adds a layer of machine-readable metadata on top.
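For illustration, an append-only log of this kind might look like the sketch below, which writes one JSON object per line. The `AuditLog` class, the `trace` helper, and the field names (`seq`, `ts`, `session_id`, `kind`, `detail`) are assumptions for the example and do not describe our transcript format.

```python
import io
import json
from datetime import datetime, timezone

class AuditLog:
    """Hypothetical append-only log: one JSON object per line."""

    def __init__(self, stream):
        self.stream = stream
        self.seq = 0

    def record(self, session_id: str, kind: str, detail: dict) -> dict:
        self.seq += 1
        entry = {
            "seq": self.seq,  # preserves ordering even if timestamps collide
            "ts": datetime.now(timezone.utc).isoformat(),
            "session_id": session_id,
            "kind": kind,  # e.g. "reasoning", "tool_call", "decision"
            "detail": detail,
        }
        self.stream.write(json.dumps(entry, sort_keys=True) + "\n")
        self.stream.flush()
        return entry

def trace(log_text: str, session_id: str) -> list:
    """Return every entry for one session, in the order it was written."""
    entries = [json.loads(line) for line in log_text.splitlines() if line.strip()]
    return [e for e in entries if e["session_id"] == session_id]

# Usage: an in-memory stream stands in for a log file.
buffer = io.StringIO()
log = AuditLog(buffer)
log.record("session-abc", "reasoning", {"text": "filing mentions covenant breach"})
log.record("session-xyz", "tool_call", {"tool": "search", "query": "peer filings"})
log.record("session-abc", "decision", {"recommendation": "flag for review"})

session_entries = trace(buffer.getvalue(), "session-abc")
print([e["kind"] for e in session_entries])
```

Logging the intermediate `reasoning` entries alongside tool calls and decisions is what lets a reviewer see where a conclusion came from, not only what it was.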
Lessons learned
After shipping several production pipelines, a few patterns have proven durable:
Decompose aggressively. A sub-task that does too much is a sub-task that will time out. Break work into phases that complete within minutes, not hours.
Checkpoint everything. If a phase completes but the next one fails, you should not re-run the completed phase. Write progress after every meaningful step.
Governance is architecture. Retrofitting human-in-the-loop into a pipeline that was designed for full automation is painful. Build the gates in from the start.
Monitor sub-agent health. Without active monitoring, a failed sub-agent's work is lost silently. A periodic health check that compares expected vs. actual sub-agent status catches failures before they compound.
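One way to express that health check is a pure function over two snapshots, sketched below. The function name `find_unhealthy`, the idea of a per-task deadline, and the "missing" and "overdue" labels are assumptions for this example; the status strings follow the ones used earlier in this post.

```python
ACTIVE = {"queued", "running"}
TERMINAL_BAD = {"failed", "timed out"}

def find_unhealthy(expected: dict, actual: dict, now: float) -> dict:
    """Compare expected tasks against reported statuses.

    expected maps task id -> deadline in epoch seconds (hypothetical field);
    actual maps task id -> status string as reported by the runtime.
    Returns task id -> reason for every task that needs attention.
    """
    problems = {}
    for task_id, deadline in expected.items():
        status = actual.get(task_id)
        if status is None:
            problems[task_id] = "missing"  # the runtime has no record of this task
        elif status in TERMINAL_BAD:
            problems[task_id] = status
        elif status in ACTIVE and now > deadline:
            problems[task_id] = "overdue"  # still active past its deadline
    return problems

# Usage with fixed timestamps so the result is deterministic.
expected = {"t1": 100.0, "t2": 100.0, "t3": 100.0, "t4": 200.0}
actual = {"t1": "succeeded", "t2": "running", "t3": "failed"}
problems = find_unhealthy(expected, actual, now=150.0)
print(problems)
```

Run on a schedule, the returned mapping can drive the re-spawn of incomplete work or an alert to an operator.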
What this enables
This infrastructure powers the quant research pipeline we demonstrate on our case studies page. The same patterns apply to any multi-step research workflow: regulatory filing analysis, competitive intelligence gathering, due diligence pipelines, and knowledge synthesis across heterogeneous data sources.
The governance primitives are particularly relevant for firms operating under regulatory oversight. When your AI system makes or informs decisions, you need to demonstrate not just what it decided, but how and why — with human checkpoints at the points that matter.
If you are evaluating how agentic AI fits into a regulated research workflow, we are happy to walk through our approach. Get in touch.