Agentic Systems · November 20, 2025 · 8 min read

Multi-Agent Context Orchestration with LangGraph

How Context Refinery turns raw sources into clean, reranked, self-refined LLM context with a four-stage LangGraph pipeline.

The quality of an LLM's answer is capped by the quality of the context you hand it. Naive single-pass retrieval returns noisy, redundant, or off-target chunks — and most pipelines never check their own work.

To fix that, I built Context Refinery: a cross-platform desktop app (Tauri v2 + Vue 3 + FastAPI) that runs a LangGraph multi-agent pipeline to retrieve, rerank, and self-refine context before it ever reaches the model.

The Four-Stage Pipeline

Context Refinery models retrieval as a LangGraph state machine with four stages:

1. Intent analysis: interpret what the query is actually asking for.
2. Hybrid retrieval: ChromaDB dense vectors + BM25 lexical search, merged with Reciprocal Rank Fusion (RRF).
3. Cross-encoder reranking: ms-marco-MiniLM-L-6-v2 rescores the shortlist by true query relevance.
4. Iterative self-refinement: score the assembled context; if it's below threshold, loop back and retrieve again (up to 3 iterations).
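
As a rough illustration of how the four stages and the eval-gated loop fit together, here is a minimal plain-Python sketch. It is not Context Refinery's actual code and does not use LangGraph: the names PipelineState and run_pipeline, the injected stage callables, and the 0.7 threshold are all assumptions made for the example. Only the cap of 3 iterations comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PipelineState:
    """Hypothetical state object passed between stages."""
    query: str
    intent: str = ""
    context: List[str] = field(default_factory=list)
    score: float = 0.0
    iterations: int = 0


def run_pipeline(
    query: str,
    analyze_intent: Callable[[str], str],
    retrieve: Callable[[str, int], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    evaluate: Callable[[str, List[str]], float],
    threshold: float = 0.7,   # assumed value, not taken from the article
    max_iterations: int = 3,  # the article's cap on refinement loops
) -> PipelineState:
    state = PipelineState(query=query)
    state.intent = analyze_intent(query)                       # stage 1
    while state.iterations < max_iterations:
        state.iterations += 1
        candidates = retrieve(state.intent, state.iterations)  # stage 2
        state.context = rerank(state.intent, candidates)       # stage 3
        state.score = evaluate(state.intent, state.context)    # stage 4
        if state.score >= threshold:
            break  # context is good enough, stop refining
    return state
```

Passing the stages in as callables keeps the control flow visible on its own: the loop exits either when the score clears the threshold or when the iteration cap is hit, whichever comes first.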

Why Hybrid Retrieval + RRF

Dense vectors capture meaning; BM25 captures exact terms (error codes, function names, rare tokens). Each fails where the other shines. Reciprocal Rank Fusion merges the two ranked lists into one, so a result that's strong in either signal survives:

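# Reciprocal Rank Fusion across dense + sparse rankings
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids; returns ids sorted by fused score, best first."""
    scores = defaultdict(float)
    for ranking in rankings:        # e.g. [dense_hits, bm25_hits]
        # Ranks are 1-based, as in the standard RRF formula 1 / (k + rank)
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)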
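
To make the fusion concrete, here is a small worked example with made-up document ids. The helper rrf_scores is a hypothetical variant that returns the raw fused scores so they can be inspected, and it assumes ranks count from 1, as in the usual statement of the formula.

```python
from collections import defaultdict


def rrf_scores(rankings, k=60):
    """Return {doc_id: fused score}, using 1-based ranks (an assumption)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)


# Illustrative ids only: d1 and d3 appear in both lists, d2 and d4 in one each.
dense_hits = ["d1", "d2", "d3"]
bm25_hits = ["d3", "d1", "d4"]

scores = rrf_scores([dense_hits, bm25_hits])
fused = sorted(scores, key=scores.get, reverse=True)
print(fused)  # ['d1', 'd3', 'd2', 'd4']
```

d1 scores 1/61 + 1/62 and d3 scores 1/63 + 1/61, so both outrank d2 and d4, which each appear in only one list. With k=60 the gap between adjacent ranks is small, so appearing in both lists counts for more than a single top position.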

Why a Cross-Encoder

Bi-encoder retrieval is fast but approximate: it embeds the query and each document separately and only compares the resulting vectors. A cross-encoder reads the query and a candidate together and scores the pair directly. Because every query-document pair needs its own forward pass, it's too slow to run over a whole corpus, but it suits reranking a 20–50 candidate shortlist, where it typically improves top-k precision.
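
The reranking step can be sketched as a function that scores (query, candidate) pairs and keeps the best few. In the sketch below, rerank, PairScorer, and toy_overlap_scorer are hypothetical names, and the token-overlap scorer is only a stand-in so the example runs without downloading a model. It is not a cross-encoder.

```python
from typing import Callable, List, Sequence, Tuple

# Any callable that maps (query, document) pairs to one score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(
    query: str,
    candidates: List[str],
    score_pairs: PairScorer,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair jointly and keep the top_k."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer: fraction of query tokens that occur in the document."""
    results = []
    for query, doc in pairs:
        q_tokens = set(query.lower().split())
        d_tokens = set(doc.lower().split())
        results.append(len(q_tokens & d_tokens) / max(len(q_tokens), 1))
    return results


docs = [
    "reset your password in settings",
    "billing cycle overview",
    "password reset email not arriving",
]
top = rerank("password reset email", docs, toy_overlap_scorer, top_k=2)
```

With the sentence-transformers library installed, the stand-in could presumably be swapped for the predict method of CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2"), which also takes a list of pairs and returns one score per pair. Whether Context Refinery loads the model this way is an assumption.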

Why LangGraph (and Tauri)

The self-refinement stage isn't a fixed path: it loops based on an eval score. A LangGraph state machine makes that conditional branching explicit and debuggable, unlike a nested chain. Wrapping it in a Tauri v2 desktop shell means the whole pipeline can run offline against local models (LLaMA / Mistral via Ollama), so no API keys are required and no documents leave the machine.

Takeaways

Retrieval quality is a hybrid problem — dense-only or lexical-only each leave relevance on the table. And an eval-gated refinement loop can beat reaching for a bigger model: better context in, better answer out, and it runs locally. v0.1.0 is released and MIT licensed.