
Context Refinery

LangGraph Multi-Agent Context Orchestration Engine

Role: Creator & Maintainer
Timeline: 2026 – Present
Stack: Python, LangGraph, FastAPI
Source: GitHub
Impact

Cross-platform desktop engine that turns raw sources into clean, reranked, self-refined LLM context through a four-stage agentic pipeline.

Overview

Context

Context Refinery is a cross-platform desktop application (Windows/Mac/Linux) built with Tauri v2, Vue 3, and FastAPI. It orchestrates a LangGraph multi-agent pipeline that takes a query and a corpus and returns reranked, self-refined context ready to drop into any LLM prompt. It works with hosted models (GPT-4o, Claude, Gemini) or fully local models (LLaMA, Mistral) served through Ollama. Version 0.1.0 is released under the MIT license.

Challenge

The problem

The quality of any LLM answer is capped by the quality of the context it is given. Naive single-pass retrieval returns noisy, redundant, or off-target chunks, and there is no feedback loop to catch them. The challenge was to build an agentic pipeline that retrieves broadly, reranks precisely, then critiques and re-refines its own context until it clears a quality threshold, all inside a privacy-friendly desktop app that can run entirely offline.

Approach

How I built it

01

Designed a four-stage LangGraph state machine: intent analysis → hybrid retrieval → cross-encoder reranking → iterative self-refinement

02

Implemented hybrid retrieval combining ChromaDB dense vectors, BM25 lexical search, and Reciprocal Rank Fusion to merge both rankings

03

Added cross-encoder reranking (ms-marco-MiniLM-L-6-v2) to reorder candidates by true query relevance

04

Built a self-refinement loop that scores the assembled context and re-runs (up to 3 iterations) when the score falls below a quality threshold

05

Wrapped it in a Tauri v2 + Vue 3 desktop UI on a FastAPI backend with multi-provider support (GPT-4o, Claude, Gemini, local LLaMA/Mistral via Ollama) and token-budget control (512–32K tokens)
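The token-budget control in step 05 could work roughly as below. This is a minimal sketch, not the project's code: `pack_context` and `approx_tokens` are hypothetical names, and counting whitespace-separated words is only a stand-in for the target model's real tokenizer.

```python
from typing import List


def approx_tokens(text: str) -> int:
    """Rough token estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def pack_context(ranked_chunks: List[str], budget: int) -> List[str]:
    """Walk chunks in rank order and keep each one that still fits the budget."""
    packed: List[str] = []
    used = 0
    for chunk in ranked_chunks:
        cost = approx_tokens(chunk)
        if used + cost <= budget:
            packed.append(chunk)
            used += cost
        # Chunks that would overflow are skipped, so a shorter,
        # lower-ranked chunk can still fill the remaining space.
    return packed


chunks = ["w " * 300, "x " * 300, "y " * 100]  # already in reranked order
kept = pack_context(chunks, budget=512)
print([approx_tokens(c) for c in kept])  # [300, 100]
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks use the leftover budget; stopping at the first overflow would keep the ordering stricter at the cost of unused tokens.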

Technical Decisions

Why these choices

LangGraph over sequential chains

A state machine gives explicit conditional branching and loops — essential for a refinement stage that re-runs based on eval scores rather than following a hardcoded path.
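The loop-back is the part a linear chain cannot express. The sketch below mimics that control flow in plain Python so it runs without dependencies; it is not LangGraph's API (in LangGraph the router would be registered with `StateGraph.add_conditional_edges`). The node names, the `PipelineState` fields, the 0.8 threshold, and the choice to loop back into the refine node only are assumptions; the write-up fixes only the four stages and the three-iteration cap.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MAX_ITERATIONS = 3   # the "up to 3 iterations" cap from the write-up
THRESHOLD = 0.8      # hypothetical quality threshold, not the project's value


@dataclass
class PipelineState:
    query: str
    intent: str = ""
    candidates: List[str] = field(default_factory=list)
    context: List[str] = field(default_factory=list)
    score: float = 0.0
    iterations: int = 0


Node = Callable[[PipelineState], PipelineState]

# Fixed edges for the linear part of the graph.
EDGES = {"intent": "retrieve", "retrieve": "rerank", "rerank": "refine"}


def route_after_refine(state: PipelineState) -> str:
    """Conditional edge: loop back into refinement until the score clears the bar."""
    if state.score >= THRESHOLD or state.iterations >= MAX_ITERATIONS:
        return "END"
    return "refine"


def run_graph(state: PipelineState, nodes: Dict[str, Node]) -> PipelineState:
    """Follow fixed edges, except after 'refine', where the router decides."""
    current = "intent"
    while current != "END":
        state = nodes[current](state)
        current = route_after_refine(state) if current == "refine" else EDGES[current]
    return state


def toy_refine(state: PipelineState) -> PipelineState:
    """Toy refine node: each pass bumps the counter and the (fake) eval score."""
    state.iterations += 1
    state.score = 0.5 * state.iterations
    return state


# Pass-through lambdas stand in for the real intent/retrieval/rerank agents.
toy_nodes: Dict[str, Node] = {
    "intent": lambda s: s,
    "retrieve": lambda s: s,
    "rerank": lambda s: s,
    "refine": toy_refine,
}

final = run_graph(PipelineState(query="what is RRF?"), toy_nodes)
print(final.iterations, final.score)  # 2 1.0
```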

Hybrid retrieval with Reciprocal Rank Fusion

Dense vectors capture semantics while BM25 captures exact terms; RRF merges both rankings so neither failure mode dominates retrieval.
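A minimal sketch of the lexical side and the fusion step, assuming whitespace tokenization, the common BM25 defaults k1 = 1.5 and b = 0.75, and the RRF constant k = 60 from the original RRF paper; the project's actual parameters aren't stated here. The dense ranking is a hard-coded stand-in for the order a ChromaDB similarity query would return, and `bm25_rank` and `reciprocal_rank_fusion` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def bm25_rank(query: str, docs: Sequence[str], k1: float = 1.5, b: float = 0.75) -> List[int]:
    """Return document indices ordered by Okapi BM25 score, best first."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))  # document frequency
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style idf: the "+ 1" keeps it non-negative for common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings: Sequence[Sequence[int]], k: int = 60) -> List[int]:
    """Fuse rankings: each doc earns 1 / (k + rank) per list, with rank starting at 1."""
    fused: Dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


docs = [
    "ChromaDB stores dense embeddings",
    "BM25 ranks documents by exact term overlap",
    "Reciprocal rank fusion merges rankings",
]
lexical = bm25_rank("bm25 exact term", docs)  # [1, 0, 2]
dense = [0, 2, 1]  # stand-in for a ChromaDB similarity ranking
print(reciprocal_rank_fusion([lexical, dense]))  # [0, 1, 2]
```

Document 0 comes out on top because it ranks well in both lists, even though BM25 alone puts it second. Fusing ranks rather than raw scores also sidesteps the fact that cosine similarities and BM25 scores live on incompatible scales.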

Cross-encoder reranking

Bi-encoder retrieval is fast but approximate, because the query and each document are embedded separately. A cross-encoder reads each query–document pair in the shortlist together and scores it directly, which sharply improves top-k precision.
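The reranking stage has the shape below: pair the query with every shortlisted candidate, score the pairs, and keep the best. To keep the sketch runnable offline, a toy term-overlap scorer stands in for the cross-encoder; it is not a cross-encoder and only shows where one plugs in. The commented lines show how the ms-marco-MiniLM-L-6-v2 model could be loaded with the sentence-transformers `CrossEncoder` class, assuming that library is installed; `rerank` and `overlap_scorer` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

PairScorer = Callable[[List[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, candidates: List[str], score_pairs: PairScorer, top_k: int = 5) -> List[str]:
    """Score each (query, candidate) pair and return the top_k candidates."""
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def overlap_scorer(pairs: List[Tuple[str, str]]) -> List[float]:
    """Toy stand-in for a cross-encoder: share of query terms present in the doc."""
    results = []
    for query, doc in pairs:
        q_terms = set(query.lower().split())
        d_terms = set(doc.lower().split())
        results.append(len(q_terms & d_terms) / max(len(q_terms), 1))
    return results


# With sentence-transformers installed, the real model would plug in the same way:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   top = rerank(query, shortlist, model.predict)

shortlist = ["cloud api pricing", "run a local llm offline with ollama", "offline maps"]
print(rerank("local llm offline", shortlist, overlap_scorer, top_k=2))
# ['run a local llm offline with ollama', 'offline maps']
```

Because the cross-encoder runs once per pair, it only pays off on a shortlist; that is why it sits after the cheap hybrid retrieval rather than replacing it.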

Tauri v2 with local Ollama support

A Rust-based desktop shell keeps the bundle small and lets the whole pipeline run offline against local models — no keys or documents leave the machine.
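For the local path, a call to an Ollama server on the same machine might look like this. It is a sketch assuming Ollama's default port (11434) and its `/api/generate` endpoint; the prompt template and the `mistral` model tag are illustrative, and `build_ollama_request` / `ask_local_model` are hypothetical helpers. The request targets `localhost`, which is what keeps documents on the machine.

```python
import json
import urllib.request
from typing import List

# Ollama listens on localhost:11434 by default.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ollama_request(model: str, context_chunks: List[str], question: str) -> urllib.request.Request:
    """Build a non-streaming generate request carrying the refined context."""
    # Hypothetical prompt template: refined context first, then the question.
    prompt = "Context:\n" + "\n\n".join(context_chunks) + f"\n\nQuestion: {question}"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(req: urllib.request.Request) -> str:
    """Send the request; with stream=False Ollama replies with a single JSON object."""
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]


req = build_ollama_request("mistral", ["chunk one", "chunk two"], "Summarize the key points.")
# answer = ask_local_model(req)  # needs a running Ollama server with the model pulled
```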

Outcomes

What shipped

Four-stage LangGraph pipeline: intent → hybrid retrieval → rerank → self-refinement
Hybrid ChromaDB + BM25 retrieval with Reciprocal Rank Fusion
Cross-encoder reranking and an eval-gated self-refinement loop (up to 3 iterations)
Multi-provider: GPT-4o, Claude, Gemini, and local LLaMA/Mistral via Ollama
Cross-platform desktop app (Windows/Mac/Linux), v0.1.0 released, MIT licensed

Takeaways

What I learned

LangGraph's explicit state model makes multi-agent pipelines far easier to debug than nested chains
Retrieval quality is a hybrid problem — dense-only or lexical-only each leave relevance on the table
An eval-gated refinement loop can beat a bigger model for context quality — and it runs locally