Context Refinery
LangGraph Multi-Agent Context Orchestration Engine
Cross-platform desktop engine that turns raw sources into clean, reranked, self-refined LLM context through a four-stage agentic pipeline.
Context
Context Refinery is a cross-platform desktop application (Windows, macOS, Linux) built with Tauri v2, Vue 3, and FastAPI. It orchestrates a LangGraph multi-agent pipeline that takes a query and a corpus and returns high-quality, reranked, self-refined context, ready to drop into any LLM prompt. It runs against hosted models (GPT-4o, Claude, Gemini) or fully local ones (LLaMA / Mistral) via Ollama. v0.1.0 is released under the MIT license.
The problem
The quality of any LLM answer is capped by the quality of the context it's given. Naive single-pass retrieval returns noisy, redundant, or off-target chunks, and has no feedback loop to catch them. The challenge was to build an agentic pipeline that retrieves broadly, reranks precisely, then critiques and re-refines its own context until it clears a quality threshold, all inside a privacy-friendly desktop app that can run entirely offline.
How I built it
Designed a four-stage LangGraph state machine: intent analysis → hybrid retrieval → cross-encoder reranking → iterative self-refinement
Implemented hybrid retrieval combining ChromaDB dense vectors, BM25 lexical search, and Reciprocal Rank Fusion to merge both rankings
Added cross-encoder reranking (ms-marco-MiniLM-L-6-v2) to reorder candidates by true query relevance
Built a self-refinement loop that scores the assembled context and re-runs (up to 3 iterations) when it falls below threshold
Wrapped it in a Tauri v2 + Vue 3 desktop UI on a FastAPI backend with multi-provider support (GPT-4o, Claude, Gemini, local LLaMA/Mistral via Ollama) and token-budget control (512–32K)
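As an illustration of the token-budget control mentioned above, here is a minimal sketch of packing ranked chunks under a budget. It is not the project's actual code: the function name pack_to_budget, the greedy skip-if-too-large policy, and the whitespace-based token count are all assumptions made for the example. A real build would count tokens with the target model's tokenizer.

```python
def pack_to_budget(ranked_chunks, budget, count_tokens=None):
    """Greedily keep chunks, in rank order, while they fit the token budget.

    count_tokens is a hypothetical hook; the default splits on whitespace,
    which only approximates a real tokenizer.
    """
    count = count_tokens or (lambda text: len(text.split()))
    selected, used = [], 0
    for chunk in ranked_chunks:
        cost = count(chunk)
        if used + cost > budget:
            continue  # too large for the remaining budget; try the next chunk
        selected.append(chunk)
        used += cost
    return selected, used


# Hypothetical chunks, already ordered best-first by the reranker.
chunks = ["alpha beta gamma", "delta epsilon zeta eta", "theta iota"]
selected, used = pack_to_budget(chunks, budget=5)
```

Skipping an oversized chunk rather than stopping at it lets smaller, lower-ranked chunks use the leftover budget. Stopping at the first overflow would be the stricter alternative if rank order matters more than fill.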
Why these choices
LangGraph over sequential chains
A state machine gives explicit conditional branching and loops — essential for a refinement stage that re-runs based on eval scores rather than following a hardcoded path.
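The score-then-loop behavior can be sketched in plain Python. This is deliberately not LangGraph's API: the node names, the 0.8 threshold, and the "drop the weakest chunk" refinement step are assumptions for illustration, and only the three-iteration cap comes from the description above. In LangGraph, a routing function like route_after_score would typically be attached with StateGraph.add_conditional_edges.

```python
END = "end"
MAX_ITERATIONS = 3       # iteration cap from the pipeline description
QUALITY_THRESHOLD = 0.8  # assumed value; the real threshold is not stated


def score_node(state):
    """Score the assembled context as the mean chunk score (toy metric)."""
    scores = [c["score"] for c in state["chunks"]]
    return {**state, "quality": sum(scores) / len(scores)}


def refine_node(state):
    """Hypothetical refinement: drop the lowest-scoring chunk."""
    ordered = sorted(state["chunks"], key=lambda c: c["score"], reverse=True)
    return {**state, "chunks": ordered[:-1],
            "iterations": state["iterations"] + 1}


def route_after_score(state):
    """Conditional edge: finish, or loop back into refinement."""
    if state["quality"] >= QUALITY_THRESHOLD:
        return END
    if state["iterations"] >= MAX_ITERATIONS or len(state["chunks"]) <= 1:
        return END
    return "refine"


NODES = {"score": score_node, "refine": refine_node}
EDGES = {"score": route_after_score, "refine": lambda state: "score"}


def run(state, entry="score"):
    """Walk the graph until a routing function returns END."""
    node = entry
    while node != END:
        state = NODES[node](state)
        node = EDGES[node](state)
    return state


initial = {
    "iterations": 0,
    "chunks": [
        {"id": "a", "score": 0.9},
        {"id": "b", "score": 0.85},
        {"id": "c", "score": 0.4},
        {"id": "d", "score": 0.3},
    ],
}
result = run(initial)
```

With these made-up scores the loop runs twice: it drops the two weakest chunks, and the mean score then clears the assumed threshold.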
Hybrid retrieval with Reciprocal Rank Fusion
Dense vectors capture semantics while BM25 captures exact terms; RRF merges both rankings so neither failure mode dominates retrieval.
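For reference, Reciprocal Rank Fusion gives each document the sum of 1 / (k + rank) over every ranking it appears in. The sketch below is a minimal standalone version. The document IDs are made up, and k = 60 is a commonly used default that I am assuming here, not a value taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical dense-vector ranking
bm25_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical BM25 ranking
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
```

Because only ranks enter the formula, the dense similarity scores and BM25 scores never need to be normalized onto a common scale.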
Cross-encoder reranking
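Bi-encoder retrieval is fast but approximate, because the query and each document are embedded independently. A cross-encoder reads each query–document pair on the shortlist together and scores it directly, which improves top-k precision at the cost of extra compute per candidate.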
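The reranking step reduces to "score every (query, document) pair, then sort". The sketch below keeps the scorer pluggable so it runs without downloading a model. The rerank helper and the word-overlap scorer are hypothetical stand-ins, not the project's code. In a real pipeline, score_pairs could be the predict method of a sentence-transformers CrossEncoder loaded with cross-encoder/ms-marco-MiniLM-L-6-v2, which accepts a list of pairs in the same shape.

```python
def rerank(query, candidates, score_pairs, top_k=5):
    """Reorder candidates by a pairwise relevance score and keep the top_k.

    score_pairs takes a list of (query, document) tuples and returns one
    score per pair, in the same order.
    """
    pairs = [(query, doc) for doc in candidates]
    scores = score_pairs(pairs)
    ranked = sorted(zip(candidates, scores), key=lambda item: item[1],
                    reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


def overlap_scorer(pairs):
    """Toy stand-in for a cross-encoder: share of query words in the doc."""
    results = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        results.append(len(query_words & doc_words) / len(query_words))
    return results


shortlist = [
    "Tauri bundles a Rust core with a web frontend",
    "BM25 ranks documents by term frequency",
    "Reciprocal Rank Fusion merges ranked lists",
]
top = rerank("how does BM25 rank documents", shortlist, overlap_scorer, top_k=2)
```

Only the shortlist is rescored, which is why this stage sits after retrieval: pairwise scoring costs one model pass per candidate.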
Tauri v2 with local Ollama support
A Rust-based desktop shell keeps the bundle small, and local models served through Ollama let the whole pipeline run offline. In that mode no API keys are required and no documents leave the machine.