Multi-Agent Prompt Optimization with LangGraph
Orchestrating specialized agents to iteratively refine, evaluate, and test LLM prompts.
title: "Multi-Agent Prompt Optimization with LangGraph" date: "2025-11-20T00:00:00Z" excerpt: "Orchestrating specialized agents to iteratively refine, evaluate, and test LLM prompts." tag: "LLM Systems" readTime: "8 min" accentColor: "blue"
Beyond Single-Pass Prompting
Writing effective prompts for LLMs requires iterative experimentation, but most developers lack systematic frameworks for optimization. A single-pass optimizer can't account for edge cases or competing quality dimensions.
To address this, I built Context Refinery, a desktop application using LangGraph that orchestrates a multi-agent pipeline for prompt optimization.
The Agent Pipeline
The system is designed as a LangGraph state machine with four specialized agents:
- Analyzer: Breaks down the user's initial prompt and intent.
- Rewriter: Applies established prompt engineering techniques to draft improvements.
- Evaluator: Scores outputs against user-defined criteria.
- Refiner: Triggers conditional loops when thresholds aren't met.
Why LangGraph?
State machine architecture enables conditional branching and loops — essential for a refinement pipeline where re-runs are data-driven, not hardcoded. Unlike sequential chains, LangGraph allows the Evaluator to route back to the Rewriter based on quality scores.
# Conceptual LangGraph routing
def evaluate_prompt(state: GraphState):
score = evaluator_agent.run(state.prompt)
if score < 0.8:
return "rewrite"
return "finalize"Desktop-First Privacy
Context Refinery is built with Electron and React. Local execution means no API keys leave the machine and the app works offline once models are configured — a strong privacy argument for enterprise prompt optimization workflows.
Takeaways
LangGraph's explicit state model makes debugging multi-agent pipelines dramatically easier. Furthermore, agent role specialization beats generalism for complex multi-step workflows. Specialized system prompts per agent produced significantly better role adherence than a generalist multi-task agent.