Production RAG System
Enterprise Knowledge Retrieval & LLM Evaluation at T-Mobile
Sub-2-second retrieval across 10,000+ pages, with an approximately 60% reduction in hallucinations achieved through systematic evaluation.
Context
At T-Mobile (contracted through Innovcentric LLC), I built the end-to-end RAG pipeline powering customer support knowledge retrieval. The system ingests 10,000+ pages of support documentation, splits them into structure-aware chunks, embeds them, and serves grounded answers to 100+ support agents in under two seconds. I also built the evaluation framework used to measure retrieval quality, which helped reduce factual errors by approximately 60%.
The problem
T-Mobile's customer support documentation spanned 10,000+ pages across multiple formats and update cycles. Support agents needed fast, accurate answers grounded in current documentation. The challenge was building a retrieval system that was both fast and faithful, and proving it with measurable evaluation rather than anecdotal testing.
How I built it
Designed a document ingestion pipeline supporting PDF and DOCX formats with structured metadata extraction
Implemented recursive chunking with 512-token windows and 50-token overlap to preserve context boundaries
Used OpenAI embeddings with ChromaDB vector indexing and metadata filtering for precise retrieval
Built an LLM evaluation framework using RAGAS metrics (faithfulness, relevance, hallucination scoring) and LangSmith observability
Engineered 50+ prompt templates with safety guardrails, tone alignment, and escalation logic
Benchmarked multiple chunking strategies and embedding models to optimize retrieval quality
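The benchmarking step above can be illustrated with a minimal sketch. It is not the harness used on the project: the metrics chosen here (hit rate and mean reciprocal rank at k), the `Retriever` signature, the configuration names, and the document ids are all assumptions made for illustration, and the rankings are toy data rather than real results.

```python
from typing import Callable, Dict, List

# Assumed interface: a retriever maps (query, k) to a ranked list of document ids.
Retriever = Callable[[str, int], List[str]]


def hit_rate_at_k(retrieve: Retriever, labeled: Dict[str, str], k: int) -> float:
    """Fraction of queries whose expected document appears in the top k."""
    hits = sum(1 for query, expected in labeled.items() if expected in retrieve(query, k))
    return hits / len(labeled)


def mean_reciprocal_rank(retrieve: Retriever, labeled: Dict[str, str], k: int) -> float:
    """Average of 1/rank of the expected document (0 when it is not in the top k)."""
    total = 0.0
    for query, expected in labeled.items():
        ranked = retrieve(query, k)
        if expected in ranked:
            total += 1.0 / (ranked.index(expected) + 1)
    return total / len(labeled)


def benchmark(configs: Dict[str, Retriever], labeled: Dict[str, str], k: int = 5):
    """Score every retrieval configuration against the same labeled queries."""
    if not labeled:
        raise ValueError("labeled query set must not be empty")
    return {
        name: {
            "hit_rate": hit_rate_at_k(retrieve, labeled, k),
            "mrr": mean_reciprocal_rank(retrieve, labeled, k),
        }
        for name, retrieve in configs.items()
    }


def table_retriever(table: Dict[str, List[str]]) -> Retriever:
    """Toy retriever backed by a fixed ranking table (stands in for a real index)."""
    return lambda query, k: table[query][:k]


# Hypothetical labeled queries and toy rankings, for illustration only.
labeled = {"reset voicemail pin": "doc_voicemail", "change billing date": "doc_billing"}
config_a = table_retriever({
    "reset voicemail pin": ["doc_voicemail", "doc_sim"],
    "change billing date": ["doc_plans", "doc_billing"],
})
config_b = table_retriever({
    "reset voicemail pin": ["doc_sim", "doc_plans"],
    "change billing date": ["doc_billing", "doc_plans"],
})

results = benchmark({"config_a": config_a, "config_b": config_b}, labeled, k=2)
print(results)
```

In a real comparison, each entry in `configs` would wrap a different chunking strategy or embedding model behind the same interface, so that all candidates are scored on an identical query set.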
Why these choices
Recursive chunking over fixed-size splits
Recursive chunking respects document structure (headings, paragraphs, lists), preserving semantic coherence within chunks. This improved retrieval relevance compared to naive fixed-window approaches.
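A minimal sketch of the idea, assuming whitespace-separated words as a stand-in for model tokens and a simple separator hierarchy (blank line, newline, space). The function names and the sample text are hypothetical, and a production pipeline would use a real tokenizer and preserve paragraph breaks inside each chunk.

```python
from typing import List, Sequence

# Separators from coarse to fine: paragraph break, line break, word break.
SEPARATORS = ("\n\n", "\n", " ")


def split_recursive(text: str, max_tokens: int, separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split text into pieces of at most max_tokens words, preferring coarse boundaries."""
    text = text.strip()
    if not text:
        return []
    words = text.split()
    if len(words) <= max_tokens:
        return [text]
    if not separators:
        # Nothing left to split on: fall back to a hard word-count split.
        return [" ".join(words[i:i + max_tokens]) for i in range(0, len(words), max_tokens)]
    pieces: List[str] = []
    for part in text.split(separators[0]):
        pieces.extend(split_recursive(part, max_tokens, separators[1:]))
    return pieces


def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50) -> List[str]:
    """Greedily pack pieces into chunks, carrying `overlap` words into the next chunk."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be non-negative and smaller than max_tokens")
    # Pieces are capped at max_tokens - overlap so a carried-over tail always fits.
    pieces = split_recursive(text, max_tokens - overlap)
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        words = piece.split()
        if current and len(current) + len(words) > max_tokens:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


# Five synthetic 40-word paragraphs, chunked with a small window for readability.
sample = "\n\n".join(" ".join(f"w{p}_{i}" for i in range(40)) for p in range(5))
chunks = chunk_text(sample, max_tokens=100, overlap=10)
print([len(c.split()) for c in chunks])
```

Because whole paragraphs are packed before any finer split is attempted, chunk boundaries fall on paragraph breaks whenever a paragraph fits in the window, which is the property a fixed-size splitter lacks.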
RAGAS + LangSmith for evaluation
Subjective quality assessment doesn't scale. RAGAS provides quantitative metrics for faithfulness and relevance, while LangSmith enables trace-level debugging of retrieval and generation steps.
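To make the faithfulness idea concrete: it is, roughly, the share of claims in an answer that the retrieved context supports. The sketch below mirrors that ratio only. It does not call the RAGAS library, the sentence-based claim splitter is deliberately naive, and `keyword_judge` is a hypothetical stand-in for the LLM judge a real evaluation would use.

```python
from typing import Callable, List

# Assumed judge interface: (claim, contexts) -> is the claim supported?
Judge = Callable[[str, List[str]], bool]


def split_claims(answer: str) -> List[str]:
    """Naive claim extraction: treat each sentence as one claim."""
    return [s.strip() for s in answer.split(".") if s.strip()]


def faithfulness(answer: str, contexts: List[str], is_supported: Judge) -> float:
    """Fraction of claims in the answer that the judge finds supported by the contexts."""
    claims = split_claims(answer)
    if not claims:
        return 0.0
    supported = sum(1 for claim in claims if is_supported(claim, contexts))
    return supported / len(claims)


def keyword_judge(claim: str, contexts: List[str]) -> bool:
    """Toy judge: a claim counts as supported if one passage contains all of its words."""
    claim_words = set(claim.lower().split())
    return any(
        claim_words <= set(passage.lower().replace(".", " ").split())
        for passage in contexts
    )


# Hypothetical example data, not taken from the real documentation set.
contexts = [
    "Voicemail PIN resets take effect immediately. "
    "Agents can reset the PIN from the account page."
]
answer = "Agents can reset the PIN from the account page. Resets take 24 hours."

score = faithfulness(answer, contexts, keyword_judge)
print(score)
```

Here the second sentence of the answer contradicts the context, so only one of two claims is supported. Tracking this ratio over a fixed question set is what makes a change in hallucination rate measurable rather than anecdotal.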
Metadata filtering alongside semantic search
Pure semantic search can surface contextually similar but categorically wrong results. Metadata filtering (document type, recency, department) ensures retrieval stays within appropriate boundaries.
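The effect can be shown with a small in-memory sketch. It is a stand-in for a vector store query, similar in spirit to the `where` filter that ChromaDB queries accept, and not the production code: the `Chunk` type, the two-dimensional embeddings, and the `department` values are all made up for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Chunk:
    text: str
    embedding: List[float]
    metadata: Dict[str, str] = field(default_factory=dict)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_embedding: List[float], chunks: List[Chunk],
           where: Optional[Dict[str, str]] = None, k: int = 3) -> List[Chunk]:
    """Keep only chunks whose metadata matches `where`, then rank by similarity."""
    where = where or {}
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_embedding, c.embedding), reverse=True)
    return ranked[:k]


# Toy corpus: the business chunk is the closest vector but the wrong category
# for a consumer-account question.
corpus = [
    Chunk("Business accounts: change billing date via account manager", [0.9, 0.1],
          {"department": "business"}),
    Chunk("Consumer accounts: change billing date in the app", [0.8, 0.2],
          {"department": "consumer"}),
    Chunk("Consumer accounts: SIM swap steps", [0.1, 0.9],
          {"department": "consumer"}),
]
query = [1.0, 0.0]

unfiltered_top = search(query, corpus, k=1)[0]
filtered_top = search(query, corpus, where={"department": "consumer"}, k=1)[0]
print(unfiltered_top.text)
print(filtered_top.text)
```

Without the filter, the business-account chunk ranks first because its vector is closest to the query. With the filter, the search is restricted to consumer documents before ranking, so the categorically correct chunk is returned.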