RAG Pipeline
Turn your documents into a searchable knowledge base. Every answer includes citations back to the source.
Key capabilities
Frequently asked questions
How does the retrieval pipeline work?
Every query passes through a multi-stage pipeline: spell correction (~10 ms, pyspellchecker with a 120K-word dictionary), query rewriting (resolves pronouns using the conversation history), strategy routing (selects one of 4 retrieval strategies based on query complexity), 3-way hybrid search (semantic embeddings + BM25 full-text + ColPali visual), Cohere reranking with a Cross-Encoder fallback, and a semantic cache with a 0.95 similarity threshold and a 24-hour TTL that answers repeat queries instantly.
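To make the semantic cache stage concrete, here is a minimal sketch of a lookup that applies the 0.95 similarity threshold and 24-hour TTL mentioned above. It is an illustration only, not the product's implementation: the `SemanticCache` class, the `cosine` helper, the in-memory list, and the use of cosine similarity as the metric are all assumptions.

```python
import math
import time
from dataclasses import dataclass, field

SIMILARITY_THRESHOLD = 0.95   # threshold stated in the answer above
TTL_SECONDS = 24 * 60 * 60    # 24-hour TTL stated in the answer above


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class SemanticCache:
    """Hypothetical in-memory cache keyed by query embedding."""
    entries: list = field(default_factory=list)  # (embedding, answer, stored_at)

    def put(self, embedding, answer, now=None):
        now = time.time() if now is None else now
        self.entries.append((list(embedding), answer, now))

    def get(self, embedding, now=None):
        now = time.time() if now is None else now
        # Drop entries older than the TTL before searching.
        self.entries = [e for e in self.entries if now - e[2] < TTL_SECONDS]
        best = max(self.entries, key=lambda e: cosine(embedding, e[0]), default=None)
        if best is not None and cosine(embedding, best[0]) >= SIMILARITY_THRESHOLD:
            return best[1]   # close enough: reuse the stored answer
        return None          # miss: the caller runs the full pipeline


cache = SemanticCache()
cache.put([1.0, 0.0], "answer A", now=0)
hit = cache.get([0.99, 0.05], now=10)    # near-duplicate query embedding
miss = cache.get([0.0, 1.0], now=10)     # unrelated query embedding
```

A production cache would more likely use a vector index than a linear scan; the scan keeps the sketch short.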
What retrieval strategies are available?
Four strategies adapt to query complexity. Simple performs a single-pass semantic search for straightforward lookups. Multi-hop uses iterative query expansion with HyDE (hypothetical document embeddings) for nuanced questions. Iterative progressively refines results across multiple rounds. Sub-question decomposes complex queries into independent sub-parts, retrieves for each, then synthesizes a unified answer. The QueryRouter selects the best strategy automatically.
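The routing step can be pictured as a function from a query to one of the four strategies. The sketch below is a toy heuristic under stated assumptions: the `Strategy` enum, the `route` function, the cue words, and the 20-word cutoff are all hypothetical, and the signals the actual QueryRouter uses are not described here.

```python
from enum import Enum


class Strategy(Enum):
    SIMPLE = "simple"
    MULTI_HOP = "multi_hop"
    ITERATIVE = "iterative"
    SUB_QUESTION = "sub_question"


# Hypothetical cue words, chosen only for illustration.
COMPARISON_CUES = ("compare", "versus", " vs ", "difference between")
REASONING_CUES = ("why", "how does", "explain")


def route(query: str) -> Strategy:
    """Pick a retrieval strategy from surface features of the query."""
    q = f" {query.lower().strip()} "
    if any(cue in q for cue in COMPARISON_CUES) or q.count("?") > 1:
        return Strategy.SUB_QUESTION   # several independent parts
    if any(cue in q for cue in REASONING_CUES):
        return Strategy.MULTI_HOP      # nuanced, needs expansion
    if len(q.split()) > 20:
        return Strategy.ITERATIVE      # long query, refine over rounds
    return Strategy.SIMPLE             # straightforward lookup


simple = route("What is the invoice total?")
comparison = route("Compare the 2023 and 2024 revenue figures")
```

A real router might use a classifier or an LLM call instead of keyword rules. The point is only that selection happens once per query, before retrieval.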
Does Leepi.ai support multi-document queries?
Yes. When a query spans multiple documents, the context builder groups retrieved chunks by document and inserts clear document headers between groups. A per-document token cap (default 70%) prevents any single source from dominating the context window. The LLM prompt includes explicit instructions to attribute values by document name and flag conflicts when sources disagree.
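The grouping and capping behavior can be sketched as follows. This is a simplified illustration, not the actual context builder: `build_context`, the header format, and the whitespace-based `count_tokens` are hypothetical, and the sketch assumes the 70% cap is a fraction of the total context token budget.

```python
PER_DOCUMENT_CAP = 0.70  # default from the answer above; assumed to be a share of the budget


def count_tokens(text):
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def build_context(chunks, token_budget, cap=PER_DOCUMENT_CAP):
    """Group ranked (document_name, text) chunks by document under a per-document cap."""
    per_doc_limit = int(token_budget * cap)
    grouped = {}        # document name -> accepted chunk texts, in first-seen order
    used_per_doc = {}
    used_total = 0
    for doc, text in chunks:
        cost = count_tokens(text)
        if used_per_doc.get(doc, 0) + cost > per_doc_limit:
            continue    # this document has already used its share
        if used_total + cost > token_budget:
            continue    # no room left in the overall budget
        grouped.setdefault(doc, []).append(text)
        used_per_doc[doc] = used_per_doc.get(doc, 0) + cost
        used_total += cost
    # Emit one header per document so the LLM can attribute values by name.
    sections = [f"=== Document: {doc} ===\n" + "\n".join(texts)
                for doc, texts in grouped.items()]
    return "\n\n".join(sections)


chunks = [
    ("a.pdf", "one two three four"),
    ("a.pdf", "five six seven eight"),   # would push a.pdf past 70% of the budget
    ("b.pdf", "nine ten"),
    ("a.pdf", "x y"),
]
context = build_context(chunks, token_budget=10)
```

With a budget of 10 tokens, the second chunk from a.pdf is skipped because it would take that document past its 7-token share, which leaves room for b.pdf. Header lines are not counted against the budget in this sketch.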