Visual Document Search
Search by layout, charts, and figures — not just text. Perfect for diagrams, infographics, and formatted reports.
Key capabilities
Frequently asked questions
What is visual document search?
Visual document search uses the ColPali model to create embeddings from the visual layout and content of each document page, not just the extracted text. This means you can find documents by how they look: charts, diagrams, formatted tables, and complex layouts are all captured. Instead of a single vector, the model produces a multi-vector embedding for each page, a set of 128-dimensional patch vectors, which is stored in a dedicated Qdrant collection alongside your text embeddings.
How fast is visual search?
Query-time visual search takes ~400 ms. It uses Qdrant's native MaxSim: for each query vector, Qdrant finds the most similar patch vector on a page and sums those maxima into the page score. The whole computation runs server-side inside Qdrant's Rust engine and needs only a single query call. Indexing takes ~830 ms per page on a warm Modal serverless T4 GPU. Cold starts take ~48 seconds but only occur after extended idle periods. Visual search results are fused with semantic and BM25 results via 3-way Reciprocal Rank Fusion.
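As a rough illustration of the two scoring steps described above, the sketch below implements MaxSim and Reciprocal Rank Fusion in plain Python. It is not the production code: in the product, MaxSim runs inside Qdrant. The function names, the use of a dot product as the similarity, and the RRF constant `k=60` (a commonly used default) are assumptions made for this example.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product similarity between two vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: List[Vector], page_vecs: List[Vector]) -> float:
    """Late-interaction score: for each query vector, take the best-matching
    page patch vector, then sum those maxima over all query vectors."""
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rrf_scores(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal Rank Fusion: each ranked list contributes 1 / (k + rank)
    for every document it contains (rank starts at 1)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def fused_order(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Document ids sorted by fused score, best first (ties broken by id)."""
    scores = rrf_scores(rankings, k)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical ranked result lists from the three retrievers.
visual = ["a", "b", "c"]
semantic = ["b", "a", "c"]
bm25 = ["b", "c", "a"]
print(fused_order([visual, semantic, bm25]))
```

Because RRF works on ranks rather than raw scores, the visual, semantic, and BM25 retrievers do not need comparable score scales to be combined.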
Do I need to configure visual search?
No. When visual retrieval is enabled (the default), visual indexing runs automatically at upload time, in parallel with text embedding. Each PDF page is rendered at 150 DPI, low-variance patches (mostly whitespace) are filtered out, and the top 100 patches per page are stored. It works out of the box for all PDF uploads.
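The patch filtering step can be pictured with a small sketch. The text above does not say how patches are ranked or what variance threshold is used, so this example assumes that patches are ranked by pixel variance and uses a hypothetical `min_variance` cutoff; the function name and the flat-list patch representation are also illustrative only.

```python
from statistics import pvariance
from typing import List, Sequence


def select_patches(
    patches: List[Sequence[float]],
    min_variance: float = 1e-3,  # assumed cutoff, not taken from the product
    top_k: int = 100,            # "top 100 patches per page"
) -> List[int]:
    """Return indices of patches to keep.

    Patches whose pixel variance is below `min_variance` (for example,
    blank whitespace) are dropped. The rest are ranked by variance,
    highest first, and at most `top_k` are kept. Ranking by variance
    is an assumption of this sketch.
    """
    scored = [(i, pvariance(p)) for i, p in enumerate(patches)]
    kept = [(i, v) for i, v in scored if v >= min_variance]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [i for i, _ in kept[:top_k]]


# Three toy patches of grayscale pixel intensities in [0, 1].
blank = [1.0, 1.0, 1.0, 1.0]    # pure whitespace, zero variance
chart = [0.0, 1.0, 0.0, 1.0]    # high-contrast content
faint = [0.4, 0.6, 0.4, 0.6]    # low-contrast content
print(select_patches([blank, chart, faint]))
```

Dropping near-uniform patches before storage keeps the per-page vector count bounded, which in turn limits how many comparisons are needed per page at query time.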