Question 1

What is visual document search?

Accepted Answer

Visual document search uses the ColPali model to create embeddings from the visual layout and content of each document page, rather than only from the extracted text. This means you can find documents by how they look: charts, diagrams, formatted tables, and complex layouts are all captured. For each page, the model produces a multi-vector embedding (one 128-dimensional vector per image patch), which is stored in a dedicated Qdrant collection alongside your text embeddings.
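
As a rough illustration of how a multi-vector page embedding is scored, here is a minimal pure-Python sketch of MaxSim (late interaction). It assumes the vectors are already L2-normalized, so a dot product equals cosine similarity, and it uses 3-dimensional toy vectors in place of ColPali's 128. The function names `maxsim` and `rank_pages` are hypothetical; in the product this scoring happens inside Qdrant, not in application code.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]) -> float:
    """Late-interaction score of one page for one query.

    For each query token vector, take its best (maximum) similarity over all
    of the page's patch vectors, then sum those maxima.
    """
    if not page_vectors:
        return 0.0
    return sum(max(dot(q, p) for p in page_vectors) for q in query_vectors)


def rank_pages(query_vectors: Sequence[Vector],
               pages: Dict[str, Sequence[Vector]]) -> List[str]:
    """Return page IDs ordered from best to worst MaxSim score."""
    scores = {page_id: maxsim(query_vectors, vecs) for page_id, vecs in pages.items()}
    return sorted(scores, key=lambda page_id: scores[page_id], reverse=True)


# Toy example: 3-dimensional vectors stand in for ColPali's 128 dimensions.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pages = {
    "a": [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
    "b": [[0.6, 0.8, 0.0], [0.0, 1.0, 0.0]],
}
print(rank_pages(query, pages))  # ['b', 'a']
```

Page "b" scores 1.6 against page "a"'s 1.0 because both query vectors find a strong match among its patches, while in page "a" only the first query vector does.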

Question 2

How fast is visual search?

Accepted Answer

Query-time visual search takes about 400 ms using Qdrant's native MaxSim support, which scores pages entirely server-side, inside Qdrant's Rust engine, in a single query call: for each query token vector it finds the highest similarity among a page's patch vectors, then sums those maxima. Indexing takes about 830 ms per page on a warm Modal serverless T4 GPU. Cold starts take about 48 seconds but occur only after extended idle periods. Visual search results are fused with semantic and BM25 results via 3-way Reciprocal Rank Fusion (RRF).
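
The 3-way fusion step can be sketched in a few lines of Python. This is a minimal illustration of standard Reciprocal Rank Fusion, assuming each retriever returns an ordered list of document IDs and that all three lists are weighted equally. The smoothing constant k = 60 is a common default, not a value confirmed for this product, and `reciprocal_rank_fusion` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Every document earns 1 / (k + rank) from each list it appears in
    (ranks start at 1), and documents are sorted by their summed score.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


semantic = ["doc1", "doc2", "doc3"]
bm25 = ["doc2", "doc1", "doc4"]
visual = ["doc3", "doc2", "doc5"]
print(reciprocal_rank_fusion([semantic, bm25, visual]))
# ['doc2', 'doc1', 'doc3', 'doc4', 'doc5']
```

"doc2" wins because it sits near the top of all three lists, even though it is first in only one of them. Because RRF uses only rank positions, it avoids having to calibrate raw cosine, BM25, and MaxSim scores against one another.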

Question 3

Do I need to configure visual search?

Accepted Answer

No. When visual retrieval is enabled (the default), visual indexing runs automatically at upload time as a parallel task alongside text embedding. PDF pages are rendered at 150 DPI, low-variance patches (mostly whitespace) are filtered out, and the top 100 remaining patches per page are stored. No user-facing configuration is needed; it works out of the box for all PDF uploads.
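
The patch-filtering step can be pictured with the following minimal Python sketch. It treats each patch as a flat list of grayscale pixel intensities and assumes, since the text does not say, that "top 100" means the 100 patches with the highest pixel variance. The `min_variance` threshold and the `select_patches` name are hypothetical. The returned indices would be used to pick the matching patch vectors for storage.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple


def select_patches(patches: Sequence[Sequence[float]],
                   min_variance: float = 1e-3,  # hypothetical cutoff for "blank"
                   max_patches: int = 100) -> List[int]:
    """Return indices of the patches to keep for one rendered page.

    Patches whose pixel-intensity variance falls below ``min_variance``
    (e.g. empty whitespace) are dropped. Of the remainder, the
    ``max_patches`` highest-variance patches are kept, and their indices
    are returned in original page order.
    """
    scored: List[Tuple[float, int]] = [
        (pvariance(pixels), idx) for idx, pixels in enumerate(patches)
    ]
    kept = [(var, idx) for var, idx in scored if var >= min_variance]
    kept.sort(key=lambda item: item[0], reverse=True)
    return sorted(idx for _, idx in kept[:max_patches])


# Four tiny 4-pixel patches: two uniform (whitespace-like), two textured.
page = [
    [1.0, 1.0, 1.0, 1.0],  # blank: variance 0.0
    [0.0, 1.0, 0.0, 1.0],  # high contrast: variance 0.25
    [0.5, 0.5, 0.5, 0.5],  # flat gray: variance 0.0
    [0.0, 0.2, 0.0, 0.2],  # faint detail: variance about 0.01
]
print(select_patches(page))                 # [1, 3]
print(select_patches(page, max_patches=1))  # [1]
```

MaxSim is order-independent (each query vector simply takes its best match among the stored vectors), so the order in which the kept vectors are stored does not affect retrieval scores.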

Visual Document Search

Key capabilities

Frequently asked questions

What is visual document search?

How fast is visual search?

Do I need to configure visual search?

Try it yourself