Multimodal Processing
Drop in any file and start asking questions. Leepi handles the rest.
Key capabilities
Frequently asked questions
What file types does Leepi.ai support?
Leepi.ai processes a wide range of formats: PDF (including scanned documents via OCR), Office documents (DOCX, XLSX, PPTX), images (PNG, JPEG, GIF, WEBP), audio (MP3, WAV, M4A, OGG, FLAC up to 25 MB), and video files. Every format is converted into searchable, queryable text with full citation support.
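As a rough illustration of how an upload could be routed by type, here is a minimal sketch. The mapping, the `classify_upload` function, and the reading of "25 MB" as a limit on every audio file (taken as 25 MiB) are assumptions made for this example, not Leepi.ai's actual code. Video is left out because the specific video formats are not listed above.

```python
from pathlib import Path

# Hypothetical extension-to-modality map built from the formats listed above.
MODALITY_BY_EXTENSION = {
    ".pdf": "pdf",
    ".docx": "office", ".xlsx": "office", ".pptx": "office",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".gif": "image", ".webp": "image",
    ".mp3": "audio", ".wav": "audio", ".m4a": "audio",
    ".ogg": "audio", ".flac": "audio",
}

# Assumption: the 25 MB limit applies to all audio and means 25 MiB.
MAX_AUDIO_BYTES = 25 * 1024 * 1024


def classify_upload(filename: str, size_bytes: int) -> str:
    """Return the modality for a file, or raise ValueError if it is rejected."""
    ext = Path(filename).suffix.lower()
    modality = MODALITY_BY_EXTENSION.get(ext)
    if modality is None:
        raise ValueError(f"unsupported file type: {ext or filename}")
    if modality == "audio" and size_bytes > MAX_AUDIO_BYTES:
        raise ValueError("audio file exceeds the 25 MB limit")
    return modality
```

For example, `classify_upload("Report.PDF", 120_000)` returns `"pdf"`, while a 30 MiB `.wav` file is rejected.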
How does PDF extraction work?
PDFs pass through a five-tier extraction chain. DocLing with Tesseract OCR is the primary engine, running parallel, batched layout analysis, OCR, and table detection. If DocLing fails, the chain falls back to PyMuPDF, then pdfplumber, then LlamaParse, and finally VisionOCR, which renders each page as an image and sends it to GPT-4o for text extraction. Large documents are processed in 30-page batches to keep peak memory under 60 MB.
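The fallback order and the 30-page batching described above can be sketched as follows. This is an illustrative outline only: `extract_with_fallback`, `page_batches`, and the stub extractors are hypothetical names, and treating empty output as a failure is an assumption. Real tiers would wrap DocLing, PyMuPDF, pdfplumber, LlamaParse, and a vision model.

```python
from typing import Callable, Iterator, Sequence

Extractor = Callable[[str], str]  # takes a file path, returns extracted text


def page_batches(total_pages: int, batch_size: int = 30) -> Iterator[range]:
    """Yield 0-based page ranges of at most batch_size pages each."""
    for start in range(0, total_pages, batch_size):
        yield range(start, min(start + batch_size, total_pages))


def extract_with_fallback(
    path: str, tiers: Sequence[tuple[str, Extractor]]
) -> tuple[str, str]:
    """Try each tier in order; return (tier_name, text) from the first success."""
    errors = []
    for name, extractor in tiers:
        try:
            text = extractor(path)
        except Exception as exc:  # any tier failure moves on to the next tier
            errors.append(f"{name}: {exc}")
            continue
        if text.strip():
            return name, text
        # Assumption: empty output counts as a failed tier.
        errors.append(f"{name}: empty output")
    raise RuntimeError("all extraction tiers failed: " + "; ".join(errors))


# Stub extractors standing in for the real engines.
def failing_primary(path: str) -> str:
    raise RuntimeError("layout model unavailable")


def working_secondary(path: str) -> str:
    return "page text"


DEMO_TIERS = [("docling", failing_primary), ("pymupdf", working_secondary)]
```

With the stubs above, `extract_with_fallback("report.pdf", DEMO_TIERS)` returns the text from the second tier, and `page_batches(65)` yields three ranges covering pages 0-29, 30-59, and 60-64.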
How are tables preserved during processing?
Tables are extracted into a row-linearized key-value format where each row becomes a set of "Header: Value" pairs. The chunking engine detects table boundaries via TABLE_START and TABLE_END markers and splits only at row boundaries, never mid-row. Table captions are prepended to each resulting chunk, and table chunks are tagged with a dedicated type for accurate retrieval. This approach works across DocLing, PyMuPDF, and Office extraction paths.
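To make the row-linearized format concrete, here is a minimal sketch of the idea. The function names, the `" | "` separator between pairs, the character budget, and the chunk dictionary shape are all assumptions for illustration. Detecting the TABLE_START and TABLE_END markers is omitted, so the functions take rows that have already been pulled out of a table.

```python
def linearize_rows(headers: list[str], rows: list[list[str]]) -> list[str]:
    """Turn each row into a single line of "Header: Value" pairs."""
    return [
        " | ".join(f"{header}: {value}" for header, value in zip(headers, row))
        for row in rows
    ]


def chunk_table(caption: str, row_lines: list[str], max_chars: int) -> list[dict]:
    """Group whole rows into chunks; a row is never split across chunks."""
    groups: list[list[str]] = []
    current: list[str] = []
    size = 0
    for line in row_lines:
        # Close the current chunk if this row would push it past the budget.
        if current and size + len(line) + 1 > max_chars:
            groups.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line) + 1  # +1 for the newline joining rows
    if current:
        groups.append(current)
    # Prepend the caption to every chunk and tag it as a table chunk.
    # The caption is not counted against max_chars in this sketch.
    return [
        {"type": "table", "text": "\n".join([caption, *group])}
        for group in groups
    ]


headers = ["Region", "Revenue"]
rows = [["EMEA", "1.2M"], ["APAC", "0.9M"], ["AMER", "2.1M"]]
chunks = chunk_table("Table 3: Revenue by region", linearize_rows(headers, rows), 60)
```

Here the three rows produce two chunks: the first holds the EMEA and APAC rows, the second holds the AMER row, and both begin with the caption so each chunk remains interpretable on its own at retrieval time.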