Question 1

How are entities extracted?

Accepted Answer

Leepi.ai uses a universal NER model covering 200 entity types across 18 domains, including medical, legal, financial, tech, science, and geography. The model is a single custom-trained spaCy pipeline with a hybrid architecture: an EntityRuler with 1,900+ gazetteer patterns and 8 regex rules runs first, followed by a trained NER component built on en_core_web_lg with 560K word vectors. The combined pipeline takes about 70 MB, compared with the 430 MB needed by the three separate models it replaced.
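
To illustrate the rule-first ordering described above, here is a minimal standard-library sketch. It is not spaCy's API and not Leepi.ai's actual code: the gazetteer entries, the regex rule, the labels, and the stubbed model output are all hypothetical. In spaCy itself, the closest equivalent is adding an `entity_ruler` pipe before `ner`, so that the statistical component works around the spans the rules have already set.

```python
import re

# Hypothetical gazetteer and regex rule; the real pattern sets are not shown in the answer.
GAZETTEER = {"ibuprofen": "DRUG", "pfizer": "ORG"}
REGEX_RULES = [(re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "DATE")]


def rule_entities(text):
    """Return sorted (start, end, label) spans found by gazetteer and regex rules."""
    spans = []
    for term, label in GAZETTEER.items():
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    for pattern, label in REGEX_RULES:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


def merge(rule_spans, model_spans):
    """Keep every rule span; add a model span only if it overlaps no rule span."""
    merged = list(rule_spans)
    for start, end, label in model_spans:
        if all(end <= s or start >= e for s, e, _ in rule_spans):
            merged.append((start, end, label))
    return sorted(merged)


text = "Pfizer recalled ibuprofen on 2024-03-01 in Ohio."
rules = rule_entities(text)

# Stand-in for the trained NER output: one span that conflicts with a rule, one new span.
ohio = text.index("Ohio")
model = [(0, 6, "PERSON"), (ohio, ohio + len("Ohio"), "GPE")]

entities = [(text[s:e], label) for s, e, label in merge(rules, model)]
print(entities)
```

In this sketch the conflicting "PERSON" guess for "Pfizer" is dropped because the gazetteer already claimed that span, while "Ohio" is kept because no rule covered it.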

Question 2

What is entity resolution?

Accepted Answer

Entity resolution merges different mentions of the same real-world entity. Leepi.ai uses a 3-tier chain: exact string match first, then fuzzy matching via RapidFuzz (Levenshtein distance) for near-matches like typos, and finally embedding similarity using OpenAI embeddings for semantically equivalent names. This ensures "IBM", "International Business Machines", and "I.B.M." all resolve to the same entity node in the graph.
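
The three-tier chain can be sketched as follows. This is an illustration under stated assumptions, not the production code: `difflib.SequenceMatcher` from the standard library stands in for RapidFuzz, a small table of toy vectors stands in for OpenAI embeddings, and both thresholds are made-up values.

```python
import math
from difflib import SequenceMatcher

# Toy 2-D vectors standing in for real embeddings (hypothetical values).
TOY_VECTORS = {
    "IBM": (1.0, 0.0),
    "International Business Machines": (0.9, 0.1),
    "Microsoft": (0.0, 1.0),
}


def toy_embed(name):
    """Placeholder for an embedding call; unknown names get a zero vector."""
    return TOY_VECTORS.get(name, (0.0, 0.0))


def cosine(u, v):
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def resolve(mention, canonical, embed=toy_embed, fuzzy_min=0.85, embed_min=0.8):
    """Return (canonical_name, tier) or (None, "new") if no tier matches."""
    # Tier 1: exact string match.
    if mention in canonical:
        return mention, "exact"
    # Tier 2: fuzzy string similarity (stand-in for RapidFuzz).
    def fuzzy(name):
        return SequenceMatcher(None, mention.lower(), name.lower()).ratio()
    best = max(canonical, key=fuzzy)
    if fuzzy(best) >= fuzzy_min:
        return best, "fuzzy"
    # Tier 3: embedding similarity for semantically equivalent names.
    vec = embed(mention)
    best = max(canonical, key=lambda name: cosine(vec, embed(name)))
    if cosine(vec, embed(best)) >= embed_min:
        return best, "embedding"
    return None, "new"


canonical = ["IBM", "Microsoft"]
for mention in ["IBM", "Microsft", "International Business Machines", "Zebra Corp"]:
    print(mention, "->", resolve(mention, canonical))
```

Ordering the tiers from cheapest to most expensive means the embedding lookup is only reached by mentions that the string-based tiers could not place.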

Question 3

How does Graph-RAG work?

Accepted Answer

When Graph-RAG is enabled, entity relationships in the knowledge graph expand search queries to find related context that keyword search alone would miss. For example, if a document mentions a drug name, the graph can surface related entities like its manufacturer, known side effects, or regulatory approvals from other documents. Co-occurrence edges are weighted using PMI (pointwise mutual information) to rank relationship strength.
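
As a rough sketch of the PMI weighting, the snippet below scores co-occurrence edges from per-document entity sets using PMI(a, b) = log(p(a, b) / (p(a) p(b))), with probabilities estimated as document frequencies. The toy documents, the function names, and the rule of expanding only along positive-PMI edges are assumptions made for illustration; the answer does not specify how the weights are used during expansion.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_edges(docs):
    """Map each co-occurring entity pair to its PMI over the document collection."""
    n = len(docs)
    single = Counter()
    pair = Counter()
    for entities in docs:
        uniq = sorted(set(entities))
        single.update(uniq)
        pair.update(combinations(uniq, 2))  # pairs are stored in sorted order
    return {
        (a, b): math.log((count / n) / ((single[a] / n) * (single[b] / n)))
        for (a, b), count in pair.items()
    }


def expand(entity, edges, top_k=2):
    """Return up to top_k neighbors of entity, strongest positive PMI first."""
    neighbors = []
    for (a, b), weight in edges.items():
        if weight <= 0:
            continue  # assumption: only expand along positively associated edges
        if entity == a:
            neighbors.append((weight, b))
        elif entity == b:
            neighbors.append((weight, a))
    neighbors.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in neighbors[:top_k]]


# Hypothetical entity sets extracted from four documents.
docs = [
    {"aspirin", "bayer"},
    {"aspirin", "bayer", "nausea"},
    {"fda", "nausea"},
    {"fda", "aspirin"},
]
edges = pmi_edges(docs)
print(expand("aspirin", edges))
```

Here "aspirin" and "bayer" co-occur more often than their individual frequencies would predict, so that edge gets a positive weight and "bayer" is added to the expanded query; pairs that co-occur at or below chance level are not.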

