
AI News Hub

RAG Series (16): Graph RAG — Using Knowledge Graphs to Solve Multi-Hop Reasoning

DEV Community
WonderLab

## The Relationship Blindspot in Vector Retrieval

Every optimization in this series so far — better chunking, reranking, query rewriting, CRAG — works within a fundamental assumption: retrieval is about finding similar text. But a whole category of questions doesn't fit that assumption: questions that require reasoning across multiple entities.

Consider: "bge-large-zh-v1.5 and bge-reranker-v2-m3 both come from the same organization. What is that organization, and what role does each model play in a RAG pipeline?"

Vector search will find documents mentioning BAAI or BGE — that part works. But retrieval only returns text chunks. The LLM then has to figure out "these two models are both from BAAI" by reading disconnected paragraphs. The relationship isn't in the retrieval result; it's implicit in the text.

Or consider: "Trace the evolution of retrieval quality evaluation from basic RAG to CRAG." This requires placing RAG → Rerank → Self-RAG → CRAG in order along a progression chain. Vector search returns the top-4 most similar documents — not necessarily the nodes on that chain in the right sequence.

Graph RAG's insight: instead of hoping that semantic similarity happens to capture the right relationships, extract the entities and relationships explicitly, build a knowledge graph, and traverse it at query time. Finding "what connects X to Y" becomes a graph traversal problem, not a similarity problem.

The atomic unit of a knowledge graph is the triple: (head entity, relation, tail entity). For example:

```
BAAI --[developed]--> bge-large-zh-v1.5
BAAI --[developed]--> bge-reranker-v2-m3
bge-large-zh-v1.5 --[used for]--> vector retrieval
bge-reranker-v2-m3 --[used for]--> reranking
```

Assembled into a directed graph (NetworkX DiGraph), the question "what models did BAAI develop?" becomes: find the BAAI node, list all outgoing edges. No semantic similarity needed — pure structure.
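The triple-to-graph idea above can be sketched in a few lines of NetworkX. This is a toy illustration using the four example triples; the node and relation names mirror the example, not the full extracted graph:

```python
import networkx as nx

# The four example triples: (head entity, relation, tail entity)
triples = [
    ("BAAI", "developed", "bge-large-zh-v1.5"),
    ("BAAI", "developed", "bge-reranker-v2-m3"),
    ("bge-large-zh-v1.5", "used for", "vector retrieval"),
    ("bge-reranker-v2-m3", "used for", "reranking"),
]

# Assemble them into a directed graph; the relation lives on the edge
KG = nx.DiGraph()
for head, rel, tail in triples:
    KG.add_edge(head, tail, relation=rel)

# "What models did BAAI develop?" = list BAAI's outgoing edges
for _, tail, data in KG.out_edges("BAAI", data=True):
    print(f"BAAI --[{data['relation']}]--> {tail}")
```

Answering the question is a structural lookup on the `BAAI` node, with no embedding or similarity computation involved.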
## Architecture

Build phase (offline):

```
documents → LLM triple extraction → NetworkX directed graph
documents → embedding → ChromaDB vector index (baseline)
```

Query phase (online):

```
question
  ↓
LLM extracts question entities (e.g., BAAI, bge-large-zh-v1.5)
  ↓
Fuzzy-match entities against graph nodes (substring match)
  ↓
BFS 2-hop traversal: seed_nodes → neighbors → neighbors' neighbors
  ↓
Assemble triple context (graph traversal result + top-2 vector docs as fallback)
  ↓
LLM generates answer
```

A critical design decision: Graph RAG here is not pure graph retrieval. It's a hybrid of graph traversal context + vector retrieval. Pure graph traversal has a hard "entity boundary" — if a relevant entity wasn't extracted into the graph, it's unreachable. The top-2 vector documents serve as a safety net for those gaps.

## Triple Extraction: When LLMGraphTransformer Fails

LangChain ships LLMGraphTransformer, purpose-built for extracting graph structure from text. I started there:

```python
from langchain_experimental.graph_transformers import LLMGraphTransformer

graph_transformer = LLMGraphTransformer(llm=llm)
graph_docs = graph_transformer.convert_to_graph_documents([doc])
```

All 12 documents failed:

```
Invalid JSON: invalid number at line 1 column 2
[type=json_invalid, input_value='- Node: RAG (Retrieval-A...']
```

LLMGraphTransformer requires the LLM to return structured JSON. GLM-4-flash was returning text-formatted lists (`- Node: xxx`). Pydantic validation failed every time.

The fix: stop requiring JSON. Use a pipe-delimited format instead — `Entity A | relation | Entity B`, one triple per line, parsed with `split("|")`. No JSON parsing, no Pydantic validation, high fault tolerance:

```python
import networkx as nx
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

TRIPLE_EXTRACT_PROMPT = ChatPromptTemplate.from_messages([
    ("system",
     "Extract entities and relations from the text below. Output a list of triples.\n"
     "Required format: one triple per line, strictly: Entity A | relation | Entity B\n"
     "Rules:\n"
     "- Entities: noun phrases, no brackets or quotes\n"
     "- Relations: verb phrases, e.g., uses, contains, proposed by, applies to, outperforms\n"
     "- Output ONLY triples, no numbers, no explanations, nothing else\n"
     "- 8-15 triples per document\n\n"
     "Example output:\n"
     "RAG | uses | vector retrieval\n"
     "RAGAS | proposed by | Es et al.\n"
     "Chroma | suitable for | local development"),
    ("human", "Text:\n{text}"),
])

triple_chain = TRIPLE_EXTRACT_PROMPT | llm | StrOutputParser()  # llm configured elsewhere

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    raw = triple_chain.invoke({"text": text})
    triples = []
    for line in raw.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

KG = nx.DiGraph()
for doc in DOCUMENTS:
    triples = extract_triples(doc.page_content)
    for head, rel, tail in triples:
        KG.add_node(head, source=doc.metadata["source"])
        KG.add_node(tail, source=doc.metadata["source"])
        KG.add_edge(head, tail, relation=rel)
```

12 documents → 176 nodes, 139 edges.
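The fault tolerance of the pipe-delimited format is easy to check in isolation. A minimal sketch of just the parsing step — the sample "LLM output" below is fabricated for illustration, whereas the real `extract_triples` gets its input from the LLM chain:

```python
def parse_triples(raw: str) -> list[tuple[str, str, str]]:
    """Parse 'Entity A | relation | Entity B' lines; silently skip anything malformed."""
    triples = []
    for line in raw.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # exactly 3 non-empty fields
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# A messy, partially malformed model output: a chatty preamble and a 2-field line
raw = """RAG | uses | vector retrieval
Here are the triples you asked for:
BAAI | developed | bge-large-zh-v1.5
bge-reranker-v2-m3 | reranking
Chroma | suitable for | local development"""

print(parse_triples(raw))
# → [('RAG', 'uses', 'vector retrieval'),
#    ('BAAI', 'developed', 'bge-large-zh-v1.5'),
#    ('Chroma', 'suitable for', 'local development')]
```

Bad lines simply fail the `len(parts) == 3` check and are dropped, which is exactly the behavior that made this format survive GLM-4-flash's inconsistent output where strict JSON parsing could not.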
## Graph Retrieval: Entity Matching + BFS Traversal

```python
def graph_retrieve(question: str, graph: nx.DiGraph, hops: int = 2):
    # Step 1: Extract entities from question
    entities = extract_entities(question)  # LLM, one entity per line

    # Step 2: Fuzzy-match against graph nodes (substring matching)
    seed_nodes = []
    for entity in entities:
        entity_lower = entity.lower()
        for node in graph.nodes:
            if entity_lower in node.lower() or node.lower() in entity_lower:
                seed_nodes.append(node)
    if not seed_nodes:
        return []  # No match — fall back to pure vector retrieval

    # Step 3: BFS k-hop traversal
    visited = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            neighbors = set(graph.successors(node)) | set(graph.predecessors(node))
            next_frontier |= neighbors - visited
        visited |= next_frontier
        frontier = next_frontier

    # Step 4: Format triples as context
    triples = [
        f"{u} --[{data['relation']}]--> {v}"
        for u, v, data in graph.edges(data=True)
        if u in visited or v in visited
    ]
    return [Document(page_content=
        f"[Graph entities]: {', '.join(list(visited)[:20])}\n\n"
        f"[Graph relationships]:\n" + "\n".join(triples[:40])
    )]
```

BFS at 2 hops: from seed entities, collect direct neighbors (1 hop) and their neighbors (2 hops). For "what models did BAAI develop?", 1 hop is enough. For "where do bge-large-zh-v1.5 and bge-reranker-v2-m3 come from, and what are they each used for?", 2 hops connects BAAI → both models → their respective uses.

## Evaluation Setup

8 questions, two categories:

| Type | Count | Example |
| --- | --- | --- |
| Single-hop factual | 2 | "What are the four RAGAS metrics?" |
| Multi-hop relational | 6 | "Which org do bge-large-zh-v1.5 and bge-reranker-v2-m3 come from, and what does each do?" |

The multi-hop questions are Graph RAG's home turf — and vector search's weak spot.
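The 2-hop claim above can be verified on the toy graph before looking at the numbers: from seed `BAAI`, one hop reaches both models, and the second hop reaches their uses. A minimal, self-contained sketch of just the Step 3 expansion, following edges in both directions as `graph_retrieve` does:

```python
import networkx as nx

# Toy graph from the running example
G = nx.DiGraph()
for head, rel, tail in [
    ("BAAI", "developed", "bge-large-zh-v1.5"),
    ("BAAI", "developed", "bge-reranker-v2-m3"),
    ("bge-large-zh-v1.5", "used for", "vector retrieval"),
    ("bge-reranker-v2-m3", "used for", "reranking"),
]:
    G.add_edge(head, tail, relation=rel)

def k_hop(graph: nx.DiGraph, seeds: set[str], hops: int) -> set[str]:
    """Expand seeds level by level, treating edges as bidirectional for reachability."""
    visited, frontier = set(seeds), set(seeds)
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            neighbors = set(graph.successors(node)) | set(graph.predecessors(node))
            next_frontier |= neighbors - visited
        visited |= next_frontier
        frontier = next_frontier
    return visited

print(k_hop(G, {"BAAI"}, 1))  # BAAI + both models
print(k_hop(G, {"BAAI"}, 2))  # ...plus "vector retrieval" and "reranking"
```

One hop answers "what models did BAAI develop?"; the second hop is what pulls in each model's use, which is exactly the multi-hop question vector search struggles with.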
## RAGAS Metrics Comparison (Vector RAG vs Graph RAG)

| Metric | Vector RAG | Graph RAG | Delta |
| --- | --- | --- | --- |
| context_recall | 0.812 | 0.750 | ↓ -0.062 |
| context_precision | 0.729 | 0.948 | ↑ +0.219 ◀ |
| faithfulness | 0.865 | 0.883 | ↑ +0.018 |
| answer_relevancy | 0.536 | 0.465 | ↓ -0.071 |

### Why context_precision jumped +0.219

context_precision +0.219 — one of the largest precision gaps between vector and non-vector approaches in this series. context_precision measures how many of the documents sent to the LLM are actually relevant.

Vector search returns the top-4 semantically similar chunks — similar doesn't mean precise. Ask "what two models did BAAI build?", and vector search pulls back every paragraph mentioning BAAI or BGE: model dimensions, benchmark rankings, usage scenarios — lots of content the question didn't ask for.

Graph traversal starts at the BAAI node and walks directly to its connected subgraph:

```
BAAI --[developed]--> bge-large-zh-v1.5
BAAI --[developed]--> bge-reranker-v2-m3
bge-large-zh-v1.5 --[used for]--> vector embedding
bge-reranker-v2-m3 --[used for]--> reranking
```

Four triples, all directly relevant. No noise. context_precision naturally approaches 1.0 for these questions.

### Why context_recall dropped

Graph traversal has a hard entity boundary: it can only expand from entities that were extracted into the graph. If a relevant fact exists in the text but its corresponding entity wasn't captured during triple extraction, that information is permanently unreachable by graph traversal.

Vector search has no such boundary. Semantic similarity doesn't require prior structuring — it can surface relevant content even when the exact entity name doesn't appear in the query. That's why vector search has higher recall: broader reach, even if less precise.

This is the classic precision-recall tradeoff.
Graph RAG deliberately takes the precision side: return less, but make what's returned count.

### Why answer_relevancy dipped

Graph traversal context is a structured triple list:

```
bge-large-zh-v1.5 --[developed by]--> BAAI
```

Vector retrieval returns full semantic paragraphs with natural language flow. When the LLM synthesizes an answer from triples, the output is technically correct but slightly less fluent than a summary of a natural paragraph — which slightly depresses the answer_relevancy score.

This is an inherent limitation of the triple format as context. It can be partially addressed with a prompt instruction: "First convert these triples into a coherent paragraph, then answer the question."

## When to Use Graph RAG

Strong fit:

- Complex relationship networks in documents: technical docs, knowledge bases, product manuals where entities have many explicit connections
- Questions that cross entity boundaries: "what connects X and Y", "what family does Z belong to", "trace the evolution of A"
- Answers distributed across documents: knowledge you need to assemble from multiple sources

Caveats:

- Triple extraction cost: every document requires an LLM call to extract triples. The build phase is significantly more expensive than a vector-only index
- Extraction quality is the ceiling: if the LLM misses a key entity during triple extraction, graph traversal can never recover it
- Limited gain for single-hop QA: if most questions are "what is X?", vector search's recall advantage outweighs the graph's precision advantage
- Answer fluency: triple-format context produces slightly stiffer answers; prompt engineering can compensate

## Code

Complete code is open-sourced at: https://github.com/chendongqi/llm-in-action/tree/main/16-graph-rag

Key file: `graph_rag.py` — full implementation: graph build, BFS retrieval, RAGAS evaluation

How to run:

```shell
git clone https://github.com/chendongqi/llm-in-action
cd 16-graph-rag
cp .env.example .env
pip install -r requirements.txt
python graph_rag.py
```

This article implemented Graph RAG from scratch.
Key findings:

- context_precision +0.219 is a direct result of graph traversal's "retrieve by path, not by similarity" approach — only edges directly connected to the question's entities are returned
- context_recall -0.062 is the cost of the entity boundary — what wasn't extracted into the graph is invisible to traversal, while vector search has no such hard limit
- The most important implementation decision: abandon LLMGraphTransformer and switch to a custom `Entity | relation | Entity` format — it solves GLM-4-flash's unstable JSON output with a simpler, more robust parsing strategy
- The hybrid approach is the right call: graph traversal + vector fallback. Pure graph traversal leaves gaps; pure vector search misses relationships. The combination covers both.

A pattern that's been accumulating across this series: there is no universally dominant RAG strategy. Every approach gains on one dimension while conceding on another. Vector search has strong recall but weak precision for relational questions. Reranking improves precision but adds latency. Graph RAG achieves high precision but sacrifices recall coverage. Which one to deploy depends entirely on what your knowledge base looks like and what questions your users actually ask.

## References

- Graph RAG Paper: From Local to Global: A Graph RAG Approach to Query-Focused Summarization
- Microsoft GraphRAG Open Source Project
- NetworkX Documentation
- LangChain LLMGraphTransformer