DW-Bench: Benchmarking LLMs on Data Warehouse Graph Topology Reasoning

arXiv

Ahmed G. A. H Ahmed, C. Okan Sakar

Apr 22, 2026, 12:00 AM

arXiv:2604.18964v1

Abstract: This paper introduces DW-Bench, a new benchmark that evaluates large language models (LLMs) on graph-topology reasoning over data warehouse schemas, explicitly integrating both foreign-key (FK) and data-lineage edges. The benchmark comprises 1,046 automatically generated, verifiably correct questions across five schemas. Experiments show that tool-augmented methods substantially outperform static approaches but plateau on hard compositional subtypes.
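To make "graph-topology reasoning over FK and data-lineage edges" concrete, the following is a minimal sketch of how a schema might be treated as a typed graph and how a path question could be answered and checked programmatically. The table names, the edge list, and the `shortest_path` helper are all hypothetical illustrations; they are not taken from DW-Bench, and the benchmark's actual question format and generator may differ.

```python
from collections import deque

# Hypothetical toy warehouse schema (NOT from DW-Bench).
# Each edge is (source, target, kind), where kind is "fk" or "lineage".
EDGES = [
    ("fact_sales", "dim_customer", "fk"),
    ("fact_sales", "dim_product", "fk"),
    ("raw_orders", "stg_orders", "lineage"),
    ("stg_orders", "fact_sales", "lineage"),
    ("fact_sales", "rpt_revenue", "lineage"),
]


def build_adjacency(edges, kinds):
    """Return a directed adjacency map restricted to the given edge kinds."""
    adj = {}
    for src, dst, kind in edges:
        if kind in kinds:
            adj.setdefault(src, []).append(dst)
    return adj


def shortest_path(edges, start, goal, kinds=("fk", "lineage")):
    """Breadth-first search for a shortest directed path, or None if unreachable."""
    adj = build_adjacency(edges, set(kinds))
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# A compositional question: can raw_orders reach dim_customer when both
# lineage and FK edges may be followed, and what happens with lineage only?
mixed = shortest_path(EDGES, "raw_orders", "dim_customer")
lineage_only = shortest_path(EDGES, "raw_orders", "dim_customer", kinds=("lineage",))
print(mixed)
print(lineage_only)
```

Because the answer is computed directly from the graph, a question built this way has a ground truth that can be verified mechanically, which is one plausible reading of "automatically generated, verifiably correct" in the abstract. The contrast between the mixed and lineage-only queries shows why combining the two edge types changes what is reachable.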