AI News Hub Logo

AI News Hub

HalalBench: A Multilingual OCR Benchmark for Food Packaging Ingredient Extraction

cs.CV updates on arXiv.org
Hasan Arief

arXiv:2604.22754v1 Announce Type: new Abstract: No standardized benchmark exists for evaluating OCR on food packaging, despite its critical role in automated halal food verification. Existing benchmarks target documents or scene text, missing the unique challenges of ingredient labels: curved surfaces, dense multilingual text, and sub-8pt fonts. We present HalalBench, the first open multilingual benchmark for food packaging OCR, comprising 1,043 images (50 real, 993 synthetic) with 36,438 annotations in COCO format spanning 14 languages. We evaluate four engines: docTR achieves F1=0.193, ML Kit 0.180, EasyOCR 0.167, while all fail on Japanese (F1=0.000). A clustering ablation shows 36% F1 improvement from our post-processing algorithm. We validate findings through HalalLens (https://halallens.no), a production halal scanner serving 20+ countries. Dataset and code are released under open licenses.