
Beyond the Basics: Leveraging Large Language Model for Fine-Grained Medical Entity Recognition

arXiv
Nwe Ni Win (Western Sydney University, Sydney, Australia), Jim Basilakis (Western Sydney University, Sydney, Australia, South Western Emergency Research Institute, Sydney, Australia), Steven Thomas (South Western Emergency Research Institute, Sydney, Australia), Seyhan Yazar (Garvan Institute of Medical Research, Sydney, Australia, University of New South Wales, Sydney, Australia), Laura Pierce (University of New South Wales, Sydney, Australia), Stephanie Liu (Liverpool Hospital, Sydney, Australia), Paul M. Middleton (South Western Emergency Research Institute, Sydney, Australia), Nasser Ghadiri (South Western Emergency Research Institute, Sydney, Australia), X. Rosalind Wang (Western Sydney University, Sydney, Australia, South Western Emergency Research Institute, Sydney, Australia)

arXiv:2604.17214v1 Announce Type: new

Abstract: Extracting clinically relevant information from unstructured medical narratives such as admission notes, discharge summaries, and emergency case histories remains a challenge in clinical natural language processing (NLP). Medical Entity Recognition (MER) identifies meaningful concepts embedded in these records. Recent advancements in large language models (LLMs) have shown competitive MER performance; however, evaluations often focus on general entity types, offering limited utility for real-world clinical needs requiring finer-grained extraction. To address this gap, we rigorously evaluated the open-source LLaMA3 model for fine-grained medical entity recognition across 18 clinically detailed categories. To optimize performance, we employed three learning paradigms: zero-shot, few-shot, and fine-tuning with Low-Rank Adaptation (LoRA). To further enhance few-shot learning, we introduced two example selection methods based on token- and sentence-level embedding similarity, utilizing a pre-trained BioBERT model. Unlike prior work assessing zero-shot and few-shot performance on proprietary models (e.g., GPT-4) or fine-tuning different architectures, we ensured methodological consistency by applying all strategies to a unified LLaMA3 backbone, enabling fair comparison across learning settings. Our results showed that fine-tuned LLaMA3 surpasses zero-shot and few-shot approaches by 63.11% and 35.63%, respectively, achieving an F1 score of 81.24% in granular medical entity extraction.
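The similarity-based example selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes sentence embeddings (in practice produced by a model such as BioBERT) are already available as vectors, and selects the k most similar candidate examples by cosine similarity. The function name and toy vectors are hypothetical.

```python
import numpy as np

def select_few_shot_examples(query_emb, pool_embs, k=2):
    """Return indices of the k pool embeddings most similar to the query.

    query_emb: 1-D array, embedding of the input sentence.
    pool_embs: 2-D array, one row per candidate few-shot example.
    """
    # Normalize so that a dot product equals cosine similarity.
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q
    # Sort by descending similarity and keep the top k indices.
    return np.argsort(-sims)[:k]

# Toy 2-D vectors standing in for BioBERT sentence embeddings (hypothetical).
pool = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query = np.array([0.9, 0.1])
print(select_few_shot_examples(query, pool, k=2))  # -> [0 2]
```

The selected examples would then be inserted into the LLaMA3 prompt as in-context demonstrations; the same scheme applies at the token level by averaging token embeddings before comparison.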