Interpretable Human Activity Recognition for Subtle Robbery Detection in Surveillance Videos

cs.CV updates on arXiv.org

Bryan Jhoan Cazáres Leyva, Ulises Gachuz Davila, José Juan González Fonseca, Juan Irving Vasquez, Vanessa A. Camacho-Vázquez, Sergio Isahí Garrido-Castañeda

Apr 17, 2026, 12:00 AM

arXiv:2604.14329v1

Abstract: Non-violent street robberies (snatch-and-run) are difficult to detect automatically because they are brief, subtle, and often indistinguishable from benign human interactions in unconstrained surveillance footage. This paper presents a hybrid, pose-driven approach for detecting snatch-and-run events that combines real-time perception with an interpretable classification stage suitable for edge deployment. The system uses a YOLO-based pose estimator to extract body keypoints for each tracked person and computes kinematic and interaction features describing hand speed, arm extension, proximity, and relative motion between an aggressor-victim pair. A Random Forest classifier is trained on these descriptors, and a temporal hysteresis filter is applied to stabilize frame-level predictions and reduce spurious alarms. We evaluate the method on a staged dataset and on a disjoint test set collected from internet videos, demonstrating promising generalization across different scenes and camera viewpoints. Finally, we implement the complete pipeline on an NVIDIA Jetson Nano and report real-time performance, supporting the feasibility of proactive, on-device robbery detection.
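
The abstract does not give feature formulas or filter parameters, so the following is only a minimal Python sketch of what pose-derived pair descriptors and a temporal hysteresis filter could look like. The function names (`pair_features`, `hysteresis_filter`), the choice of keypoints, the two thresholds, and the `min_frames` count are all hypothetical and are not taken from the paper. The per-frame probabilities are assumed to come from some classifier, such as the Random Forest mentioned above.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) keypoint in pixels; an assumed convention


def pair_features(wrist_prev: Point, wrist_curr: Point, shoulder: Point,
                  victim_center: Point, dt: float) -> Dict[str, float]:
    """Illustrative descriptors for one aggressor-victim pair in one frame.

    These definitions are assumptions, not the paper's formulas.
    """
    return {
        # Wrist displacement between consecutive frames, per second.
        "hand_speed": math.dist(wrist_prev, wrist_curr) / dt,
        # Shoulder-to-wrist distance as a rough proxy for arm extension.
        "arm_extension": math.dist(shoulder, wrist_curr),
        # Distance from the aggressor's wrist to the victim's center.
        "proximity": math.dist(wrist_curr, victim_center),
    }


def hysteresis_filter(probs: Sequence[float], on_threshold: float = 0.7,
                      off_threshold: float = 0.4,
                      min_frames: int = 3) -> List[bool]:
    """Turn per-frame probabilities into a stable alarm signal.

    The alarm switches on after `min_frames` consecutive frames at or above
    `on_threshold`, and switches off after `min_frames` consecutive frames
    at or below `off_threshold`. All three values are hypothetical.
    """
    alarm = False
    streak = 0  # consecutive frames voting to flip the current state
    out: List[bool] = []
    for p in probs:
        votes_to_flip = (p <= off_threshold) if alarm else (p >= on_threshold)
        streak = streak + 1 if votes_to_flip else 0
        if streak >= min_frames:
            alarm = not alarm
            streak = 0
        out.append(alarm)
    return out


if __name__ == "__main__":
    scores = [0.1, 0.8, 0.2, 0.8, 0.9, 0.8, 0.5, 0.3, 0.5, 0.3, 0.2, 0.1]
    print(hysteresis_filter(scores))
```

In this sketch, the isolated high score in the second frame does not raise an alarm, and once the alarm is raised, intermediate scores between the two thresholds do not clear it. That gap between the on and off thresholds is what suppresses flicker in the frame-level predictions.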