Uncovering Trajectory and Topological Signatures in Multimodal Pediatric Sleep Embeddings
arXiv:2605.14156v1

Abstract: Generative models have shown promise in pediatric sleep analysis, but the latent structure of their multimodal embeddings remains poorly understood. This work investigates the session-wide diagnostic information contained in sequences of 30-second pediatric polysomnography (PSG) epochs embedded by a multimodal masked autoencoder. We test whether augmenting the embeddings with three feature families yields task-relevant signals: PHATE-derived per-epoch coordinates and whole-night movement descriptors, persistent homology summaries of the embedding cloud, and electronic health record (EHR) data. Simple linear and MLP models, chosen for interpretability rather than state-of-the-art performance, show that geometric, topological, and clinical features each provide complementary gains. For binary predictions, feature importance is task-dependent, and more expressive late-fusion models generally perform better, with AUPRC improving from 0.26 to 0.34 for desaturation, 0.31 to 0.48 for EEG arousal, 0.09 to 0.22 for hypopnea, and 0.05 to 0.14 for apnea. We also report Brier score and Expected Calibration Error (ECE); the full fusion model yields the best calibration across all four binary tasks. Overall, latent geometry/topology and EHR offer complementary, interpretable signals beyond the embeddings alone, improving calibration and robustness under extreme class imbalance.
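For readers unfamiliar with the two calibration metrics named in the abstract, the following is a minimal sketch of how Brier score and Expected Calibration Error are commonly computed for a binary task. It is not the authors' implementation: the function names, the equal-width binning, and the default of 10 bins are assumptions chosen for illustration, and the abstract does not state which ECE variant was used.

```python
import numpy as np


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE with equal-width bins (an assumed variant, not the paper's).

    For each bin of predicted positive-class probability, compare the mean
    predicted probability with the observed positive rate, then average the
    absolute gaps weighted by the fraction of samples in each bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Use only interior edges so that a probability of exactly 1.0
    # lands in the last bin instead of overflowing past it.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue  # empty bins contribute nothing
        mean_predicted = y_prob[mask].mean()
        observed_rate = y_true[mask].mean()
        ece += mask.mean() * abs(observed_rate - mean_predicted)
    return float(ece)
```

Lower is better for both metrics. Under heavy class imbalance, as in the apnea and hypopnea tasks, most predictions fall into the lowest bins, so the choice of binning can noticeably affect the ECE value.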
