AGWM: Affordance-Grounded World Models for Environments with Compositional Prerequisites

arXiv

Qinshi Zhang (University of California, San Diego), Weipeng Deng (University of Hong Kong), Zhihan Jiang (Columbia University), Jiaming Qu (Amazon), Qianren Li (City University of Hong Kong), Weitao Xu (City University of Hong Kong), Ray LC (City University of Hong Kong)

May 11, 2026, 12:00 AM

arXiv:2605.06841v1 Announce Type: new Abstract: In model-based learning, the agent learns behaviors by simulating trajectories based on world model predictions. Standard world models typically learn a stationary transition function that maps states and actions to next states, when an action and an outcome frequently co-occur in training data, the model tends to internalize this correlation as a general causal rule while ignoring action preconditions. In interactive environments, however, agent actions can reshape the future affordance space. At each timestep, an action may becomes executable only after its prerequisites are met, or non-executable when they are destroyed. We term such events structure-changing events (SC events). As a result, a conventional world model often fails to determine whether a given action is executable in the current state, especially in multi-step predictions. Each imagined step is conditioned on an incorrect affordance state, and therefore the prediction error compounds over the rollout horizon. In this paper, we propose AGWM (Affordance-Grounded World Model), which learns an abstract affordance structure represented as a DAG of prerequisite dependencies to explicitly track the dynamic executability of actions. Experiments on game-based simulated environments demonstrate the effectiveness of our method by achieving lower multi-step prediction error, better generalization to novel configurations, and improved interpretability.