A comprehensive list of papers that define World Models and apply them to General Video Generation, Embodied AI, and Autonomous Driving, with links to papers, code, and related websites.

leofan90/Awesome-World-Models

Awesome World Models for Robotics

This repository provides a curated list of papers on World Models for General Video Generation, Embodied AI, and Autonomous Driving. The template comes from Awesome-LLM-Robotics and Awesome-World-Model.

Contributions are welcome! Please feel free to submit pull requests or reach out via email to add papers!

If you find this repository useful, please consider citing and giving this list a star ⭐. Feel free to share it with others!


Foundation paper of World Model

Blog or Technical Report

  • 1X Technologies, 1X World Model. [Blog]
  • Runway, Introducing General World Models. [Blog]
  • Wayve, Introducing GAIA-1: A Cutting-Edge Generative AI Model for Autonomy. [Paper] [Blog]
  • Yann LeCun, A Path Towards Autonomous Machine Intelligence. [Paper]

Surveys

  • "Understanding World or Predicting Future? A Comprehensive Survey of World Models", arXiv 2024.11. [Paper]
  • "World Models: The Safety Perspective", ISSRE WDMD. [Paper]
  • "Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey", arXiv 2024.11. [Paper]
  • "From Efficient Multimodal Models to World Models: A Survey", arXiv 2024.07. [Paper]
  • "Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI", arXiv 2024.07. [Paper] [Code]
  • "Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond", arXiv 2024.05. [Paper] [Code]
  • "World Models for Autonomous Driving: An Initial Survey", TIV. [Paper]
  • "A survey on multimodal large language models for autonomous driving", WACVW 2024. [Paper] [Code]

Benchmarks

  • ACT-Bench: "ACT-Bench: Towards Action Controllable World Models for Autonomous Driving", arXiv 2024.12. [Paper]
  • WorldSimBench: "WorldSimBench: Towards Video Generation Models as World Simulators", arXiv 2024.10. [Paper] [Website]
  • EVA: "EVA: An Embodied World Model for Future Video Anticipation", arXiv 2024.10. [Paper] [Website]
  • AeroVerse: "AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models", arXiv 2024.08. [Paper]
  • CityBench: "CityBench: Evaluating the Capabilities of Large Language Model as World Model", arXiv 2024.6. [Paper] [Code]
  • "Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models", NeurIPS 2023. [Paper]

General World Models

  • "Towards Physically Interpretable World Models: Meaningful Weakly Supervised Representations for Visual Trajectory Prediction", arXiv 2024.12. [Paper]
  • "Transformers Use Causal World Models in Maze-Solving Tasks", arXiv 2024.12. [Paper]
  • "Causal World Representation in the GPT Model", NeurIPS 2024 Workshop. [Paper]
  • Owl-1: "Owl-1: Omni World Model for Consistent Long Video Generation", arXiv 2024.12. [Paper]
  • "Navigation World Models", arXiv 2024.12. [Paper] [Website]
  • "Evaluating World Models with LLM for Decision Making", arXiv 2024.11. [Paper]
  • LLMPhy: "LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models", arXiv 2024.11. [Paper]
  • WebDreamer: "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents", arXiv 2024.11. [Paper] [Code]
  • "Scaling Laws for Pre-training Agents and World Models", arXiv 2024.11. [Paper]
  • DINO-WM: "DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning", arXiv 2024.11. [Paper] [Website]
  • "Learning World Models for Unconstrained Goal Navigation", NeurIPS 2024. [Paper]
  • "How Far is Video Generation from World Model: A Physical Law Perspective", arXiv 2024.11. [Paper] [Website] [Code]
  • Adaptive World Models: "Adaptive World Models: Learning Behaviors by Latent Imagination Under Non-Stationarity", NeurIPS 2024 Workshop Adaptive Foundation Models. [Paper]
  • LLMCWM: "Language Agents Meet Causality -- Bridging LLMs and Causal World Models", arXiv 2024.10. [Paper] [Code]
  • "Reward-free World Models for Online Imitation Learning", arXiv 2024.10. [Paper]
  • "Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation", arXiv 2024.10. [Paper]
  • AVID: "AVID: Adapting Video Diffusion Models to World Models", arXiv 2024.10. [Paper] [Code]
  • SMAC: "Grounded Answers for Multi-agent Decision-making Problem through Generative World Model", NeurIPS 2024. [Paper]
  • OSWM: "One-shot World Models Using a Transformer Trained on a Synthetic Prior", arXiv 2024.9. [Paper]
  • "Making Large Language Models into World Models with Precondition and Effect Knowledge", arXiv 2024.9. [Paper]
  • "Efficient Exploration and Discriminative World Model Learning with an Object-Centric Abstraction", arXiv 2024.8. [Paper]
  • MoReFree: "World Models Increase Autonomy in Reinforcement Learning", arXiv 2024.8. [Paper] [Project]
  • UrbanWorld: "UrbanWorld: An Urban World Model for 3D City Generation", arXiv 2024.7. [Paper]
  • PWM: "PWM: Policy Learning with Large World Models", arXiv 2024.7. [Paper] [Code]
  • "Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling", arXiv 2024.7. [Paper]
  • GenRL: "GenRL: Multimodal foundation world models for generalist embodied agents", arXiv 2024.6. [Paper] [Code]
  • DLLM: "World Models with Hints of Large Language Models for Goal Achieving", arXiv 2024.6. [Paper]
  • "Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model", arXiv 2024.6. [Paper]
  • CoDreamer: "CoDreamer: Communication-Based Decentralised World Models", arXiv 2024.6. [Paper]
  • Pandora: "Pandora: Towards General World Model with Natural Language Actions and Video States", arXiv 2024.6. [Paper] [Code]
  • EBWM: "Cognitively Inspired Energy-Based World Models", arXiv 2024.6. [Paper]
  • "Evaluating the World Model Implicit in a Generative Model", arXiv 2024.6. [Paper] [Code]
  • "Transformers and Slot Encoding for Sample Efficient Physical World Modelling", arXiv 2024.5. [Paper] [Code]
  • Puppeteer: "Hierarchical World Models as Visual Whole-Body Humanoid Controllers", arXiv 2024.5. [Paper] [Code]
  • BWArea Model: "BWArea Model: Learning World Model, Inverse Dynamics, and Policy for Controllable Language Generation", arXiv 2024.5. [Paper]
  • WKM: "Agent Planning with World Knowledge Model", arXiv 2024.5. [Paper] [Code]
  • Diamond: "Diffusion for World Modeling: Visual Details Matter in Atari", arXiv 2024.5. [Paper] [Code]
  • "Compete and Compose: Learning Independent Mechanisms for Modular World Models", arXiv 2024.4. [Paper]
  • "Dreaming of Many Worlds: Learning Contextual World Models Aids Zero-Shot Generalization", arXiv 2024.3. [Paper] [Code]
  • V-JEPA: "V-JEPA: Video Joint Embedding Predictive Architecture", Meta AI. [Blog] [Paper] [Code]
  • IWM: "Learning and Leveraging World Models in Visual Representation Learning", Meta AI. [Paper]
  • Genie: "Genie: Generative Interactive Environments", DeepMind. [Paper] [Blog]
  • Sora: "Video generation models as world simulators", OpenAI. [Technical report]
  • LWM: "World Model on Million-Length Video And Language With RingAttention", arXiv 2024.2. [Paper] [Code]
  • "Planning with an Ensemble of World Models", OpenReview. [Paper]
  • WorldDreamer: "WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens", arXiv 2024.1. [Paper] [Code]
  • CWM: "Understanding Physical Dynamics with Counterfactual World Modeling", ECCV 2024. [Paper] [Code]
  • Δ-IRIS: "Efficient World Models with Context-Aware Tokenization", ICML 2024. [Paper] [Code]
  • LLM-Sim: "Can Language Models Serve as Text-Based World Simulators?", ACL. [Paper] [Code]
  • AD3: "AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors", ICML 2024. [Paper]
  • MAMBA: "MAMBA: an Effective World Model Approach for Meta-Reinforcement Learning", ICLR 2024. [Paper] [Code]
  • R2I: "Mastering Memory Tasks with World Models", ICLR 2024. [Paper] [Website] [Code]
  • HarmonyDream: "HarmonyDream: Task Harmonization Inside World Models", ICML 2024. [Paper] [Code]
  • REM: "Improving Token-Based World Models with Parallel Observation Prediction", ICML 2024. [Paper] [Code]
  • "Do Transformer World Models Give Better Policy Gradients?", ICML 2024. [Paper]
  • DreamSmooth: "DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing", ICLR 2024. [Paper]
  • TD-MPC2: "TD-MPC2: Scalable, Robust World Models for Continuous Control", ICLR 2024. [Paper] [Torch Code]
  • Hieros: "Hieros: Hierarchical Imagination on Structured State Space Sequence World Models", ICML 2024. [Paper]
  • CoWorld: "Making Offline RL Online: Collaborative World Models for Offline Visual Reinforcement Learning", NeurIPS 2024. [Paper]

World Models for Embodied AI

  • Dream to Manipulate: "Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination", arXiv 2024.12. [Paper] [Website]
  • WHALE: "WHALE: Towards Generalizable and Scalable World Models for Embodied Decision-making", arXiv 2024.11. [Paper]
  • VisualPredicator: "VisualPredicator: Learning Abstract World Models with Neuro-Symbolic Predicates for Robot Planning", arXiv 2024.10. [Paper]
  • "Multi-Task Interactive Robot Fleet Learning with Visual World Models", CoRL 2024. [Paper] [Code]
  • X-MOBILITY: "X-MOBILITY: End-To-End Generalizable Navigation via World Modeling", arXiv 2024.10. [Paper]
  • PIVOT-R: "PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation", NeurIPS 2024. [Paper]
  • GLIMO: "Grounding Large Language Models In Embodied Environment With Imperfect World Models", arXiv 2024.10. [Paper]
  • EVA: "EVA: An Embodied World Model for Future Video Anticipation", arXiv 2024.10. [Paper] [Website]
  • PreLAR: "PreLAR: World Model Pre-training with Learnable Action Representation", ECCV 2024. [Paper] [Code]
  • WMP: "World Model-based Perception for Visual Legged Locomotion", arXiv 2024.9. [Paper] [Project]
  • R-AIF: "R-AIF: Solving Sparse-Reward Robotic Tasks from Pixels with Active Inference and World Models", arXiv 2024.9. [Paper]
  • "Representing Positional Information in Generative World Models for Object Manipulation", arXiv 2024.9. [Paper]
  • DexSim2Real$^2$: "DexSim2Real$^2$: Building Explicit World Model for Precise Articulated Object Dexterous Manipulation", arXiv 2024.9. [Paper]
  • DWL: "Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning", RSS 2024 (Best Paper Award Finalist). [Paper]
  • "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics", arXiv 2024.6. [Paper] [Website]
  • HRSSM: "Learning Latent Dynamic Robust Representations for World Models", ICML 2024. [Paper] [Code]
  • RoboDreamer: "RoboDreamer: Learning Compositional World Models for Robot Imagination", ICML 2024. [Paper] [Code]
  • COMBO: "COMBO: Compositional World Models for Embodied Multi-Agent Cooperation", ECCV 2024. [Paper] [Website] [Code]
  • 3D-VLA: "3D-VLA: A 3D Vision-Language-Action Generative World Model", ICML 2024. [Paper]
  • ManiGaussian: "ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation", arXiv 2024.3. [Paper] [Code]

World Models for Autonomous Driving

  • DrivingGPT: "DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers", arXiv 2024.12. [Paper] [Project Page]
  • "An Efficient Occupancy World Model via Decoupled Dynamic Flow and Image-assisted Training", arXiv 2024.12. [Paper]
  • GEM: "GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control", arXiv 2024.12. [Paper] [Project Page]
  • GaussianWorld: "GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction", arXiv 2024.12. [Paper] [Code]
  • Doe-1: "Doe-1: Closed-Loop Autonomous Driving with Large World Model", arXiv 2024.12. [Paper] [Project Page] [Code]
  • "Physical Informed Driving World Model", arXiv 2024.12. [Paper] [Project Page]
  • InfiniCube: "InfiniCube: Unbounded and Controllable Dynamic 3D Driving Scene Generation with World-Guided Video Models", arXiv 2024.12. [Paper] [Project Page]
  • InfinityDrive: "InfinityDrive: Breaking Time Limits in Driving World Models", arXiv 2024.12. [Paper] [Project Page]
  • ReconDreamer: "ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration", arXiv 2024.11. [Paper] [Project Page]
  • Imagine-2-Drive: "Imagine-2-Drive: High-Fidelity World Modeling in CARLA for Autonomous Vehicles", ICRA 2025. [Paper] [Project Page]
  • DriveDreamer4D: "World Models Are Effective Data Machines for 4D Driving Scene Representation", arXiv 2024.10. [Paper] [Project Page]
  • DOME: "Taming Diffusion Model into High-Fidelity Controllable Occupancy World Model", arXiv 2024.10. [Paper] [Project Page]
  • SSR: "Does End-to-End Autonomous Driving Really Need Perception Tasks?", arXiv 2024.9. [Paper] [Code]
  • "Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models", arXiv 2024.9. [Paper]
  • LatentDriver: "Learning Multiple Probabilistic Decisions from Latent World Model in Autonomous Driving", arXiv 2024.9. [Paper] [Code]
  • RenderWorld: "World Model with Self-Supervised 3D Label", arXiv 2024.9. [Paper]
  • OccLLaMA: "An Occupancy-Language-Action Generative World Model for Autonomous Driving", arXiv 2024.9. [Paper]
  • DriveGenVLM: "Real-world Video Generation for Vision Language Model based Autonomous Driving", arXiv 2024.8. [Paper]
  • Drive-OccWorld: "Driving in the Occupancy World: Vision-Centric 4D Occupancy Forecasting and Planning via World Models for Autonomous Driving", arXiv 2024.8. [Paper]
  • CarFormer: "Self-Driving with Learned Object-Centric Representations", ECCV 2024. [Paper] [Code]
  • BEVWorld: "A Multimodal World Model for Autonomous Driving via Unified BEV Latent Space", arXiv 2024.7. [Paper] [Code]
  • TOKEN: "Tokenize the World into Object-level Knowledge to Address Long-tail Events in Autonomous Driving", arXiv 2024.7. [Paper]
  • UMAD: "Unsupervised Mask-Level Anomaly Detection for Autonomous Driving", arXiv 2024.6. [Paper]
  • SimGen: "Simulator-conditioned Driving Scene Generation", arXiv 2024.6. [Paper] [Code]
  • AdaptiveDriver: "Planning with Adaptive World Models for Autonomous Driving", arXiv 2024.6. [Paper] [Code]
  • UnO: "Unsupervised Occupancy Fields for Perception and Forecasting", CVPR 2024. [Paper] [Code]
  • LAW: "Enhancing End-to-End Autonomous Driving with Latent World Model", arXiv 2024.6. [Paper] [Code]
  • Delphi: "Unleashing Generalization of End-to-End Autonomous Driving with Controllable Long Video Generation", arXiv 2024.6. [Paper] [Code]
  • OccSora: "4D Occupancy Generation Models as World Simulators for Autonomous Driving", arXiv 2024.5. [Paper] [Code]
  • MagicDrive3D: "Controllable 3D Generation for Any-View Rendering in Street Scenes", arXiv 2024.5. [Paper] [Code]
  • Vista: "A Generalizable Driving World Model with High Fidelity and Versatile Controllability", NeurIPS 2024. [Paper] [Code]
  • CarDreamer: "Open-Source Learning Platform for World Model based Autonomous Driving", arXiv 2024.5. [Paper] [Code]
  • DriveSim: "Probing Multimodal LLMs as World Models for Driving", arXiv 2024.5. [Paper] [Code]
  • DriveWorld: "4D Pre-trained Scene Understanding via World Models for Autonomous Driving", CVPR 2024. [Paper]
  • LidarDM: "Generative LiDAR Simulation in a Generated World", arXiv 2024.4. [Paper] [Code]
  • SubjectDrive: "Scaling Generative Data in Autonomous Driving via Subject Control", arXiv 2024.3. [Paper] [Project]
  • DriveDreamer-2: "LLM-Enhanced World Models for Diverse Driving Video Generation", arXiv 2024.3. [Paper] [Code]
  • Think2Drive: "Efficient Reinforcement Learning by Thinking in Latent World Model for Quasi-Realistic Autonomous Driving", ECCV 2024. [Paper]
  • MARL-CCE: "Modelling Competitive Behaviors in Autonomous Driving Under Generative World Model", ECCV 2024. [Paper] [Code]
  • GenAD: "Generalized Predictive Model for Autonomous Driving", CVPR 2024. [Paper] [Data]
  • GenAD: "Generative End-to-End Autonomous Driving", ECCV 2024. [Paper] [Code]
  • NeMo: "Neural Volumetric World Models for Autonomous Driving", ECCV 2024. [Paper]
  • ViDAR: "Visual Point Cloud Forecasting enables Scalable Autonomous Driving", CVPR 2024. [Paper] [Code]
  • Drive-WM: "Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving", CVPR 2024. [Paper] [Code]
  • Cam4DOCC: "Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications", CVPR 2024. [Paper] [Code]
  • Panacea: "Panoramic and Controllable Video Generation for Autonomous Driving", CVPR 2024. [Paper] [Code]
  • OccWorld: "Learning a 3D Occupancy World Model for Autonomous Driving", ECCV 2024. [Paper] [Code]
  • Copilot4D: "Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion", ICLR 2024. [Paper]
  • DrivingDiffusion: "Layout-Guided multi-view driving scene video generation with latent diffusion model", ECCV 2024. [Paper] [Code]
  • SafeDreamer: "Safe Reinforcement Learning with World Models", ICLR 2024. [Paper] [Code]
  • MagicDrive: "Street View Generation with Diverse 3D Geometry Control", ICLR 2024. [Paper] [Code]
  • DriveDreamer: "Towards Real-world-driven World Models for Autonomous Driving", ECCV 2024. [Paper] [Code]
  • SEM2: "Enhance Sample Efficiency and Robustness of End-to-end Urban Autonomous Driving via Semantic Masked World Model", TITS. [Paper]

Citation

If you find this repository useful, please consider citing this list:

@misc{leo2024worldmodelspaperslist,
    title = {Awesome-World-Models},
    author = {Leo Fan},
    howpublished = {GitHub repository},
    url = {https://github.com/leofan90/Awesome-World-Models},
    year = {2024},
}
