Releases: opendilab/DI-engine
v0.5.2
Env
- add taxi env (#799) (#807)
- add ising model env (#782)
- add new Frozen Lake env (#781)
- optimize ppo continuous config in MuJoCo (#801)
- fix masac smac config multi_agent=True bug (#791)
- update/speed up pendulum ppo
Algorithm
- fix gtrxl compatibility bug (#796)
- fix complex obs demo for ppo pipeline (#786)
- add naive PWIL demo
- fix marl nstep td compatibility bug
Style
- relax flask requirement (#811)
- add new badge (hellogithub) in readme (#805)
- update discord link and badge in readme (#795)
- fix typo in config.py (#776)
- polish rl_utils api docs
- add constraint about numpy<2
- update macos platform test version to 12
- polish ci python version
News
- PsyDI: Towards a Multi-Modal and Interactive Chatbot for Psychological Assessments
- ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze
- UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Full Changelog: v0.5.1...v0.5.2
Contributors: @PaParaZz1 @zjowowen @YinminZhang @TuTuHuss @nighood @ruiheng123 @rongkunxue @ooooo-create @eltociear
v0.5.1
Env
- add MADDPG pettingzoo example (#774)
- polish NGU Atari configs (#767)
- fix bug in cliffwalking env (#759)
- add PettingZoo replay video demo
- change default max retry in env manager from 5 to 1
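For reference, a minimal config sketch (nesting as commonly used in DI-engine configs; values illustrative) that restores the previous retry behaviour:

```python
# Hypothetical config fragment: raise max_retry back to 5 for a specific experiment.
main_config = dict(
    env=dict(
        manager=dict(
            max_retry=5,  # new default is 1; a failing reset/step is retried up to this many times
        ),
    ),
)
```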
Algorithm
- add QGPO diffusion-model related algorithm (#757)
- add HAPPO multi-agent algorithm (#717)
- add DreamerV3 + MiniGrid adaption (#725)
- fix hppo entropy_weight to avoid nan error in log_prob (#761)
- fix structured action bug (#760)
- polish Decision Transformer entry (#754)
- fix EDAC policy/model bug
Fix
- fix env typos
- fix pynng requirements bug
- fix communication module unittest bug
Style
- polish policy API doc (#762) (#764) (#768)
- add agent API doc (#758)
- polish torch_utils/utils API doc (#745) (#747) (#752) (#755) (#763)
News
- AAAI 2024: SO2: A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
- LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Full Changelog: v0.5.0...v0.5.1
Contributors: @PaParaZz1 @zjowowen @nighood @kxzxvbk @puyuan1996 @Cloud-Pku @AltmanD @HarryXuancy
v0.5.0
Algorithm
- add PromptPG algorithm (#667)
- add Plan Diffuser algorithm (#700) (#749)
- add new pipeline implementation of IMPALA algorithm (#713)
- add dropout layers to DQN-style algorithms (#712)
Enhancement
- add new pipeline agent for sac/ddpg/a2c/ppo and Hugging Face support (#637) (#730) (#737)
- add more unittest cases for model (#728)
- add collector logging in new pipeline (#735)
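A hedged usage sketch for the new pipeline agent and Hugging Face support from #637 above, assuming the `ding.bonus` agent interface (class and argument names may differ per algorithm; analogous agents exist for sac/ddpg/a2c/ppo per this note):

```python
# Hypothetical usage of a ding.bonus agent; env_id/exp_name values are illustrative.
from ding.bonus import DQNAgent

agent = DQNAgent(env_id="LunarLander-v2", exp_name="LunarLander-v2-DQN")
agent.train(step=int(2e5))             # train for a fixed number of environment steps
agent.deploy(enable_save_replay=True)  # roll out the trained policy and save a replay video
```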
Fix
- fix logger middleware problems (#715)
- fix ppo parallel bug (#709)
- fix typo in optimizer_helper.py (#726)
- fix mlp dropout if condition bug
- fix drex collecting data unittest bugs
Style
- polish env manager/wrapper comments and API doc (#742)
- polish model comments and API doc (#722) (#729) (#734) (#736) (#741)
- polish policy comments and API doc (#732)
- polish rl_utils comments and API doc (#724)
- polish torch_utils comments and API doc (#738)
- update README.md and Colab demo (#733)
- update metaworld docker image
News
- NeurIPS 2023 Spotlight: LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios
- OpenDILab + Hugging Face DRL Model Zoo link
Full Changelog: v0.4.9...v0.5.0
Contributors: @PaParaZz1 @zjowowen @AltmanD @puyuan1996 @kxzxvbk @Super1ce @nighood @Cloud-Pku @zhangpaipai @ruoyuGao @eltociear
v0.4.9
API Change
- refactor the implementation of Decision Transformer; DI-engine now supports both discrete and continuous DT outputs with multi-modal observations (example: `ding/example/dt.py`)
- update the multi-GPU Distributed Data Parallel (DDP) example (link)
- change the return value of `InteractionSerialEvaluator`, simplifying redundant results
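As a rough guide, a hedged sketch of the post-change evaluator call site in a serial entry (the exact structure of the simplified second return value is an assumption):

```python
def evaluate_and_maybe_stop(evaluator, learner, collector) -> bool:
    # `evaluator` is an InteractionSerialEvaluator; `learner`/`collector` follow the
    # usual DI-engine serial pipeline setup.
    stop_flag, eval_info = evaluator.eval(
        learner.save_checkpoint,   # checkpoint callback invoked on a new best result
        learner.train_iter,
        collector.envstep,
    )
    return stop_flag
```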
Env
- add cliffwalking env (#677)
- add lunarlander ppo config and example
Algorithm
- add BCQ offline RL algorithm (#640)
- add Dreamerv3 model-based RL algorithm (#652)
- add tensor stream merge network tools (#673)
- add scatter connection model (#680)
- refactor Decision Transformer in new pipeline and support img input and discrete output (#693)
- add three variants of Bilinear classes and a FiLM class (#703)
Enhancement
- polish offpolicy RL multi-gpu DDP training (#679)
- add middleware for Ape-X distributed pipeline (#696)
- add example for evaluating trained DQN (#706)
Fix
- fix to_ndarray failing to assign dtype for scalars (#708)
- fix evaluator return episode_info compatibility bug
- fix cql example entry wrong config bug
- fix enable_save_figure env interface
- fix redundant env info bug in evaluator
- fix to_item unittest bug
Style
- polish and simplify requirements (#672)
- add Hugging Face Model Zoo badge (#674)
- add openxlab Model Zoo badge (#675)
- fix py37 macos ci bug and update default pytorch from 1.7.1 to 1.12.1 (#678)
- fix mujoco-py compatibility issue for cython<3 (#711)
- fix type spell error (#704)
- fix pypi release actions ubuntu 18.04 bug
- update contact information (e.g. wechat)
- polish algorithm doc tables
New Repo
- DOS: [CVPR 2023] ReasonNet: End-to-End Driving with Temporal and Global Reasoning
Full Changelog: v0.4.8...v0.4.9
Contributors: @PaParaZz1 @zjowowen @zhangpaipai @AltmanD @puyuan1996 @Cloud-Pku @Super1ce @kxzxvbk @jayyoung0802 @Mossforest @lxl2gf @Privilger
v0.4.8
API Change
- `stop_value` is no longer a required field in the config (it defaults to `math.inf`); users can instead specify `max_env_step` or `max_train_iter` in the training entry to run the program with a fixed termination condition.
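A minimal sketch of a fixed-termination run under this change, assuming the standard `serial_pipeline` entry and the bundled CartPole DQN config (paths and values are illustrative):

```python
from ding.entry import serial_pipeline
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import (
    cartpole_dqn_config, cartpole_dqn_create_config,
)

# Without stop_value (now defaulting to math.inf), terminate on a fixed env-step budget.
serial_pipeline(
    (cartpole_dqn_config, cartpole_dqn_create_config),
    seed=0,
    max_env_step=int(1e5),      # stop after 100k environment steps
    # max_train_iter=int(1e4),  # or stop after 10k training iterations instead
)
```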
Env
- fix gym hybrid reward dtype bug (#664)
- fix atari env id noframeskip bug (#655)
- fix typo in gym any_trading env (#654)
- update td3bc d4rl config (#659)
- polish bipedalwalker config
Algorithm
- add EDAC offline RL algorithm (#639)
- add LN and GN norm_type support in ResBlock (#660)
- add normal value norm baseline for PPOF (#658)
- polish last layer init/norm in MLP (#650)
- polish TD3 monitor variable
Enhancement
- add MAPPO/MASAC task example (#661)
- add PPO example for complex env observation (#644)
- add barrier middleware (#570)
Fix
- fix abnormal collector log and add record_random_collect option (#662)
- fix to_item compatibility bug (#646)
- fix trainer dtype transform compatibility bug
- fix pettingzoo 1.23.0 compatibility bug
- fix ensemble head unittest bug
New Repo
- LightZero: A lightweight and efficient MCTS/AlphaZero/MuZero algorithm toolkit.
Full Changelog: v0.4.7...v0.4.8
Contributors: @PaParaZz1 @zjowowen @puyuan1996 @SolenoidWGT @Super1ce @karroyan @zhangpaipai @eltociear
v0.4.7
API Change
- remove the requirement for sub fields (learn/collect/eval) in the policy config (users can define their own config formats)
- use `wandb` as the default logger in the task pipeline
- remove the `value_network` config field and implementations in SAC and related algorithms
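To illustrate the first point, a hedged sketch of a flat, user-defined policy config (all field names here are illustrative, not a fixed schema):

```python
from easydict import EasyDict

# With the learn/collect/eval sub-field requirement removed, a custom pipeline can read
# a flat layout like this instead of nested policy.learn / policy.collect / policy.eval.
policy_cfg = EasyDict(dict(
    cuda=True,
    learning_rate=3e-4,   # previously nested under policy.learn
    n_sample=128,         # previously nested under policy.collect
    eval_freq=1000,       # previously nested under policy.eval.evaluator
))
print(policy_cfg.learning_rate)
```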
Env
- add dmc2gym env support and baseline (#451)
- update pettingzoo to the latest version (#597)
- fix icm/rnd+onppo config bugs and add app_door_to_key env (#564)
- add lunarlander continuous TD3/SAC config
- polish lunarlander discrete C51 config
Algorithm
- add Procedure Cloning (PC) imitation learning algorithm (#514)
- add Munchausen Reinforcement Learning (MDQN) algorithm (#590)
- add reward/value norm methods: popart & value rescale & symlog (#605)
- polish reward model config and training pipeline (#624)
- add PPOF reward space demo support (#608)
- add PPOF Atari demo support (#589)
- polish dqn default config and env examples (#611)
- polish comment and clean code about SAC
Enhancement
- add language model (e.g. GPT) training utils (#625)
- remove policy cfg sub fields requirements (#620)
- add full wandb support (#579)
Fix
- fix confusing shallow copy operation about next_obs (#641)
- fix unsqueeze action_args in PDQN when shape is 1 (#599)
- fix evaluator return_info tensor type bug (#592)
- fix deque buffer wrapper PER bug (#586)
- fix reward model save method compatibility bug
- fix logger assertion and unittest bug
- fix bfs test py3.9 compatibility bug
- fix zergling collector unittest bug
Style
- add DI-engine torch-rpc p2p communication docker (#628)
- add D4RL docker (#591)
- correct typo in task (#617)
- correct typo in time_helper (#602)
- polish readme and add treetensor example
- update contributing doc
New Plan
- Call for contributors about DI-engine (#621)
Full Changelog: v0.4.6...v0.4.7
Contributors: @PaParaZz1 @karroyan @zjowowen @ruoyuGao @kxzxvbk @nighood @song2181 @SolenoidWGT @PSHarold @jimmydengpeng @eltociear
v0.4.6
API Change
- middleware: `CkptSaver(cfg, policy, train_freq=100)` -> `CkptSaver(policy, cfg.exp_name, train_freq=100)`
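A minimal sketch of the updated call in a new-pipeline task (the surrounding `policy`/`cfg` objects and the other middleware are assumed to follow DI-engine's pipeline examples):

```python
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import CkptSaver

with task.start(async_mode=False, ctx=OnlineRLContext()):
    # ... collector / trainer / evaluator middleware would be registered here
    task.use(CkptSaver(policy, cfg.exp_name, train_freq=100))  # new signature
    task.run()
```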
Env
- add metadrive env and related ppo config (#574)
- add acrobot env and related dqn config (#577)
- add carracing in box2d (#575)
- add new gym hybrid viz (#563)
- update cartpole IL config (#578)
Fix
- fix to_device and prev_state bug when using ttorch (#571)
- fix py38 and numpy unittest bugs (#565)
- fix typo in contrastive_loss.py (#572)
- fix dizoo envs pkg installation bugs
- fix multi_trainer middleware unittest bug
Style
- add evogym docker (#580)
- fix metaworld docker bug
- fix setuptools high version incompatibility bug
- extend treetensor lowest version
New Paper
- GoBigger: [ICLR 2023] A Scalable Platform for Cooperative-Competitive Multi-Agent Interactive Simulation
Contributors: @PaParaZz1 @puyuan1996 @timothijoe @Cloud-Pku @ruoyuGao @Super1ce @karroyan @kxzxvbk @eltociear
v0.4.5
API Change
- Move the default examples for adding a new env from extending `BaseEnv` to utilizing `DingEnvWrapper`
- rename `final_eval_reward` to `eval_episode_return` in all related code (including envs and evaluators)
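A minimal sketch of the `DingEnvWrapper` route, assuming a standard gym classic-control env:

```python
import gym
from ding.envs import DingEnvWrapper

# Wrap an existing gym env instead of subclassing BaseEnv.
env = DingEnvWrapper(gym.make('CartPole-v1'))
obs = env.reset()
timestep = env.step(env.random_action())
# On the final step of an episode, the return is now reported as `eval_episode_return`
# (renamed from `final_eval_reward`) in the timestep info.
```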
Env
- add beergame supply chain optimization env (#512)
- add env gym_pybullet_drones (#526)
- rename `eval reward` to `episode return` (#536)
Algorithm
- add policy gradient algo implementation (#544)
- add MADDPG algo implementation (#550)
- add IMPALA continuous algo implementation (#551)
- add MADQN algo implementation (#540)
Enhancement
- add new task IMPALA-type distributed training scheme (#321)
- add load and save method for replaybuffer (#542)
- add more DingEnvWrapper example (#525)
- add evaluator more info viz support (#538)
- add traceback log for subprocess env manager (#534)
Fix
- fix halfcheetah td3 config file (#537)
- fix mujoco action_clip args compatibility bug (#535)
- fix atari a2c config entry bug
- fix drex unittest compatibility bug
Style
- add Roadmap issue of DI-engine (#548)
- update related project link and new env doc
New Project
- PPOxFamily: PPO x Family DRL Tutorial Course
- ACE: [AAAI 2023] Official PyTorch implementation of paper "ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency".
Contributors: @PaParaZz1 @sailxjx @zjowowen @hiha3456 @Weiyuhong-1998 @kxzxvbk @song2181 @zerlinwang
v0.4.4
API Change
- context in the new task pipeline is now implemented as a `dataclass` rather than a `dict`
- the recommended visualization tool is now `wandb` rather than `tensorboard`
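A minimal sketch of what the `dataclass` context change means for middleware code (field names follow the `OnlineRLContext` used in DI-engine examples):

```python
def print_progress(ctx):
    # Attribute access on the dataclass context replaces the old dict-style ctx['train_iter'].
    if ctx.env_step % 1000 == 0:
        print(f"train_iter={ctx.train_iter}, env_step={ctx.env_step}")
```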
Env
- add modified gym-hybrid including moving, sliding and hardmove (#505) (#519)
- add evogym support (#495) (#527)
- add save_replay_gif option (#506)
- adapt minigrid_env and related config to latest MiniGrid v2.0.0 (#500)
Algorithm
- add pcgrad optimizer (#489)
- add some features in MLP and ResBlock (#511)
- delete mcts related modules (#518) (we will release a MCTS repo in future)
Enhancement
- add wandb middleware and demo (#488) (#523) (#528)
- add new properties in Context (#499)
- add single env policy wrapper for policy deployment (demo)
- add custom model demo and doc
Fix
- fix build logger args and unittests (#522)
- fix total_loss calculation in PDQN (#504)
- fix save gif function bug
- fix level sample unittest bug
Style
- update contact email address (#503)
- polish env log and resblock name
- add details button in readme
New Repo
- DI-1024: Deep Reinforcement Learning + 1024 Game
Contributors: @PaParaZz1 @puyuan1996 @karroyan @hiha3456 @davide97l @Weiyuhong-1998 @zjowowen @norman26625
v0.4.3
Env
- add rule-based gomoku expert (#465)
Algorithm
- fix a2c policy batch size bug (#481)
- enable activation option in collaq attention and mixer
- minor fix about IBC (#477)
Enhancement
- add IGM support (#486)
- add tb logger middleware and demo
Fix
- fix the type conversion in ding_env_wrapper (#483)
- fix di-orchestrator version bug in unittest (#479)
- fix data collection errors caused by shallow copies (#475)
- fix gym==0.26.0 seed args bug
Style
- add readme tutorial link (environment & algorithm) (#490) (#493)
- adjust location of the default_model method in policy (#453)
New Repo
- DI-sheep: Deep Reinforcement Learning + 3 Tiles Game
Contributors: @PaParaZz1 @nighood @norman26625 @ZHZisZZ @cpwan @mahuangxu