Skip to content

v0.5.0: Big refactor, `GlobalPowerLimitOptimizer`

Compare
Choose a tag to compare
@jaywonchung jaywonchung released this 12 Jul 03:34
· 166 commits to master since this release

What's New

Callback-based architecture

  • zeus.callback.Callback is the new backbone for Zeus components
  • GlobalPowerLimitOptimizer is the shiny new way to online-profile and optimize the power limit of DNN training.
  • EarlyStopController monitors and manages all sorts of conditions to determine whether training should stop.

Extensive testing

  • tests/ is richer than ever. With deep component tests with exhaustive parametrization, there are now around 1500 test cases.
  • Especially, zeus.util.testing.ReplayZeusMonitor exposes the same public API as ZeusMonitor but replays the measurement window logs produced by ZeusMonitor, instead of doing actual measurement. With this, Zeus can now be tested without any actual GPUs.