Releases · marovira/helios-ml
1.2.3
1.2.2
Breaking Changes
- The Optuna plug-in no longer reports metrics automatically. Please call `report_metrics` whenever the metrics are ready to be sent to the trial, as sketched below.
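A minimal sketch of the new flow; only the names `OptunaPlugin` and `report_metrics` come from these notes, while the import path, constructor arguments, and exact signature are assumptions rather than the documented API:

```python
# Hypothetical sketch: as of this release the plug-in no longer reports
# metrics on its own, so call report_metrics() once the metrics for the
# current validation cycle are final. Only the names OptunaPlugin and
# report_metrics come from these notes; everything else is an assumption.
from helios.plugins.optuna import OptunaPlugin  # import path is an assumption

plugin = OptunaPlugin(trial)  # constructor arguments are assumptions

for cycle in range(num_cycles):     # num_cycles: placeholder
    run_validation_cycle()          # placeholder for your validation loop
    plugin.report_metrics(cycle)    # signature is an assumption
```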
Bug Fixes
- Fixed an issue where the plug-in would attempt to report metrics even when `on_validation_end` hadn't been called, which caused it to report metrics that were one validation cycle behind the current one.
- Updated the docs to ensure they are consistent with the code.
- Fixed an issue where online docs did not show source code.
1.2.1
Breaking Changes
- The Optuna plug-in no longer takes in `num_trials` as an argument. Please see the feature changes for more details.
Bug Fixes
- The Optuna plug-in now has functionality to correctly resume trials that failed due to user intervention or other errors. Please see the documentation on the new `enqueue_failed_trials` function for more information; a sketch follows.
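A hedged sketch of what resuming could look like, assuming `enqueue_failed_trials` is exposed on the plug-in class and takes the study whose failed trials should be re-enqueued; only the function name comes from these notes:

```python
# Hypothetical sketch: re-enqueue trials that failed due to user
# intervention or other errors, then resume the study. Only the name
# enqueue_failed_trials comes from these notes; the call site and
# signature are assumptions.
import optuna
from helios.plugins.optuna import OptunaPlugin  # import path is an assumption

study = optuna.create_study(
    study_name="my-study",
    storage="sqlite:///optuna.db",
    load_if_exists=True,
)
OptunaPlugin.enqueue_failed_trials(study)  # call site/signature are assumptions
study.optimize(objective, n_trials=50)     # objective: your own function
```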
1.2.0
Breaking Changes
- `helios.data.functional.load_image` no longer uses PIL as its back-end and now uses OpenCV. If you're using the function with default arguments, no changes need to be made. However, if you're calling it as `load_image(path, "RGB")`, you should change it to `load_image(path, cv2.IMREAD_COLOR)` to get the equivalent behaviour (see the sketch after this list). If you require PIL, you can use `helios.data.functional.load_image_pil`.
- Helios now does exception handling internally. As a result, any exception that is not registered in the training or testing exception lists will be handled internally and swallowed by Helios.
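A minimal migration sketch for the back-end change, using the call patterns described above; the mode argument to `load_image_pil`, if any, is an assumption:

```python
# Migration sketch for the 1.2.0 load_image back-end change (PIL -> OpenCV).
import cv2
import helios.data.functional as hdf

path = "example.png"  # placeholder image path

# Before (PIL back-end):   img = hdf.load_image(path, "RGB")
# After (OpenCV back-end):
img = hdf.load_image(path, cv2.IMREAD_COLOR)

# If you still need a PIL-backed load (e.g. for torchvision transforms):
pil_img = hdf.load_image_pil(path)  # extra arguments, if any, are assumptions
```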
Feature Changes
- Replaced PIL in `helios.data.functional.load_image` with OpenCV to allow more flexibility in the types of images that can be loaded. In order to maintain compatibility with PyTorch, `helios.data.functional.load_image_pil` has been added so images can still be loaded through PIL.
- `ToImageTensor` is now type-hinted correctly with all the possible types it accepts.
- Exception handling has been improved. The new system correctly logs exceptions in both single and distributed training. The side-effect of this is that Helios will now swallow exceptions unless explicitly told otherwise. See the documentation for the `Trainer.fit` and `Trainer.test` functions for more details (a hedged sketch follows this list).
- The Optuna plug-in now has a system to stop optimisation whenever the given number of trials has been reached. This ensures that the number of trials is reached regardless of the number of runs it takes to get there.
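A hedged sketch of the new behaviour; only the `Trainer.fit`/`Trainer.test` entry points come from these notes, while the import path and the registration attribute below are hypothetical stand-ins for the real exception lists:

```python
# Hypothetical sketch: as of 1.2.0, exceptions NOT registered with the
# trainer are logged and swallowed by Helios. Register the types you want
# to propagate back to your own code. The attribute used for registration
# below is an assumption; consult the Trainer.fit/Trainer.test docs.
from helios.trainer import Trainer  # import path is an assumption

trainer = Trainer()                             # constructor args omitted
trainer.train_exceptions = [KeyboardInterrupt]  # hypothetical attribute

try:
    trainer.fit(model, datamodule)  # model/datamodule: your own objects
except KeyboardInterrupt:
    print("Propagated because the type was registered; all else is swallowed.")
```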
Bug Fixes
- The `Model._val_scores` and `Model._test_scores` tables now accept any type as their values.
1.1.0
Breaking Changes
- Dependencies have been updated. Please see the README for more information.
- Helios now requires a minimum NumPy version of 2.0.0.
- The `TrainingState` struct was previously saved in checkpoints as a dictionary. This has now been changed to save the struct itself, so you must migrate your checkpoints to the new system.
Feature Changes
- Introduces a new plug-in system to extend the functionality of Helios.
- Introduces a new `safe_torch_load` function that wraps `torch.load` with `weights_only` set to true. This addresses the warnings coming from PyTorch starting with 2.4.0 (see the sketch after this list).
- Introduces a way to have the trainer ignore certain exception types when training so they can be caught by the calling code.
- Adds a multi-processing queue to the trainer (available only in distributed mode) that allows data to be passed back to the main process.
- Adds native integration with Optuna through the new `OptunaPlugin`.
- Adds a new `CUDAPlugin` that automatically moves batches to the set GPU device.
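A minimal sketch of what the `safe_torch_load` wrapper amounts to, based only on the description above; the import path is an assumption:

```python
# Sketch of the described behaviour: safe_torch_load is torch.load with
# weights_only=True, which addresses the warnings PyTorch emits starting
# with 2.4.0 when unpickling arbitrary objects.
import torch
from helios.core import safe_torch_load  # import path is an assumption

state = safe_torch_load("checkpoint.pt")

# Functionally equivalent to:
state = torch.load("checkpoint.pt", weights_only=True)
```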
Bug Fixes
- When setting both CPU and GPU for the trainer, an exception is now raised instead of silently ignoring the CPU flag.
- Unit tests are now expanded to cover all supported versions of Python.
- Protobuf is no longer fixed to be less than 5.0.0.
1.0.0
First official release of Helios
Updates
- Adds new unit tests to ensure device and map locations are correct.
- Adds a way to add text to the default Helios banner.
- Adds a tool to migrate checkpoints created by previous versions of Helios.
- Cleans up and updates all documentation.
Breaking Changes
- Checkpoints created with prior versions of Helios will no longer work. You may migrate them to the latest version using `python -m helios.chkpt_migrator <chkpt-root>`.
0.3.0
Updates
- Adds a new set of callbacks to the `Model` class that are called at the start/end of each epoch.
- Adds a way to set a custom `collate_fn` for the dataloader.
- The `Model` class no longer contains abstract methods.
- Changes the call site of `model.on_training_start` so print statements don't interfere with the progress bar.
- Extends the list of optimizers and schedulers so all the ones provided by PyTorch are registered by default.
- Extends the `should_training_stop` functionality to allow breaking out of the loop after a training step (see the sketch after this list).
- Adds documentation with Sphinx.
- Allows `__version__` to be directly imported from the `helios` package.
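A hedged sketch of halting via `should_training_stop`, assuming it is a hook on `Model` that returns a boolean; the base-class location, signature, and convergence check are all assumptions:

```python
# Hypothetical sketch: halt training from a Model subclass once the loss
# converges. Only the hook name should_training_stop comes from these
# notes; the base class, signature, and attributes are assumptions.
import helios.model as hlm  # import path is an assumption

class MyModel(hlm.Model):
    def should_training_stop(self) -> bool:
        # Stop once the tracked loss falls below a convergence threshold.
        return getattr(self, "last_loss", float("inf")) < 1e-5
```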
Breaking Changes
- `ToTensor` has been renamed to `ToImageTensor` in order to be more explicit about what the class does.
0.2.0
Updates
- Fixes the way epochs are numbered. This should ensure that all epoch counts are now consistent with each other regardless of training type.
- Fixes an issue where training with iterations and gradient accumulation resulted in half iterations being run after training should've stopped.
- Removes F1, recall, and precision metrics. The implementations were not generic enough to be shipped with Helios.
- Refactors the MAE implementation to make it more generic in terms of the types of tensors it accepts.
- Adds a numpy version of MAE.
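For reference, a minimal NumPy MAE matching the standard definition; the helper name below is hypothetical and not Helios's actual API:

```python
# Minimal NumPy mean-absolute-error: MAE = mean(|prediction - target|).
# The function name is hypothetical.
import numpy as np

def mae_np(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean(np.abs(pred - target)))

print(mae_np(np.array([1.0, 2.0]), np.array([1.5, 1.0])))  # 0.75
```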
0.1.9
0.1.8
Updates
- Allows easy access to the datasets held by the `DataModule`. Previously there was no direct way of accessing them without going through the private members of the `DataModule`, which complicated cases where the length of the dataset was required.
- Adds a way to halt training based on arbitrary conditions. The main use-case is to allow `Model` sub-classes to halt training when the trained network has converged to a value, or when the network is diverging and there's no reason to continue.
- Addresses a potential crash that occurs whenever training is run with a `None` checkpoint path.