🚀 Streaming v0.7.2
Streaming v0.7.2 is released! Install via pip:
pip install --upgrade mosaicml-streaming==0.7.2
💎 New Features
1. Canned ACL Support (#512)
Added support for S3 canned ACLs through the environment variable S3_CANNED_ACL when using AWS S3. Check out the AWS Canned ACL documentation for how to use it.
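To show what the variable controls, here is a minimal sketch. It is not Streaming's uploader code. The helper name `s3_upload_extra_args` is hypothetical. The sketch assumes the value is forwarded as the `ACL` entry of boto3's `ExtraArgs`, which is how boto3's `upload_file` accepts a canned ACL. The validation against the AWS list of canned ACL names is an extra safeguard added only for this sketch.

```python
import os
from typing import Dict

# Canned ACL names as listed in the AWS S3 documentation.
VALID_CANNED_ACLS = {
    'private',
    'public-read',
    'public-read-write',
    'aws-exec-read',
    'authenticated-read',
    'bucket-owner-read',
    'bucket-owner-full-control',
    'log-delivery-write',
}


def s3_upload_extra_args() -> Dict[str, str]:
    """Hypothetical helper: map S3_CANNED_ACL onto boto3-style upload ExtraArgs."""
    acl = os.environ.get('S3_CANNED_ACL')
    if not acl:
        # Variable unset or empty: send no ACL so the bucket's defaults apply.
        return {}
    if acl not in VALID_CANNED_ACLS:
        raise ValueError(
            f'Unknown canned ACL {acl!r}; expected one of {sorted(VALID_CANNED_ACLS)}')
    return {'ACL': acl}


# Usage: set the variable in the job's environment before any upload happens.
os.environ['S3_CANNED_ACL'] = 'bucket-owner-full-control'
print(s3_upload_extra_args())  # {'ACL': 'bucket-owner-full-control'}
```

A typical case is `bucket-owner-full-control`, which applies when writing into a bucket owned by another AWS account.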
2. Allow/reject datasets containing unsafe types (#519)
The pickle serialization format, one of the available MDS encodings, is a potential security vulnerability, since unpickling untrusted data can execute arbitrary code. We added a boolean flag, allow_unsafe_types, to the StreamingDataset class to allow or reject datasets containing pickle-encoded data.
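In user code, opting in means passing `allow_unsafe_types=True` to `StreamingDataset`. The check behind the flag can be sketched as below. This is not Streaming's implementation. The function name `check_column_encodings` is hypothetical. The sketch assumes each MDS shard record lists per-column encoding names, with `'pkl'` as the pickle encoding.

```python
import json
from typing import Iterable

# Encodings treated as unsafe in this sketch: pickle can execute arbitrary code on load.
UNSAFE_ENCODINGS = {'pkl'}


def check_column_encodings(column_encodings: Iterable[str], allow_unsafe_types: bool) -> None:
    """Hypothetical check: reject pickle-encoded columns unless explicitly allowed."""
    unsafe = sorted(set(column_encodings) & UNSAFE_ENCODINGS)
    if unsafe and not allow_unsafe_types:
        raise ValueError(
            f'Dataset uses unsafe encodings {unsafe}; pass allow_unsafe_types=True '
            'only if you trust the data source.')


# A shard record shaped like an MDS index entry, trimmed to the relevant fields.
shard = json.loads('{"column_names": ["label", "meta"], "column_encodings": ["int", "pkl"]}')
check_column_encodings(shard['column_encodings'], allow_unsafe_types=True)  # passes: explicit opt-in
```

The sketch rejects by default and requires an explicit opt-in, so a dataset from an untrusted source cannot trigger unpickling without a visible decision in the training code.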
🐛 Bug Fixes
- Retrieve batch size correctly from vision yamls for the streaming simulator (#501)
- Fix for CVE-2023-47248 (#504)
- Streaming simulator bug fixes (proportion, repeat, yaml ingestion) (#514)
  - Proportion of None instead of a string 'None' is now handled correctly.
  - Repeat of None instead of a string 'None' is now handled correctly.
  - Added warning for StreamingDataset subclass defaults
- Fix sample partitioning algorithm bug for tiny datasets (#517)
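The proportion/repeat items in the simulator fix (#514) are easy to hit with YAML input. In YAML, a bare `None` is not a null value (only `null`, `~`, or an empty value are), so a loader such as PyYAML reads `proportion: None` as the string `'None'`. The following minimal sketch treats both forms the same way. The helper `normalize_optional_number` is hypothetical and is not the simulator's code.

```python
from typing import Optional, Union


def normalize_optional_number(value: Union[None, str, int, float]) -> Optional[float]:
    """Hypothetical helper: treat None and the string 'None' identically."""
    if value is None or (isinstance(value, str) and value.strip() == 'None'):
        return None
    # Anything else must be numeric or a numeric string such as '0.5'.
    return float(value)


# What a YAML loader would hand back for "proportion: None" and "repeat: 2".
stream_cfg = {'proportion': 'None', 'repeat': 2}
proportion = normalize_optional_number(stream_cfg.get('proportion'))
repeat = normalize_optional_number(stream_cfg.get('repeat'))
print(proportion, repeat)  # None 2.0
```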
🔧 Improvements
- Added warning messages for new streaming dataset defaults to inform users about the old and new values. (#502)
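A common pattern for warnings like these is a sentinel default. The sentinel lets the code tell "argument not passed" apart from an explicit value, so only users who silently inherit a changed default get warned. The sketch below is an assumption about the approach, not Streaming's code. `resolve_default`, `make_dataset`, `example_param`, and the `'old'`/`'new'` values are all hypothetical placeholders, not real StreamingDataset parameters or defaults.

```python
import warnings
from typing import Any

_UNSET = object()  # sentinel: distinguishes "not passed" from any explicit value


def resolve_default(name: str, value: Any, old_default: Any, new_default: Any) -> Any:
    """Hypothetical helper: warn when a parameter silently picks up a changed default."""
    if value is _UNSET:
        warnings.warn(
            f'{name} was not specified; the default changed from {old_default!r} '
            f'to {new_default!r}. Set it explicitly to silence this warning.',
            UserWarning, stacklevel=2)
        return new_default
    return value


def make_dataset(example_param: Any = _UNSET) -> Any:
    # Placeholder parameter and values, for illustration only.
    return resolve_default('example_param', example_param, 'old', 'new')


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    result = make_dataset()
print(result, len(caught))  # new 1
```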
What's Changed
- Migrate pydocstyle to ruff by @Skylion007 in #500
- Bump fastapi from 0.104.0 to 0.104.1 by @dependabot in #496
- Bump uvicorn from 0.23.2 to 0.24.0.post1 by @dependabot in #497
- Retrieve batch size correctly from vision yamls for simulator by @snarayan21 in #501
- Adding warning messages for new defaults by @snarayan21 in #502
- Fix for CVE-2023-47248 by @bandish-shah in #504
- Bump pydantic from 2.4.2 to 2.5.2 by @dependabot in #513
- Bump yamllint from 1.32.0 to 1.33.0 by @dependabot in #506
- Fixed comments and update dataframe_to_MDS API signature by @karan6181 in #515
- Simulator bug fixes (proportion, repeat, yaml ingestion) by @snarayan21 in #514
- Add support for the Canned ACL environment variable for AWS S3 by @karan6181 in #512
- Fixed bugs when trying to use very small datasets by @snarayan21 in #517
- Bump databricks-sdk from 0.8.0 to 0.14.0 by @dependabot in #518
- Add flag to allow or reject datasets containing unsafe types (i.e., Pickle) by @knighton in #519
- improve exception error messages for downloading by @Skylion007 in #525
- doc: add NDArray format by @OrenLeung in #527
- Offload exception to mds_write. by @XiaohanZhangCMU in #528
- Add allow_unsafe_types parameter to the streaming regression tests by @karan6181 in #531
- Bump version to 0.7.2 by @karan6181 in #532
New Contributors
- @OrenLeung made their first contribution in #527
Full Changelog: v0.7.1...v0.7.2