v0.8.0
✨ What's New ✨
1. HF File System Streaming (#711)
Streaming can now stream data from the Hugging Face (HF) file system, which adds another popular backend as an option for hosting your data.
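As a rough illustration of what this enables, the sketch below points a `StreamingDataset` at an HF dataset repo. It assumes the remote path uses the `hf://datasets/<repo_id>/...` form from `huggingface_hub`, and that the repo already contains MDS shards. The helper names `hf_remote` and `open_hf_dataset` and the repo name are hypothetical, not part of the library.

```python
def hf_remote(repo_id: str, subdir: str = "") -> str:
    """Build an hf:// URI for a dataset repo.

    Assumption: the path layout follows the huggingface_hub convention
    hf://datasets/<repo_id>/<subdir>.
    """
    parts = ["datasets", repo_id.strip("/")]
    if subdir:
        parts.append(subdir.strip("/"))
    return "hf://" + "/".join(parts)


def open_hf_dataset(repo_id: str, local: str, subdir: str = "", batch_size: int = 32):
    """Create a StreamingDataset whose remote is an HF dataset repo (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without the package installed.
    from streaming import StreamingDataset  # needs mosaicml-streaming >= 0.8.0

    return StreamingDataset(
        remote=hf_remote(repo_id, subdir),  # where the shards are hosted
        local=local,                        # local cache directory for downloaded shards
        batch_size=batch_size,
    )
```

A call might look like `open_hf_dataset("my-org/my-mds-dataset", "/tmp/mds-cache", subdir="train")`, where the repo name is a placeholder. Private repos would presumably also need Hugging Face credentials configured in the environment.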
What's Changed
- Bump fastapi from 0.110.2 to 0.111.0 by @dependabot in #670
- Fix: having zero bytes files after converting spark dataframe to MDS saved on dbfs:/Volumes by @XiaohanZhangCMU in #668
- Ensure shards cannot be larger than 4GB by @snarayan21 in #672
- Helpful error on `py1e` for improperly written datasets by @snarayan21 in #673
- Bump pytest from 8.2.0 to 8.2.1 by @dependabot in #680
- Update platform references by @aspfohl in #675
- Update CODEOWNERS by @karan6181 in #681
- Fix `batch_size` typo for `Stream` object in docs by @snarayan21 in #682
- Bump databricks-sdk from 0.27.0 to 0.27.1 by @dependabot in #679
- Improve local temp directory error when only `remote` is specified by @snarayan21 in #683
- Fix node calculation in `replication` for `World` object by @snarayan21 in #685
- Warning condition changed for Sequence Parallelism by @XiaohanZhangCMU in #688
- Bump pydantic from 2.7.1 to 2.7.2 by @dependabot in #692
- Bump uvicorn from 0.29.0 to 0.30.1 by @dependabot in #691
- Make sure epoch_size is an int by @snarayan21 in #693
- Bump databricks-sdk from 0.27.1 to 0.28.0 by @dependabot in #687
- Bump pytest from 8.2.1 to 8.2.2 by @dependabot in #697
- fix: expand user path for Writer's output directory. by @huxuan in #694
- Bump pydantic from 2.7.2 to 2.7.3 by @dependabot in #696
- Fix edge cases with scalar or empty numpy array encoding by @snarayan21 in #702
- Raise IndexError in `Spanner` object instead of `ValueError` by @snarayan21 in #701
- Fix linting issues with numpy 2 by @snarayan21 in #705
- Bump pydantic from 2.7.3 to 2.7.4 by @dependabot in #704
- Enable correct resumption from the end of an epoch by @snarayan21 in #700
- Fix `drop_first` checking in partitioning to account for `world_size` divisibility by @snarayan21 in #706
- fix convert imagenet by @Hprairie in #708
- Bump pytest-split from 0.8.2 to 0.9.0 by @dependabot in #710
- Remove duplicate `dbfs:` prefix from error message by @vanshcsingh in #712
- enable adaptive retry for s3 download by @bigning in #713
- Upgrade ci_testing, remove codeql by @snarayan21 in #714
- Fix Linting from Pillow version update by @XiaohanZhangCMU in #719
- Bump pydantic from 2.7.4 to 2.8.2 by @dependabot in #718
- Bump databricks-sdk from 0.28.0 to 0.29.0 by @dependabot in #715
- Add HF File System Support to Streaming by @orionw in #711
- Improve error message on non-0 rank when index file download failed by @bigning in #723
- Bump pytest from 8.2.2 to 8.3.2 by @dependabot in #735
- Bump uvicorn from 0.30.1 to 0.30.3 by @dependabot in #730
- Bump fastapi from 0.111.0 to 0.111.1 by @dependabot in #724
- Bump Streaming Version to 0.8.0 by @mvpatel2000 in #738
New Contributors
- @aspfohl made their first contribution in #675
- @huxuan made their first contribution in #694
- @Hprairie made their first contribution in #708
- @vanshcsingh made their first contribution in #712
- @orionw made their first contribution in #711
Full Changelog: v0.7.6...v0.8.0