Releases: ml6team/fondant
0.11.dev1
What's Changed
- Write metadata file by @RobbeSneyders in #864
- Update component directory in tag component script by @RobbeSneyders in #866
- Add pre- and post-build script to work around Poetry bug by @RobbeSneyders in #868
Full Changelog: 0.11.dev0...0.11.dev1
0.11.dev0
What's Changed
- Add docker default platform to data explorer by @mrchtr in #841
- Fix exception when invoke consumes with invalid field schema by @mrchtr in #842
- Update readme index weaviate component by @mrchtr in #843
- Create local artifact directory if it does not exist by @mrchtr in #847
- Remove RAG use case custom components by @mrchtr in #848
- Make Fondant installable via git by @RobbeSneyders in #849
- Fix hub generation with new components location by @RobbeSneyders in #851
- Move Dask Client configuration to Component class and use multi-GPU in `embed_images` component by @RobbeSneyders in #852
- Update component dir in build script by @RobbeSneyders in #856
- Make unique index sorted by @RobbeSneyders in #855
- Validate docker versions by @GeorgesLorre in #854
- Infer consume operation if not present in dataset interface by @mrchtr in #859
- Change dask_client to general setup method by @RobbeSneyders in #861
- Add `gpu` extra with `dask-cuda` and bump minimum Python version to 3.9 by @RobbeSneyders in #862
Full Changelog: 0.10.1...0.11.dev0
0.10.1
What's Changed
- Add small fixes by @GeorgesLorre in #831
- fix typos by @andres-vv in #837
- Create simple storage writer by @mrchtr in #826
- Update fondant install command in the readme by @mrchtr in #833
- Add reusable write to file component to getting started examples by @mrchtr in #839
- Fix code inspection notebook by @PhilippeMoussalli in #832
- Install only test deps before running tox by @RobbeSneyders in #840
Full Changelog: 0.10.0...0.10.1
0.10.0
🪶 Lightweight components to easily develop and iterate new components
We now support building lightweight components. This is currently the easiest way to get started with building your own custom components. Compared to containerized components, lightweight components remove the need to specify the custom files for building a component (requirements, Dockerfile, component specification).
import pandas as pd
import pyarrow as pa
from fondant.component import PandasTransformComponent
from fondant.pipeline import lightweight_component

@lightweight_component(produces={"z": pa.int32()})
class AddNumber(PandasTransformComponent):

    def __init__(self, n: int):
        self.n = n

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        dataframe["z"] = dataframe["x"].map(lambda x: x + self.n)
        return dataframe
Lightweight components are constructed by decorating a Python component class with the @lightweight_component decorator. The decorator transforms the class into a Fondant component that can be run on both local and remote runners. 🚀
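To make the decorator pattern more concrete, here is a minimal, standard-library-only sketch of how a class decorator can attach a declared output schema to a component class. It is not Fondant's implementation: `toy_lightweight_component`, `ToyComponent`, and the `produces` class attribute are hypothetical names chosen for illustration, the schema uses plain strings instead of pyarrow types, and rows are plain dicts rather than a pandas DataFrame.

```python
from typing import Callable, Dict, List, Type


class ToyComponent:
    """Hypothetical stand-in for a component base class (not Fondant's)."""

    def transform(self, rows: List[dict]) -> List[dict]:
        raise NotImplementedError


def toy_lightweight_component(
    produces: Dict[str, str],
) -> Callable[[Type[ToyComponent]], Type[ToyComponent]]:
    """Return a class decorator that records the declared output schema."""

    def decorator(cls: Type[ToyComponent]) -> Type[ToyComponent]:
        if not issubclass(cls, ToyComponent):
            raise TypeError(f"{cls.__name__} must subclass ToyComponent")
        # Store a copy of the schema on the class (assumed attribute name).
        cls.produces = dict(produces)
        return cls

    return decorator


@toy_lightweight_component(produces={"z": "int32"})
class AddNumber(ToyComponent):
    def __init__(self, n: int):
        self.n = n

    def transform(self, rows: List[dict]) -> List[dict]:
        # Add the new field "z" to every row, mirroring the example above.
        return [{**row, "z": row["x"] + self.n} for row in rows]


result = AddNumber(n=2).transform([{"x": 1}, {"x": 5}])
```

The point of the sketch is that the schema lives next to the code that produces it, which is what removes the need for a separate specification file in simple cases.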
Some of the benefits of those components are:
⏩ Reduced development efforts
Decrease the amount of work needed to develop a component. This is especially relevant for simpler components that perform simple tasks (e.g., filtering a column on a certain value).
🔄 Accelerated iterations
With the component script integrated inline within your code, the development and iteration process becomes significantly faster.
🛠️ Customization
Despite their lightweight nature, these components remain flexible. Users can still customize them as needed by incorporating extra requirements, specifying a custom image, and more.
Check out our new guide for more details.
🚥 RAG Component updates
- Added support to create embeddings using an external module instead of having to provide your own embeddings. More info here
- Enabled hybrid search and reranking in the Weaviate retrieve component
What's Changed
- Support applying Lightweight Python components in Pipeline SDK by @GeorgesLorre in #770
- Add support to run lightweight python components in docker runner by @RobbeSneyders in #786
- Enable testing index Weaviate by @PhilippeMoussalli in #790
- Cleanup and add more tests by @GeorgesLorre in #792
- Make embeddings optional in weaviate component by @PhilippeMoussalli in #791
- Integrate argument inference by @RobbeSneyders in #788
- Enable hybrid search by @PhilippeMoussalli in #794
- Enable reranking by @PhilippeMoussalli in #796
- Bump kfp version and enable python 3.11 by @GeorgesLorre in #800
- Support lightweight Python components on Sagemaker by @RobbeSneyders in #804
- Validation for lightweight components by @mrchtr in #793
- Update caching arguments to include `ComponentOp` and the `Image` class by @PhilippeMoussalli in #802
- Feature/kfp support for lightweight components by @GeorgesLorre in #803
- Add initial docs on Python Components by @PhilippeMoussalli in #812
- Add fondant base image by @mrchtr in #801
- Update getting started guide by @mrchtr in #816
- Update lightweight docs by @PhilippeMoussalli in #817
- Simplify component init interface by @PhilippeMoussalli in #819
- Enable build of fondant dev base image by @mrchtr in #818
- Simplify component naming by @GeorgesLorre in #815
- Enable write components to cache by @PhilippeMoussalli in #814
- Remove datacomp pipeline reference by @mrchtr in #822
- Fix Readme script generation by @GeorgesLorre in #821
- Fix component readme generation by @GeorgesLorre in #828
- Use fondant dev image if fondant dev version is installed by @mrchtr in #820
- Start from dataset schema for lightweight python component `consumes` by @RobbeSneyders in #789
- Update lightweight docs by @PhilippeMoussalli in #827
- Add produces to lightweight component by @PhilippeMoussalli in #829
Full Changelog: 0.9.0...0.10.0
0.10.dev0
What's Changed
- Support applying Lightweight Python components in Pipeline SDK by @GeorgesLorre in #770
- Add support to run lightweight python components in docker runner by @RobbeSneyders in #786
- Enable testing index Weaviate by @PhilippeMoussalli in #790
- Cleanup and add more tests by @GeorgesLorre in #792
- Make embeddings optional in weaviate component by @PhilippeMoussalli in #791
- Integrate argument inference by @RobbeSneyders in #788
- Enable hybrid search by @PhilippeMoussalli in #794
- Enable reranking by @PhilippeMoussalli in #796
Full Changelog: 0.9.0...0.10.dev0
0.9.0
Highlights
- 🌐 Authenticating to different cloud providers with the local runner is now simpler through the SDK.
SDK:

from fondant.pipeline.runner import DockerRunner
from fondant.core.schema import CloudCredentialsMount

runner = DockerRunner()
runner.run(
    pipeline_ref="<pipeline_ref>",
    auth_provider=CloudCredentialsMount.<GCP,AWS,AZURE>,
)

CLI:

fondant run local <pipeline_ref> \
    --auth-provider <gcp, aws, azure>
- 🔍 We made it easier to launch the data explorer in a non-blocking way. This allows you to continue working in your notebook while the explorer is running in the background.
🚥 RAG Component updates
New components 🚀
- AWS opensearch indexing component
- Component to load PDF data from local or remote storage. More info here
Updates on existing components 🛠️
- Generalizing the text chunker component to use different chunking techniques. More info here
🛠 Install it now!
pip install fondant==0.9.0
And let us know what you think!
What's Changed
- Update README.md by @CarolineAdam in #726
- Sagemaker doc small update by @PhilippeMoussalli in #733
- Add logs to weaviate by @PhilippeMoussalli in #734
- Revert broken test link by @PhilippeMoussalli in #739
- Add missing explorer screenshots by @PhilippeMoussalli in #743
- Fix retry mechanism embedding component by @PhilippeMoussalli in #736
- Minor adjustments pipeline docstring by @mrchtr in #738
- Update generic readme generation by @PhilippeMoussalli in #737
- Consistent component naming by @PhilippeMoussalli in #745
- Make data explorer non-blocking with docker compose by @PhilippeMoussalli in #731
- Include information on dynamic fields by @PhilippeMoussalli in #744
- Unify operation and component spec by @PhilippeMoussalli in #741
- Add component to index aws opensearch by @shub-kris in #740
- Add named links to hub page by @PhilippeMoussalli in #746
- Propagate dataset schema eagerly by @RobbeSneyders in #748
- Generalize chunk data component by @PhilippeMoussalli in #757
- Accept actual type instead of string representation in Argument class by @RobbeSneyders in #761
- Add env var to force amd platform on docker runner by @GeorgesLorre in #760
- Fix readme generation by @RobbeSneyders in #764
- Add component argument inference by @RobbeSneyders in #763
- Add teardown method by @PhilippeMoussalli in #767
- Add load from pdf component by @PhilippeMoussalli in #765
- Fix ragas component by @PhilippeMoussalli in #759
- Fixing data type in chunk_text component by @mrchtr in #772
- Add teardown method to components by @PhilippeMoussalli in #773
- Handle docker compose errors by @PhilippeMoussalli in #769
- Move integration test into examples by @mrchtr in #756
- Add availability check for docker and docker compose. by @mrchtr in #742
- Fixes by @PhilippeMoussalli in #776
- Move cloud auth to sdk by @PhilippeMoussalli in #779
- Fix load from pdf component by @PhilippeMoussalli in #778
- Fix typo ragas by @PhilippeMoussalli in #781
- Add RAG blogpost announcement by @mrchtr in #777
- Handle nested strings in explorer by @PhilippeMoussalli in #766
New Contributors
- @CarolineAdam made their first contribution in #726
- @shub-kris made their first contribution in #740
Full Changelog: 0.8.0...0.9.0
0.9.dev2
What's Changed
- Move cloud auth to sdk by @PhilippeMoussalli in #779
- Fix load from pdf component by @PhilippeMoussalli in #778
Full Changelog: 0.9.dev1...0.9.dev2
0.9.dev1
What's Changed
- Move integration test into examples by @mrchtr in #756
- Add availability check for docker and docker compose. by @mrchtr in #742
- Fixes by @PhilippeMoussalli in #776
Full Changelog: 0.9.dev0...0.9.dev1
0.9.dev0
What's Changed
- Update README.md by @CarolineAdam in #726
- Sagemaker doc small update by @PhilippeMoussalli in #733
- Add logs to weaviate by @PhilippeMoussalli in #734
- Revert broken test link by @PhilippeMoussalli in #739
- Add missing explorer screenshots by @PhilippeMoussalli in #743
- Fix retry mechanism embedding component by @PhilippeMoussalli in #736
- Minor adjustments pipeline docstring by @mrchtr in #738
- Update generic readme generation by @PhilippeMoussalli in #737
- Consistent component naming by @PhilippeMoussalli in #745
- Make data explorer non-blocking with docker compose by @PhilippeMoussalli in #731
- Include information on dynamic fields by @PhilippeMoussalli in #744
- Unify operation and component spec by @PhilippeMoussalli in #741
- Add component to index aws opensearch by @shub-kris in #740
- Add named links to hub page by @PhilippeMoussalli in #746
- Propagate dataset schema eagerly by @RobbeSneyders in #748
- Generalize chunk data component by @PhilippeMoussalli in #757
- Accept actual type instead of string representation in Argument class by @RobbeSneyders in #761
- Add env var to force amd platform on docker runner by @GeorgesLorre in #760
- Fix readme generation by @RobbeSneyders in #764
- Add component argument inference by @RobbeSneyders in #763
- Add teardown method by @PhilippeMoussalli in #767
- Add load from pdf component by @PhilippeMoussalli in #765
- Fix ragas component by @PhilippeMoussalli in #759
- Fixing data type in chunk_text component by @mrchtr in #772
- Add teardown method to components by @PhilippeMoussalli in #773
- Handle docker compose errors by @PhilippeMoussalli in #769
New Contributors
- @CarolineAdam made their first contribution in #726
- @shub-kris made their first contribution in #740
Full Changelog: 0.8.0...0.9.dev0
0.8.0
Highlights
- 📝 We simplified and improved the way datasets are stored and accessed
- 🚀 The interface to compose a Fondant pipeline is now simpler and more powerful
- 🌐 AWS SageMaker is now supported as an execution framework for Fondant pipelines
- 🔍 The Fondant explorer was improved, especially for text and document data
- 📚 We released a RAG tuning repository powered by Fondant
Read on for more details!
📝 We simplified and improved the way datasets are stored and accessed
We listened to all your feedback and drastically simplified Fondant datasets, while solving some longstanding issues as part of the design.
Most important for you is that we flattened the datasets, removing the concept of subsets from Fondant, which means you can now access the data fields directly!
Component specification, previous:

consumes:
  images:
    fields:
      height:
        type: int32
      width:
        type: int32

Component specification, new ✨:

consumes:
  height:
    type: int32
  width:
    type: int32

Component code, previous:

import pandas as pd
from fondant.component import PandasTransformComponent

class ExampleComponent(PandasTransformComponent):

    def transform(self, dataframe: pd.DataFrame):
        height = dataframe["images"]["height"]
        width = dataframe["images"]["width"]
        ...

Component code, new ✨:

import pandas as pd
from fondant.component import PandasTransformComponent

class ExampleComponent(PandasTransformComponent):

    def transform(self, dataframe: pd.DataFrame):
        height = dataframe["height"]
        width = dataframe["width"]
        ...
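As a rough illustration of what the flattening means for an existing specification, the sketch below converts the old nested `consumes` layout into the new flat one. It assumes the specification has already been parsed into plain dictionaries. `flatten_consumes` is a hypothetical helper and not part of Fondant, and raising an error when two subsets share a field name is an assumption made here for safety, not documented Fondant behaviour.

```python
from typing import Any, Dict


def flatten_consumes(old_consumes: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Flatten {subset: {"fields": {field: spec}}} into {field: spec}."""
    flat: Dict[str, Any] = {}
    for subset_name, subset in old_consumes.items():
        for field_name, field_spec in subset.get("fields", {}).items():
            if field_name in flat:
                # Assumed policy: refuse to silently overwrite a field.
                raise ValueError(
                    f"Field {field_name!r} (subset {subset_name!r}) appears "
                    "in more than one subset; rename it before flattening"
                )
            flat[field_name] = field_spec
    return flat


old_spec = {
    "images": {
        "fields": {
            "height": {"type": "int32"},
            "width": {"type": "int32"},
        }
    }
}
new_spec = flatten_consumes(old_spec)
```

Because subset names disappear in the flat layout, fields that previously shared a name across subsets need distinct names after migration.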
🚀 The interface to compose a Fondant pipeline is now simpler and more powerful.
You can now chain components together using the `read()`, `apply()` and `write()` methods, removing the need for specifying dependencies separately, making composing pipelines a breeze.
Previous:

from fondant.pipeline import ComponentOp, Pipeline

pipeline = Pipeline(
    pipeline_name="my-pipeline",
    base_path="./data",
)

load_from_hf_hub = ComponentOp.from_registry(
    name="load_from_hf_hub",
    arguments={
        "dataset_name": "fondant-ai/fondant-cc-25m",
    },
)

download_images = ComponentOp.from_registry(
    name="download_images",
    arguments={"resize_mode": "no"},
)

pipeline.add_op(load_from_hf_hub)
pipeline.add_op(
    download_images,
    dependencies=[load_from_hf_hub],
)

New ✨:

import pyarrow as pa
from fondant.pipeline import Pipeline

pipeline = Pipeline(
    name="my-pipeline",
    base_path="./data",
)

raw_data = pipeline.read(
    "load_from_hf_hub",
    arguments={
        "dataset_name": "fondant-ai/fondant-cc-25m",
    },
    produces={
        "alt_text": pa.string(),
        "image_url": pa.string(),
        "license_type": pa.string(),
    },
)

images = raw_data.apply(
    "download_images",
    arguments={"resize_mode": "no"},
)
Some of the benefits of this new interface are:
- Support for overriding the produces and consumes of a component, allowing you to easily change the output of a component without having to create a custom `fondant_component.yaml` file.
- We unlock the future ability to enable eager execution of components and interactive development of pipelines. Keep an eye on our next releases!
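To show why chaining removes the need to list dependencies by hand, here is a toy, standard-library-only sketch in which each `apply()` or `write()` call infers its upstream step from the dataset it is called on. `ToyPipeline` and `ToyDataset` are hypothetical classes written for this illustration and do not reflect how Fondant implements its pipeline graph. The component names are just strings here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyPipeline:
    name: str
    # Maps each component name to the names of the components it depends on.
    graph: Dict[str, List[str]] = field(default_factory=dict)

    def read(self, component: str) -> "ToyDataset":
        self.graph[component] = []  # a read step has no upstream dependency
        return ToyDataset(pipeline=self, produced_by=component)


@dataclass
class ToyDataset:
    pipeline: ToyPipeline
    produced_by: str

    def apply(self, component: str) -> "ToyDataset":
        # The dependency is inferred from the dataset the method is called on.
        self.pipeline.graph[component] = [self.produced_by]
        return ToyDataset(pipeline=self.pipeline, produced_by=component)

    def write(self, component: str) -> None:
        # A write step ends the chain, so it returns nothing to chain on.
        self.pipeline.graph[component] = [self.produced_by]


pipeline = ToyPipeline(name="my-pipeline")
raw_data = pipeline.read("load_from_hf_hub")
images = raw_data.apply("download_images")
images.write("write_to_file")
```

After these calls, `pipeline.graph` records that `download_images` depends on `load_from_hf_hub` and `write_to_file` depends on `download_images`, without any explicit `dependencies=` argument.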
If you want to know more or get started, check out the documentation.
🌐 AWS SageMaker is now supported as an execution framework for Fondant pipelines.
You can now easily run your Fondant pipelines on AWS SageMaker using the `fondant run sagemaker <pipeline.py>` command. Run `fondant run sagemaker --help` to see the possible configuration options or check out the documentation.
🔍 Fondant explorer improvements
We added a lot of improvements to the Fondant explorer, including:
- A pipeline overview showing the data flow through the pipeline
- A document viewer to inspect data (handy for RAG use cases)
- Better filtering, sorting and searching of data while exploring
To get started with the Fondant explorer, check out the documentation.
📚 We released a RAG tuning repository powered by Fondant
This repository helps you tune your RAG system faster and achieve better performance using
Fondant. Find the repository including a full explanation here.
It includes:
- A Fondant pipeline to ingest the data
- A Fondant pipeline to evaluate the data
- Multiple notebooks to go from a basic RAG pipeline to fully auto-tuned RAG pipelines
🔧 New reusable RAG components
A lot of new reusable components were added to the Fondant registry, letting you build new RAG
pipelines quickly!
- Weaviate indexing and retrieval components
- Qdrant indexing
- Ragas evaluation
- LlamaHub loading
- LangChain chunking and embedding
You can see some of these components in action in the [RAG tuning repository](https://github.com/ml6team/fondant-usecase-RAG).
🛠 Install it now!
pip install fondant==0.8.0
And let us know what you think!
Detailed changes
- Update fondant_component.yaml by @Hakimovich99 in #647
- feat: Qdrant support by @Anush008 in #646
- Feature/sagemaker compiler by @GeorgesLorre in #662
- Restructure data explorer by @PhilippeMoussalli in #657
- Feature/sagemaker runner by @GeorgesLorre in #664
- Add document viewer to dataset explorer by @PhilippeMoussalli in #666
- Fix cli creds by @PhilippeMoussalli in #669
- Redesign dataset format by @RobbeSneyders in #672
- Explorer front page by @PhilippeMoussalli in #671
- Regenerate qdrant readme by @RobbeSneyders in #673
- Augment DockerRunner to support running from a fondant Pipeline by @GeorgesLorre in #651
- Update tag pattern in prep-release pipeline to match dev versions by @RobbeSneyders in #674
- Fix output dataframe path by @RobbeSneyders in #675
- Fix column names in chunk_text component by @RobbeSneyders in #676
- Set default explorer version to current Fondant version by @RobbeSneyders in #681
- Augment SagemakerRunner to support running from pipeline objects by @GeorgesLorre in #678
- Hide partitions from users by @PhilippeMoussalli in #677
- Explorer new dataset format by @PhilippeMoussalli in #682
- Use cleaner field names in reusable components by @RobbeSneyders in #679
- Add cli commands for sagemaker by @GeorgesLorre in #680
- Explorer search by @PhilippeMoussalli in #691
- Feature/build 2 ecr by @GeorgesLorre in #686
- bugfix old getting started link that had 404 error by @NSFF in #694
- Explorer improve filtering of available runs by @PhilippeMoussalli in #693
- Set default explorer version in python sdk by @RobbeSneyders in #692
- Compile absolute path for custom components by @RobbeSneyders in #696
- Build to AWS ECR on release by @RobbeSneyders in #698
- Add functionality for pullthrough cache rule creation and URI patching by @GeorgesLorre in #697
- Support pipeline factory functions as CLI reference by @RobbeSneyders in #699
- Add logic to handle custom components by @GeorgesLorre in #700
- Move to datasets & apply interface by @r...