
Releases: ml6team/fondant

0.11.dev1

20 Feb 21:42
a172ab3
0.11.dev1 Pre-release

What's Changed

Full Changelog: 0.11.dev0...0.11.dev1

0.11.dev0

20 Feb 13:38
4a18696
0.11.dev0 Pre-release

What's Changed

Full Changelog: 0.10.1...0.11.dev0

0.10.1

05 Feb 13:27
36d0b2a

What's Changed

Full Changelog: 0.10.0...0.10.1

0.10.0

31 Jan 13:26
c009879

🪶 Lightweight components to easily develop and iterate new components

We now support building lightweight components. This is currently the easiest way to get started with building your own custom components. Compared to containerized components, lightweight components remove the need to specify custom files (requirements, Dockerfile, component specification).

import pandas as pd
import pyarrow as pa
from fondant.component import PandasTransformComponent
from fondant.pipeline import lightweight_component

@lightweight_component(produces={"z": pa.int32()})
class AddNumber(PandasTransformComponent):
    def __init__(self, n: int):
        self.n = n

    def transform(self, dataframe: pd.DataFrame) -> pd.DataFrame:
        dataframe["z"] = dataframe["x"].map(lambda x: x + self.n)
        return dataframe

Lightweight components are constructed by decorating a Python component class with the @lightweight_component decorator, as in the example above. The decorator turns the class into a Fondant component that can run on both local and remote runners. 🚀
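As a rough mental model only, the sketch below mimics the shape of the example in plain Python: a decorator records the declared output fields on the class, and a small runner checks that the transform actually produced them. This is not Fondant's implementation. `toy_lightweight_component`, `run_component`, and the row-of-dicts data format are hypothetical stand-ins so the snippet runs without Fondant, pandas, or pyarrow installed.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def toy_lightweight_component(produces: Dict[str, str]) -> Callable[[type], type]:
    """Hypothetical stand-in for a component decorator.

    It only records the declared output fields on the decorated class.
    """

    def decorate(cls: type) -> type:
        cls.produces = dict(produces)
        return cls

    return decorate


@toy_lightweight_component(produces={"z": "int32"})
class AddNumber:
    def __init__(self, n: int):
        self.n = n

    def transform(self, rows: List[Row]) -> List[Row]:
        # Same logic as the DataFrame version: z = x + n for every row.
        return [{**row, "z": row["x"] + self.n} for row in rows]


def run_component(component: Any, rows: List[Row]) -> List[Row]:
    """Run the transform and verify every declared output field is present."""
    output = component.transform(rows)
    for field_name in component.produces:
        if any(field_name not in row for row in output):
            raise ValueError(f"Declared field {field_name!r} missing from output")
    return output


result = run_component(AddNumber(n=2), [{"x": 1}, {"x": 5}])
```

The point of the sketch is the contract: the class declares what it produces up front, and everything else lives in ordinary Python code next to the pipeline definition.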

Some of the benefits of those components are:

⏩ Reduced development efforts

Lightweight components decrease the amount of work needed to develop a component. This is especially relevant for components that perform simple tasks (e.g., filtering a column on a certain value).

🔄 Accelerated iterations

With the component script integrated inline within your code, the development and iteration process becomes significantly faster.

🛠️ Customization

Despite their lightweight nature, these components remain flexible. Users can still customize them as needed by incorporating extra requirements, specifying a custom image, and more.

Check out our new guide for more details.

🚥 RAG Component updates

  • Added support to create embeddings using an external module instead of having to provide your own embeddings. More info here
  • Enabled hybrid search and reranking in the Weaviate retrieve component

What's Changed

Full Changelog: 0.9.0...0.10.0

0.10.dev0

22 Jan 10:21
72d6822
0.10.dev0 Pre-release

What's Changed

Full Changelog: 0.9.0...0.10.dev0

0.9.0

16 Jan 09:51
410c3f6

Highlights

  • 🌐 Authenticating to different cloud providers with the Local Runner is now simpler, through either the SDK or the CLI.

SDK:
from fondant.pipeline.runner import DockerRunner
from fondant.core.schema import CloudCredentialsMount

runner = DockerRunner()
runner.run(
    pipeline_ref="<pipeline_ref>",
    # Pick one of CloudCredentialsMount.GCP, .AWS or .AZURE
    auth_provider=CloudCredentialsMount.GCP,
)

CLI:
fondant run local <pipeline_ref> \
      --auth-provider <gcp, aws, azure>

  • 🔍 We made it easier to launch the data explorer in a non-blocking way. This allows you to continue working in your notebook while the explorer is running in the background.

🚥 RAG Component updates

New components 🚀

  • AWS OpenSearch indexing component
  • Component to load PDF data from local or remote storage. More info here

Updates on existing components 🛠️

  • Generalizing the text chunker component to use different chunking techniques. More info here
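
To illustrate what "different chunking techniques" can mean in practice, here is a minimal, library-free sketch of a chunker that dispatches on a strategy name. It is not the Fondant component's code or interface. The function names, the strategy names, and the parameters (`chunk_size`, `overlap`) are assumptions made for this example.

```python
import re
from typing import Callable, Dict, List


def chunk_by_characters(text: str, chunk_size: int = 20, overlap: int = 5) -> List[str]:
    """Fixed-size character windows, each sharing `overlap` characters with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


def chunk_by_sentences(text: str) -> List[str]:
    """Naive sentence split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


# Hypothetical registry mapping a strategy name to its chunking function.
CHUNKERS: Dict[str, Callable[..., List[str]]] = {
    "characters": chunk_by_characters,
    "sentences": chunk_by_sentences,
}


def chunk_text(text: str, strategy: str, **kwargs) -> List[str]:
    """Look up the requested strategy and apply it to the text."""
    if strategy not in CHUNKERS:
        raise ValueError(f"Unknown chunking strategy: {strategy!r}")
    return CHUNKERS[strategy](text, **kwargs)


windows = chunk_text("abcdefghij", "characters", chunk_size=4, overlap=2)
sentences = chunk_text("One. Two! Three?", "sentences")
```

Keeping the strategies behind one entry point means a pipeline can switch techniques by changing an argument rather than swapping components.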

🛠 Install it now!

pip install fondant==0.9.0

And let us know what you think!

What's Changed

New Contributors

Full Changelog: 0.8.0...0.9.0

0.9.dev2

15 Jan 11:44
2738aad
Compare
Choose a tag to compare
0.9.dev2 Pre-release
Pre-release

What's Changed

Full Changelog: 0.9.dev1...0.9.dev2

0.9.dev1

12 Jan 12:37
255548d
Compare
Choose a tag to compare
0.9.dev1 Pre-release
Pre-release

What's Changed

Full Changelog: 0.9.dev0...0.9.dev1

0.9.dev0

11 Jan 15:09
16888c8
Compare
Choose a tag to compare
0.9.dev0 Pre-release
Pre-release

What's Changed

New Contributors

Full Changelog: 0.8.0...0.9.dev0

0.8.0

13 Dec 19:53
f2b1b01

Highlights

  • 📝 We simplified and improved the way datasets are stored and accessed
  • 🚀 The interface to compose a Fondant pipeline is now simpler and more powerful
  • 🌐 AWS SageMaker is now supported as an execution framework for Fondant pipelines
  • 🔍 The Fondant explorer was improved, especially for text and document data
  • 📚 We released a RAG tuning repository powered by Fondant

Read on for more details!

📝 We simplified and improved the way datasets are stored and accessed

We listened to all your feedback and drastically simplified Fondant datasets, while solving some
longstanding issues as part of the design.

Most important for you is that we flattened the datasets, removing the concept of subsets from
Fondant, which means you can now access the data fields directly!

Previous (component specification):
consumes:
  images:
    fields:
      height:
        type: int32
      width:
        type: int32

New ✨ (component specification):
consumes:
  height:
    type: int32
  width:
    type: int32

Previous (component code):
import pandas as pd
from fondant.component import PandasTransformComponent


class ExampleComponent(PandasTransformComponent):

    def transform(self, dataframe: pd.DataFrame):
        # Fields were nested under their subset ("images")
        height = dataframe["images"]["height"]
        width = dataframe["images"]["width"]
        ...

New ✨ (component code):
import pandas as pd
from fondant.component import PandasTransformComponent


class ExampleComponent(PandasTransformComponent):

    def transform(self, dataframe: pd.DataFrame):
        # Fields are now top-level columns
        height = dataframe["height"]
        width = dataframe["width"]
        ...
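
For readers migrating existing specifications, the change from the nested to the flat `consumes` layout can be sketched as a small dictionary transformation. `flatten_consumes` is a hypothetical helper written for this note, not part of Fondant. The release notes do not say how name clashes between subsets should be handled, so this sketch assumes the safest option and raises an error.

```python
from typing import Any, Dict


def flatten_consumes(nested: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Turn {subset: {"fields": {name: spec}}} into {name: spec}.

    Hypothetical migration helper. Raises if two subsets define the same
    field name, since the flat layout has no subset prefix to tell them apart.
    """
    flat: Dict[str, Any] = {}
    for subset_name, subset in nested.items():
        for field_name, field_spec in subset.get("fields", {}).items():
            if field_name in flat:
                raise ValueError(
                    f"Field {field_name!r} from subset {subset_name!r} "
                    "clashes with a field from another subset"
                )
            flat[field_name] = field_spec
    return flat


previous_consumes = {
    "images": {
        "fields": {
            "height": {"type": "int32"},
            "width": {"type": "int32"},
        }
    }
}

new_consumes = flatten_consumes(previous_consumes)
```

Here `new_consumes` has the same shape as the new specification shown above: `height` and `width` sit at the top level, each with its `type`.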

🚀 The interface to compose a Fondant pipeline is now simpler and more powerful

You can now chain components together using the read(), apply() and write() methods. This removes
the need to specify dependencies separately and makes composing pipelines a breeze.

Previous:
from fondant.pipeline import ComponentOp, Pipeline

pipeline = Pipeline(
    pipeline_name="my-pipeline",
    base_path="./data",
)

load_from_hf_hub = ComponentOp(
    name="load_from_hf_hub",
    arguments={
        "dataset_name": "fondant-ai/fondant-cc-25m",
    },
)

download_images = ComponentOp.from_registry(
    name="download_images",
    arguments={"resize_mode": "no"},
)

# Dependencies had to be declared explicitly for every operation
pipeline.add_op(load_from_hf_hub)
pipeline.add_op(
    download_images,
    dependencies=[load_from_hf_hub],
)

New ✨:
import pyarrow as pa
from fondant.pipeline import Pipeline

pipeline = Pipeline(
    name="my-pipeline",
    base_path="./data",
)

raw_data = pipeline.read(
    "load_from_hf_hub",
    arguments={
        "dataset_name": "fondant-ai/fondant-cc-25m",
    },
    produces={
        "alt_text": pa.string(),
        "image_url": pa.string(),
        "license_type": pa.string(),
    },
)

images = raw_data.apply(
    "download_images",
    arguments={"resize_mode": "no"},
)
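
Why chaining removes the need for explicit dependencies can be shown with a toy model: each call returns a handle that remembers which component produced it, so the next call can record the dependency itself. The classes below are illustrative only and are not Fondant's classes. `ToyPipeline`, `ToyDataset`, and the `write_to_file` component name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyPipeline:
    name: str
    # Maps each component name to the components it depends on.
    graph: Dict[str, List[str]] = field(default_factory=dict)

    def read(self, component: str) -> "ToyDataset":
        # A reading component starts the chain, so it has no dependencies.
        self.graph[component] = []
        return ToyDataset(pipeline=self, produced_by=component)


@dataclass
class ToyDataset:
    pipeline: ToyPipeline
    produced_by: str

    def apply(self, component: str) -> "ToyDataset":
        # The dependency is implied by the dataset the method is called on.
        self.pipeline.graph[component] = [self.produced_by]
        return ToyDataset(pipeline=self.pipeline, produced_by=component)

    def write(self, component: str) -> None:
        # A writing component ends the chain and returns nothing.
        self.pipeline.graph[component] = [self.produced_by]


toy_pipeline = ToyPipeline(name="my-pipeline")
raw_data = toy_pipeline.read("load_from_hf_hub")
images = raw_data.apply("download_images")
images.write("write_to_file")  # hypothetical component name
```

After these three calls, `toy_pipeline.graph` holds the same dependency information that previously had to be passed through `dependencies=[...]`.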

Some of the benefits of this new interface are:

  • Support for overriding the produces and consumes of a component, allowing you to easily change the output of a component without having to create a custom fondant_component.yaml file.
  • It paves the way for eager execution of components and interactive
    development of pipelines. Keep an eye on our next releases!

If you want to know more or get started, you can check out the documentation.

🌐 AWS SageMaker is now supported as an execution framework for Fondant pipelines

You can now easily run your Fondant pipelines on AWS SageMaker using the fondant run sagemaker <pipeline.py> command. Run fondant run sagemaker --help to see the possible configuration options or check out the documentation.

🔍 Fondant explorer improvements

We added a lot of improvements to the Fondant explorer, including:

  • A pipeline overview showing the data flow through the pipeline
  • A document viewer to inspect data (handy for RAG use cases)
  • Better filtering, sorting and searching of data while exploring

To get started with the Fondant explorer, check out the documentation.

📚 We released a RAG tuning repository powered by Fondant

This repository helps you tune your RAG system faster and achieve better performance using
Fondant. Find the repository including a full explanation here.

It includes:

  • A Fondant pipeline to ingest the data
  • A Fondant pipeline to evaluate the data
  • Multiple notebooks to go from a basic RAG pipeline to fully auto-tuned RAG pipelines

🔧 New reusable RAG components

A lot of new reusable components were added to the Fondant registry, letting you build new RAG
pipelines quickly!

You can see some of these components in action in the [RAG tuning repository](https://github.com/ml6team/fondant-usecase-RAG).

🛠 Install it now!

pip install fondant==0.8.0

And let us know what you think!

Detailed changes
