diff --git a/docs/art/announcements/RAG.png b/docs/art/announcements/RAG.png
new file mode 100644
index 000000000..64ef5cdda
Binary files /dev/null and b/docs/art/announcements/RAG.png differ
diff --git a/docs/art/data_explorer/explorer_document.png b/docs/art/data_explorer/explorer_document.png
new file mode 100644
index 000000000..9dcfc8a13
Binary files /dev/null and b/docs/art/data_explorer/explorer_document.png differ
diff --git a/docs/art/runners/sagemaker_run.png b/docs/art/runners/sagemaker_run.png
new file mode 100644
index 000000000..cbb1e5d6f
Binary files /dev/null and b/docs/art/runners/sagemaker_run.png differ
diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
index 4a81e59b1..000e89a46 100644
--- a/docs/blog/.authors.yml
+++ b/docs/blog/.authors.yml
@@ -7,3 +7,7 @@ authors:
     name: Matthias Richter
     description: ML Engineer
     avatar: https://avatars.githubusercontent.com/u/15777729
+  GeorgesLorre:
+    name: Georges Lorré
+    description: Data Engineer
+    avatar: https://avatars.githubusercontent.com/u/35808396
diff --git a/docs/blog/posts/2023-12-13|Fondant_0.8_Interface.md b/docs/blog/posts/2023-12-13|Fondant_0.8_Interface.md
new file mode 100644
index 000000000..ee3f10b95
--- /dev/null
+++ b/docs/blog/posts/2023-12-13|Fondant_0.8_Interface.md
@@ -0,0 +1,235 @@
+---
+date:
+  created: 2023-12-13
+authors:
+  - GeorgesLorre
+  - RobbeSneyders
+---
+
+# Fondant 0.8: Simplification, Sagemaker, RAG, and more!
+
+Hi all, we released Fondant 0.8, which brings some major new features and improvements:
+
+* 📝 We simplified and improved the way datasets are stored and accessed
+* 🚀 The interface to compose a Fondant pipeline is now simpler and more powerful
+* 🌐 AWS SageMaker is now supported as an execution framework for Fondant pipelines
+* 🔍 The Fondant explorer was improved, especially for text and document data
+* 📚 We released a RAG tuning repository powered by Fondant
+
+Read on for more details!
+
+
+
+## 📝 We simplified and improved the way datasets are stored and accessed
+
+We listened to all your feedback and drastically simplified Fondant datasets, while solving some
+longstanding issues as part of the design.
+
+Most important for you is that we flattened the datasets, removing the concept of `subsets` from
+Fondant. This means you can now access the data fields directly!
+
+<table>
+<tr>
+<th>Previous</th>
+<th>New ✨</th>
+</tr>
+<tr>
+<td>
+
+```yaml title="fondant_component.yaml"
+consumes:
+  images:
+    fields:
+      height:
+        type: int32
+      width:
+        type: int32
+```
+
+</td>
+<td>
+```yaml title="fondant_component.yaml"
+consumes:
+  height:
+    type: int32
+  width:
+    type: int32
+```
+</td>
+</tr>
+<tr>
+<td>
+
+```python title="src/main.py"
+import pandas as pd
+from fondant.component import PandasTransformComponent
+
+
+class ExampleComponent(PandasTransformComponent):
+
+    def transform(self, dataframe: pd.DataFrame):
+        height = dataframe["images"]["height"]
+        width = dataframe["images"]["width"]
+        ...
+```
+
+</td>
+<td>
+
+```python title="src/main.py"
+import pandas as pd
+from fondant.component import PandasTransformComponent
+
+
+class ExampleComponent(PandasTransformComponent):
+
+    def transform(self, dataframe: pd.DataFrame):
+        height = dataframe["height"]
+        width = dataframe["width"]
+        ...
+
+```
+
+</td>
+</tr>
+</table>
+
+## 🚀 The interface to compose a Fondant pipeline is now simpler and more powerful.
+
+You can now chain components together using the `read()`, `apply()` and `write()` methods, which
+removes the need to specify dependencies separately and makes composing pipelines a breeze.
+
+<table>
+<tr>
+<th>Previous</th>
+<th>New ✨</th>
+</tr>
+<tr>
+<td>
+
+```python title="pipeline.py"
+from fondant.pipeline import ComponentOp, Pipeline
+
+pipeline = Pipeline(
+    pipeline_name="my-pipeline",
+    base_path="./data",
+)
+
+load_from_hf_hub = ComponentOp(
+    name="load_from_hf_hub",
+    arguments={
+        "dataset_name": "fondant-ai/fondant-cc-25m",
+    },
+)
+
+download_images = ComponentOp.from_registry(
+    name="download_images",
+    arguments={"resize_mode": "no"},
+)
+
+pipeline.add_op(load_from_hf_hub)
+pipeline.add_op(
+    download_images,
+    dependencies=[load_from_hf_hub]
+)
+
+```
+
+</td>
+<td>
+
+```python title="pipeline.py"
+import pyarrow as pa
+from fondant.pipeline import Pipeline
+
+pipeline = Pipeline(
+    name="my-pipeline",
+    base_path="./data",
+)
+
+raw_data = pipeline.read(
+    "load_from_hf_hub",
+    arguments={
+        "dataset_name": "fondant-ai/fondant-cc-25m",
+    },
+    produces={
+        "alt_text": pa.string(),
+        "image_url": pa.string(),
+        "license_type": pa.string(),
+    },
+)
+
+images = raw_data.apply(
+    "download_images",
+    arguments={"resize_mode": "no"},
+)
+```
+
+</td>
+</tr>
+</table>
+
+Some of the benefits of this new interface are:
+
+- Support for overriding the `produces` and `consumes` of a component, allowing you to easily change the output of a component without having to create a custom `fondant_component.yaml` file.
+- We unlock the future ability to enable eager execution of components and interactive
+  development of pipelines. Keep an eye on our next releases!
+
+If you want to know more or get started, you can check out the [documentation](https://fondant.ai/en/latest/pipeline/).
+
+## 🌐 AWS SageMaker is now supported as an execution framework for Fondant pipelines.
+
+You can now easily run your Fondant pipelines on AWS SageMaker using the `fondant run sagemaker` command. Run `fondant run sagemaker --help` to see the possible configuration options or check out the [documentation](https://fondant.ai/en/latest/runners/sagemaker/).
+
+![Sagemaker pipeline](../../art/runners/sagemaker_run.png)
+
+
+## 🔍 Fondant explorer improvements
+
+We added a lot of improvements to the Fondant explorer, including:
+
+- A pipeline overview showing the data flow through the pipeline
+- A document viewer to inspect data (handy for RAG use cases)
+- Better filtering, sorting and searching of data while exploring
+
+![General overview data explorer](../../art/data_explorer/general_overview.png)
+![Document view data explorer](../../art/data_explorer/explorer_document.png)
+
+To get started with the Fondant explorer, check out the [documentation](https://fondant.ai/en/latest/data_explorer/).
+
+
+## 📚 We released a RAG tuning repository powered by Fondant
+
+This repository helps you tune your RAG system faster and achieve better performance using
+Fondant. Find the repository including a full explanation [here](https://github.com/ml6team/fondant-usecase-RAG).
+
+![RAG tuning](../../art/announcements/RAG.png)
+
+It includes:
+
+- A Fondant pipeline to ingest the data
+- A Fondant pipeline to evaluate the data
+- Multiple notebooks to go from a basic RAG pipeline to fully auto-tuned RAG pipelines
+
+## 🔧 New reusable RAG components
+
+A lot of new reusable components were added to the Fondant registry, letting you build new RAG
+pipelines quickly!
+
+- Weaviate [indexing](https://github.com/ml6team/fondant/tree/main/components/index_weaviate) and [retrieval](https://github.com/ml6team/fondant/tree/main/components/retrieve_from_weaviate) components
+- Qdrant [indexing](https://github.com/ml6team/fondant/blob/main/components/index_qdrant/README.md)
+- Ragas [evaluation](https://github.com/ml6team/fondant/blob/main/components/evaluate_ragas/README.md)
+- LlamaHub [loading](https://github.com/ml6team/fondant/tree/main/components/load_with_llamahub)
+- LangChain [chunking](https://github.com/ml6team/fondant/tree/main/components/chunk_text) and
+  [embedding](https://github.com/ml6team/fondant/tree/main/components/embed_text)
+
+You can see some of these components in action in the
+[RAG tuning repository](https://github.com/ml6team/fondant-usecase-RAG).
+
+## 🛠 Install it now!
+
+```bash
+pip install fondant==0.8.0
+```
+
+And let us know what you think!
\ No newline at end of file
diff --git a/docs/overrides/main.html b/docs/overrides/main.html
index 6b640abf4..4f8969d6b 100644
--- a/docs/overrides/main.html
+++ b/docs/overrides/main.html
@@ -2,8 +2,9 @@
 {% block announce %}

- 🌀 You can now run your Fondant pipelines on Vertex AI! - Read more

 {% endblock %}
\ No newline at end of file
diff --git a/docs/runners/vertex.md b/docs/runners/vertex.md
index ac5978370..b1bf583c2 100644
--- a/docs/runners/vertex.md
+++ b/docs/runners/vertex.md
@@ -29,7 +29,7 @@ info [here](https://codelabs.developers.google.com/vertex-pipelines-intro#2)
 ```bash
 fondant run vertex \
     --project-id $PROJECT_ID \
-    --project-region $PROJECT_REGION \
+    --region $PROJECT_REGION \
     --service-account $SERVICE_ACCOUNT
 ```
@@ -52,7 +52,7 @@ info [here](https://codelabs.developers.google.com/vertex-pipelines-intro#2)
 runner = VertexRunner(
     project_id=project_id,
-    project_region=project_region,
+    region=project_region,
     service_account=service_account)
 )
 runner.run(input_spec=)