diff --git a/docs/art/announcements/RAG.png b/docs/art/announcements/RAG.png
new file mode 100644
index 000000000..64ef5cdda
Binary files /dev/null and b/docs/art/announcements/RAG.png differ
diff --git a/docs/art/data_explorer/explorer_document.png b/docs/art/data_explorer/explorer_document.png
new file mode 100644
index 000000000..9dcfc8a13
Binary files /dev/null and b/docs/art/data_explorer/explorer_document.png differ
diff --git a/docs/art/runners/sagemaker_run.png b/docs/art/runners/sagemaker_run.png
new file mode 100644
index 000000000..cbb1e5d6f
Binary files /dev/null and b/docs/art/runners/sagemaker_run.png differ
diff --git a/docs/blog/.authors.yml b/docs/blog/.authors.yml
index 4a81e59b1..000e89a46 100644
--- a/docs/blog/.authors.yml
+++ b/docs/blog/.authors.yml
@@ -7,3 +7,7 @@ authors:
     name: Matthias Richter
     description: ML Engineer
     avatar: https://avatars.githubusercontent.com/u/15777729
+  GeorgesLorre:
+    name: Georges Lorré
+    description: Data Engineer
+    avatar: https://avatars.githubusercontent.com/u/35808396
diff --git a/docs/blog/posts/2023-12-13|Fondant_0.8_Interface.md b/docs/blog/posts/2023-12-13|Fondant_0.8_Interface.md
new file mode 100644
index 000000000..ee3f10b95
--- /dev/null
+++ b/docs/blog/posts/2023-12-13|Fondant_0.8_Interface.md
@@ -0,0 +1,235 @@
+---
+date:
+  created: 2023-12-13
+authors:
+  - GeorgesLorre
+  - RobbeSneyders
+---
+
+# Fondant 0.8: Simplification, SageMaker, RAG, and more!
+
+Hi all, we released Fondant 0.8, which brings some major new features and improvements:
+
+* We simplified and improved the way datasets are stored and accessed
+* The interface to compose a Fondant pipeline is now simpler and more powerful
+* AWS SageMaker is now supported as an execution framework for Fondant pipelines
+* The Fondant explorer was improved, especially for text and document data
+* We released a RAG tuning repository powered by Fondant
+
+Read on for more details!
+ + + +## π We simplified and improved the way datasets are stored and accessed + +We listened to all your feedback and drastically simplified Fondant datasets, while solving some +longstanding issues as part of the design. + +Most important for you is that we flattened the datasets, removing the concept of `subsets` from +Fondant. Which means you can now access the data fields directly! + +
+<table>
+<tr>
+<th>Previous</th>
+<th>New ✨</th>
+</tr>
+<tr>
+<td>
+
+```yaml title="fondant_component.yaml"
+consumes:
+  images:
+    fields:
+      height:
+        type: int32
+      width:
+        type: int32
+```
+
+</td>
+<td>
+
+```yaml title="fondant_component.yaml"
+consumes:
+  height:
+    type: int32
+  width:
+    type: int32
+```
+
+</td>
+</tr>
+<tr>
+<td>
+
+```python title="src/main.py"
+import pandas as pd
+from fondant.component import PandasTransformComponent
+
+
+class ExampleComponent(PandasTransformComponent):
+
+    def transform(self, dataframe: pd.DataFrame):
+        height = dataframe["images"]["height"]
+        width = dataframe["images"]["width"]
+        ...
+```
+
+</td>
+<td>
+
+```python title="src/main.py"
+import pandas as pd
+from fondant.component import PandasTransformComponent
+
+
+class ExampleComponent(PandasTransformComponent):
+
+    def transform(self, dataframe: pd.DataFrame):
+        height = dataframe["height"]
+        width = dataframe["width"]
+        ...
+```
+
+</td>
+</tr>
+</table>
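To make the difference in data access concrete, here is a minimal pandas sketch. It assumes that the previous version exposed each subset as the top level of a two-level column index, which is what the `dataframe["images"]["height"]` access pattern suggests. The column values are made up for illustration, and this is not Fondant's own code.

```python
import pandas as pd

# Made-up example data: two images with a height and width each.
# Assumed previous layout: a two-level column index of (subset, field).
nested = pd.DataFrame(
    {
        ("images", "height"): [512, 768],
        ("images", "width"): [512, 1024],
    }
)
heights_before = nested["images"]["height"]  # subset first, then field

# Flattened layout: drop the subset level so every field is a top-level column.
flat = nested.droplevel(0, axis=1)
heights_after = flat["height"]  # fields are accessed directly

print(heights_before.tolist())  # [512, 768]
print(heights_after.tolist())  # [512, 768]
```

Both lookups return the same values. The flattened version simply removes one level of indirection, which matches the shorter `consumes` spec in the table above.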
+<table>
+<tr>
+<th>Previous</th>
+<th>New ✨</th>
+</tr>
+<tr>
+<td>
+
+```python title="pipeline.py"
+from fondant.pipeline import ComponentOp, Pipeline
+
+pipeline = Pipeline(
+    pipeline_name="my-pipeline",
+    base_path="./data",
+)
+
+load_from_hf_hub = ComponentOp.from_registry(
+    name="load_from_hf_hub",
+    arguments={
+        "dataset_name": "fondant-ai/fondant-cc-25m",
+    },
+)
+
+download_images = ComponentOp.from_registry(
+    name="download_images",
+    arguments={"resize_mode": "no"},
+)
+
+pipeline.add_op(load_from_hf_hub)
+pipeline.add_op(
+    download_images,
+    dependencies=[load_from_hf_hub],
+)
+```
+
+</td>
+<td>
+
+```python title="pipeline.py"
+import pyarrow as pa
+from fondant.pipeline import Pipeline
+
+pipeline = Pipeline(
+    name="my-pipeline",
+    base_path="./data",
+)
+
+raw_data = pipeline.read(
+    "load_from_hf_hub",
+    arguments={
+        "dataset_name": "fondant-ai/fondant-cc-25m",
+    },
+    produces={
+        "alt_text": pa.string(),
+        "image_url": pa.string(),
+        "license_type": pa.string(),
+    },
+)
+
+images = raw_data.apply(
+    "download_images",
+    arguments={"resize_mode": "no"},
+)
+```
+
+</td>
+</tr>
+</table>
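The key change in the pipeline interface is that dependencies no longer have to be declared separately through `add_op(..., dependencies=[...])`. Instead, each `apply` call builds on the dataset it is called on. The toy model below sketches that idea in plain Python. The `ToyPipeline`, `ToyDataset`, and `Op` classes are hypothetical and are not Fondant's implementation. They only record which step feeds into which.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Op:
    """A hypothetical pipeline step: a component name plus its arguments."""

    name: str
    arguments: dict
    dependencies: list = field(default_factory=list)


class ToyDataset:
    """Handle to the output of one step; chaining on it records a dependency."""

    def __init__(self, pipeline: ToyPipeline, op: Op):
        self.pipeline = pipeline
        self.op = op

    def apply(self, name: str, arguments: dict | None = None) -> ToyDataset:
        # The new step depends on the step that produced this dataset.
        op = Op(name, arguments or {}, dependencies=[self.op])
        self.pipeline.ops.append(op)
        return ToyDataset(self.pipeline, op)


class ToyPipeline:
    """Hypothetical pipeline that only keeps an ordered list of steps."""

    def __init__(self, name: str, base_path: str):
        self.name = name
        self.base_path = base_path
        self.ops: list[Op] = []

    def read(self, name: str, arguments: dict | None = None) -> ToyDataset:
        # A read step has no upstream dependencies.
        op = Op(name, arguments or {})
        self.ops.append(op)
        return ToyDataset(self, op)


pipeline = ToyPipeline(name="my-pipeline", base_path="./data")
raw_data = pipeline.read(
    "load_from_hf_hub",
    arguments={"dataset_name": "fondant-ai/fondant-cc-25m"},
)
images = raw_data.apply("download_images", arguments={"resize_mode": "no"})

print([op.name for op in images.op.dependencies])  # ['load_from_hf_hub']
```

In this sketch, the dependency graph follows directly from the chain of calls, so there is no separate `dependencies` list to keep in sync with the steps.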
- You can now run your Fondant pipelines on Vertex AI! - Read more
{% endblock %}
\ No newline at end of file
diff --git a/docs/runners/vertex.md b/docs/runners/vertex.md
index ac5978370..b1bf583c2 100644
--- a/docs/runners/vertex.md
+++ b/docs/runners/vertex.md
@@ -29,7 +29,7 @@ info [here](https://codelabs.developers.google.com/vertex-pipelines-intro#2)
```bash
fondant run vertex