diff --git a/docs/art/data_explorer/data_explorer.png b/docs/art/data_explorer/data_explorer.png
new file mode 100644
index 000000000..040148158
Binary files /dev/null and b/docs/art/data_explorer/data_explorer.png differ
diff --git a/docs/art/data_explorer/data_explorer_numeric_analysis.png b/docs/art/data_explorer/data_explorer_numeric_analysis.png
new file mode 100644
index 000000000..564cfd6bd
Binary files /dev/null and b/docs/art/data_explorer/data_explorer_numeric_analysis.png differ
diff --git a/docs/art/data_explorer/image_explorer.png b/docs/art/data_explorer/image_explorer.png
new file mode 100644
index 000000000..7cdce0f00
Binary files /dev/null and b/docs/art/data_explorer/image_explorer.png differ
diff --git a/docs/components/hub.md b/docs/components/hub.md
index 53dce2ecc..9b49b00a2 100644
--- a/docs/components/hub.md
+++ b/docs/components/hub.md
@@ -10,6 +10,10 @@ Below you can find the reusable components offered by Fondant.
 
     --8<-- "components/caption_images/README.md:1"
 
+??? "chunk_text"
+
+    --8<-- "components/chunk_text/README.md:1"
+
 ??? "download_images"
 
     --8<-- "components/download_images/README.md:1"
@@ -18,22 +22,18 @@ Below you can find the reusable components offered by Fondant.
 
     --8<-- "components/embed_images/README.md:1"
 
-??? "embedding_based_laion_retrieval"
+??? "embed_text"
 
-    --8<-- "components/embedding_based_laion_retrieval/README.md:1"
+    --8<-- "components/embed_text/README.md:1"
 
-??? "filter_comments"
+??? "embedding_based_laion_retrieval"
 
-    --8<-- "components/filter_comments/README.md:1"
+    --8<-- "components/embedding_based_laion_retrieval/README.md:1"
 
 ??? "filter_image_resolution"
 
     --8<-- "components/filter_image_resolution/README.md:1"
 
-??? "filter_line_length"
-
-    --8<-- "components/filter_line_length/README.md:1"
-
 ??? "image_cropping"
 
     --8<-- "components/image_cropping/README.md:1"
@@ -42,6 +42,10 @@ Below you can find the reusable components offered by Fondant.
 
     --8<-- "components/image_resolution_extraction/README.md:1"
 
+??? "index_weaviate"
+
+    --8<-- "components/index_weaviate/README.md:1"
+
 ??? "language_filter"
 
     --8<-- "components/language_filter/README.md:1"
@@ -62,10 +66,6 @@ Below you can find the reusable components offered by Fondant.
 
     --8<-- "components/minhash_generator/README.md:1"
 
-??? "pii_redaction"
-
-    --8<-- "components/pii_redaction/README.md:1"
-
 ??? "prompt_based_laion_retrieval"
 
     --8<-- "components/prompt_based_laion_retrieval/README.md:1"
diff --git a/docs/data_explorer.md b/docs/data_explorer.md
index fbe439188..67190b27c 100644
--- a/docs/data_explorer.md
+++ b/docs/data_explorer.md
@@ -1,5 +1,15 @@
 # Data explorer
 
+## Data explorer UI
+
+The data explorer UI enables Fondant users to explore the inputs and outputs of their Fondant pipeline.
+
+The user can specify a pipeline and a specific pipeline run and component to explore. The user will then be able to explore the different subsets produced by by Fondant components.
+
+The chosen subset (and the columns within the subset) can be explored in 3 tabs.
+
+![data explorer](../art/data_explorer/data_explorer.png)
+
 ## How to use?
 
 You can setup the data explorer container with the `fondant explore` CLI command, which is installed together with the Fondant python package.
@@ -16,21 +26,19 @@ Example:
 ```bash
 fondant explore --base_path gs://foo/bar --auth-gcp
 ```
-## Data explorer UI
-
-The data explorer UI enables Fondant users to explore the inputs and outputs of their Fondant pipeline.
-
-The user can specify a pipeline and a specific pipeline run and component to explore. The user will then be able to explore the different subsets produced by by Fondant components.
-
-The chosen subset (and the columns within the subset) can be explored in 3 tabs.
 
 ### Sidebar
 In the sidebar, the user can specify the path to a manifest file. This will load the available subsets into a dropdown, from which the user can select one of the subsets.
 Finally, the columns within the subset are shown in a multiselect box, and can be used to remove / select the columns that are loaded into the exploration tabs.
+
 ### Data explorer Tab
 The data explorer shows an interactive table of the loaded subset DataFrame with on each row a sample. The table can be used to browse through a partition of the data, to visualize images inside image columns and more.
 
 ### Numeric analysis Tab
 The numerical analysis tab shows statistics of the numerical columns of the loaded subset (mean, std, percentiles, ...) in a table. In the second part of the tab, the user can choose one of the numerical columns for in depth exploration of the data by visualizing it in a variety of interactive plots.
 
+![data explorer](../art/data_explorer/data_explorer_numeric_analysis.png)
+
 ### Image explorer Tab
-The image explorer tab enables the user to choose one of the image columns and analyse these images.
\ No newline at end of file
+The image explorer tab enables the user to choose one of the image columns and analyse these images.
+
+![data explorer](../art/data_explorer/image_explorer.png)
\ No newline at end of file