Document how to leverage YAML anchors & aliases to avoid copy-pasting properties in catalog #2127

Open
1 task done
yury-fedotov opened this issue Jun 25, 2024 · 10 comments · May be fixed by #2181
Labels: Documentation, Hacktoberfest (Issues to be completed during Hacktoberfest)

Comments

@yury-fedotov
Contributor

yury-fedotov commented Jun 25, 2024

Description

This section of docs provides a guide to adding layers to the visualization by defining them as follows:

companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw

It also gives the following example:

companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: raw

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  metadata:
    kedro-viz:
      layer: raw

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  metadata:
    kedro-viz:
      layer: raw

...

Context

In my projects I found it very helpful to use YAML anchors to save those 3 lines per layer into a variable like this:

_raw_layer: &raw_layer
  metadata:
    kedro-viz:
      layer: 01_raw

And then reuse it like this:

companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  <<: *raw_layer

reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  <<: *raw_layer

shuttles:
  type: pandas.ExcelDataset
  filepath: data/01_raw/shuttles.xlsx
  <<: *raw_layer
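
One caveat worth sketching here, since it follows from how YAML merge keys behave rather than from anything Kedro-specific: `<<` merges only the top-level keys of the anchored mapping and does not deep-merge nested mappings. In the illustrative variant below, `reviews` defines its own `metadata` key, which replaces the anchored `metadata` wholesale, so the layer setting would be lost for that dataset. The `owner` key is a made-up example, and the leading underscore on `_raw_layer` is assumed to follow the Kedro docs' convention for entries that should not be instantiated as datasets.

```yaml
_raw_layer: &raw_layer
  metadata:
    kedro-viz:
      layer: 01_raw

# The merge applies: companies has no `metadata` key of its own.
companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  <<: *raw_layer

# Hypothetical entry: the local `metadata` key wins outright, so the
# anchored `kedro-viz.layer` is NOT merged in and would have to be repeated.
reviews:
  type: pandas.CSVDataset
  filepath: data/01_raw/reviews.csv
  <<: *raw_layer
  metadata:
    owner: data-team  # illustrative key, not taken from the Kedro docs
```

If a dataset needs extra metadata on top of the layer, it is probably simplest to spell out the full `metadata` block for that one entry.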

Possible Implementation

What I propose is to add a small note admonition suggesting that YAML anchors & aliases can be a great fit here to avoid copy-pasting those 3 lines if you have e.g. 10 datasets defined in a layer.

By admonition I mean e.g. this:

[Image: Screenshot 2024-06-24 at 10 35 00 PM]

It can mention that this feature is not Kedro-specific at all and is enabled by the YAML format itself. I still think it can be helpful, since this trick is highly reusable and can simplify large catalogs quite a lot for users unfamiliar with anchors & aliases in YAML.

I do not propose to change the existing example which replicates those 3 lines 3 times.
I think my suggestion better fits a note admonition.

Checklist

  • Include labels so that we can categorise your feature request
@yury-fedotov
Contributor Author

LMK if that's something you would want in the docs, I'm happy to open a PR if so.

@astrojuanlu
Member

Good idea! @pascalwhoop was complaining about the same thing in kedro-org/kedro-plugins#774

I'm moving this to Framework to properly document this trick. Should work with anything really (credentials, load_args etc)
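
As a rough sketch of the same trick applied beyond `metadata`, shared `type` and `load_args` settings could be anchored once and merged into each dataset. The key name `_csv` and the `sep` value are illustrative assumptions, not taken from this thread.

```yaml
# Hypothetical shared settings for all CSV datasets.
_csv: &csv
  type: pandas.CSVDataset
  load_args:
    sep: ","

companies:
  <<: *csv
  filepath: data/01_raw/companies.csv

reviews:
  <<: *csv
  filepath: data/01_raw/reviews.csv
```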

@astrojuanlu astrojuanlu transferred this issue from kedro-org/kedro-viz Sep 23, 2024
@astrojuanlu astrojuanlu changed the title from "Add a note admonition about leveraging YAML anchors & aliases to avoid copypasting metadata.kedro-viz.layer in catalog" to "Document how to leverage YAML anchors & aliases to avoid copy-pasting properties in catalog" Sep 23, 2024
@noklam
Contributor

noklam commented Oct 7, 2024

Can we use variable interpolation instead?
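
For comparison, a minimal sketch of what interpolation might look like, assuming the project uses Kedro's `OmegaConfigLoader` and that underscore-prefixed catalog keys are available for interpolation. `_raw_layer_name` is a hypothetical key.

```yaml
# Hypothetical variable holding the layer name.
_raw_layer_name: raw

companies:
  type: pandas.CSVDataset
  filepath: data/01_raw/companies.csv
  metadata:
    kedro-viz:
      layer: ${_raw_layer_name}
```

In this form the layer name is defined in one place, but the three nested `metadata` lines are still repeated per dataset, which is the repetition the anchor approach removes.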

@merelcht
Member

merelcht commented Oct 7, 2024

Discussed in backlog grooming that we'll add a "note" block in the Kedro viz docs and also link to the existing description on yaml anchors in the Kedro docs: https://docs.kedro.org/en/stable/data/data_catalog_yaml_examples.html#load-multiple-datasets-with-similar-configuration-using-yaml-anchors

@merelcht merelcht transferred this issue from kedro-org/kedro Oct 7, 2024
@merelcht merelcht added Documentation Hacktoberfest Issues to be completed during Hacktoberfest labels Oct 7, 2024
@yury-fedotov
Contributor Author

@rashidakanchwala / @merelcht can I take this one?

@merelcht
Member

Of course! Thanks @yury-fedotov

@rashidakanchwala
Contributor

@yury-fedotov , October is already over, but let me know if you are still keen on taking on this one.

@rashidakanchwala rashidakanchwala self-assigned this Nov 6, 2024
@yury-fedotov
Contributor Author

yury-fedotov commented Nov 7, 2024

> @yury-fedotov , October is already over, but let me know if you are still keen on taking on this one.

Hey @rashidakanchwala ! Sorry, my bad, I didn't follow up. I'll have time this week to do that - have you already started, or could/should I?

@rashidakanchwala
Contributor

Sure, please go ahead. thanks again!

@yury-fedotov
Contributor Author

> Sure, please go ahead. thanks again!

Ok!

@yury-fedotov yury-fedotov linked a pull request Nov 8, 2024 that will close this issue