Allow attribute filtering in nomenclature.yaml for importing definitions from external repo #326

Open
phackstock opened this issue Feb 12, 2024 · 4 comments · May be fixed by #396
Labels
enhancement New feature or request

Comments

@phackstock
Contributor

phackstock commented Feb 12, 2024

When importing from an external repository we should be able to filter by attributes. This way we don't import the whole definition if it's not needed.
The first three use cases that come to mind are:

  • filtering by hierarchy for regions
  • filtering by tier for variables
  • filtering any dimension by name

The question would be how to integrate this into the existing nomenclature.yaml structure.
I've tried a few things now and this is my current favorite:

repositories:
  common-definitions:
    url: https://github.com/IAMconsortium/common-definitions.git/
definitions:
  region:
    repository: common-definitions
    repository-filters:
      hierarchy: R5
  variable:
    repository: common-definitions
    repository-filters:
      name: Final Energy*
    country: true

This would import all R5 regions from common-definitions and all variables whose name matches the pattern Final Energy*.
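As a rough illustration of what such a filter could do once the definitions are loaded, here is a minimal sketch. It is not the nomenclature API: the dictionaries, the `apply_filter` helper and the use of `fnmatch`-style wildcards are all assumptions made for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-ins for imported definitions (not the nomenclature API).
regions = [
    {"name": "OECD & EU (R5)", "hierarchy": "R5"},
    {"name": "Asia (R5)", "hierarchy": "R5"},
    {"name": "Austria", "hierarchy": "Country"},
]
variables = [
    {"name": "Final Energy"},
    {"name": "Final Energy|Industry"},
    {"name": "Primary Energy"},
]


def apply_filter(items, filters):
    """Keep the items whose attributes match every key in `filters`."""

    def matches(item):
        for key, expected in filters.items():
            actual = item.get(key)
            if isinstance(expected, str):
                # Assumption: string values are shell-style patterns ("*" wildcard).
                if not isinstance(actual, str) or not fnmatchcase(actual, expected):
                    return False
            elif actual != expected:
                return False
        return True

    return [item for item in items if matches(item)]


r5_regions = apply_filter(regions, {"hierarchy": "R5"})
final_energy = apply_filter(variables, {"name": "Final Energy*"})
```

With this reading, `Final Energy*` keeps both `Final Energy` and `Final Energy|Industry`, because the wildcard may also match an empty suffix.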

The above format would also allow for more complex filtering such as:

repositories:
  common-definitions:
    url: https://github.com/IAMconsortium/common-definitions.git/
  legacy-definitions:
    url: https://github.com/IAMconsortium/legacy-definitions.git/
definitions:
  variable:
    repository: common-definitions
    repository-filters:
      - repository: common-definitions
        tier: 1
      - name: Final Energy*
    country: true

Here we have multiple filters for the variable dimension:

  1. We take all variables from common-definitions that have the attribute tier with the value 1.
  2. We take all variables from common-definitions and legacy-definitions (no repository filter) that match the pattern Final Energy*
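The two entries could be evaluated roughly as follows. This is only a sketch under assumptions: each variable is tagged with a hypothetical `repository` key recording where it came from, an entry without that key applies to every repository, and the entries are combined as a union, as the numbered list describes.

```python
from fnmatch import fnmatchcase

# Hypothetical variables, each tagged with the repository it was loaded from.
variables = [
    {"repository": "common-definitions", "name": "Primary Energy", "tier": 1},
    {"repository": "common-definitions", "name": "Final Energy", "tier": 2},
    {"repository": "legacy-definitions", "name": "Final Energy|Legacy", "tier": 3},
    {"repository": "legacy-definitions", "name": "Emissions|CO2", "tier": 1},
]

# The two filter entries from the example above.
filters = [
    {"repository": "common-definitions", "tier": 1},
    {"name": "Final Energy*"},
]


def entry_matches(item, entry):
    """True if the item satisfies every key of a single filter entry."""
    for key, expected in entry.items():
        actual = item.get(key)
        if isinstance(expected, str):
            # Assumption: string values are shell-style patterns.
            if not isinstance(actual, str) or not fnmatchcase(actual, expected):
                return False
        elif actual != expected:
            return False
    return True


# A variable is imported if at least one entry accepts it.
selected = [v["name"] for v in variables if any(entry_matches(v, f) for f in filters)]
```

Here `Emissions|CO2` is dropped even though it has tier 1, because the tier entry is restricted to common-definitions.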

Would love to hear your thoughts @danielhuppmann, @dc-almeida.

@phackstock phackstock added the enhancement New feature or request label Feb 12, 2024
@phackstock phackstock self-assigned this Feb 12, 2024
@phackstock phackstock changed the title from "Allow hierarchy filtering in nomenclature.yaml for importing regions from external repo" to "Allow attribute filtering in nomenclature.yaml for importing definitions from external repo" Aug 5, 2024
@danielhuppmann
Member

danielhuppmann commented Aug 5, 2024

This looks great, but I'm wondering about two issues.

  1. Wouldn't it be more intuitive to have the filters as an attribute of the repository, instead of repeating the repository attribute many times?
  2. It's not clear whether the list of filters would work as AND or OR.

See a more explicit structure:

definitions:
  variable:
    repository:
      common-definitions:
        filters:
          - name: Primary Energy*
            tier: 1
          - name: Final Energy*

to get all final-energy variables and only primary-energy-variables at tier 1.

@phackstock
Contributor Author

phackstock commented Aug 5, 2024

Good points.

Regarding your first point, you're right, it does look better to me as well. The reason I intentionally opted against it in my proposed structure is that it would require bigger changes to the code. Nothing crazy, but more difficult to implement than just adding another attribute at the repository level. I do agree though that it's nicer that way.

For your second point, I'd take your example exactly the way you suggested. Meaning that within a filter entry it's an AND and between filters it's an OR.

One remaining point to cover is whether we allow lists as filter values and, if so, how they're evaluated:

definitions:
  variable:
    repository:
      common-definitions:
        filters:
          - name: Primary Energy*
            tier: [1, 2]

That is, would the above translate to "Everything that matches Primary Energy* and has the tier attribute [1, 2]" or "Everything that matches Primary Energy* and has the tier attribute 1 or 2"? In this example only the latter makes sense, but there might be attributes where we actually want to match a list.
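The difference between the two readings can be made concrete with a small sketch. The data is contrived for illustration (one hypothetical variable stores a list as its tier), and both helper functions are assumptions rather than existing code.

```python
tier_filter = [1, 2]

# Hypothetical variables; the last one deliberately stores a list as its tier.
variables = [
    {"name": "Primary Energy", "tier": 1},
    {"name": "Primary Energy|Coal", "tier": 2},
    {"name": "Primary Energy|Gas", "tier": 3},
    {"name": "Primary Energy|Oil", "tier": [1, 2]},
]


def matches_any_of(value, allowed):
    """Reading 1: the filter list enumerates acceptable scalar values."""
    return not isinstance(value, list) and value in allowed


def matches_exactly(value, expected):
    """Reading 2: the attribute itself must equal the filter list."""
    return value == expected


any_of = [v["name"] for v in variables if matches_any_of(v["tier"], tier_filter)]
exact = [v["name"] for v in variables if matches_exactly(v["tier"], tier_filter)]
```

Under the first reading the tier 1 and tier 2 variables are kept; under the second only the variable whose tier attribute is literally `[1, 2]` survives.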

Alternatively, we could also only allow for single values, so if you wanted to achieve the above you'd have to use:

definitions:
  variable:
    repository:
      common-definitions:
        filters:
          - name: Primary Energy*
            tier: 1
          - name: Primary Energy*
            tier: 2

In this example we could even allow list values, but then they would have to match exactly.

@phackstock phackstock reopened this Aug 5, 2024
@danielhuppmann
Member

I guess we will quickly run into a use case like "give me primary energy, final energy, CO2 emissions, GDP, ..." from an upstream repo, so repeating the "name" attribute many times will be tedious. So I would say the following logic makes the most sense:

  • OR within a filter-dimension if it's a list
  • AND across filter-dimensions within one filters-list item
  • OR across filters-list items
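The three rules above can be sketched in a few lines. This is an illustration only, not the eventual implementation: the function names, the sample variables and the use of `fnmatch`-style wildcards for string values are assumptions.

```python
from fnmatch import fnmatchcase


def value_matches(actual, expected):
    """OR within one filter-dimension: a list means 'any of these values'."""
    options = expected if isinstance(expected, list) else [expected]
    for option in options:
        if isinstance(option, str):
            # Assumption: string values are shell-style patterns.
            if isinstance(actual, str) and fnmatchcase(actual, option):
                return True
        elif actual == option:
            return True
    return False


def entry_matches(item, entry):
    """AND across the filter-dimensions of one filters-list item."""
    return all(value_matches(item.get(key), expected) for key, expected in entry.items())


def select(items, filters):
    """OR across the filters-list items."""
    return [item for item in items if any(entry_matches(item, f) for f in filters)]


# Hypothetical upstream variables.
variables = [
    {"name": "Primary Energy", "tier": 1},
    {"name": "Primary Energy|Coal", "tier": 3},
    {"name": "Final Energy", "tier": 2},
    {"name": "Emissions|CO2", "tier": 1},
    {"name": "GDP|PPP", "tier": 2},
    {"name": "Population", "tier": 1},
]

filters = [
    {"name": ["Primary Energy*", "Final Energy*"], "tier": [1, 2]},
    {"name": ["Emissions|CO2", "GDP*"]},
]

selected = [v["name"] for v in select(variables, filters)]
```

One consequence of this sketch worth deciding explicitly: an empty filters list selects nothing, so "no filters configured" would have to be handled separately if it should mean "import everything".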

@phackstock
Contributor Author

Sounds good, that should cover what we need. I cannot think of a use case where we'd need to explicitly match for a list anyway.


2 participants