Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add schema validation and configuration for conda-forge config yaml #1756

Merged
merged 127 commits into from
Mar 12, 2024

Conversation

viniciusdc
Copy link
Contributor

@viniciusdc viniciusdc commented Aug 1, 2023

Checklist

  • Added a news entry

In this pull request, we propose a comprehensive refactor of the configure_feedstock::_load_forge_config function and its related dependencies. The primary objective is to replace the current reliance on a hardcoded dictionary with a more robust approach using Pydantic Models for the forge YAML configuration.

Key Changes:

  • I've included a new schema file in the codebase, housing Pydantic models that represent various aspects of the ConfigModel, which is the core schema for the forge YAML. These models also include schemas for nested fields like bot, azure, and more.

  • Created a separate function called _read_forge_config responsible for loading the YAML file itself and initializing the models with the respective configurations as well as handling the deep_merge dictionary section.

  • Moved legacy checks to their dedicated function, _legacy_compatibility_checks.

  • Ive added test cases to ensure the correctness and reliability of the new schema and code changes.
    Edited

  • Given the nature of having pydanticas a runtime dependency, we adopted a mixed mode of jsonschema and the model to generate both a valid json schema mirroring the model, and versioned, conda-forge.v2.json and a default value that is dynamically generated by the model itself, conda-forge.v2.yml. This is used for type validation inside the new _read_forge_config function and to help future users who would like to access the schema by an API (which we can now do, by hosting the new json schema file)

Why this now?

The rationale behind this refactor is to address issues that have surfaced over the years with the YAML content in forge configurations across numerous hosted feedstocks. These issues have resulted in minor exceptions and inconsistencies. Some problems were mitigated through hardcoded conditionals, while others slipped through, causing disruptions in the ecosystem.

The primary objectives of this refactor are as follows:

  • By employing a Pydantic schema, we aim to significantly reduce unforeseen configuration errors from syntax errors or incorrect attributes in the forge YAML.

  • This refactor will make it easier for future maintainers to introduce new settings, such as improving the list of available platforms or other heavily coupled configurations across the build. These changes can be seamlessly integrated into the configuration fields as validators, simplifying the management of previously scattered checks throughout the codebase. As well as deprecation warnings and migration checks can also be performed by the validators.

In summary, this pull request represents a substantial step towards enhancing the reliability and maintainability of the configuration handling in conda-smithy.

What is not included in this PR?

  • To keep the diff minimal in the first iteration, this PR only integrates the new schema (pydantic) and its artifacts (*.json and .yml files). In a subsequent PR, we can re-arrange some internal validations to be part of the generated schema, reducing some hard-coded logic from within the code.

@jaimergp jaimergp marked this pull request as draft August 1, 2023 13:54
@viniciusdc
Copy link
Contributor Author

Hi @jaimergp, this will be on draft this week as I fix some issues with the Providers enum logic (to accept and validate aliases). In the meantime, if you want to have a look at what is currently available, free free to do so 😄



class CondaForgeDocker(BaseModel):
model_config = ConfigDict(extra="allow")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to allow extra keys? Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conda-forge docker is a dict where the user/consumer passes keys to the docker build arg, which should follow the docker schema. As this would be an exact copy of what already exists, I was not sure if we wanted to import the Pydantic schema from docker and use it instead, so I kept it as general as possible to check with you which of those fields we expect to be customizable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed and moved all fields to optional values, in case one of the supported config values is passed validation will occur, in case of extra variables in the config, right now the behavior is to ignore it (will not show up when dumping to dict/json)

Copy link
Member

@jaimergp jaimergp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Vini! As you can see I have added a few comments, but most of them are repeated and then I stopped the spam. It all boils down to these two items:

  • Default values that are mutable (e.g. lists or dicts) should use default_factory to return a fresh instance anytime. Otherwise all instances of the model share the same instance (defined at class definition time when the module was imported), which is then a nightmare. I don't think we would have concurrent use of more than one instance in the code, but you never know and it's good practice anyway.
  • You are mixing usage of Field both as a value and Annotated. I am not familiar with Annotated; I think it might have performance benefits, but I do find it trickier to read. I prefer it as the value of the attribute, instead of in-annotation. If there are strong reasons for in-annotation, I am willing to listen, but what we can't do is to mix two styles because it's confusing :D

Let's fix these items and then we can keep iterating.

@jaimergp
Copy link
Member

Ah, and:

  • we need to merge back the changes added in Add switch to make a swap file #1751 to resolve the conflicts. Please update accordingly 🙏
  • fix the CI to install pydantic as it's required now

ytausch added a commit to ytausch/cf-scripts that referenced this pull request Mar 11, 2024
Copy link
Member

@jaimergp jaimergp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go!

Thanks everyone involved, this has been one of the good ones :P

I'll merge later today or tomorrow in case there are any last minute comments.

@jaimergp jaimergp merged commit 113418a into conda-forge:main Mar 12, 2024
2 checks passed
@jaimergp
Copy link
Member

ytausch added a commit to ytausch/cf-scripts that referenced this pull request Mar 13, 2024
ytausch added a commit to ytausch/cf-scripts that referenced this pull request Mar 13, 2024
@jakirkham
Copy link
Member

This was neat to see: conda-forge/coverage-feedstock#112 (comment)

Very descriptive 🤩

@bollwyvl
Copy link
Contributor

Yes, have started getting those, looking great.

Some notes:

  • it doesn't tell which file this is all about
    • it could include a link to that exact file in the PR branch, such that a maintainer could click to edit
  • as the output is for humans, it seems like rendering title, description, etc. as Markdown would be valuable.
    • however, as it was written (or dumped to) JSON Schema, it's would possible to generate a canonical link to the generated documentation for that schema item, once available
    • similarly, the raw repr that gets dumped could be a table
      • or, indeed, the whole thing could be a table
    • it may be worth putting these inside details tags

Now all we gotta do is schematize the other user-edited .yml :)

@jaimergp
Copy link
Member

Thanks @bollwyvl, I'll take your comment and open a new issue for tracking! Thanks!

@0xbe7a
Copy link
Member

0xbe7a commented Mar 16, 2024

pydantic/pydantic#8237 has been merged and will probably be released as part of pydantic 2.7 and will make the custom deprecation validator obsolete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants