ORM: Switch to pydantic for code schema definition #6190

Merged: 1 commit merged on Mar 11, 2024

sphuber (Contributor) commented Nov 22, 2023

The verdi code create command dynamically generates a subcommand for each registered entry point that is a subclass of the AbstractCode data plugin base class. The options for each subcommand are generated automatically for each plugin using the DynamicEntryPointCommandGroup.

When first developed, this click.Group subclass relied on the plugin defining the get_cli_options method, which returns a dictionary with a spec for each of the options. This specification used an ad-hoc custom schema, which made it of little use to any other application.

Recently, support was added to the class for defining the specification with pydantic models instead, which was already being used for storage backend plugins. Here, the AbstractCode and its subclasses are migrated to pydantic as well to define their model.

Most of the data required to create click options from the pydantic model can be communicated through the default properties of pydantic's Field class. However, a few additional metadata properties were needed:

  • priority: To control the order in which options are prompted for. This used to be controlled by the _get_cli_options method of each plugin, which could define the options in the required order and also determine whether they came before or after the options potentially inherited from a base class. With pydantic models, the fields of a subclass always come after those of the base class, and there is no way to change this.

  • short_name: The short form of the option name. The option name is derived from the title attribute of the Field, but besides the full name, options often provide a short form as well. Since the short form cannot be deduced algorithmically from the title, a dedicated metadata keyword is added.

  • option_cls: To customize the class used to create the option, for options that should use a different subclass of the click.Option base class.
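To make the ordering problem concrete, here is a minimal sketch that uses the standard library's dataclasses as a stand-in for pydantic models (an assumption made so the example runs without extra dependencies). Like pydantic, dataclasses lists base-class fields before subclass fields. The hypothetical option_specs function then reorders them by a priority value stored in the field metadata (assumed here: a higher value is prompted first, default 0), derives the long option name from the title and appends short_name when present. The class names are illustrative only, and option_cls is not modeled.

```python
from dataclasses import dataclass, field, fields


@dataclass
class BaseCodeSketch:
    # Stand-in for a base-class model; ``metadata`` carries the extra option information.
    label: str = field(metadata={'title': 'Label', 'short_name': '-L'})
    description: str = field(metadata={'title': 'Description'})


@dataclass
class InstalledCodeSketch(BaseCodeSketch):
    # dataclasses always places these subclass fields after the inherited ones.
    computer: str = field(metadata={'title': 'Computer', 'short_name': '-Y', 'priority': 2})
    filepath_executable: str = field(
        metadata={'title': 'Filepath executable', 'short_name': '-X', 'priority': 1}
    )


def option_specs(cls):
    """Return ``(param_decls, field_name)`` pairs, highest ``priority`` first."""
    specs = []
    for f in fields(cls):
        meta = f.metadata
        # Long name derived from the title, e.g. 'Filepath executable' -> '--filepath-executable'.
        decls = ['--' + meta['title'].lower().replace(' ', '-')]
        if 'short_name' in meta:
            decls.append(meta['short_name'])
        specs.append((meta.get('priority', 0), decls, f.name))
    # The sort is stable, so fields with equal priority keep their declaration order.
    specs.sort(key=lambda spec: spec[0], reverse=True)
    return [(decls, name) for _, decls, name in specs]


if __name__ == '__main__':
    for decls, name in option_specs(InstalledCodeSketch):
        print(name, decls)
```

Although the subclass fields are declared last, the sketch yields computer, filepath_executable, label and description in that order, which is the kind of control the priority keyword is meant to restore.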

The aiida.common.pydantic.MetadataField utility function is added, which provides a transparent way to define these metadata arguments when declaring a field in the model. The alternative is to use Annotated, but this quickly makes the model difficult to read when multiple metadata are provided.
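A helper of this kind mostly turns the extra metadata into explicit keyword arguments. The sketch below shows the idea with dataclasses.field instead of pydantic's Field; metadata_field and CodeSketch are hypothetical names, and this is not the actual aiida.common.pydantic.MetadataField implementation.

```python
from dataclasses import MISSING, dataclass, field, fields


def metadata_field(default=MISSING, *, priority=0, short_name=None, option_cls=None, **metadata):
    """Hypothetical helper: a ``dataclasses.field`` with CLI metadata passed as keywords."""
    extra = {'priority': priority}
    # Only record the optional keywords that were actually given.
    if short_name is not None:
        extra['short_name'] = short_name
    if option_cls is not None:
        extra['option_cls'] = option_cls
    # Free-form metadata (title, description, ...) and CLI metadata end up in one mapping.
    return field(default=default, metadata={**metadata, **extra})


@dataclass
class CodeSketch:
    label: str = metadata_field(title='Label', short_name='-L')
    computer: str = metadata_field(title='Computer', short_name='-Y', priority=2)
    description: str = metadata_field('', title='Description')


if __name__ == '__main__':
    for f in fields(CodeSketch):
        print(f.name, dict(f.metadata))
```

Each declaration stays a single call with named keywords, rather than a type wrapped in several metadata objects.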

The changes introduce almost no difference in the behavior of the verdi code create command. The one exception is that the option callbacks are replaced by the model validators. The downside is that the validators are only called once all options have been specified, whereas a callback was called as soon as its option was defined. This only matters for the label of the InstalledCode: its callback was called immediately, so if an invalid label was provided during an interactive session, the user was prompted for a new label right away. It is not clear how this behavior can be reproduced with pydantic validators.
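The difference in timing can be illustrated without click or pydantic. In this sketch, the answers of an interactive session are simulated by a list of strings, and is_valid_label is a hypothetical rule (reject empty labels and labels containing '@'), not necessarily the exact check aiida performs. The callback style re-prompts for the label right away, while the validator style only detects the bad label after every option has been collected.

```python
def is_valid_label(label):
    # Hypothetical rule for illustration: non-empty and without an '@' character.
    return bool(label) and '@' not in label


def prompt_with_callback(answers):
    """Callback style: the label is checked as soon as it is entered and re-prompted if invalid."""
    answers = iter(answers)
    label = next(answers)
    while not is_valid_label(label):
        label = next(answers)  # the user is asked for the label again immediately
    computer = next(answers)
    return {'label': label, 'computer': computer}


def prompt_then_validate(answers):
    """Validator style: every option is collected first and the values are checked together."""
    answers = iter(answers)
    values = {'label': next(answers), 'computer': next(answers)}
    if not is_valid_label(values['label']):
        # The error only surfaces after all options were given; there is no re-prompt.
        raise ValueError(f"invalid label {values['label']!r}")
    return values
```

With the answers ['bad@label', 'pw', 'localhost'], prompt_with_callback returns {'label': 'pw', 'computer': 'localhost'}, whereas prompt_then_validate(['bad@label', 'localhost']) raises ValueError only after the computer has already been entered.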

sphuber requested a review from mbercx on November 22, 2023 14:08
sphuber force-pushed the fix/code-model-pydantic branch 2 times, most recently from b2f7b5c to 5d8b9ae, on November 22, 2023 16:39

sphuber (Contributor Author) commented Nov 22, 2023

This functionality should be very useful for downstream applications that want to inspect how a particular plugin can be constructed. Think, for example, of a REST API that can now easily obtain a JSON schema and provide it to a client:

```
In [1]: from aiida.orm import InstalledCode

In [2]: InstalledCode.Configuration.model_json_schema()
Out[2]: 
{'description': 'Model describing required information to create an instance.',
 'properties': {'label': {'description': 'A unique label to identify the code by.',
   'title': 'Label',
   'type': 'string'},
  'description': {'default': '',
   'description': 'Human-readable description, ideally including version and compilation environment.',
   'title': 'Description',
   'type': 'string'},
  'default_calc_job_plugin': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
   'default': None,
   'description': 'Entry point name of the default plugin (as listed in `verdi plugin list aiida.calculations`).',
   'title': 'Default `CalcJob` plugin'},
  'use_double_quotes': {'default': False,
   'description': 'Whether the executable and arguments of the code in the submission script should be escaped with single or double quotes.',
   'title': 'Escape using double quotes',
   'type': 'boolean'},
  'with_mpi': {'anyOf': [{'type': 'boolean'}, {'type': 'null'}],
   'default': None,
   'description': 'Whether the executable should be run as an MPI program. This option can be left unspecified in which case `None` will be set and it is left up to the calculation job plugin or inputs whether to run with MPI.',
   'title': 'Run with MPI'},
  'prepend_text': {'default': '',
   'description': 'Bash commands that should be prepended to the run line in all submit scripts for this code.',
   'title': 'Prepend script',
   'type': 'string'},
  'append_text': {'default': '',
   'description': 'Bash commands that should be appended to the run line in all submit scripts for this code.',
   'title': 'Append script',
   'type': 'string'},
  'computer': {'description': 'The remote computer on which the executable resides.',
   'title': 'Computer',
   'type': 'string'},
  'filepath_executable': {'description': 'Filepath of the executable on the remote computer.',
   'title': 'Filepath executable',
   'type': 'string'}},
 'required': ['label', 'computer', 'filepath_executable'],
 'title': 'Configuration',
 'type': 'object'}
```
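On the client side, such a schema is plain JSON, so it can be consumed without importing aiida. The sketch below assumes a trimmed copy of the output above (only a few properties kept), serialized as it might be in a hypothetical REST response; describe_form and missing_required are illustrative helper names, not part of any aiida API.

```python
import json

# Trimmed copy of the schema printed above; in practice it would arrive in an HTTP response.
response_body = json.dumps({
    'properties': {
        'label': {'title': 'Label', 'type': 'string'},
        'description': {'default': '', 'title': 'Description', 'type': 'string'},
        'with_mpi': {
            'anyOf': [{'type': 'boolean'}, {'type': 'null'}],
            'default': None,
            'title': 'Run with MPI',
        },
        'computer': {'title': 'Computer', 'type': 'string'},
        'filepath_executable': {'title': 'Filepath executable', 'type': 'string'},
    },
    'required': ['label', 'computer', 'filepath_executable'],
    'title': 'Configuration',
    'type': 'object',
})


def describe_form(schema):
    """Split the properties into required fields and optional fields with their defaults."""
    required = set(schema.get('required', []))
    mandatory = [name for name in schema['properties'] if name in required]
    optional = {
        name: spec.get('default')
        for name, spec in schema['properties'].items()
        if name not in required
    }
    return mandatory, optional


def missing_required(schema, payload):
    """Return the required fields that a client-supplied payload does not define yet."""
    return [name for name in schema.get('required', []) if name not in payload]


schema = json.loads(response_body)
mandatory, optional = describe_form(schema)
print(mandatory)  # ['label', 'computer', 'filepath_executable']
print(optional)  # {'description': '', 'with_mpi': None}
```

A client could, for instance, call missing_required(schema, {'label': 'pw'}) to learn that computer and filepath_executable still have to be asked for before submitting.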

sphuber force-pushed the fix/code-model-pydantic branch 2 times, most recently from 3bebb46 to cda5b28, on November 23, 2023 09:56
sphuber force-pushed the fix/code-model-pydantic branch 5 times, most recently from 6730a38 to 90bbe1c, on December 21, 2023 18:18
sphuber force-pushed the fix/code-model-pydantic branch from 90bbe1c to 623b654 on January 11, 2024 07:57

sphuber (Contributor Author) commented Jan 12, 2024

While working on #6245 we realized that we could use the pydantic BaseModel instead of a custom ad-hoc solution, just as is done in this PR. The idea is now to give all ORM classes a pydantic model that defines their schema. In that light, naming this class attribute Model might be better than Configuration. The name Configuration was already released with v2.5.0, but it was only used for the StorageBackend, and it is unlikely to have been adopted by external packages yet (not even by aiida-s3, written by yours truly, which is perhaps the only package out there that customizes storage backends). So it should still be safe to rename Configuration to Model.

sphuber force-pushed the fix/code-model-pydantic branch 2 times, most recently from 894d7a7 to 1b5a71a, on February 2, 2024 09:28
sphuber mentioned this pull request on Feb 7, 2024
sphuber force-pushed the fix/code-model-pydantic branch 2 times, most recently from 8c1b094 to e3ebf71, on February 8, 2024 11:11

mbercx (Member) commented Feb 8, 2024

Thanks @sphuber, sorry for taking so long to review this. I'll make it a priority for the coding days next week.

sphuber force-pushed the fix/code-model-pydantic branch from 49eb110 to d50cbd9 on March 8, 2024 11:25
sphuber merged commit 06189d5 into aiidateam:main on Mar 11, 2024 (20 checks passed)
sphuber deleted the fix/code-model-pydantic branch on March 11, 2024 12:39