
AiiDA 1.0 plugin migration guide



Migrating imports

Before you start modifying imports of AiiDA objects, please run the AiiDA plugin migrator in order to take care of a number of tedious search/replace operations automatically.

Note: The plugin migrator will bring you only some of the way. If you discover some replacements that are missing, it's easy to add them!

Currently not covered by the plugin migrator

  • DataFactory('parameter') => DataFactory('dict')
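
For example, such a call still needs to be updated by hand; a minimal sketch:

from aiida.plugins import DataFactory

# 'dict' is the new entry point name of the former ParameterData class
Dict = DataFactory('dict')  # formerly DataFactory('parameter')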

Migrating Data subclasses

In aiida-core<=0.12.*, the Data Node class implemented some magic to automatically call certain methods if the corresponding keywords were passed in the constructor (also making sure that those specific keywords were not passed on to the constructor of the parent class).

Take Dict (formerly ParameterData) as an example: if constructed with the keyword dict, the constructor would call set_dict and remove the keyword from the kwargs before calling the constructor of the Data base class.

This magic has been dropped in favor of the standard Python approach: you now have the freedom (but also the duty) to implement the constructor as needed by your class. Take as an example the new Dict data subclass:

class Dict(Data):
    """`Data` sub class to represent a dictionary."""

    def __init__(self, **kwargs):
        """Store a dictionary as a `Node` instance.

        :param dict: the dictionary to set
        """
        dictionary = kwargs.pop('dict', None)
        super(Dict, self).__init__(**kwargs)
        if dictionary:
            self.set_dict(dictionary)

Note that we allow the user to pass a dictionary using the dict keyword. We first pop this value from the kwargs, using the pop(key, None) syntax to make sure it simply returns None instead of raising an exception if the key is not present. Then we pass the remaining kwargs to the parent constructor.

Note: If you overwrite the constructor, don't forget to call the parent constructor.

Finally, after having called super, we can process the dictionary (if it was actually passed). You can choose to keep doing so in a set_dict method, but in principle you can implement this however you like.
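
As a usage sketch, the constructor can then be called with the dict keyword:

# the dict keyword is popped in __init__ and forwarded to set_dict
node = Dict(dict={'key': 'value'})
node.store()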

An important note on the __init__ method of Data nodes

Note that __init__ is called only when constructing a new node, but not when reloading an existing node.

Therefore, you cannot use __init__ to set properties on the class that you also need for reloaded nodes, as these will not be set when (re)loading a node from the database. E.g., if you do self.my_property = xxx in __init__, then load_node(yyy).my_property will not be available. Instead, define my_property as a property, for instance:

class MyData(Data):
    @property
    def my_property(self):
        # computed on access, so it is available also for
        # nodes reloaded from the database
        return xxx

Migrating JobCalculation to CalcJob

1. Subclass from aiida.engine.CalcJob instead of JobCalculation

In case the plugin migrator hasn't already taken care of this, replace:

from aiida.orm.calculation.job import JobCalculation

with:

from aiida.engine import CalcJob

You can keep the name of your subclass.
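
A minimal sketch of the migrated class (SomeCalculation is just an example name):

from aiida.engine import CalcJob

class SomeCalculation(CalcJob):
    """Only the base class changes; the name of your subclass can stay the same."""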

2. Replace the _init_internal_params method with the define class method

Instead of defining default variables in _init_internal_params:

def _init_internal_params(self):
    super(SomeCalculation, self)._init_internal_params()
    self._INPUT_FILE_NAME = 'aiida.inp'
    self._OUTPUT_FILE_NAME = 'aiida.out'
    self._default_parser = 'quantumespresso.pw'

include them as class variables and via the metadata options of the input spec:

import six
from aiida import engine

class SomeCalcJob(engine.CalcJob):

    # Default input and output files
    _DEFAULT_INPUT_FILE = 'aiida.in'
    _DEFAULT_OUTPUT_FILE = 'aiida.out'

    @classmethod
    def define(cls, spec):
        super(SomeCalcJob, cls).define(spec)
        spec.input('metadata.options.input_filename', valid_type=six.string_types, default=cls._DEFAULT_INPUT_FILE)
        spec.input('metadata.options.output_filename', valid_type=six.string_types, default=cls._DEFAULT_OUTPUT_FILE)
        spec.input('metadata.options.parser_name', valid_type=six.string_types, default='quantumespresso.pw')
        # withmpi defaults to "False" in aiida-core 1.0. Below, we override to default to withmpi=True
        spec.input('metadata.options.withmpi', valid_type=bool, default=True)

The parser_name key will be used by the engine to load the correct parser class after the calculation has completed. In this example, the engine will call ParserFactory('quantumespresso.pw') which will load the PwParser class of the aiida-quantumespresso package.

To access the value of the other metadata options:

  • if you are inside a method of the CalcJob class (e.g. in prepare_for_submission), you can do self.inputs.metadata.options.output_filename;
  • from a stored CalcJobNode, instead, you can do node.get_option('output_filename').
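
For example (a minimal sketch; the pk 1234 is hypothetical):

# inside a method of the CalcJob class, e.g. prepare_for_submission
output_filename = self.inputs.metadata.options.output_filename

# from a stored CalcJobNode loaded from the database
from aiida.orm import load_node
node = load_node(1234)  # hypothetical pk
output_filename = node.get_option('output_filename')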

Note: The code above is just an example - you don't need to expose filenames via metadata.options if you don't want users to be able to modify them.

3. Replace _use_methods with the define class method

The define method works exactly as for WorkChains; see its documentation for details.

Consider the following _use_methods specification:

@classproperty
def _use_methods(cls):
    return {
        'structure': {
            'valid_types': StructureData,
            'additional_parameter': None,
            'linkname': 'structure',
            'docstring': 'the input structure',
        }
    }

This translates to the define method:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input('structure', valid_type=StructureData,
        help='the input structure')

All input ports that are defined via spec.input are required by default. Use required=False in order to make an input port optional.

For use_methods that used the additional_parameter keyword, spec provides input namespaces. Consider the following _use_methods specification:

@classproperty
def _use_methods(cls):
    return {
        'pseudo': {
            'valid_types': UpfData,
            'additional_parameter': 'kind',
            'linkname': 'pseudos',
            'docstring': 'the input pseudo potentials',
        }
    }

This can be translated to the new process spec as follows:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input_namespace('pseudos', valid_type=UpfData,
        help='the input pseudo potentials', dynamic=True)

The spec.input_namespace call with the dynamic=True keyword lets the engine know that the namespace can receive inputs that are not explicitly defined, since at the time of definition we do not know how many UpfData nodes will be passed, nor under which keys. Example usage when setting up the calculation:

inputs = {
     ...
     'pseudos': { 'Si': si_upf, 'C': c_upf },
     ...
}

Note: some inputs are pre-defined by the CalcJob class; see its documentation for the full list of default inputs.

4. Change name, signature and implementation of the method _prepare_for_submission

Please remove the leading underscore and adjust to the new signature:

def prepare_for_submission(self, folder):
    """Create the input files from the input nodes passed to this instance of the `CalcJob`.

    :param folder: an `aiida.common.folders.Folder` to temporarily write files on disk
    :return: `aiida.common.datastructures.CalcInfo` instance
    """

Inputs are no longer passed in as a dictionary but retrieved through self.inputs (same as with WorkChains).

Importantly, the presence and type of the inputs have already been validated: if the spec defines an input as required, it is guaranteed to be present in self.inputs. All boilerplate code for validating presence and type can be removed from prepare_for_submission.

For example, if the spec defines an input structure of type StructureData that is required, instead of:

try:
    structure = inputdict.pop('structure')
except KeyError:
    raise InputValidationError('No structure was passed in the inputs')

if not isinstance(structure, StructureData):
    raise InputValidationError('the input structure should be a StructureData')

Simply do:

structure = self.inputs.structure

Only for input ports that are not required and do not specify a default do you still need to check for the presence of the key:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.input('optional', valid_type=Int, required=False, help='an optional input')

def prepare_for_submission(self, folder):
    if 'optional' in self.inputs:
        optional = self.inputs.optional
    else:
        optional = None

5. Changes to the local_copy_list

This is an example of adding a SinglefileData to the local_copy_list of the CalcInfo in 0.12:

single_file = SinglefileData()
local_copy_list = [(single_file.get_file_abs_path(),
    os.path.join('some/relative/folder', single_file.filename))]

The get_file_abs_path method has been removed, and the structure of the local_copy_list has changed to accommodate this. You can now do:

single_file = SinglefileData()
local_copy_list = [(single_file.uuid, single_file.filename, single_file.filename)]

Each tuple in the local_copy_list should have length 3 and contain:

  1. the UUID of the node (SinglefileData or FolderData)
  2. the relative file path within the node repository (for the SinglefileData this is given by its filename attribute)
  3. the relative path where the file should be copied in the remote folder used for the execution of the CalcJob

Naturally, this procedure also works for subclasses of SinglefileData such as UpfData, CifData etc.

Note: If you are creating an input file inside the CalcJob and don't want to go through a node, you can simply use the folder argument of prepare_for_submission:

import io
with io.StringIO(u'my string') as handle:
    folder.create_file_from_filelike(handle, filename='input.txt', mode='w')

6. Replacing the deprecated retrieve_singlefile_list

The retrieve_singlefile_list has been deprecated. The reason for its existence was to fix an intrinsic inefficiency of the retrieve_list. Imagine a code that produces an output file that you want to retrieve, but instead of parsing it and storing a derivation of its content as a node, you just want to store the file as a whole as a SinglefileData node. Any file that is retrieved through the retrieve_list is also stored in the repository of the calculation node. When you then create a SinglefileData node out of it, you store the content of the file again, effectively duplicating it. If this is done often, the repository bloats unnecessarily.

The retrieve_singlefile_list was the solution, where the specified files would not be stored in the retrieved folder but the engine would automatically turn them into SinglefileData nodes and attach them as outputs.

This behavior can be reproduced with the more general retrieve_temporary_list. Just like files in the retrieve_singlefile_list, these files will be retrieved but not permanently stored in the retrieved folder. Instead, they are stored in a temporary folder that is passed to the Parser.parse method as the retrieved_temporary_folder argument. There you will find the files from the retrieve_temporary_list and you can do with them whatever you want: parse their content and store it in whatever node types you like. For a SinglefileData it would look something like:

def parse(self, **kwargs):
    import os
    from aiida.orm import SinglefileData

    # the temporary folder is an absolute path on disk
    temporary_folder = kwargs['retrieved_temporary_folder']

    with open(os.path.join(temporary_folder, 'some_output_file.txt'), 'rb') as handle:
        node = SinglefileData(file=handle)
        self.out('output_file', node)
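
Remember that files only end up in the temporary folder if they were marked for retrieval; a minimal sketch of the corresponding line in prepare_for_submission (the filename is just an example):

# in CalcJob.prepare_for_submission, before returning the calcinfo
calcinfo.retrieve_temporary_list = ['some_output_file.txt']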

7. Restarting from a previous calculation

There are two ways of restarting from a previous calculation:

  1. Make a symlink to the folder with the previous calculation.
  2. Copy the folder from the previous calculation.

The advantage of approach 1 is that symlinking is fast and it does not occupy additional disk space. The disadvantage is that it won't work if the parent calculation was run on a different machine, and you shouldn't use it if your new calculation can modify data in the symlinked folder.

The old way of symlinking was:

calcinfo.remote_symlink_list = []
if parent_calc_folder is not None:
    comp_uuid = parent_calc_folder.get_computer().uuid
    remote_path = parent_calc_folder.get_remote_path()
    calcinfo.remote_symlink_list.append((comp_uuid, remote_path, link_name)) # where the link_name is decided by you

Replace this by:

calcinfo.remote_symlink_list = []
if 'parent_calc_folder' in self.inputs:
    comp_uuid = self.inputs.parent_calc_folder.computer.uuid
    remote_path = self.inputs.parent_calc_folder.get_remote_path()
    calcinfo.remote_symlink_list.append((comp_uuid, remote_path, link_name)) # where the link_name is decided by you
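
These snippets assume the plugin declares an optional parent_calc_folder input in its spec; a minimal sketch (RemoteData is the usual valid type for such an input):

spec.input('parent_calc_folder', valid_type=orm.RemoteData, required=False,
    help='remote folder of a parent calculation to restart from')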

If you want to run the calculation on a different machine, or you are afraid that the old data could be modified by the new run, you should choose approach 2, taking into account that copying the data requires time and disk space. To implement this in your plugin, add the same information to calcinfo.remote_copy_list instead of calcinfo.remote_symlink_list:

calcinfo.remote_copy_list = []
if 'parent_calc_folder' in self.inputs:
    comp_uuid = self.inputs.parent_calc_folder.computer.uuid
    remote_path = self.inputs.parent_calc_folder.get_remote_path()
    calcinfo.remote_copy_list.append((comp_uuid, remote_path, folder_name)) # where the folder_name is decided by you

Migrating the Parser

1. Change the name and signature of the method parse_with_retrieved

The method has been renamed from parse_with_retrieved to parse and its signature is now parse(self, retrieved_temporary_folder=None, **kwargs); the retrieved_temporary_folder argument is passed as a keyword argument to parse.

The retrieved_temporary_folder is an absolute path to a temporary folder on disk that will be automatically deleted after parsing is done. By default it is empty, but it will contain data if you specify files to retrieve temporarily via the retrieve_temporary_list attribute of the CalcInfo instance in CalcJob.prepare_for_submission.
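
A minimal skeleton of the new parser interface (the class name is hypothetical):

from aiida.parsers import Parser

class SomeParser(Parser):
    """Skeleton of the new parser interface."""

    def parse(self, retrieved_temporary_folder=None, **kwargs):
        # inspect the retrieved files, attach outputs with self.out
        # and return an ExitCode on failure (or nothing on success)
        pass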

2. Accessing the raw output files retrieved by the engine

output_folder.open(relative_path) replaces the deprecated output_folder.get_abs_path(relative_path).

To get the FolderData node with the raw data retrieved by the engine, use the following:

try:
    output_folder = self.retrieved
except exceptions.NotExistent:
    return self.exit_codes.ERROR_NO_RETRIEVED_FOLDER

with output_folder.open('output_file_name', 'rb') as handle:
    self.out('output_link_label', SinglefileData(file=handle))

Note that when passing a filelike object to the SinglefileData constructor like this, it is best to open the file in binary mode, which is why we pass 'rb' in the open call.

3. Signalling errors during parsing

As shown in the example above, the return signature of parse has changed as well. In aiida-core 0.12, parse_with_retrieved used to return a boolean (signalling whether parsing was successful) plus the list of output nodes:

def parse(self, **kwargs):
    success = False
    node_list = []

    if some_problem:
        self.logger.error("No retrieved folder found")
        return success, node_list

This has been replaced by returning an aiida.engine.ExitCode - or nothing, if parsing is successful. Adding output nodes is handled by self.out (see next section).

def parse(self, **kwargs):
    if some_error:
        return self.exit_codes.ERROR_NO_RETRIEVED_FOLDER

Here, we are using an exit code defined in the spec of a CalcJob like so:

class SomeCalcJob(engine.CalcJob):

    @classmethod
    def define(cls, spec):
        super(SomeCalcJob, cls).define(spec)
        spec.exit_code(100, 'ERROR_NO_RETRIEVED_FOLDER', message='The retrieved folder data node could not be accessed.')

Note: We recommend defining exit codes in the spec. It is, however, also possible to construct one directly in the parser:

from aiida.engine import ExitCode

def parse(self, **kwargs):
    if some_error:
        return ExitCode(418)

4. Adding output nodes to the calculation

Instead of returning a tuple of outputs after parsing is completed, call the method self.out at any point in the parse function to attach an output node to the CalcJobNode representing the execution of the CalcJob class.

For example, this adds a Dict node as an output with the link label results:

output_results = {'some_key': 1}
self.out('results', Dict(dict=output_results))

By default, you will need to declare your outputs in the spec of the Process:

@classmethod
def define(cls, spec):
    super(SomeCalcJob, cls).define(spec)
    spec.output('results', valid_type=Dict, required=True, help='the results of the calculation')
    spec.output('structure', valid_type=StructureData, required=False, help='optional relaxed structure')
    spec.default_output_node = 'results'

Like inputs, outputs can be required or not. If a required output is missing after parsing, the calculation is marked as failed.

You can choose to forgo this check and gain flexibility by making the outputs dynamic in the spec:

    spec.outputs.dynamic = True
    spec.outputs.valid_type = Data

5. Accessing the CalcJobNode and its inputs from the parser

In order to access inputs of the original calculation, you can use the property self.node to get the CalcJobNode. Example: self.node.inputs.structure.
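
For example, inside the parse method (a sketch; the input and option names depend on your plugin):

# self.node is the CalcJobNode whose outputs are being parsed
structure = self.node.inputs.structure
input_filename = self.node.get_option('input_filename')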
