Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add implemention of CalcJobImporter for pw.x calculations #847

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Oct 8, 2022

No description provided.

Copy link
Member

@mbercx mbercx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sphuber! Besides the comments in the review, there is also the following:

  1. We should also update the corresponding documentation. Happy to have a crack at this!
  2. Can we now remove/deprecate the outdated PwimmigrantCalculation class? Does this even still work?
  3. After seeing the calculation job importing in action, I was wondering about the choice to base the design on a remote_folder input that is recognised by the engine. A side effect of this is that all calculation jobs will now have the remote_folder input, which is only really used for the import feature and can potentially be confused with other inputs (e.g. parent_folder). Making the code input optional is also potentially confusing for users, as I note in one of my comments below. I do see the argument from a provenance perspective: it makes sense that the remote_folder is an input instead of an output for imported calculations. On the other hand, this potentially makes analysis on a combination of imported and AiiDA-run jobs more challenging since their provenance will be different.
  4. Maybe we can add another method called import_calculation? Although having a parse_remote_data method is nice, the first thing I would look for as a user is a method to directly import the folder and return the node.
  5. It would be useful if the user could pass an (already open) transport to the method, in case they need to import a large number of calculations in a loop.

src/aiida_quantumespresso/calculations/importers/pw.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/calculations/importers/pw.py Outdated Show resolved Hide resolved
src/aiida_quantumespresso/calculations/importers/pw.py Outdated Show resolved Hide resolved
transport.copytree(remote_data.get_remote_path(), dirpath)

builder = create_builder_from_file(
str(dirpath), input_file_name, code, metadata or {}, pseudo_folder_path=pseudo_folder_path, **kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few considerations here:

  • Currently the create_builder_from_file doesn't seem to use the pseudo_dir input.
  • pseudo_folder_path doesn't accept a Path.
  • It might be possible that the pseudo folder is on a remote machine that is not a subpath of the calculation folder. I suppose we can also allow for a RemoteData input here, which can then use the same transport to retrieve the pseudo files.

@sphuber
Copy link
Contributor Author

sphuber commented Oct 9, 2022

Thanks for the review. I only opened this PR because we got a question about importing on the mailing list. I already had the branch but hadn't opened a PR yet, especially since it wasn't really done yet. That's also why I hadn't requested a review yet 😅 since I didn't want to cause too much unnecessary work. Still, it is quite useful actually that you already gave your thoughts!

We should also update the corresponding documentation. Happy to have a crack at this!

Fully agree. I had a look at the docs and simply didn't know where to add it. It is such a mess. I actually bit the bullet this weekend and started reworking the docs from scratch. See this PR #848
It is not done yet, but the scaffolding is more or less in place. I will now work on putting the existing documentation that is still valuable in place.

Can we now remove/deprecate the outdated PwimmigrantCalculation class? Does this even still work?

We really should. With #848 I am anyway getting rid of the documentation.

After seeing the calculation job importing in action, I was wondering about the choice to base the design on a remote_folder input that is recognised by the engine. A side effect of this is that all calculation jobs will now have the remote_folder input, which is only really used for the import feature and can potentially be confused with other inputs (e.g. parent_folder). Making the code input optional is also potentially confusing for users, as I note in one of my comments below.

I think the solution here is to simply improve the error message. If there is no Code defined and no remote_folder then it should treat the code as required and message appropriately.

I do see the argument from a provenance perspective: it makes sense that the remote_folder is an input instead of an output for imported calculations. On the other hand, this potentially makes analysis on a combination of imported and AiiDA-run jobs more challenging since their provenance will be different.

Fair enough, but for the alternative case of the remote_folder also being an output for an imported calculation, it can be confusing just as well. One might mistake an imported calculation for an imported one.

Maybe we can add another method called import_calculation? Although having a parse_remote_data method is nice, the first thing I would look for as a user is a method to directly import the folder and return the node.

This is intentional. A user wanting to import a calculation is not going to call parse_remote_data directly. Instead they will actually submit a CalcJob with the folder containing the inputs and outputs as the remote_folder input. Have a look at the unit test that shows what the user interface would look like.

It would be useful if the user could pass an (already open) transport to the method, in case they need to import a large number of calculations in a loop.

This is a good point. Actually, we should probably go a step further and only pass a transport in. The engine can open it and call chdir to change in the root directory. The implementation can then simply use the transport interface to read the files of interest and reconstruct the input. In doing this, the opening of transports will go through the transport pool and connection throttling of the engine. This is the whole reason why I coupled the design of importing so closely to actually running a CalcJob, to profit as much as possible of all the functionality of normal CalcJob runs. Will open an issue on aiida-core as it needs to be addressed there.

@mbercx
Copy link
Member

mbercx commented May 5, 2023

I suppose this one will also fix #53?

@mbercx
Copy link
Member

mbercx commented May 7, 2023

And also #448?

@sphuber sphuber force-pushed the feature/809/calc-job-importer-pw branch from ead1dff to 1ae4b68 Compare September 22, 2023 06:28
The `code` input was made optional in the `CalcJob` exactly for the
purpose of the importer functionality. However, the plugins themselves
often actually try to access it through `self.inputs.code` since they
have to set the code UUID on the `CodeInfo`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants