Scheduler: Refactor interface to make it more generic #6043
As mentioned, this is working with https://github.com/aiidateam/aiida-firecrest
In principle, these methods should no longer be required on the transport, but removing them is not needed for this PR: https://github.com/aiidateam/aiida-firecrest/blob/eb7f7518857794a99bae3568c74f0981b695dd79/aiida_firecrest/transport.py#L521-L533
Also, as a follow-up for the transport, here I moved the currently hard-coded
@sphuber I tried this PR to submit a job using aiida-firecrest on CSCS, and it works.
I went through these changes and, apart from looking over the code, also manually tested the main functionality of the SLURM scheduler on an HPC machine through the `core.ssh` transport plugin.
Tested:
* `engine.submit()` and `engine.run()`
* `verdi process kill`

Both work as expected.
In addition, I tested the scheduler methods directly:
```python
from aiida import load_profile
from aiida.orm.utils.loaders import load_computer

load_profile()

computer = load_computer('cscs-eiger')
scheduler = computer.get_scheduler()
transport = computer.get_transport()
transport.open()
scheduler.set_transport(transport)

# Submit the existing `_aiidasubmit.sh` in this remote working directory
jobid = scheduler.submit_job('/capstor/scratch/cscs/akhosrav/test-scheduler/with_sleep/', '_aiidasubmit.sh')
# Query the state of the submitted job, then kill it
print(scheduler.get_jobs([jobid]))
print(scheduler.kill_job(jobid))

transport.close()
```
All is good.
I haven't manually tested the other scheduler plugins refactored in this PR, but I don't foresee any problems with them. The changes made in this PR are reasonable and easy to follow.
I would suggest merging this PR before the release.
Once again, I believe this could be merged. I've been using it for a while with no issues.
Codecov Report
Attention: Patch coverage is

```diff
@@            Coverage Diff             @@
##             main    #6043      +/-   ##
==========================================
+ Coverage   77.51%   77.83%   +0.32%
==========================================
  Files         560      564       +4
  Lines       41444    41911     +467
==========================================
+ Hits        32120    32616     +496
+ Misses       9324     9295      -29
```
The original `Scheduler` interface assumed that all implementations would interact with the scheduler through a command line interface invoked through a bash shell. However, this is not always the case. A prime example is the new FirecREST service, being developed by CSCS, which allows interacting with the scheduler through a REST API. Because of the assumptions built into the `Scheduler` interface, it was difficult to implement it for this use case.

The `Scheduler` interface is made more generic by removing the following (abstract) methods:

* `_get_joblist_command`
* `_parse_joblist_output`
* `_get_submit_command`
* `_parse_submit_output`
* `submit_from_script`
* `kill`
* `_get_kill_command`
* `_parse_kill_output`

They are replaced by three abstract methods:

* `submit_job`
* `get_jobs`
* `kill_job`

The new interface no longer makes any assumption about how a plugin implements these methods. The first should simply submit the job, given the location of the submission script on the remote computer. The second should return the status of the active jobs. The third should kill a job and return the result.

Unfortunately, this change is backwards incompatible and will break existing scheduler plugins. To simplify the migration pathway, a subclass `BashCliScheduler` is added. It implements the new `Scheduler` interface while maintaining the old interface, which makes it a drop-in replacement for the old `Scheduler` class for existing plugins. The plugins that ship with `aiida-core` are all updated to subclass from `BashCliScheduler`. Any existing plugins that subclass from these plugins are therefore not affected by these changes.
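To illustrate the migration pathway, here is a minimal, self-contained sketch of how an adapter like `BashCliScheduler` can implement the three new methods by combining the old `_get_*_command` and `_parse_*_output` hooks with a transport call. It is not the `aiida-core` code: the stand-in `Scheduler` base, the `cd ... &&` handling of the working directory, the dict-based job records, the toy `ToySlurmScheduler` plugin and the `FakeTransport` (which returns canned `(retval, stdout, stderr)` tuples from an `exec_command_wait` method instead of running a shell) are all assumptions made for illustration.

```python
import abc
import shlex


class Scheduler(abc.ABC):
    """Hypothetical stand-in for the refactored interface (not aiida's actual class)."""

    def set_transport(self, transport):
        self.transport = transport

    @abc.abstractmethod
    def submit_job(self, working_directory, filename):
        """Submit `filename` located in `working_directory` and return the job id."""

    @abc.abstractmethod
    def get_jobs(self, jobs=None, user=None, as_dict=False):
        """Return the active jobs as a list, or as a dict keyed by job id."""

    @abc.abstractmethod
    def kill_job(self, jobid):
        """Kill the job and return whether this succeeded."""


class BashCliScheduler(Scheduler):
    """Adapter sketch: each new method runs a command built by an old `_get_*_command`
    hook through the transport and hands the result to the matching `_parse_*_output` hook."""

    def submit_job(self, working_directory, filename):
        # Assumption: the working directory is entered with `cd` in the same command.
        submit = self._get_submit_command(shlex.quote(filename))
        command = f'cd {shlex.quote(working_directory)} && {submit}'
        return self._parse_submit_output(*self.transport.exec_command_wait(command))

    def get_jobs(self, jobs=None, user=None, as_dict=False):
        command = self._get_joblist_command(jobs=jobs, user=user)
        joblist = self._parse_joblist_output(*self.transport.exec_command_wait(command))
        return {job['job_id']: job for job in joblist} if as_dict else joblist

    def kill_job(self, jobid):
        command = self._get_kill_command(jobid)
        return self._parse_kill_output(*self.transport.exec_command_wait(command))


class ToySlurmScheduler(BashCliScheduler):
    """Hypothetical existing plugin that only implements the old hooks."""

    def _get_submit_command(self, submit_script):
        return f'sbatch {submit_script}'

    def _parse_submit_output(self, retval, stdout, stderr):
        if retval != 0:
            raise RuntimeError(f'submission failed: {stderr}')
        return stdout.split()[-1]  # 'Submitted batch job 123' -> '123'

    def _get_joblist_command(self, jobs=None, user=None):
        command = "squeue --noheader --format='%i %T'"
        if user:
            command += f' --user={user}'
        if jobs:
            command += f" --jobs={','.join(jobs)}"
        return command

    def _parse_joblist_output(self, retval, stdout, stderr):
        # Each line is '<job id> <state>'; job records are plain dicts in this sketch.
        return [dict(zip(('job_id', 'state'), line.split())) for line in stdout.splitlines() if line.strip()]

    def _get_kill_command(self, jobid):
        return f'scancel {jobid}'

    def _parse_kill_output(self, retval, stdout, stderr):
        return retval == 0


class FakeTransport:
    """Hypothetical transport returning canned (retval, stdout, stderr) tuples instead of running a shell."""

    def __init__(self, responses):
        self.responses = responses  # maps a command name to its canned result
        self.commands = []  # every command that was "executed", in order

    def exec_command_wait(self, command):
        self.commands.append(command)
        for name, result in self.responses.items():
            if name in command:
                return result
        return 127, '', f'command not found: {command}'
```

Because the toy plugin only implements the old hooks, it works unchanged on top of the adapter, which is the sense in which `BashCliScheduler` acts as a drop-in replacement. For example, with a `FakeTransport` that answers `sbatch` with `(0, 'Submitted batch job 123\n', '')`, calling `submit_job('/scratch/run 1', '_aiidasubmit.sh')` sends `cd '/scratch/run 1' && sbatch _aiidasubmit.sh` and returns `'123'`.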
Cheers @sphuber
The current implementation only issues a kill command for the parent process, which can leave child processes orphaned. The child processes are now retrieved and explicitly added to the kill command.

Cherry-pick: fddffca

Edits: Downgrade the scheduler usage in fix aiidateam#6572 to the old API

The fix in aiidateam#6572 was pushed after the `Scheduler` API had been refactored in PR aiidateam#6043. To avoid including this breaking change in a minor release, we adapt the usage of `Scheduler` in that fix to the old API.
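The idea behind this fix can be sketched as follows: list the processes on the machine, walk the parent/child relation starting from the job's PID, and pass every descendant to `kill` together with the parent. This is a hypothetical illustration, not the code of the fix; the helper names and the use of `ps -A -o pid=,ppid=` output as the source of the process table are assumptions.

```python
from collections import defaultdict


def parse_ps_output(stdout):
    """Parse `ps -A -o pid=,ppid=` output into (pid, ppid) integer pairs."""
    return [tuple(int(field) for field in line.split()) for line in stdout.splitlines() if line.strip()]


def collect_descendants(parent_pid, process_table):
    """Return the PIDs of all (transitive) children of `parent_pid`."""
    children = defaultdict(list)
    for pid, ppid in process_table:
        children[ppid].append(pid)
    descendants = []
    seen = {parent_pid}  # guards against malformed tables containing cycles
    stack = [parent_pid]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                descendants.append(child)
                stack.append(child)
    return descendants


def build_kill_command(jobid, process_table):
    """Hypothetical helper: kill the parent and every descendant explicitly."""
    pids = [int(jobid)] + collect_descendants(int(jobid), process_table)
    return 'kill ' + ' '.join(str(pid) for pid in pids)
```

For a table where PIDs 101 and 102 are children of 100 and 103 is a child of 101, `build_kill_command('100', table)` returns `'kill 100 101 102 103'`. Since the PIDs are collected before the `kill` is sent, a child spawned in between would not be covered; the sketch does not try to handle that.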
The new interface makes fewer assumptions about how an implementation should submit or kill a job, or how information about active jobs should be retrieved.
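Conversely, a plugin that does not go through a shell at all only has to implement the three methods. The sketch below assumes a hypothetical REST-style client with `submit`, `active_jobs` and `cancel` methods (these names are made up and are not the FirecREST or pyfirecrest API) and shows that nothing in the generic interface requires building a command string or parsing its output. An in-memory client stands in for real HTTP calls so the example runs on its own.

```python
import abc
import posixpath


class Scheduler(abc.ABC):
    """Hypothetical stand-in for the refactored interface: only the three abstract methods."""

    @abc.abstractmethod
    def submit_job(self, working_directory, filename): ...

    @abc.abstractmethod
    def get_jobs(self, jobs=None, user=None, as_dict=False): ...

    @abc.abstractmethod
    def kill_job(self, jobid): ...


class InMemoryRestClient:
    """Hypothetical REST-style client; a real one would send HTTP requests instead."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, script_path):
        jobid = str(self._next_id)
        self._next_id += 1
        self._jobs[jobid] = {'job_id': jobid, 'state': 'PENDING', 'script': script_path}
        return {'jobid': jobid}

    def active_jobs(self):
        return list(self._jobs.values())

    def cancel(self, jobid):
        return self._jobs.pop(jobid, None) is not None


class RestScheduler(Scheduler):
    """Implements the generic interface with API calls only: no shell command, no output parsing."""

    def __init__(self, client):
        self.client = client

    def submit_job(self, working_directory, filename):
        # The remote script path is sent to the API instead of running a submit command in a shell.
        return self.client.submit(posixpath.join(working_directory, filename))['jobid']

    def get_jobs(self, jobs=None, user=None, as_dict=False):
        # Filtering by `user` is omitted in this sketch.
        joblist = [job for job in self.client.active_jobs() if jobs is None or job['job_id'] in jobs]
        return {job['job_id']: job for job in joblist} if as_dict else joblist

    def kill_job(self, jobid):
        return self.client.cancel(jobid)
```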