Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scheduler: Refactor interface to make it more generic #6043

Merged
merged 1 commit into from
Jul 11, 2024

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Jun 1, 2023

New interface makes fewer assumptions about how an implementation should submit or kill job, or how information about active jobs is to be retrieved.

@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from 342e2c1 to 8643ffb Compare June 1, 2023 14:01
@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from 8643ffb to 2adfcf7 Compare June 29, 2023 12:28
@sphuber sphuber marked this pull request as ready for review June 29, 2023 12:31
@chrisjsewell
Copy link
Member

As mentioned, this is working with https://github.com/aiidateam/aiida-firecrest

@chrisjsewell
Copy link
Member

In principle, these methods should no longer be required on the transport, but not needed for this PR: https://github.com/aiidateam/aiida-firecrest/blob/eb7f7518857794a99bae3568c74f0981b695dd79/aiida_firecrest/transport.py#L521-L533

@chrisjsewell
Copy link
Member

Also, as a follow-up for the transport, here I moved the currently hard-coded verdi computer test logic onto the transport, since some tests are specific to a "bash-type" transport: https://github.com/chrisjsewell/aiida_core/blob/092bf00c327e213b3de3fc1650df54218a88df54/aiida/client/implementation.py#L74

@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from 2adfcf7 to afa5cda Compare September 18, 2023 13:23
@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from afa5cda to 276c185 Compare October 23, 2023 11:22
@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from 276c185 to c320c64 Compare December 21, 2023 15:43
@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from c320c64 to f719bdb Compare March 25, 2024 08:34
@khsrali
Copy link
Contributor

khsrali commented Apr 5, 2024

@sphuber I tried this PR to submit a job using aiida-firecrests on CSCS, and it works.
If you think of any set of specific manual testing that you think it might be important to check, let me know, and I'll check.

Copy link
Contributor

@khsrali khsrali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through this changes, apart from overlooking things, also manually tested the main functionalities of the slurm scheduler with an HPC through core.ssh transport plugin.

tested:
engine.submit() & engine.run()
verdi process kill works as expected.

In addition, I directly test scheduler functions:

from aiida import load_profile
from aiida.orm.utils.loaders import load_computer
load_profile()
computer = load_computer('cscs-eiger')
scheduler = computer.get_scheduler()
transport = computer.get_transport()
transport.open()
scheduler.set_transport(transport)
jobid = scheduler.submit_job('/capstor/scratch/cscs/akhosrav/test-scheduler/with_sleep/', '_aiidasubmit.sh')
scheduler.get_jobs(jobid)
scheduler.kill_job(jobid)

All is good.

I haven't manually tested other scheduler plugins refactored in this PR, but I don't foresee any problems with them. Changes made in this PR are reasonable and straightforward to track down.

I would suggest having this PR merged before the release.

src/aiida/schedulers/__init__.py Show resolved Hide resolved
@khsrali
Copy link
Contributor

khsrali commented Jul 10, 2024

Once again, I believe this could get merged. I've been using it for a while with no issues.
@mbercx do you still want to check this for hyperQ?

@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch 2 times, most recently from 91cf481 to 3081fce Compare July 11, 2024 07:17
Copy link

codecov bot commented Jul 11, 2024

Codecov Report

Attention: Patch coverage is 95.08197% with 3 lines in your changes missing coverage. Please review.

Project coverage is 77.83%. Comparing base (ef60b66) to head (e3620e0).
Report is 116 commits behind head on main.

Files with missing lines Patch % Lines
src/aiida/schedulers/plugins/bash.py 94.74% 2 Missing ⚠️
src/aiida/engine/daemon/execmanager.py 66.67% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6043      +/-   ##
==========================================
+ Coverage   77.51%   77.83%   +0.32%     
==========================================
  Files         560      564       +4     
  Lines       41444    41911     +467     
==========================================
+ Hits        32120    32616     +496     
+ Misses       9324     9295      -29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

The original `Scheduler` interface made the assumption that all
interfaces would interact with the scheduler through a command line
interface that would be invoked through a bash shell. However, this is
not always the case. Prime example is the new FirecREST service, being
developed by CSCS, that will allow to interact with the scheduler
through a REST API. Due to the assumptions of the `Scheduler` interface,
it was difficult to implement it for this use case.

The `Scheduler` interface is made more generic, by removing the
following (abstract) methods:

 * `_get_joblist_command`
 * `_parse_joblist_output`
 * `_get_submit_command`
 * `_parse_submit_output`
 * `submit_from_script`
 * `kill`
 * `_get_kill_command`
 * `_parse_kill_output`

They are replaced by three abstract methods:

 * `submit_job`
 * `get_jobs`
 * `kill_job`

The new interface no longer makes an assumption about how a plugin
implements these methods. The first one should simply submit the job,
given the location of the submission script on the remote computer. The
second should return the status of the list of active jobs. And the
final should kill a job and return the result.

Unfortunately, this change is backwards incompatible and will break
existing scheduler plugins. To simplify the migration pathway, a
subclass `BashCliScheduler` is added. This implements the new `Scheduler`
interface while maintaining the old interface. This means that this new
class is a drop-in replacement of the old `Scheduler` class for existing
plugins. The plugins that ship with `aiida-core` are all updated to
subclass from `BashCliScheduler`. Any existing plugins that subclassed
from these plugins will therefore not be affected whatsoever by these
changes.
@sphuber sphuber force-pushed the fix/generalize-scheduler-interface branch from 3081fce to e3620e0 Compare July 11, 2024 13:54
@sphuber sphuber merged commit 954cbdd into aiidateam:main Jul 11, 2024
11 checks passed
@sphuber sphuber deleted the fix/generalize-scheduler-interface branch July 11, 2024 20:52
@khsrali
Copy link
Contributor

khsrali commented Jul 12, 2024

Cheers @sphuber

mikibonacci pushed a commit to mikibonacci/aiida-core that referenced this pull request Sep 3, 2024
The original `Scheduler` interface made the assumption that all
interfaces would interact with the scheduler through a command line
interface that would be invoked through a bash shell. However, this is
not always the case. Prime example is the new FirecREST service, being
developed by CSCS, that will allow to interact with the scheduler
through a REST API. Due to the assumptions of the `Scheduler` interface,
it was difficult to implement it for this use case.

The `Scheduler` interface is made more generic, by removing the
following (abstract) methods:

 * `_get_joblist_command`
 * `_parse_joblist_output`
 * `_get_submit_command`
 * `_parse_submit_output`
 * `submit_from_script`
 * `kill`
 * `_get_kill_command`
 * `_parse_kill_output`

They are replaced by three abstract methods:

 * `submit_job`
 * `get_jobs`
 * `kill_job`

The new interface no longer makes an assumption about how a plugin
implements these methods. The first one should simply submit the job,
given the location of the submission script on the remote computer. The
second should return the status of the list of active jobs. And the
final should kill a job and return the result.

Unfortunately, this change is backwards incompatible and will break
existing scheduler plugins. To simplify the migration pathway, a
subclass `BashCliScheduler` is added. This implements the new `Scheduler`
interface while maintaining the old interface. This means that this new
class is a drop-in replacement of the old `Scheduler` class for existing
plugins. The plugins that ship with `aiida-core` are all updated to
subclass from `BashCliScheduler`. Any existing plugins that subclassed
from these plugins will therefore not be affected whatsoever by these
changes.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Nov 5, 2024
The fix in aiidateam#6572 was pushed after the `Scheduler` API has been
refactored in PR aiidateam#6043. To not include this breaking change in a minor
release we adapt the usage of `Scheduler` in the PR to the old API.
agoscinski added a commit to agoscinski/aiida-core that referenced this pull request Nov 5, 2024
The current implementation only issues a kill command for the
parent process, but this can leave child processes orphaned. The
child processes are now retrieved and added explicitly to the
kill command.

Cherry-pick: fddffca

Edits: Downgrades scheduler usage to old API in fix aiidateam#6572

The fix in aiidateam#6572 was pushed after the `Scheduler` API has been
refactored in PR aiidateam#6043. To not include this breaking change in a minor
release we adapt the usage of `Scheduler` in the PR to the old API.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants