Unifying parallel information for both codes and workflow engines #1881
-
Thank you for raising this! You will probably find my response unsatisfactory, but nonetheless this is my opinion on the matter.
I agree that this is frustrating and not intuitive. A lot of this is due to poor organization on the ASE side of things, where we don't have a unified approach to handling this.
Correct. I encourage you to check out this recent discussion about a cleaner approach for this. It's not so much that Parsl needs a new mechanism; the old mechanism via HTEX works fine. Rather, the old mechanism is just confusing. Even if one were to use the new MPI-based features in Parsl, the user (or quacc) still has to define the resource specification for each job anyway. That said, I view this as a separate matter. The first point is about the execution command, including any parallelization flags (e.g. `srun` and its options).
For most of the calculators that don't rely on this hacky approach, this is not an issue.

As for setting things like Parsl commands, it would be a logistical challenge. Quacc supports several workflow orchestration tools, and they all behave very differently depending on a person's compute architecture and computing needs. I am hesitant to get into the business of having quacc interact with the workflow orchestration utilities for this reason. Even if we wanted to go down this road, it is not immediately clear to me how one might achieve this. The job decorators in quacc have minimal logic of their own (by design); they are predominantly aliases for the workflow orchestration tool's decorator. And for some workflow engines, like Dask/Parsl/Prefect, the configuration details can be specified before quacc is ever run.
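To illustrate the last point, for engines like Parsl the execution resources can be configured entirely outside quacc, before any job is defined. Below is a minimal, hedged sketch of a Parsl `HighThroughputExecutor` setup on Slurm; every value (partition, walltime, node counts) is an illustrative assumption and would need to be adapted to one's own cluster:

```python
import parsl
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

# Illustrative configuration only: all resource values are assumptions.
config = Config(
    executors=[
        HighThroughputExecutor(
            label="htex",
            provider=SlurmProvider(
                nodes_per_block=1,       # nodes per Slurm job
                init_blocks=1,
                max_blocks=1,
                walltime="01:00:00",
                launcher=SrunLauncher(),  # launch workers via srun
            ),
        )
    ]
)

parsl.load(config)  # done once, before any quacc jobs are dispatched
```

Because this happens before quacc is ever imported into the workflow, the quacc decorators themselves never need to know about it.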
-
Reviving this discussion with a clarification. From what I can see, ASE has done away with `parallel_info`.
-
The Problem
Currently, there is no unified approach to specifying parallelization information for every code available in Quacc. Each code has its own way of specifying parallelization information, which can be tedious and potentially confusing for users.
Some workflow engines might require additional information about resources for each job. This seems to be the direction that Parsl is taking (https://parsl.readthedocs.io/en/stable/userguide/mpi_apps.html). When using Parsl with MPI, users now need to specify additional kwargs for each job to define resources such as the number of tasks or nodes. Parsl then uses this information to build the parallel command.
The reason Parsl has to do this is likely the pilot job model: Parsl must manage how tasks are deployed. This requirement does not exist when adhering to the condition of 1 task = 1 Slurm job, since it is assumed that the user will attribute resources correctly. In the pilot job model, multiple tasks run in the same Slurm job. When using srun, users should never have to worry about this: even if they oversubscribe their Slurm jobs, srun will gracefully wait until resources free up. In contrast, mpiexec/mpirun will simply oversubscribe everything, leading to suboptimal performance. Hence the need for a new mechanism on Parsl's side.
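To make the srun/mpiexec contrast concrete, here is a small hypothetical sketch (this is *not* Parsl's actual implementation, and `build_launch_prefix` is an invented helper) of how a per-task resource specification could be turned into a launch prefix:

```python
# Hypothetical sketch: translate a per-task resource specification dict
# (in the spirit of Parsl's MPI resource kwargs) into a launch prefix.
def build_launch_prefix(spec, launcher="srun"):
    nodes = spec.get("num_nodes", 1)
    ranks = spec.get("num_ranks", 1)
    if launcher == "srun":
        # srun sees the enclosing Slurm allocation, so it can queue tasks
        # gracefully instead of oversubscribing when resources are busy.
        return f"srun --nodes={nodes} --ntasks={ranks}"
    # mpiexec has no view of the Slurm allocation, so multiple tasks in
    # one job can oversubscribe the node and degrade performance.
    return f"mpiexec -n {ranks}"

print(build_launch_prefix({"num_nodes": 2, "num_ranks": 8}))
# srun --nodes=2 --ntasks=8
```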
The Proposal
In quacc, in the job decorator or as a kwarg, provide a single unified way to specify resources, such as the number of nodes, the number of tasks, and the number of CPUs per task. Both the code and the workflow engine (if needed) would receive this information and handle it as needed. This way, users only need to specify the resources once.
This also means moving away from ASE's `parallel_info`, which in itself isn't such a big problem because it seems that no one knows how to use it anyway... 😅
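As a rough illustration of the proposal (every name below is hypothetical; nothing like this exists in quacc today), a decorator could accept the resources once and fan them out to both the code's launch command and the workflow engine:

```python
import functools

# Hypothetical sketch of the proposal; none of these names exist in quacc.
# The decorator takes the resources once and passes them to both the
# code (as a launch command) and the workflow engine (as a resource dict).
def job(nodes=1, ntasks=1, cpus_per_task=1):
    resources = {"nodes": nodes, "ntasks": ntasks, "cpus_per_task": cpus_per_task}

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Code side: build the parallel launch command from the spec.
            launch_cmd = (
                f"srun --nodes={nodes} --ntasks={ntasks}"
                f" --cpus-per-task={cpus_per_task}"
            )
            # Workflow-engine side: the same dict could be forwarded,
            # e.g. as Parsl's per-task resource specification.
            return func(*args, launch_cmd=launch_cmd, resources=resources, **kwargs)

        return wrapper

    return decorator

@job(nodes=2, ntasks=8, cpus_per_task=4)
def static_job(launch_cmd=None, resources=None):
    return launch_cmd

print(static_job())  # srun --nodes=2 --ntasks=8 --cpus-per-task=4
```

The point is only that the user writes the numbers once; whether they end up in an `srun` prefix, a Parsl resource specification, or both is an implementation detail.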