Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Roadmap #1

Open
3 of 8 tasks
jpsamaroo opened this issue Oct 24, 2024 · 11 comments
Open
3 of 8 tasks

Roadmap #1

jpsamaroo opened this issue Oct 24, 2024 · 11 comments

Comments

@jpsamaroo
Copy link
Member

jpsamaroo commented Oct 24, 2024

Required:

Enhancements (speculative, these are up for discussion):

  • @always_everywhere and other pre-loading support
  • Worker state change notifications (worker added, worker exiting, etc.) (Revise support and worker state callbacks #17)
  • Experiment with multi-cluster setups
  • Type-stable interfaces and serialization
  • Observability tools

Bug Fixes:

@mofeing
Copy link

mofeing commented Oct 25, 2024

One problem @clasqui and @exaexa had with Distributed.jl and Extrae.jl is that they needed to trace the communications, which is doable with MPI thanks to dependency injection, but imposible for Distributed. We had to create our own ClusterManager that overdubs start_worker https://github.com/bsc-quantic/Extrae.jl/blob/86e8c1a4ecd1cc0853df47443d486a1fb7767bd8/src/Instrumentation/ExtraeLocalManager.jl#L23-L33 and emits custom Extrae events on different parts of Distributed https://github.com/bsc-quantic/Extrae.jl/blob/86e8c1a4ecd1cc0853df47443d486a1fb7767bd8/src/Instrumentation/Distributed.jl#L68-L112
Of course, performance was horrible and you have to do it for each cluster manager...

So, a hook system on different parts of Distributed, independently of the manager, would be super nice.

@mofeing
Copy link

mofeing commented Oct 25, 2024

Furthermore, a minor feature that could be nice to experiment with is custom worker ids; like multi-dimensional ids (is quite common to represent your problem in a 2D / 3D lattice of workers).

Also some "worker grouping" functionality akin to MPI groups and communicators would be interesting too. Like "broadcast to a group". What would be killer is if a worker can be in multiple groups.

Experiment with multi-cluster setups

Do I understand it correctly that this means that it will stop being a master-worker model? Or at least multi-master? That would be nice.

@exaexa
Copy link
Contributor

exaexa commented Oct 25, 2024

@mofeing thx for ping, this would be indeed relevant. I guess there should be some slightly more general mechanism to control the worker spawning; I recently had similar "fun" with just a custom JLL loaded.

@jpsamaroo one question for the "next" version, would there be any improved support for managing worker-local data? We previously did this to "just do it reliably in the simplest way": https://github.com/LCSB-BioCore/DistributedData.jl . In another iteration (with simpler use-case) we managed to simplify to this: https://github.com/COBREXA/COBREXA.jl/blob/master/src/worker_data.jl , used with CachingPool.

+1 for the other question of @mofeing -- having the worker locality somewhat more exposed (so that people can hopefully improve scheduling of stuff that depends on latency&volume) would be great.

@JamesWrigley
Copy link
Collaborator

Experiment with multi-cluster setups

Do I understand it correctly that this means that it will stop being a master-worker model? Or at least multi-master? That would be nice.

From discussing it with Julian I think the idea is more about better support for workers created under different cluster managers, e.g. some from SSH, some from slurm etc. Another possibility is having 'private clusters' that are not visible to workers() and such, the use-case here would be a library spawning some workers for a specific purpose and not wanting them to be visible to other code that might doingrmprocs() etc.

@jpsamaroo
Copy link
Member Author

Experiment with multi-cluster setups

I think there's a few possibilities here - doing multi-cluster at the level of a single Distributed logical cluster is the option I originally had in mind, but there could also be the possibility of a "multi-master" cluster. We'd probably need to discuss the pros and cons of each approach, or find a more general framework that allows both to exist.

@JamesWrigley
Copy link
Collaborator

One problem @clasqui and @exaexa had with Distributed.jl and Extrae.jl is that they needed to trace the communications, which is doable with MPI thanks to dependency injection, but imposible for Distributed. We had to create our own ClusterManager that overdubs start_worker https://github.com/bsc-quantic/Extrae.jl/blob/86e8c1a4ecd1cc0853df47443d486a1fb7767bd8/src/Instrumentation/ExtraeLocalManager.jl#L23-L33 and emits custom Extrae events on different parts of Distributed https://github.com/bsc-quantic/Extrae.jl/blob/86e8c1a4ecd1cc0853df47443d486a1fb7767bd8/src/Instrumentation/Distributed.jl#L68-L112 Of course, performance was horrible and you have to do it for each cluster manager...

So, a hook system on different parts of Distributed, independently of the manager, would be super nice.

Just thinking out loud:

  • It would be nice for the hook system to allow adding multiple hooks to each event, like one can do with atexit().
  • The existing hooks in Extrae.jl don't need any arguments, but for the sake of flexibility I think we could pass relevant arguments to the hooks.
  • Another use-case: monitoring how much data is sent through Distributed.

@mofeing
Copy link

mofeing commented Nov 22, 2024

  • The existing hooks in Extrae.jl don't need any arguments, but for the sake of flexibility I think we could pass relevant arguments to the hooks.

This was an implementation for what they needed in that moment, but I agree. Nowadays we deleted that instrumentation because it wasn't sustainable. IMO hooks should be like in Cassette: they should expect same args and kwargs to be passed.

  • Another use-case: monitoring how much data is sent through Distributed.

Completely, but this should be known if we pass the args right? As an example, Extrae is capable of measuring message sizes in MPI and we detected that Serialization was getting confused with a bit a data-structure which has repeated refs to arrays, and artificially increased the serialized size by a factor of x10-100.

This is imposible to detect with Distributed as it's right now.

@JamesWrigley
Copy link
Collaborator

So the beginnings of a hook system are coming in #17, @mofeing do you wanna have a look to see if they would be useful for Extrae? Right now they only cover the worker management parts.

@mofeing
Copy link

mofeing commented Jan 7, 2025

That's amazing @JamesWrigley! Sure, I can try add

Correct me if I'm wrong, but in a Extrae+DistributedNext package extension, I just need to do sth like this right?

function __init__()
    DistributedNext.add_worker_starting_callback(extrae_dist_starting_cb)
    DistributedNext.add_worker_started_callback(extrae_dist_start_cb)
    DistributedNext.add_worker_exiting_callback(extrae_dist_exiting_cb)
    DistributedNext.add_worker_exited_callback(extrae_dist_exited_cb)
end

@JamesWrigley
Copy link
Collaborator

Yep, that's it 👍

@mofeing
Copy link

mofeing commented Jan 7, 2025

Okay, I added support here bsc-quantic/Extrae.jl#22

I will give it a try these days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants