Learning about point tasks in a process #1746
One interesting special case here is a task configured with |
How is that different from launching a task which only depends on that future? Tasks are already callbacks that you get when dependences are satisfied. If you don't want to send the task all the way through the pipeline you can even make it a |
A |
You can't do this at the moment, but it probably wouldn't be too hard to add. Do you really want it to wait on every future in the future map (all point tasks), or are just certain futures enough?
It will actually run on exactly the same processor as the parent task (when the parent is preempted and the futures have all completed).
Mostly this means that you don't have any region requirements. You can have side effects, but like calling |
Well, for the particular use case we have in mind, it would be sufficient to wait on all the futures corresponding to point tasks dispatched by the same
So, to be sure, if the parent task is control-replicated and (as one would expect) all of its shards post the same local function task, that task runs once each on every processor that has been running (one shard of) the parent task?
I suppose you could create entirely new logical regions? |
Right, that is an optimization, but you would probably want to express it as a dependence on the whole future map.
Yes, we implicitly replicate local function tasks so we run a copy on every shard if the parent is control replicated. This is easy to do since the only inputs and outputs are futures and not regions.
You could, I suppose, but I think we might enforce that the selected variant be a leaf task variant, so you could make the new logical regions but then not populate them with any data. |
So it sounds like |
What do you mean by "so long as |
I can imagine that idempotent task launches, since the runtime has permission to run them extra times, might complicate the accounting: might a |
Using global variables and the usual thread synchronization, it is easy to take process-local notes as point tasks execute. However, the distribution of point tasks over (Unix) processes is in general known only to the relevant mapper instances, and it could be expensive to communicate it to the other processes where the point tasks end up running. As such, no process knows when all its assigned point tasks have executed (so as to finalize/transmit the collected notes). (As a particular example, a process might end up with 0 point tasks, such that there is no hint as to when the launch took place at all.)
Could we have a means of learning from Legion (which already has to arrange for the right number of calls to take place) either how many such point tasks from a particular (index) launch have or will run in each process, or that, again per launch, no more point tasks will be started in each process? One form that this interface might take would be a callback function (with access at least to the `TaskArgument`) called once per `Legion::Runtime` per task launch. It would either provide the number as an argument or (following the CPS idea) be invoked after all relevant point tasks. In the usual (non-debugging) situation of one `Runtime` per process, this produces one call on each process as desired, independent of sharding patterns.

A toy example might be to collect per-process statistics about a field for lightweight in-situ analysis:
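(The original example code did not survive extraction. As a stand-in, here is a minimal, Legion-free C++ sketch of the intended accounting; every name in it is hypothetical: `on_local_launch_info` plays the role of the proposed per-`Runtime` callback delivering the local point count, point tasks are simulated with threads, and the "statistic" is just a sum.)

```cpp
#include <atomic>
#include <thread>
#include <vector>

// All names here are hypothetical; this only sketches the accounting.
static std::atomic<long> g_stat{0};     // process-local "notes" (a toy sum)
static std::atomic<int> g_remaining{0}; // local point tasks still to run
static std::atomic<bool> g_done{false}; // set once notes are finalized

// Stand-in for the proposed per-Runtime callback: the runtime would tell
// each process how many point tasks of this launch it will execute here
// (possibly zero).
void on_local_launch_info(int local_points) {
  g_remaining.store(local_points);
  if (local_points == 0)
    g_done.store(true); // nothing to wait for: finalize immediately
}

// Body of one simulated point task: record a note, and if this was the
// last local point, finalize/transmit the collected notes.
void point_task_body(long field_value) {
  g_stat.fetch_add(field_value);
  if (g_remaining.fetch_sub(1) == 1)
    g_done.store(true); // last local point task just finished
}

// Simulate one launch whose local points carry the given field values.
long run_local_launch(const std::vector<long>& field_values) {
  on_local_launch_info(static_cast<int>(field_values.size()));
  std::vector<std::thread> points;
  for (long v : field_values)
    points.emplace_back(point_task_body, v);
  for (auto& t : points)
    t.join();
  return g_done.load() ? g_stat.load() : -1;
}
```

As with the example discussed here, this handles one launch at a time; the globals would have to be keyed per launch to support concurrent launches.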
Obviously as written this works only if all the launches of `bunny_task` are serialized (though their point tasks can run in parallel!). Passing the `Legion::Task*` to the callback allows a more sophisticated (read: realistic) callback to perform separate bookkeeping for separate launches.

A very different CPS-like approach (which would be sufficient for our use cases) might be to equip `Legion::Future` with a `then` (and `Legion::FutureMap` with an `all_then`) that performs a callback (in the process that registered the callback, or else in all processes that collectively registered one) when the future is (all the futures are) ready, without requiring a separate task launch or blocking.
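To make the proposed `then`/`all_then` semantics concrete, here is a small self-contained C++ sketch. None of this is existing Legion API: `ToyFuture`, `then`, and `all_then` are illustrative names, and a real implementation would need the runtime's own synchronization. The point is only that the callback fires when the value arrives, with no separate task launch and no blocking wait.

```cpp
#include <functional>
#include <memory>
#include <vector>
#include <optional>

// Hypothetical single-threaded sketch of the proposed Future::then.
class ToyFuture {
  std::optional<int> value_;
  std::function<void(int)> cb_;
public:
  // Register a continuation; it fires on set_value (or at once if the
  // future is already ready). No task launch, no blocking wait.
  void then(std::function<void(int)> cb) {
    if (value_) cb(*value_);
    else cb_ = std::move(cb);
  }
  void set_value(int v) {
    value_ = v;
    if (cb_) cb_(v);
  }
};

// all_then analogue for a FutureMap: fire once, after every future is ready.
void all_then(std::vector<ToyFuture*> futures, std::function<void()> done) {
  auto remaining = std::make_shared<int>(static_cast<int>(futures.size()));
  if (*remaining == 0) { done(); return; }
  for (ToyFuture* f : futures)
    f->then([remaining, done](int) {
      if (--*remaining == 0) done(); // last future just completed
    });
}
```

The shared countdown in `all_then` is the usual way to turn N per-future continuations into one all-ready continuation; in a distributed setting the decrement would instead be a runtime-internal event trigger.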