
Updates to split execution and memory resources, reduce scope #52

Open
wants to merge 3 commits into base: master

Conversation

Collaborator
@hcedwar hcedwar commented May 4, 2018

Split execution and memory resources within topology.
Make execution resources iterable.
Reduce scope by removing specific execution context, which was not needed for affinity exploration.
Tied current naming and discussion to an execution resource capable of executing std::thread.


affinity_query(execution_resource &&, execution_resource &&) noexcept;
affinity_query(const execution_resource &, const std::pmr::memory_resource &) noexcept;
Contributor

#49 suggests we should get rid of pmr::memory_resource; should we simply use memory_resource here? Or should we do a separate pull request for that change?

Collaborator Author

Let's do a follow-up discussion on #49 and a follow-up PR.

Contributor

Makes sense, let's keep your change until #49 is sorted out.

Contributor

I agree. We decided that for the purposes of affinity-based allocation it didn't make sense to use the pmr memory resource. However, after separating the execution and memory topologies, I think the pmr memory resource could be a good choice to represent a node within the memory topology, even if that is then passed to a concrete affinity-aware allocator type. Though I think we should (as @hcedwar suggested in #41) make it a type derived from the pmr memory resource so that it can be made iterable. I think we need to discuss this further.
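For illustration, a minimal sketch of that idea, assuming a hypothetical node type that derives from `std::pmr::memory_resource` and exposes the child resources it is composed of (none of these names are proposed wording):

```cpp
#include <cstddef>
#include <memory_resource>
#include <string>
#include <vector>

// Hypothetical sketch only: a memory-topology node that derives from
// std::pmr::memory_resource so it can be used with pmr allocators, while
// also exposing its member resources, making the topology iterable.
class hierarchical_memory_resource : public std::pmr::memory_resource {
public:
  const std::string &name() const noexcept { return name_; }

  // Child memory resources this node is composed of.
  const std::vector<hierarchical_memory_resource *> &resources() const noexcept {
    return children_;
  }

protected:
  // For illustration, allocation simply delegates to an upstream resource.
  void *do_allocate(std::size_t bytes, std::size_t alignment) override {
    return upstream_->allocate(bytes, alignment);
  }
  void do_deallocate(void *p, std::size_t bytes, std::size_t alignment) override {
    upstream_->deallocate(p, bytes, alignment);
  }
  bool do_is_equal(const std::pmr::memory_resource &other) const noexcept override {
    return this == &other;
  }

private:
  std::string name_;
  std::vector<hierarchical_memory_resource *> children_;
  std::pmr::memory_resource *upstream_ = std::pmr::get_default_resource();
};
```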

@@ -474,7 +402,7 @@ The `affinity_query` class template provides an abstraction for a relative affin

### `affinity_query` constructors

affinity_query(const execution_resource &, const execution_resource &) noexcept;
affinity_query(const execution_resource &, const std::pmr::memory_resource &) noexcept;
Contributor

see my comment above about #49

@hcedwar hcedwar mentioned this pull request May 5, 2018

The capability of querying underlying *execution resources* of a given *system* is particularly important towards supporting affinity control in C++. The current proposal for executors [[22]][p0443r4] leaves the *execution resource* largely unspecified. This is intentional: *execution resources* will vary greatly between one implementation and another, and it is out of the scope of the current executors proposal to define those. There is current work [[23]][p0737r0] on extending the executors proposal to describe a typical interface for an *execution context*. In this paper a typical *execution context* is defined with an interface for construction and comparison, and for retrieving an *executor*, waiting on submitted work to complete and querying the underlying *execution resource*. Extending the executors interface to provide topology information can serve as a basis for providing a unified interface to expose affinity. This interface cannot mandate a specific architectural definition, and must be generic enough that future architectural evolutions can still be expressed.

Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources*. As both the level of abstraction of an *execution resource* and the granularity that it is described in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose a generic hierarchical structure of *execution resources*, each *execution resource* being composed of other *execution resources* recursively. Each *execution resource* within this hierarchy can be used to place memory (i.e., allocate memory within the *execution resource’s* memory region), place execution (i.e. bind an execution to an *execution resource’s execution agents*), or both.
Two important considerations when defining a unified interface for querying the *resource topology* of a *system*, are (a) what level of abstraction such an interface should have, and (b) at what granularity it should describe the topology's *execution resources* and *memory resources*. As both the level of abstraction of resources and the granularity that they describe in will vary greatly from one implementation to another, it’s important for the interface to be generic enough to support any level of abstraction. To achieve this we propose generic hierarchical structures for *execution resources* and *memory resources*, where each *resource* being composed of other *resources* recursively.
Contributor
@AerialMantis AerialMantis May 5, 2018

I think we should state here that each execution resource should be made up of other execution resources and each memory resource should be made up of other memory resources:

"where each resource being composed of other resources recursively" -> "where each is recursively composed of other execution resources and memory resources respectively"

Collaborator Author

I started down this path, and the problem I ran into was having a purely hierarchical memory resource. Even without a GPU in the picture you will have main memory, high-bandwidth memory, and non-volatile memory. A hierarchical memory resource would require a top-level allocator that spans all of them. The path I followed was instead for each execution resource to have a set of memory resources.
I am uneasy rushing this scope; a little more design-discussion time would have been helpful, but we are up against the pre-meeting mailing clock.
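A rough synopsis of that shape, purely for illustration (the `memory_resources` member and the types shown are assumptions, not proposed wording):

```cpp
#include <cstddef>
#include <vector>

class memory_resource;  // e.g. main memory, high-bandwidth memory, NV memory

// Hypothetical sketch only: each execution resource carries a flat set of
// memory resources it can allocate from, rather than the memory resources
// forming a single pure hierarchy with a top-level allocator.
class execution_resource {
public:
  // Execution resources this one is partitioned into (its members).
  const std::vector<execution_resource> &resources() const noexcept;

  // The set of memory resources accessible from this execution resource;
  // several execution resources may refer to the same memory resource.
  const std::vector<memory_resource *> &memory_resources() const noexcept;

  std::size_t concurrency() const noexcept;
};
```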

Contributor

That's a good point; having a separate set of memory resources for each execution resource could work, though this could lead to many execution resources pointing to the same memory resource, which could make it confusing for users to figure out which memory resources are the same and which are different. Perhaps we could instead maintain a hierarchy of memory resources, but have each execution resource point to the highest-level memory resource in which it can allocate memory; the lower-level memory resources could then be traversed from there, which would be consistent enough to avoid confusion.

One way we could get around the problem of needing a top-level allocator could be to have a memory resource which represents the top of the topology and is either not capable of allocating (though that would likely mean throwing an exception) or is free to allocate anywhere within the topology.

I agree that the separation of execution and memory topologies has revealed some difficult design points which we will need further discussion to resolve.
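A minimal sketch of the first of those options, assuming a hypothetical root node that exists only to structure the topology and throws if asked to allocate:

```cpp
#include <cstddef>
#include <memory_resource>
#include <new>

// Hypothetical sketch only: a memory resource representing the top of the
// topology that cannot itself allocate; it serves purely as the root from
// which lower-level memory resources can be traversed.
class system_memory_resource : public std::pmr::memory_resource {
protected:
  void *do_allocate(std::size_t, std::size_t) override {
    throw std::bad_alloc();  // the root node only structures the topology
  }
  void do_deallocate(void *, std::size_t, std::size_t) override {}
  bool do_is_equal(const std::pmr::memory_resource &other) const noexcept override {
    return this == &other;
  }
};
```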


The interface for querying the *resource topology* of a *system* must be flexible enough to allow querying all *execution resources* available under an *execution context*, querying the *execution resources* available to the entire system, and constructing an *execution context* for a particular *execution resource*. This is important, as many standards such as OpenCL [[6]][opencl-2-2] and HSA [[7]][hsa] require the ability to query the *resource topology* available in a *system* before constructing an *execution context* for executing work.
The interface for querying the *resource topology* of a *system*
must be flexible enough to allow
Contributor

This sentence reads a bit strange; perhaps it's best to split it up now that it's gotten quite long, maybe even turn it into bullet points?


An `execution_resource` is a lightweight structure which acts as an identifier for a particular piece of hardware within a system. It can be queried for whether it can allocate memory via `can_place_memory`, for whether it can execute work via `can_place_agents`, and for its name via `name`. An `execution_resource` can also represent other `execution_resource`s; these are referred to as being *members of* that `execution_resource` and can be queried via `resources`. Additionally, the `execution_resource` which another is a *member of* can be queried via `member_of`. An `execution_resource` can also be queried for the concurrency it can provide: the total number of *threads of execution* supported by that *execution_resource* and all resources it represents.
An **execution agent** executes work, typically implemented by a *callable*,
Contributor

Is this taken from #47? If so the same comments apply here.
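For reference, an illustrative synopsis of the interface described in the paragraph above, using the member names it mentions (a sketch only, not the proposed wording):

```cpp
#include <cstddef>
#include <string>
#include <vector>

class execution_resource {
public:
  // Whether memory can be allocated in this resource's memory region.
  bool can_place_memory() const noexcept;

  // Whether execution agents can be bound to this resource.
  bool can_place_agents() const noexcept;

  // Implementation-defined name of the resource.
  std::string name() const;

  // The execution resources that are members of this one.
  const std::vector<execution_resource> &resources() const noexcept;

  // The execution resource this one is a member of.
  const execution_resource &member_of() const noexcept;

  // Total number of threads of execution supported by this resource
  // and all of the resources it represents.
  std::size_t concurrency() const noexcept;
};
```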

Below *(Listing 2)* is an example of iterating over the system-level resources and printing out their capabilities.
A *system* includes execution resources and memory resources.
Execution resources are organized hierarchically.
A particular execution resource may be a partitioned into
Contributor

"may be a partitioned into" -> "may be partitioned into"

A particular execution resource may be a partitioned into
a collection of execution resources referred to as *members of*
that execution resource.
The partitioning of execution resources implies a locality relationship;
Contributor
@AerialMantis AerialMantis May 5, 2018

Is it always possible to make this guarantee? Could you, for example, have an architecture which is represented with a hierarchy of {{A, B, C, D},{E, F, G, H}} but where D is more local to E than to A?

Collaborator Author

I haven't encountered such a thing. If I did see such a situation I'd say the topology definition was in error.


An execution resource has one or more memory resources
which can allocate memory accessible to that execution resource.
When an execution resource hierarchy is traversed the associated
Contributor

Does this imply a relationship between the levels of the execution topology with the equivalent levels of the memory topology?

Collaborator Author

... prior comment about problem with a pure hierarchy for memory resources

size_t count = 0 ;
for (auto member : execution::this_system::execution_resource()) {
count += member.concurrency();
std::cout << member.name() `\n`;
Contributor

Missing "<<".
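For reference, a self-contained version of the loop with the missing `<<` added; it assumes the proposed `<execution>` header and a namespace alias for it:

```cpp
#include <cstddef>
#include <iostream>
// Assumes the experimental header proposed in this paper and:
//   namespace execution = std::experimental::execution;

int main() {
  std::size_t count = 0;
  for (auto member : execution::this_system::execution_resource()) {
    count += member.concurrency();
    std::cout << member.name() << "\n";  // "<<" added here
  }
  std::cout << "total concurrency: " << count << "\n";
}
```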


> [*Note:* Note that an execution resource is not limited to resources which execute work, but also a general resource where no execution can take place but memory can be allocated such as off-chip memory. *--end note*]
Contributor

I think this note has been lost in the changes.

Collaborator Author

Defines a memory resource vs. an execution resource.

Contributor

That's right, ignore me :)

```
*Listing 3: Example of querying affinity between two `execution_resource`s.*

> [*Note:* This interface for querying relative affinity is a very low-level interface designed to be abstracted by libraries and later affinity policies. *--end note*]

### Execution context
Contributor

I don't think we should remove the wording for the execution_context; this is quite important for the design of being able to construct an execution context which represents part of the system topology.

Collaborator Author

The challenge is what kind of execution context? Static thread pool? Dynamic thread pool? etc.

Contributor
@AerialMantis AerialMantis May 7, 2018

This is a tough question which I think will require a lot more design discussion. The intention of the current execution_context was to provide the minimum required interface in order to allow constructing a context around an execution resource. The idea was that this could then be built on to provide more customization, likely through the use of properties.

Perhaps we should move more towards the execution context we define being a concept which requires a particular interface, i.e. one which allows it to be constructed from an execution resource and to provide an executor and allocator. This approach may actually make it easier for concrete execution context types, such as the static_thread_pool, to support this interface by simply satisfying the requirements of the concept.

Though I'm not keen to make such a large change to the paper so close to the mailing deadline, I think we should raise this as an open question for Rapperswil and then discuss it further for the following revision. Maybe if we can resolve this in time we can present the idea at Rapperswil.
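To make the idea concrete, a purely illustrative sketch of such a concept-style requirement (the names `ExecutionContext`, `executor()` and `allocator()` are assumptions, not proposed wording):

```cpp
#include <concepts>

// Hypothetical sketch only: an execution context is anything that can be
// constructed from an execution resource and can provide an executor and
// an allocator. A concrete type such as static_thread_pool would satisfy
// this simply by providing those members.
template <class C, class Resource>
concept ExecutionContext =
    std::constructible_from<C, const Resource &> &&
    requires(C &ctx) {
      ctx.executor();   // provides an executor
      ctx.allocator();  // provides an allocator
    };
```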


## Header `<execution>` synopsis

namespace std {
namespace experimental {
namespace execution {

/* Execution resource */
/* Execution resource capable of executing std::thread */
Contributor

I don't think we should couple the execution resource type with std::thread, as this could restrict us from using other kinds of threads of execution.

Collaborator Author

Such as co-routines?
Given the limited time I thought it best to start with the current-standard foundation of std::thread and just associate thread_execution_resource with it. We can branch out into other architectures later.

Contributor

I was thinking of GPU threads, though co-routines would be another example of non-std::thread threads of execution. As I understand it, SG1 is moving away from a tight coupling between std::thread and threads of execution and instead having std::thread just be one of many ways to invoke threads of execution. I feel we should try to follow this direction in our design.

```
*Listing 3: Example of querying affinity between two `execution_resource`s.*

> [*Note:* This interface for querying relative affinity is a very low-level interface designed to be abstracted by libraries and later affinity policies. *--end note*]

### Execution context
Contributor

It seems you are removing the Execution context entirely (I thought it was just moved elsewhere at first). Is that intentional?
The text refers to the concept of Execution context, so I think we should keep it well defined, even if we make changes to the interface itself later.

Collaborator Author

The discussion of execution context is still in the paper, just no immediate proposal for a concrete execution context such as static or dynamic thread pools.

execution_resource &operator=(execution_resource &&);
~execution_resource();
~thread_execution_resource();
thread_execution_resource() = delete;
Contributor

I think we decided to make the execution resource type copyable and moveable so that it would be easier to use them in standard algorithms.

Collaborator Author

I thought this was for the std::vector<execution_context> interface, and with an iterable thread_execution_resource it was no longer necessary.

Contributor

That's true, but we decided that we should keep resources copyable and moveable so that they could be used in standard algorithms, in order to, for example, sort by affinity or find particular resources. Although this is something we will need to investigate further. Something like this:

std::sort(execRes.begin(), execRes.end(), [&memRes](const execution_resource &lhs, const execution_resource &rhs){
  return affinity_query<read, latency>(lhs, memRes) > affinity_query<read, latency>(rhs, memRes);
});

## Level of abstraction

The current proposal provides an interface for querying whether an `execution_resource` can allocate and/or execute work; it can provide the concurrency it supports and it can provide a name. We also provide the `affinity_query` structure for querying the relative affinity metrics between two `execution_resource`s. However, this may not be enough information for users to take full advantage of the system; they may also want to know what kind of memory is available or the properties by which work is executed. It was decided that attempting to enumerate the various hardware components would not be ideal, as that would make it harder for implementers to support new hardware. It has been discussed that a better approach would be to parameterise the additional properties of hardware such that hardware queries could be much more generic.
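As a purely hypothetical illustration of what such parameterised hardware properties might look like (none of these property tags or the `query` entry point are proposed in the paper):

```cpp
class execution_resource;  // as described earlier in the paper

namespace property {
// Example property tags; implementations could add tags for new hardware
// characteristics without the interface enumerating every component.
struct memory_bandwidth_t {};
inline constexpr memory_bandwidth_t memory_bandwidth{};
struct cache_line_size_t {};
inline constexpr cache_line_size_t cache_line_size{};
}  // namespace property

// Generic query entry point, returning an implementation-defined value
// for the requested property of the given resource.
template <class Property>
auto query(const execution_resource &res, Property p);
```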
## Dynamic topology discovery - deferred
Contributor

I think we should leave in the discussion of dynamic topology discovery, as it's an important topic to SG1, we want to show it's still on the roadmap but we just haven't gotten to it yet.

Collaborator Author

Still in the intro material. Seemed prudent to mention but not spend any SG1 time discussing.

Collaborator Author
@hcedwar hcedwar commented May 6, 2018

Rebased pull request to eliminate conflicts.
