From 7057fa5f4b1e71f99adbfc1552bf5c37c66073fc Mon Sep 17 00:00:00 2001 From: Gordon Date: Sun, 29 Sep 2019 00:29:40 +0100 Subject: [PATCH 1/4] Add initial proposal for `this_system::discover_topology` * Add initial wording for system topology term of art. * Add initial proposal for `system_topology` * Add initial proposal for `this_system::discover_topology`. --- affinity/cpp-23/d1795r1.md | 102 ++++++++++++++++++++++++++++++++----- affinity/index.md | 4 +- 2 files changed, 91 insertions(+), 15 deletions(-) diff --git a/affinity/cpp-23/d1795r1.md b/affinity/cpp-23/d1795r1.md index 234dce0..741e0cd 100644 --- a/affinity/cpp-23/d1795r1.md +++ b/affinity/cpp-23/d1795r1.md @@ -1,8 +1,8 @@ # P1795r0: System topology discovery for heterogeneous & distributed computing -**Date: 2019-06-03** +**Date: 2019-09-28** -**Audience: SG1, SG14, LEWG** +**Audience: SG1, SG14** **Authors: Gordon Brown, Ruyman Reyes, Michael Wong, Mark Hoemmen, Jeff Hammond, Tom Scogland** @@ -12,7 +12,7 @@ # Acknowledgements -This paper is the result of discussions from man contributors within the heterogeneous C\+\+ group, including H. Carter Edwards, Thomas Rodgers, Patrice Roy, Carl Cook, Jeff Hammond, Hartmut Kaiser, Christian Trott, Paul Blinzer, Alex Voicu, Nat Goodspeed and Tony Tye. +This paper is the result of discussions from many contributors within the heterogeneous C\+\+ group, including H. Carter Edwards, Thomas Rodgers, Patrice Roy, Carl Cook, Jeff Hammond, Hartmut Kaiser, Christian Trott, Paul Blinzer, Alex Voicu, Nat Goodspeed and Tony Tye. # Changelog @@ -30,7 +30,7 @@ For the earlier changelogs from prior to the split from P0796 see Appendix A. This paper is the result of a request from SG1 at the 2018 San Diego meeting to split [[17]][p0796] into two separate papers, one for the high-level interface and one for the low-level interface. This paper focusses on the low-level interface; a mechanism for discovering the topology and affinity properties of a given system. 
[[18]][p1436] focusses on the high-level interface, a series of properties for querying affinity relationships and requesting affinity on work being executed. -# Background +# 1. Background Computer systems are no longer homogeneous platforms. From desktop workstations to high-performance supercomputers, and from mobile devices to purpose-built embedded SoCs, every system has some form of co-processor along side the traditional multi-core CPU, and often more than one. Furthermore, the architectures of these co-processors range from many-core CPUs, GPUs, FPGAs and DSPs to specifically designed vision and machine learning processors. In larger supercomputer systems there are thousands of these processors in some configuration of nodes, connected physically or via network adapters. @@ -38,7 +38,7 @@ The way these processors access memory is also far from homogeneous. For example In order to program these new systems and the architectures that inhabit them, it's vital that applications are capable of understating both what architectures are available and the properties of those architectures, namely their observable behaviors, capabilities and limitations. However, the current C\+\+ standard provides no way to achieve this, so developers have to rely entirely on third party and operating system libraries. -# Goals: what this paper is, and what it is not +# 2. Goals: what this paper is, and what it is not This paper seeks to define, within C\+\+, a facility for discovering execution resources available to a system that are capable of executing work, and for querying their properties. @@ -46,13 +46,13 @@ However, it is not the goal of this proposal to introduce support in the C\+\+ l Instead, it seeks to define a single, unified, and stable layer in the C\+\+ Standard Library. 
Applications, libraries, and programming models (such as SYCL [[3]][sycl-1-2-1], Kokkos [[19]][kokkos], HPX [[13]][hpx] or TBB [[12]][tbb]) can build on this layer; hardware vendors can support it via standards such as OpenCL [[4]][opencl-2-2], CUDA [[20]][cuda], OpenMP [[6]][openmp-5], MPI [[16]][mpi], Hwloc [[2]][hwloc], HSA [[5]][HSA] and HMM [[21]][hmm]; and it can be extended when necessary. -This layer will not be characterized in terms of specific categories of hardware such as CPUs, GPUs and FPGAs as these are broad concepts that are subject to change over time and have no foundation in the C\+\+ machine model. It will instead define a number of abstract properties of system architectures that are not tied to any specific hardward. +This layer will not be characterized in terms of specific categories of hardware such as CPUs, GPUs and FPGAs as these are broad concepts that are subject to change over time and have no foundation in the C\+\+ machine model. It will instead define a number of abstract properties of system architectures that are not tied to any specific hardware. The initial set of properties that this paper would propose be defined in the C\+\+ standard library would reflect a generalization of the observable behaviors, capabilities and limitations of common architectures available in heterogeneous and distributed systems today. However the intention is that the interface be extensible so that that vendors can provide their own extensions to provide visibility into the more niche characteristics of certain architectures. It is intended that this layer be defined as a natural extension of the Executors proposal, a unified interface for execution. The current executors proposal [[14]][p0443] already provides a route to supporting heterogeneous and distributed systems, however it is missing a way to identify what architectures a system has. -# Motivation +# 3. 
Motivation There are many reasons why such a feature within C\+\+ would benefit developers and the C\+\+ ecosystem as a whole, and those can differ from one domain to another. We've attempted to outline some of these benefits here. @@ -98,11 +98,11 @@ For example, a unified C\+\+ interface for topology discovery could provide acce Another example of this is that while Hwloc is highly used in many domains, it now does not always accurately represent existing systems. This is because Hwloc presents their topology as strictly hierarchical, which no longer accurately describes many systems. A unified C\+\+ interface does not need to be bound to the limitations of a single library, and can provide a much broader representation of a system's execution resource topology. -# Proposed direction +# 5. Proposed direction -Below we outline a proposed direction: +This paper aims to build on the unified executors proposal, detailed in P0443 [[14]][p0443], so this proposal and any others that stem from it will target P0443 as a baseline, and aim to integrate with its direction as closely as possible. -* Align with the direction of the unified executors proposal [[14]][p0443]. +Below we outline a proposed direction: * Propose an abstract definition of an execution resource, as a hardware or software abstraction capable of creating execution agents. @@ -124,11 +124,87 @@ As a result of the above this paper may also: * Propose a lifetime model for execution agents. * Propose some additions to the C\+\+ machine model to facilitate describing these additional properties. -# Suggested straw polls +# 6. 
Proposal + +## Header `` synopsis + +```cpp +namespace std { +namespace experimental { + +/* system_topology */ + +class system_topology { + + system_topology() = delete; + + std::chrono::time_point timestamp() const noexcept; + +}; + +/* this_system::discover_topology */ + +namespace this_system { + +system_topology discover_topology(); + +} // namespace this_system + +} // experimental +} // std +``` + +## System topology + +The term *system topology* refers to a non-acyclic graph of abstract execution, memory, network and I/O resources within a system connected to the abstract machine, and their various properties. + +> [*Note:* The current definition of *system topology* is currently incomplete and will be developed over the course of this proposal as the various C\+\+ domains are represented. *--end note*] + +## Class `system_topology` + +The `system_topology` class provides an abstraction of a read-only snapshot of the *system topology* at a particular point in time. A `system_topology` may not maintain or otherwise be associated with the lifetime of operating system or third party library resources. + +### `system_topology` constructors + +```cpp +system_topology() = delete; +``` + +*Effects:* Explicitly deleted. + +### `timestamp` member function + +```cpp +std::chrono::time_point timestamp() const noexcept; +``` + +*Returns:* A `std::chrono::time_point` object representing the time at which the runtime discovery of the system topology performed to construct the `system_topology` object was completed. + +*Throws:* May not throw. + +## Free functions + +### `this_system::discover_topology` + +The free function `this_system::discover_topology` is provided for performing runtime discovery of the system topology, returning an instance of `system_topology`. 
+ +```cpp +namespace this_system { + system_topology discover_topology(); +} // namespace this_system +``` + +*Returns:* A `system_topology` object representing a snapshot of the *system topology* at the current point in time. + +*Requires:* Calls to `this_system::discover_topology()` may not introduce a data race with any other call to `this_system::discover_topology()`. + +*Effects:* Performs runtime discovery of the system topology and constructs a `system_topology` object. May invoke the operating system or third party libraries in discovering topology information, but must release any resources acquired for this purpose before returning. + +*Throws:* Any exception thrown as a result of performing runtime discovery of the system topology. -Would SG1 like to see a continued effort to pursue the goals outlined in this paper? +# 7. Open questions -Does SG1 believe the proposed direction laid out in this paper is suitable to achieve those goals? +* Which perspectives are important for representing the resources of a *system topology*? # References diff --git a/affinity/index.md b/affinity/index.md index add378e..283e011 100644 --- a/affinity/index.md +++ b/affinity/index.md @@ -51,7 +51,7 @@ This paper is the result of a request from SG1 at the 2018 San Diego meeting to [p1436r0]: https://wg21.link/p1436r0 [p1436r1]: https://wg21.link/p1436r1 -[p1436-latest]: \cpp-23\d1436r2.md +[p1436-latest]: /cpp-23/d1436r2.md [p1437r0]: https://wg21.link/p1437r0 -[p1437-latest]: \cpp-23\d1795r1.md +[p1437-latest]: /cpp-23/d1795r1.md From c0ccc513f1a04ba169d8409009f2067c6747c897 Mon Sep 17 00:00:00 2001 From: Gordon Brown Date: Sat, 5 Oct 2019 02:37:03 +0100 Subject: [PATCH 2/4] CP013: Expand on initial design for P1795 proposal. * Introduce terms of art for system resource and topology discovery policy. * Introduce minimal `system_resource` class. * Introduce free function `traverse_topology` for traversing a `system_topology` according to a topology traversal policy. 
* Update the changelog. --- affinity/cpp-23/d1795r1.md | 74 +++++++++++++++++++++++++++++++++----- 1 file changed, 65 insertions(+), 9 deletions(-) diff --git a/affinity/cpp-23/d1795r1.md b/affinity/cpp-23/d1795r1.md index 741e0cd..76744e1 100644 --- a/affinity/cpp-23/d1795r1.md +++ b/affinity/cpp-23/d1795r1.md @@ -1,12 +1,12 @@ -# P1795r0: System topology discovery for heterogeneous & distributed computing +# P1795r1: System topology discovery for heterogeneous & distributed computing -**Date: 2019-09-28** +**Date: 2019-10-05** **Audience: SG1, SG14** -**Authors: Gordon Brown, Ruyman Reyes, Michael Wong, Mark Hoemmen, Jeff Hammond, Tom Scogland** +**Authors: Gordon Brown, Ruyman Reyes, Michael Wong, Mark Hoemmen, Jeff Hammond, Tom Scogland, Domagoj Šarić** -**Emails: gordon@codeplay.com, ruyman@codeplay.com, michael@codeplay.com, mhoemme@sandia.gov, jeff.science@gmail.com, tscogland@llnl.gov** +**Emails: gordon@codeplay.com, ruyman@codeplay.com, michael@codeplay.com, mhoemme@sandia.gov, jeff.science@gmail.com, tscogland@llnl.gov, domagoj.saric@microblink.com** **Reply to: gordon@codeplay.com** @@ -16,6 +16,14 @@ This paper is the result of discussions from many contributors within the hetero # Changelog +### P1437r1 (BEL 2019) + +* Introduce terms of art for *system topology*, *system resource* and *topology traversal policy*. +* Introduce minimal design for `system_topology` class. +* Introduce minimal design for `system_resource` class. +* Introduce free function `this_system::discover_topology` for performing runtime system topology discovery. +* Introduce free function `traverse_topology` for traversing a `system_topology` using a *topology traversal policy* to return a collection of `execution_resource`s. + ### P1437r0 (COL 2019) * Split off from [[17]][p0796], focussing on a mechanism for discovering the topology and affinity properties of a given system. 
@@ -142,6 +150,19 @@ class system_topology { }; +/* system_resource */ + +class system_resource { + + system_resource() = delete; + +}; + +/* traverse_topology */ + +template +std::vector traverse_topology(const system_topology &, const T &) noexcept; + /* this_system::discover_topology */ namespace this_system { @@ -154,15 +175,19 @@ system_topology discover_topology(); } // std ``` -## System topology +## Terms of art + +The term *system resource* refers to a hardware or software abstraction of an execution, memory, network or I/O resource within a system. -The term *system topology* refers to a non-acyclic graph of abstract execution, memory, network and I/O resources within a system connected to the abstract machine, and their various properties. +The term *system topology* refers to a possibly cyclic graph of *execution resources* connected to the abstract machine, and their various properties. > [*Note:* The current definition of *system topology* is currently incomplete and will be developed over the course of this proposal as the various C\+\+ domains are represented. *--end note*] +The term *topology traversal policy* refers to a policy that describes the way in which a *system topology* is traversed in order to produce a collection of *system resources*. + ## Class `system_topology` -The `system_topology` class provides an abstraction of a read-only snapshot of the *system topology* at a particular point in time. A `system_topology` may not maintain or otherwise be associated with the lifetime of operating system or third party library resources. +The `system_topology` class provides an abstraction of a read-only snapshot of the *system topology* at a particular point in time. A `system_topology` object may not maintain or otherwise be associated with the lifetime of operating system or third party library resources. 
### `system_topology` constructors @@ -182,11 +207,23 @@ std::chrono::time_point timestamp() const noexcept; *Throws:* May not throw. 
+## Class `system_resource` + +The `system_resource` class provides an abstraction of a read-only snapshot of a *system resource* from the *system topology* at a particular point in time. A `system_resource` object may not maintain or otherwise be associated with the lifetime of operating system or third party library resources. + +### `system_resource` constructors + +```cpp +system_resource() = delete; +``` + +*Effects:* Explicitly deleted. + ## Free functions ### `this_system::discover_topology` -The free function `this_system::discover_topology` is provided for performing runtime discovery of the system topology, returning an instance of `system_topology`. +The free function `this_system::discover_topology` performs runtime discovery of the *system topology* and returns a `system_topology` object. ```cpp namespace this_system { @@ -202,9 +239,28 @@ namespace this_system { *Throws:* Any exception thrown as a result of performing runtime discovery of the system topology. +### `traverse_topology` + +The free function `traverse_topology` performs a traversal of a `system_topology` object using a *topology traversal policy* specified by the tag type `T` and returns a `std::vector`. + +```cpp +template +std::vector traverse_topology(const system_topology &, const T &) noexcept; +``` + +*Returns:* A `std::vector` object representing the *system resources* matching the criteria of the *topology traversal policy*. + +*Effects:* Traverses the `system_topology` object provided and identifies any *system resources* which match the criteria of the *topology traversal policy*, storing a single `system_resource` object in the returned `std::vector` for each match found. + +*Throws:* May not throw. + # 7. Open questions -* Which perspectives are important for representing the resources of a *system topology*? +> What kind of *topology traversal policies* would people list to see standardized? + +> How should we support notification of a topology update, polling or callback? 
+ +> Should we also provide an interface for compile-time topology discovery? # References From 0e2cd7fcc8b9fe77e0919388f35f4137b96bf81a Mon Sep 17 00:00:00 2001 From: Gordon Brown Date: Mon, 7 Oct 2019 03:21:53 +0100 Subject: [PATCH 3/4] Make changes based on feedback. * Remove `timestamp` member function until we can decide on the granularity of `discover_topology`. * Update link to P0443 to point to latest revision. * Change return value of `traverse_topology` to a `ranges::view`. --- affinity/cpp-23/d1795r1.md | 32 ++++++++++---------------------- 1 file changed, 10 insertions(+), 22 deletions(-) diff --git a/affinity/cpp-23/d1795r1.md b/affinity/cpp-23/d1795r1.md index 76744e1..2e94b7a 100644 --- a/affinity/cpp-23/d1795r1.md +++ b/affinity/cpp-23/d1795r1.md @@ -1,6 +1,6 @@ # P1795r1: System topology discovery for heterogeneous & distributed computing -**Date: 2019-10-05** +**Date: 2019-10-07** **Audience: SG1, SG14** @@ -58,7 +58,7 @@ This layer will not be characterized in terms of specific categories of hardware The initial set of properties that this paper would propose be defined in the C\+\+ standard library would reflect a generalization of the observable behaviors, capabilities and limitations of common architectures available in heterogeneous and distributed systems today. However the intention is that the interface be extensible so that that vendors can provide their own extensions to provide visibility into the more niche characteristics of certain architectures. -It is intended that this layer be defined as a natural extension of the Executors proposal, a unified interface for execution. The current executors proposal [[14]][p0443] already provides a route to supporting heterogeneous and distributed systems, however it is missing a way to identify what architectures a system has. +It is intended that this layer be defined as a natural extension of the Executors proposal, a unified interface for execution. 
The current executors proposal [[14]][p0443r11] already provides a route to supporting heterogeneous and distributed systems, however it is missing a way to identify what architectures a system has. # 3. Motivation @@ -108,7 +108,7 @@ Another example of this is that while Hwloc is highly used in many domains, it n # 5. Proposed direction -This paper aims to build on the unified executors proposal, detailed in P0443 [[14]][p0443], so this proposal and any others that stem from it will target P0443 as a baseline, and aim to integrate with its direction as closely as possible. +This paper aims to build on the unified executors proposal, detailed in P0443 [[14]][p0443r11], so this proposal and any others that stem from it will target P0443 as a baseline, and aim to integrate with its direction as closely as possible. Below we outline a proposed direction: @@ -146,8 +146,6 @@ class system_topology { system_topology() = delete; - std::chrono::time_point timestamp() const noexcept; - }; /* system_resource */ @@ -161,7 +159,7 @@ class system_resource { /* traverse_topology */ template -std::vector traverse_topology(const system_topology &, const T &) noexcept; +ranges::view traverse_topology(const system_topology &, const T &) noexcept; /* this_system::discover_topology */ @@ -197,16 +195,6 @@ system_topology() = delete; *Effects:* Explicitly deleted. -### `timestamp` member function - -```cpp -std::chrono::time_point timestamp() const noexcept; -``` - -*Returns:* A `std::chrono::time_point` object representing the time at which the runtime discovery of the system topology performed to construct the `system_topology` object was completed. - -*Throws:* May not throw. - ## Class `system_resource` The `system_resource` class provides an abstraction of a read-only snapshot of a *system resource* from the *system topology* at a particular point in time. 
A `system_resource` object may not maintain or otherwise be associated with the lifetime of operating system or third party library resources. @@ -245,12 +233,12 @@ The free function `traverse_topology` performs a traversal of a `system_topology ```cpp template -std::vector traverse_topology(const system_topology &, const T &) noexcept; +ranges::view traverse_topology(const system_topology &, const T &) noexcept; ``` -*Returns:* A `std::vector` object representing the *system resources* matching the criteria of the *topology traversal policy*. +*Returns:* A view to a sequence of `system_resource` objects representing the *system resources* matching the criteria of the *topology traversal policy*. -*Effects:* Traverses the `system_topology` object provided and identifies any *system resources* which match the criteria of the *topology traversal policy*, storing a single `system_resource` object in the returned `std::vector` for each match found. +*Effects:* Traverses the `system_topology` object provided and identifies any *system resources* which match the criteria of the *topology traversal policy*, adding a single `system_resource` to the sequence returned for each match found. *Throws:* May not throw. @@ -303,9 +291,9 @@ std::vector traverse_topology(const system_topology &, const T [hpx]: https://github.com/STEllAR-GROUP/hpx [[13]][hpx] HPX -[p0443]: -http://wg21.link/p0443 -[[14]][p0443] A Unified Executors Proposal for C\+\+ +[p0443r11]: +http://wg21.link/p0443r11 +[[14]][p0443r11] A Unified Executors Proposal for C\+\+ [exposing-locality]: https://docs.google.com/viewer?a=v&pid=sites&srcid=bGJsLmdvdnxwYWRhbC13b3Jrc2hvcHxneDozOWE0MjZjOTMxOTk3NGU3 [[15]][exposing-locality] Exposing the Locality of new Memory Hierarchies to HPC Applications From 01c03ebd8f1dcc7de31c43652d5ef6ad21d3605b Mon Sep 17 00:00:00 2001 From: Gordon Brown Date: Tue, 8 Oct 2019 02:28:16 +0100 Subject: [PATCH 4/4] Add further improvements based on feedback. 
* Make the return type of `traverse_topology` to be defined and add note about potential options. * Add more detail to open questions section. --- affinity/cpp-23/d1795r1.md | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/affinity/cpp-23/d1795r1.md b/affinity/cpp-23/d1795r1.md index 2e94b7a..bb65a51 100644 --- a/affinity/cpp-23/d1795r1.md +++ b/affinity/cpp-23/d1795r1.md @@ -137,7 +137,6 @@ As a result of the above this paper may also: ## Header `` synopsis ```cpp -namespace std { namespace experimental { /* system_topology */ @@ -152,14 +151,14 @@ class system_topology { class system_resource { - system_resource() = delete; + /* to be defined */ }; /* traverse_topology */ template -ranges::view traverse_topology(const system_topology &, const T &) noexcept; +to-be-decided traverse_topology(const system_topology &, const T &) noexcept; /* this_system::discover_topology */ @@ -170,7 +169,6 @@ system_topology discover_topology(); } // namespace this_system } // experimental -} // std ``` ## Terms of art @@ -199,13 +197,7 @@ system_topology() = delete; The `system_resource` class provides an abstraction of a read-only snapshot of a *system resource* from the *system topology* at a particular point in time. A `system_resource` object may not maintain or otherwise be associated with the lifetime of operating system or third party library resources. -### `system_resource` constructors - -```cpp -system_resource() = delete; -``` - -*Effects:* Explicitly deleted. +> [*Note:* The `system_resource` class is intended to reflect the properties of a *system resource* and its relationships with other *system resources*, however the precise definition is still to be decided. 
*--end note*] ## Free functions @@ -229,26 +221,27 @@ namespace this_system { ### `traverse_topology` -The free function `traverse_topology` performs a traversal of a `system_topology` object using a *topology traversal policy* specified by the tag type `T` and returns a `std::vector`. +The free function `traverse_topology` performs a traversal of a `system_topology` object using a *topology traversal policy* specified by the tag type `T` and returns a sequence of `system_resource` objects. ```cpp template -ranges::view traverse_topology(const system_topology &, const T &) noexcept; +to-be-decided traverse_topology(const system_topology &, const T &) noexcept; ``` -*Returns:* A view to a sequence of `system_resource` objects representing the *system resources* matching the criteria of the *topology traversal policy*. +*Returns:* A sequence of `system_resource` objects representing the *system resources* matching the criteria of the *topology traversal policy*. *Effects:* Traverses the `system_topology` object provided and identifies any *system resources* which match the criteria of the *topology traversal policy*, adding a single `system_resource` to the sequence returned for each match found. *Throws:* May not throw. -# 7. Open questions - -> What kind of *topology traversal policies* would people list to see standardized? +> [*Note:* The exact representation of *system resources* returned by `traverse_topology` is still to be decided as this will have implications on lifetimes. One option is to return a container of *system_resource* objects by-value such as a `vector`, however this would require some form of reference counting. Another option is to return a reference to the *system_resource* objects via a `span` or a `ranges::view`, however this would require the `system_topology` object to remain alive. *--end note*] -> How should we support notification of a topology update, polling or callback? +# 7. 
Open questions -> Should we also provide an interface for compile-time topology discovery? +* How granular should topology discovery be: should the whole topology be discovered in a single operation, or should it be done in multiple nested operations, only discovering what is needed at each layer? +* What kind of *topology traversal policies* would people like to see standardized? +* How should we support notification of a topology update, polling or callback? +* Should we also provide an interface for compile-time topology discovery? # References
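To make the interface left by the final patch in this series more concrete, the following is a minimal, hypothetical sketch of how `discover_topology` and `traverse_topology` might fit together. It is not the proposed wording. The `sketch` namespace, the `name` and `can_execute` members, the `resources()` accessor, the `execution_capable_t` policy tag and the hard-coded discovery data are all assumptions made for illustration. It returns a `std::vector` by value, which is only one of the options the paper's note discusses, and it omits `noexcept` because building a vector can throw.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {  // hypothetical namespace, not std::experimental

// Stand-in for the proposed system_resource. The paper leaves its members
// "to be defined", so both fields here are assumptions.
struct system_resource {
  std::string name;
  bool can_execute;
};

class system_topology;
namespace this_system {
system_topology discover_topology();
}  // namespace this_system

// Read-only snapshot; as in the proposal, it is not default constructible.
class system_topology {
 public:
  system_topology() = delete;
  // Hypothetical accessor used by the traversal below.
  const std::vector<system_resource>& resources() const noexcept {
    return resources_;
  }

 private:
  explicit system_topology(std::vector<system_resource> r)
      : resources_(std::move(r)) {}
  friend system_topology this_system::discover_topology();
  std::vector<system_resource> resources_;
};

namespace this_system {
// A real implementation would query the operating system or a third party
// library and release those resources before returning; this one is faked.
system_topology discover_topology() {
  std::vector<system_resource> found = {
      {"cpu0", true}, {"gpu0", true}, {"nic0", false}};
  return system_topology(std::move(found));
}
}  // namespace this_system

// Hypothetical topology traversal policy tag; the paper names none.
struct execution_capable_t {};

// Overload for the tag above (the proposal declares a template on T).
std::vector<system_resource> traverse_topology(
    const system_topology& topology, const execution_capable_t&) {
  std::vector<system_resource> matches;
  for (const auto& r : topology.resources()) {
    if (r.can_execute) matches.push_back(r);  // one entry per match
  }
  return matches;
}

// Example use: discover once, then traverse the snapshot.
std::size_t count_execution_capable() {
  const system_topology topology = this_system::discover_topology();
  const auto matches = traverse_topology(topology, execution_capable_t{});
  assert(matches.size() <= topology.resources().size());
  return matches.size();
}

}  // namespace sketch
```

Because the matches are copied out of the snapshot, the returned vector stays valid after the `system_topology` object is destroyed. A `span` or view based return type would avoid the copies but tie the result's lifetime to the snapshot, which is the trade-off the note above raises.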