The full stack of concepts defined by the alpaka library and their inheritance hierarchy is shown in the third column of the preceding figure. Default implementations for those concepts are shown in the bluish columns. The various accelerator implementations, shown in the lower half of the figure, differ only in some of their underlying concepts and share most of the base implementations. The default implementations can be used but do not have to be. They can be replaced by user code at arbitrary granularity. By substituting, for instance, the atomic operation implementation of an accelerator, the execution can be fine-tuned to better utilize the hardware instruction set of a specific processor. Complete accelerators, devices and all of the other concepts can also be implemented by the user without changing any part of the alpaka library itself. How this and related aspects are implemented is explained in the following paragraphs.
The alpaka library has been implemented with extensibility in mind. This means that the alpaka functions do not require instances of predefined classes that model the concepts as input parameters. They accept arbitrary types as parameters, as long as those types model the required concept.
C++ provides a language inherent object oriented abstraction that allows checking that the parameters of a function comply with the concept they are required to model.
By defining interface classes which model the alpaka concepts, users would be able to derive their extension classes from the interfaces they want to model and implement the abstract virtual methods the interfaces define.
The alpaka functions in turn would use the corresponding interface types as their parameter types.
For example, the Buffer concept requires methods for getting the pitch or changing the memory pinning state.
With this intrusive object oriented design pattern the BufCpu or BufCudaRt classes would have to inherit from an IBuffer interface and implement the abstract methods it declares.
An example of this basic pattern is shown in the following source snippet:
struct IBuffer
{
    virtual std::size_t getPitch() const = 0;
    virtual void pin() = 0;
    virtual void unpin() = 0;
    ...
};

struct BufCpu : public IBuffer
{
    virtual std::size_t getPitch() const override { ... }
    virtual void pin() override { ... }
    virtual void unpin() override { ... }
    ...
};

ALPAKA_FN_HOST auto copy(
    IBuffer & dst,
    IBuffer const & src)
-> void
{
    ...
}
The compiler can then check at compile time that the objects the user wants to use as function parameters can be implicitly converted to the interface type, which is the case for classes derived from it. The compiler reports an error on a type mismatch. However, if the alpaka library were using those language inherent object oriented abstractions, the extensibility and optimizability it promises would not be possible. Classes and run-time polymorphism require the implementer of extensions to intrusively inherit from predefined interfaces and override special virtual functions.
This is feasible for user defined classes or types where the source code is available and where it can be changed.
The std::vector class template, on the other hand, would not be able to model the Buffer concept: it is part of the standard library, so its definition cannot be changed to inherit from the IBuffer interface class.
The standard inheritance based object orientation of C++ only works well when all the code it is to interoperate with can be changed to implement the interfaces.
It does not enable interaction with unalterable or existing code that is too complex to change, which is the reality in the majority of software projects.
Another option to implement an extensible library is to follow the approach of the C++ standard library.
It allows function templates to be specialized for user types so that these model concepts without altering the types themselves.
For example, the std::begin and std::end free function templates can be specialized for user defined types.
With those functions specialized, the C++11 range-based for loops (for(auto & i : userContainer){...}, see C++ Standard 6.5.4/1) can be used with user defined types.
Equally, specializations of std::swap and other standard library function templates can be defined to extend those with support for user types.
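As a minimal sketch of this kind of non-intrusive extension, the following code makes a hypothetical user type usable in a range-based for loop without changing its definition. The names user::FixedArray, sumAll and demoRangeFor are assumptions made for illustration. Note that the sketch does not specialize std::begin and std::end; it provides free begin and end functions in the namespace of the user type instead, which the range-based for loop finds by argument-dependent lookup.

```cpp
#include <cassert>

namespace user
{
    // Hypothetical user type without begin()/end() member functions.
    struct FixedArray
    {
        int data[3];
    };

    // Free functions added next to the type, found by argument-dependent lookup.
    inline auto begin(FixedArray & a) -> int * { return a.data; }
    inline auto end(FixedArray & a) -> int * { return a.data + 3; }
}

// Sums all elements using a range-based for loop.
inline auto sumAll(user::FixedArray & a) -> int
{
    int sum = 0;
    for(auto & i : a)
    {
        sum += i;
    }
    return sum;
}

// Small usage example.
inline void demoRangeFor()
{
    user::FixedArray a{{1, 2, 3}};
    assert(sumAll(a) == 6);
}
```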
One problem with function template specialization is that only full specializations are allowed.
Partial specialization of function templates is not permitted by the standard.
Another problem can emerge when users carelessly overload the function templates instead of specializing them.
Mixing function overloading and function template specialization on the same base function template can lead to unexpected results.
The reasons and effects of this are described in more detail in the article "Sutter's Mill: Why Not Specialize Function Templates?" by H. Sutter (currently convener of the ISO C++ committee), published in the C/C++ Users Journal in July 2001.
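The pitfall can be reproduced in a few lines. The following sketch follows the well-known example of this problem; the function name f, the helper demoOverloadPitfall and the returned strings are illustrative only. The explicit specialization for int* belongs to the first base template, but overload resolution only considers the base templates and selects the second, more specialized one, so the specialization is never called.

```cpp
#include <cassert>
#include <string>

// (a) First base template.
template<typename T>
auto f(T) -> std::string
{
    return "(a) base template";
}

// (b) Explicit specialization of (a) for T = int*.
template<>
auto f<int *>(int *) -> std::string
{
    return "(b) specialization of (a)";
}

// (c) Second base template overloading (a) for pointer types.
template<typename T>
auto f(T *) -> std::string
{
    return "(c) overloaded base template";
}

inline void demoOverloadPitfall()
{
    int value = 0;
    int * p = &value;
    // Overload resolution chooses between the base templates (a) and (c).
    // (c) is more specialized, so (b), which belongs to (a), is ignored.
    assert(f(p) == "(c) overloaded base template");
    // For non-pointer arguments only (a) is viable.
    assert(f(value) == "(a) base template");
}
```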
The solution given in the article is to provide "a single function template that should never be specialized or overloaded". This function simply forwards its arguments "to a class template containing a static function with the same signature". This class template can be fully or partially specialized without affecting overload resolution.
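The pattern from the article can be sketched as follows. The names describe, Describe and demoDescribe are hypothetical and chosen only for illustration. The function template describe is the single entry point that is never specialized or overloaded, while the class template Describe carries all customizations, including a partial specialization for pointer types that would not be possible for a function template.

```cpp
#include <cassert>
#include <string>

// Class template holding the implementation; this is what gets specialized.
template<typename T>
struct Describe
{
    static auto describe(T const &) -> std::string
    {
        return "generic";
    }
};

// Partial specialization for all pointer types.
template<typename T>
struct Describe<T *>
{
    static auto describe(T * const &) -> std::string
    {
        return "pointer";
    }
};

// Full specialization for int.
template<>
struct Describe<int>
{
    static auto describe(int const &) -> std::string
    {
        return "int";
    }
};

// Single function template that only forwards to the class template.
// It is never specialized or overloaded itself.
template<typename T>
auto describe(T const & value) -> std::string
{
    return Describe<T>::describe(value);
}

inline void demoDescribe()
{
    int i = 0;
    double d = 0.0;
    assert(describe(i) == "int");
    assert(describe(&i) == "pointer");
    assert(describe(d) == "generic");
}
```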
The alpaka library implements this by not using the object orientation inherent to C++ but lifting those abstractions to a higher level.
Instead of using a non-extensible class/struct for defining the interface, a namespace is utilized.
In place of abstract virtual member functions of the interface, alpaka defines free functions within those namespaces.
All those functions are templates, allowing the user to call them with arbitrary self-defined types and not only those inheriting from a special interface type.
Unlike member functions, they have no implicit this pointer, so the object instance has to be passed explicitly as a parameter.
Overriding the abstract virtual interface methods is replaced by the specialization of a template type that is defined for each such namespace function.
A concept is completely implemented by specializing the predefined template types.
This allows the implementation to be extended and fine-tuned non-intrusively.
For example, the corresponding pitch and memory pinning template types can be specialized for std::vector.
After doing this, a std::vector can be used everywhere a buffer is accepted as an argument throughout the whole alpaka library without ever touching its definition.
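A minimal sketch of such a specialization is shown below. The namespace sketch, the trait GetPitchBytes, the function getPitchBytes and the helper demoVectorPitch are assumptions made for illustration and are not the actual alpaka interface. The pitch of a one-dimensional std::vector is assumed here to be the size of its single row in bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch
{
    // Hypothetical trait to be specialized for every type modeling the buffer concept.
    template<typename TBuf, typename TSfinae = void>
    struct GetPitchBytes;

    // Free function that only forwards to the trait.
    template<typename TBuf>
    auto getPitchBytes(TBuf const & buf) -> std::size_t
    {
        return GetPitchBytes<TBuf>::getPitchBytes(buf);
    }

    // Non-intrusive partial specialization for all std::vector instantiations.
    template<typename TElem, typename TAllocator>
    struct GetPitchBytes<std::vector<TElem, TAllocator>>
    {
        static auto getPitchBytes(std::vector<TElem, TAllocator> const & buf) -> std::size_t
        {
            return buf.size() * sizeof(TElem);
        }
    };
}

inline void demoVectorPitch()
{
    std::vector<int> const v(4u);
    assert(sketch::getPitchBytes(v) == 4u * sizeof(int));
}
```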
A simple function allowing arbitrary tasks to be enqueued into a queue can be implemented in the way shown in the following code.
The TSfinae template parameter will be explained in a following section.
namespace queue
{
    // Implementation trait that users specialize for their queue types.
    template<
        typename TQueue,
        typename TTask,
        typename TSfinae = void>
    struct Enqueue;

    // Entry point that only forwards to the Enqueue trait.
    template<
        typename TQueue,
        typename TTask>
    ALPAKA_FN_HOST auto enqueue(
        TQueue & queue,
        TTask & task)
    -> void
    {
        Enqueue<
            TQueue,
            TTask>
        ::enqueue(
            queue,
            task);
    }
}
Users who want their queue type to be used with this enqueue function have to specialize the Enqueue template struct.
This can be done either partially, by only replacing the TQueue template parameter and accepting arbitrary tasks, or fully, by replacing both TQueue and TTask. This gives the user complete freedom of choice.
The example given in the following code shows this by specializing the Enqueue type for a user queue type UserQueue and arbitrary tasks.
struct UserQueue{};

namespace queue
{
    // partial specialization
    template<
        typename TTask>
    struct Enqueue<
        UserQueue,
        TTask>
    {
        ALPAKA_FN_HOST static auto enqueue(
            UserQueue & queue,
            TTask & task)
        -> void
        {
            //...
        }
    };
}
In addition, the subsequent code shows a full specialization of the Enqueue type for a given UserQueue and a UserTask.
struct UserQueue{};
struct UserTask{};

namespace queue
{
    // full specialization
    template<>
    struct Enqueue<
        UserQueue,
        UserTask>
    {
        ALPAKA_FN_HOST static auto enqueue(
            UserQueue & queue,
            UserTask & task)
        -> void
        {
            //...
        }
    };
}
When the enqueue function template is called with an instance of UserQueue, the most specialized version of the Enqueue template is selected depending on the type of the task TTask it is called with.
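This selection can be checked with a small self-contained sketch. It repeats the declarations in a hypothetical namespace sketch::queue, omits the ALPAKA_FN_HOST macro so that it compiles without alpaka, and returns a string instead of void purely to make the selected specialization observable. OtherTask and demoEnqueueSelection are placeholder names introduced for this illustration.

```cpp
#include <cassert>
#include <string>

namespace sketch
{
namespace queue
{
    // Implementation trait, declared only.
    template<typename TQueue, typename TTask, typename TSfinae = void>
    struct Enqueue;

    // Entry point forwarding to the trait.
    template<typename TQueue, typename TTask>
    auto enqueue(TQueue & queue, TTask & task) -> std::string
    {
        return Enqueue<TQueue, TTask>::enqueue(queue, task);
    }
}
}

struct UserQueue{};
struct UserTask{};
struct OtherTask{};

namespace sketch
{
namespace queue
{
    // Partial specialization: UserQueue with arbitrary tasks.
    template<typename TTask>
    struct Enqueue<UserQueue, TTask>
    {
        static auto enqueue(UserQueue &, TTask &) -> std::string
        {
            return "partial";
        }
    };

    // Full specialization: UserQueue with UserTask.
    template<>
    struct Enqueue<UserQueue, UserTask>
    {
        static auto enqueue(UserQueue &, UserTask &) -> std::string
        {
            return "full";
        }
    };
}
}

inline void demoEnqueueSelection()
{
    UserQueue userQueue;
    UserTask userTask;
    OtherTask otherTask;
    // The full specialization is the most specialized match for UserTask.
    assert(sketch::queue::enqueue(userQueue, userTask) == "full");
    // Every other task type falls back to the partial specialization.
    assert(sketch::queue::enqueue(userQueue, otherTask) == "partial");
}
```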
A type can model the queue concept completely by defining specializations for alpaka::queue::Enqueue and alpaka::queue::Empty.
This functionality can be accessed through the corresponding alpaka::queue::enqueue and alpaka::queue::empty function templates.
Currently there is no native language support for describing and checking concepts in C++ at compile time. A study group (SG8) is working on the ISO specification for concepts, and compiler forks implementing them do exist. For usage in current C++ there are libraries like Boost.ConceptCheck which try to emulate requirement checking of concept types. Those libraries often exploit the preprocessor and require non-trivial changes to the function declaration syntax. Therefore the alpaka library does not currently make use of Boost.ConceptCheck. Neither does it use the proposed concept specification, because that depends on non-standard compilers.
The usage of concepts as described in the working draft would often dramatically improve the compiler error messages in case of a violation of concept requirements. Currently the error messages point deep inside the stack of library template invocations, to the place where the missing method or the like is called. With concept checking, compilation would instead fail directly at the point of invocation of the outermost function template, with an expressive error message about the parameter and its violation of the concept requirements. This would especially simplify working with extensible template libraries like Boost or alpaka. However, in the way concept checking would be used in the alpaka library, omitting it does not change the semantics of the program, only the compile time error diagnostics. In the future, when the standard incorporates concept checking and the major compilers support it, it will be added to the alpaka library.
Basic template specialization only allows for a selection of the most specialized version where all explicitly stated types have to be matched identically.
It is not possible to enable or disable a specialization based on arbitrary compile time expressions depending on the parameter types.
To allow such conditions, alpaka adds a defaulted and unused TSfinae template parameter to all declarations of the implementation template structs.
This was shown using the example of the Enqueue template type.
The C++ technique called SFINAE, an acronym for "Substitution failure is not an error", allows arbitrary specializations to be disabled depending on compile time conditions.
Specializations where the substitution of the parameter types by the deduced types would result in invalid code do not cause a compile error but are simply omitted.
An example in the context of the Enqueue template type is shown in the following code.
struct UserQueue{};

namespace queue
{
    template<
        typename TQueue,
        typename TTask>
    struct Enqueue<
        TQueue,
        TTask,
        typename std::enable_if<
            std::is_base_of<UserQueue, TQueue>::value
            && (TTask::TaskId == 1u)
        >::type>
    {
        ALPAKA_FN_HOST static auto enqueue(
            TQueue & queue,
            TTask & task)
        -> void
        {
            //...
        }
    };
}
The Enqueue specialization shown here does not require any direct type match for the TQueue or the TTask template parameter.
It will be used in all contexts where TQueue inherits from UserQueue and where TTask has a static constant integral member TaskId that equals one.
If the TTask type does not have a TaskId member, this code would be invalid and the substitution would fail.
However, due to SFINAE, this does not result in a compiler error but only in this specialization being omitted.
The std::enable_if template provides a valid member type if the condition it contains evaluates to true, and no such member type if it evaluates to false.
Therefore it can be used to disable specializations depending on arbitrary boolean conditions.
This is what happens when the TaskId member is not equal to one or TQueue does not inherit from UserQueue.
In these circumstances, the condition itself is valid code, but because it evaluates to false, the std::enable_if expression is invalid and the whole Enqueue template specialization is omitted.
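The behaviour described above can be observed with the following sketch. Unlike the declaration-only primary template used by alpaka, the primary template here has a generic definition so that the fallback cases compile. The namespace sketch::queue, the queue and task types, the helper demoSfinae and the string return values are assumptions made for illustration, and the ALPAKA_FN_HOST macro is omitted.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

struct UserQueue{};
struct DerivedQueue : UserQueue{};
struct OtherQueue{};

struct TaskOne{ static constexpr unsigned TaskId = 1u; };
struct TaskTwo{ static constexpr unsigned TaskId = 2u; };
struct TaskWithoutId{};

namespace sketch
{
namespace queue
{
    // Primary template with a generic fallback (an assumption of this sketch).
    template<typename TQueue, typename TTask, typename TSfinae = void>
    struct Enqueue
    {
        static auto enqueue(TQueue &, TTask &) -> std::string
        {
            return "generic";
        }
    };

    // Only enabled for queues derived from UserQueue and tasks with TaskId == 1.
    template<typename TQueue, typename TTask>
    struct Enqueue<
        TQueue,
        TTask,
        typename std::enable_if<
            std::is_base_of<UserQueue, TQueue>::value
            && (TTask::TaskId == 1u)
        >::type>
    {
        static auto enqueue(TQueue &, TTask &) -> std::string
        {
            return "constrained";
        }
    };

    // Entry point forwarding to the trait.
    template<typename TQueue, typename TTask>
    auto enqueue(TQueue & queue, TTask & task) -> std::string
    {
        return Enqueue<TQueue, TTask>::enqueue(queue, task);
    }
}
}

inline void demoSfinae()
{
    DerivedQueue derived;
    OtherQueue other;
    TaskOne one;
    TaskTwo two;
    TaskWithoutId none;
    // Both conditions hold: the constrained specialization is selected.
    assert(sketch::queue::enqueue(derived, one) == "constrained");
    // Condition is valid but false: std::enable_if has no member type.
    assert(sketch::queue::enqueue(derived, two) == "generic");
    assert(sketch::queue::enqueue(other, one) == "generic");
    // TaskId is missing: the substitution fails and the specialization is omitted.
    assert(sketch::queue::enqueue(derived, none) == "generic");
}
```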