Extend the documentation with more information about multidimensional ranges #1569

Merged
merged 6 commits into from
Nov 28, 2024
Changes from 4 commits
@@ -72,4 +72,39 @@ along its longest axis. When used with ``parallel_for``, it causes the
loop to be "recursively blocked" in a way that improves cache usage.
This nice cache behavior means that using ``parallel_for`` over a
``blocked_range2d<T>`` can make a loop run faster than the sequential
equivalent, even on a single processor.
equivalent, even on a single processor.
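
The "recursive blocking" described above can be pictured with a small stand-alone sketch. It is not the oneTBB implementation: ``Block2D`` and ``split_recursively`` are hypothetical names, and the split rule is simplified to halving the longer axis until both extents fit within an assumed ``grain`` size.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a 2D blocked range (not a oneTBB type):
// the half-open block [row_begin, row_end) x [col_begin, col_end).
struct Block2D {
    int row_begin, row_end;
    int col_begin, col_end;
    int rows() const { return row_end - row_begin; }
    int cols() const { return col_end - col_begin; }
};

// Recursively halves the block along its longer axis until both
// extents are at most `grain`, collecting the resulting leaf blocks.
inline void split_recursively(const Block2D& b, int grain,
                              std::vector<Block2D>& leaves) {
    assert(grain >= 1);
    if (b.rows() <= grain && b.cols() <= grain) {
        leaves.push_back(b);  // small enough: process as one unit of work
        return;
    }
    if (b.rows() >= b.cols()) {
        // Rows are the longer axis: cut the row interval in half.
        const int mid = b.row_begin + b.rows() / 2;
        split_recursively({b.row_begin, mid, b.col_begin, b.col_end}, grain, leaves);
        split_recursively({mid, b.row_end, b.col_begin, b.col_end}, grain, leaves);
    } else {
        // Columns are the longer axis: cut the column interval in half.
        const int mid = b.col_begin + b.cols() / 2;
        split_recursively({b.row_begin, b.row_end, b.col_begin, mid}, grain, leaves);
        split_recursively({b.row_begin, b.row_end, mid, b.col_end}, grain, leaves);
    }
}
```

Because each cut is made along the longer axis, the leaf blocks stay close to square, which is what keeps the data touched by one block compact in memory.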

Also, ``blocked_range2d`` allows using different value types across
Review comment (Contributor):

The blocked_range2d allows you to use different value types for its two dimensions: rows (the first dimension) and columns (the second dimension).

its first dimension (called "rows") and the second one ("columns").
That allows combining indexes, pointers, and iterators into a joint
iteration space. The member functions ``rows()`` and ``cols()`` return
Review comment (Contributor):

To get the range for each dimension, use the rows() and cols() methods that return the respective dimensions as blocked_range objects.

the corresponding dimensions in the form of a ``blocked_range``.
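
As an illustration of mixing value types across dimensions, here is a minimal sketch that uses simplified stand-ins rather than the oneTBB classes. ``Range1D``, ``Range2D``, and ``outer_product_block`` are hypothetical names; only the ``rows()``/``cols()`` accessor shape follows the text above. Rows are traversed by index and columns by pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a one-dimensional half-open range [first, last).
template <typename T>
struct Range1D {
    T first, last;
    T begin() const { return first; }
    T end() const { return last; }
};

// Hypothetical stand-in for a 2D range whose dimensions may differ in type.
template <typename RowValue, typename ColValue>
struct Range2D {
    Range1D<RowValue> row_range;
    Range1D<ColValue> col_range;
    const Range1D<RowValue>& rows() const { return row_range; }
    const Range1D<ColValue>& cols() const { return col_range; }
};

// Computes out[i][j] = a[i] * b[j] for the block described by `range`.
// Rows are indexes into `a`; columns are pointers into the array that
// starts at `b_base`.
inline void outer_product_block(const Range2D<std::size_t, const float*>& range,
                                const std::vector<float>& a,
                                const float* b_base,
                                std::vector<std::vector<float>>& out) {
    for (std::size_t i = range.rows().begin(); i != range.rows().end(); ++i) {
        for (const float* p = range.cols().begin(); p != range.cols().end(); ++p) {
            const std::size_t j = static_cast<std::size_t>(p - b_base);
            out[i][j] = a[i] * (*p);
        }
    }
}
```

With the real library, the same loop body would presumably receive a ``blocked_range2d`` subrange from ``parallel_for`` instead of a hand-built ``Range2D``.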

The ``blocked_range3d`` class template extends this approach to 3D by adding
``pages()`` as the first dimension, followed by ``rows()`` and ``cols()``.

The ``blocked_nd_range<T,N>`` class template represents a blocked iteration
Review comment (Contributor):

I'm not sure I got it right, but maybe something like:

The blocked_nd_range<T,N> class template represents a blocked iteration space of any dimensionality. However, unlike other classes, all dimensions in blocked_nd_range must share the same value type. Instead of specifying boundary values directly, you pass N instances of blocked_range<T> to the constructor. This difference is also reflected in its different naming pattern.

space of any dimensionality, but in a slightly different way. All dimensions
of ``blocked_nd_range`` must be specified over the same value type, and the
constructor takes N instances of ``blocked_range<T>``, not individual boundary
values. To indicate these distinctions, a different naming pattern was chosen.
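
The two distinctions (one shared value type, and a constructor that takes N one-dimensional ranges) can be sketched with simplified stand-ins. ``Range1D``, ``NdRange``, and ``cell_count`` are hypothetical names for illustration, not oneTBB declarations; only the ``dim`` accessor mirrors the member function named in this documentation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a one-dimensional half-open range [first, last).
template <typename T>
struct Range1D {
    T first, last;
    T begin() const { return first; }
    T end() const { return last; }
};

// Hypothetical stand-in for an N-dimensional range: every dimension uses
// the same value type T, and the constructor takes exactly N ranges
// rather than individual boundary values.
template <typename T, std::size_t N>
class NdRange {
public:
    template <typename... Ranges>
    explicit NdRange(const Ranges&... ranges) : dims_{ranges...} {
        static_assert(sizeof...(Ranges) == N, "exactly N ranges are required");
    }

    // Returns the one-dimensional range for dimension `d` (0-based).
    const Range1D<T>& dim(std::size_t d) const { return dims_[d]; }

private:
    std::array<Range1D<T>, N> dims_;
};

// Counts the cells in the iteration space by multiplying the extents.
template <typename T, std::size_t N>
std::size_t cell_count(const NdRange<T, N>& range) {
    std::size_t count = 1;
    for (std::size_t d = 0; d < N; ++d)
        count *= static_cast<std::size_t>(range.dim(d).end() - range.dim(d).begin());
    return count;
}
```

Since all dimensions share one type, generic code such as ``cell_count`` can loop over dimensions by number, which named accessors like ``rows()`` and ``cols()`` do not allow.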


An Example of a Multidimensional Iteration Space
------------------------------------------------

The example demonstrates the calculation of a 3-dimensional filter over a pack
of feature maps, applying a kernel to a subrange of features.
Review comment (Contributor):

Suggested change
of feature maps, applying a kernel to a subrange of features.
of feature maps by applying a kernel to a subrange of features.


The ``convolution3d`` function iterates over the output cells and sets cell
Review comment (Contributor):

The convolution3d function processes each output cell, setting its value based on the result of the kernel3d function. This function combines values from the feature maps to produce the final result.

values to the result of the ``kernel3d`` function, which sums values
from feature maps.

For the computation to be performed in parallel, ``tbb::parallel_for`` is called
Review comment (Contributor):

Suggested change
For the computation to be performed in parallel, ``tbb::parallel_for`` is called
To enable parallel computation, ``tbb::parallel_for`` is called

with ``tbb::blocked_nd_range<int,3>`` as an argument. The body function then
iterates over the received 3-dimensional subrange in a loop nest, using
Review comment (Contributor):

Within the body function, a nested loop iterates over the 3D subrange received. The dim method is used to get the loop boundaries for each dimension.

the ``dim`` member function to obtain loop boundaries for each dimension.


.. literalinclude:: ./snippets/blocked_nd_range_example.h
    :language: c++
5 changes: 3 additions & 2 deletions doc/main/tbb_userguide/parallel_for_os.rst
@@ -55,8 +55,9 @@ before each identifier. The rest of the examples assume that such a
Note the argument to ``operator()``. A ``blocked_range<T>`` is a
template class provided by the library. It describes a one-dimensional
iteration space over type ``T``. Class ``parallel_for`` works with other
kinds of iteration spaces too. The library provides ``blocked_range2d``
for two-dimensional spaces. You can define your own spaces as explained
kinds of iteration spaces too. The library provides ``blocked_range2d``,
``blocked_range3d``, and ``blocked_nd_range`` for multidimensional spaces.
You can define your own spaces as explained
in :ref:`Advanced_Topic_Other_Kinds_of_Iteration_Spaces`.


37 changes: 37 additions & 0 deletions doc/main/tbb_userguide/snippets/blocked_nd_range_example.cpp
@@ -0,0 +1,37 @@
#include "blocked_nd_range_example.h"
#include <vector>
#include <cassert>

int main() {
    const int kernel_length = 9;
    const int kernel_width = 5;
    const int kernel_height = 5;

    const int feature_maps_length = 128;
    const int feature_maps_width = 16;
    const int feature_maps_height = 16;

    const int out_length = feature_maps_length - kernel_length + 1;
    const int out_width = feature_maps_width - kernel_width + 1;
    const int out_height = feature_maps_height - kernel_height + 1;

    // Initializes feature maps with 1 in each cell and out with zeros.
    std::vector<std::vector<std::vector<float>>> feature_maps(
        feature_maps_length,
        std::vector<std::vector<float>>(feature_maps_width,
                                        std::vector<float>(feature_maps_height, 1.0f)));
    std::vector<std::vector<std::vector<float>>> out(
        out_length,
        std::vector<std::vector<float>>(out_width, std::vector<float>(out_height, 0.f)));

    // 3D convolution calculates the sum of all elements in the kernel window
    convolution3d(feature_maps, out,
                  out_length, out_width, out_height,
                  kernel_length, kernel_width, kernel_height);

    // Checks correctness of the convolution by comparing each cell to the expected sum
    const float expected = float(kernel_length * kernel_height * kernel_width);
    for (const auto& i : out) {
        for (const auto& j : i) {
            for (float k : j) {
                assert(k == expected && "convolution failed to calculate correctly");
            }
        }
    }
    return 0;
}
37 changes: 37 additions & 0 deletions doc/main/tbb_userguide/snippets/blocked_nd_range_example.h
@@ -0,0 +1,37 @@
#include "oneapi/tbb/blocked_nd_range.h"
#include "oneapi/tbb/parallel_for.h"

template<typename Features>
float kernel3d(const Features& feature_maps, int i, int j, int k,
               int kernel_length, int kernel_width, int kernel_height) {
    float result = 0.f;

    // Sums the feature values inside the kernel window that starts at (i, j, k)
    for (int feature_i = i; feature_i < i + kernel_length; ++feature_i)
        for (int feature_j = j; feature_j < j + kernel_width; ++feature_j)
            for (int feature_k = k; feature_k < k + kernel_height; ++feature_k)
                result += feature_maps[feature_i][feature_j][feature_k];

    return result;
}

template<typename Features, typename Output>
void convolution3d(const Features& feature_maps, Output& out,
                   int out_length, int out_width, int out_height,
                   int kernel_length, int kernel_width, int kernel_height) {
    using range_t = oneapi::tbb::blocked_nd_range<int, 3>;

    oneapi::tbb::parallel_for(
        range_t({0, out_length}, {0, out_width}, {0, out_height}),
        [&](const range_t& out_range) {
            // One blocked_range<int> per dimension of the received subrange
            auto out_x = out_range.dim(0);
            auto out_y = out_range.dim(1);
            auto out_z = out_range.dim(2);

            for (int i = out_x.begin(); i < out_x.end(); ++i)
                for (int j = out_y.begin(); j < out_y.end(); ++j)
                    for (int k = out_z.begin(); k < out_z.end(); ++k)
                        out[i][j][k] = kernel3d(feature_maps, i, j, k,
                                                kernel_length, kernel_width, kernel_height);
        }
    );
}