Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make babelstream runnable and test for different Accs #2299

Merged

Conversation

mehmetyusufoglu
Copy link
Contributor

@mehmetyusufoglu mehmetyusufoglu commented Jun 24, 2024

babelstream benchmarking was creating errors and had unnecessary parts (because it was developed as an extention of the test at Bristol Uni. )

  • Unnecesssary command line options are removed, 2 command line custom arguments are available
  • Run with Custom arguments:
    ./babelstream --array-size=1280000 --number-runs=10
  • Run with Catch2 arguments:
    ./babelstream --success
    ./babelstream -r a.xml
  • Run with Custom and catch2 arguments together:
    ./babelstream --success --array-size=1280000 --number-runs=10
  • Run to list custom and catch2 arguments:
    ./babelstream --help (or -h or -?)
  • Babelstream has 5 simple kernels for vector operations. Add, Mul, Copy, Triad and Dot product. And there is an Init kernel.

  • RunTimes are measured by using std::chrono

  • RESULTS for ./babelstream --array-size=33554432 --number-runs=100 are below. They are equal ( or slightly better) compared to previous implementation which runs 100 times and uses minimum time as well.


AcceleratorType:AccCpuSerial<1,unsigned int>
NumberOfRuns:100
Precision:double
DataSize(items):33554432
DeviceName:13th Gen Intel(R) Core(TM) i7-1360P
WorkDivInit :{gridBlockExtent: (33554432), blockThreadExtent: (1), threadElemExtent: (1)}
WorkDivCopy :{gridBlockExtent: (33554432), blockThreadExtent: (1), threadElemExtent: (1)}
WorkDivMult :{gridBlockExtent: (33554432), blockThreadExtent: (1), threadElemExtent: (1)}
WorkDivAdd  :{gridBlockExtent: (33554432), blockThreadExtent: (1), threadElemExtent: (1)}
WorkDivTriad:{gridBlockExtent: (33554432), blockThreadExtent: (1), threadElemExtent: (1)}
Kernels         Bandwidths(GB/s) MinTime(s) MaxTime(s) AvgTime(s) DataUsage(MB) 
 InitKernel      12.626          0.042521 0.044909 0.045561  536.87 
 CopyKernel      21.185          0.025342 0.027276 0.025485  536.87 
 MultKernel      21.357          0.025138 0.026368 0.025286  536.87 
 AddKernel       25.272          0.031865 0.041926 0.032137  805.31 
 TriadKernel     25.174          0.03199  0.033468 0.032193  805.31 



AcceleratorType:AccGpuCudaRt<1,unsigned int>
NumberOfRuns:100
Precision:double
DataSize(items):33554432
DeviceName:NVIDIA RTX A500 Laptop GPU
WorkDivInit :{gridBlockExtent: (32768), blockThreadExtent: (1024), threadElemExtent: (1)}
WorkDivCopy :{gridBlockExtent: (32768), blockThreadExtent: (1024), threadElemExtent: (1)}
WorkDivMult :{gridBlockExtent: (32768), blockThreadExtent: (1024), threadElemExtent: (1)}
WorkDivAdd  :{gridBlockExtent: (32768), blockThreadExtent: (1024), threadElemExtent: (1)}
WorkDivTriad:{gridBlockExtent: (32768), blockThreadExtent: (1024), threadElemExtent: (1)}
WorkDivDot  :{gridBlockExtent: (256), blockThreadExtent: (1024), threadElemExtent: (1)}
Kernels         Bandwidths(GB/s) MinTime(s) MaxTime(s) AvgTime(s) DataUsage(MB) 
 InitKernel      62.451          0.0085967 0.0086461 0.0086267  536.87 
 CopyKernel      89.58           0.0059932 0.0060739 0.0060465  536.87 
 MultKernel      89.359          0.006008 0.0060836 0.0060491  536.87 
 AddKernel       90.551          0.0088934 0.0089586 0.008937  805.31 
 TriadKernel     90.61           0.0088876 0.0089629 0.0089369  805.31 
 DotKernel       93.256          0.0057569 0.0058079 0.0057916  536.87 

Full output file:

babelStreamOut.txt

Old implementation values for 4 kernels 100 runs in the following image:

Screenshot from 2024-07-31 15-58-20

@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft June 24, 2024 13:59
@mehmetyusufoglu mehmetyusufoglu force-pushed the updateBenchmarkBabelStr branch 11 times, most recently from 1965c6b to b38be37 Compare June 27, 2024 14:26
@mehmetyusufoglu mehmetyusufoglu changed the title [Wip] Make babelstream runnable and use Catch2 advanced benchmarking Make babelstream runnable and use Catch2 advanced benchmarking Jun 27, 2024
@mehmetyusufoglu mehmetyusufoglu marked this pull request as ready for review June 27, 2024 15:00
@mehmetyusufoglu mehmetyusufoglu marked this pull request as draft June 27, 2024 15:00
@mehmetyusufoglu mehmetyusufoglu marked this pull request as ready for review June 27, 2024 15:48
@mehmetyusufoglu mehmetyusufoglu force-pushed the updateBenchmarkBabelStr branch 2 times, most recently from 6ffd219 to 0a5f6bd Compare June 28, 2024 12:27
{
dataTypeStr = "double-precision";
}
std::cout << "Data type: " << typeid(DataType).name() << " " << dataTypeStr << std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know, if there is some reporting function to add the information as key value pair to the output? At the moment, it simply captures the stdout and write a string to the xml, which make it hard to parse.

<?xml version="1.0" encoding="UTF-8"?>
<Catch2TestRun name="babelstream" rng-seed="1921823939" xml-format-version="2" catch2-version="3.3.2">
  <TestCase name="Babelstream Bechmark by Alpaka Kernel - Single P. - TestAccs1D - 0" tags="[benchmark-test]" filename="/home/simeon/projects/alpaka/benchmarks/babelstream/src/benchmarkTest.cpp" line="418">
    <BenchmarkResults name="CopyKernel" samples="100" resamples="100000" iterations="1" clockResolution="32.5898" estimatedDuration="1.62794e+07">
      <!-- All values in nano seconds -->
      <mean value="162549" lowerBound="162235" upperBound="163277" ci="0.95"/>
      <standardDeviation value="2305.52" lowerBound="1253.62" upperBound="4617.02" ci="0.95"/>
      <outliers variance="0.0750291" lowMild="0" lowSevere="0" highMild="1" highSevere="8"/>
    </BenchmarkResults>
    <BenchmarkResults name="AddKernel" samples="100" resamples="100000" iterations="1" clockResolution="32.5898" estimatedDuration="2.07714e+07">
      <!-- All values in nano seconds -->
      <mean value="208480" lowerBound="207810" upperBound="209855" ci="0.95"/>
      <standardDeviation value="4688.39" lowerBound="2825.43" upperBound="8760.11" ci="0.95"/>
      <outliers variance="0.161071" lowMild="0" lowSevere="0" highMild="1" highSevere="6"/>
    </BenchmarkResults>
    <BenchmarkResults name="MulKernel" samples="100" resamples="100000" iterations="1" clockResolution="32.5898" estimatedDuration="1.19507e+07">
      <!-- All values in nano seconds -->
      <mean value="122044" lowerBound="121594" upperBound="122670" ci="0.95"/>
      <standardDeviation value="2676.42" lowerBound="2061.12" upperBound="3957.46" ci="0.95"/>
      <outliers variance="0.151683" lowMild="0" lowSevere="0" highMild="1" highSevere="1"/>
    </BenchmarkResults>
    <BenchmarkResults name="TriadKernel" samples="100" resamples="100000" iterations="1" clockResolution="32.5898" estimatedDuration="2.11882e+07">
      <!-- All values in nano seconds -->
      <mean value="212046" lowerBound="211493" upperBound="212948" ci="0.95"/>
      <standardDeviation value="3544.51" lowerBound="2546.6" upperBound="5707.04" ci="0.95"/>
      <outliers variance="0.094408" lowMild="0" lowSevere="0" highMild="1" highSevere="9"/>
    </BenchmarkResults>
    <OverallResult success="true" skips="0">
      <StdOut>
========================
Time:2024-07-01 14:21:45

Using alpaka accelerator: AccCpuSerial&lt;1,unsigned int>
Data type: f single-precision
Data size: 1048576 elements
Work division: {gridBlockExtent: (1048576), blockThreadExtent: (1), threadElemExtent: (1)}
      </StdOut>
    </OverallResult>
  </TestCase>

Copy link
Contributor Author

@mehmetyusufoglu mehmetyusufoglu Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I will use "INFO" macro of catch2. Good idea thanks. It turned out to be -r json also works for catch2 3.5.2. I am doing it for json, more readable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json reporter seem to not include benchmark results only records REQUIRE results etc. XML reporter prints benchmark results as expected.

Copy link
Contributor Author

@mehmetyusufoglu mehmetyusufoglu Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only WARN has worked to log output other than using cout. I will use WARN. Because others only workd when there is REQUIRE/CHECK and it fails.

TEST_CASE("TESTCASE tesing trial", "[testcase-test]")
{
    std::string dataType = "floating-point";
    std::string dataSize =   std::to_string(100000);
    int arraySize=1234;
    WARN(dataType);  // This will be included as custom XML in the report
    INFO("\n normal info Data type is" << dataType);
    UNSCOPED_INFO("unscoped Data type is" << dataType);
    INFO("info example");
    UNSCOPED_INFO("unscoped example");
    INFO("normal info array size is" << arraySize);
    UNSCOPED_INFO("unscoped array size is" << arraySize);
    INFO("normal info data size is" << dataSize);
    UNSCOPED_INFO("unscoped data size is" << dataSize);
    CAPTURE(dataSize);   
    CAPTURE(dataType);
  // !!!!
    CHECK(arraySize!=1234); // If this fails all INFO, capture and UNSCOPED_INFO prints. Otherwise only WARN prints
    SUCCEED("SUCCEED");  
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need additional operation to extract data from fields. May be I should use cout output directly and convert it to json by a script instead of dealing with this...

   <Warning filename="/home/yusufo81/projects/alpaka-dir/alpaka/benchmarks/babelstream/src/benchmarkTest.cpp" line="146">
      TimeStamp:2024-07-09 15:06:59
    </Warning>
    <Warning filename="/home/yusufo81/projects/alpaka-dir/alpaka/benchmarks/babelstream/src/benchmarkTest.cpp" line="147">
      DataSize:1048576
    </Warning>
    <Warning filename="/home/yusufo81/projects/alpaka-dir/alpaka/benchmarks/babelstream/src/benchmarkTest.cpp" line="148">
      DataSize:single-precision
    </Warning>

}
}

std::string getCurrentTimestamp();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the forward declaration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a warning at one compiler "warning: no previous declaration for ‘std::string getCurrentTimestamp()’ [-Wmissing-declarations]" therefore i added a declaration. Alternatively, I can make the function static as well. Somehow one of the compilers forced for declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the function static. Like functions in some other tests: e.g. alpaka/test/unit/mem/buf/src/BufTest.cpp

benchmarks/babelstream/src/benchmarkTest.cpp Outdated Show resolved Hide resolved
benchmarks/babelstream/src/benchmarkTest.cpp Outdated Show resolved Hide resolved
@mehmetyusufoglu mehmetyusufoglu force-pushed the updateBenchmarkBabelStr branch 11 times, most recently from 897c296 to 97286cc Compare July 10, 2024 11:05
@@ -0,0 +1,433 @@
#pragma once
#include <boost/tokenizer.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is part of boost compile time header.
If not this will introduce additional dependencies for alpaka, at least benchmarking.
Please remove the include and use best stl to split strings, I assume this is the reason why it is included.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI is maybe passing but the reason is only that all boost packages are installed. I tried it on our development system HAL

error:

 fatal error: boost/tokenizer.hpp: No such file or directory

Copy link
Contributor Author

@mehmetyusufoglu mehmetyusufoglu Sep 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes i used stl removed boost tokenizer after seeing the CI fail. Now removed the header too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it, the include can be removed because it is not used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed yes, i found CI error in the morning and removed boost tokenizer usage; but forgot tto remove header. now removed.

@psychocoderHPC
Copy link
Member

@mehmetyusufoglu The CMake file for the benchmark is missing to include catch in case you compile without -DBUILD_TESTING=ON and only with -Dalpaka_BUILD_BENCHMARKS=ON compiling fails due to missing catch2 headers.

@psychocoderHPC
Copy link
Member

psychocoderHPC commented Sep 3, 2024

~~The output on the terminal should looks like the old output. ~~
This helps external users if they compare this bable stream against other upstream implementations.
The user should not search in the code if we use min/max or average for the results.

Sry my fold, the output is nearly equal, only MB is now GB/s

@psychocoderHPC
Copy link
Member

psychocoderHPC commented Sep 3, 2024

The output on the terminal should looks like the old output. This helps external users if they compare this bable stream against other upstream implementations. The user should not search in the code if we use min/max or average for the results.

Yes it is the same output. Additionally, i can add that we used minimum time as an extra comment. ok.

I updated my cmment above.

i can add that we used minimum time as an extra comment. ok.

Is this different compared too the original code?

  • If not not need to add the comment
  • If yes, it should show us ethe same as the old code

@mehmetyusufoglu
Copy link
Contributor Author

mehmetyusufoglu commented Sep 3, 2024

The output on the terminal should looks like the old output. This helps external users if they compare this bable stream against other upstream implementations. The user should not search in the code if we use min/max or average for the results.

Yes it is the same output. Additionally, i can add that we used minimum time as an extra comment. ok.

I updated my cmment above.

i can add that we used minimum time as an extra comment. ok.

Is this different compared too the original code? If not no need to add the comment

No, same as original. It was not stated in the output. Therefore i did not mention. i will not state in the output as well so not change the code.

@mehmetyusufoglu
Copy link
Contributor Author

@mehmetyusufoglu The CMake file for the benchmark is missing to include catch in case you compile without -DBUILD_TESTING=ON and only with -Dalpaka_BUILD_BENCHMARKS=ON compiling fails due to missing catch2 headers.

@mehmetyusufoglu
Copy link
Contributor Author

mehmetyusufoglu commented Sep 5, 2024

The output on the terminal should looks like the old output. This helps external users if they compare this bable stream against other upstream implementations. The user should not search in the code if we use min/max or average for the results.

Yes it is the same output. Additionally, i can add that we used minimum time as an extra comment. ok.

I updated my cmment above.

i can add that we used minimum time as an extra comment. ok.

Is this different compared too the original code?

* If not not need to add the comment

* If yes, it should show us ethe same as the old code

It is the same with the previous code's output. Besides there are 4 additional fields as info: 1. Acc type, 2. data item size, 3. workdiv and 4. correct data-throughput for each kernel.

@mehmetyusufoglu
Copy link
Contributor Author

@mehmetyusufoglu The CMake file for the benchmark is missing to include catch in case you compile without -DBUILD_TESTING=ON and only with -Dalpaka_BUILD_BENCHMARKS=ON compiling fails due to missing catch2 headers.

* I am investigating but that was not creating a problem in CI. Here everything passed. https://gitlab.com/hzdr/crp/alpaka/-/commit/472bfed5d7045b6a94b04e9913b252ce1fd5e3dc

* Locally no problem observed.

All tests pass at gitlab and github. No problem.

@psychocoderHPC
Copy link
Member

@mehmetyusufoglu please amend and force push this PR again, the CI is fixed

@psychocoderHPC
Copy link
Member

psychocoderHPC commented Sep 11, 2024

-Dalpaka_BUILD_BENCHMARKS=ON

The problem still exists. As I wrote if you only enable benchmarking catch will not found if you not provide an installed catch version externally.

cmake ~/workspace/alpaka -Dalpaka_ACC_GPU_CUDA_ENABLE=ON -Dalpaka_ACC_GPU_CUDA_ONLY_MODE=ON -Dalpaka_BUILD_BENCHMARKS=ON
alpaka/benchmarks/babelstream/src/babelStreamMainTest.cpp:3:10: fatal error: catch2/catch_session.hpp: No such file or directory
    3 | #include "catch2/catch_session.hpp"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
make[2]: *** [benchmarks/babelstream/CMakeFiles/babelstream.dir/build.make:77: benchmarks/babelstream/CMakeFiles/babelstream.dir/src/babelStreamMainTest.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:158: benchmarks/babelstream/CMakeFiles/babelstream.dir/all] Error 2

Copy link
Member

@psychocoderHPC psychocoderHPC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2299 (comment) must be fixed before we can merge this PR.
This error will not show up in the CI because we enable tests and benchmarking in the CI always together.

@mehmetyusufoglu
Copy link
Contributor Author

#2299 (comment) must be fixed before we can merge this PR. This error will not show up in the CI because we enable tests and benchmarking in the CI always together.

@SimeonEhrig my latest commit would not solve the issue i think. What is your opinion?

@mehmetyusufoglu mehmetyusufoglu force-pushed the updateBenchmarkBabelStr branch 4 times, most recently from 8fde9e5 to 93fccb0 Compare September 12, 2024 11:07
@SimeonEhrig
Copy link
Member

#2299 (comment) must be fixed before we can merge this PR. This error will not show up in the CI because we enable tests and benchmarking in the CI always together.

@SimeonEhrig my latest commit would not solve the issue i think. What is your opinion?

Something get broken. If I run cmake -Dalpaka_BUILD_BENCHMARKS=ON or cmake -DBUILD_TESTING=ON I get the following error message:

CMake Error at thirdParty/CMakeLists.txt:42 (find_package):
  Could not find a package configuration file provided by "Catch2" (requested
  version 3.5.2) with any of the following names:

    Catch2Config.cmake
    catch2-config.cmake

  Add the installation prefix of "Catch2" to CMAKE_PREFIX_PATH or set
  "Catch2_DIR" to a directory containing one of the above files.  If "Catch2"
  provides a separate development package or SDK, be sure it has been
  installed.

This CMake path should be used, if the CMake variable alpaka_USE_INTERNAL_CATCH2 is OFF. But setting -Dalpaka_USE_INTERNAL_CATCH2=ON also does not solve the problem.

CMakeLists.txt Outdated
cmake_dependent_option(alpaka_USE_INTERNAL_CATCH2 "Use internally shipped Catch2" ON BUILD_TESTING OFF)
cmake_dependent_option(alpaka_USE_INTERNAL_CATCH2 "Use internally shipped Catch2" ON alpaka_BUILD_BENCHMARKS OFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you must merge this line and the line above see https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html

If you add two lines the last line will have the highest priority.

CMakeLists.txt Outdated
@@ -48,7 +48,9 @@ option(alpaka_INSTALL_TEST_HEADER "Install headers of the namespace alpaka::test

include(CMakeDependentOption)
cmake_dependent_option(alpaka_CHECK_HEADERS "Check all alpaka headers as part of the tests whether they can be compiled standalone." OFF BUILD_TESTING OFF)
cmake_dependent_option(alpaka_CHECK_HEADERS "Check all alpaka headers as part of the benchmarks whether they can be compiled standalone." OFF alpaka_BUILD_BENCHMARKS OFF)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you must merge this line and the line above see https://cmake.org/cmake/help/latest/module/CMakeDependentOption.html

If you add two lines the last line will have the highest priority.

Copy link
Contributor Author

@mehmetyusufoglu mehmetyusufoglu Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I have added an additional local variable as (alpaka_BUILD_BENCHMARKS OR BUILD_TESTING).

But; Benchmarks were locally not compilable and i got the error,
/home/yusufo81/projects/alpaka-dir/alpaka/benchmarks/babelstream/src/babelStreamMainTest.cpp:5:10: fatal error: alpaka/alpaka.hpp: No such file or directory 5 | #include <alpaka/alpaka.hpp> | ^~~~~~~~~~~~~~~~~~~

  • to solve this i had to change the condition to call add_subdirectory("test/") at line 167. But with this change, setting benchmark cmake variable became the same thing of setting BUILD_TESTING. It is a kind of work-around.
    \# Only build the tests if alpaka is the top-level project and BUILD_TESTING or alpaka_BUILD_BENCHMARKS is ON \# Note: alpaka_BUILD_BENCHMARKS is added with an OR besides BUILD_TESTING below. \# Because the benchmark which uses catch2 is not compilable locally without adding test directory as below \# or without setting BUILD_TESTING cmake variable to ON. See PR2299.
if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME AND (BUILD_TESTING OR alpaka_BUILD_BENCHMARKS))
    add_subdirectory("test/")
endif()

CMakeLists.txt Outdated
# Note: alpaka_BUILD_BENCHMARKS is added with an OR besides BUILD_TESTING below.
# Because the benchmark which uses catch2 is not compilable locally without adding test directory as below
# or without setting BUILD_TESTING cmake variable to ON. See PR2299.
if(CMAKE_PROJECT_NAME STREQUAL PROJECT_NAME AND (BUILD_TESTING OR alpaka_BUILD_BENCHMARKS))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do not want to build testing if benchmark only is enabled.

@psychocoderHPC
Copy link
Member

I will provide a fix after lunch!

Co-authored-by: René Widera <[email protected]>
Co-authored-by: Simeon Ehrig <[email protected]>
@psychocoderHPC psychocoderHPC merged commit a017c60 into alpaka-group:develop Sep 17, 2024
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants