[Bench] IO Encode Benchmark #5

Merged
merged 1 commit into from
Apr 22, 2024

CaiusDai
Collaborator

@CaiusDai CaiusDai commented Apr 21, 2024

What I have done in this PR

  • Benchmarks for File IO, Encoding & Decoding written in C++

Note

This file draws heavily on the existing benchmarks in Arrow's Parquet implementation. The two main sources are arrow/cpp/src/parquet/encoding_benchmarks.cc and arrow/cpp/src/parquet/column_io_benchmarks.cc.
I wrote a new file instead of reusing the original benchmarks for the following reasons:

  1. To support ratio comparison, the input data must be the same across benchmarks, so we need one general method for generating it. (The original benchmarks generate their data in different ways.)
  2. The original column IO benchmark only covers Int64 with Plain encoding.
  3. All benchmarks can now be run in one go.
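
To illustrate reason 1, here is a minimal sketch of a shared data generator. It is not the code from this PR: `GenerateBenchmarkData`, `CheckDeterministic`, the seed, and the value range are all hypothetical names and choices made for illustration. The idea is only that every benchmark requesting the same size and seed gets the same values, whatever the element type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <type_traits>
#include <vector>

// Hypothetical helper (an assumption, not taken from the PR): returns the same
// pseudo-random input for every caller that passes the same size and seed.
template <typename T>
std::vector<T> GenerateBenchmarkData(std::size_t num_values, std::uint32_t seed) {
  static_assert(std::is_arithmetic<T>::value, "T must be an arithmetic type");
  // The raw output sequence of std::mt19937 for a given seed is fixed by the
  // C++ standard, so it is reproducible across compilers and platforms.
  std::mt19937 rng(seed);
  std::vector<T> values;
  values.reserve(num_values);
  for (std::size_t i = 0; i < num_values; ++i) {
    // Reduce the raw engine output by hand into [0, 1000). The standard
    // distribution classes are avoided here because their results are
    // allowed to differ between standard library implementations.
    values.push_back(static_cast<T>(rng() % 1000));
  }
  return values;
}

// Hypothetical self-check: two calls with the same arguments must agree.
inline void CheckDeterministic() {
  const auto first = GenerateBenchmarkData<std::int64_t>(1024, 42);
  const auto second = GenerateBenchmarkData<std::int64_t>(1024, 42);
  assert(first == second);
}
```

Each encoding or IO benchmark could then call, for example, `GenerateBenchmarkData<std::int64_t>(num_values, seed)` or `GenerateBenchmarkData<double>(num_values, seed)` in its setup, so that results for different encodings are measured against identical input.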

Warning

  1. To run the benchmark, we need to either write a new CMakeLists or add this file to the Arrow source tree. The correct folder for it is thirdparty/arrow/cpp/src/parquet/. In addition, thirdparty/arrow/cpp/src/parquet/CMakeLists.txt must register the benchmark file by adding the following command at line 433:
add_parquet_benchmark(io_encode_benchmark SOURCES io_encode_benchmark.cc
                      benchmark_util.cc)
  2. The current build_third_party.sh disables building the Parquet benchmarks. On my local machine, I successfully built the benchmarks by running the following commands in arrow/cpp/build:
cmake .. -DARROW_PARQUET=ON \
      -DARROW_OPTIONAL_INSTALL=ON \
      -DARROW_BUILD_BENCHMARKS=ON
make parquet-benchmarks 

In the Docker environment, I tried the same configuration and built the benchmarks with ninja parquet-benchmarks, but the build fails at the linking stage.

Related issues:

#6
#7

@CaiusDai CaiusDai added the "help wanted" and "in progress" labels Apr 21, 2024
@CaiusDai CaiusDai requested a review from TatianaJin April 21, 2024 06:48
@TatianaJin TatianaJin merged commit 898fdd6 into TatianaJin:benchmark Apr 22, 2024