
Commit

Merges advanced benchmarking branch
Major changes
    + Introduces benchmarking with chrono
    + Sets a constant execution time for benchmarks
Arseny Aprelev committed Apr 21, 2016
2 parents ece37d7 + 351b2af commit c8351d9
Showing 9 changed files with 234 additions and 145 deletions.
60 changes: 51 additions & 9 deletions README.md
@@ -11,30 +11,72 @@ New cipher itself has a structure of SP-network with fixed byte-to-byte S-box an

compared to its predecessor GOST 28147.

This project provides several C99 implementations with minimal or no dependencies that achieve high performance:
This repository hosts the C99 library `libgost15`, which provides three interchangeable implementations:

* compact implementation,
* optimised implementation,
* SIMD implementation.

### Performance

Performance is measured by a separate tool in the `benchmark` subdirectory. It links against `libgost15` and measures the speed of the following operations:

* block encryption,
* block decryption.

All functions provided by `libgost15` are thread-safe, so they can be called from several threads at once; the figures below, however, are measured in a single thread.
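The measurement strategy of the benchmark tool (run the operation in batches whose size doubles until a single batch lasts at least a minimum duration, then divide the bytes processed in that batch by its elapsed time) can be sketched in plain C99. This is an illustration only: `measureThroughput` and `dummyOperation` are hypothetical names, the sketch uses the standard `clock()` (processor time, adequate for a single-threaded run) instead of the `std::chrono` clock used by the actual tool, and the operation under test is a stand-in rather than a `libgost15` call.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*block_op_t)(uint8_t *block);

/* Runs op in batches of 1, 2, 4, ... calls until one batch takes at least
   minSeconds of processor time; returns the throughput of that last batch
   in bytes per second. blockSize is the number of bytes handled per call. */
static double measureThroughput(block_op_t op, uint8_t *block, size_t blockSize,
                                double minSeconds) {
    double elapsed = 0.0;
    double bytesPerSecond = 0.0;

    for (size_t iterations = 1; elapsed < minSeconds; iterations *= 2) {
        clock_t startedAt = clock();
        for (size_t i = 0; i < iterations; ++i) {
            op(block);
        }
        clock_t finishedAt = clock();

        elapsed = (double)(finishedAt - startedAt) / CLOCKS_PER_SEC;
        if (elapsed > 0.0) {
            bytesPerSecond = (double)(iterations * blockSize) / elapsed;
        }
    }
    return bytesPerSecond;
}

/* Hypothetical stand-in for a 16-byte block operation. */
static void dummyOperation(uint8_t *block) {
    for (size_t i = 0; i < 16; ++i) {
        block[i] ^= (uint8_t)(0x5a + i);
    }
}
```

Only the final batch is reported; the doubling merely reaches a batch long enough that clock resolution and call overhead of the timer do not dominate the result.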

#### Benchmark data (Intel Core i5 Sandy Bridge @ 2.60 GHz, single core)

| Operation | `compact` | `optimised` | `SIMD` |
|:---------------- |:----------- |:------------- |:------------- |
| Block encryption | 4.4321 MB/s | 100.8338 MB/s | 158.8720 MB/s |
| Block decryption | 4.3837 MB/s | 102.0845 MB/s | 157.5190 MB/s |

#### Benchmark data (Intel Core i7-2677M Sandy Bridge @ 1.80 GHz, single core)

| Operation | `compact` | `optimised` | `SIMD` |
|:---------------- |:----------- |:------------- |:------------- |
| Block encryption | 1.2840 MB/s | 62.6575 MB/s | 112.2875 MB/s |
| Block decryption | 1.2676 MB/s | 64.4036 MB/s | 114.6625 MB/s |
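To compare the two machines independently of their clock speeds, throughput can be converted to processor cycles per byte. The helper below is a hypothetical illustration, not part of `libgost15`; it assumes the core ran at its nominal frequency for the whole measurement (turbo boost or frequency scaling would skew the result) and that 1 MB/s means 10^6 bytes per second, which matches how the benchmark tool derives MB/s from bytes per millisecond.

```c
/* Converts throughput in MB/s (10^6 bytes per second) into processor cycles
   per byte, assuming the core runs at clockGHz during the measurement. */
static double cyclesPerByte(double clockGHz, double megabytesPerSecond) {
    return (clockGHz * 1e9) / (megabytesPerSecond * 1e6);
}
```

For example, the SIMD encryption figure on the 2.60 GHz Core i5 gives `cyclesPerByte(2.60, 158.8720)`, roughly 16.4 cycles per byte, or about 262 cycles per 16-byte block.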


### Implementations

#### Compact implementation

A straightforward implementation of the block encryption and decryption routines with few or no major optimisations. It has the lowest memory requirements.
A straightforward implementation of the block encryption and decryption routines with few or no major optimisations. It has the lowest memory requirements and does not require SSE instructions.

Why use this instead of the [official TC26 implementation](http://tc26.ru/standard/gost/PR_GOSTR_bch_v6.zip)?

* It works on any platform, not just Windows.
* All sixteen R transformations are merged into a single L transformation, which eliminates the byte rotations.
* Better grammar and code organisation.
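The merging of the R transformations can be illustrated with a small C99 sketch (hypothetical helper names, not the `libgost15` API). It follows the GOST R 34.12-2015 definitions: multiplication in GF(2^8) modulo x^8 + x^7 + x^6 + x + 1, the linear function ℓ with coefficients 148, 32, 133, 16, 194, 192, 1, 251, 1, 192, 194, 16, 133, 32, 148, 1, and L = R^16. The naive version shifts the block sixteen times. Because L is linear over GF(2^8), it can instead be applied as one 16×16 matrix-vector product whose columns are L of the unit vectors. This is one way to merge the R steps; the repository may unroll them differently.

```c
#include <stdint.h>
#include <string.h>

/* Multiplication in GF(2^8) modulo p(x) = x^8 + x^7 + x^6 + x + 1. */
static uint8_t gfMul(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    while (b) {
        if (b & 1) {
            result ^= a;
        }
        /* Multiply a by x, reducing by 0xC3 (x^7 + x^6 + x + 1) on overflow. */
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0xC3 : 0x00));
        b >>= 1;
    }
    return result;
}

/* Coefficients of l for byte positions a[0]..a[15];
   a[15] is the leftmost (most significant) byte of the block. */
static const uint8_t kLCoefficients[16] = {
    1, 148, 32, 133, 16, 194, 192, 1, 251, 1, 192, 194, 16, 133, 32, 148
};

/* R: compute l(a), shift every byte down one position, put l(a) on top. */
static void rStep(uint8_t a[16]) {
    uint8_t l = 0;
    for (int i = 0; i < 16; ++i) {
        l ^= gfMul(a[i], kLCoefficients[i]);
    }
    memmove(a, a + 1, 15);
    a[15] = l;
}

/* Naive L = R^16: sixteen rounds, each shifting the whole block. */
static void lNaive(uint8_t a[16]) {
    for (int round = 0; round < 16; ++round) {
        rStep(a);
    }
}

/* lColumns[i] = L(e_i), where e_i holds byte value 1 at position i. */
static uint8_t lColumns[16][16];

static void initLColumns(void) {
    for (int i = 0; i < 16; ++i) {
        memset(lColumns[i], 0, 16);
        lColumns[i][i] = 1;
        lNaive(lColumns[i]);
    }
}

/* Merged L: L(a) = sum over i of a[i] * L(e_i); no shifting at all. */
static void lMerged(uint8_t a[16]) {
    uint8_t out[16] = {0};
    for (int i = 0; i < 16; ++i) {
        for (int j = 0; j < 16; ++j) {
            out[j] ^= gfMul(a[i], lColumns[i][j]);
        }
    }
    memcpy(a, out, 16);
}
```

Both versions perform 256 field multiplications per block; the merged form simply drops the sixteen byte shifts. The R step above reproduces the standard's test vector R(00...0100) = 94 00...00 01.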

This implementation is built by default and does not require any special predefined variables.

#### Optimised implementation

To use the optimised implementation, define the `USE_OPTIMISED_IMPLEMENTATION` environment variable before compiling.
The optimised implementation employs the vector-by-matrix multiplication precomputation technique described in [no link yet], similar to the one used in 64KB versions of AES. It is much faster than the compact implementation but requires 128KB of additional memory in the data segment to store the precomputed tables. It does not require SSE instructions.

The optimised implementation employs the vector-by-matrix multiplication precomputation technique described in [add link], similar to the one used in 64KB versions of AES. It is much faster than the compact implementation but requires 128KB of additional memory in the data segment to store the precomputed tables.
To use the optimised implementation, set the `ENABLE_PRECALCULATIONS` CMake variable when configuring the build:

```
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PRECALCULATIONS=ON ...
```
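The precomputation idea and the memory figure can be checked with a sketch (hypothetical names, not the repository's code; it omits the S-box, which an actual implementation would typically fold into the same tables). Since L is linear over GF(2^8), L(a) is the XOR, over byte positions i, of L applied to a vector holding only a[i] at position i. Precomputing those 16-byte results for all 16 positions and all 256 byte values takes 16 × 256 × 16 bytes = 64 KB, so the 128KB quoted above would correspond to one such table set for encryption and one for decryption (an assumption about the repository's layout). Applying L then costs 16 lookups and 16-byte XORs.

```c
#include <stdint.h>
#include <string.h>

/* Multiplication in GF(2^8) modulo p(x) = x^8 + x^7 + x^6 + x + 1. */
static uint8_t gfMul(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    while (b) {
        if (b & 1) {
            result ^= a;
        }
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0xC3 : 0x00));
        b >>= 1;
    }
    return result;
}

/* Coefficients of l for byte positions a[0]..a[15] (a[15] is leftmost). */
static const uint8_t kLCoefficients[16] = {
    1, 148, 32, 133, 16, 194, 192, 1, 251, 1, 192, 194, 16, 133, 32, 148
};

/* One R step: l(a) becomes the top byte, the other bytes shift down. */
static void rStep(uint8_t a[16]) {
    uint8_t l = 0;
    for (int i = 0; i < 16; ++i) {
        l ^= gfMul(a[i], kLCoefficients[i]);
    }
    memmove(a, a + 1, 15);
    a[15] = l;
}

/* Reference L = R^16, used only to fill the tables. */
static void lNaive(uint8_t a[16]) {
    for (int round = 0; round < 16; ++round) {
        rStep(a);
    }
}

/* lTable[i][b] = L(vector with byte b at position i, zeros elsewhere).
   16 positions x 256 values x 16 bytes = 65536 bytes (64 KB). */
static uint8_t lTable[16][256][16];

static void initLTable(void) {
    for (int i = 0; i < 16; ++i) {
        for (int b = 0; b < 256; ++b) {
            uint8_t *entry = lTable[i][b];
            memset(entry, 0, 16);
            entry[i] = (uint8_t)b;
            lNaive(entry);
        }
    }
}

/* Table-driven L: one lookup and one 16-byte XOR per input byte. */
static void lLookup(uint8_t a[16]) {
    uint8_t out[16] = {0};
    for (int i = 0; i < 16; ++i) {
        const uint8_t *entry = lTable[i][a[i]];
        for (int j = 0; j < 16; ++j) {
            out[j] ^= entry[j];
        }
    }
    memcpy(a, out, 16);
}
```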

#### SIMD implementation

The SIMD implementation is enabled automatically when `USE_OPTIMISED_IMPLEMENTATION` is defined and the processor supports at least the Intel SSE2 instruction set.
The SIMD implementation uses the SSE instruction set, an extension that operates on 128-bit XMM registers, to speed up the optimised implementation further. It requires SSE2 or higher.

The SIMD implementation uses the SSE instruction set, an extension that operates on 128-bit XMM registers. Combined with vector-by-matrix multiplication, SSE instructions give the highest throughput of the three implementations.
To use the SIMD implementation, set both the `ENABLE_PRECALCULATIONS` and `ENABLE_SIMD` CMake variables when configuring the build:

### Portability
```
cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PRECALCULATIONS=ON -DENABLE_SIMD=ON ...
```
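Where SSE2 helps is the inner loop of a table-driven transformation: each 16-byte table entry fits exactly one 128-bit XMM register, so accumulating an entry is a single `_mm_xor_si128` instead of sixteen byte XORs. The sketch below (hypothetical function name, not the repository's code) XORs a sequence of 16-byte rows using the SSE2 intrinsics from `<emmintrin.h>` when the compiler defines `__SSE2__` (GCC and Clang do so by default on x86-64), and falls back to a portable byte loop otherwise, so it also builds on non-x86 targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* XORs count consecutive 16-byte rows (row i starts at rows + 16 * i)
   and stores the 16-byte result in out. */
static void xorRows(const uint8_t *rows, size_t count, uint8_t out[16]) {
#if defined(__SSE2__)
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < count; ++i) {
        /* Unaligned load, so rows need not be 16-byte aligned. */
        acc = _mm_xor_si128(acc, _mm_loadu_si128((const __m128i *)(rows + 16 * i)));
    }
    _mm_storeu_si128((__m128i *)out, acc);
#else
    memset(out, 0, 16);
    for (size_t i = 0; i < count; ++i) {
        for (size_t j = 0; j < 16; ++j) {
            out[j] ^= rows[16 * i + j];
        }
    }
#endif
}
```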

The source code is by no means portable to all platforms out of the box, though it should be fairly easy to port the compact implementation to any platform with a few minor tweaks.
Future versions of `libgost15` might enable this implementation by default when the optimised implementation is selected and the SSE instruction set (SSE2 or higher) is available.

### Portability

Porting the optimised and SIMD versions to a platform with a different endianness requires rotating each vector in the precalculated long tables.
I am working to make this code portable and to test it on as many platforms as possible. Contributions are welcome.
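As an alternative to rewriting the tables for each byte order (a different technique from the one described above, shown only as a sketch with hypothetical names), table entries stored as 64-bit words can be converted to and from bytes with shifts, which yields the same byte sequence on little- and big-endian hosts. The helper `isLittleEndian` merely reports the host byte order, for example to decide at startup whether tables generated on a little-endian machine need fixing.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
static int isLittleEndian(void) {
    const uint16_t probe = 0x0001;
    uint8_t firstByte;
    memcpy(&firstByte, &probe, 1);
    return firstByte == 0x01;
}

/* Writes v as 8 bytes, least significant byte first. Uses shifts rather
   than the in-memory layout, so the result does not depend on the host. */
static void storeLittleEndian64(uint64_t v, uint8_t out[8]) {
    for (int i = 0; i < 8; ++i) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Reads back 8 bytes written by storeLittleEndian64. */
static uint64_t loadLittleEndian64(const uint8_t in[8]) {
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i) {
        v |= (uint64_t)in[i] << (8 * i);
    }
    return v;
}
```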
4 changes: 2 additions & 2 deletions src/CMakeLists.txt
@@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.2)

## gosthopper project declaration
project(gosthopper VERSION 0.3.6 LANGUAGES C)
## libgost15-lib project declaration
project(aprelev-libgost15 VERSION 0.3.6)

## libgost15 library and selftests
add_subdirectory(libgost15)
10 changes: 8 additions & 2 deletions src/benchmark/CMakeLists.txt
@@ -4,5 +4,11 @@
## ##
##########################################################################################

add_executable(benchmark src/benchmark.c)
target_link_libraries(benchmark gosthopper::libgost15)
cmake_minimum_required(VERSION 3.2)

## libgost15-benchmark project declaration
project(libgost15-benchmark VERSION 0.3.5 LANGUAGES CXX)

add_executable(benchmark src/benchmark.cpp)
set_target_properties(benchmark PROPERTIES CXX_STANDARD 11)
target_link_libraries(benchmark libgost15)
121 changes: 0 additions & 121 deletions src/benchmark/src/benchmark.c

This file was deleted.

158 changes: 158 additions & 0 deletions src/benchmark/src/benchmark.cpp
@@ -0,0 +1,158 @@
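#include <iostream>
#include <chrono>
#include <thread>
#include <iomanip>
#include <sstream>
#include <random>
#include <algorithm>   // std::generate_n
#include <cstdint>     // uint8_t
#include <functional>  // std::bind
#include <string>
#include <libgost15/gost15.h>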

const auto defaultDuration = std::chrono::duration<double, std::milli>(2000.);

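/* Throughput is computed in bytes per millisecond, i.e. kilobytes (1000 bytes)
   per second: the unit is kB/s even though the enumerator says "kilobits". */
enum units_t {
    kilobitsPerSecond
};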


static void generateRandomBytes(uint8_t *bytes, size_t numberOfBytes) {
    std::random_device device_;
    std::mt19937 engine_(device_());
    std::uniform_int_distribution<int> distribution_(0x00, 0xff);
    auto generator_ = std::bind(distribution_, engine_);

    std::generate_n(bytes, numberOfBytes, generator_);
}


/* Formats an 80-column line: operation name at column 3, result at column 55.
   The in-progress line ends with '\r' so the final line overwrites it. */
static std::string reportPerformance(std::string operation, std::string performance, bool isInProgress = false) {
    std::string result_ = std::string(80, ' ');

    if (!isInProgress) {
        std::copy(operation.begin(), operation.end(), result_.begin() + 3);
        std::copy(performance.begin(), performance.end(), result_.begin() + 55);
        result_[79] = '\n';
    }
    else {
        result_[1] = '.';
        std::copy(operation.begin(), operation.end(), result_.begin() + 3);
        result_[79] = '\r';
    }

    return result_;
}


static std::string toHumanReadable(double performance, enum units_t units) {
    std::ostringstream stream_;

    switch (units) {
        case kilobitsPerSecond: {
            stream_ << std::fixed;
            stream_ << std::setprecision(4);

            /* Values of 1100 kB/s and above are shown in MB/s (1 MB = 1000 kB). */
            if (performance >= 1100.) {
                stream_ << performance / 1000;
                stream_ << " ";
                stream_ << "MB/s";
            }
            else {
                stream_ << performance;
                stream_ << " ";
                stream_ << "kB/s";
            }
        }
        break;
        default:
            break;
    }

    return stream_.str();
}


void benchmarkEncryption(std::chrono::duration<double, std::milli> minimumDuration) {
    std::string operation_ = "Block encryption";
    std::chrono::duration<double, std::milli> duration_(.0);
    double kBPerSecond_ = .0;

    /* Resources allocation. */
    const size_t roundKeysLength_ = BlockLengthInBytes * NumberOfRounds;
    uint8_t *roundKeys_ = new uint8_t[roundKeysLength_];
    uint8_t *block_ = new uint8_t[BlockLengthInBytes];

    /* Initialisation. sizeof on a pointer yields only the pointer size,
       so the buffer lengths are passed explicitly. */
    generateRandomBytes(roundKeys_, roundKeysLength_);
    generateRandomBytes(block_, BlockLengthInBytes);

    /* Measurement-in-progress output. */
    std::cout << reportPerformance(operation_, "", true);

    /* Measurement cycle: double the batch size until one batch lasts at least minimumDuration. */
    for (size_t iterations_ = 1; duration_ < minimumDuration; iterations_ *= 2) {
        auto startedAt_ = std::chrono::high_resolution_clock::now();

        for (size_t iterationIndex_ = 0; iterationIndex_ < iterations_; ++iterationIndex_) {
            encryptBlock(roundKeys_, block_);
        }

        auto finishedAt_ = std::chrono::high_resolution_clock::now();
        duration_ = finishedAt_ - startedAt_;
        /* Bytes per millisecond, i.e. kilobytes (1000 bytes) per second. */
        kBPerSecond_ = (iterations_ * BlockLengthInBytes) / (duration_.count());
    }

    /* Result output. */
    std::cout << reportPerformance(operation_, toHumanReadable(kBPerSecond_, kilobitsPerSecond));

    /* Resources releasing. */
    delete[] roundKeys_;
    delete[] block_;
}


void benchmarkDecryption(std::chrono::duration<double, std::milli> minimumDuration) {
    std::string operation_ = "Block decryption";
    std::chrono::duration<double, std::milli> duration_(.0);
    double kBPerSecond_ = .0;

    /* Resources allocation. */
    const size_t roundKeysLength_ = BlockLengthInBytes * NumberOfRounds;
    uint8_t *roundKeys_ = new uint8_t[roundKeysLength_];
    uint8_t *block_ = new uint8_t[BlockLengthInBytes];

    /* Initialisation. sizeof on a pointer yields only the pointer size,
       so the buffer lengths are passed explicitly. */
    generateRandomBytes(roundKeys_, roundKeysLength_);
    generateRandomBytes(block_, BlockLengthInBytes);

    /* Measurement-in-progress output. */
    std::cout << reportPerformance(operation_, "", true);

    /* Measurement cycle: double the batch size until one batch lasts at least minimumDuration. */
    for (size_t iterations_ = 1; duration_ < minimumDuration; iterations_ *= 2) {
        auto startedAt_ = std::chrono::high_resolution_clock::now();

        /* Decryption benchmark must call decryptBlock, not encryptBlock. */
        for (size_t iterationIndex_ = 0; iterationIndex_ < iterations_; ++iterationIndex_) {
            decryptBlock(roundKeys_, block_);
        }

        auto finishedAt_ = std::chrono::high_resolution_clock::now();
        duration_ = finishedAt_ - startedAt_;
        /* Bytes per millisecond, i.e. kilobytes (1000 bytes) per second. */
        kBPerSecond_ = (iterations_ * BlockLengthInBytes) / (duration_.count());
    }

    /* Result output. */
    std::cout << reportPerformance(operation_, toHumanReadable(kBPerSecond_, kilobitsPerSecond));

    /* Resources releasing. */
    delete[] roundKeys_;
    delete[] block_;
}


int main() {
    std::cout << " ---------------------------------------------------------------------------- " << std::endl;
    std::cout << " libgost15 operation performance " << std::endl;
    std::cout << " ---------------------------------------------------------------------------- " << std::endl;

    benchmarkEncryption(defaultDuration);
    benchmarkDecryption(defaultDuration);

    std::cout << " ---------------------------------------------------------------------------- " << std::endl;
    return 0;
}