From ada85736c2c65053b66de7c579be7d3939052d96 Mon Sep 17 00:00:00 2001
From: Arseny Aprelev
Date: Wed, 20 Apr 2016 15:53:55 +0300
Subject: [PATCH 1/6] Updates README

Major changes
+ Introduces libgost15 library
+ Introduces new benchmarking engine

Minor changes
+ Adds benchmark data on Intel Core i5 Sandy Bridge @ 2.6 GHz
+ Changes implementation selecting environment variables
+ Adds more detailed description on implementations
---
 README.md | 58 +++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 46 insertions(+), 12 deletions(-)

diff --git a/README.md b/README.md
index 621a517..56d3653 100644
--- a/README.md
+++ b/README.md
@@ -12,30 +12,64 @@ New cipher itself has a structure of SP-network with fixed byte-to-byte S-box an
 compared to its predecessor GOST 28147.
 
-This project provides several C99 versions of implementation with minimal or no dependencies while achieving high performance:
+This repository hosts C99 library `libgost15` which provides three (interchangeable) versions of basic implementations:
 
 * compact implementation,
 * optimised implementation,
 * SIMD implementation.
 
-#### Compact implementation
+### Performance
 
-Straightforward implementation of block encryption and decryption routines, with little or no major optimisations. Has lowest memory requirements.
+Performance is measured by a separate tool residing in benchmark subdirectory. It links with `libgost15` and measures speed of these operations:
 
-#### Optimised implementation
+* block encryption,
+* block decryption.
 
-To use optimised implementation, define `USE_OPTIMISED_IMPLEMENTATION` environment variable before compiling.
+All functions provided by `libgost15` are thread-safe thus measuring takes place in single thread.
 
-Optimised implementation employs vector-by-matrix multiplication precomutation technique described in [add link], similar to one in 64KB versions of AES.
This implementation is much faster that the compact one, but requires 128KB os additional memory in data segment for storing precomputed tables.
+##### Benchmark data (Intel Core i5 Sandy Bridge @ 2.6 GHz, single core)
 
-#### SIMD implementation
+| Operation        | `compact`   | `optimised`   | `SIMD`        |
+|:---------------- |:----------- |:------------- |:------------- |
+| Block encryption | 4.4321 MB/s | 100.8338 MB/s | 158.8720 MB/s |
+| Block decryption | 4.3837 MB/s | 102.0845 MB/s | 157.5190 MB/s |
 
-SIMD implementation automatically enables when `USE_OPTIMISED_IMPLEMENTATION` is defined and Intel (at least) SSE2 instruction set is supported by processor.
+### Implementations
 
-SIMD implementation utilises SSE instruction set, a set of extended processor instructions which enable one to operate over 128-bit XMM registers. Combined with vector-by-matrix multiplication, SSE instructions help to achieve incredible performance.
+##### Compact implementation
 
-### Portability
+Straightforward implementation of block encryption and decryption routines, with little or no major optimisations. Has lowest memory requirements. Does not require SSE instructions.
+
+Why use this and not [official TC26 implementation](http://tc26.ru/standard/gost/PR_GOSTR_bch_v6.zip)?
+
+* It works on any platform, not just Windows.
+* All sixteen R transformations are merged into single L transformation thus cutting out rotations.
+* Better grammar and code organisation.
+
+This implementation is built by default and it does not require any special predefined variables.
+
+##### Optimised implementation
+
+Optimised implementation employs vector-by-matrix multiplication precomputation technique described in [no link yet], similar to one in 64KB versions of AES. This implementation is much faster than the compact one, but requires 128KB of additional memory in data segment for storing precomputed tables. Does not require SSE instructions.
+
+To use optimised implementation, define `ENABLE_PRECALCULATIONS` environment variable before building:
 
-Source code is by no means portable on all platforms out-of-the-box, though it should be fairly easy to port compact version of implementation on any platform with a few minor tweaks.
+```
+cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PRECALCULATIONS=ON ...
+```
+
+##### SIMD implementation
+
+SIMD implementation utilises SSE instruction set, a set of extended processor instructions which enable one to operate over 128-bit XMM registers, thus further speeding up optimised implementation. Requires SSE2 or higher.
+
+To use SIMD implementation, define both `ENABLE_PRECALCULATIONS` and `ENABLE_SIMD` environment variables before building:
+
+```
+cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PRECALCULATIONS=ON -DENABLE_SIMD=ON ...
+```
+
+Future versions of `libgost15` might enable this implementation version by default when optimised version is selected and SSE instruction set (SSE2+) is available.
+
+### Portability
 
-Porting optimised and SIMD versions on platform with a different endianness requires rotating each vector in precalculated long tables.
+I am working as hard as I can to make this code portable and test it on as many platforms as I can. You are welcome to contribute.

From 75d75b06e708e743e269841105e6224ebe1e24d5 Mon Sep 17 00:00:00 2001
From: Arseny Aprelev
Date: Wed, 20 Apr 2016 21:37:24 +0300
Subject: [PATCH 2/6] Fixes libgost15 linkage

Major updates
+ Provides extern C linkage for mixing with C++ projects

Minor updates
+ Removes redundant operation modes enumeration
---
 src/libgost15/include/libgost15/gost15.h | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/src/libgost15/include/libgost15/gost15.h b/src/libgost15/include/libgost15/gost15.h
index 2438e05..1f394a0 100644
--- a/src/libgost15/include/libgost15/gost15.h
+++ b/src/libgost15/include/libgost15/gost15.h
@@ -13,15 +13,12 @@ enum {
     KeyLengthInBytes = 256 / 8,
 };
 
-enum operationMode_t {
-    ECB,
-    CBC,
-    CFB,
-    OFB
-};
-
 extern const size_t WorkspaceOfScheduleRoundKeys;
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 void encryptBlock(
         const void *roundKeys,
         void *block
@@ -43,4 +40,8 @@ void scheduleDecryptionRoundKeys(
         void *memory
 );
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif

From 0ccd3f07a74247cfdbfdea67e3061d7738d5b470 Mon Sep 17 00:00:00 2001
From: Arseny Aprelev
Date: Wed, 20 Apr 2016 21:42:38 +0300
Subject: [PATCH 3/6] Removes old title references

Minor changes
+ Removes references to gosthopper title
+ Wraps c_restrict compiler feature in if guard

Comments
+ Wrapping the c_restrict requirement in a guard works around a CMake 3.2 bug where CMake could not detect compiler features of GCC 5.3.0 even though the required c_restrict feature is present in the compiler. This issue was fixed in CMake 3.4, so it may be worth raising the minimum required CMake version.
---
 src/CMakeLists.txt                 | 4 ++--
 src/libgost15/CMakeLists.txt       | 7 +++++--
 src/libgost15/tests/CMakeLists.txt | 2 +-
 3 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt
index 36b2f8e..bef109e 100644
--- a/src/CMakeLists.txt
+++ b/src/CMakeLists.txt
@@ -1,7 +1,7 @@
 cmake_minimum_required(VERSION 3.2)
 
-## gosthopper project declaration
-project(gosthopper VERSION 0.3.6 LANGUAGES C)
+## libgost15-lib project declaration
+project(aprelev-libgost15 VERSION 0.3.6)
 
 ## libgost15 library and selftests
 add_subdirectory(libgost15)
diff --git a/src/libgost15/CMakeLists.txt b/src/libgost15/CMakeLists.txt
index 6733fb5..d83ca87 100644
--- a/src/libgost15/CMakeLists.txt
+++ b/src/libgost15/CMakeLists.txt
@@ -15,7 +15,6 @@ option(ENABLE_SIMD "Enable SIMD code optimisations"
 
 ## libgost15 definition
 add_library(libgost15)
-add_library(gosthopper::libgost15 ALIAS libgost15)
 
 ## libgost15 source files
 if (ENABLE_PRECALCULATIONS AND ENABLE_SIMD)
@@ -33,7 +32,11 @@ endif()
 target_include_directories(libgost15 PUBLIC $)
 target_include_directories(libgost15 PUBLIC $)
 target_include_directories(libgost15 PRIVATE src)
-target_compile_features(libgost15 PUBLIC c_restrict)
+
+## libgost15 compile feature: c_restrict
+if (CMAKE_C_COMPILE_FEATURES)
+    target_compile_features(libgost15 PUBLIC c_restrict)
+endif()
 
 ## Falling back to strict C standard
 set_target_properties(libgost15 PROPERTIES C_EXTENSIONS OFF)
diff --git a/src/libgost15/tests/CMakeLists.txt b/src/libgost15/tests/CMakeLists.txt
index d4bbb0a..bdb5847 100644
--- a/src/libgost15/tests/CMakeLists.txt
+++ b/src/libgost15/tests/CMakeLists.txt
@@ -8,7 +8,7 @@
 add_executable(selftests_gost selftests_gost.c)
 
 ## Linking libgost15
-target_link_libraries(selftests_gost PUBLIC gosthopper::libgost15)
+target_link_libraries(selftests_gost PUBLIC libgost15)
 
 ## Enabling and adding tests
 enable_testing()

From 8331670a16def0f1f876f993153a5042450dbf2f Mon Sep 17 00:00:00 2001
From: Arseny Aprelev
Date:
Wed, 20 Apr 2016 21:48:39 +0300
Subject: [PATCH 4/6] Adds new benchmark engine

Major changes
+ Crude benchmark now() function was replaced with modern C++11 chrono high-resolution timer, which is (hopefully) more portable and provides more human-readable benchmark data such as throughput speed of encryption in kilobytes per second, rather than cycles per byte.
+ Randomises plaintext and round keys for encryption and decryption benchmarks

Minor changes
+ Redesigns benchmark output format

Comments
+ Forces CMake to use C++11 standard, which is generally a bad practice (obsolete)
---
 src/benchmark/CMakeLists.txt    |  10 +-
 src/benchmark/src/benchmark.c   | 121 ------------------------
 src/benchmark/src/benchmark.cpp | 158 ++++++++++++++++++++++++++++++++
 3 files changed, 166 insertions(+), 123 deletions(-)
 delete mode 100644 src/benchmark/src/benchmark.c
 create mode 100644 src/benchmark/src/benchmark.cpp

diff --git a/src/benchmark/CMakeLists.txt b/src/benchmark/CMakeLists.txt
index 87f026a..1ad6dd5 100644
--- a/src/benchmark/CMakeLists.txt
+++ b/src/benchmark/CMakeLists.txt
@@ -4,5 +4,11 @@
 ##                                                                                      ##
 ##########################################################################################
 
-add_executable(benchmark src/benchmark.c)
-target_link_libraries(benchmark gosthopper::libgost15)
\ No newline at end of file
+cmake_minimum_required(VERSION 3.2)
+
+## libgost15 project declaration
+project(libgost15-benchmark VERSION 0.3.5 LANGUAGES CXX)
+
+add_executable(benchmark src/benchmark.cpp)
+set_target_properties(benchmark PROPERTIES CXX_STANDARD 11)
+target_link_libraries(benchmark libgost15)
\ No newline at end of file
diff --git a/src/benchmark/src/benchmark.c b/src/benchmark/src/benchmark.c
deleted file mode 100644
index 995a3bf..0000000
--- a/src/benchmark/src/benchmark.c
+++ /dev/null
@@ -1,121 +0,0 @@
-#include
-#include
-#include
-
-#if defined __APPLE__
-    #include
-#elif defined COMPILER_MSVC && !defined _M_IX86
-    extern "C" uint64_t __rdtsc();
-#elif !defined _WIN32
-
#include
-#endif
-
-
-int64_t getCyclesCount() {
-#if defined __APPLE__
-    return mach_absolute_time();
-#elif defined __i386__
-    int64_t ret;
-    __asm__ volatile("rdtsc" : "=A"(ret));
-    return ret;
-#elif defined __x86_64__ || defined __amd64__
-    uint64_t low, high;
-    __asm__ volatile("rdtsc" : "=a"(low), "=d"(high));
-    return (high << 32) | low;
-#elif defined __powerpc__ || defined __ppc__
-    /* This returns a time-base, which is not always precisely a cycle-count. */
-    int64_t tbl, tbu0, tbu1;
-    asm("mftbu %0" : "=r"(tbu0));
-    asm("mftb %0" : "=r"(tbl));
-    asm("mftbu %0" : "=r"(tbu1));
-    tbl &= -static_cast(tbu0 == tbu1);
-    return (tbu1 << 32) | tbl;
-#elif defined __sparc__
-    int64_t tick;
-    asm(".byte 0x83, 0x41, 0x00, 0x00");
-    asm("mov %%g1, %0" : "=r"(tick));
-    return tick;
-#elif defined __ia64__
-    int64_t itc;
-    asm("mov %0 = ar.itc" : "=r"(itc));
-    return itc;
-#elif defined COMPILER_MSVC && defined _M_IX86
-    /* Older MSVC compilers (like 7.x) don't seem to support the
-       __rdtsc intrinsic properly, so _asm usage is preferred instead. */
-    _asm rdtsc
-#elif defined COMPILER_MSVC
-    return __rdtsc();
-#elif defined __aarch64__
-    int64_t virtual_timer_value;
-    asm volatile("mrs %0, cntvct_el0" : "=r"(virtual_timer_value));
-    return virtual_timer_value;
-#elif defined __ARM_ARCH
-#if (__ARM_ARCH >= 6) /* V6 is the earliest arch that has a standard cyclecount. */
-    uint32_t pmccntr;
-    uint32_t pmuseren;
-    uint32_t pmcntenset;
-    /* Read the user mode perf monitor counter access permissions. */
-    asm("mrc p15, 0, %0, c9, c14, 0" : "=r"(pmuseren));
-    if (pmuseren & 1) { /* Allows reading perfmon counters for user mode code? */
-        asm("mrc p15, 0, %0, c9, c12, 1" : "=r"(pmcntenset));
-        if (pmcntenset & 0x80000000ul) { /* Is it counting? */
-            asm("mrc p15, 0, %0, c9, c13, 0" : "=r"(pmccntr));
-            /* The counter is set up to count every 64th cycle.
*/
-            return static_cast(pmccntr) * 64;
-        }
-    }
-#endif
-    struct timeval tv;
-    gettimeofday(&tv, nullptr);
-    return static_cast(tv.tv_sec) * 1000000 + tv.tv_usec;
-#elif defined __mips__
-    /* mips only allows rdtsc for superusers */
-    struct timeval tv;
-    gettimeofday(&tv, nullptr);
-    return static_cast(tv.tv_sec) * 1000000 + tv.tv_usec;
-#else
-    #error Define cycle timer for your platform
-#endif
-}
-
-
-void benchmarkEncryption(unsigned long iterations) {
-    int64_t startCyclesCounter_, finishCyclesCounter_;
-    size_t iterationIndex_ = 0;
-    const uint8_t roundKeys_[NumberOfRounds * BlockLengthInBytes] = {
-        0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff, 0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77,
-        0xfe, 0xdc, 0xba, 0x98, 0x76, 0x54, 0x32, 0x10, 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef,
-        0xdb, 0x31, 0x48, 0x53, 0x15, 0x69, 0x43, 0x43, 0x22, 0x8d, 0x6a, 0xef, 0x8c, 0xc7, 0x8c, 0x44,
-        0x3d, 0x45, 0x53, 0xd8, 0xe9, 0xcf, 0xec, 0x68, 0x15, 0xeb, 0xad, 0xc4, 0x0a, 0x9f, 0xfd, 0x04,
-        0x57, 0x64, 0x64, 0x68, 0xc4, 0x4a, 0x5e, 0x28, 0xd3, 0xe5, 0x92, 0x46, 0xf4, 0x29, 0xf1, 0xac,
-        0xbd, 0x07, 0x94, 0x35, 0x16, 0x5c, 0x64, 0x32, 0xb5, 0x32, 0xe8, 0x28, 0x34, 0xda, 0x58, 0x1b,
-        0x51, 0xe6, 0x40, 0x75, 0x7e, 0x87, 0x45, 0xde, 0x70, 0x57, 0x27, 0x26, 0x5a, 0x00, 0x98, 0xb1,
-        0x5a, 0x79, 0x25, 0x01, 0x7b, 0x9f, 0xdd, 0x3e, 0xd7, 0x2a, 0x91, 0xa2, 0x22, 0x86, 0xf9, 0x84,
-        0xbb, 0x44, 0xe2, 0x53, 0x78, 0xc7, 0x31, 0x23, 0xa5, 0xf3, 0x2f, 0x73, 0xcd, 0xb6, 0xe5, 0x17,
-        0x72, 0xe9, 0xdd, 0x74, 0x16, 0xbc, 0xf4, 0x5b, 0x75, 0x5d, 0xba, 0xa8, 0x8e, 0x4a, 0x40, 0x43,
-    };
-    uint8_t block_[BlockLengthInBytes] = {
-        0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
-    };
-
-
-    startCyclesCounter_ = getCyclesCount();
-    while (iterationIndex_++ < iterations) {
-        encryptBlock(roundKeys_, block_);
-    }
-    finishCyclesCounter_ = getCyclesCount();
-
-    printf("\nBenchmark results:\n");
-    printf("%lu bytes processed within %lli cycles
with average speed of %04.4f cycles/byte.\n",
-           iterations * BlockLengthInBytes,
-           finishCyclesCounter_ - startCyclesCounter_,
-           (double) (finishCyclesCounter_ - startCyclesCounter_) / (iterations * BlockLengthInBytes));
-
-    return;
-}
-
-
-int main(void) {
-    benchmarkEncryption(1024 * 1024 * 16);
-    return 0;
-}
\ No newline at end of file
diff --git a/src/benchmark/src/benchmark.cpp b/src/benchmark/src/benchmark.cpp
new file mode 100644
index 0000000..0fda530
--- /dev/null
+++ b/src/benchmark/src/benchmark.cpp
@@ -0,0 +1,158 @@
+#include
+#include
+#include
+#include
+#include
+#include
+#include
+
+const auto defaultDuration = std::chrono::duration(2000.);
+
+enum units_t {
+    kilobitsPerSecond
+};
+
+
+static void generateRandomBytes(uint8_t *bytes, size_t numberOfBytes) {
+    std::random_device device_;
+    std::mt19937 engine_(device_());
+    std::uniform_int_distribution distribution_(0x00, 0xff);
+    auto generator_ = std::bind(distribution_, engine_);
+
+    std::generate_n(bytes, numberOfBytes, generator_);
+};
+
+
+static std::string reportPerformance(std::string operation, std::string performance, bool isInProgress = false) {
+    std::string result_ = std::string(80, ' ');
+
+    if (!isInProgress) {
+        std::copy(operation.begin(), operation.end(), result_.begin() + 3);
+        std::copy(performance.begin(), performance.end(), result_.begin() + 55);
+        result_[79] = '\n';
+    }
+    else {
+        result_[1] = '.';
+        std::copy(operation.begin(), operation.end(), result_.begin() + 3);
+        result_[79] = '\r';
+    }
+
+    return result_;
+}
+
+
+static std::string toHumanReadable(double performance, enum units_t units) {
+    std::ostringstream stream_;
+
+    switch (units) {
+        case kilobitsPerSecond: {
+            stream_ << std::fixed;
+            stream_ << std::setprecision(4);
+
+            if (performance >= 1100.)
{
+                stream_ << performance / 1000;
+                stream_ << " ";
+                stream_ << "MB/s";
+            }
+            else {
+                stream_ << performance;
+                stream_ << " ";
+                stream_ << "kB/s";
+            }
+        }
+            break;
+        default:
+            break;
+    }
+
+    return stream_.str();
+}
+
+
+void benchmarkEncryption(std::chrono::duration minimumDuration) {
+    std::string operation_ = "Block encryption";
+    std::chrono::duration duration_(.0);
+    double kBPerSecond_ = .0;
+
+    /* Resources allocation. */
+    uint8_t *roundKeys_ = new uint8_t[BlockLengthInBytes * NumberOfRounds];
+    uint8_t *block_ = new uint8_t[BlockLengthInBytes];
+
+    /* Initialisation: sizes are passed explicitly, since sizeof on a pointer yields the pointer size. */
+    generateRandomBytes(roundKeys_, BlockLengthInBytes * NumberOfRounds);
+    generateRandomBytes(block_, BlockLengthInBytes);
+
+    /* Measurement-in-progress output. */
+    std::cout << reportPerformance(operation_, "", true);
+
+    /* Measurement cycle. */
+    for (size_t iterations_ = 1; duration_ < minimumDuration; iterations_ *= 2) {
+        auto startedAt_ = std::chrono::high_resolution_clock::now();
+
+        for (size_t iterationIndex_ = 0; iterationIndex_ < iterations_; ++iterationIndex_) {
+            encryptBlock(roundKeys_, block_);
+        }
+
+        auto finishedAt_ = std::chrono::high_resolution_clock::now();
+        duration_ = finishedAt_ - startedAt_;
+        kBPerSecond_ = (iterations_ * BlockLengthInBytes) / (duration_.count());
+    }
+
+    /* Result output. */
+    std::cout << reportPerformance(operation_, toHumanReadable(kBPerSecond_, kilobitsPerSecond));
+
+    /* Resources releasing. */
+    delete[] roundKeys_;
+    delete[] block_;
+}
+
+
+void benchmarkDecryption(std::chrono::duration minimumDuration) {
+    std::string operation_ = "Block decryption";
+    std::chrono::duration duration_(.0);
+    double kBPerSecond_ = .0;
+
+    /* Resources allocation. */
+    uint8_t *roundKeys_ = new uint8_t[BlockLengthInBytes * NumberOfRounds];
+    uint8_t *block_ = new uint8_t[BlockLengthInBytes];
+
+    /* Initialisation: sizes are passed explicitly, since sizeof on a pointer yields the pointer size. */
+    generateRandomBytes(roundKeys_, BlockLengthInBytes * NumberOfRounds);
+    generateRandomBytes(block_, BlockLengthInBytes);
+
+    /* Measurement-in-progress output.
*/
+    std::cout << reportPerformance(operation_, "", true);
+
+    /* Measurement cycle. */
+    for (size_t iterations_ = 1; duration_ < minimumDuration; iterations_ *= 2) {
+        auto startedAt_ = std::chrono::high_resolution_clock::now();
+
+        for (size_t iterationIndex_ = 0; iterationIndex_ < iterations_; ++iterationIndex_) {
+            decryptBlock(roundKeys_, block_);
+        }
+
+        auto finishedAt_ = std::chrono::high_resolution_clock::now();
+        duration_ = finishedAt_ - startedAt_;
+        kBPerSecond_ = (iterations_ * BlockLengthInBytes) / (duration_.count());
+    }
+
+    /* Result output. */
+    std::cout << reportPerformance(operation_, toHumanReadable(kBPerSecond_, kilobitsPerSecond));
+
+    /* Resources releasing. */
+    delete[] roundKeys_;
+    delete[] block_;
+}
+
+
+int main() {
+    std::cout << " ---------------------------------------------------------------------------- " << std::endl;
+    std::cout << " libgost15 operation performance " << std::endl;
+    std::cout << " ---------------------------------------------------------------------------- " << std::endl;
+
+    benchmarkEncryption(defaultDuration);
+    benchmarkDecryption(defaultDuration);
+
+    std::cout << " ---------------------------------------------------------------------------- " << std::endl;
+    return 0;
+}
\ No newline at end of file

From c0fac66653c82ffd5a2034c51ff0360f813fd337 Mon Sep 17 00:00:00 2001
From: Arseny Aprelev
Date: Wed, 20 Apr 2016 21:49:32 +0300
Subject: [PATCH 5/6] Updates selftests source

Minor changes
+ Replaces libgost15 header include style
---
 src/libgost15/tests/selftests_gost.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/libgost15/tests/selftests_gost.c b/src/libgost15/tests/selftests_gost.c
index 4744445..1472c5e 100644
--- a/src/libgost15/tests/selftests_gost.c
+++ b/src/libgost15/tests/selftests_gost.c
@@ -1,7 +1,7 @@
 #include
 #include
 #include
-#include "libgost15/gost15.h"
+#include <libgost15/gost15.h>
 
 
 int testKeyScheduling(void) {

From 351b2af2f0452574ba4ff2cbf7449c77cc691ba8 Mon Sep 17 00:00:00 2001
From: Arseny Aprelev
Date: Wed, 20 Apr 2016 21:52:05 +0300
Subject: [PATCH 6/6] Updates README with Intel Core i7 benchmarks

Minor changes
+ Adds benchmark data measured on Intel Core i7 @ 1.80 GHz (MacBook Air mid-2011)
+ Fixes title levels
---
 README.md | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 56d3653..c8553af 100644
--- a/README.md
+++ b/README.md
@@ -27,16 +27,24 @@ Performance is measured by a separate tool residing in benchmark subdirectory. I
 
 All functions provided by `libgost15` are thread-safe thus measuring takes place in single thread.
 
-##### Benchmark data (Intel Core i5 Sandy Bridge @ 2.6 GHz, single core)
+#### Benchmark data (Intel Core i5 Sandy Bridge @ 2.60 GHz, single core)
 
 | Operation        | `compact`   | `optimised`   | `SIMD`        |
 |:---------------- |:----------- |:------------- |:------------- |
 | Block encryption | 4.4321 MB/s | 100.8338 MB/s | 158.8720 MB/s |
 | Block decryption | 4.3837 MB/s | 102.0845 MB/s | 157.5190 MB/s |
 
+#### Benchmark data (Intel Core i7-2677M Sandy Bridge @ 1.80 GHz, single core)
+
+| Operation        | `compact`   | `optimised`   | `SIMD`        |
+|:---------------- |:----------- |:------------- |:------------- |
+| Block encryption | 1.2840 MB/s | 62.6575 MB/s  | 112.2875 MB/s |
+| Block decryption | 1.2676 MB/s | 64.4036 MB/s  | 114.6625 MB/s |
+
+
 ### Implementations
 
-##### Compact implementation
+#### Compact implementation
 
 Straightforward implementation of block encryption and decryption routines, with little or no major optimisations. Has lowest memory requirements. Does not require SSE instructions.
 
@@ -48,7 +56,7 @@ Why use this and not [official TC26 implementation](http://tc26.ru/standard/gost
 
 This implementation is built by default and it does not require any special predefined variables.
 
-##### Optimised implementation
+#### Optimised implementation
 
 Optimised implementation employs vector-by-matrix multiplication precomputation technique described in [no link yet], similar to one in 64KB versions of AES. This implementation is much faster than the compact one, but requires 128KB of additional memory in data segment for storing precomputed tables. Does not require SSE instructions.
 
@@ -58,7 +66,7 @@ To use optimised implementation, define `ENABLE_PRECALCULATIONS` environment var
 cmake -DCMAKE_BUILD_TYPE=Release -DENABLE_PRECALCULATIONS=ON ...
 ```
 
-##### SIMD implementation
+#### SIMD implementation
 
 SIMD implementation utilises SSE instruction set, a set of extended processor instructions which enable one to operate over 128-bit XMM registers, thus further speeding up optimised implementation. Requires SSE2 or higher.