Commit
Add compile-time support for AVX2/512 streaming operations in LQ (#664)
* Add support for compile-time generation of streaming AVX kernels
* Add streaming and tuning docs
* Auto update version
* Trigger CI
* Update overloads
* Auto update version
* Auto update version
* Trigger CI
* Update doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst
  Co-authored-by: Amintor Dusko <[email protected]>
* Update changelog
* Auto update version
* Trigger CI
* Update doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst
  Co-authored-by: Vincent Michaud-Rioux <[email protected]>
* Update doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst
  Co-authored-by: Vincent Michaud-Rioux <[email protected]>
* Auto update version from '0.36.0-dev34' to '0.36.0-dev37'
* Updates from code review
* Auto update version from '0.36.0-dev37' to '0.36.0-dev38'
* Auto update version from '0.36.0-dev38' to '0.36.0-dev39'
* Auto update version from '0.36.0-dev40' to '0.36.0-dev41'
* Update doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst
  Co-authored-by: Ali Asadi <[email protected]>
* Update doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst
  Co-authored-by: Ali Asadi <[email protected]>
---------
Co-authored-by: Dev version update bot <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Amintor Dusko <[email protected]>
Co-authored-by: Vincent Michaud-Rioux <[email protected]>
Co-authored-by: ringo-but-quantum <[email protected]>
Co-authored-by: Ali Asadi <[email protected]>
1 parent 5feb4a1, commit 6260d59
Showing 8 changed files with 120 additions and 6 deletions.
@@ -22,3 +22,4 @@ AVX2/AVX512 kernels

implementation
build_system
kernel_tuning
13 changes: 13 additions & 0 deletions
doc/lightning_qubit/development/avx_kernels/kernel_tuning.rst
@@ -0,0 +1,13 @@
Kernel performance tuning
#########################

Lightning-Qubit's kernel implementations are tuned by default for high-throughput, single-threaded performance on gradient workloads. To achieve this, we add OpenMP threading within the adjoint differentiation method implementation and use SIMD intrinsics to keep each individual circuit in such a workload fast.

However, we may sometimes want to change these defaults to favour a given workload, for example by enabling multi-threaded execution of the gate kernels instead. For this, several compile-time flags change the operating behaviour of the Lightning-Qubit kernels.

OpenMP threaded kernels
-----------------------

To enable OpenMP acceleration of the gate kernels, Lightning-Qubit can be compiled with the ``-DLQ_ENABLE_KERNEL_OMP=ON`` CMake flag. Note that for gradient workloads with many observables, this may reduce performance compared with the default mode, so this behaviour is opt-in only.
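As a rough illustration of what an opt-in threaded gate kernel can look like, the sketch below applies a Pauli-X gate to a state vector and parallelises the loop over amplitude pairs only when a compile-time macro is set. This is not Lightning-Qubit's actual kernel: the function ``apply_pauli_x``, the macro ``SKETCH_KERNEL_OMP`` and the qubit ordering (qubit 0 as the most significant bit) are assumptions made for the example.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using Complex = std::complex<double>;

// Illustrative Pauli-X kernel; not Lightning-Qubit's actual implementation.
// SKETCH_KERNEL_OMP is a hypothetical macro standing in for whatever the
// build defines when -DLQ_ENABLE_KERNEL_OMP=ON is passed to CMake.
void apply_pauli_x(std::vector<Complex> &state, std::size_t num_qubits,
                   std::size_t target) {
    assert(target < num_qubits);
    assert(state.size() == (std::size_t{1} << num_qubits));

    // Distance between the two amplitudes that differ only in the target bit
    // (assumes qubit 0 is the most significant bit of the index).
    const std::size_t stride = std::size_t{1} << (num_qubits - 1 - target);
    const std::ptrdiff_t pairs = static_cast<std::ptrdiff_t>(state.size() / 2);

    // Each iteration touches a distinct pair of amplitudes, so the loop can
    // be split across threads without synchronisation.
#if defined(SKETCH_KERNEL_OMP)
#pragma omp parallel for
#endif
    for (std::ptrdiff_t k = 0; k < pairs; ++k) {
        const std::size_t uk = static_cast<std::size_t>(k);
        // Insert a zero at the target bit position to build the first index.
        const std::size_t low = uk & (stride - 1);
        const std::size_t high = (uk & ~(stride - 1)) << 1;
        const std::size_t i0 = high | low;
        const std::size_t i1 = i0 | stride;
        std::swap(state[i0], state[i1]);
    }
}
```

Compiled without the macro, the loop runs serially; compiling with the macro defined and OpenMP enabled (for example ``-fopenmp`` on GCC or Clang) distributes the pairs across threads. When an outer loop over observables is already threaded, nesting a second level of threading like this can oversubscribe the cores, which is one plausible reason for the slowdown noted above.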

For workloads that benefit from threaded gate kernels, updating the CPU cache to accommodate recently modified data can become a bottleneck and saturate the performance gained at high thread counts. This may be alleviated somewhat on systems supporting AVX2 and AVX-512 operations by using the ``-DLQ_ENABLE_KERNEL_AVX_STREAMING=on`` CMake flag. With this flag, written data bypasses the CPU cache, which can improve performance for larger workloads.
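One common way to write data without filling the cache is a non-temporal (streaming) store. The sketch below scales an array using the AVX intrinsic ``_mm256_stream_pd`` when AVX is available at compile time, and falls back to ordinary stores otherwise. Whether Lightning-Qubit uses exactly these intrinsics is an assumption; the function ``scale_streaming`` and its preconditions are hypothetical and serve only to illustrate the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Illustrative only: writes factor * src[i] into dst[i] for n doubles.
// Preconditions (assumed for this sketch): n is a multiple of 4 and dst is
// 32-byte aligned, as required by the streaming store.
void scale_streaming(const double *src, double *dst, std::size_t n,
                     double factor) {
    assert(n % 4 == 0);
    assert(reinterpret_cast<std::uintptr_t>(dst) % 32 == 0);
#if defined(__AVX__)
    const __m256d f = _mm256_set1_pd(factor);
    for (std::size_t i = 0; i < n; i += 4) {
        const __m256d v = _mm256_mul_pd(_mm256_loadu_pd(src + i), f);
        // Non-temporal store: hints that the written data should not be
        // kept in the cache hierarchy.
        _mm256_stream_pd(dst + i, v);
    }
    // Make the streamed writes globally visible before returning.
    _mm_sfence();
#else
    // Portable fallback with ordinary (cached) stores.
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
#endif
}
```

Streaming stores tend to pay off only when the output is too large to be reused from cache soon after it is written, which is consistent with the benefit being described for larger workloads.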
@@ -16,4 +16,4 @@
 Version number (major.minor.patch[-label])
 """

-__version__ = "0.36.0-dev40"
+__version__ = "0.36.0-dev41"