Skip to content

Commit

Permalink
Refactor the remaining topics and copy editing (#3429) (#3466)
Browse files Browse the repository at this point in the history
(cherry picked from commit 781f8ff)

Co-authored-by: BrianHarrisonAMD <[email protected]>
  • Loading branch information
amd-jnovotny and BrianHarrisonAMD authored Jan 10, 2025
1 parent 5e791ce commit 4b72d4d
Show file tree
Hide file tree
Showing 8 changed files with 286 additions and 284 deletions.
18 changes: 9 additions & 9 deletions docs/conceptual/MI200-alt-implementation.rst
Original file line number Diff line number Diff line change
Expand Up @@ -6,25 +6,25 @@
MI200 matrix fused multiply-add (MFMA) behavior specifics
***********************************************************************************

The MI200 ``MFMA_F16``, ``MFMA_BF16``, and ``MFMA_BF16_1K`` flush subnormal input/output data to
On the MI200, ``MFMA_F16``, ``MFMA_BF16``, and ``MFMA_BF16_1K`` flush subnormal input/output data to
zero. This behavior might affect the convolution operation in certain workloads due to the limited
exponent range of the half-precision floating-point datatypes.

MIOpen offers an alternate implementation for the half-precision datatype via conversion instructions
to utilize the BFloat16 datatype's larger exponent range, albeit with reduced accuracy. The following
salients apply to this alternate implementation:
to utilize the larger exponent range of the ``BFloat16`` data type, albeit with reduced accuracy. The following
caveats apply to this alternate implementation:

* It's disabled by default in the forward convolution operations.

* It's enabled by default in the backward data and backward weights convolution operations.

* You can override the default MIOpen behaviors by using the ``miopenSetConvolutionAttribute`` API
call: Pass the convolution descriptor for the appropriate convolution operation, and the
``MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL`` convolution attribute (with a non-zero value), to
engage the alternate implementation.
* You can override the default MIOpen behavior by using the ``miopenSetConvolutionAttribute`` API
call. To use the alternate implementation, pass the convolution descriptor for the appropriate
convolution operation and the ``MIOPEN_CONVOLUTION_ATTRIB_FP16_ALT_IMPL`` convolution
attribute (with a non-zero value).

* You can also override the behavior using the
``MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL`` environment variable. When set to ``1``,
``MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL`` engages the alternate implementation;
when set to ``0``, it's disabled. Keep in mind that the environment variable impacts the convolution
``MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL`` engages the alternate implementation.
When set to ``0``, it's disabled. This environment variable impacts the convolution
operation in all directions.
47 changes: 25 additions & 22 deletions docs/conceptual/cache.rst
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
.. meta::
:description: Using kernel cache
:description: Using the MIOpen kernel cache
:keywords: MIOpen, ROCm, API, documentation, kernel cache

********************************************************************
Expand All @@ -13,50 +13,53 @@ build time by setting the ``MIOPEN_CACHE_DIR`` CMake variable.
Clear the cache
====================================================

You can clear the cache by deleting the cache directory (e.g., ``$HOME/.cache/miopen``). We
recommend that you only do this for development purposes or to free disk space. You don't need to
You can clear the cache by deleting the cache directory (for example, ``$HOME/.cache/miopen``). However,
you should only do this for development purposes or to free disk space. You don't need to
clear the cache when upgrading MIOpen.

Disabling the cache
====================================================

Disabling the cache is generally useful for development purposes. You can disable the cache:
Disabling the cache is generally useful for development purposes. You can disable the cache
in this following situations:

* During **build**, either by setting ``MIOPEN_CACHE_DIR`` to an empty string or setting
``BUILD_DEV=ON`` when configuring CMake
* At **runtime** by setting the ``MIOPEN_DISABLE_CACHE`` environment variable to ``true``.
* During the build, either set ``MIOPEN_CACHE_DIR`` to an empty string or set
``BUILD_DEV=ON`` when configuring CMake.
* At runtime, set the ``MIOPEN_DISABLE_CACHE`` environment variable to ``true``.

Updating MIOpen and removing the cache
===============================================================

For MIOpen version 2.3 and earlier, if the compiler changes or you modify the kernels. then you must
delete the cache for the existing MIOpen version
(e.g., ``rm -rf $HOME/.cache/miopen/<miopen-version-number>``).

For MIOpen version 2.4 and later, MIOpen's kernel cache directory is versioned, so cached kernels
For MIOpen version 2.4 and later, MIOpen's kernel cache directory is versioned, so any existing cached kernels
won't collide when upgrading.

Installing pre-compiled kernels
.. note::

For MIOpen version 2.3 and earlier, if the compiler changes or you modify the kernels, then you must
delete the cache for the existing MIOpen version using the command
``rm -rf $HOME/.cache/miopen/<miopen-version-number>``.

Installing precompiled kernels
====================================================

GPU architecture-specific, pre-compiled kernel packages are available in the ROCm package
repositories. These reduce the startup latency of MIOpen kernels (they contain the kernel cache file
and install it in the ROCm installation directory along with other MIOpen artifacts). When launching a
kernel, MIOpen first checks for a kernel in the kernel cache within the MIOpen installation directory. If
the file doesn't exist, or the required kernel isn't found, the kernel is compiled and placed in your
GPU architecture-specific, precompiled kernel packages are available in the ROCm package
repositories. These packages reduce the startup latency of MIOpen kernels. They contain a kernel cache file,
which they install in the ROCm installation directory along with other MIOpen artifacts. When MIOpen launches a
kernel, it first checks for a kernel in the kernel cache within the MIOpen installation directory. If
the file doesn't exist, or the required kernel isn't found, it compiles the kernel and places it in the
kernel cache.

These packages are optional and must be separately installed from MIOpen. If you want to conserve
disk space, you may choose not to install these packages (though without them, you'll have higher
These packages are optional and must be separately installed from MIOpen. To conserve
disk space, you can choose not to install these packages (which would result in higher
startup latency). You also have the option to only install kernel packages for your device architecture,
which helps save disk space.

If the MIOpen kernels package is not installed, or if the kernel doens't match the GPU, you'll get a
If the MIOpen kernels package is not installed, or if the kernel doesn't match the GPU, you'll get a
warning message similar to:

.. code:: bash
> MIOpen(HIP): Warning [SQLiteBase] Missing system database file:gfx906_60.kdb Performance may degrade
> MIOpen(HIP): Warning [SQLiteBase] Missing system database file:gfx906_60.kdb Performance may degrade
The performance degradation mentioned in the warning only affects the network start-up time (the
"initial iteration time") and can be safely ignored.
Expand Down
63 changes: 30 additions & 33 deletions docs/conceptual/finddb.rst
Original file line number Diff line number Diff line change
@@ -1,75 +1,72 @@
.. meta::
:description: Using the Find Database
:description: Using the MIOpen Find Database
:keywords: MIOpen, ROCm, API, documentation

********************************************************************
Using the find database
********************************************************************

Prior to MIOpen 2.0, you could use calls (such as ``miopenFindConvolution*Algorithm()``) to gather a
set of convolution algorithms in the form of an array of ``miopenConvSolution_t`` structs. This process
is time-consuming because it requires online benchmarking of competing algorithms.
MIOpen 2.0 introduced :doc:`immediate mode <../how-to/find-and-immediate>`, which
is based on a find database called FindDb. This database contains the results of calls to the legacy ``Find()`` stage.

As of MIOpen 2.0, we introduced an :doc:`immediate mode <../how-to/find-and-immediate>`, which
is based on a database that contains the results of calls to the legacy ``Find()`` stage. We refer to the
find database as FindDb.
.. note::

Prior to MIOpen 2.0, you could use calls (such as ``miopenFindConvolution*Algorithm()``) to gather a
set of convolution algorithms in the form of an array of ``miopenConvSolution_t`` structs. This process
is time consuming because it requires online benchmarking of competing algorithms.

FindDb consists of two parts:

* **System FindDb**: A system-wide storage that holds pre-run values for the most applicable
configurations.
* **User FindDb**: A per-user storage that holds results for arbitrary user-run configurations. It also
serves as a cache for the ``Find()`` stage.
* **System FindDb**: A system-wide storage that holds pre-run values for the most applicable
configurations.
* **User FindDb**: A per-user storage that holds results for arbitrary user-run configurations. It also
serves as a cache for the ``Find()`` stage.

User FindDb `always takes precedence` over System FindDb.
The User FindDb *always takes precedence* over the System FindDb.

By default, System FindDb resides within MIOpen's install location, while User FindDb resides in your
By default, System FindDb resides within the MIOpen install location, while User FindDb resides in your
home directory.

Note that:
.. note::

* The System FindDb is `not` modified upon installation of MIOpen.
* There are separate Find databases for HIP and OpenCL backends.
* The System FindDb is *not* modified upon installation of MIOpen.
* There are separate Find databases for the HIP and OpenCL backends.

Populating User FindDb
=============================================================

MIOpen collects FindDb information during the following API calls:

* ``miopenFindConvolutionForwardAlgorithm()``
* ``miopenFindConvolutionBackwardDataAlgorithm()``
* ``miopenFindConvolutionBackwardWeightsAlgorithm()``

During the call, find data entries are collected for one `problem configuration`, which is implicitly
defined by the tensor descriptors and convolution descriptor passed to API function.
* ``miopenFindConvolutionForwardAlgorithm()``
* ``miopenFindConvolutionBackwardDataAlgorithm()``
* ``miopenFindConvolutionBackwardWeightsAlgorithm()``

During the call, find data entries are collected for one specific "problem configuration", which is implicitly
defined by the tensor descriptors and convolution descriptor passed to the API function.

Updating MIOpen and User FindDb
=============================================================

When you install a new version of MIOpen, this new version ignores old User FindDb files. Therefore,
When you install a new version of MIOpen, the new version ignores old User FindDb files. Therefore,
you don't need to move or delete the old User FindDb files.

If you want to re-collect the information into the new User FindDb, you can use the same steps you
To collect the previous information again into the new User FindDb, use the same steps you
followed in the previous version. Re-collecting information keeps immediate mode optimized.


Disabling FindDb
=============================================================

You can disable FindDb by setting the ``MIOPEN_DEBUG_DISABLE_FIND_DB`` environmental variable
to 1:
To disable FindDb, set the ``MIOPEN_DEBUG_DISABLE_FIND_DB`` environmental variable to ``1``:

.. code:: bash
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
export MIOPEN_DEBUG_DISABLE_FIND_DB=1
.. note::

System FindDb can be cached into memory and may dramatically increase performance. To disable
this option, set the ``DMIOPEN_DEBUG_FIND_DB_CACHING`` CMake configuration flag to off.
System FindDb can be cached into memory, which might dramatically increase performance. To disable
this option, set the ``DMIOPEN_DEBUG_FIND_DB_CACHING`` CMake configuration flag to off.

.. code:: bash
.. code:: bash
-DMIOPEN_DEBUG_FIND_DB_CACHING=Off
-DMIOPEN_DEBUG_FIND_DB_CACHING=Off
61 changes: 32 additions & 29 deletions docs/conceptual/perfdb.rst
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
.. meta::
:description: Using the performance database
:description: Using the MIOpen performance database
:keywords: MIOpen, ROCm, API, documentation, performance database

************************************************************************************************
Using the performance database
************************************************************************************************

Many MIOpen kernels have parameters that affect their performance. Setting these parameters to
optimal values allows for the best possible throughput. Optimal values depend on many factors,
including network configuration, GPU type, clock frequencies, and ROCm version.
optimal values allows for the best possible throughput. The optimal values depend on many factors,
including the network configuration, GPU type, clock frequencies, and ROCm version.

Due to the large number of possible configurations and settings, MIOpen provides a set of pre-tuned
values for the `most applicable` network configurations and a means for expanding the set of
values for the "most applicable" network configurations and a method for expanding the set of
optimized values. MIOpen's performance database (PerfDb) contains these pre-tuned parameter values
in addition to any user-optimized parameters.

Expand All @@ -21,17 +21,17 @@ The PerfDb consists of two parts:
configurations.
* **User PerfDb**: A per-user storage that holds optimized values for arbitrary configurations.

User PerfDb `always takes precedence` over System PerfDb.
The User PerfDb *always takes precedence* over System PerfDb.

MIOpen also has auto-tuning functionality, which is able to find optimized kernel parameter values for
a specific configuration. The auto-tune process may take a long time, but once optimized values are
a specific configuration. The auto-tune process might take a long time, but after the optimized values are
found, they're stored in the User PerfDb. MIOpen then automatically reads and uses these parameter
values.

By default, System PerfDb resides within MIOpen's install location, while User PerfDb resides in your
home directory. See :ref:`setting up locations <setting-up-locations>` for more information.
By default, System PerfDb resides within the MIOpen install location, while User PerfDb resides in your
home directory. For more information, see :ref:`setting up locations <setting-up-locations>`.

System PerfDb is not modified during MIOpen installation.
The System PerfDb is not modified during the MIOpen installation.

Auto-tuning kernels
==========================================================
Expand All @@ -42,17 +42,17 @@ MIOpen performs auto-tuning during the these API calls:
* ``miopenFindConvolutionBackwardDataAlgorithm()``
* ``miopenFindConvolutionBackwardWeightsAlgorithm()``

Auto-tuning is performed for only one `problem configuration`, which is implicitly defined by the
Auto-tuning is performed for only one "problem configuration", which is implicitly defined by the
tensor descriptors that are passed to the API function.

In order for auto-tuning to begin, the following conditions must be met:

* The applicable kernels have tuning parameters
* The value of the ``exhaustiveSearch`` parameter is ``true``
* Neither System nor User PerfDb can contain values for the relevant `problem configuration`.
* The applicable kernels must have tuning parameters
* The value of the ``exhaustiveSearch`` parameter is set to ``true``
* Neither the System nor User PerfDb can contain values for the relevant "problem configuration".

You can override the latter two conditions by enforcing the search using the
``- MIOPEN_FIND_ENFORCE`` environment variable. You can also use this variable to remove values
You can override the latter two conditions and force the search using the
``-MIOPEN_FIND_ENFORCE`` environment variable. You can also use this variable to remove values
from User PerfDb, as described in the following section.

To optimize performance, MIOpen provides several find modes to accelerate find API calls.
Expand All @@ -63,33 +63,36 @@ These modes include:
* hybrid find
* dynamic hybrid find

For more information about MIOpen find modes, see :ref:`Find modes <find_modes>`.
For more information about the MIOpen find modes, see :ref:`Find modes <find_modes>`.

Using MIOPEN_FIND_ENFORCE
----------------------------------------------------------------------------------------------------------

``MIOPEN_FIND_ENFORCE`` supports symbolic (case-insensitive) and numeric values. Possible values
``MIOPEN_FIND_ENFORCE`` supports case-insensitive symbolic and numeric values. The possible values
are:

* ``NONE``/``(1)``: No change in the default behavior.
* ``DB_UPDATE/``(2)``: Do not skip auto-tune (even if PerfDb already contains optimized values). If you
* ``DB_UPDATE``/``(2)``: Do not skip auto-tune (even if PerfDb already contains optimized values). If you
request auto-tune via API, MIOpen performs it and updates PerfDb. You can use this mode for
fine-tuning the MIOpen installation on your system. However, this mode slows down processes.
fine-tuning the MIOpen installation on your system. However, this mode slows down the processes.
* ``SEARCH``/``(3)``: Perform auto-tune even if not requested via API. In this case, the library behaves as
if the ``exhaustiveSearch`` parameter set to ``true``. If PerfDb already contains optimized values,
auto-tune is not performed. You can use this mode to tune applications that don't anticipate means
for getting the best performance from MIOpen. When in this mode, your application's first run may
if the ``exhaustiveSearch`` parameter is set to ``true``. If PerfDb already contains optimized values,
auto-tune is not performed. You can use this mode to tune applications that don't anticipate any means
of getting the best performance from MIOpen. When in this mode, your application's first run might
take substantially longer than expected.
* ``SEARCH_DB_UPDATE``/``(4)``: A combination of ``DB_UPDATE`` and ``SEARCH``. MIOpen performs
auto-tune (and updates User PerfDb) on each ``miopenFindConvolution*()`` call. This mode is
recommended only for debugging purposes.
* ``DB_CLEAN``/``(5)``: Removes optimized values related to the `problem configuration` from User
PerfDb. Auto-tune is blocked, even if explicitly requested. System PerfDb is left intact. **Use this
option with care.**
auto-tune and updates User PerfDb on each ``miopenFindConvolution*()`` call. This mode is
only recommended for debugging purposes.
* ``DB_CLEAN``/``(5)``: Removes optimized values related to the "problem configuration" from User
PerfDb. Auto-tune is blocked, even if explicitly requested. System PerfDb is left intact.

.. caution::

Use the ``DB_CLEAN`` option with care.

Updating MIOpen and User PerfDb
==========================================================

If you install a new version of MIOpen, we strongly recommend moving or deleting your old User
If you install a new version of MIOpen, it is recommended that you move or delete your old User
PerfDb file. This prevents older database entries from affecting configurations within the newer system
database. The User PerfDb is named ``miopen.udb`` and is located at the User PerfDb path.
database. The User PerfDb is named ``miopen.udb`` and can be found at the User PerfDb path location.
Loading

0 comments on commit 4b72d4d

Please sign in to comment.