diff --git a/docs/source/api/python_interface.md b/docs/source/api/python_interface.md
index ab4e0e141f..3476f086f7 100755
--- a/docs/source/api/python_interface.md
+++ b/docs/source/api/python_interface.md
@@ -643,6 +643,7 @@ It trains the model for a fixed number of epochs (epoch mode) or iterations (non
 * `snapshot`: Integer, the interval of iterations at which the snapshot model weights and optimizer states will be saved to files. This argument is invalid when embedding training cache is being used, which means no model parameters will be saved. The default value is 10000.
 * `snapshot_prefix`: String, the prefix of the file names for the saved model weights and optimizer states. This argument is invalid when embedding training cache is being used, which means no model parameters will be saved. The default value is `''`. Remote file systems(HDFS and S3) are also supported. For example, for HDFS, the prefix can be `hdfs://localhost:9000/dir/to/model`. For S3, the prefix should be either virtual-hosted-style or path-style and contains the region information. For examples, take a look at the AWS official [documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html).
+**Please note that dumping models to a remote file system is not yet supported when MPI is enabled.**

 ***

@@ -1090,7 +1091,7 @@ The stored sparse model can be used for both the later training and inference ca
 Note that the key, slot id, and embedding vector are stored in the sparse model in the same sequence, so both the nth slot id in `slot_id` file and the nth embedding vector in the `emb_vector` file are mapped to the nth key in the `key` file.

 **Arguments**
-* `prefix`: String, the prefix of the saved files for model weights and optimizer states. There is NO default value and it should be specified by users. Remote file systems(HDFS and S3) are also supported. For example, for HDFS, the prefix can be `hdfs://localhost:9000/dir/to/model`. For S3, the prefix should be either virtual-hosted-style or path-style and contains the region information. For examples, take a look at the AWS official [documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html).
+* `prefix`: String, the prefix of the saved files for model weights and optimizer states. There is NO default value and it should be specified by users. Remote file systems(HDFS and S3) are also supported. For example, for HDFS, the prefix can be `hdfs://localhost:9000/dir/to/model`. For S3, the prefix should be either virtual-hosted-style or path-style and contains the region information. For examples, take a look at the AWS official [documentation](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-bucket-intro.html). **Please note that dumping models to a remote file system is not yet supported when MPI is enabled.**

 * `iter`: Integer, the current number of iterations, which will be the suffix of the saved files for model weights and optimizer states. The default value is 0.
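As a quick illustration of the `snapshot` and `snapshot_prefix` arguments annotated above, the fragment below shows how they might be passed to `Model.fit()` to write periodic snapshots to HDFS. This is a hedged sketch, not part of the patch: the `model` object and the numeric values are placeholders, and only `snapshot` and `snapshot_prefix` mirror the documented behavior.

```python
# Sketch only: assumes `model` is an already constructed and compiled
# hugectr.Model instance; all numeric values are placeholders.
model.fit(
    max_iter=50000,        # placeholder training length
    display=1000,
    eval_interval=5000,
    snapshot=10000,        # dump weights/optimizer states every 10000 iterations (the default)
    snapshot_prefix="hdfs://localhost:9000/dir/to/model",  # HDFS prefix, as described above
)
```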
diff --git a/docs/source/sparse_operation_kit.md b/docs/source/sparse_operation_kit.md
new file mode 100644
index 0000000000..7bae89e76f
--- /dev/null
+++ b/docs/source/sparse_operation_kit.md
@@ -0,0 +1,9 @@
+# Sparse Operation Kit
+
+[Sparse Operation Kit (SOK)](https://github.com/NVIDIA-Merlin/HugeCTR/tree/master/sparse_operation_kit) is a Python package that wraps GPU-accelerated operations dedicated to sparse training and inference cases. It is designed to be compatible with common deep learning (DL) frameworks like TensorFlow.
+In sparse training and inference scenarios, such as CTR estimation, there are vast numbers of parameters that cannot fit into the memory of a single GPU. Many common DL frameworks offer only limited support for model parallelism (MP), which makes it difficult to use all the GPUs available in a cluster to accelerate the whole training process.
+SOK provides broad MP functionality to fully utilize all available GPUs, regardless of whether these GPUs are located in a single machine or in multiple machines. At the same time, SOK takes advantage of the existing data-parallel (DP) capabilities of DL frameworks to accelerate training while minimizing code changes. With SOK embedding layers, you can build a DNN model with mixed MP and DP: MP is used to shard large embedding parameter tables so that they are distributed among the available GPUs to balance the workload, while DP is used for layers that consume only a small amount of GPU resources.
+
+Please check the [SOK Documentation](https://nvidia-merlin.github.io/HugeCTR/sparse_operation_kit/master/index.html) for details.
+
+
\ No newline at end of file
diff --git a/docs/source/toc.yaml b/docs/source/toc.yaml
index 9b8f31fea2..faf594c9a1 100755
--- a/docs/source/toc.yaml
+++ b/docs/source/toc.yaml
@@ -28,6 +28,8 @@ subtrees:
 - file: hierarchical_parameter_server/notebooks/hps_tensorflow_triton_deployment_demo.ipynb
 - file: hierarchical_parameter_server/api/index.rst
   title: API Documentation
+- file: sparse_operation_kit.md
+  title: Sparse Operation Kit
 - file: performance.md
   title: Performance
 - file: notebooks/index.md
diff --git a/docs/source/user_guide_src/workflow_of_embeddinglayer.png b/docs/source/user_guide_src/workflow_of_embeddinglayer.png
new file mode 100644
index 0000000000..b8915cb0b8
Binary files /dev/null and b/docs/source/user_guide_src/workflow_of_embeddinglayer.png differ
diff --git a/release_notes.md b/release_notes.md
index 1c6d6cb183..db9287e972 100755
--- a/release_notes.md
+++ b/release_notes.md
@@ -70,6 +70,7 @@ By using the interface, the input DLPack capsule of embedding key can be a GPU t
   Otherwise, different workers are mapped to the same file and data loading does not progress as expected.
 + Joint loss training with a regularizer is not supported.
 + Dumping Adam optimizer states to AWS S3 is not supported.
++ Dumping to remote file systems is not supported when MPI is enabled.

 ## What's New in Version 4.0
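To make the mixed MP/DP idea in the new `sparse_operation_kit.md` page more concrete, here is a rough TensorFlow sketch. It is not taken from the patch or the SOK documentation: the import name, `sok.Init`, and the `All2AllDenseEmbedding` layer with the parameters shown are assumptions that should be verified against the linked SOK API reference.

```python
import tensorflow as tf
import sparse_operation_kit as sok  # assumed import name; verify against the SOK docs

# Hypothetical sketch of mixed parallelism: the embedding table is sharded
# across GPUs (model parallel), while the dense layers run data parallel.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    sok.Init(global_batch_size=8192)              # assumed SOK initializer

    mp_embedding = sok.All2AllDenseEmbedding(     # assumed model-parallel embedding layer
        max_vocabulary_size_per_gpu=1024 * 1024,
        embedding_vec_size=16,
        slot_num=10,
        nnz_per_slot=1,
    )
    dp_mlp = tf.keras.Sequential([                # ordinary data-parallel layers
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

def forward(keys, training=True):
    # keys: int64 tensor shaped [batch_size, slot_num, nnz_per_slot]
    vectors = mp_embedding(keys, training=training)   # MP embedding lookup
    flat = tf.reshape(vectors, [tf.shape(keys)[0], -1])
    return dp_mlp(flat, training=training)            # DP dense layers
```

Gradient handling for the sharded embedding variables differs from plain Keras training and is intentionally omitted here; the SOK documentation linked above covers the supported patterns.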