Merlin: HugeCTR V3.8 (Merlin 22.07)
What's New in Version 3.8
- Sample Notebook to Demonstrate 3G Embedding:
  This release includes a sample notebook that introduces the Python API of the embedding collection and the key concepts for using 3G embedding.
  You can view HugeCTR Embedding Collection in the documentation or access the `embedding_collection.ipynb` file from the `notebooks` directory of the repository.
- DLPack Python API for Hierarchical Parameter Server Lookup:
  This release introduces support for embedding lookup from the Hierarchical Parameter Server (HPS) using the DLPack Python API.
  The new method is `lookup_fromdlpack()`.
  For sample usage, see the "Lookup the Embedding Vector from DLPack" heading in the "Hierarchical Parameter Server Demo" notebook, and the first sketch after the Known Issues list below.
- Read Parquet Datasets from HDFS with the Python API:
  This release enhances the `DataReaderParams` class with a `data_source_params` argument.
  You can use the argument to specify the data source configuration, such as the host name of the Hadoop NameNode and the NameNode port number, to read from HDFS.
  A configuration sketch follows the Known Issues list below.
- Logging Performance Improvements:
  This release reduces the performance impact of logging.
- Enhancements to Layer Classes:
  - The `FullyConnected` layer now supports 3D inputs (see the sketch after the Known Issues list below).
  - The `MatrixMultiply` layer now supports 4D inputs.
- Documentation Enhancements:
  - An automatically generated table of contents is added to the top of most pages in the web documentation. The goal is to provide a better experience for navigating long pages such as the HugeCTR Layer Classes and Methods page.
  - URLs to the Criteo 1TB click logs dataset are updated. For an example, see the HugeCTR Wide and Deep Model with Criteo notebook.
- Issues Fixed:
  - The data generator for the Parquet file type is fixed and produces consistent file names between the `_metadata.json` file and the actual dataset files. Previously, running the data generator to create synthetic data resulted in a core dump. This issue was first reported in GitHub issue 321.
  - Fixed a memory crash that occurred during AUC warmup when running a large model on multiple GPUs.
  - Fixed the issue of keyset generation in the ETC notebook. Refer to GitHub issue 332 for more details.
  - Fixed the inference build error that occurred when building in debug mode.
  - Fixed the issue where multi-node training printed duplicate messages.
- Known Issues:
  - Hybrid embedding with `IB_NVLINK` as the `communication_type` of the `HybridEmbeddingParam` class does not work currently. We are working on fixing it. The other communication types are not affected.
  - HugeCTR uses NCCL to share data between ranks, and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources. If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container: `--shm-size=1g --ulimit memlock=-1`. See also the NCCL known issue and the GitHub issue.
  - `KafkaProducers` startup succeeds even if the target Kafka broker is unresponsive. To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and reachable from the node where you run HugeCTR.
  - The number of data files in the file list should be greater than or equal to the number of data reader workers. Otherwise, different workers are mapped to the same file and data loading does not progress as expected.
  - Joint loss training with a regularizer is not supported.
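The following is a minimal sketch of the new HPS DLPack lookup. It assumes an already-initialized HPS handle named `hps` (created as shown in the "Hierarchical Parameter Server Demo" notebook); the argument order for `lookup_fromdlpack()` (keys capsule, preallocated output capsule, model name, table ID) and the model name, table ID, and embedding width are assumptions to be verified against that notebook.

```python
import torch
from torch.utils.dlpack import to_dlpack

# Assumption: `hps` is an initialized Hierarchical Parameter Server handle,
# created as shown in the "Hierarchical Parameter Server Demo" notebook,
# with a deployed model named "my_model" (hypothetical) whose embedding
# table 0 holds 16-dimensional vectors (hypothetical).

keys = torch.tensor([1, 3, 5, 7], dtype=torch.int64)       # keys to look up
out = torch.zeros(keys.shape[0], 16, dtype=torch.float32)  # preallocated output

# Wrap both tensors as DLPack capsules; HPS reads the keys and writes the
# embedding vectors into the memory backing the output capsule.
hps.lookup_fromdlpack(to_dlpack(keys), to_dlpack(out), "my_model", 0)

print(out)  # `out` shares memory with the capsule, so it now holds the vectors
```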
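Next, a configuration sketch for the new `data_source_params` argument of `DataReaderParams`. The field and enum names on `DataSourceParams` are assumptions modeled on the HDFS training example, and the host, port, paths, and slot sizes are hypothetical placeholders.

```python
import hugectr

# Assumption: DataSourceParams carries the HDFS NameNode host and port;
# verify the exact field and enum names against the API documentation.
data_source_params = hugectr.DataSourceParams(
    source=hugectr.DataSourceType_t.HDFS,
    server="name-node.example.com",  # Hadoop NameNode host (hypothetical)
    port=9000,                       # NameNode port (hypothetical)
)

reader = hugectr.DataReaderParams(
    data_reader_type=hugectr.DataReaderType_t.Parquet,
    source=["/user/hugectr/train/_file_list.txt"],  # hypothetical HDFS paths
    eval_source="/user/hugectr/val/_file_list.txt",
    check_type=hugectr.Check_t.Non,
    slot_size_array=[10000] * 26,  # hypothetical per-slot vocabulary sizes
    num_workers=4,                 # keep <= the number of data files (see Known Issues)
    data_source_params=data_source_params,
)
```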
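Finally, a sketch of a `FullyConnected` layer applied to a 3D input, which this release enables. It assumes an existing `hugectr.Model` named `model` and a hypothetical 3D tensor `"emb_seq"` (batch, sequence length, width) already in the graph; in the Python API, the `FullyConnected` layer is expressed as `hugectr.Layer_t.InnerProduct`.

```python
import hugectr

# Assumption: `model` is an existing hugectr.Model and "emb_seq" is a 3D
# tensor (batch, seq_len, width) produced by an earlier layer (hypothetical
# name). FullyConnected corresponds to Layer_t.InnerProduct in the Python API.
model.add(
    hugectr.DenseLayer(
        layer_type=hugectr.Layer_t.InnerProduct,
        bottom_names=["emb_seq"],  # 3D input, supported as of this release
        top_names=["fc1"],
        num_output=256,            # hypothetical output width
    )
)
```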