
Merlin: HugeCTR V4.3 (Merlin 22.12)

Released by @minseokl on 05 Jan 03:04

What's New in Version 4.3

In January 2023, the HugeCTR team plans to deprecate semantic versioning, such as `v4.3`.
Afterward, the library will use calendar versioning only, such as `v23.01`.
  • Support for BERT and Variants:
    This release includes support for BERT in HugeCTR.
    The documentation includes updates to the MultiHeadAttention layer and adds documentation for the SequenceMask layer.
    For more information, refer to the samples/bst directory of the repository on GitHub.
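    As a rough, hypothetical sketch only: the layer names below come from this release, but the surrounding model definition and every parameter name are assumptions rather than the documented API (the samples/bst example is authoritative), and it assumes model is an existing hugectr.Model.

        # Hypothetical sketch of wiring the attention-related layers in the HugeCTR Python API.
        # Tensor names and parameters such as num_attention_heads are illustrative assumptions.
        import hugectr

        model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.SequenceMask,
                                     bottom_names=["seq_len"], top_names=["seq_mask"]))
        model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.MultiHeadAttention,
                                     bottom_names=["item_emb", "seq_mask"],
                                     top_names=["attention_out"],
                                     num_attention_heads=4))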

  • HPS Plugin for TensorFlow integration with TensorFlow-TensorRT (TF-TRT):
    This release includes plugin support for integration with TensorFlow-TensorRT.
    For sample code, refer to the Deploy SavedModel using HPS with Triton TensorFlow Backend notebook.
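    As a minimal, hypothetical sketch (directory names are placeholders, and importing the hierarchical_parameter_server module to register the HPS custom ops is an assumption; the linked notebook shows the full, authoritative workflow), the conversion itself uses the standard TF-TRT converter:

        # Hypothetical sketch: convert a SavedModel that contains HPS lookup layers with TF-TRT.
        # Directory names are placeholders; refer to the linked notebook for the real workflow.
        import hierarchical_parameter_server as hps  # assumed: registers the HPS custom ops
        from tensorflow.python.compiler.tensorrt import trt_convert as trt

        converter = trt.TrtGraphConverterV2(input_saved_model_dir="hps_model_savedmodel")
        converter.convert()
        converter.save("hps_model_tftrt_savedmodel")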

  • Deep & Cross Network Layer version 2 Support:
    This release includes support for Deep & Cross Network version 2.
    For conceptual information, refer to https://arxiv.org/abs/2008.13535.
    The documentation for the MultiCross Layer is updated.
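    The sketch below is illustrative only: Layer_t.MultiCross is the layer that the updated documentation covers, while num_layers and projection_dim are assumed parameter names for the DCN v2 low-rank cross term. It assumes model is an existing hugectr.Model; verify the exact parameters against the MultiCross Layer documentation.

        # Hypothetical sketch of a DCN v2 style cross network in the HugeCTR Python API.
        # num_layers and projection_dim are assumptions; verify them against the
        # MultiCross Layer documentation.
        model.add(hugectr.DenseLayer(layer_type=hugectr.Layer_t.MultiCross,
                                     bottom_names=["concat_features"],
                                     top_names=["cross_out"],
                                     num_layers=3,
                                     projection_dim=256))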

  • Enhancements to Hierarchical Parameter Server:

    • RedisClusterBackend now supports TLS/SSL communication.
      For sample code, refer to the Hierarchical Parameter Server Demo notebook.
      The notebook is updated with step-by-step instructions that show how to set up HPS to use Redis with (and without) encryption; a configuration sketch also follows this list.
      The Volatile Database Parameters documentation for HPS is updated with the enable_tls, tls_ca_certificate, tls_client_certificate, tls_client_key, and tls_server_name_identification parameters.
    • MultiProcessHashMapBackend includes a fix for a bug that prevented configuring the shared memory size when using a JSON file-based configuration.
    • On-device input keys are now supported, which removes an extra host-to-device copy and improves performance.
    • A dependency on the XX-Hash library is removed.
      The library is no longer used by HugeCTR.
    • Added static table support to the embedding cache.
      The static table is suitable when the embedding table can be placed entirely in GPU memory.
      In this case, the static table is more than three times faster than the embedding cache lookup.
      The static table does not support embedding updates.
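    The sketch below shows roughly where the new TLS settings could sit in an HPS parameter-server configuration. Only the enable_tls and tls_* parameter names come from this release; the surrounding structure, backend type, addresses, and paths are illustrative assumptions, so check the Volatile Database Parameters documentation for the exact schema.

        # Hypothetical sketch of enabling TLS for the RedisClusterBackend in an HPS
        # configuration file. Only the enable_tls/tls_* keys are taken from this release;
        # all other keys and values are illustrative placeholders.
        import json

        hps_config = {
            "volatile_db": {
                "type": "redis_cluster",
                "address": "redis-node-0:7000,redis-node-1:7000,redis-node-2:7000",
                "enable_tls": True,
                "tls_ca_certificate": "/etc/redis/certs/ca.crt",
                "tls_client_certificate": "/etc/redis/certs/client.crt",
                "tls_client_key": "/etc/redis/certs/client.key",
                "tls_server_name_identification": "redis.example.com",
            }
        }

        with open("hps_config.json", "w") as f:
            json.dump(hps_config, f, indent=2)
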
  • Support for New Optimizers:

    • Added support for the SGD, Momentum SGD, Nesterov Momentum, AdaGrad, RMS-Prop, Adam, and FTRL optimizers for the dynamic embedding table (DET).
      For sample code, refer to the test_embedding_table_optimizer.cpp file in the test/utest/embedding_collection/ directory of the repository on GitHub.
    • Added support for the FTRL optimizer for dense networks.
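    As a rough sketch of selecting the new dense-network FTRL optimizer through the Python API: the Optimizer_t.Ftrl enum value shown here is an assumption, and FTRL-specific hyperparameters are omitted because their names should be taken from the optimizer documentation.

        # Hypothetical sketch: choose the FTRL optimizer for a dense network.
        # Optimizer_t.Ftrl is an assumed enum value; FTRL-specific hyperparameters are
        # left at their defaults here.
        import hugectr

        optimizer = hugectr.CreateOptimizer(optimizer_type=hugectr.Optimizer_t.Ftrl,
                                            update_type=hugectr.Update_t.Global)
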
  • Data Reading from S3 for Offline Inference:
    In addition to reading during training, HugeCTR now supports reading data from remote file systems such as HDFS and S3 during offline inference by using the DataSourceParams API.
    The HugeCTR Training and Inference with Remote File System Example is updated to demonstrate the new functionality.
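    As an illustrative sketch only: DataSourceParams is the API named above, but the module path, enum value, field names, and values below are assumptions modeled on how remote file systems are configured for training, so the linked example remains the authoritative reference.

        # Hypothetical sketch: point offline inference at data stored in S3 through
        # DataSourceParams. The enum value and field names are assumptions.
        import hugectr

        data_source_params = hugectr.data.DataSourceParams(
            source=hugectr.DataSourceType_t.S3,  # assumed enum value for the S3 backend
            server="us-east-1",                  # placeholder region/endpoint
            port=9000,                           # placeholder port
        )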

  • Documentation Enhancements:

  • Issues Fixed:

    • The original CUDA device with NUMA binding is now correctly restored after calls to some HugeCTR APIs.
      Previously, this issue sometimes led to problems when calls to HugeCTR were mixed with calls to other CUDA-enabled libraries.
    • Fixed an occasional CUDA kernel launch failure in the embedding that occurred when HugeCTR was built with the DEBUG macro.
    • Fixed an SOK build error that was related to TensorFlow v2.1.0 and higher.
      The issue was that the C++ API and C++ standard were updated to use C++17.
    • Fixed a CUDA 12 related compilation error.
  • Known Issues:

    • HugeCTR can lead to a runtime error if client code calls the RMM rmm::mr::set_current_device_resource() method.
      The error is due to the Parquet data reader in HugeCTR also calling rmm::mr::set_current_device_resource().
      As a result, the device resource that HugeCTR sets becomes visible to other libraries in the same process.
      Refer to GitHub issue #356 for more information.
      As a workaround, if you know that client code other than HugeCTR calls rmm::mr::set_current_device_resource(), you can set the environment variable HCTR_RMM_SETTABLE to 0 to prevent HugeCTR from setting a custom RMM device resource.
      But be cautious because the setting can reduce the performance of Parquet reading.
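      The following minimal sketch illustrates the workaround; the RMM pool setup is shown only as an example of client code that owns the device resource, and setting the variable before HugeCTR is imported is our assumption.

        # Minimal sketch of the workaround: tell HugeCTR not to install its own RMM
        # device resource because this process manages RMM itself.
        import os
        os.environ["HCTR_RMM_SETTABLE"] = "0"  # assumed: set before HugeCTR is imported

        import rmm
        import hugectr  # noqa: F401

        # Example client code that owns the RMM device resource.
        pool = rmm.mr.PoolMemoryResource(rmm.mr.CudaMemoryResource())
        rmm.mr.set_current_device_resource(pool)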

    • HugeCTR uses NCCL to share data between ranks and NCCL can require shared system memory for IPC and pinned (page-locked) system memory resources.
      If you use NCCL inside a container, increase these resources by specifying the following arguments when you start the container:

        --shm-size=1g --ulimit memlock=-1

      See also the NCCL known issue and the GitHub issue #243.

    • KafkaProducers startup succeeds even if the target Kafka broker is unresponsive.
      To avoid data loss in conjunction with streaming-model updates from Kafka, you have to make sure that a sufficient number of Kafka brokers are running, operating properly, and are reachable from the node where you run HugeCTR.

    • The number of data files in the file list should be greater than or equal to the number of data reader workers.
      Otherwise, different workers are mapped to the same file and data loading does not progress as expected.

    • Joint loss training with a regularizer is not supported.

    • Dumping Adam optimizer states to AWS S3 is not supported.