NVIDIA MERLIN HUGECTR

NVIDIA Merlin™ accelerates the entire pipeline, from ingesting and training to deploying GPU-accelerated recommender systems. Merlin HugeCTR (Huge Click-Through-Rate) is a deep neural network (DNN) training framework designed for recommender systems. It provides distributed training with model-parallel embedding tables, an embeddings cache, and data-parallel neural networks across multiple GPUs and nodes for maximum performance. HugeCTR covers common and recent architectures such as Deep Learning Recommendation Model (DLRM), Wide and Deep, Deep Cross Network (DCN), and DeepFM.



NVIDIA Merlin HugeCTR - Deep Neural Network Training Framework

Download and Try It Today


Merlin HugeCTR Core Features

Training Embeddings at Scale

Data scientists and machine learning engineers building deep learning recommenders work with large embedding tables that often exceed available memory. Merlin HugeCTR's model parallelism and embedding cache are designed for recommender workflows. Embedding tables are distributed across multiple GPUs and nodes, so a table too large for a single GPU can still be trained while making full use of the available GPU memory. HugeCTR also uses the NVIDIA Collective Communication Library (NCCL) for high-speed multi-GPU and multi-node communication at scale.
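
To make the idea of a model-parallel embedding table concrete, here is a minimal sketch in plain Python. It is not HugeCTR's implementation: the class name ShardedEmbedding, the modulo placement rule, and the lazy row initialization are all assumptions chosen for illustration, and the "devices" are simulated with in-process dictionaries rather than GPUs.

```python
import random


class ShardedEmbedding:
    """Toy model-parallel embedding table: rows are partitioned across devices."""

    def __init__(self, num_devices, dim, seed=0):
        self.num_devices = num_devices
        self.dim = dim
        self._rng = random.Random(seed)
        # One dict per simulated device; each holds only the rows it owns.
        self.shards = [{} for _ in range(num_devices)]

    def device_for(self, key):
        # Hypothetical placement rule: key modulo the device count.
        return key % self.num_devices

    def lookup(self, key):
        shard = self.shards[self.device_for(key)]
        if key not in shard:
            # Lazily initialize the row on the device that owns it.
            shard[key] = [self._rng.uniform(-0.05, 0.05) for _ in range(self.dim)]
        return shard[key]

    def lookup_batch(self, keys):
        # In a real multi-GPU system this step would be a collective exchange
        # between devices (for example over NCCL); here it is a simple loop.
        return [self.lookup(k) for k in keys]


table = ShardedEmbedding(num_devices=4, dim=8)
vectors = table.lookup_batch([3, 10, 17, 4_000_000_001])
rows_per_device = [len(shard) for shard in table.shards]
```

Because each simulated device stores only its own slice of the rows, the total table size is bounded by the combined memory of all devices rather than by any single one, which is the property the paragraph above describes.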

Learn More
NVIDIA Collective Communication Library (NCCL)
HugeCTR's embedding layer

Inherently Asynchronous, Multi-Threaded Pipeline

Effective data loading is challenging for machine learning engineers and data scientists who are continuously experimenting, training, and fine-tuning recommender models. HugeCTR's data reader is inherently asynchronous and multi-threaded. It reads batches of data records whose features are high-dimensional, sparse, or categorical. Dense numerical features in a record can be fed directly to the fully connected layers, while HugeCTR's embedding layer compresses the sparse input features into dense embedding vectors. HugeCTR's model parallelism enables embedding training in a homogeneous cluster across multiple nodes and GPUs.
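
The sketch below illustrates the general pattern of an asynchronous, multi-threaded reader feeding an embedding step, using only the Python standard library. It is not HugeCTR's data reader: the function names, the round-robin split across workers, the record layout (a list of dense values plus a list of sparse IDs), and the sum-pooling rule are assumptions made for the example.

```python
import queue
import threading


def reader_worker(records, batch_size, out_queue):
    """Group this worker's records into batches and push them to the queue."""
    for start in range(0, len(records), batch_size):
        out_queue.put(records[start:start + batch_size])
    out_queue.put(None)  # sentinel: this worker has no more batches


def read_batches(records, num_workers=2, batch_size=2, prefetch=4):
    """Yield batches produced concurrently by several reader threads."""
    out_queue = queue.Queue(maxsize=prefetch)
    # Hypothetical sharding rule: split the records round-robin across workers.
    shards = [records[i::num_workers] for i in range(num_workers)]
    threads = [
        threading.Thread(
            target=reader_worker, args=(shard, batch_size, out_queue), daemon=True
        )
        for shard in shards
    ]
    for t in threads:
        t.start()
    finished = 0
    while finished < num_workers:
        batch = out_queue.get()
        if batch is None:
            finished += 1
        else:
            yield batch
    for t in threads:
        t.join()


def embed(sparse_ids, table, dim):
    """Sum-pool embedding rows: many sparse IDs become one dense vector."""
    pooled = [0.0] * dim
    for key in sparse_ids:
        # Deterministic toy initialization for rows not seen before.
        row = table.setdefault(key, [((key * 31 + j) % 7) / 10.0 for j in range(dim)])
        pooled = [p + r for p, r in zip(pooled, row)]
    return pooled


# Each record: (dense numerical features, sparse categorical IDs).
records = [([float(i)], [i, i + 100]) for i in range(6)]
table = {}
dense_inputs = []
for batch in read_batches(records):
    for dense, sparse_ids in batch:
        # Dense features pass through; sparse IDs go through the embedding.
        dense_inputs.append(dense + embed(sparse_ids, table, dim=4))
```

The bounded queue lets the reader threads run ahead of the consumer by a few batches without unbounded memory growth. Batch arrival order depends on thread scheduling, so it is not deterministic in this sketch.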

Explore HugeCTR on GitHub

Inference, Hierarchical Deployment on Multiple GPUs

HugeCTR provides concurrent model inference across multiple GPUs by using a parameter server and an embedding cache that are shared among multiple model instances. HugeCTR also works with NVIDIA Triton™ Inference Server to ease workflows for data scientists and machine learning engineers when deploying models to production.
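
As a rough illustration of how a shared embedding cache can sit in front of a parameter server, consider the following standard-library sketch. It does not reflect HugeCTR's actual classes or eviction policy: ParameterServer, EmbeddingCache, ModelInstance, and the least-recently-used (LRU) eviction rule are all assumptions made for the example.

```python
import threading
from collections import OrderedDict


class ParameterServer:
    """Stand-in for the full embedding table held outside GPU memory."""

    def __init__(self, dim):
        self.dim = dim
        self.fetches = 0

    def fetch(self, key):
        self.fetches += 1
        return [float(key)] * self.dim  # toy row derived from the key


class EmbeddingCache:
    """Small LRU cache in front of the parameter server (assumed policy)."""

    def __init__(self, server, capacity):
        self.server = server
        self.capacity = capacity
        self._rows = OrderedDict()
        self._lock = threading.Lock()  # instances may call get() concurrently

    def get(self, key):
        with self._lock:
            if key in self._rows:
                self._rows.move_to_end(key)  # mark as most recently used
                return self._rows[key]
            row = self.server.fetch(key)  # cache miss: go to the server
            self._rows[key] = row
            if len(self._rows) > self.capacity:
                self._rows.popitem(last=False)  # evict least recently used
            return row


class ModelInstance:
    """Toy model: the score is the sum of each row's first component."""

    def __init__(self, cache):
        self.cache = cache

    def predict(self, keys):
        return sum(self.cache.get(k)[0] for k in keys)


server = ParameterServer(dim=4)
cache = EmbeddingCache(server, capacity=3)
instance_a, instance_b = ModelInstance(cache), ModelInstance(cache)
score_a = instance_a.predict([1, 2, 3])  # three cache misses
score_b = instance_b.predict([3, 2, 1])  # served entirely from the shared cache
```

Because both instances share one cache, rows fetched for the first request are reused by the second, so the parameter server is contacted only once per distinct key while the key stays cached.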

Learn more
NVIDIA Triton™ Inference Server

HugeCTR: open-source component of NVIDIA Merlin

Interoperability with Open Source

Machine learning engineers and data scientists use a mix of methods, libraries, tools, and frameworks that often include open-source components. HugeCTR is an open-source component of NVIDIA Merlin and is designed to optimize embedding training within recommender workflows. HugeCTR is interoperable with open-source tools and includes an open-source Python package that supports sparse training and inference with TensorFlow.

Embeddings Optimization

Embeddings optimization enables more experimentation, fine-tuning, and better prediction at scale. HugeCTR's optimized embedding implementation is up to 8X more performant than other frameworks’ embedding layers. This optimized implementation is also available as a TensorFlow plug-in that works with TensorFlow as a drop-in replacement for the TensorFlow-native embedding layers.

Learn more
HugeCTR: Embeddings optimization

Get Started with Merlin HugeCTR

All NVIDIA Merlin components are available as open-source projects on GitHub. However, a more convenient way to use these components is through the Merlin HugeCTR containers from the NVIDIA NGC catalog. Containers package the software application, libraries, dependencies, and runtime compilers in a self-contained environment. This way, the application environment is portable, consistent, reproducible, and agnostic to the underlying host system software configuration.


Merlin Training

The NGC container allows users to do preprocessing, feature engineering, and training of a deep learning-based recommender system model with HugeCTR.

Merlin Inference

HugeCTR supports Triton Inference Server to provide GPU-accelerated inference. The NGC container enables users to deploy Merlin NVTabular workflows and HugeCTR models to Triton Inference Server for production.

HugeCTR on GitHub

The GitHub repo helps users get started with HugeCTR and quickly train a model using a Python interface. Available resources include documentation, tutorials, examples, and notebooks.


Diagram illustrating components of NVIDIA Merlin

Merlin HugeCTR Resources

Explore all Merlin resources.

Tencent and Merlin HugeCTR

Learn how Tencent deployed its real-world advertising recommendation training with Merlin and achieved more than 7X speedup over the original TensorFlow solution on the same GPU platform.

WATCH THE ON-DEMAND GTC SESSION

Merlin Reference Applications

Get started with open-source reference implementations and achieve state-of-the-art accuracy on public datasets with up to 10X acceleration.

GET DLRM FOR PYTORCH
GET WIDE AND DEEP FOR TENSORFLOW

Best Practices from Tencent

Discover insights, advice, and best practices on leading the design and development of Tencent's deep learning recommendation system.

READ INTERVIEW

Meituan and Merlin HugeCTR

Learn how Meituan optimizes their machine learning platform by building a high-performance deep learning training framework deployed on CPU and GPU clusters.

READ INTERVIEW

HugeCTR is available to download from the NVIDIA NGC catalog or from the GitHub repository.

Download from NGC