NVIDIA NeMo is an open-source toolkit for developing state-of-the-art conversational AI models.
Building state-of-the-art conversational AI models requires researchers to quickly experiment with novel network architectures. This means going through the complex and time-consuming process of modifying multiple networks and verifying compatibility across inputs, outputs, and data pre-processing layers.
NVIDIA NeMo is a Python toolkit for building, training, and fine-tuning GPU-accelerated conversational AI models through a simple interface. With NeMo, researchers and developers can build state-of-the-art conversational AI models with easy-to-use application programming interfaces (APIs). NeMo runs mixed-precision compute on Tensor Cores in NVIDIA GPUs and scales easily to multiple GPUs to deliver the highest training performance possible.
NeMo is used to build models for real-time automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications such as video call transcriptions, intelligent video assistants, and automated call center support across healthcare, finance, retail, and telecommunications.
Rapid Model Building
Configure, build, and train models quickly with simple Python APIs.
Download and customize pre-trained state-of-the-art models from NGC.
Interoperable with the PyTorch and PyTorch Lightning ecosystem.
Apply NVIDIA® TensorRT™ optimizations for inference and export to NVIDIA Jarvis with a single command.
“Ping An addresses millions of queries from customers each day using chat-bot agents. As an early partner of the Jarvis early access program, we were able to use the tools and build better solutions with higher accuracy and lower latency, thus providing better services. More specifically, with NeMo, the pre-trained model, and the ASR pipeline optimized using Jarvis, the system achieved 5% improvement on accuracy, so as to serve our customers with better experience.”
— Dr. Jing Xiao, the Chief Scientist at Ping An
“In our evaluation of Jarvis for virtual assistants and speech analytics, we saw remarkable accuracy by fine-tuning the Automated Speech Recognition models in the Russian language using the NeMo toolkit in Jarvis. Jarvis can provide up to 10x throughput performance with powerful TensorRT optimizations on models, so we’re looking forward to using Jarvis to get the most out of these technology advancements.”
— Nikita Semenov, Head of ML at MTS AI
“InstaDeep delivers decision-making AI products and solutions for enterprises. For this project, our goal is to build a virtual assistant in the Arabic language, and NVIDIA Jarvis played a significant role in improving the application’s performance. Using the NeMo toolkit in Jarvis, we were able to fine-tune an Arabic speech-to-text model to get a Word Error Rate as low as 7.84% and reduced the training time of the model from days to hours using GPUs. We look forward to integrating these models in Jarvis’ end-to-end pipeline to ensure real-time latency.”
— Karim Beguir, CEO and Co-Founder at InstaDeep
“Through the NVIDIA Jarvis early access program, we’ve been able to power our conversational AI products with state-of-the-art models using NVIDIA NeMo, significantly reducing the cost of getting started. Jarvis speech recognition has amazingly low latency and high accuracy. Having the flexibility to deploy on-prem and offer a range of data privacy and security options to our customers has helped us position our conversational AI-enabled products in new industry verticals.”
— Rajesh Jha, CEO of Siminsights
“At MeetKai, we build virtual assistants that make people’s lives easier. When we started our company, we faced engineering and production challenges because there weren’t many high-quality, open-source conversational AI toolkits. NVIDIA NeMo helped our engineering efforts by providing easy-to-use APIs and reducing our costs by 25%. We look forward to continuing to work with NeMo to create the ultimate AI helper.”
— James Kaplan, CEO of MeetKai
“Kensho leverages S&P Global's world-class data and research to build amazing tools that help people make fact-based decisions. Using NVIDIA NeMo on GPUs, Kensho successfully transcribed tens of thousands of earnings calls, management presentations, and acquisition calls, unlocking double-digit accuracy improvements and enabling S&P Global to increase earnings call coverage by more than 25%.”
— Keenan Freyberg, Product Manager at Kensho
“Our goal with SpeechBrain at MILA is to build an all-in-one toolkit that can significantly speed up research and development for speech models. We’re interested in pushing the boundaries of speech technologies even further by integrating with NeMo modules, particularly speech recognition and language modeling.”
— Mirco Ravanelli, Speech and Deep Learning Scientist at MILA
Easily Compose New Model Architectures
NeMo includes domain-specific collections for ASR, NLP, and TTS for developing state-of-the-art models such as QuartzNet, Jasper, BERT, Tacotron2, and WaveGlow in three lines of code. A NeMo model is composed of neural modules, the building blocks of models. The inputs and outputs of these modules are strongly typed with neural types, which allow semantic checks between modules to be performed automatically.
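To make the idea of typed module connections concrete, here is a minimal, library-free sketch. It does not use NeMo's actual classes: `PortType`, `Module`, `check_connection`, and `compose` are hypothetical names invented for illustration, and the axis and element labels are assumptions. It only shows how declaring a type on each input and output lets a mismatch be caught when modules are wired together rather than during training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortType:
    """Hypothetical stand-in for a neural type: axis layout plus element kind."""
    axes: tuple      # e.g. ("batch", "time")
    element: str     # e.g. "audio_signal", "spectrogram"


class Module:
    """Hypothetical module that declares one typed input and one typed output."""

    def __init__(self, name, input_type, output_type):
        self.name = name
        self.input_type = input_type
        self.output_type = output_type


def check_connection(producer, consumer):
    """Raise TypeError if the producer's output type differs from the consumer's input type."""
    if producer.output_type != consumer.input_type:
        raise TypeError(
            f"{producer.name} emits {producer.output_type}, "
            f"but {consumer.name} expects {consumer.input_type}"
        )


def compose(*modules):
    """Check every adjacent pair and return the module names in pipeline order."""
    for producer, consumer in zip(modules, modules[1:]):
        check_connection(producer, consumer)
    return [m.name for m in modules]


# Illustrative types for a speech recognition pipeline (labels are assumptions).
audio = PortType(("batch", "time"), "audio_signal")
spec = PortType(("batch", "features", "time"), "spectrogram")
encoded = PortType(("batch", "features", "time"), "encoded")
log_probs = PortType(("batch", "time", "vocab"), "log_probs")

preprocessor = Module("preprocessor", audio, spec)
encoder = Module("encoder", spec, encoded)
decoder = Module("decoder", encoded, log_probs)

pipeline = compose(preprocessor, encoder, decoder)
```

In this sketch, `compose(preprocessor, decoder)` raises `TypeError` because a spectrogram is not the encoded representation the decoder expects, even though both have the same axis layout. That is the kind of semantic check that shape comparison alone would miss.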
NeMo is designed to offer high flexibility, and you can use the Hydra framework to modify the behavior of models easily. For instance, you can use Hydra to modify the architecture of the Jasper Encoder module shown in the following diagram.
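As a rough illustration of configuration-driven changes, the sketch below applies dotted `key=value` overrides to a nested configuration, which is the style of override Hydra accepts on the command line. It is plain Python, not Hydra itself: `apply_overrides` and `parse_value` are hypothetical helpers, and keys such as `model.encoder.filters` are invented for the example and are not NeMo's real Jasper configuration schema.

```python
import ast
import copy


def parse_value(raw):
    """Interpret '512' as int, '0.2' as float, and leave other text as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


def apply_overrides(config, overrides):
    """Return a copy of config with 'a.b.c=value' style overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            node = node[key]  # KeyError if an intermediate key does not exist
        if leaf not in node:
            raise KeyError(f"unknown config key: {dotted_key}")
        node[leaf] = parse_value(raw_value)
    return result


# Hypothetical base configuration for an encoder (all keys are assumptions).
base_config = {
    "model": {
        "encoder": {"activation": "relu", "filters": 256, "repeat": 5},
        "optim": {"lr": 0.01},
    }
}

custom_config = apply_overrides(
    base_config,
    ["model.encoder.filters=512", "model.encoder.activation=gelu"],
)
```

The base configuration is left untouched, so several architecture variants can be derived from one file. Rejecting unknown keys catches typos in an override before any training starts.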
Retrain SOTA Conversational AI Models
Several pre-trained state-of-the-art NeMo models are available in NGC, trained for over 100,000 hours on NVIDIA DGX™ across open and proprietary datasets. You can fine-tune these models or modify them with NeMo before training for your use case.
NeMo uses mixed precision on Tensor Cores to speed up training by up to 4.5x on a single GPU compared with FP32 precision. You can further scale training to multi-GPU systems and multi-node clusters.
Flexible, Open-Source, Rapidly Expanding Ecosystem
NeMo is built on top of PyTorch and PyTorch Lightning, giving researchers an easy path to develop and integrate with modules they are already comfortable with. PyTorch and PyTorch Lightning are open-source Python libraries that provide modules for composing models.
To give researchers the flexibility to customize models and modules easily, NeMo integrates with the Hydra framework. Hydra is a popular configuration framework that simplifies the development of complex conversational AI models.
NeMo is available as open source so that researchers can contribute to it and build on it.
Deploy in Real-Time Services
Popular Framework Integrations
NeMo is built on top of the popular PyTorch framework and makes it easy for researchers to use NeMo modules in PyTorch applications.
NeMo with PyTorch Lightning enables easy and performant multi-GPU and multi-node mixed-precision training.
Label Training Data Easily
NVIDIA NeMo lets you train and fine-tune the state-of-the-art models built with it. Fine-tuning requires high-quality labeled data, which might not be readily available. NeMo is integrated with several easy-to-use speech and language data labeling tools that help you acquire labeled data and label your own custom data.
Provides off-the-shelf training data in multiple languages and domains.
Generates high-quality labels and delivers accurate results in production.
Get Started with Tutorials
Check out tutorials to get up and running quickly with state-of-the-art speech and language models.
Take a NeMo Tour
Understand the advantages of using NVIDIA NeMo with a Jupyter Notebook walkthrough.
Build Conversational AI Applications
Learn how to build and fine-tune ASR, NLP, and TTS services with NVIDIA NeMo and Jarvis.