TensorRT Getting Started
TensorRT: What’s New
TensorRT 8.2 includes new optimizations to run billion-parameter language models in real time.
TensorRT is also integrated with PyTorch and TensorFlow.
- TensorRT 8.2 - Optimizations for T5 and GPT-2 deliver real-time translation and summarization with 21x faster performance than CPUs
- TensorRT 8.2 - Simple Python API for developers using Windows
- Torch-TensorRT - Integration for PyTorch delivers up to 6x faster performance versus in-framework inference on GPUs with just one line of code
- TensorFlow-TensorRT - Integration for TensorFlow delivers up to 6x faster performance versus in-framework inference on GPUs with one line of code
Torch-TensorRT is available today in the PyTorch container from the NGC catalog.
TensorFlow-TensorRT is available today in the TensorFlow container from the NGC catalog.
TensorRT 8.2 is freely available today to members of the NVIDIA Developer Program.
Learn how to apply TensorRT optimizations and deploy a PyTorch model to GPUs.
Watch and learn more about TensorRT 8.2 features and the tools that simplify the inference workflow.
See how to get started with TensorRT in this step-by-step developer guide and API reference.
Additional TensorRT Resources
- Getting Started: Torch-TensorRT, TensorFlow-TensorRT (Video)
- Up to 6x Faster Inference in PyTorch on GPUs with Torch-TensorRT (Blog)
- Accelerate PyTorch Inference with Torch-TensorRT (Webinar)
- Leveraging TensorFlow-TensorRT Integration for Low-Latency Inference (Blog)
- Image Classification with ResNet-50 using TensorFlow-TensorRT (Video)
- Deploying Quantization Aware Trained models in INT8 using Torch-TensorRT (Notebook)
- Object Detection with TF-TRT (Notebook)
- Accelerate Semantic Segmentation with TF-TRT (Notebook)
- Image Classification with Torch-TensorRT (Notebook)
Conversational AI
- Real-Time Inference for T5 and GPT-2 Using NVIDIA TensorRT (Blog)
- Real-Time Natural Language Understanding with BERT Using TensorRT (Blog)
- Quantize BERT with PTQ and QAT for INT8 Inference (Sample)
- Automatic Speech Recognition with TensorRT (Notebook)
- Accelerating Real-Time Text-to-Speech with TensorRT (Blog)
- NLU with BERT (Notebook)
- Real Time Text-to-Speech (Sample)
- Neural Machine Translation (NMT) Using a Sequence-to-Sequence (seq2seq) Model (Sample Code)
- Building An RNN Network Layer By Layer (Sample Code)
Image and Video
- Accelerating Inference with Sparsity using Ampere Architecture and TensorRT (Blog)
- Achieving FP32 Accuracy in INT8 using Quantization Aware Training with TensorRT (Blog)
- PyTorch-Quantization (QAT) Toolkit (Python Code Sample)
- Optimize Object Detection with EfficientDet and TensorRT 8 (Notebook)
- Estimating Depth with ONNX Models and Custom Layers Using NVIDIA TensorRT (Blog)
- Speeding up Deep Learning Inference using TensorFlow, ONNX, and TensorRT (Semantic Segmentation Blog)
- Object Detection with SSD and Faster R-CNN Networks (C++ Code Samples)
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.