Technical Walkthrough

Training Like an AI Pro Using NVIDIA TAO AutoML

Discuss (2)

There has been tremendous growth in AI over the years. With that, comes a larger demand for AI models and applications. Creating production-quality AI requires expertise in AI and data science and can still be intimidating for many developers.

To develop accurate AI, you must choose what model architecture to use, what data to collect, and finally how to tune the model to meet the desired KPIs. There are thousands of combinations of model architectures and hyperparameters that you’d have to try to get the best model for your specific use case. This process is extremely laborious and requires model architecture expertise to tune hyperparameters.

Automated machine learning (AutoML) automates the manual task of finding the best models and hyperparameters for the desired KPI. It can algorithmically derive the best model for your given KPI and abstract away a lot of the complexity of AI model creation and optimization.

AutoML makes it easy for even a novice developer to create a highly accurate AI model.

AutoML in TAO

AutoML in TAO is fully configurable for automatically optimizing the hyperparameters of a model, which reduces the need for manual tuning. It caters to both AI experts and non-experts.

  • For non-experts, the guided Jupyter notebook provides a simple, efficient way to create an accurate AI model.
  • For experts, TAO gives you full control of which hyperparameters to tune and which algorithm to use for sweeps.

TAO currently supports two optimization algorithms: Baysian and Hyperband optimization. These algorithms can effectively sweep across a range of hyperparameters to find the best combination to optimize the user-provided metric.

Hyperband yields faster because it doesn’t have to run through the entire training configuration. It runs for a limited number of epochs, discards the runs that are performing poorly, and only continues on the remaining runs. This process of elimination continues until there is a single configuration that gives the best results.

For Bayesian, the training runs to completion for all the sweeps.

AutoML is supported for a wide range of CV tasks: image classification, object detection, segmentation, and OCR. Table 1 shows the full list of supported networks.

Image ClassificationObject DetectionSegmentationOCR
SqueezeNet VGG16/19
DetectNet_v2 SSD/DSSD    
Table 1. Networks supported by TAO AutoML

Getting started with AutoML

The entire AutoML workflow can be run from the provided Jupyter notebooks. AutoML uses the TAO API services to manage all training jobs.

TAO API services

TAO API is a Kubernetes service that enables deployment of TAO as a microservice either on your own Kubernetes cluster or with cloud Kubernetes services such as Amazon EKS or Azure AKS.

TAO API services provide an additional layer of abstraction over containers. You can manage and deploy TAO services using Helm charts and remotely run jobs using REST API calls. With the APIs, you can remotely create and upload datasets, run training jobs, evaluate models, and export models for deployment.

Diagram shows a cloud-native TAO API service architecture with a front-end service container that receives the HTTP request for downstream applications and orchestrates the appropriate TAO training container.
Figure 1. Cloud-native TAO API service architecture

API services make it easy to integrate TAO into your own custom application or build a web-UI application on top of TAO. To get started with building your custom applications with REST APIs, see the API guide and the API notebooks in TAO Toolkit Getting Started. For more information, see the Notebook section later in this post.

To train using the CLI, use the lightweight CLI client application that you can install on a client system to access TAO services and CLI notebooks. CLI notebooks are available on NGC in TAO getting started resources.

AutoML requires a higher-level service on top of your training runs to determine and manage the set of experiments. TAO services keep track of all the experiments that they have tried with the KPIs and build the next set of experiments to improve on the KPIs. You can run AutoML with TAO API services either through the remote CLI application or directly using REST APIs. Jupyter notebooks for both are provided. For more information, see the Notebook section.

The REST API notebooks mainly serve as a reference if you are building your own applications or UI on top of TAO.

Set up TAO services

The TAO API service can run on any Kubernetes platform. To simplify the deployment of TAO services, we have provided a one-click deploy script. This simplifies the deployment of TAO services on bare-metal setup or on Amazon EKS. For this post, we use the bare-metal setup but instructions to deploy on the cloud are provided in the API guide.


  • An NVIDIA GPU (on-premises or in the cloud):
    • NVIDIA Volta architecture
    • NVIDIA Turing architecture
    • NVIDIA Ampere architecture
    • NVIDIA Hopper architecture
  • TAO Toolkit 4.0
  • Ubuntu 18.04 or 20.04

Download the one-click deploy folder using the NGC CLI:

ngc registry resource download-version "nvidia/tao/tao-getting-started:4.0.0"

Change the current directory:

cd tao-getting-started_v4.0.0/cv/resource/setup/quickstart_api_bare_metal

Add the host IP address and login credentials in the hosts file. This is the system where you plan to run the TAO services. It could be a local or remote system but you must have sudo privileges.

For credentials, you can use either a password (ansible_ssh_pass) or an SSH private key file (ansible_ssh_private_key_file). For a single-node cluster, you can list only the master node.

File name: hosts:

<IP Address> ansible_ssh_user='<username>' ansible_ssh_pass='<password>'
<IP Address> ansible_ssh_user='<username>' ansible_ssh_pass='<password>'

You can validate SSH credentials for remote machines with the following command. The proper answer would be root.

ssh <username>@<IP Address> 'sudo whoami'

Next, modify the tao-toolkit-api-ansible-values.yml file to add your NGC credentials and the Helm chart. This pulls the Helm chart from the NGC registry. For more information, see Generating Your NGC API Key.

File name: tao-toolkit-api-ansible-values.yml

ngc_api_key: <NGC API Key>
ngc_email: <NGC email>
api_values: ./tao-toolkit-api-helm-values.yml
cluster_name: tao-automl-demo

Install the dependencies and deploy the TAO service. Before installation, first check whether all the dependencies are met by running check-inventory.yml. If everything looks good, you should see a message that says 0 failed. Then, run install, which takes 10–15 minutes.

bash check-inventory.yml
bash install

Download AutoML notebooks

Download the computer vision training resources from TAO Toolkit Getting Started on NGC.

wget --content-disposition -O
unzip -u  -d ./getting_started_v4.0.0 && 
rm -rf && cd ./getting_started_v4.0.0

All the AutoML-related notebooks are inside the TAO API directory. The notebooks are provided in the directory structure:

     |--> tao_api_starter_kit
          |--> API
               |--> automl
               |--> end2end
               |--> dataset_prepare
          |--> client
               |--> automl
                    |--> classification.ipynb
                    |--> object_detection.ipynb
                    |--> segmentation.ipynb
                    |--> lprnet.ipynb        
               |--> end2end
               |--> dataset_prepare

For this post, use the object detection notebook (TAO API Starter Kit/Notebooks/client/automl/object_detection.ipynb) but you can also do AutoML on other computer vision tasks.

Use AutoML to fine-tune an object detection model with TAO

Here is a quick walkthrough of the AutoML workflow with the Object Detection AutoML notebook. For this walkthrough, you use the client/automl/object_detection.ipynb notebook from the hierarchy shown earlier. We highlight the key steps here, but all the steps are captured in the Jupyter notebook.

Diagram shows the end-to-end workflow of running AutoML in TAO. User starts with selecting model topology, create and upload dataset, configuring parameters, training with AutoML to comparing the model.
Figure 2. Step-by-step TAO AutoML workflow

Select a model topology

Choose any one of the available models listed for that notebook. Each notebook has a default model for that domain. In this example, the default model is DetectNet V2 but you can change it to FasterRCNN, SSD, DSSD, Retinanet, EfficientDet, Yolo V3, Yolo V4, or YoloV4 tiny.

model_name = "detectnet-v2"

Create a dataset

The next step is to use the dataset given as an example in the notebook or use your own dataset. The folder structure of the dataset requirement is provided in the notebooks.

train_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type object_detection --dataset_format {ds_format}")

eval_dataset_id = subprocess.getoutput(f"tao-client {model_name} dataset-create --dataset_type object_detection --dataset_format {ds_format}")

Upload the dataset

When you have the dataset ready, upload it to the TAO Toolkit REST API-deployed machine through the Unix rsync command for TAO-Client notebooks. You must upload the images and labels for both training and validation data.

rsync -ah --info=progress2 {TRAIN_DATA_DIR}/images ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
rsync -ah --info=progress2 {TRAIN_DATA_DIR}/labels ~/shared/users/{os.environ['USER']}/datasets/{train_dataset_id}/
rsync -ah --info=progress2 {VAL_DATA_DIR}/images ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/
rsync -ah --info=progress2 {VAL_DATA_DIR}/labels ~/shared/users/{os.environ['USER']}/datasets/{eval_dataset_id}/

Convert the dataset

After the dataset has been uploaded, convert the dataset to tfrecords through the dataset-convert action. All object detection models require dataset conversion, but some models from other domains, like classification, can operate on the raw data uploaded.

train_convert_job_id = subprocess.getoutput(f"tao-client {model_name} dataset-convert --id {train_dataset_id}  --action {convert_action} ")

Configure the AutoML parameters

The next step is to choose which AutoML algorithm to run. There are options to tweak some AutoML-specific parameters. You can view the parameters that are enabled by default for the AutoML search for a model, along with all the parameters that are available for a network

tao-client {model_name} model-automl-defaults --id {model_id} | tee ~/shared/users/{os.environ['USER']}/models/{model_id}/specs/automl_defaults.json

This outputs a list of hyperparameters that are used for AutoML. For this experiment, you are choosing five different hyperparameters to sweep.

"training_config.learning_rate.soft_start_annealing_schedule.min_learning_rate",   "training_config.learning_rate.soft_start_annealing_schedule.annealing",

You can add additional parameters or remove existing defaulted parameters. For example, to sweep the soft_start hyperparameter, add the following in your notebook:

additional_automl_parameters = [“training_config.learning_rate.soft_start_annealing_schedule.soft_start”]

There are also options to tweak algorithm-specific parameters, but the default parameters work well. For more information, see AutoML.

Train with AutoML

At this point, you have all the tools necessary to start the AutoML run. You can also change default training specs, like image extension or class mapping, before triggering the AutoML run:

train_job_id = subprocess.getoutput(f"tao-client {model_name} model-train --id " + model_id)

When AutoML run starts, you can see various stats, such as the best accuracy score at that time, how many experiments have been completed, an approximate estimated time for completion, and so on. You should see an output log similar to the following.

  "best_map": 0.59636,
  "Estimated time for automl completion": "23.13 minutes remaining approximately",
  "Current experiment number": 3,
  "Number of epochs yet to start": 429.0,
  "Time per epoch in seconds": 3.24

Compare models

At the end of the AutoML run, you can see the results of all experiments. You are presented with the spec file and the binary weight file of the model achieving the highest accuracy among the AutoML sweeps.

Checkpoints for the best performing experiment
Folder: /home/nvidia/shared/users/95af85a9-805c-5680-b01a-3c85ed70f009/models/4f22c462-1d97-4537-99b2-15ee69eb2660/168d6149-6c47-40e6-b6a3-267867cea551/best_model/weights
Files: [epoch-80.tlt']

Results of all experiments
    id   result
0    0  0.43636
1    1  0.41818
2    2  0.53636
3    3  0.44545
4    4  0.33636
5    5  0.44545
6    6  0.53636
7    7  0.53636
8    8  0.61636
9    9  0.62727
10  10  0.593636
11  11  0.52727
12  12  0.53636
13  13  0.54545
14  14  0.61636
15  15  0.60909
16  16  0.5636
17  17  0.54545
18  18  0.53636
19  19  0.53636

The spec file for the best-performing experiment is stored in the following directory:


The best model for this experiment was ID 9 with a mAP of 0.627. This is stored in the best_model/recommendataion_9.kitti file.

After saving the best model obtained from AutoML, you can plug the model and spec file in the end-to-end notebook and then prune and optimize the model for inference.

Diagram starts with training using AutoML and moves through model evaluation, pruning, re-training, model evaluation again, to exporting and then optimizing the model for inference.
Figure 3. End-to-end workflow from AutoML training to model optimization

To plug the model into the new notebook, copy the train job ID from the AutoML notebook. The AutoML train job ID is printed when you run the training job.

train_job_id = subprocess.getoutput(f"tao-client {model_name} model-train --id " + model_id)

When you have the train job ID, open the end-to-end notebook from the notebook hierarchy from earlier. For this post, use the TAO API Starter Kit/Notebooks/client/end2end/detectnet_v2.ipynb notebook. As you have already trained a model, just run the import statement in the first cell and skip all the way down to the Run Evaluate section. In this section, create a code cell before evaluating.

train_job_id = “id_you_copied”

After adding the job_map code cell, you can evaluate the model, prune the model for compression, and even do a quantization-aware training of the original model or the pruned model as showcased in the end-to-end notebooks


We trained various models with AutoML on public datasets to see how much improvement in accuracy we could achieve. We compared the best AutoML-based accuracy against a baseline accuracy number from the default spec file provided in the packages. The results are in Table 2.

For accuracy, we used mAP (mean average precision) for object detection, average accuracy of all classes and tasks for image classification, and mean IoU (intersection over union) score for semantic segmentation.

TaskModelBaseline Accuracy
(default spec)
Best AutoML accuracyDataset
Object DetectionDetectNet_v2 – ResNet1844.1651.37FLIR
Object DetectionFasterRCNN – ResNet1856.4260.44FLIR
Object DetectionYOLOv4 – ResNet1840.1263.46FLIR
Object DetectionYOLOv3 – ResNet1842.3661.84FLIR
Object DetectionRetinaNet – ResNet1850.5463.09FLIR
Image ClassificationResNet1853.9566.28Pascal VOC
Semantic SegmentationUNET71.6476.65ISBI
Table 2. Accuracy gain across networks by using AutoML

Across all the models that we tested, the increase in model accuracy is substantial compared to static default hyperparameters. The amount of improvement varies depending on the model, but we have generally seen improvements ranging from 5% to more than 20%. This shows that AutoML can work on various datasets to train the best model for a given KPI.


As the number of use cases and customization grows, it becomes imperative to accelerate the AI creation process. AutoML can eliminate the need for manual tuning, saving valuable time for developers.

With TAO AutoML, you can now automatically tune models for object detection, classification, and segmentation use cases using various popular model architectures. TAO AutoML provides simplicity for novice users to get started, as well as configurability for experts to choose their own hyperparameters to sweep.