NVIDIA DRIVE Perception



NVIDIA DRIVE® Perception enables robust perception of obstacles, paths, and wait conditions (such as stop signs and traffic lights) right out of the box with an extensive set of pre-processing, post-processing, and fusion processing modules. Together with NVIDIA DRIVE Networks, these form an end-to-end perception pipeline for autonomous driving that uses data from multiple sensor types (e.g. camera, radar, lidar). DRIVE Perception makes it possible for developers to create new perception, sensor fusion, mapping, or planning/control/actuation autonomous vehicle (AV) functionality without first having to develop and validate the underlying perception building blocks.



DRIVE Perception is designed for a variety of objectives:

  • Developing perception algorithms for obstacles, paths, and wait conditions
  • Detecting and classifying objects, drivable space, lanes and road markings, and traffic lights and signs
  • Tracking detected objects (such as other vehicles, pedestrians, and road markings) across frames
  • Estimating distances to detected objects
  • Fusing inputs from sensors of different modalities



Path Perception Ensemble

Path perception ensemble combines several base models and produces an optimal predictive model for drivable paths. Agreement and disagreement analysis between the base models enables generation of real-time confidence metrics.
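As a rough illustration of the idea (hypothetical types and numbers, not the production interface), the sketch below turns pairwise disagreement between base path predictions into a single confidence value in [0, 1]:

```cpp
// Minimal sketch: derive a real-time confidence metric from agreement
// between several base path predictions. Each base model is assumed to
// output a center path sampled at the same longitudinal stations ahead
// of the ego vehicle.
#include <cmath>
#include <cstdio>
#include <vector>

// Lateral offsets (meters) of a predicted center path, sampled at fixed
// longitudinal stations (e.g. every 5 m ahead of the ego vehicle).
using PathSamples = std::vector<double>;

// Confidence in [0, 1]: 1.0 when all base models agree exactly, decaying
// toward 0 as the mean pairwise lateral disagreement grows.
double ensembleConfidence(const std::vector<PathSamples>& basePaths,
                          double toleranceMeters) {
    double totalDev = 0.0;
    int pairs = 0;
    for (size_t i = 0; i < basePaths.size(); ++i) {
        for (size_t j = i + 1; j < basePaths.size(); ++j) {
            double dev = 0.0;
            for (size_t k = 0; k < basePaths[i].size(); ++k)
                dev += std::fabs(basePaths[i][k] - basePaths[j][k]);
            totalDev += dev / basePaths[i].size();
            ++pairs;
        }
    }
    if (pairs == 0) return 0.0;
    double meanDev = totalDev / pairs;
    // Exponential falloff: disagreement equal to the tolerance cuts
    // confidence to roughly 0.37.
    return std::exp(-meanDev / toleranceMeters);
}

int main() {
    // Three hypothetical base predictions (e.g. DNN, HD map, egomotion).
    std::vector<PathSamples> paths = {
        {0.00, 0.10, 0.25, 0.45},
        {0.02, 0.12, 0.20, 0.40},
        {0.01, 0.08, 0.30, 0.55},
    };
    std::printf("confidence = %.2f\n", ensembleConfidence(paths, 0.5));
    return 0;
}
```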




Surround Camera Object Tracking

Surround camera object tracking software tracks objects such as vehicles, pedestrians, and bicyclists across camera frames over time, and assigns a unique ID number to each tracked object.
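The sketch below illustrates the core bookkeeping involved, assuming a simple greedy overlap-based association; the names and the association strategy are illustrative only, not the production tracker:

```cpp
// Minimal sketch: associate per-frame detections with existing tracks by
// bounding-box overlap (IoU) and assign a persistent, unique ID to each
// tracked object.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Box   { double x, y, w, h; };   // axis-aligned box in image pixels
struct Track { int id; Box box; };

double iou(const Box& a, const Box& b) {
    double x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    double x2 = std::min(a.x + a.w, b.x + b.w);
    double y2 = std::min(a.y + a.h, b.y + b.h);
    double inter = std::max(0.0, x2 - x1) * std::max(0.0, y2 - y1);
    return inter / (a.w * a.h + b.w * b.h - inter);
}

// Greedy association: each detection takes over the best-overlapping track;
// detections with no sufficiently overlapping track start a new one.
void updateTracks(std::vector<Track>& tracks, const std::vector<Box>& dets,
                  int& nextId, double minIou = 0.3) {
    std::vector<bool> taken(tracks.size(), false);
    std::vector<Track> updated;
    for (const Box& d : dets) {
        int best = -1;
        double bestIou = minIou;
        for (size_t t = 0; t < tracks.size(); ++t) {
            if (taken[t]) continue;
            double o = iou(tracks[t].box, d);
            if (o > bestIou) { bestIou = o; best = static_cast<int>(t); }
        }
        if (best >= 0) { taken[best] = true; updated.push_back({tracks[best].id, d}); }
        else           { updated.push_back({nextId++, d}); }
    }
    tracks = updated;  // tracks with no matching detection are dropped here
}

int main() {
    std::vector<Track> tracks;
    int nextId = 0;
    updateTracks(tracks, {{100, 80, 40, 30}}, nextId);                       // frame 1: new ID
    updateTracks(tracks, {{104, 82, 40, 30}, {300, 90, 50, 35}}, nextId);    // frame 2: one match, one new
    for (const Track& t : tracks) std::printf("track id=%d\n", t.id);
    return 0;
}
```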





Surround Camera-Radar Fusion

Surround camera-radar fusion is a sensor fusion layer built on top of the surround camera and surround radar perception pipelines. It is designed to leverage the complementary strengths of each sensor type and provide high-quality semantic information as well as accurate position, velocity, and acceleration estimates for objects around the car.
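A minimal sketch of the idea, assuming the camera contributes object semantics and a coarse range while the radar contributes a precise range and range rate (hypothetical types, not the production fusion layer):

```cpp
// Minimal sketch: combine an associated camera detection and radar detection
// for the same object. Range estimates are blended by inverse-variance
// weighting; semantics come from the camera, velocity from the radar.
#include <cstdio>
#include <string>

struct CameraObject { std::string label; double range; double rangeVar; };
struct RadarObject  { double range; double rangeVar; double rangeRate; };

struct FusedObject {
    std::string label;   // semantics from camera
    double range;        // inverse-variance weighted range
    double rangeVar;     // variance of the fused range
    double velocity;     // longitudinal velocity from radar range rate
};

FusedObject fuse(const CameraObject& cam, const RadarObject& rad) {
    // Inverse-variance weights: the more certain sensor dominates.
    double wCam = 1.0 / cam.rangeVar;
    double wRad = 1.0 / rad.rangeVar;
    FusedObject out;
    out.label    = cam.label;
    out.range    = (wCam * cam.range + wRad * rad.range) / (wCam + wRad);
    out.rangeVar = 1.0 / (wCam + wRad);
    out.velocity = rad.rangeRate;
    return out;
}

int main() {
    CameraObject cam{"car", 42.0, 9.0};   // camera range is coarse (variance 9 m^2)
    RadarObject  rad{40.5, 0.25, -3.2};   // radar range is precise (variance 0.25 m^2)
    FusedObject f = fuse(cam, rad);
    std::printf("%s at %.1f m (var %.2f), v=%.1f m/s\n",
                f.label.c_str(), f.range, f.rangeVar, f.velocity);
    return 0;
}
```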



At the heart of NVIDIA DRIVE Perception are NVIDIA DRIVE Networks: deep neural network (DNN) models trained on thousands of hours of high-quality labeled data to produce outputs for obstacle, path, and wait condition perception. DRIVE Networks include both convolutional and recurrent neural network models. The DNN modules also include optimized functionality to precondition the input, run inference on a GPU or Deep Learning Accelerator (DLA), and post-process the network output for consumption by the NVIDIA DRIVE Perception modules.
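The sketch below shows the shape of those three stages with hypothetical types and placeholder bodies; it is not the DriveWorks API:

```cpp
// Minimal sketch: the three stages each DNN module wraps -- preconditioning
// the input image, running inference on the selected engine (GPU or DLA),
// and post-processing the raw network output into objects that downstream
// perception modules can consume.
#include <cstdint>
#include <cstdio>
#include <vector>

enum class Engine { GPU, DLA };

struct Image     { int width = 0, height = 0; std::vector<uint8_t> pixels; };
struct Tensor    { std::vector<float> data; };
struct Detection { float x, y, w, h; int classId; float score; };

// Stage 1: resize/normalize the camera frame into the network's input tensor.
Tensor precondition(const Image& frame) {
    Tensor t;
    t.data.reserve(frame.pixels.size());
    for (uint8_t p : frame.pixels) t.data.push_back(p / 255.0f);  // placeholder normalization
    return t;
}

// Stage 2: run the network. A real module would dispatch an optimized engine
// to the GPU or a DLA core; here it is a placeholder.
Tensor infer(const Tensor& input, Engine /*engine*/) {
    return input;  // placeholder: identity "network"
}

// Stage 3: decode the raw output into detections for downstream modules.
std::vector<Detection> postprocess(const Tensor& /*output*/, float scoreThreshold) {
    (void)scoreThreshold;
    return {};  // placeholder: decoding, thresholding, clustering
}

int main() {
    Image frame{4, 4, std::vector<uint8_t>(4 * 4 * 3, 128)};
    Tensor input  = precondition(frame);
    Tensor output = infer(input, Engine::DLA);
    std::vector<Detection> dets = postprocess(output, 0.5f);
    std::printf("detections: %zu\n", dets.size());
    return 0;
}
```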


Details


Obstacle Perception

The NVIDIA DRIVE Perception pipeline for obstacle perception consists of interacting algorithmic modules built around NVIDIA DRIVE Networks DNNs, along with DNN post-processing. Capabilities include:

Camera-based:

  • Obstacle detection and classification, including cars and pedestrians, as well as distance-to-object estimation (based on DriveNet DNN)
  • Drivable free-space detection (based on OpenRoadNet DNN)
  • Camera image clarity detection and classification (based on ClearSightNet DNN)
  • Semantic motion segmentation (SMS) for detection of both static and dynamic objects

Radar-based:

  • Surround obstacle detection and tracking over time


DriveNet

DriveNet is used for obstacle perception. It detects and classifies objects such as vehicles, pedestrians, and bicycles. DriveNet also includes temporal models for future object motion prediction.
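To show what a motion-prediction output can look like, the sketch below uses a constant-velocity baseline; DriveNet's temporal models are learned and more sophisticated, so this is illustration only:

```cpp
// Minimal sketch: a future-motion prediction output shaped as timed
// waypoints. Constant-velocity extrapolation is used purely for
// illustration; it is not DriveNet's learned temporal model.
#include <cstdio>
#include <vector>

struct State    { double x, y, vx, vy; };   // position (m) and velocity (m/s)
struct Waypoint { double t, x, y; };        // predicted position at time t

std::vector<Waypoint> predictTrajectory(const State& s, double horizonSec, double stepSec) {
    std::vector<Waypoint> traj;
    for (double t = stepSec; t <= horizonSec + 1e-9; t += stepSec)
        traj.push_back({t, s.x + s.vx * t, s.y + s.vy * t});
    return traj;
}

int main() {
    State car{20.0, -1.5, 8.0, 0.0};   // 20 m ahead, one lane to the right, 8 m/s
    for (const Waypoint& w : predictTrajectory(car, 2.0, 0.5))
        std::printf("t=%.1fs -> (%.1f, %.1f)\n", w.t, w.x, w.y);
    return 0;
}
```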




OpenRoadNet

OpenRoadNet detects drivable free space around objects. It predicts the boundary that separates space occupied by obstacles from unoccupied drivable space.
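One common way to represent such a boundary is a per-column boundary point with a label for what terminates the free space; the sketch below is a hypothetical representation, not OpenRoadNet's actual output format:

```cpp
// Minimal sketch: a drivable free-space boundary expressed as one boundary
// point per image column, each labeled with the type of obstacle that
// terminates the free space in that column.
#include <cstdio>
#include <vector>

enum class BoundaryLabel { Vehicle, Pedestrian, Curb, Other };

struct BoundaryPoint {
    int row;               // image row where free space ends in this column
    BoundaryLabel label;   // what bounds the free space there
};

// One entry per image column; rows below `row` (larger row index, closer to
// the ego vehicle in a front camera) are treated as drivable.
using FreeSpaceBoundary = std::vector<BoundaryPoint>;

bool isDrivable(const FreeSpaceBoundary& boundary, int column, int row) {
    if (column < 0 || column >= static_cast<int>(boundary.size())) return false;
    return row > boundary[column].row;
}

int main() {
    FreeSpaceBoundary boundary(640, {200, BoundaryLabel::Other});
    boundary[320] = {350, BoundaryLabel::Vehicle};  // a vehicle ahead in the center
    std::printf("center, row 400 drivable: %d\n", isDrivable(boundary, 320, 400));
    std::printf("center, row 300 drivable: %d\n", isDrivable(boundary, 320, 300));
    return 0;
}
```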





Path Perception

The NVIDIA DRIVE Perception pipeline for path perception consists of interacting algorithmic modules built around NVIDIA DRIVE Networks DNNs, including DNN post-processing and the ability to consume HD Map input. Capabilities include:

  • Camera-based path perception (using PathNet DNN)
  • Lane, road marking, and landmark detection (using MapNet DNN)
  • Path perception signal generation using HD Map input data
  • Machine learning algorithms that enable diversity and redundancy in path perception by combining multiple individual path perception signals (e.g. multiple DNN-based outputs, HD Map-based outputs, egomotion-based outputs) and generating a combined (ensemble) path perception output along with a confidence metric

PilotNet

PilotNet is trained on human driving behavior to predict driving trajectories for lane keeping, lane changes, and lane splits and merges.

PathNet

PathNet predicts all the drivable paths and lane dividers in images, regardless of the presence or absence of lane line markings.

MapNet

MapNet detects visual landmarks such as lane lines, crosswalks, text marks, and arrow marks on the road surface. It can detect features useful for path perception, as well as mapping and localization.



Wait Perception

The NVIDIA DRIVE Perception pipeline for wait condition perception consists of interacting algorithmic modules built around NVIDIA DRIVE Networks DNNs, including DNN post-processing and the ability to consume HD Map input. Capabilities include:

  • Camera-based wait condition perception, such as perception of intersections, traffic lights, and traffic signs (using WaitNet DNN)
  • Camera-based traffic light state classification (using LightNet DNN)
  • Camera-based traffic sign type classification (using SignNet DNN)

WaitNet

WaitNet detects intersections, classifies intersection type, and estimates the distance to the intersection. WaitNet also detects traffic lights and traffic signs.


LightNet

LightNet classifies traffic light types (solid vs. arrows) as well as traffic light state (e.g. red vs. green vs. yellow).


SignNet

SignNet classifies traffic sign types (e.g. stop, yield, speed limit).
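Taken together, the three DNNs compose into a detect-then-classify cascade: WaitNet proposes traffic lights and signs, and each proposal is routed to LightNet or SignNet for classification. The sketch below illustrates that flow with hypothetical types and placeholder classifiers, not the production interfaces:

```cpp
// Minimal sketch: wait-condition cascade. WaitNet-style proposals are
// cropped and routed to a light-state or sign-type classifier.
#include <cstdio>
#include <string>
#include <vector>

struct Crop { int x, y, w, h; };

enum class WaitKind { Intersection, TrafficLight, TrafficSign };

struct WaitDetection {       // one detector proposal
    WaitKind kind;
    Crop roi;                // region of interest in the camera image
    double distanceMeters;   // estimated distance to the wait condition
};

// Placeholder classifiers standing in for LightNet / SignNet inference.
std::string classifyLightState(const Crop&) { return "red"; }
std::string classifySignType(const Crop&)   { return "stop"; }

void processWaitConditions(const std::vector<WaitDetection>& dets) {
    for (const WaitDetection& d : dets) {
        switch (d.kind) {
        case WaitKind::TrafficLight:
            std::printf("light at %.0f m: %s\n", d.distanceMeters,
                        classifyLightState(d.roi).c_str());
            break;
        case WaitKind::TrafficSign:
            std::printf("sign at %.0f m: %s\n", d.distanceMeters,
                        classifySignType(d.roi).c_str());
            break;
        case WaitKind::Intersection:
            std::printf("intersection in %.0f m\n", d.distanceMeters);
            break;
        }
    }
}

int main() {
    processWaitConditions({
        {WaitKind::TrafficLight, {610, 120, 24, 56}, 45.0},
        {WaitKind::TrafficSign,  {880, 200, 40, 40}, 30.0},
        {WaitKind::Intersection, {0, 0, 0, 0},       50.0},
    });
    return 0;
}
```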




Advanced Functions Perception

The NVIDIA DRIVE Perception pipeline for advanced functions perception consists of interacting algorithmic modules built around NVIDIA DRIVE Networks DNNs, including DNN post-processing. Capabilities include:

  • Camera-based assessment of the cameras’ ability to see clearly (using ClearSightNet DNN)
  • Camera-based light source perception for automatic high beam control (using AutoHighBeamNet DNN)

ClearSightNet

ClearSightNet determines where the camera view is blocked and classifies the output into one of three classes (clean, blurred, blocked).
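A consumer of such output might aggregate per-region classes into a simple reliability decision; the sketch below assumes a hypothetical per-region output format, not ClearSightNet's actual interface:

```cpp
// Minimal sketch: aggregate per-region visibility classes into a decision
// on whether downstream camera perception should be trusted.
#include <cstdio>
#include <vector>

enum class Visibility { Clean, Blurred, Blocked };

// Fraction of image regions that are not clean.
double degradedFraction(const std::vector<Visibility>& regions) {
    int degraded = 0;
    for (Visibility v : regions)
        if (v != Visibility::Clean) ++degraded;
    return regions.empty() ? 1.0 : static_cast<double>(degraded) / regions.size();
}

int main() {
    // 3x3 grid of regions; one blocked corner, one blurred region.
    std::vector<Visibility> grid(9, Visibility::Clean);
    grid[0] = Visibility::Blocked;
    grid[4] = Visibility::Blurred;
    double frac = degradedFraction(grid);
    std::printf("degraded fraction: %.2f -> %s\n", frac,
                frac > 0.3 ? "flag camera as unreliable" : "camera usable");
    return 0;
}
```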


AutoHighBeamNet

AutoHighBeamNet generates a binary on/off control signal for automatic high beam control.
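A downstream consumer would typically debounce that signal so the headlights do not flicker on single-frame changes; the sketch below shows one such hysteresis scheme (illustrative consumer-side logic, not part of AutoHighBeamNet):

```cpp
// Minimal sketch: debounce a per-frame high-beam recommendation so the
// beams only toggle after several consecutive agreeing frames.
#include <cstdio>

class HighBeamController {
public:
    // framesToSwitch: consecutive disagreeing frames required before toggling.
    explicit HighBeamController(int framesToSwitch) : framesToSwitch_(framesToSwitch) {}

    bool update(bool recommendedOn) {
        if (recommendedOn == state_) { streak_ = 0; return state_; }
        if (++streak_ >= framesToSwitch_) { state_ = recommendedOn; streak_ = 0; }
        return state_;
    }

private:
    int framesToSwitch_;
    int streak_ = 0;
    bool state_ = false;  // high beams start off
};

int main() {
    HighBeamController ctrl(3);
    bool perFrame[] = {true, true, false, true, true, true, false, false, false};
    for (bool rec : perFrame)
        std::printf("%d", ctrl.update(rec));
    std::printf("\n");  // beams switch on only after 3 consecutive 'on' frames
    return 0;
}
```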