How to Build Vision AI Pipelines Using NVIDIA DeepStream Coding Agents

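Developing real-time vision AI applications presents major challenges for developers, often demanding complex data pipelines, countless lines of code, and lengthy development cycles.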

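NVIDIA DeepStream 9 removes these development barriers by using coding agents such as Claude Code or Cursor, helping you create deployable, optimized code so you can bring vision AI applications to life faster.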

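This new approach simplifies building complex multi-camera pipelines that ingest, process, and analyze large volumes of real-time video, audio, and sensor data. Built on GStreamer and part of the NVIDIA Metropolis vision AI development platform, DeepStream accelerates the developer's path from concept to actionable insights across industries.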

Video 1. How to use NVIDIA DeepStream coding agents to generate complete vision AI pipelines from natural language prompts.

A recorded walkthrough of building DeepStream vision AI pipelines with Claude Code or Cursor is also available.

Build a video analytics application with NVIDIA Cosmos Reason 2

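You can use NVIDIA Cosmos Reason 2, a more accurate, open, reasoning vision language model (VLM) for physical AI, to build a video analytics application that ingests hundreds of camera streams simultaneously and analyzes them with the VLM.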

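The application scales dynamically, with no redeployment time wasted on adding cameras or swapping models and no guessing at bottlenecks. The coding agent understands your hardware and generates an application optimized for it.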

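Rather than extensive hand-written code, a single prompt can generate a complete production-grade microservice in one development session, including a REST API, health monitoring, deployment automation, and Kafka integration.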

How to generate a VLM-powered vision AI application:

Step 1: Install the DeepStream Coding Agent skill for Claude Code or Cursor. You can generate code anywhere, but deployment requires the minimum hardware listed on GitHub.

Step 2: Paste the following prompt into the agent to generate a scalable VLM pipeline that supports dynamic N-stream ingestion and per-stream batching.

Implement a Python application that uses a multi-modal VLM to summarize video frames and sends summaries to a remote server via Kafka.
Architecture:
  1. DeepStream Pipeline: Use DeepStream pyservicemaker APIs to receive N RTSP
     streams, decode video, and convert frames to RGB format. Process each stream
     independently — do not mux streams together.
  2. Frame Sampling & Batching: Use MediaExtractor to sample frames at a
     configurable interval (e.g. 1 frame every 10 seconds). When the VLM supports
     multi-frame input, batch sampled frames over a configurable duration (e.g.
     1 minute) before sending to the model. Each batch must contain frames from a
     single stream only.
  3. VLM Backend: Implement a module that receives a batch of decoded video frames
     and returns a text summary from the multi-modal VLM.
  4. Kafka Output: Send each text summary to a remote server using Kafka.
  Constraints:
  - Scalable to hundreds of RTSP streams across multiple GPUs on a single node.
    Distribute processing load across all available GPUs.
  - Never mix frames from different RTSP streams in a single batch.
  Store output in the rtvi_app directory.
  Also generate a README.md with instructions to setup kafka server, vLLM, and
  how to run the application.
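
The Kafka output stage in item 4 of the prompt could look roughly like the following minimal sketch. The field names, the topic name in the test, and the helper names are hypothetical and not taken from the generated application. The producer is any object with a kafka-python style send(topic, value=..., key=...) method. Keying each record by stream ID is one way to keep a stream's summaries ordered within a partition.

```python
import json
from typing import Any, Protocol


class ProducerLike(Protocol):
    """Anything exposing a kafka-python style send(topic, value=..., key=...)."""

    def send(self, topic: str, value: bytes, key: bytes) -> Any: ...


def build_summary_record(stream_id: str, summary: str,
                         start_ts: float, end_ts: float) -> tuple[bytes, bytes]:
    """Serialize one per-stream summary as a (key, value) pair of bytes.

    The payload layout is an illustrative assumption.
    """
    payload = {
        "stream_id": stream_id,
        "summary": summary,
        "window_start": start_ts,
        "window_end": end_ts,
    }
    key = stream_id.encode("utf-8")
    value = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return key, value


def publish_summary(producer: ProducerLike, topic: str, stream_id: str,
                    summary: str, start_ts: float, end_ts: float) -> None:
    """Send one summary; the stream ID doubles as the partitioning key."""
    key, value = build_summary_record(stream_id, summary, start_ts, end_ts)
    producer.send(topic, value=value, key=key)
```

With kafka-python installed, a KafkaProducer instance could be passed as the producer argument.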

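You can customize parameters such as the frame sampling interval (for example, 1 frame every 10 seconds). Cosmos-Reason2-8B does not impose a fixed frame limit; instead, it uses a large context window (up to 256K tokens) and samples frames dynamically based on fps and resolution.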
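
To make the sampling and batching constraints concrete, here is a minimal pure-Python sketch of per-stream sampling and batching. It is not the code the agent generates and does not use the pyservicemaker MediaExtractor; the StreamBatcher class and its parameters are hypothetical. Each stream keeps its own buffer, so a batch can never mix frames from different streams.

```python
from typing import Any, Dict, List, Optional


class StreamBatcher:
    """Sample frames per stream at a fixed interval and group them into batches."""

    def __init__(self, sample_interval_s: float = 10.0,
                 batch_duration_s: float = 60.0) -> None:
        self.sample_interval_s = sample_interval_s
        self.batch_duration_s = batch_duration_s
        self._last_sample: Dict[str, float] = {}   # last sampled timestamp per stream
        self._batch_start: Dict[str, float] = {}   # first timestamp of the open batch
        self._frames: Dict[str, List[Any]] = {}    # open batch per stream

    def offer(self, stream_id: str, ts: float, frame: Any) -> Optional[List[Any]]:
        """Offer a decoded frame; return a finished batch for this stream, or None."""
        last = self._last_sample.get(stream_id)
        if last is not None and ts - last < self.sample_interval_s:
            return None  # too soon after the previous sample: drop the frame
        self._last_sample[stream_id] = ts
        start = self._batch_start.setdefault(stream_id, ts)
        self._frames.setdefault(stream_id, []).append(frame)
        # Close the batch once the next sample would fall outside the window.
        if ts + self.sample_interval_s - start >= self.batch_duration_s:
            del self._batch_start[stream_id]
            return self._frames.pop(stream_id)
        return None
```

With the defaults, a stream delivering frames continuously yields one batch of six frames per minute, which would then be handed to the VLM backend.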

Step 3: Now that you have a working application, let's make it deployment-ready.

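With one more prompt, you can turn it into a complete production microservice, equipped with a representational state transfer (REST) API for managing streams dynamically, health probes for orchestration, observability metrics, a Dockerfile for containerization, and deployment scripts, and have it running in minutes: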

Need to create microservice for the app in @rtvi_app directory. Follow the
  steps below to complete that.
  - Create FastAPI based server and implement the endpoints mentioned in the
    attached image @rtvi_vlm_openapi_spec.png.
  - Create dockerfile to package the everything together which can later be used
    to generate docker image.
  - Create deployment guide to run the microservice.
  IMPORTANT
  - Need to generate production ready code and don't create dummy implementation for any of the endpoint.
  - Update the code in @rtvi_app if it is required for having the working
    implementation of the endpoints.

Step 4: The generated code includes deployment scripts, and the API is accessible through the Swagger UI at http://localhost:8080/docs or via curl. A similar example is available on GitHub.

Generate efficient real-time CV applications with any model

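Now let's go deeper. Suppose you want to build a real-time application with an open source model such as YOLO26. To plug any model into DeepStream, you need to know three things: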

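Input tensor: shape and scaling (for example, [batch, 3, 640, 640] with normalized pixels)

Output tensor: the name and shape of the output tensor (for example, [batch, 300, 6], where each row is [x1, y1, x2, y2, conf, class_id])

Post-processing: any operations needed to extract final detections from the raw model output, for example whether non-maximum suppression (NMS) is built into the model or has to be applied as a post-processing step after the model's last layer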

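You can get these from the model card, use any model visualization or inspection tool (for example, Netron, VisualDL, or Zetane), or simply run onnx.load() and print the graph's input and output shapes. Or skip all of that and feed the model file directly to the coding agent, which will inspect the model for you and pull in the libraries it needs for the inspection.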
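
As a rough illustration of the onnx.load() route, the sketch below walks a loaded model's graph and collects input and output shapes. The helper names describe_io and describe_file are hypothetical, and the onnx package is assumed to be installed only for describe_file. Symbolic (dynamic) axes such as the batch dimension come back as strings.

```python
from typing import Any, Dict, List


def _dims(value_info: Any) -> List[Any]:
    """Return a tensor's dimensions; dynamic axes appear as their symbolic name."""
    dims = value_info.type.tensor_type.shape.dim
    # dim_param holds a symbolic name, dim_value a fixed size; "?" if neither is set.
    return [d.dim_param or d.dim_value or "?" for d in dims]


def describe_io(model: Any) -> Dict[str, Dict[str, List[Any]]]:
    """Summarize the input and output tensors of an ONNX ModelProto."""
    graph = model.graph
    # Some exports list weights (initializers) among the graph inputs; skip them.
    weights = {init.name for init in graph.initializer}
    return {
        "inputs": {i.name: _dims(i) for i in graph.input if i.name not in weights},
        "outputs": {o.name: _dims(o) for o in graph.output},
    }


def describe_file(path: str) -> Dict[str, Dict[str, List[Any]]]:
    """Load an ONNX file from disk and describe its inputs and outputs."""
    import onnx  # imported lazily so describe_io works without the package

    return describe_io(onnx.load(path))
```

For a detector like the one described here, the outputs entry would be expected to show a tensor shaped along the lines of [batch, 300, 6].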

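Think of it this way: you are bringing a custom model into DeepStream's hardware-optimized video analytics pipeline. Once you describe the model (input shape and output format), DeepStream handles the rest. Efficient buffer management keeps GPU decode, compute, and downstream processing fully utilized, delivering low latency on your hardware.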

The steps to generate a YOLO26 detection application with the DeepStream coding agent are as follows:

Step 1: Make sure you have installed the DeepStream Coding Agent skill for Claude Code or Cursor. You can generate code anywhere, but deployment requires the minimum hardware listed on GitHub.

Step 2: Paste this prompt into the agent:

Download the YOLO26s detection model using the ultralytics library, then convert the model to ONNX model that supports dynamic batch, in a Python virtual environment. Write a DeepStream custom parsing library for the model. Use DeepStream SDK pyservicemaker APIs to develop the python application that can do the following.
 - Read from file, decode the video and infer using the model.
 - The custom parsing library is used in nvinfer's configuration file.
 - Display the bounding box around detected objects using OSD.
 Save the generated code in yolo_detection directory.
 The app should support RTSP streams as input.

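Step 3: The agent generates a complete application consisting of multiple files (a model download script, the pipeline application, the inference configuration file, and so on).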

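Let's focus on the file that matters most for model integration: the inference configuration file. The three things you need to know (input tensor, output tensor, and post-processing) map directly onto this file: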

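Input tensor: This tells DeepStream how to preprocess the upstream GPU buffers (resize to 640 × 640 and scale pixel values by 1/255) and feed them to TensorRT. On first run, the ONNX file is automatically converted into a TensorRT engine optimized for your exact GPU and batch size.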

The inference configuration will have:

infer-dims=3;640;640
# 1/255 written as a decimal, since the key expects a floating-point value
net-scale-factor=0.0039215686
onnx-file=yolo26s.onnx
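
To show what the scale factor does numerically, the sketch below applies the nvinfer-style normalization y = net-scale-factor * (x - offset) to a few RGB pixels in plain Python. The function name scale_pixels is hypothetical, and zero offsets are assumed because the configuration above sets none. In the real pipeline this step runs on the GPU.

```python
from typing import Iterable, List, Sequence, Tuple


def scale_pixels(pixels: Iterable[Sequence[float]],
                 net_scale_factor: float = 1.0 / 255.0,
                 offsets: Sequence[float] = (0.0, 0.0, 0.0)) -> List[Tuple[float, ...]]:
    """Normalize 8-bit channel values as net_scale_factor * (value - offset)."""
    return [
        tuple(net_scale_factor * (channel - offset)
              for channel, offset in zip(pixel, offsets))
        for pixel in pixels
    ]


# One pixel with a black, mid-gray, and full-intensity channel value.
normalized = scale_pixels([(0, 128, 255)])
```

With a factor of 1/255 and zero offsets, channel values in the range 0 to 255 map to roughly the range 0 to 1, which matches the normalized-pixel input the model expects.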

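Output tensor and post-processing: The agent generates an NvDsInferParseCustomYolo function that reads the output blob by name (for example, output0 in yolo26s), a [300, 6] tensor in which each row is [x1, y1, x2, y2, conf, class_id], and converts each detection into an NvDsInferObjectDetectionInfo struct.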

extern "C" bool NvDsInferParseCustomYolo(
      std::vector<NvDsInferLayerInfo> const &outputLayersInfo,
      ...
      std::vector<NvDsInferObjectDetectionInfo> &objectList)
  {
      // Find output layer "output0" → [300, 6]
      ...
      const float *det = data + i * 6;
      // [x1, y1, x2, y2, conf, class_id]
      obj.classId = static_cast<unsigned int>(det[5]);
      obj.detectionConfidence = det[4];
      obj.left = det[0];  obj.top = det[1];
      obj.width = det[2] - det[0];  obj.height = det[3] - det[1];
      ...
      objectList.push_back(obj);
  }

This is what populates the object metadata in the downstream NvDsBatchMeta. The inference configuration will have:

custom-lib-path=nvdsinfer_custom_impl_yolo/libnvdsinfer_custom_impl_yolo.so
parse-bbox-func-name=NvDsInferParseCustomYolo
output-blob-names=output0
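
The row-to-box conversion performed by the custom parser can be mirrored in a few lines of Python for quick offline checks of a model's raw output. This is only a sketch of the same arithmetic, not the generated C++ library. The Detection class, the parse_rows name, and the confidence threshold are hypothetical, since the filtering logic is elided in the excerpt above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Detection:
    class_id: int
    confidence: float
    left: float
    top: float
    width: float
    height: float


def parse_rows(rows: Iterable[Sequence[float]],
               conf_threshold: float = 0.25) -> List[Detection]:
    """Convert [x1, y1, x2, y2, conf, class_id] rows into left/top/width/height boxes."""
    detections = []
    for x1, y1, x2, y2, conf, class_id in rows:
        if conf < conf_threshold:
            continue  # assumed filter: drop low-confidence rows
        detections.append(
            Detection(
                class_id=int(class_id),
                confidence=float(conf),
                left=x1,
                top=y1,
                width=x2 - x1,    # corner format to width
                height=y2 - y1,   # corner format to height
            )
        )
    return detections
```

Feeding it the rows of one [300, 6] output tensor gives a list of boxes in the same left, top, width, height form that the C++ parser writes into each NvDsInferObjectDetectionInfo.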

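Step 4: To turn this into a production microservice, just as in the VLM application example above (Step 3), use a similar prompt to add FastAPI endpoints for stream management, health probes, metrics, a Dockerfile, and deployment scripts.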

Step 5: Deploy with the generated scripts and access the API through the Swagger UI at http://localhost:8080/docs or via curl.

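These two applications are just the beginning. From multi-camera tracking to audio analytics to custom inference chains, the same skill can generate any DeepStream pipeline.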

See the repository for more example prompts. Use them as a reference when writing your own prompts for any vision AI application you can imagine.

Redefining vision AI development

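DeepStream accelerates vision AI development through agentic workflows, cutting coding time from weeks to hours. With natural language prompts, developers can quickly plug in models, configure camera streams, and deploy analytics applications. Optimized for NVIDIA hardware, DeepStream delivers more streams and analytics per dollar than general-purpose pipelines, maximizing performance from edge to cloud.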

Get started

Download the latest SDK on NGC to get started with DeepStream on Jetson, data center GPUs, or in the cloud.