VisionWorks Toolkit Reference

December 18, 2015 | 1.2 Release

Programming Model Overview

VisionWorks is based on the OpenVX programming model.

This section summarizes its most important aspects. More details can be found in the OpenVX specification.

Object Life Cycle

In order to safely manage the destruction of objects, VisionWorks uses a reference-counting mechanism. The application obtains references to objects and cannot destroy them directly. Instead, the destruction of an object is managed by the VisionWorks framework and occurs when the object's reference counter reaches zero, meaning that the object is no longer needed by the application.

The life cycle of any object is then:

  1. Creation: the object is created and the application acquires a reference to the object.
  2. Usage: the application uses the object reference.
  3. Release: the application releases the object reference when it no longer needs to use or access the object.
  4. Destruction: the VisionWorks framework destroys the object once it is no longer referenced.
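
For illustration, a minimal sketch of this cycle for an image object, assuming a valid vx_context named context:

vx_image image = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);   // 1. Creation: the application acquires a reference
vx_uint32 width = 0;
vxQueryImage(image, VX_IMAGE_ATTRIBUTE_WIDTH, &width, sizeof width); // 2. Usage: the object is used through the reference
vxReleaseImage(&image);                                              // 3. Release: the reference is given back and set to NULL
// 4. Destruction: performed by the framework once the reference counter reaches zero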

Memory Model

Data containers (such as vx_image, vx_array, ...) are opaque objects, therefore access to their data (such as image pixels, array elements, ...) must be requested explicitly by the application. The handle returned by the request must be released back to VisionWorks before further execution of any primitives that use the data object. This memory model allows VisionWorks to efficiently and safely manage the complex memory hierarchy of heterogeneous hardware systems. For instance, an image can be moved automatically across the system and the resynchronization with the application running on the CPU happens only when needed by the application.
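
For illustration, a minimal sketch of the request/release pattern for CPU access to image pixels, assuming a valid 640x480 VX_DF_IMAGE_U8 image named image:

vx_rectangle_t rect = { 0, 0, 640, 480 };
vx_imagepatch_addressing_t addr;
void* base = NULL;
// Request access to the pixel data; VisionWorks synchronizes it to the CPU if necessary
vxAccessImagePatch(image, &rect, 0, &addr, &base, VX_READ_AND_WRITE);
// ... read or modify pixels through base and addr ...
// Return the handle before any primitive that uses the image is executed
vxCommitImagePatch(image, &rect, 0, &addr, base);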

Execution Model

Primitives can be executed in two ways:

  • Graph-based execution
    Primitives are instantiated as graph nodes. The graph is built, verified, and optimized ahead of time and can be executed multiple times without re-verification at run time. Graph-based execution should be preferred for vision pipelines that are executed multiple times (when processing a video stream, for instance), as it gives the best performance.
  • Immediate execution
    Primitives are executed directly by calling a function (prefixed with vxu or nvxu), similar to the OpenCV or NPP execution mode. The immediate execution model is useful for one-time processing where the primitive setup overhead is not a major concern. It can also be useful as an intermediate step in application development, for example when porting an application that uses OpenCV. A sketch contrasting the two modes follows this list.
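
For illustration, a minimal sketch of both execution modes, assuming a valid vx_context named context and two VX_DF_IMAGE_U8 images named src and dst:

// Immediate execution: the primitive runs as a direct function call
vxuBox3x3(context, src, dst);
// Graph-based execution: build and verify once, then execute repeatedly
vx_graph graph = vxCreateGraph(context);
vxBox3x3Node(graph, src, dst);
vxVerifyGraph(graph);          // ahead-of-time verification and optimization
for (int i = 0; i < 10; ++i)
    vxProcessGraph(graph);     // no re-verification at run time
vxReleaseGraph(&graph);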

Thread Safety

The VisionWorks API may be used from multiple threads, subject to restrictions. VisionWorks provides the following safety guarantees:

  • If there are no other concurrent function calls in progress during the entire execution of the function, then any function may be called with any arguments from any thread.
  • If two or more calls are made concurrently, then the following conditions must hold:

    • All objects that are used by more than one of the calls must be either contexts or data objects (where the term data object has the same meaning as in OpenVX).
    • None of these shared objects may be destroyed in any possible sequential execution of the calls.
    • If the same buffer of data, or overlapping buffers, are passed as arguments to more than one of the calls, then none of these calls may write to these buffers.
    • If one of the calls launches or verifies a graph, then no other calls may involve objects that are parameters to one or more nodes of that graph.

    If any of these conditions fails to hold, the behavior is undefined; otherwise, the effect of the calls is as if they had been made sequentially, in some unpredictable order.

Valid Concurrent Calls

Examples of valid concurrent calls follow.

Valid No Shared Objects Example
vx_uint16 version1 = 0, version2 = 0;
// In thread 1:
vxQueryContext(ctx1, VX_CONTEXT_ATTRIBUTE_VERSION, &version1, sizeof version1);
// In thread 2:
vxQueryContext(ctx2, VX_CONTEXT_ATTRIBUTE_VERSION, &version2, sizeof version2);
// Valid: the two calls have no arguments in common.
Valid Shared Context Example
vx_border_mode_t border = { VX_BORDER_MODE_REPLICATE };
vx_uint32 numKernels = 0;
// In thread 1:
vxSetContextAttribute(ctx, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_BORDER_MODE, &border, sizeof border);
// In thread 2:
vxQueryContext(ctx, VX_CONTEXT_ATTRIBUTE_UNIQUE_KERNELS, &numKernels, sizeof numKernels);
// Valid: the common object is a context.
Valid Shared Data Object Example
vx_lut lut = vxCreateLUT(ctx, VX_TYPE_UINT8, 256);
void *ptr1 = 0, *ptr2 = 0;
// In thread 1:
vxAccessLUT(lut, &ptr1, VX_READ_ONLY);
// In thread 2:
vxAccessLUT(lut, &ptr2, VX_READ_ONLY);
// Valid: the common object is a LUT, which is a data object.
// Both threads will succeed.
Valid Unpredictable Shared Data Object Example
vx_lut lut = vxCreateLUT(ctx, VX_TYPE_UINT8, 256);
void *ptr1 = 0, *ptr2 = 0;
// In thread 1:
vxAccessLUT(lut, &ptr1, VX_READ_ONLY);
// In thread 2:
vxAccessLUT(lut, &ptr2, VX_WRITE_ONLY);
// Valid: the common object is a LUT, which is a data object.
// However, it's not allowed to map an object for both reading
// and writing. Thus, one of the calls will fail, and it's unpredictable which.
Valid Shared Read-Only Buffer Example
vx_border_mode_t border = { VX_BORDER_MODE_REPLICATE };
// In thread 1:
vxSetContextAttribute(ctx1, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_BORDER_MODE, &border, sizeof border);
// In thread 2:
vxSetContextAttribute(ctx2, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_BORDER_MODE, &border, sizeof border);
// Valid: `border` is shared between the two calls, but none of them writes to it.

Invalid Concurrent Calls

Examples of invalid concurrent calls follow.

Invalid Shared Framework Object Example
vx_graph graph = vxCreateGraph(ctx);
vx_node node = vxHistogramNode(graph, 0, 0);
// In thread 1:
vxSetParameterByIndex(node, 0, 0);
// In thread 2:
vxSetParameterByIndex(node, 1, 0);
// Invalid: `node` is a non-context framework object, and is shared between the two calls.
Invalid Shared Buffer Example
vx_border_mode_t border;
// In thread 1:
vxQueryContext(ctx, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_BORDER_MODE, &border, sizeof border);
// In thread 2:
vxSetContextAttribute(ctx, VX_CONTEXT_ATTRIBUTE_IMMEDIATE_BORDER_MODE, &border, sizeof border);
// Invalid: `border` is shared between the two calls, and the first one writes to it.
Invalid Overlapping Buffers Example
vx_matrix matrix = vxCreateMatrix(ctx, VX_TYPE_UINT8, 2, 2);
vx_uint8 data[6];
// In thread 1:
vxReadMatrix(matrix, &data[0]);
// In thread 2:
vxReadMatrix(matrix, &data[2]);
// Invalid: the given buffers overlap, and are used for writing.
Invalid Object Destruction Example
vx_matrix matrix = vxCreateMatrix(ctx, VX_TYPE_UINT8, 1, 1);
vx_size size;
// In thread 1:
vxQueryMatrix(matrix, VX_MATRIX_ATTRIBUTE_SIZE, &size, sizeof size);
// In thread 2:
vxReleaseMatrix(&matrix);
// Invalid: if the second call is executed first, `matrix`, which is used by the first call,
// is destroyed.
Invalid Graph Execution Example
vx_graph graph = vxCreateGraph(ctx);
vx_image image1 = vxCreateImage(ctx, 10, 10, VX_DF_IMAGE_NV12);
vx_image image2 = vxCreateImage(ctx, 10, 10, VX_DF_IMAGE_IYUV);
vxColorConvertNode(graph, image1, image2);
vx_enum space = VX_COLOR_SPACE_BT709;
// In thread 1:
vxProcessGraph(graph);
// In thread 2:
vxSetImageAttribute(image1, VX_IMAGE_ATTRIBUTE_SPACE, &space, sizeof space);
// Invalid: an object is accessed concurrently with a verification/execution of a graph
// involving that object.

Multi-Device Systems

A host system (especially a desktop system) can have multiple GPU devices. The VisionWorks API can be used on such systems with some limitations.

A VisionWorks context is bound to a single device that is selected as current prior to VisionWorks context creation:

// Initialize VisionWorks context on device #1
cudaSetDevice(1);
vx_context context = vxCreateContext();

VisionWorks data objects are allocated on that device, and all VisionWorks primitives are executed on that device, independently of the current device selection. VisionWorks API calls do not affect the current device selection:

// Switch to device #0
cudaSetDevice(0);
// User CUDA allocations will be done on the device #0
void* my_cuda_ptr = NULL;
cudaMalloc(&my_cuda_ptr, 100 * 256 * sizeof(int));
// VisionWorks images will be allocated on the device #1
vx_image im0 = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
vx_image im1 = vxCreateImage(context, 640, 480, VX_DF_IMAGE_U8);
// VisionWorks nodes will be launched on the device #1
vxuBox3x3(context, im0, im1);
// User code will be launched on the device #0
myKernel<<<100, 256>>>(my_cuda_ptr);

To map VisionWorks data objects to CUDA memory space, the VisionWorks context's CUDA device must be selected as current prior to the access call; otherwise, the access call will fail.

// To process the mapped memory, we must switch to the device #1
cudaSetDevice(1);
void* mapped_cuda_ptr = NULL;
vx_imagepatch_addressing_t addr;
vx_rectangle_t rect = { 0, 0, 640, 480 };
vxAccessImagePatch(im1, &rect, 0, &addr, &mapped_cuda_ptr, NVX_READ_ONLY_CUDA);
dim3 block(16, 16);
dim3 grid(640 / 16, 480 / 16);
myKernel<<<grid, block>>>(mapped_cuda_ptr);
vxCommitImagePatch(im1, NULL, 0, &addr, mapped_cuda_ptr);

Imported CUDA device memory must be allocated on the same device that was used for VisionWorks context initialization; otherwise, it leads to undefined behavior. That device must be selected as current when the import function is called.

// Imported memory must be allocated on the device #1
cudaSetDevice(1);
void* import_cuda_ptr = NULL;
size_t pitch;
cudaMallocPitch(&import_cuda_ptr, &pitch, 640 * sizeof(vx_uint8), 480);
vx_imagepatch_addressing_t addrs[1];
addrs[0].dim_x = 640;
addrs[0].dim_y = 480;
addrs[0].stride_x = sizeof(vx_uint8);
addrs[0].stride_y = pitch;
void *ptrs[] = {import_cuda_ptr};
vx_image imported_image = vxCreateImageFromHandle(context, VX_DF_IMAGE_U8, addrs, ptrs, NVX_IMPORT_TYPE_CUDA);

The device that was selected as current prior to VisionWorks context creation can be obtained via vxQueryContext with NVX_CONTEXT_ATTRIBUTE_INITIAL_CUDA_DEVICE_ID.

int device_id;
vxQueryContext(context, NVX_CONTEXT_ATTRIBUTE_INITIAL_CUDA_DEVICE_ID, &device_id, sizeof(device_id));
cudaSetDevice(device_id);