Access to the raw depth data

Because of the way OpenNI is structured, it isn't really possible for multiple applications to access the sensor at the same time: only one process can hold a connection to the depth camera at once.

Therefore, to support the simultaneous use of multiple applications that rely on the depth sensor, we make the depth data available through memory-mapped files, an extremely efficient form of interprocess communication.

Using the memory-mapped file

The segmentation is written to the file segmentation in the DataDirectory specified in your config.yml. (If no DataDirectory is specified, it defaults to C:\Users\<Username>\AppData\Local\3Gear[PMD] on Windows or /Users/<Username>/Library/Application Support/3Gear[PMD] on Mac.) You can open it with the memory-mapped file routines specific to your OS (e.g. mmap or CreateFileMapping), but at 3Gear we find the easiest approach in C++ is to use the Boost.Interprocess library.

To determine the size of the depth image and the associated camera parameters, look at the camera parameters passed in the WelcomeMessage. The OpenCVCamera class also provides the data necessary for going from depth values to 3D points; you'll need to use routines in OpenCV if you want to invert the distortion, however.

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <cstdio>
#include <cstdlib>

using HandTracking::OpenCVCamera;

const char* segmentationFilePath = "/my/data/dir/segmentation";
boost::interprocess::file_mapping depthDataMMappedFile (
    segmentationFilePath, boost::interprocess::read_only);
boost::interprocess::mapped_region depthDataMMappedRegion (
    depthDataMMappedFile, boost::interprocess::read_only);

const OpenCVCamera& camera = welcomeMessage.getCameras().front(); // Saved from the WelcomeMessage
const size_t imageWidth = camera.imageWidth;
const size_t imageHeight = camera.imageHeight;
const size_t numCameras = 1;
const size_t numHands = 2;

// Four sections per camera: raw depth, background-subtracted depth,
// and one segmentation image per hand.
const size_t nTotalPixels = numCameras * (2 + numHands) * imageWidth * imageHeight;
if (depthDataMMappedRegion.get_size() != (nTotalPixels * sizeof(float)))
{
    std::fprintf (stderr, "Mapped segmentation region is not the correct size.\n");
    std::exit (1);
}

const float* rawData = reinterpret_cast<const float*> (depthDataMMappedRegion.get_address());
const float* bgSubData = rawData + numCameras * imageWidth * imageHeight;
const float* leftSeg = bgSubData + numCameras * imageWidth * imageHeight;
const float* rightSeg = leftSeg + numCameras * imageWidth * imageHeight;

The file is updated every time a Pose message is sent, and the code is structured so that the update to the file is guaranteed to finish before the message goes out. Beyond this, however, there is no synchronization on the file (since we only use it for debugging, we did not think it necessary). If you plan to do additional processing on the data, we recommend copying it to another location as soon as you receive it, as sketched below. Since the file only updates at 30 fps, this is likely to be good enough for most applications (the odds that it will update before you get a chance to copy it are fairly low); if necessary, however, we could certainly add interprocess synchronization in the future.
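For example, here is a minimal sketch of that copy, assuming the mapped_region from the snippet above (the snapshotSegmentation helper is hypothetical, not part of the API):

#include <boost/interprocess/mapped_region.hpp>
#include <vector>

// Snapshot the shared file into a private buffer as soon as a Pose message
// arrives, so that later processing cannot race with the server's next update.
std::vector<float> snapshotSegmentation (const boost::interprocess::mapped_region& region)
{
    const float* src = static_cast<const float*> (region.get_address());
    return std::vector<float> (src, src + region.get_size() / sizeof(float));
}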

File format

There are four main sections in the depth data file:

1. The raw depth data from the sensor
2. The background-subtracted depth data
3. The left hand segmentation
4. The right hand segmentation

Each section has the same format: a single width×height image for each available depth camera (the width and height values should be read from the WELCOME message; see below).

Each depth data value is a single floating point number representing the depth in meters. Values of 0 and FLT_MAX represent data that has been culled for various reasons.
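As an illustration, here is one way to read and validate a single pixel, using the section base pointers computed in the snippet above. The row-major pixel layout (y * width + x) is our assumption, and depthAt is a hypothetical helper:

#include <cfloat>
#include <cstddef>

// Read the depth (in meters) at pixel (x, y) of camera 'cam' from one of the
// section base pointers (rawData, bgSubData, leftSeg or rightSeg).
// Returns false if the pixel was culled (0 or FLT_MAX).
// Assumes row-major pixel order within each image.
bool depthAt (const float* section, size_t cam, size_t imageWidth, size_t imageHeight,
              size_t x, size_t y, float* outMeters)
{
    const float d = section[cam * imageWidth * imageHeight + y * imageWidth + x];
    if (d == 0.0f || d == FLT_MAX)
        return false; // culled pixel; no valid depth here
    *outMeters = d;
    return true;
}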

Camera model

You may wish to show the raw depth data in the same space that we track hands in, which requires understanding 3Gear's camera model: we use the standard OpenCV camera model. Consult the OpenCV documentation for further details about this model.

Getting the camera parameters

Details about the camera model can be read from the WELCOME message that the server sends upon connection. The camera is assumed to remain fixed throughout the course of tracking.

extrinsics
The matrix taking world-space points (those returned by the various HandTrackingMessages) to camera space, where we are looking down the z axis.
imageWidth imageHeight
The size of the image.
fx fy
The focal lengths, in pixels, as specified in the OpenCV camera model.
cx cy
The camera center, from the OpenCV model.
k1…p2
The distortion parameters, from the OpenCV model.
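To make the roles of these parameters concrete, here is a minimal sketch of the world-to-image projection under the pinhole model, ignoring distortion for brevity. The 4×4 extrinsics type is our assumption, and worldToImage is a hypothetical helper; the real OpenCVCamera class already provides routines like this:

#include <opencv2/core.hpp>

// Project a world-space point into the image using the parameters above.
// 'extrinsics' is the 4x4 world-to-camera matrix; distortion is ignored here.
cv::Point2f worldToImage (const cv::Matx44f& extrinsics, const cv::Point3f& world,
                          float fx, float fy, float cx, float cy)
{
    // World space -> camera space (looking down the z axis).
    const cv::Vec4f cam = extrinsics * cv::Vec4f (world.x, world.y, world.z, 1.0f);

    // Pinhole projection: perspective divide, then scale by the focal
    // lengths and offset by the camera center.
    return cv::Point2f (fx * cam[0] / cam[2] + cx,
                        fy * cam[1] / cam[2] + cy);
}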

The OpenCVCamera class contains routines for going from world space to image space. To go the other way, from image-space pixels (and their depth values) back to world space, you will likely need to use the OpenCV routine undistortPoints if the camera has distortion.
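For instance, here is a sketch of that back-projection, assuming the intrinsics and distortion parameters from the WELCOME message (the backProject helper is hypothetical):

#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Back-project a (distorted) depth pixel to a 3D camera-space point.
cv::Point3f backProject (float px, float py, float depthMeters,
                         double fx, double fy, double cx, double cy,
                         double k1, double k2, double p1, double p2)
{
    const cv::Mat K = (cv::Mat_<double>(3, 3) << fx, 0, cx, 0, fy, cy, 0, 0, 1);
    const cv::Mat dist = (cv::Mat_<double>(1, 4) << k1, k2, p1, p2);

    // undistortPoints maps a distorted pixel to normalized image
    // coordinates (x/z, y/z), inverting the lens distortion.
    std::vector<cv::Point2f> in { cv::Point2f (px, py) }, out;
    cv::undistortPoints (in, out, K, dist);

    // Scale the normalized ray by the depth to recover the camera-space point.
    return cv::Point3f (out[0].x * depthMeters, out[0].y * depthMeters, depthMeters);
}

Applying the inverse of the extrinsics matrix to the result then takes the point back into world space.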