Data annotations

We provide manually produced annotations of several kinds:

- Temporal segmentation: the temporal segmentation of the devel01 to devel20 batches into individual gestures;

- Body part annotations: the position of the head, shoulders, elbows and hands for over 400 frames sampled from the devel01 to devel20 batches;

- Image alignment: translation and scale transform to align the RGB image with the depth image.

- Lexicon name and user ID: for each batch, we provide the name of the lexicon and the user ID in Info_devel_valid.txt. The file also includes depth normalization information, which can be used to restore the actual depth values, as well as data quality information. Post-challenge: we are now releasing this information for the final datasets: Info_final1.txt and Info_final2.txt. See README.txt for details.

- Lexicon database [New]: All the gesture lexicons used to collect the gesture challenge data and a few extra ones.

- Batch categorization: for each batch, we provide 18 features that were extracted by data visualization, including:

Body-movement (static/dynamic), Gesture (static/dynamic), Body parts involved (arms/hands/fingers), Background (blank or not), Speed (slow/medium/fast), Skin color, etc. The information is gathered in an Excel spreadsheet (a loading sketch is given after this list).

- Data labels [New]: To facilitate research, we are providing the labels for the validation and final data, which were not available to the challenge participants.
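The sketch below shows one way to load and filter the batch categorization spreadsheet with pandas. The file name and column headers used here are hypothetical; the actual ones are defined by the released archive.

```python
import pandas as pd  # assumed available for reading Excel files

# Hypothetical file and column names: the actual spreadsheet name and the
# headers of the 18 features are defined by the released archive.
features = pd.read_excel("batch_categorization.xlsx", index_col="Batch")

# Example: select batches with a blank background and static body movement.
subset = features[(features["Background"] == "blank") &
                  (features["Body-movement"] == "static")]
print(subset.index.tolist())
```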

[ See the bottom of this page to download the archives ]

Temporal segmentation (February 2012)

We manually determined the start frame and the end frame of each gesture by watching the videos.

We show in Figure 1 the amount of motion in the lower part of the image in a sample video and the corresponding human temporal segmentation. Because the hand returns to a resting position between gestures, there are peaks of hand motion near the segmentation points.
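For reference, here is a minimal sketch of how such a motion profile could be computed with OpenCV. It uses simple frame differencing on the lower half of each frame; the exact measure used for Figure 1 is not specified, so this is only an approximation.

```python
import numpy as np
import cv2  # OpenCV, assumed available for video decoding


def lower_half_motion(video_path):
    """Per-frame motion profile over the lower half of each frame.

    Illustrative only: the exact motion measure used for Figure 1 is not
    specified, so mean absolute frame differencing is used as a stand-in.
    """
    cap = cv2.VideoCapture(video_path)
    motion, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        lower = gray[gray.shape[0] // 2:, :].astype(np.float32)
        if prev is not None:
            motion.append(float(np.mean(np.abs(lower - prev))))
        prev = lower
    cap.release()
    # Peaks in this profile tend to appear near gesture boundaries, where the
    # hand moves to and from its resting position.
    return np.array(motion)
```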

(Example video: devel01, K12)

Figure 1: Temporal segmentation. We represent the amount of motion in the lower part of an example video as a function of time. We show the start of each gesture in green and its end in red. The gesture labels are displayed in each segment.

Body part annotations (February 2012)

We manually determined the positions of the head, shoulders, elbows and hands in a number of movie frames.

We show in Figure 2 two examples.

(Example frames: devel16, K24)

Figure 2: Body part annotations. The face is located with a blue square. The hand positions are identified by red and green rectangles; if a hand lies outside the image, its estimated location is indicated with a circle. The shoulder and elbow locations are also shown as points, with circles used when those positions are uncertain.
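As an illustration of how these annotations can be overlaid on a frame, here is a minimal sketch. The annotation record and its field names are hypothetical (the released files define the actual format); the colors follow Figure 2.

```python
import cv2  # OpenCV, assumed available for drawing

# Hypothetical annotation record; the field names and coordinates below are
# illustrative only and do not reflect the actual annotation file format.
annotation = {
    "face": (120, 40, 60, 60),         # x, y, width, height
    "left_hand": (80, 300, 40, 40),
    "right_hand": (200, 310, 40, 40),
    "left_elbow": (100, 220),          # point locations
    "right_elbow": (190, 225),
    "left_shoulder": (110, 130),
    "right_shoulder": (180, 130),
}


def draw_annotation(frame, ann):
    """Overlay body part annotations on a BGR frame, colored as in Figure 2."""
    x, y, w, h = ann["face"]
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)   # blue square
    for key, color in (("left_hand", (0, 0, 255)),                 # red
                       ("right_hand", (0, 255, 0))):               # green
        x, y, w, h = ann[key]
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
    for key in ("left_elbow", "right_elbow",
                "left_shoulder", "right_shoulder"):
        cv2.circle(frame, ann[key], 4, (0, 255, 255), -1)          # point markers
    return frame
```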

Image alignment (May-June 2012)

May: 260 development set frames from videos in devel01-20 have been annotated with 4 parameters: scale_x, scale_y, translation_x, translation_y, representing the geometrical transformation (scaling and translation) necessary to align the depth image to the RGB image. See Figure 3 for an example and Figure 4 for an explanation of how to use these parameters.

June: average alignment parameters for every batch in devel01-20, valid01-20, and final01-40.
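A minimal sketch of applying these parameters is given below. It assumes the transformation is a scaling followed by a translation in pixels, as suggested by Figure 4; the exact convention (order of operations, sign of the translation) should be checked against the released annotation files.

```python
import numpy as np
import cv2  # OpenCV, assumed available for image warping


def align_depth_to_rgb(depth, scale_x, scale_y, translation_x, translation_y,
                       rgb_shape):
    """Warp a depth frame onto the RGB frame using the four alignment parameters.

    Assumes "scale, then translate" in pixel units; verify this convention
    against the released files before relying on it.
    """
    h, w = rgb_shape[:2]
    M = np.float32([[scale_x, 0.0, translation_x],
                    [0.0, scale_y, translation_y]])
    return cv2.warpAffine(depth, M, (w, h))


# Hypothetical usage with made-up parameter values for one annotated frame:
# aligned = align_depth_to_rgb(depth_frame, 1.05, 1.05, -12.0, 8.0, rgb_frame.shape)
```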

(Left: before alignment. Right: after alignment.)

Figure 3: Image alignment. On the left side we show the body contour extracted from the depth image overlaid on top of the RGB image. On the right we show the same thing after the contour has been rescaled and translated.


Figure 4: Parameters of the transformation of the depth image so it may be overlaid on the RGB image.