Using a Sequential DataLoader to Create a Training Loop

In this tutorial, we will learn how to load data sequentially with the classes metavision_ml.data.sequential_dataset.SequentialDataLoader and metavision_ml.data.sequential_dataset.SequentialDataset.

Import Libraries

import os
import glob
import numpy as np
import cv2
from functools import partial

from metavision_ml.data import SequentialDataLoader
from metavision_ml.data import box_processing as box_api

Download the Data Sample

Here is the link to download the files used in this tutorial.

You should now have the following files:

Suggestions on organizing your dataset

The data sample provided here is for illustration purposes only. In practice, we suggest that you separate train, test and validation data into distinct folders, using the following structure:

dataset_folder/
├── train/
│   ├── file_1.h5
│   ├── file_1_bbox.npy
│   ├── file_2.h5
│   ├── file_2_bbox.npy
├── test/
│   ├── file_3.h5
│   ├── file_3_bbox.npy
│   ├── file_4.h5
│   ├── file_4_bbox.npy
├── val/
│   ├── file_5.h5
│   ├── file_5_bbox.npy
├── possibly a readme and some metadata file (JSON etc.)
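
With this layout, each recording can be paired with its label file by name. The helper below is a minimal sketch that assumes the file_N.h5 / file_N_bbox.npy naming shown above; list_split is a hypothetical name and is not part of metavision_ml.

```python
import os
import glob


def list_split(dataset_folder, split):
    """Return (h5_path, bbox_path) pairs for one split ('train', 'test' or 'val').

    Hypothetical helper: it only relies on the naming convention shown above.
    """
    pairs = []
    for h5_path in sorted(glob.glob(os.path.join(dataset_folder, split, "*.h5"))):
        bbox_path = h5_path[:-len(".h5")] + "_bbox.npy"
        # keep only the recordings that have a label file next to them
        if os.path.isfile(bbox_path):
            pairs.append((h5_path, bbox_path))
    return pairs
```

Calling list_split("dataset_folder", "train") would then return one pair per labeled training recording.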

Note that when using the SequentialDataLoader, we suggest precomputing the tensor features as explained in our data preprocessing tutorials, which reduces the computation time during training. However, metavision_ml.data.sequential_dataset.SequentialDataLoader can also use DAT files directly.

Load Labels

In supervised training, in addition to the training data, we need ground truth labels as well. Depending on the type of training, the labels might come in very different formats. To facilitate training with various label formats, we provide a template function that can be used to write your own label loading functions.

First, let’s see what this template function looks like in our ML module.

from metavision_ml.data.sequential_dataset import load_labels_stub
help(load_labels_stub)

Help on function load_labels_stub in module metavision_ml.data.sequential_dataset:

load_labels_stub(metadata, start_time, duration, tensor)
    This is a stub implementation of a function to load label data.

    This function doesn't actually load anything and should be passed to the SequentialDataset for
        self-supervised training when no actual labelling is required.

    Args:
        metadata (FileMetadata): This class contains information about the sequence that is being read.
            Ideally the path for the labels should be deducible from `metadata.path`.
        start_time (int): Time in us in the file at which we start reading.
        duration (int): Duration in us of the data we need to read from said file.
        tensor (torch.tensor): Torch tensor of the feature for which labels are loaded.
            It can be used for instance to filter out the labels in area where there is no events.
    Returns:
        labels should be indexable by time bin (to differentiate the labels of each time bin). It could
            therefore be a list of length *num_tbins*.
        (boolean nd array): This boolean mask array of *length* num_tbins indicates
            whether the frame contains a label. It is used to differentiate between time_bins that actually
            contain an empty label (for instance no bounding boxes) from time bins that weren't labeled due
            to cost constraints. The latter timebins shouldn't contribute to supervised losses used during
            training.

As you can see, the function returns both a list of labels and a boolean mask indicating if the corresponding time bins are labeled or not. During training, this boolean mask will be used to filter out unlabeled time bins so that no loss will be computed on them.

Note

Since our SequentialDataLoader deals with event-based data, the ground truth labels should be timestamped. This is required for playing labels synchronously with the event features.
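
To make the expected return value concrete, here is a minimal sketch of a label loader that splits timestamped labels into time bins. It uses only NumPy. The function toy_load_labels and the in-memory labels array are illustrative assumptions and not part of metavision_ml; a real loader would read the labels from a path derived from metadata.path.

```python
import numpy as np


def toy_load_labels(labels, start_time, duration, num_tbins):
    """Split timestamped labels into `num_tbins` consecutive time bins.

    `labels` is a structured array with a 't' field in microseconds.
    Returns a list of length `num_tbins` and a boolean mask of the same length.
    """
    delta_t = duration // num_tbins
    per_bin = []
    for i in range(num_tbins):
        t0 = start_time + i * delta_t
        in_bin = (labels['t'] >= t0) & (labels['t'] < t0 + delta_t)
        per_bin.append(labels[in_bin])
    # in this sketch every time bin counts as labeled, even when it holds no label
    frame_is_labeled = np.ones(num_tbins, dtype=bool)
    return per_bin, frame_is_labeled


labels = np.array([(10000, 1), (60000, 2), (70000, 1)],
                  dtype=[('t', '<i8'), ('class_id', '<u4')])
per_bin, mask = toy_load_labels(labels, start_time=0, duration=150000, num_tbins=3)
print([len(b) for b in per_bin])  # [1, 2, 0]
```

The third time bin is empty but still marked as labeled: it states that there is nothing to detect, which is different from a time bin that was never annotated.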

Customize a Label Loading Function

Now let’s create a function to load detection bounding boxes.

def custom_load_boxes(metadata, batch_start_time, duration, tensor, **kwargs):

    # we first load the box events from file
    box_events = box_api.load_box_events(metadata, batch_start_time, duration)

    # we then map each class ID through the class lookup table
    # in order to get contiguous class numbers for our training dataset
    class_lookup = kwargs['class_lookup']
    box_events['class_id'] = class_lookup[box_events['class_id']]

    # we then split the box events into a list with one box event array per time bin
    num_tbins = tensor.shape[0]
    box_events = box_api.split_boxes(box_events, batch_start_time=batch_start_time, delta_t=duration // num_tbins, num_tbins=num_tbins)
    # here we assume that all frames contain labels
    all_frames_are_okay = np.ones((len(box_events)), dtype=bool)
    return box_events, all_frames_are_okay

Compared to our template function, the function above requires an additional argument, class_lookup. We therefore need to adapt it so that its signature is exactly the one the data loader expects. You can use the partial function from the functools module to bind the additional argument.

# the labels of the class we want to load from the dataset
wanted_keys = ['car', 'pedestrian', 'two wheeler']

# create a look up table to get the lookup IDs from the selected classes
class_lookup = box_api.create_class_lookup(label_map_path, wanted_keys)

custom_load_boxes_fn = partial(custom_load_boxes, class_lookup=class_lookup)
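
The two mechanisms used here can be illustrated in isolation. In the sketch below, the lookup table is a plain NumPy array indexed by the original class ID. Its contents are invented for illustration and are not the output of box_api.create_class_lookup; load_stub is likewise a hypothetical function.

```python
from functools import partial

import numpy as np


def load_stub(metadata, start_time, duration, tensor, class_lookup):
    """Toy loader taking one argument more than the expected 4-argument signature."""
    original_ids = np.array([5, 2, 5, 7])  # pretend these were read from a label file
    # NumPy fancy indexing maps every original ID to its contiguous training ID
    return class_lookup[original_ids]


# hypothetical table: original IDs 2, 5 and 7 become training IDs 1, 2 and 3
class_lookup = np.zeros(8, dtype=np.int64)
class_lookup[[2, 5, 7]] = [1, 2, 3]

# partial binds class_lookup, leaving a function that takes the four expected arguments
load_fn = partial(load_stub, class_lookup=class_lookup)
print(load_fn(None, 0, 150000, None).tolist())  # [2, 1, 2, 3]
```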

Note

For a more complete example, see the load_boxes function in the source code of metavision_ml/data/box_processing.py

Event-Based SequentialDataLoader

Before instantiating the SequentialDataLoader class, let’s first define some input parameters, then pass our custom label loading function custom_load_boxes_fn.

files = glob.glob(os.path.join(dataset_path, "*.h5"))[:2]
preprocess_function_name = "histo"
delta_t = 50000
channels = 2  # histograms have two channels
num_tbins = 3
height, width = 360, 640
batch_size = 2
max_incr_per_pixel = 2.5
array_dim = [num_tbins, channels, height, width]
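
These parameters fix both the shape and the time span of every batch. The arithmetic below is a small sketch in plain Python, with the parameter values copied from the block above so that it runs on its own.

```python
delta_t = 50000          # duration of one time bin, in us
channels = 2
num_tbins = 3
height, width = 360, 640
batch_size = 2

# each sequence in a batch covers num_tbins consecutive time bins
sequence_duration_us = delta_t * num_tbins
# batches are time-major: the time-bin axis comes first, then the batch axis
batch_shape = (num_tbins, batch_size, channels, height, width)

print(sequence_duration_us)  # 150000
print(batch_shape)           # (3, 2, 2, 360, 640)
```

Both values can be checked against the input shape and the metadata printed when iterating over the data loader.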

Instantiate the class

seq_dataloader = SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim,
                                      load_labels=custom_load_boxes_fn,
                                      batch_size=batch_size, num_workers=0,
                                      preprocess_kwargs={"max_incr_per_pixel": max_incr_per_pixel})

Let’s iterate over the loaded data and visualize its metadata.

for index, batch in enumerate(seq_dataloader):
    if index == 1: # we only visualize one example, remove it if you want to process all data
        break
    print("available keys: ", batch.keys(), "\n")
    print("input shape:", batch["inputs"].shape, "\n")
    print("metadata:", batch["video_infos"], "\n")
    print("box events: ", len(batch['labels']), "lists (corresponding to the no. of time bins), each containing a batch sized lists :", [len(labels) for labels in batch['labels']])

available keys:  dict_keys(['inputs', 'labels', 'mask_keep_memory', 'frame_is_labeled', 'video_infos'])

input shape: torch.Size([3, 2, 2, 360, 640])

metadata: ((FileMetadata object dataset_precomputed/moorea_2019-01-30_000_td_11500000_21500000.h5
    start_ts 0us delta_t 50000us, num_tbins 3, total duration 10000000us
, 0, 150000), (FileMetadata object dataset_precomputed/moorea_2019-01-30_000_td_500000_10500000.h5
    start_ts 0us delta_t 50000us, num_tbins 3, total duration 10000000us
, 0, 150000))

box events:  3 lists (corresponding to the no. of time bins), each containing a batch sized lists : [2, 2, 2]

As you can see, at each iteration the SequentialDataLoader produces a dictionary with the keys inputs, labels, mask_keep_memory, frame_is_labeled and video_infos.

The inputs are tensors of shape \([T \times N \times C \times H \times W]\) instead of \([N \times C \times H \times W]\): training needs the temporal dimension, and placing it first allows the data to be processed sequentially from the first time bin to the last.

  • T: number of time bins

  • N: batch size

  • C: number of channels (feature size)

  • H: height

  • W: width

Similarly, the bounding boxes are organized in \(T\) lists of \(N\) nested lists, so that labels and tensors are indexed consistently.

The mask_keep_memory is a binary tensor of length \(N\), with value 0. indicating the beginning of a new recording. This is useful in case we want to reset memory between different recordings.
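
The sketch below shows how such a time-major batch and mask_keep_memory are typically consumed. It uses NumPy arrays in place of torch tensors, and the running-sum "memory" is a toy stand-in for a recurrent state, not a metavision_ml component.

```python
import numpy as np

T, N, C, H, W = 3, 2, 2, 4, 5
inputs = np.ones((T, N, C, H, W), dtype=np.float32)

# state carried over from the previous batch, one value per sequence
memory = np.array([10.0, 10.0], dtype=np.float32)
# 0. marks a sequence that starts a new recording, 1. one that continues
mask_keep_memory = np.array([1.0, 0.0], dtype=np.float32)

# multiplying by the mask resets the state of new recordings only
memory = memory * mask_keep_memory

# iterate over the leading time axis, from the first time bin to the last
for x_t in inputs:                       # x_t has shape (N, C, H, W)
    memory = memory + x_t.mean(axis=(1, 2, 3))

print(memory.tolist())  # [13.0, 3.0]
```

The first sequence keeps its previous state (10 + 3 time bins), while the second one restarts from zero because its recording has just begun.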

Let’s also take a closer look at the labels of a single time bin for a single sequence of the batch.

For instance, the bounding boxes in the 2nd time bin of the 1st sequence of the batch are:

batch['labels'][1][0]

array([(203504,  880.19727, 375.4504 , 830.14355, 265.34906,          1, 0, 0.99837),
       (220187,  755.305  , 396.82315,  62.55174,  45.97175,          1, 0, 0.9    ),
       (236871, 1416.3406 , 288.7673 , 216.00056, 298.24036,          2, 0, 0.9    ),
       (236871,  770.77075, 339.76614, 267.32938, 215.94064,          1, 0, 0.96602),
       (236871,  642.55927, 359.73267,  12.17938,  15.6112 , 4294967295, 0, 0.9    ),
       (236871,  860.3385 , 279.58615,  12.41817,  25.90058, 4294967295, 0, 0.9    )],
      dtype={'names':['t','x','y','w','h','class_id','track_id','class_confidence'], 'formats':['<i8','<f4','<f4','<f4','<f4','<u4','<u4','<f4'], 'offsets':[0,8,12,16,20,24,28,32], 'itemsize':40})

Visualization Utility of SequentialDataLoader

The SequentialDataLoader class provides a visualization method named show. It displays the batches of the SequentialDataLoader in parallel using OpenCV.

Let’s visualize the frames we have just loaded.

if os.environ.get("DOC_DISPLAY", "ON") != "OFF":
    cv2.namedWindow('sequential_dataloader')
    for frame in seq_dataloader.show():
        cv2.imshow('sequential_dataloader', frame[..., ::-1])
        key = cv2.waitKey(1)
        if key == 27:
            break
    cv2.destroyWindow('sequential_dataloader')

The show() method can be called with a custom label visualization function so as to stream the labels together with the input data. Its signature should match the following:

def draw_labels(frame, labels):
    """
    Args:
        frame (np.ndarray): frame of size height x width x 3
        labels: label for one file and one tbin

    Returns:
        The input frame on which the labels were drawn.
    """
    return frame
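
As an illustration, here is a minimal sketch of such a function that outlines each box using NumPy slicing only. It assumes that the labels are a structured array with the x, y, w and h fields shown earlier, and that x and y give the top-left corner in pixels; that convention is an assumption of this sketch. The name draw_box_outlines is hypothetical.

```python
import numpy as np


def draw_box_outlines(frame, labels):
    """Draw a green outline for each box on a height x width x 3 frame."""
    height, width = frame.shape[:2]
    color = (0, 255, 0)
    for box in labels:
        # clip the box to the frame so that indexing stays in bounds
        x0 = int(np.clip(box['x'], 0, width - 1))
        y0 = int(np.clip(box['y'], 0, height - 1))
        x1 = int(np.clip(box['x'] + box['w'], 0, width - 1))
        y1 = int(np.clip(box['y'] + box['h'], 0, height - 1))
        frame[y0, x0:x1 + 1] = color   # top edge
        frame[y1, x0:x1 + 1] = color   # bottom edge
        frame[y0:y1 + 1, x0] = color   # left edge
        frame[y0:y1 + 1, x1] = color   # right edge
    return frame


frame = np.zeros((10, 10, 3), dtype=np.uint8)
boxes = np.array([(2.0, 3.0, 4.0, 5.0)],
                 dtype=[('x', '<f4'), ('y', '<f4'), ('w', '<f4'), ('h', '<f4')])
frame = draw_box_outlines(frame, boxes)
```

Such a function could then be passed to show() in place of the template, provided the label format matches the one returned by your label loading function.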

For more information, see the metavision_ml.detection_tracking.display_frame.draw_box_events function.

Let’s now visualize the batch data together with the labels.

This time we will use the predefined metavision_ml.data.box_processing.load_boxes function in the ML module to load the labels.

load_boxes_fn = partial(box_api.load_boxes, class_lookup=class_lookup)

seq_dataloader = SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim, load_labels=load_boxes_fn,
                                      batch_size=batch_size, num_workers=0, preprocess_kwargs={"max_incr_per_pixel": max_incr_per_pixel})

We will also use the predefined metavision_ml.detection_tracking.display_frame.draw_box_events function to show the labels.

from metavision_ml.detection_tracking.display_frame import draw_box_events

label_map = ['background'] + wanted_keys

# adding box visualization. Notice how here again we rely on partial.
viz_labels = partial(draw_box_events, label_map=label_map)

if os.environ.get("DOC_DISPLAY", "ON") != "OFF":
    cv2.namedWindow('sequential_dataloader')
    for frame in seq_dataloader.show(viz_labels):
        cv2.imshow('sequential_dataloader', frame[..., ::-1])
        key = cv2.waitKey(1)
        if key == 27:
            break
    cv2.destroyWindow('sequential_dataloader')

Training Loop Example

The data loader presented in this tutorial can be used to create a custom training loop. The following pseudo-code can serve as a starting point to train your own event-based models:

import torch

# `net`, `optimizer` and `compute_loss` are placeholders for your own model, optimizer and loss function.
for data in seq_dataloader:
    # We first reset the memory for each new sequence in the batch and detach the gradients.
    # Detaching the gradients prevents the computational graph from growing as long as the full sequence.
    # This is called *truncated backpropagation*.
    net.reset(data['mask_keep_memory'])

    # clear the gradients held by the optimizer
    optimizer.zero_grad()

    # We compute the predictions chronologically. This is the forward pass.
    predictions = []
    for batch in data['inputs']:
        predictions.append(net.forward(batch))
    predictions = torch.stack(predictions)

    # The loss is computed only on the labeled time bins.
    # `data['targets']` stands for the labels converted to a tensor; the loader exposes them as data['labels'].
    loss = compute_loss(predictions[data["frame_is_labeled"]], data['targets'][data["frame_is_labeled"]])

    # We then compute the backward pass and update the network's weights.
    loss.backward()
    optimizer.step()
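
The masking step in the loss computation can be sketched in isolation. The example below uses NumPy arrays as stand-ins for tensors and a mean squared error as a placeholder loss. The shapes and the (T, N) layout of the mask are assumptions made for illustration.

```python
import numpy as np

T, N, D = 3, 2, 4
predictions = np.zeros((T, N, D), dtype=np.float32)
targets = np.ones((T, N, D), dtype=np.float32)
# give one unlabeled entry a large error, so that masking visibly changes the loss
targets[2, 1] = 3.0

frame_is_labeled = np.array([[True, True],
                             [True, False],
                             [False, False]])

# boolean indexing keeps only the labeled (time bin, sequence) pairs: shape (3, D)
selected_pred = predictions[frame_is_labeled]
selected_tgt = targets[frame_is_labeled]
loss = float(np.mean((selected_pred - selected_tgt) ** 2))
print(selected_pred.shape, loss)  # (3, 4) 1.0
```

The entry with the large error sits in an unlabeled time bin, so it does not contribute to the loss.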

CDProcessorDataLoader

The implementation above is based on the PyTorch Dataset class (a map-style dataset, in which the __getitem__ method is overridden). We present here an alternative implementation based on the PyTorch IterableDataset class. It is advantageous when frequent file seeking is costly or not possible. With it, you can stream RAW files directly, without having to convert them to DAT or HDF5 files.
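
The difference between the two dataset styles can be sketched in plain Python, without PyTorch. Both classes below are hypothetical stand-ins: the first supports random access by index (the contract of a map-style dataset), the second can only be consumed front to back (the contract of an iterable-style dataset).

```python
class MapStyleToy:
    """Random access: any item can be fetched by index, which implies seeking."""

    def __init__(self, chunks):
        self.chunks = chunks

    def __len__(self):
        return len(self.chunks)

    def __getitem__(self, index):
        return self.chunks[index]


class IterableStyleToy:
    """Sequential access only: items are produced in order, as from a stream."""

    def __init__(self, stream_factory):
        self.stream_factory = stream_factory

    def __iter__(self):
        # a fresh stream is opened for each pass; no seeking is ever needed
        yield from self.stream_factory()


chunks = ["chunk_0", "chunk_1", "chunk_2"]
map_style = MapStyleToy(chunks)
iterable_style = IterableStyleToy(lambda: iter(chunks))

print(map_style[2])          # chunk_2
print(list(iterable_style))  # ['chunk_0', 'chunk_1', 'chunk_2']
```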

Here are the links to download the RAW files used in this sample:

from metavision_ml.data.cd_processor_dataset import CDProcessorDataLoader

# we use a list of RAW files located in the current folder
files = ["driving_sample.raw","hand_spinner.raw", "spinner.raw", "80_balls.raw"]

for file in files:
    assert os.path.isfile(file)

dataloader = CDProcessorDataLoader(
    files,
    mode='delta_t',
    delta_t=10000,
    n_events=0,
    max_duration=10000000,
    preprocess_function_name="diff",
    height=240,
    width=320,
    num_tbins=5,
    batch_size=4,
    num_workers=2,
    load_labels=None,
    padding_mode='zeros')

from metavision_ml.data.sequential_dataset_common import show_dataloader

show = show_dataloader(
    dataloader=dataloader,
    height=dataloader.height,
    width=dataloader.width,
    vis_func=dataloader.get_vis_func(),
    viz_labels=None)

if os.environ.get("DOC_DISPLAY", "ON") != "OFF":
    cv2.namedWindow('stream_dataloader')
    for frame in show:
        cv2.imshow('stream_dataloader', frame[..., ::-1])
        key = cv2.waitKey(1)
        if key == 27:
            break
    cv2.destroyWindow('stream_dataloader')