Using a Sequential DataLoader to Create a Training Loop
In this tutorial, we will learn how to load data sequentially with the classes metavision_ml.data.sequential_dataset.SequentialDataLoader and metavision_ml.data.sequential_dataset.SequentialDataset.
Import Libraries
import os
import glob
import numpy as np
import cv2
from functools import partial
from metavision_ml.data import SequentialDataLoader
from metavision_ml.data import box_processing as box_api
Download the Data Sample
Here is the link to download the files used in this tutorial.
You should now have the following files:
label_map_dictionary.json
moorea_2019-01-30_000_td_11500000_21500000_bbox.npy
moorea_2019-01-30_000_td_11500000_21500000.h5
moorea_2019-01-30_000_td_500000_10500000_bbox.npy
moorea_2019-01-30_000_td_500000_10500000.h5
Suggestions on organizing your dataset
The data sample provided here is for illustration purposes only. In practice, we suggest separating the train, test and validation data into distinct folders, using the following structure:
dataset_folder/
├── train/
│ ├── file_1.h5
│ ├── file_1_bbox.npy
│ ├── file_2.h5
│ ├── file_2_bbox.npy
├── test/
│ ├── file_3.h5
│ ├── file_3_bbox.npy
│ ├── file_4.h5
│ ├── file_4_bbox.npy
├── val/
│ ├── file_5.h5
│ ├── file_5_bbox.npy
├── possibly a readme and some metadata file (JSON etc.)
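As a minimal sketch of how such a layout could be consumed, the helper below pairs each .h5 file of a split folder with its _bbox.npy label file. The name pair_recordings is hypothetical and not part of metavision_ml; the only assumption is the naming convention shown in the tree above.

```python
import glob
import os


def pair_recordings(split_dir):
    """Return sorted (h5_path, bbox_path) pairs found in one split folder.

    Hypothetical helper: it relies only on the `<name>.h5` / `<name>_bbox.npy`
    naming convention and skips recordings that have no label file.
    """
    pairs = []
    for h5_path in sorted(glob.glob(os.path.join(split_dir, "*.h5"))):
        # derive the label file name from the recording name
        bbox_path = h5_path[:-len(".h5")] + "_bbox.npy"
        if os.path.isfile(bbox_path):
            pairs.append((h5_path, bbox_path))
    return pairs
```

Calling pair_recordings on the train, test and val folders in turn gives one list of recordings per split.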
Note that when using the SequentialDataLoader, we suggest precomputing the tensor features as explained in our data preprocessing tutorials. This reduces the computation time. However, metavision_ml.data.sequential_dataset.SequentialDataLoader can also read DAT files directly.
Load Labels
In supervised training, in addition to the training data, we need ground truth labels as well. Depending on the type of training, the labels might come in very different formats. To facilitate training with various label formats, we provide a template function that can be used to write your own label loading functions.
First, let’s see what this template function looks like in our ML module.
from metavision_ml.data.sequential_dataset import load_labels_stub
help(load_labels_stub)
Help on function load_labels_stub in module metavision_ml.data.sequential_dataset:
load_labels_stub(metadata, start_time, duration, tensor)
    This is a stub implementation of a function to load label data.
    This function doesn't actually load anything and should be passed to the SequentialDataset for
    self-supervised training when no actual labelling is required.
    Args:
        metadata (FileMetadata): This class contains information about the sequence that is being read.
            Ideally the path for the labels should be deducible from `metadata.path`.
        start_time (int): Time in us in the file at which we start reading.
        duration (int): Duration in us of the data we need to read from said file.
        tensor (torch.tensor): Torch tensor of the feature for which labels are loaded.
            It can be used for instance to filter out the labels in area where there is no events.
    Returns:
        labels should be indexable by time bin (to differentiate the labels of each time bin). It could
            therefore be a list of length *num_tbins*.
        (boolean nd array): This boolean mask array of *length* num_tbins indicates
            whether the frame contains a label. It is used to differentiate between time_bins that actually
            contain an empty label (for instance no bounding boxes) from time bins that weren't labeled due
            to cost constraints. The latter timebins shouldn't contribute to supervised losses used during
            training.
As you can see, the function returns both a list of labels and a boolean mask indicating if the corresponding time bins are labeled or not. During training, this boolean mask will be used to filter out unlabeled time bins so that no loss will be computed on them.
Note
Since our SequentialDataLoader deals with event-based data, the ground truth labels should be timestamped. This is required for playing labels synchronously with the event features.
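To make this return contract concrete, here is a minimal sketch of how timestamped labels could be split into time bins together with the boolean mask. It is not the library implementation: bin_labels, the annotated_ts argument (timestamps at which an annotation pass took place, even if it produced no box) and the simplified BOX_DTYPE are assumptions made for illustration.

```python
import numpy as np

# simplified, hypothetical label dtype: a timestamp in us plus a box
BOX_DTYPE = np.dtype([("t", "<i8"), ("x", "<f4"), ("y", "<f4"), ("w", "<f4"), ("h", "<f4")])


def bin_labels(boxes, annotated_ts, start_time, duration, num_tbins):
    """Split timestamped boxes into num_tbins bins and flag the annotated bins."""
    delta_t = duration // num_tbins
    # index of the time bin each box falls into
    box_bin = (boxes["t"] - start_time) // delta_t
    labels = [boxes[box_bin == i] for i in range(num_tbins)]
    # a bin counts as labeled if an annotation pass happened inside it,
    # even when that pass produced no box (a genuinely empty label)
    ann_bin = (np.asarray(annotated_ts, dtype=np.int64) - start_time) // delta_t
    frame_is_labeled = np.isin(np.arange(num_tbins), ann_bin)
    return labels, frame_is_labeled


boxes = np.array([(10_000, 5, 5, 20, 20), (120_000, 8, 6, 20, 20)], dtype=BOX_DTYPE)
labels, frame_is_labeled = bin_labels(
    boxes, annotated_ts=[10_000, 120_000], start_time=0, duration=150_000, num_tbins=3)
```

In this example the second time bin holds no box and was never annotated, so its mask entry is False and it should not contribute to a supervised loss.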
Customize a Label Loading Function
Now let’s create a function to load detection bounding boxes.
def custom_load_boxes(metadata, batch_start_time, duration, tensor, **kwargs):
    # we first load the box events from file
    box_events = box_api.load_box_events(metadata, batch_start_time, duration)
    # here, we look up in class_lookup the number corresponding to each class
    # in order to get contiguous class numbers for our training dataset.
    class_lookup = kwargs['class_lookup']
    box_events['class_id'] = class_lookup[box_events['class_id']]
    # We then split the box events into a list containing one box event array per time bin
    num_tbins = tensor.shape[0]
    box_events = box_api.split_boxes(box_events, batch_start_time=batch_start_time, delta_t=duration // num_tbins, num_tbins=num_tbins)
    # here we assume that all frames contain labels
    all_frames_are_okay = np.ones((len(box_events)), dtype=bool)
    return box_events, all_frames_are_okay
Notice that the function above requires an additional argument, class_lookup, compared to our template function. We therefore need to adapt it so that its signature is exactly the one we expect. You can use the partial function from the functools module to bind the additional argument.
# the labels of the classes we want to load from the dataset
wanted_keys = ['car', 'pedestrian', 'two wheeler']
# create a lookup table to get the lookup IDs of the selected classes
# (label_map_path is assumed to be the path to the downloaded label_map_dictionary.json)
class_lookup = box_api.create_class_lookup(label_map_path, wanted_keys)
custom_load_boxes_fn = partial(custom_load_boxes, class_lookup=class_lookup)
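To illustrate why indexing a lookup table with box_events['class_id'] yields contiguous class numbers, here is a minimal sketch of such a table. It is not the implementation of box_api.create_class_lookup: the function build_class_lookup, the content of label_map and the choice of -1 for unwanted classes are assumptions made for illustration.

```python
import numpy as np


def build_class_lookup(label_map, wanted_keys):
    """Map original class ids to contiguous ids starting at 1 (0 is kept for background).

    Hypothetical helper: label_map is assumed to be a dict {original_id: class_name}.
    Classes that are not in wanted_keys are mapped to -1.
    """
    size = max(int(k) for k in label_map) + 1
    lookup = np.full(size, -1, dtype=np.int64)
    for new_id, name in enumerate(wanted_keys, start=1):
        for orig_id, orig_name in label_map.items():
            if orig_name == name:
                lookup[int(orig_id)] = new_id
    return lookup


# hypothetical label map, for illustration only
label_map = {"0": "pedestrian", "1": "two wheeler", "2": "car", "3": "truck"}
lookup = build_class_lookup(label_map, ["car", "pedestrian", "two wheeler"])

original_ids = np.array([2, 0, 3])
remapped_ids = lookup[original_ids]  # one vectorized lookup for all boxes
```

Because the table is a plain array, the remapping of all boxes is a single fancy-indexing operation.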
Note
For a more complete example, see the load_boxes function in the source code of metavision_ml/data/box_processing.py.
Event-Based SequentialDataLoader
Before instantiating the SequentialDataLoader class, let’s first define some input parameters. We will then pass it our custom label loading function custom_load_boxes_fn.
# dataset_path is assumed to be the folder containing the downloaded sample files
files = glob.glob(os.path.join(dataset_path, "*.h5"))[:2]
preprocess_function_name = "histo"
delta_t = 50000
channels = 2 # histograms have two channels
num_tbins = 3
height, width = 360, 640
batch_size = 2
max_incr_per_pixel = 2.5
array_dim = [num_tbins, channels, height, width]
Instantiate the class
seq_dataloader = SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim,
                                      load_labels=custom_load_boxes_fn,
                                      batch_size=batch_size, num_workers=0,
                                      preprocess_kwargs={"max_incr_per_pixel": max_incr_per_pixel})
Let’s iterate over the loaded data and visualize its metadata.
for index, batch in enumerate(seq_dataloader):
    if index == 1:  # we only visualize one example, remove it if you want to process all data
        break
    print("available keys: ", batch.keys(), "\n")
    print("input shape:", batch["inputs"].shape, "\n")
    print("metadata:", batch["video_infos"], "\n")
    print("box events: ", len(batch['labels']), "lists (corresponding to the no. of time bins), each containing a batch sized lists :", [len(labels) for labels in batch['labels']])
available keys: dict_keys(['inputs', 'labels', 'mask_keep_memory', 'frame_is_labeled', 'video_infos'])
input shape: torch.Size([3, 2, 2, 360, 640])
metadata: ((FileMetadata object dataset_precomputed/moorea_2019-01-30_000_td_11500000_21500000.h5
start_ts 0us delta_t 50000us, num_tbins 3, total duration 10000000us
, 0, 150000), (FileMetadata object dataset_precomputed/moorea_2019-01-30_000_td_500000_10500000.h5
start_ts 0us delta_t 50000us, num_tbins 3, total duration 10000000us
, 0, 150000))
box events: 3 lists (corresponding to the no. of time bins), each containing a batch sized lists : [2, 2, 2]
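The numbers in this metadata can be checked with simple arithmetic. The sketch below only reuses values printed above (delta_t, num_tbins and the total duration of one file); how the loader treats the trailing remainder of a file is not covered here.

```python
delta_t = 50000            # duration of one time bin, in us
num_tbins = 3              # time bins per batch
total_duration = 10000000  # total duration of one sample file, in us

# each batch covers num_tbins consecutive time bins of every file
chunk_duration = num_tbins * delta_t
# number of complete batches that fit in one file, and the time left over
full_chunks = total_duration // chunk_duration
remainder = total_duration % chunk_duration

print(chunk_duration, full_chunks, remainder)  # 150000 66 100000
```

The chunk duration of 150000 us matches the (0, 150000) interval reported next to each FileMetadata object.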
As you can see, at each iteration the SequentialDataLoader produces a dictionary containing the keys inputs, labels, mask_keep_memory, frame_is_labeled and video_infos.
The inputs are tensors of shape \([T \times N \times C \times H \times W]\) instead of \([N \times C \times H \times W]\), because training needs the temporal dimension: this layout lets us process the data sequentially, from the first time bin to the last.
T: number of time bins
N: batch size
C: feature size
H: height
W: width
Similarly, the bounding boxes are organized in \(T\) lists of \(N\) nested lists, so that labels and tensor are indexed consistently.
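The following sketch illustrates this time-first indexing. It uses a NumPy array as a stand-in for the torch tensor and placeholder strings instead of real box arrays, so everything in it is illustrative rather than produced by the loader.

```python
import numpy as np

T, N, C, H, W = 3, 2, 2, 360, 640
# stand-in for batch["inputs"]
inputs = np.zeros((T, N, C, H, W), dtype=np.float32)
# stand-in for batch["labels"]: T lists of N entries
labels = [[f"boxes(t={t}, n={n})" for n in range(N)] for t in range(T)]

visited = []
for t, time_bin in enumerate(inputs):
    # iterating over the first axis yields one (N, C, H, W) slice per time bin
    for n in range(N):
        sample = time_bin[n]          # (C, H, W) features of one recording
        sample_labels = labels[t][n]  # labels of the same time bin and recording
        visited.append((sample.shape, sample_labels))
```

The same pair of indices (t, n) addresses both the features and their labels.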
The mask_keep_memory is a binary tensor of length \(N\), where the value 0. indicates the beginning of a new recording. This is useful when we want to reset the memory between different recordings.
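As a minimal sketch of how such a mask could be used, the function below zeroes the hidden state of the samples that start a new recording. It uses NumPy arrays as stand-ins for torch tensors, and reset_state is a hypothetical helper, not a metavision_ml function.

```python
import numpy as np


def reset_state(state, mask_keep_memory):
    """Zero the rows of `state` whose mask value is 0 and keep the others unchanged."""
    mask = np.asarray(mask_keep_memory, dtype=state.dtype)
    # reshape the mask to (N, 1, ..., 1) so that it broadcasts over the state
    mask = mask.reshape(-1, *([1] * (state.ndim - 1)))
    return state * mask


state = np.ones((2, 4), dtype=np.float32)  # hidden state of N = 2 samples
new_state = reset_state(state, [1.0, 0.0])  # the second sample starts a new recording
```

Multiplying by the mask keeps the operation vectorized over the batch instead of looping over samples.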
Let’s also take a closer look at the labels of one sample in one time bin.
For instance, the bounding boxes in the 2nd time bin of the 1st sample of the batch are:
batch['labels'][1][0]
array([(203504, 880.19727, 375.4504 , 830.14355, 265.34906, 1, 0, 0.99837),
(220187, 755.305 , 396.82315, 62.55174, 45.97175, 1, 0, 0.9 ),
(236871, 1416.3406 , 288.7673 , 216.00056, 298.24036, 2, 0, 0.9 ),
(236871, 770.77075, 339.76614, 267.32938, 215.94064, 1, 0, 0.96602),
(236871, 642.55927, 359.73267, 12.17938, 15.6112 , 4294967295, 0, 0.9 ),
(236871, 860.3385 , 279.58615, 12.41817, 25.90058, 4294967295, 0, 0.9 )],
dtype={'names':['t','x','y','w','h','class_id','track_id','class_confidence'], 'formats':['<i8','<f4','<f4','<f4','<f4','<u4','<u4','<f4'], 'offsets':[0,8,12,16,20,24,28,32], 'itemsize':40})
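Since the labels are NumPy structured arrays, they can be filtered by field with boolean masks. The sketch below rebuilds three of the boxes printed above with the same dtype. Treating the class_id 4294967295 (the largest uint32 value, which is what -1 becomes in an unsigned 32-bit field) as a class to ignore is an assumption, and the 0.95 confidence threshold is arbitrary.

```python
import numpy as np

# same layout as the dtype printed above
BOX_DTYPE = np.dtype({
    "names": ["t", "x", "y", "w", "h", "class_id", "track_id", "class_confidence"],
    "formats": ["<i8", "<f4", "<f4", "<f4", "<f4", "<u4", "<u4", "<f4"],
    "offsets": [0, 8, 12, 16, 20, 24, 28, 32],
    "itemsize": 40,
})

boxes = np.array([
    (203504, 880.19727, 375.4504, 830.14355, 265.34906, 1, 0, 0.99837),
    (236871, 1416.3406, 288.7673, 216.00056, 298.24036, 2, 0, 0.9),
    (236871, 642.55927, 359.73267, 12.17938, 15.6112, 4294967295, 0, 0.9),
], dtype=BOX_DTYPE)

# assumption: this value marks boxes whose class is not part of the wanted classes
IGNORED_CLASS = np.iinfo(np.uint32).max
kept = boxes[boxes["class_id"] != IGNORED_CLASS]
# keep only the boxes with a high class confidence (arbitrary threshold)
confident = kept[kept["class_confidence"] > 0.95]
```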
Visualization Utility of SequentialDataLoader
The SequentialDataLoader class provides a visualization method named show. It can visualize the batches of the SequentialDataLoader in parallel with OpenCV.
Let’s visualize the frames we have just loaded.
if os.environ.get("DOC_DISPLAY", "ON") != "OFF":
    cv2.namedWindow('sequential_dataloader')
    for frame in seq_dataloader.show():
        cv2.imshow('sequential_dataloader', frame[..., ::-1])
        key = cv2.waitKey(1)
        if key == 27:  # Esc key
            break
    cv2.destroyWindow('sequential_dataloader')
The show() method can be called with a custom label visualization function so as to stream the labels together with the input data. Its signature should match the following:
def draw_labels(frame, labels):
    """
    Args:
        frame (np.ndarray): frame of size height x width x 3
        labels: label for one file and one tbin
    Returns:
        The input frame on which the labels were drawn.
    """
    return frame
For more information, see the metavision_ml.detection_tracking.display_frame.draw_box_events function.
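As an illustration of this signature, here is a minimal sketch of a drawing function that outlines each bounding box with plain NumPy slicing. It assumes that the labels are a structured array with x, y, w and h fields expressed in pixels of the frame; the color argument is an optional addition, and this is not how draw_box_events is implemented.

```python
import numpy as np


def draw_labels(frame, labels, color=(0, 255, 0)):
    """Draw the outline of each box on the frame and return the frame."""
    height, width = frame.shape[:2]
    for box in labels:
        # clip the box corners to the frame boundaries
        x0 = int(np.clip(box["x"], 0, width - 1))
        y0 = int(np.clip(box["y"], 0, height - 1))
        x1 = int(np.clip(box["x"] + box["w"], 0, width - 1))
        y1 = int(np.clip(box["y"] + box["h"], 0, height - 1))
        frame[y0, x0:x1 + 1] = color  # top edge
        frame[y1, x0:x1 + 1] = color  # bottom edge
        frame[y0:y1 + 1, x0] = color  # left edge
        frame[y0:y1 + 1, x1] = color  # right edge
    return frame


xywh = np.dtype([("x", "<f4"), ("y", "<f4"), ("w", "<f4"), ("h", "<f4")])
frame = np.zeros((20, 30, 3), dtype=np.uint8)
frame = draw_labels(frame, np.array([(5, 4, 10, 6)], dtype=xywh))
```

Such a function can be passed as is when its extra arguments have default values, or wrapped with partial otherwise.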
Let’s now visualize the batch data together with the labels.
This time we will use the predefined metavision_ml.data.box_processing.load_boxes function in the ML module to load the labels.
load_boxes_fn = partial(box_api.load_boxes, class_lookup=class_lookup)
seq_dataloader = SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim, load_labels=load_boxes_fn,
                                      batch_size=batch_size, num_workers=0, preprocess_kwargs={"max_incr_per_pixel": max_incr_per_pixel})
We will also use the predefined metavision_ml.detection_tracking.display_frame.draw_box_events function to show the labels.
from metavision_ml.detection_tracking.display_frame import draw_box_events
label_map = ['background'] + wanted_keys
# adding box visualization. Notice how here again we rely on partial.
viz_labels = partial(draw_box_events, label_map=label_map)
if os.environ.get("DOC_DISPLAY", "ON") != "OFF":
    cv2.namedWindow('sequential_dataloader')
    for frame in seq_dataloader.show(viz_labels):
        cv2.imshow('sequential_dataloader', frame[..., ::-1])
        key = cv2.waitKey(1)
        if key == 27:  # Esc key
            break
    cv2.destroyWindow('sequential_dataloader')
Training Loop Example
The data loader presented in this tutorial can be used to create a custom training loop. The following pseudo-code shows how you can train your own event-based models:
for data in seq_dataloader:
    # net, optimizer and compute_loss are placeholders for your own model, optimizer and loss function.
    # we first need to reset the memory for each new sequence in the batch and detach the gradients.
    # detaching the gradients prevents the computational graph from growing as long as the full sequence.
    # This is called *truncated backpropagation*.
    net.reset(data['mask_keep_memory'])
    # clear the gradients held by the optimizer
    optimizer.zero_grad()
    # we compute the predictions chronologically. This is the forward pass.
    predictions = []
    for batch in data['inputs']:
        predictions.append(net.forward(batch))
    predictions = torch.stack(predictions)
    # the loss is computed only on labeled time bins. The batch stores its ground truth under the
    # 'labels' key; indexing it with the mask is schematic here, since the labels are nested lists.
    loss = compute_loss(predictions[data["frame_is_labeled"]], data['labels'][data["frame_is_labeled"]])
    # we then compute the backward pass and update the network's weights.
    loss.backward()
    optimizer.step()
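The masking step of this loop can be sketched on its own. The example below computes a mean squared error restricted to the labeled time bins, using NumPy arrays as stand-ins for torch tensors. The function masked_mse, the dense target array and the (T, N) mask shape are assumptions made for illustration; a detection loss on bounding boxes would be more involved.

```python
import numpy as np


def masked_mse(predictions, targets, frame_is_labeled):
    """Mean squared error computed only over the (time bin, sample) pairs that are labeled."""
    mask = np.asarray(frame_is_labeled, dtype=bool)
    if not mask.any():
        # nothing is labeled in this batch: no supervised signal
        return 0.0
    diff = predictions[mask] - targets[mask]
    return float(np.mean(diff ** 2))


T, N, K = 3, 2, 4
predictions = np.zeros((T, N, K), dtype=np.float32)
targets = np.ones((T, N, K), dtype=np.float32)
targets[1] = 100.0  # unlabeled time bin holding arbitrary values
frame_is_labeled = np.array([[True, True], [False, False], [True, True]])

loss = masked_mse(predictions, targets, frame_is_labeled)
```

The arbitrary values stored in the unlabeled time bin have no effect on the loss.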
CDProcessorDataLoader
The implementation above is based on the PyTorch Dataset class (a map-style dataset, in which the __getitem__ method is overridden). We present here an alternative implementation based on the PyTorch IterableDataset. It is advantageous when frequent file seeking is costly or not possible. With it, you can stream RAW files directly, without having to convert them to DAT or HDF5 files.
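The difference between the two access patterns can be sketched in plain Python, without PyTorch. Both classes below are hypothetical stand-ins: the first one supports random access by index, which implies seeking in the underlying file, while the second one can only be read from start to end.

```python
class MapStyleRecording:
    """Random access: any chunk can be requested by index (requires seeking)."""

    def __init__(self, chunks):
        self._chunks = list(chunks)

    def __len__(self):
        return len(self._chunks)

    def __getitem__(self, index):
        return self._chunks[index]


class StreamedRecording:
    """Sequential access: chunks are produced in order, no seeking is needed."""

    def __init__(self, open_stream):
        # open_stream is a callable returning a fresh iterator over the chunks
        self._open_stream = open_stream

    def __iter__(self):
        return iter(self._open_stream())


map_style = MapStyleRecording(["chunk_0", "chunk_1", "chunk_2"])
streamed = StreamedRecording(lambda: (f"chunk_{i}" for i in range(3)))

last_chunk = map_style[2]         # direct access to the last chunk
all_chunks = list(streamed)       # chunks can only be consumed in order
```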
Here are the links to download the RAW files used in this sample:
from metavision_ml.data.cd_processor_dataset import CDProcessorDataLoader
# we use a folder of RAW files
files = ["driving_sample.raw","hand_spinner.raw", "spinner.raw", "80_balls.raw"]
for file in files:
    assert os.path.isfile(file)
dataloader = CDProcessorDataLoader(
    files,
    mode='delta_t',
    delta_t=10000,
    n_events=0,
    max_duration=10000000,
    preprocess_function_name="diff",
    height=240,
    width=320,
    num_tbins=5,
    batch_size=4,
    num_workers=2,
    load_labels=None,
    padding_mode='zeros')
from metavision_ml.data.sequential_dataset_common import show_dataloader
show = show_dataloader(
    dataloader=dataloader,
    height=dataloader.height,
    width=dataloader.width,
    vis_func=dataloader.get_vis_func(),
    viz_labels=None)
if os.environ.get("DOC_DISPLAY", "ON") != "OFF":
    cv2.namedWindow('stream_dataloader')
    for frame in show:
        cv2.imshow('stream_dataloader', frame[..., ::-1])
        key = cv2.waitKey(1)
        if key == 27:  # Esc key
            break
    cv2.destroyWindow('stream_dataloader')