SDK ML Data API

This module subclasses the Torch Dataset class to load DAT files and their labels from events, and wraps them using the DataLoader class. It currently supports DAT and HDF5 files; we recommend using the latter.

This class is generic with respect to the type of labels: a function must be provided to load them.

class metavision_ml.data.sequential_dataset.SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim, load_labels=<function load_labels_stub>, durations=[], batch_size=8, num_workers=2, preprocess_kwargs={}, shuffle=False, padding=False, transforms=None)

SequentialDataLoader uses a PyTorch DataLoader to read batches chronologically.

It is used simply as an iterator and returns a dictionary containing the following keys:

  • inputs a torch.tensor of shape num_tbins x batch_size x channels x height x width.

    Note that it is normalized to 1. The dtype depends on the preprocessing function used, but can be overridden by specifying preprocess_kwargs.

  • labels is the list of labels provided by the load_labels function.

  • mask_keep_memory a float array of shape batch_size, with values in (0., 1.) indicating

    whether memory is kept or reset at the beginning of the sequence.

  • frame_is_labeled a boolean array of shape num_tbins x batch_size, indicating whether the corresponding labels can be used for loss computation (i.e. whether the labels are valid).

  • video_infos is a list of (FileMetadata, batch_start_time, duration) tuples of size batch_size containing information about each recording in the batch.
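For instance, a typical training loop uses mask_keep_memory to reset a recurrent model's state at sequence boundaries. A minimal sketch (the model and its reset_memory method are hypothetical, not part of this API):

>>> for data_dic in dataloader:
>>>     mask = data_dic["mask_keep_memory"]   # 0. where a new sequence starts
>>>     model.reset_memory(mask)              # hypothetical: zero the hidden state where mask is 0.
>>>     predictions = model(data_dic["inputs"])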

batch_size

Number of sequences being read concurrently. This can affect the loading time of a batch and affects the gradient statistics.

Type

int

num_workers

Number of processes used by the DataLoader; 0 means Python’s main process is used. More processes help with speed, but only up to a point: too many processes can actually hurt loading times.

Type

int

max_consecutive_batch

Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.

Type

int

device

Indicates on which device (cpu or cuda for instance) the data will be put.

Type

torch.device

dataset

Instance of SequentialDataset that is used to load the data, or possibly change the scheduling. Note that if the dataset is changed, that change won’t take effect until the next iteration of the DataLoader.

Type

SequentialDataset

Parameters
  • files (list) – List of input files. Can be either DAT files or HDF5 files.

  • delta_t (int) – Timeslice delta_t in us.

  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • array_dim (int list) – Dimension of feature tensors: (num_tbins, channels, sensor_height // 2^k, sensor_width // 2^k), where k is the downsampling factor.

  • load_labels – function providing labels (see load_labels_stub).

  • durations (int list) – Optionally, you can provide the durations in us of all the input files. This saves a bit of time when there are many of them. If you provide a duration that is shorter than the actual duration of a sequence, only that part of the sequence will be read.

  • batch_size (int) – Number of sequences being read concurrently. This can affect the loading time of a batch and affects the gradient statistics.

  • num_workers (int) – Number of processes used by the DataLoader; 0 means Python’s main process is used. More processes help with speed, but only up to a point: too many processes can actually hurt loading times.

  • preprocess_kwargs – dictionary of optional arguments to the preprocessing function. This can be used to override the default value of max_incr_per_pixel, for instance: {"max_incr_per_pixel": 20} to clip and normalize tensors by 20.

  • shuffle (boolean) – If True, breaks the temporal continuity between batches. This should only be used when training a model without memory.

  • padding (boolean) – If True, at the end of an epoch the Dataset will run with incomplete batches when it can’t read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True so that no loss is computed on them. If False, the epoch stops after the last complete batch. This can be used to make sure that evaluation is computed on the whole test set, for example.

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

Examples

>>> array_dim = [5, 2, 480, 640]
>>> dataloader = SequentialDataLoader(['train/file1.dat', 'train/file2.dat'], 50000, "histo", array_dim)
>>> for ind, data_dic in enumerate(dataloader):
>>>     batch = data_dic["inputs"]
>>>     targets = data_dic["labels"]
>>>     mask = data_dic["mask_keep_memory"]
>>>     frame_is_labeled = data_dic["frame_is_labeled"]
cpu()

Sets the SequentialDataLoader to leave tensors on CPU.

cuda(device=device(type='cuda'))

Sets the SequentialDataLoader to copy tensors to GPU memory before returning them.

Parameters

device (torch.device) – The destination GPU device. Defaults to the current CUDA device.

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

show(viz_labels=None)

Visualizes batches of the DataLoader in parallel with OpenCV.

This returns a generator that draws the input and also the labels if a “viz_labels” function is provided.

Parameters

viz_labels (function) – Optional visualization function for labels. Its signature takes: img (np.ndarray), an image of shape (height, width, 3) and dtype np.uint8; and labels, as defined in your load_labels function.
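A sketch of how the generator might be consumed (assuming the yielded frames are RGB uint8 images; the window handling below is illustrative):

>>> import cv2
>>> for frame in dataloader.show():
>>>     cv2.imshow("sequential dataloader", frame[..., ::-1])  # RGB -> BGR for OpenCV
>>>     if cv2.waitKey(1) & 0xFF == ord('q'):
>>>         break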

to(device)

Sets the SequentialDataLoader to copy tensors to the given device before returning them.

Parameters

device (torch.device) – The destination device. For instance torch.device('cpu') or torch.device('cuda').

class metavision_ml.data.sequential_dataset.SequentialDataset(files, delta_t, preprocess_function_name, array_dim, load_labels=<function load_labels_stub>, durations=[], batch_size=8, preprocess_kwargs={}, padding=False, transforms=None)

Subclass of torch.utils.data.Dataset designed to stream batches of sequences chronologically.

It will read data sequentially from the same file until it jumps to another file which will also be read sequentially.

Usually it is used in conjunction with the SequentialDataLoader, in which case this object is directly initialized by the SequentialDataLoader itself.

Parameters
  • files (list) – List of input files. Can be either DAT files or HDF5 files.

  • delta_t (int) – Timeslice delta_t in us.

  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • array_dim (int list) – Dimension of feature tensors: (num_tbins, channels, sensor_height // 2^k, sensor_width // 2^k), where k is the downsampling factor.

  • load_labels (function) – function providing labels (see load_labels_stub).

  • batch_size (int) – Number of sequences being read concurrently. This can affect the loading time of a batch and affects the gradient statistics.

  • preprocess_kwargs – dictionary of optional arguments to the preprocessing function.

  • padding (boolean) – If True, at the end of an epoch the Dataset will run with incomplete batches when it can’t read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True so that no loss is computed on them. If False, the epoch stops after the last complete batch. This can be used to make sure that evaluation is computed on the whole test set, for example.

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

downsampling_factor

Parameter used to reduce the spatial dimensions of the obtained features: the coordinates are multiplied by 2**(-downsampling_factor).

Type

int

get_batch_metadata(batch_idx)

Gets the metadata information of the batch obtained from the batch indices.

Returns

List of tuples composed of (FileMetadata, start time of the sequence in us, duration of the sequence in us).

get_size()

Returns the height and width of histograms/features, i.e. the size after applying downsampling_factor.

get_size_original()

Returns height and width of input events before downscaling.

get_unique_files()

Returns a unique list of FileMetadata. It is useful in case of curriculum learning (launched using reschedule), where there are several occurrences of the same file with different start_ts.

reschedule(max_consecutive_batch, shuffle=True)

Recomputes a new schedule corresponding to the same files but a different max_consecutive_batch parameter.

This is useful for curriculum learning, when you want to feed your model with sequences of increasing duration. Alternatively, if you don’t want to change any parameters, you can simply use the shuffle function.

Parameters
  • max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.

  • shuffle (boolean) – Whether to apply a random shuffle to the list of files. Setting it to True is recommended.
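A sketch of a curriculum learning loop built on reschedule (the schedule values and train_step are illustrative assumptions):

>>> for max_consecutive_batch in [1, 2, 4, None]:  # sequences of increasing duration
>>>     dataloader.dataset.reschedule(max_consecutive_batch, shuffle=True)
>>>     for data_dic in dataloader:
>>>         train_step(data_dic)  # hypothetical training step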

shuffle(seed=None)

Shuffles the list of input files.

metavision_ml.data.sequential_dataset.collate_fn(data_list)

Builds a batch from the results of the different __getitem__ calls of the Dataset. This function helps define the DataLoader behaviour.

In doing so, it puts the temporal dimension (each time bin) first and the batch dimension second.

Parameters

data_list (tuple list) – List where each item is a tuple composed of a tensor, the labels, the keep memory mask and the frame_is_labeled mask.

Returns

see SequentialDataLoader

Return type

dictionary
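To illustrate the resulting layout, here is a minimal sketch (not the actual implementation) of stacking four batch parts of num_tbins = 5 so that time comes first and batch second:

>>> import torch
>>> parts = [torch.zeros(5, 2, 480, 640) for _ in range(4)]  # four (num_tbins, C, H, W) parts
>>> batch = torch.stack(parts, dim=1)
>>> batch.shape
torch.Size([5, 4, 2, 480, 640])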

metavision_ml.data.sequential_dataset.load_labels_stub(metadata, start_time, duration, tensor)

This is a stub implementation of a function to load label data.

This function doesn’t actually load anything; it should be passed to the SequentialDataset for self-supervised training, when no actual labelling is required.

Parameters
  • metadata (FileMetadata) – This class contains information about the sequence that is being read. Ideally the path for the labels should be deducible from metadata.path.

  • start_time (int) – Time in us in the file at which we start reading.

  • duration (int) – Duration in us of the data we need to read from said file.

  • tensor (torch.tensor) – Torch tensor of the feature for which labels are loaded. It can be used for instance to filter out the labels in area where there is no events.

Returns

labels: should be indexable by time bin (to differentiate the labels of each time bin). It could therefore be a list of length num_tbins.

frame_is_labeled (boolean nd array): this boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance, no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.
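As an illustration, a custom load_labels function honouring this contract could look like the following sketch (read_labels_somehow is a hypothetical helper; only the signature and return types are prescribed by the API):

>>> import numpy as np
>>> def load_my_labels(metadata, start_time, duration, tensor):
>>>     num_tbins = tensor.shape[0]
>>>     # hypothetical helper reading one label per time bin, e.g. from a path deduced from metadata.path
>>>     labels = [read_labels_somehow(metadata.path, start_time + i * metadata.delta_t)
>>>               for i in range(num_tbins)]
>>>     frame_is_labeled = np.ones((num_tbins,), dtype=bool)  # here every time bin is labeled
>>>     return labels, frame_is_labeled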

Utilities for sequential datasets; they work for both sequential_dataset_map_style and sequential_dataset_iterable_style.

metavision_ml.data.sequential_dataset_common.collate_fn(data_list)

Builds a batch from the results of the different __getitem__ calls of the Dataset. This function helps define the DataLoader behaviour.

In doing so, it puts the temporal dimension (each time bin) first and the batch dimension second.

Parameters

data_list (tuple list) – List where each item is a tuple composed of a tensor, the labels, the keep memory mask and the frame_is_labeled mask.

Returns

see SequentialDataLoader

Return type

dictionary

metavision_ml.data.sequential_dataset_common.load_labels_stub(metadata, start_time, duration, tensor)

This is a stub implementation of a function to load label data.

This function doesn’t actually load anything; it should be passed to the SequentialDataset for self-supervised training, when no actual labelling is required.

Parameters
  • metadata (FileMetadata) – This class contains information about the sequence that is being read. Ideally the path for the labels should be deducible from metadata.path.

  • start_time (int) – Time in us in the file at which we start reading.

  • duration (int) – Duration in us of the data we need to read from said file.

  • tensor (torch.tensor) – Torch tensor of the feature for which labels are loaded. It can be used for instance to filter out the labels in area where there is no events.

Returns

labels: should be indexable by time bin (to differentiate the labels of each time bin). It could therefore be a list of length num_tbins.

frame_is_labeled (boolean nd array): this boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance, no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.

metavision_ml.data.sequential_dataset_common.show_dataloader(dataloader, height, width, vis_func, viz_labels=None)

Visualizes batches of the DataLoader in parallel with OpenCV.

This returns a generator that draws the input and also the labels if a “viz_labels” function is provided.

Parameters
  • dataloader (DataLoader) – iterable of batches of sequential features.

  • height (int) – height of the feature maps provided by the dataloader.

  • width (int) – width of the feature maps provided by the dataloader.

  • vis_func (function) – the visualization function corresponding to the preprocessing being used. Takes a tensor of shape channels x height x width and turns it into an RGB uint8 image of shape height x width x 3.

  • viz_labels (function) – Optional visualization function for labels. Its signature takes: img (np.ndarray), an image of shape (height, width, 3) and dtype np.uint8; and labels, as defined in your load_labels function.

Scheduler is a file-agnostic class that handles the scheduling of sequences for a dataloader.

class metavision_ml.data.scheduler.FileMetadata(file, duration, delta_t, num_tbins, labels=None, start_ts=0, padding=False)

Metadata class describing a sequence.

Parameters
  • file (str) – Path to the sequence file.

  • duration (int) – Sequence duration in us.

  • delta_t (int) – Duration of a time bin in us.

  • num_tbins (int) – Number of time bins grouped together.

  • labels (str) – Path to the label file for the sequence.

  • start_ts (int) – Timestamp at which we start reading the sequence, effectively cutting it.

  • padding (boolean) – Whether the object is padding (i.e. the FileMetadata is associated with no file or labels and is only there to handle incomplete batches).

path

Path to the sequence file

Type

str

duration

Sequence duration in us

Type

int

delta_t

Duration of a time bin in us

Type

int

num_tbins

Number of time bins grouped together

Type

int

labels

Path to the label file for the sequence

Type

str

start_ts

Timestamp at which we start reading the sequence, effectively cutting it

Type

int

padding

Whether the object is padding (i.e. the FileMetadata is associated with no file or labels and is only there to handle incomplete batches)

Type

boolean

get_original_size()

Returns the couple (height, width) of a file before any downsampling was optionally done.

This corresponds to the resolution of the imager used to record the original data.

get_remaining_duration()

Returns the duration left considering the starting point.

is_padding()

Returns whether this is padding data.

is_precomputed()

Returns whether the data is precomputed in an HDF5 file.

class metavision_ml.data.scheduler.Scheduler(filesmetadata, total_tbins_delta_t, batch_size, max_consecutive_batch=None, padding=False, base_seed=0)

File-agnostic class that handles the scheduling of sequences for a dataloader. Assumes a dataloader in non-shuffle mode for temporal continuity.

Parameters
  • filesmetadata (FileMetadata list) – List of FileMetadata objects describing the dataset.

  • total_tbins_delta_t (int) – Duration in us of a sequence inside a minibatch.

  • batch_size (int) – Number of sequences being read concurrently.

  • max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x total_tbins_delta_t, the rest will be considered as part of another sequence. If None, the full length of the sequence will be used. This is used for curriculum learning to vary how long sequences are.

  • padding (boolean) – If True, the Scheduler will run with incomplete batches when it can’t read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True so that no loss is computed on them. If False, the Scheduler stops at the last complete batch.

  • base_seed (int) – consistent random seed associated with each epoch.

classmethod create_schedule(files, durations, delta_t, num_tbins, batch_size, labels=None, max_consecutive_batch=None, shuffle=True, padding=False)

Alternative way of constructing a Scheduler, with paths and durations instead of a FileMetadata list. Creates a full schedule where everything is read.
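A hedged sketch of this constructor (file paths and durations are placeholders):

>>> scheduler = Scheduler.create_schedule(
>>>     files=['train/file1.dat', 'train/file2.dat'],
>>>     durations=[1000000, 2000000],  # in us
>>>     delta_t=50000, num_tbins=5, batch_size=8)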

remove_files(files_to_remove)

Removes some files from the scheduler and reinitializes the schedule.

reschedule(max_consecutive_batch, num_tbins, delta_t, shuffle=True)

Returns a new schedule corresponding to the same files but some different parameters.

This is useful for curriculum learning, when you want to feed your model with sequences of increasing duration. Alternatively, if you don’t want to change any parameters, you can simply use the shuffle function.

Parameters
  • max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.

  • num_tbins (int) – Number of time bins in each batch (also the first dimension of the input tensor)

  • delta_t (int) – Duration of a single time bin, in us.

  • shuffle (boolean) – Whether to apply a random shuffle to the list of files. If max_consecutive_batch is not None, this is strongly recommended.

Returns

scheduler, a new Scheduler object.

shuffle(seed=None)

Shuffles the FileMetadata list held by the Scheduler and reconstructs a schedule.

Parameters

seed (int) – seed value to make shuffling deterministic.

metavision_ml.data.scheduler.get_duration(path)

Returns the duration of a file.

metavision_ml.data.transformations.transform_ev_tensor(ev_tensor, file_path, transforms, base_seed=0)

Applies a series of 2d transformations to each frame and each channel of a ev_tensor.

Parameters
  • ev_tensor (torch.tensor) – feature tensor of shape (num_ev_reps, num_channels, height, width).

  • file_path (string) – it will be used to calculate the seed.

  • transforms (torchvision.transforms) – transform to be applied to each channel of each frame.

  • base_seed (int) – base seed added to the sequence in order to have additional randomness. However, it needs to be constant within an epoch.

Returns

feature tensor of shape (num_ev_reps, num_channels, height, width).

Return type

ev_tensor (torch.tensor)

metavision_ml.data.transformations.transform_sequence(sequence, metadata, transforms, base_seed=0)

Applies a series of 2d transformations to each frame and each channel of a sequence.

The metadata of the sequence is used to provide a seed.

Parameters
  • sequence (torch.tensor) – feature tensor of shape (num_time_bins, num_channels, height, width)

  • metadata (FileMetadata) – object describing the metadata of the sequence to which the tensor belongs.

  • transforms (torchvision.transforms) – transform to be applied to each channel of each frame.

  • base_seed (int) – base seed added to the sequence in order to have additional randomness. However, it needs to be constant within an epoch.

Returns

feature tensor of shape (num_time_bins, num_channels, height, width)

Return type

sequence (torch.tensor)
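A small usage sketch (the random tensor and the FileMetadata values are placeholders; FileMetadata is documented further below):

>>> import torch
>>> from torchvision import transforms
>>> seq = torch.rand(5, 2, 240, 320)  # (num_time_bins, num_channels, height, width)
>>> metadata = FileMetadata("file1.dat", duration=250000, delta_t=50000, num_tbins=5)
>>> flipped = transform_sequence(seq, metadata, transforms.RandomHorizontalFlip(p=0.5))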


Collection of functions to add bounding box loading capabilities to the SequentialDataLoader.

metavision_ml.data.box_processing.bboxes_to_box_vectors(bbox)

Converts EventBbox bounding boxes back to a plain numpy array.

Parameters

bbox – np.ndarray Nx1 dtype EventBbox (x1,y1,w,h,score,conf,track_id)

WARNING: Here class id must be in 0-C (-1: ignore, 0: background, [1,C]: classes)

Returns

torch.array Nx6 dtype (x1,y1,x2,y2,label,track_id)

Return type

out

metavision_ml.data.box_processing.box_vectors_to_bboxes(boxes, labels, scores=None, track_ids=None, ts=0)

Concatenates box vectors into a structured array of EventBbox.

Parameters
  • boxes (np.ndarray) – Bboxes coordinates (x1,y1,x2,y2).

  • labels (np.ndarray) – Class index for each box.

  • scores (np.ndarray) – Score for each box.

  • track_ids (np.ndarray) – Individual track id for each box.

  • ts (int) – Timestamp in us.

Returns

Box with EventBbox.

Return type

box_events (np.ndarray)

metavision_ml.data.box_processing.clip_boxes(box_events, width_orig, height_orig)

Clips boxes so that they fit within the viewport width and height. Discards those that end up being empty.

Parameters
  • box_events (structured np.ndarray) – Nx1 of dtype EventBbox

  • width_orig (int) – Original width of sensor for annotation

  • height_orig (int) – Original height of sensor for annotation

Returns

Nx1 of dtype EventBbox

Return type

box_events (structured np.ndarray)

metavision_ml.data.box_processing.could_frame_contain_valid_gt(batch_start_time, duration, labelling_delta_t, num_tbins)

This function returns a np.array of num_tbins booleans, indicating whether each frame was labeled or not.

This is useful if our recordings are labeled at a fixed frame rate but we want to train at a higher frame rate (i.e. a smaller delta_t). The number of frames in a batch (num_tbins) is the duration of this batch divided by delta_t.

Note: If you train at a higher frequency than your annotations, it is also possible to interpolate your bounding box files offline to avoid this.

For example, given the following setup
  • num_tbins = 5 (number of frames in a batch)

  • delta_t = 50 (time of each frame)

  • labelling_delta_t = 120 (delta_t at which labels are provided)

  • duration = num_tbins * delta_t = 250

-> this function will be called several times, with batch_start_time = 0, then 250, then 500, etc. Each time this function is called, it returns an array of 5 booleans to indicate which frames could contain a label:

           GT            GT              GT            GT            GT            GT
|            120           240|            360           480|          600           720  |
|             |             | |             |             | |           |             |   |
|             v             v |             v             v |           v             v   |
|     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
0    50    100   150   200   250   300   350   400   450   500   550   600   650   700   750
|                             |                             |                             |
|< F > < F > < T > < F > < T >|< F > < F > < T > < F > < T >|< F > < T > < F > < F > < T >|
|                             |                             |                             |
|<-------- first call ------->|<------- second call ------->|<-------- third call ------->|
|                             |                             |                             |

Same setup as before, but now with labelling_delta_t = 100 instead of 120:

         GT          GT          GT          GT          GT           GT         GT
|          100         200    |    300         400         500          600        700    |
|           |           |     |     |           |           |           |           |     |
|           v           v     |     v           v           v           v           v     |
|     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
0    50    100   150   200   250   300   350   400   450   500   550   600   650   700   750
|                             |                             |                             |
|< F > < T > < F > < T > < F >|< T > < F > < T > < F > < T >|< F > < T > < F > < T > < F >|
|                             |                             |                             |
|<-------- first call ------->|<------- second call ------->|<-------- third call ------->|
|                             |                             |                             |

Note: if labelling_delta_t <= delta_t, all frames could contain a valid GT

Note: If the FileMetadata is a pure distractor file (with no label at all), it will have a 1 us labelling_delta_t, and therefore all the frames will be considered labeled.

Parameters
  • batch_start_time – Time from when to start loading (in us).

  • duration – Duration to load (in us).

  • labelling_delta_t – Period (in us) of your labelling annotation system.

  • num_tbins – Number of frames to load.

Returns

(boolean nd array): This boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance, no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.

Return type

frame_could_contain_gt
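Following the first diagram above (num_tbins = 5, delta_t = 50, labelling_delta_t = 120), the first call would presumably return:

>>> could_frame_contain_valid_gt(batch_start_time=0, duration=250, labelling_delta_t=120, num_tbins=5)
array([False, False,  True, False,  True])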

metavision_ml.data.box_processing.create_class_lookup(labelmap_path, wanted_keys=[])

Takes as argument a JSON path storing a dictionary with class_id as key and class_name as value for the ground truth, as well as a list of wanted keys (the class_names that we want to select).

Parameters
  • labelmap_path (string) – Path to the label map, a JSON file containing for example: '{"0": "pedestrian", "1": "two wheeler", "2": "car", "3": "truck"}'

  • wanted_keys (list) – List of classes to extract, for example: ['car', 'pedestrian']

Returns

class_lookup, a numpy array, e.g. [1, -1, 2, -1]

In this example we get 0 for background, 1 for pedestrians and 2 for cars. Then, doing new_label = class_lookup[gt_label] transforms an array of ground truth ids into an array of ids that fit your network. Reminder: the ground truth has no id for background; for our network we use id 0 for background and consecutive ids for the other classes.
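The example above corresponds to the following sketch (the JSON path is a placeholder):

>>> class_lookup = create_class_lookup("label_map.json", wanted_keys=['car', 'pedestrian'])
>>> class_lookup
array([ 1, -1,  2, -1])
>>> new_labels = class_lookup[gt_labels]  # remap ground truth ids to network ids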

metavision_ml.data.box_processing.filter_boxes(box_events, class_lookup, idx_to_filter, ignore_filtered)

Filters or ignores boxes in box_events according to idx_to_filter.

Ignored boxes are still present but are marked with a -1 class_id. At the loss computation stage this information can be used so that they don’t contribute to the loss. This is used when you don’t want the proposals matched with those ignored boxes to be considered as false positives in the loss. For instance, if you train on cars only in a dataset containing trucks, the trucks could be ignored.

Parameters
  • box_events (np.ndarray) – Box events.

  • class_lookup (int list) – Lookup table for converting class indices to contiguous int values.

  • idx_to_filter (np.ndarray) – Boxes indices to filter out or ignore (see below).

  • ignore_filtered (bool) – If True, the filtered boxes are ignored rather than removed: they are marked with a -1 class_id so that they can be discarded in the loss.

Returns

Box_events with class_id translated using the class_lookup.

Return type

(np.ndarray)

metavision_ml.data.box_processing.filter_empty_tensor(array: numpy.array, box_events: numpy.array, area_box_filter: float = 0.1, shift: int = 0, time_per_bin: int = 10000, batch_start_time: int = 0, last_time_to_filter: Optional[int] = None) → numpy.array

Preprocesses bounding boxes: discards bboxes with no event data inside them.

Parameters
  • array – (T,C,H,W) event frame

  • box_events – numpy array of bbox

  • area_box_filter – minimum fraction of the bbox area that must contain events

  • shift – downsampling coefficient

  • time_per_bin – time interval per time bin along T axis

  • batch_start_time – starting time stamp of the array batch

  • last_time_to_filter – stop filtering bbox after this time stamp

Returns

The filtered box_events array.

metavision_ml.data.box_processing.load_box_events(metadata, batch_start_time, duration)

Fetches box events from FileMetadata object, batch_start_time & duration.

Parameters
  • metadata (object) – Record details.

  • batch_start_time (int) – (us) Where to seek in the file to load corresponding bounding boxes

  • duration (int) – (us) How long to load events from bounding box file

Returns

Nx1 of dtype EventBbox

Return type

box_events (structured np.ndarray)

metavision_ml.data.box_processing.load_boxes(metadata, batch_start_time, duration, tensor, **kwargs)

Function to fetch boxes and preprocess them. Should be passed to a SequentialDataLoader.

Since this function has additional arguments compared to load_labels_stub, one has to specialize it:

Examples

>>> from functools import partial
>>> n_classes = 21
>>> class_lookup = np.arange(n_classes)  # each class is mapped to itself
>>> load_boxes_function = partial(load_boxes, class_lookup=class_lookup)
Parameters
  • metadata (FileMetadata) – Record details.

  • batch_start_time (int) – (us) Where to seek in the file to load corresponding bounding boxes

  • duration (int) – (us) How long to load events from bounding box file

  • tensor (np.ndarray) – Current preprocessed input, can be used for data dependent preprocessing, for instance remove boxes without any features in them.

  • **kwargs – containing:

    class_lookup (np.array): Look-up array for class indices.

    labelling_delta_t (int): Indicates the period of labelling, in order to only consider time bins with actual labels when computing the loss.

    min_box_diag (int): Diagonal value under which boxes are not considered. Defaults to 60 pixels.

Returns

List of structured arrays of dtype EventBbox corresponding to each time bin.

frames_contain_gt (np.ndarray): This boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance, no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.

Return type

boxes (List[np.ndarray])

metavision_ml.data.box_processing.nms(box_events, scores, iou_thresh=0.5)

NMS on box_events

Parameters
  • box_events (np.ndarray) – nx1 with dtype EventBbox; the sorting order of those boxes is used as a criterion for the nms.

  • scores (np.ndarray) – nx1 array of plain dtype; needs to be argsortable.

  • iou_thresh (float) – if two boxes overlap with more than iou_thresh (intersection over union threshold) with each other, only the one with the highest criterion value is kept.

Returns

Indices of the box to keep in the input array.

Return type

keep (np.ndarray)
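Typical usage keeps only the returned indices, e.g.:

>>> keep = nms(box_events, scores, iou_thresh=0.5)
>>> box_events, scores = box_events[keep], scores[keep]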

metavision_ml.data.box_processing.nms_by_class(box_events, scores, iou_thresh=0.5)

NMS on box_events done independently by class

Parameters
  • box_events (np.ndarray) – nx1 with dtype EventBbox; the sorting order of those boxes is used as a criterion for the nms.

  • scores (np.ndarray) – nx1 array of plain dtype; needs to be argsortable.

  • iou_thresh (float) – if two boxes overlap with more than iou_thresh (intersection over union threshold) with each other, only the one with the highest criterion value is kept.

Returns

Indices of the box to keep in the input array.

Return type

keeps (np.ndarray)

metavision_ml.data.box_processing.rescale_boxes(box_events, width_orig, height_orig, width_dst, height_dst)

Rescales boxes to new height and width.

Parameters
  • box_events (structured np.ndarray) – Array of length n of dtype EventBbox.

  • width_orig (int) – Original width of sensor for annotation.

  • height_orig (int) – Original height of sensor for annotation.

  • width_dst (int) – Destination width.

  • height_dst (int) – Destination height.

Returns

Array of length n of dtype EventBbox.

Return type

box_events (structured np.ndarray)

metavision_ml.data.box_processing.split_boxes(box_events, batch_start_time, delta_t=None, num_tbins=None)

Splits box_events into a list of box events clustered by delta_t. Removes a bounding box from the input list box_events if there are fewer than min_box_area_thr*bbox_area events in the box and the timestamp of the bbox is < last_time_to_filter.

Box times are in range(0, num_tbins*delta_t)

Parameters
  • box_events (structured np.ndarray) – Box events inputs of type EventBbox

  • delta_t (optional int) – Duration of time bin in us. Used for chronological NMS.

  • num_tbins (optional int) – Number of time bins.

Returns

List of box_events of type EventBbox separated in time bins.

Return type

box_events (np.ndarray list)
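A sketch of the expected call (values are placeholders):

>>> boxes_per_tbin = split_boxes(box_events, batch_start_time=0, delta_t=50000, num_tbins=5)  # one entry per time bin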


Iterator of feature tensors for a source of input events.

class metavision_ml.data.cd_processor_iterator.CDProcessorIterator(path, preprocess_function_name, mode='delta_t', start_ts=0, max_duration=None, delta_t=50000, n_events=10000, num_tbins=1, preprocess_kwargs={}, device=device(type='cpu'), height=None, width=None, transforms=None, base_seed=0, **kwargs)

Provides feature tensors (torch.Tensor) at regular intervals.

Relies on the EventsIterator class. The different behaviours of EventsIterator can be leveraged.

Parameters
  • path (string) – Path to the file to read, or empty for a camera.

  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • mode (str) – mode of streaming: 'n_events', 'delta_t' or 'mixed'.

  • start_ts (int) – timestamp at which the EventsIterator starts, in us.

  • max_duration (int) – total duration of the EventsIterator, in us.

  • delta_t (int) – duration of the used event slice, in us.

  • n_events (int) – number of events per slice.

  • num_tbins (int) – number of time bins.

  • preprocess_kwargs – dictionary of optional arguments to the preprocessing function. This can be used to override the default value of max_incr_per_pixel, for instance: {"max_incr_per_pixel": 20} to clip and normalize tensors by 20 at full resolution.

  • device (torch.device) – Torch device (defaults to cpu).

  • height (int) – if None, the features are not downsampled; otherwise, features are downsampled to this height, which must be the sensor’s height divided by a power of 2.

  • width (int) – if None, the features are not downsampled; otherwise, features are downsampled to this width, which must be the sensor’s width divided by a power of 2.

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

  • base_seed (int) – seed changing the random transformations when applicable; if None, the current time is used as seed.

  • **kwargs – Arbitrary keyword arguments passed to the underlying EventsIterator.

mv_it

object used to read from the file or the camera.

Type

EventsIterator

array_dim

shape of the tensor (channel, height, width).

Type

tuple

cd_proc

class computing features from events into a preallocated memory array.

Type

CDProcessor

step

counter of iterations.

Type

int

event_input_height

original height of the sensor in pixels.

Type

int

event_input_width

original width of the sensor in pixels.

Type

int

base_seed

seed to change the random transformation when applicable.

Type

int

Examples

>>> path = "example.raw"
>>> for tensor in CDProcessorIterator(path, "event_cube", delta_t=10000):
>>>     # Returns a torch Tensor.
>>>     print(tensor.shape)
get_time()

Cut Inner Reader Time

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

class metavision_ml.data.cd_processor_iterator.HDF5Iterator(path, num_tbins=1, preprocess_kwargs={}, start_ts=0, device=device(type='cpu'), height=None, width=None, transforms=None, base_seed=0)

Provides feature tensors (torch.Tensor) at regular intervals from a precomputed HDF5 file.

Parameters
  • path (string) – Path to the HDF5 file containing precomputed features.

  • height (int) – if None, the features are not downsampled; otherwise, features are downsampled to this height, which must be the sensor’s height divided by a power of 2.

  • width (int) – if None, the features are not downsampled; otherwise, features are downsampled to this width, which must be the sensor’s width divided by a power of 2.

  • device (torch.device) – Torch device (defaults to cpu).

  • start_ts (int) – First timestamp to consider in us. (Must be a multiple of the HDF5 file delta_t)

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

  • base_seed (int) – seed changing the random transformations when applicable; if None, the current time is used as seed.

dataset

HDF5 dataset containing the precomputed features.

Type

h5py.Dataset

array_dim

shape of the tensor (channel, height, width).

Type

tuple

preprocess_dict

dictionary of the parameters used.

Type

dictionary

step

counter of iterations.

Type

int

event_input_height

original height of the sensor in pixels.

Type

int

event_input_width

original width of the sensor in pixels.

Type

int

base_seed

seed to change the random transformation when applicable.

Type

int

Examples

>>> path = "example.h5"
>>> for tensor in HDF5Iterator(path, num_tbins=4):
>>>     # Returns a torch Tensor.
>>>     print(tensor.shape)
checks(preprocess_function_name, delta_t, mode='delta_t', n_events=0)

Convenience function to check that the precomputed parameters match the requested ones.

Parameters
  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • delta_t (int) – Duration of used events slice in us.

get_time()

Cut Inner Reader Time

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

This class allows streaming a dataset of .raw or .dat files.

This is yet another example of how to use data.multistream_dataloader. Here we go further and integrate it with the same interface as SequentialDataLoader.

class metavision_ml.data.cd_processor_dataset.CDProcessorDataLoader(files, mode, delta_t, n_events, max_duration, preprocess_function_name, num_tbins, batch_size, num_workers=2, height=None, width=None, preprocess_kwargs={}, load_labels=None, padding_mode='zeros', transforms=None)

Provides the same interface as SequentialDataLoader, but uses the multistream_dataloader implementation.

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

class metavision_ml.data.cd_processor_dataset.CDProcessorDatasetIterator(path, height_out, width_out, load_labels, mode, n_events, delta_t, num_tbins, preprocess_function_name, preprocess_kwargs={}, start_ts=0, max_duration=None, transforms=None, base_seed=None)

This iterator reads events or preprocessed tensors, computes feature tensors, loads labels and retrieves them. The difference with sequential_dataset_v1 is that load_labels cannot be a pure function: it has to be a class.


This module simulates a box moving in translation and zoom within a frame.

class metavision_ml.data.moving_box.Animation(height, width, channels, max_stop=15, max_classes=1, max_objects=3)

Responsible for endless animation of moving boxes. Base class that can be inherited for various drawings of moving objects.

Parameters
  • height – frame height

  • width – frame width

  • channels – frame channels (either 1 or 3)

  • max_stop – animation random pauses

  • max_classes – maximum number of classes

  • max_objects – maximum number of objects

class metavision_ml.data.moving_box.MovingSquare(h=300, w=300, max_stop=15, max_classes=3)

Responsible for an endless moving square animation.

Parameters
  • h – frame height

  • w – frame width

  • max_stop – randomly pause for this many steps

  • max_classes – maximum number of classes

reset()

Resets internal variables

reset_speed()

Resets Speed Variables

metavision_ml.data.moving_box.clamp_xyxy(x1, y1, x2, y2, width, height)

Clamps a box to a frame

Parameters
  • x1 – top left corner x

  • y1 – top left corner y

  • x2 – bottom right corner x

  • y2 – bottom right corner y

  • width – frame width

  • height – frame height

Returns

clamped positions

metavision_ml.data.moving_box.move_box(x1, y1, x2, y2, vx, vy, vs, width, height, min_width, min_height)

Moves a bounding box around in a frame using velocities vx, vy & vs (scale). It returns the moved box and a flag saying whether the speed needs to be changed because the box collided with a wall.

Parameters
  • x1 – top left corner x

  • y1 – top left corner y

  • x2 – bottom right corner x

  • y2 – bottom right corner y

  • vx – x speed

  • vy – y speed

  • vs – scale speed

  • width – frame width

  • height – frame height

  • min_width – minimal box width

  • min_height – minimal box height

Returns

moved box

metavision_ml.data.moving_box.rotate(x, y, xo, yo, theta)

Rotates a point w.r.t. the origin (xo, yo)

Parameters
  • x – point x coordinate

  • y – point y coordinate

  • xo – origin x coordinate

  • yo – origin y coordinate

  • theta – rotation angle

Returns

rotated point

Toy problem dataset that serves as an example of our streamer dataloader.

This displays moving digits from the MNIST database. The digits vary in size and position. To use this class, download MNIST.ZIP at https://kdrive.infomaniak.com/app/share/975517/3c529307-3cec-4fc6-bbb3-87a95c6ef6cf

The dataset both generates chained video clips and provides bounding boxes with the correct class ids.

The dataset procedurally generates the video clips, so it is an “Iterable” kind of dataset

class metavision_ml.data.moving_mnist.MovingMNISTDataset(tbins, num_workers, batch_size, height, width, max_frames_per_video, max_frames_per_epoch, train, dataset_dir='.')

Creates the dataloader for Moving MNIST.

Parameters
  • tbins – number of steps per batch

  • num_workers – number of parallel workers

  • batch_size – number of animations

  • height – animation height

  • width – animation width

  • max_frames_per_video – maximum frames per animation (must be greater than tbins)

  • max_frames_per_epoch – maximum frames per epoch

  • train – use training part of MNIST dataset.

  • dataset_dir – directory where MNIST dataset is stored
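A hedged instantiation sketch (values are illustrative):

>>> dataset = MovingMNISTDataset(tbins=10, num_workers=2, batch_size=4, height=128, width=128,
>>>                              max_frames_per_video=100, max_frames_per_epoch=10000, train=True)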

class metavision_ml.data.moving_mnist.MovingMnist(idx, tbins, height, width, train, max_frames_per_video, channels=3, max_stop=15, max_objects=2, drop_labels_p=0, data_caching_path='.')

Moving MNIST animation

Parameters
  • idx – unique id

  • tbins – number of steps delivered at once

  • height – frame height (must be at least 64 pix)

  • width – frame width (must be at least 64 pix)

  • max_stop – random pause in animation

  • max_objects – maximum number of objects per animation

  • train – whether to use the training or validation part of MNIST

  • max_frames_per_video – maximum frames per video before reset

  • drop_labels_p – probability to drop the annotation of certain frames (in which case it is marked in the mask)

  • data_caching_path – where to store the MNIST dataset

metavision_ml.data.moving_mnist.collate_fn(data_list)

Collates batch parts into a single dictionary.

Parameters

data_list – batch parts