SDK ML Data API

Subclass of the Torch dataset that loads DAT files and labels from events, wrapped by a dataloader class. It currently supports DAT and HDF5 files, although we recommend using the latter.

This class is generic to any type of labels. A function should be provided to load them.

class metavision_ml.data.sequential_dataset.SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim, load_labels=<function load_labels_stub>, durations=[], batch_size=8, num_workers=2, preprocess_kwargs={}, shuffle=False, padding=False, transforms=None)

SequentialDataLoader uses a pytorch DataLoader to read batches chronologically.

It is used simply as an iterator and returns a dictionary containing the following keys:

  • inputs: a torch.tensor of shape num_tbins x batch_size x channel x height x width.

    Note that it is normalized to 1. The dtype depends on the preprocessing function used but can be changed through preprocess_kwargs.

  • labels: the list of lists of labels provided by the load_labels function.

  • mask_keep_memory: a float array of shape batch_size, with values in {0., 1.}, indicating whether memory is kept or reset at the beginning of the sequence.

  • frame_is_labeled: a boolean array of shape num_tbins x batch_size, indicating whether the corresponding labels can be used for loss computation (i.e. whether the labels are valid or not).

  • video_infos: a list of (FileMetadata, batch_start_time, duration) tuples of size batch_size, containing information about each recording in the batch.
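
As a rough illustration of how mask_keep_memory might be consumed, the sketch below resets a per-sequence recurrent state wherever the mask is 0. The helper apply_keep_memory and the scalar "states" are hypothetical stand-ins; a real model would typically multiply its hidden tensors by the mask, and the convention that 1 keeps memory and 0 resets it is an assumption here.

```python
def apply_keep_memory(hidden_states, mask_keep_memory):
    """Zero the state of every sequence whose mask value is 0 (assumed to mean reset)."""
    return [state * keep for state, keep in zip(hidden_states, mask_keep_memory)]


# batch_size = 3: the second sequence starts a new recording, so its memory is reset.
states = apply_keep_memory([0.5, -1.2, 3.0], [1.0, 0.0, 1.0])
```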

batch_size

Number of sequences being read concurrently. This can affect the loading time of the batch and has an effect on the gradient statistics.

Type

int

num_workers

Number of processes being used by the DataLoader, 0 means it uses Python’s main process. More processes help with speed but up to a point: too many processes can actually hurt loading times.

Type

int

max_consecutive_batch

Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.

Type

int
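
The splitting rule for max_consecutive_batch can be illustrated with a short arithmetic sketch. The helper split_into_sequences is hypothetical (it is not part of metavision_ml) and only assumes what is stated above: a sequence lasts at most max_consecutive_batch x num_tbins x delta_t microseconds and the remainder of the file starts a new sequence. How the actual scheduler treats a trailing incomplete chunk is not covered here.

```python
def split_into_sequences(file_duration, max_consecutive_batch, num_tbins, delta_t):
    """Return (start_ts, duration) pairs, in us, covering one file.

    Hypothetical helper for illustration only; not the metavision_ml scheduler.
    """
    if max_consecutive_batch is None:
        # No limit: the whole file is a single sequence.
        return [(0, file_duration)]
    max_len = max_consecutive_batch * num_tbins * delta_t
    sequences = []
    start = 0
    while start < file_duration:
        length = min(max_len, file_duration - start)
        sequences.append((start, length))
        start += length
    return sequences


# A 2.5 s file, 5 time bins of 50 ms per batch, at most 4 consecutive batches:
# each sequence lasts at most 4 * 5 * 50000 us = 1 s.
chunks = split_into_sequences(2_500_000, 4, 5, 50_000)
```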

device

Indicates on which device (cpu or cuda for instance) the data will be put.

Type

torch.device

dataset

Instance of SequentialDataset that is used to load the data, or possibly change the scheduling. Note that if the dataset is changed, that change won’t take effect until the next iteration of the DataLoader.

Type

SequentialDataset

Parameters
  • files (list) – List of input files. Can be either DAT files or HDF5 files.

  • delta_t (int) – Timeslice delta_t in us.

  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • array_dim (int list) – Dimension of feature tensors: (num_tbins, channels, sensor_height // 2^k, sensor_width // 2^k)

  • load_labels – function providing labels (see load_labels_stub).

  • durations (int list) – Optionally, you can provide the durations in us of all the input files. This saves a bit of time when there are many of them. If you provide a duration that is shorter than the actual duration of a sequence, only part of it will be read.

  • batch_size (int) – Number of sequences being read concurrently. This can affect the loading time of the batch and has an effect on the gradient statistics.

  • num_workers (int) – Number of processes being used by the DataLoader, 0 means it uses Python’s main process. More processes help with speed but up to a point: too many processes can actually hurt loading times.

  • preprocess_kwargs – dictionary of optional arguments to the preprocessing function. This can be used to override the default value of max_incr_per_pixel: for instance, {“max_incr_per_pixel”: 20} clips and normalizes tensors by 20.

  • shuffle (boolean) – If True, breaks the temporal continuity between batches. This should be only used when training a model without memory.

  • padding (boolean) – If True, at the end of an epoch the Dataset will run with incomplete batches when it can’t read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True so that no loss is computed on them. If False, the epoch stops after the last complete batch. Setting it to True can be used to make sure that evaluation is computed on the whole test set, for example.

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.
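
The array_dim argument can be derived from the sensor resolution and a downsampling exponent k. The helper below is a hypothetical sketch (make_array_dim is not a library function); it assumes that the sensor size must be divisible by 2^k.

```python
def make_array_dim(num_tbins, channels, sensor_height, sensor_width, k=0):
    """Build [num_tbins, channels, height, width] for a 2**k spatial downsampling."""
    factor = 1 << k
    if sensor_height % factor or sensor_width % factor:
        raise ValueError("sensor size must be divisible by 2**k")
    return [num_tbins, channels, sensor_height // factor, sensor_width // factor]


full_res = make_array_dim(5, 2, 480, 640)        # no downsampling
half_res = make_array_dim(5, 2, 480, 640, k=1)   # both spatial dimensions halved
```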

Examples

>>> from metavision_ml.data.sequential_dataset import SequentialDataLoader
>>> array_dim = [5, 2, 480, 640]  # (num_tbins, channels, height, width)
>>> dataloader = SequentialDataLoader(['train/file1.dat', 'train/file1.dat'], 50000, "histo", array_dim)
>>> for ind, data_dic in enumerate(dataloader):
...     batch = data_dic["inputs"]
...     targets = data_dic["labels"]
...     mask = data_dic["mask_keep_memory"]
...     frame_is_labeled = data_dic["frame_is_labeled"]

cpu()

Sets the SequentialDataLoader to leave tensors on CPU.

cuda(device=device(type='cuda'))

Sets the SequentialDataLoader to copy tensors to GPU memory before returning them.

Parameters

device (torch.device) – The destination GPU device. Defaults to the current CUDA device.

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

show(viz_labels=None)

Visualizes batches of the DataLoader in parallel with OpenCV.

This returns a generator that draws the input and also the labels if a “viz_labels” function is provided.

Parameters

viz_labels (function) – Optional visualization function for labels. Its signature is:
  • img (np.ndarray) – an image of size (height, width, 3) and of dtype np.uint8.
  • labels – as defined in your load_labels function.

to(device)

Sets the SequentialDataLoader to copy tensors to the given device before returning them.

Parameters

device (torch.device) – The destination device. For instance torch.device(‘cpu’) or torch.device(‘cuda’).

class metavision_ml.data.sequential_dataset.SequentialDataset(files, delta_t, preprocess_function_name, array_dim, load_labels=<function load_labels_stub>, durations=[], batch_size=8, preprocess_kwargs={}, padding=False, transforms=None)

Subclass of torch.utils.data.Dataset designed to stream batches of sequences chronologically.

It will read data sequentially from the same file until it jumps to another file which will also be read sequentially.

Usually it is used in conjunction with the SequentialDataLoader, in which case this object is directly initialized by the SequentialDataLoader itself.

Parameters
  • files (list) – List of input files. Can be either DAT files or HDF5 files.

  • delta_t (int) – Timeslice delta_t in us.

  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • array_dim (int list) – Dimension of feature tensors: (num_tbins, channels, sensor_height * 2^-k, sensor_width * 2^-k)

  • load_labels (function) – function providing labels (see load_labels_stub).

  • batch_size (int) – Number of sequences being read concurrently. This can affect the loading time of the batch and has an effect on the gradient statistics.

  • preprocess_kwargs – dictionary of optional arguments to the preprocessing function.

  • padding (boolean) – If True, at the end of an epoch the Dataset will run with incomplete batches when it can’t read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True so that no loss is computed on them. If False, the epoch stops after the last complete batch. Setting it to True can be used to make sure that evaluation is computed on the whole test set, for example.

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

downsampling_factor

Parameter used to reduce the spatial dimension of the obtained feature. Coordinates are multiplied by 2**(-downsampling_factor).

Type

int

get_batch_metadata(batch_idx)

Gets the metadata information of the batch obtained from the batch indices.

Returns

List of tuples composed of (FileMetadata, start time of sequence in us, duration of sequence in us).

get_size()

Returns height and width of histograms/features, i.e. size after downsampling_factor.

get_size_original()

Returns height and width of input events before downscaling.

get_unique_files()

Returns a unique list of FileMetadata. It is useful in the case of curriculum learning (launched using reschedule), where there are several occurrences of the same file with different start_ts.

reschedule(max_consecutive_batch, shuffle=True)

Recomputes a new schedule corresponding to the same files but a different max_consecutive_batch parameter.

This is useful for curriculum learning, when you want to feed your model with sequences of increasing duration. Alternatively, if you don’t want to change any parameters, you can simply use the shuffle function.

Parameters
  • max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.

  • shuffle (boolean) – Whether to apply a random shuffle to the list of files. Setting it to True is recommended.

shuffle(seed=None)

Shuffles the list of input files.

metavision_ml.data.sequential_dataset.collate_fn(data_list)

Builds a batch from the result of the different __getitem__ calls of the Dataset. This function helps define the DataLoader behaviour.

By doing so it puts the temporal dimensions (each time bin) as the first dimension and the batch dimension becomes second.

Parameters

data_list (tuple list) – List where each item is a tuple composed of a tensor, the labels, the keep memory mask and the frame_is_labeled mask.

Returns

see SequentialDataLoader

Return type

dictionary
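
The dimension swap performed by collate_fn can be pictured on nested lists. This is only a sketch with a hypothetical helper (to_time_major); with real feature tensors of shape (num_tbins, channels, height, width), a comparable result could be obtained with torch.stack(samples, dim=1).

```python
def to_time_major(batch_first):
    """Swap the two leading dimensions of a nested list: [batch][time] -> [time][batch]."""
    return [list(step) for step in zip(*batch_first)]


# Two sequences (batch_size = 2) of three time bins each (num_tbins = 3).
batch_first = [["a0", "a1", "a2"], ["b0", "b1", "b2"]]
time_major = to_time_major(batch_first)
```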

metavision_ml.data.sequential_dataset.load_labels_stub(metadata, start_time, duration, tensor)

This is a stub implementation of a function to load label data.

This function doesn’t actually load anything and should be passed to the SequentialDataset for self-supervised training when no actual labelling is required.

Parameters
  • metadata (FileMetadata) – This class contains information about the sequence that is being read. Ideally the path for the labels should be deducible from metadata.path.

  • start_time (int) – Time in us in the file at which we start reading.

  • duration (int) – Duration in us of the data we need to read from said file.

  • tensor (torch.tensor) – Torch tensor of the feature for which labels are loaded. It can be used for instance to filter out the labels in areas where there are no events.

Returns

labels should be indexable by time bin (to differentiate the labels of each time bin). It could therefore be a list of length num_tbins.

(boolean nd array): This boolean mask array of length num_tbins indicates whether the frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.
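
A custom label loader is expected to follow the same signature. The sketch below is a hypothetical example (load_no_labels is not a library function) that returns one empty label list per time bin and marks every bin as unlabeled. For the sake of being self-contained it uses plain Python lists and a SimpleNamespace stand-in for FileMetadata; a real loader would presumably return a boolean NumPy array for the mask, as described above.

```python
from types import SimpleNamespace


def load_no_labels(metadata, start_time, duration, tensor):
    """Hypothetical label loader following the load_labels_stub signature."""
    num_tbins = len(tensor)  # first dimension of the feature tensor
    labels = [[] for _ in range(num_tbins)]  # one (empty) label list per time bin
    frame_is_labeled = [False] * num_tbins   # no time bin contributes to the loss
    return labels, frame_is_labeled


# Stand-ins for a FileMetadata object and a feature tensor with 5 time bins.
fake_metadata = SimpleNamespace(path="train/file1.dat")
fake_tensor = [None] * 5
labels, frame_is_labeled = load_no_labels(fake_metadata, 0, 250_000, fake_tensor)
```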

Utils for sequential datasets, works for sequential_dataset_map_style and sequential_dataset_iterable_style

metavision_ml.data.sequential_dataset_common.collate_fn(data_list)

Builds a batch from the result of the different __getitem__ calls of the Dataset. This function helps define the DataLoader behaviour.

By doing so it puts the temporal dimensions (each time bin) as the first dimension and the batch dimension becomes second.

Parameters

data_list (tuple list) – List where each item is a tuple composed of a tensor, the labels, the keep memory mask and the frame_is_labeled mask.

Returns

see SequentialDataLoader

Return type

dictionary

metavision_ml.data.sequential_dataset_common.load_labels_stub(metadata, start_time, duration, tensor)

This is a stub implementation of a function to load label data.

This function doesn’t actually load anything and should be passed to the SequentialDataset for self-supervised training when no actual labelling is required.

Parameters
  • metadata (FileMetadata) – This class contains information about the sequence that is being read. Ideally the path for the labels should be deducible from metadata.path.

  • start_time (int) – Time in us in the file at which we start reading.

  • duration (int) – Duration in us of the data we need to read from said file.

  • tensor (torch.tensor) – Torch tensor of the feature for which labels are loaded. It can be used for instance to filter out the labels in areas where there are no events.

Returns

labels should be indexable by time bin (to differentiate the labels of each time bin). It could therefore be a list of length num_tbins.

(boolean nd array): This boolean mask array of length num_tbins indicates whether the frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.

metavision_ml.data.sequential_dataset_common.show_dataloader(dataloader, height, width, vis_func, viz_labels=None)

Visualizes batches of the DataLoader in parallel with OpenCV.

This returns a generator that draws the input and also the labels if a “viz_labels” function is provided.

Parameters
  • dataloader (DataLoader) – iterable of batch of sequential features.

  • height (int) – height of the feature maps provided by the dataloader.

  • width (int) – width of the feature maps provided by the dataloader

  • vis_func (function) – the visualization function corresponding to the preprocessing being used. Takes a tensor of shape channels x height x width and turns it into an RGB height x width x 3 uint8 image.

  • viz_labels (function) – Optional visualization function for labels. Its signature is: img (np.ndarray), an image of size (height, width, 3) and of dtype np.uint8; labels, as defined in your load_labels function.

Scheduler is a file-agnostic class that schedules sequences for a dataloader.

class metavision_ml.data.scheduler.FileMetadata(file, duration, delta_t, num_tbins, labels=None, start_ts=0, padding=False)

Metadata class describing a sequence.

Parameters
  • file (str) – Path to the sequence file.

  • duration (int) – Sequence duration in us.

  • delta_t (int) – Duration of a time bin in us.

  • num_tbins (int) – Number of time bins together.

  • labels (str) – Path to the label file for the sequence.

  • start_ts (int) – Timestamp at which we start reading the sequence, which effectively cuts off what precedes it.

  • padding (boolean) – Whether the object is padding (i.e. the FileMetadata is associated to no file or labels and is just here in case of incomplete batches.)

path

Path to the sequence file

Type

str

duration

Sequence duration in us

Type

int

delta_t

Duration of a time bin in us

Type

int

num_tbins

Number of time bins together

Type

int

labels

Path to the label file for the sequence

Type

str

start_ts

Timestamp at which we start reading the sequence, which effectively cuts off what precedes it

Type

int

padding

Whether the object is padding (i.e. the FileMetadata is associated to no file or labels and is just here in case of incomplete batches.)

Type

boolean

get_original_size()

Returns the couple (height, width) of a file before any downsampling was optionally done.

This corresponds to the resolution of the imager used to record the original data.

get_remaining_duration()

Returns the duration left considering the starting point.

is_padding()

Is padding data.

is_precomputed()

Is the data in a HDF5 File.

class metavision_ml.data.scheduler.Scheduler(filesmetadata, total_tbins_delta_t, batch_size, max_consecutive_batch=None, padding=False, base_seed=0)

File agnostic class that does the scheduling of sequence for a dataloader. Assumes a dataloader in non shuffle mode for temporal continuity.

Parameters
  • filesmetadata (FileMetadata list) – List of FileMetadata objects describing the dataset.

  • total_tbins_delta_t (int) – Duration in us of a sequence inside a minibatch.

  • batch_size (int) – Number of sequences being read concurrently.

  • max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x total_tbins_delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used. This is used for curriculum learning to vary how long sequences are.

  • padding (boolean) – If True, the Scheduler will run with incomplete batches when it can’t read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True so that no loss is computed on them. If False, the Scheduler stops at the last complete batch.

  • base_seed (int) – Consistent random seed associated with each epoch.

classmethod create_schedule(files, durations, delta_t, num_tbins, batch_size, labels=None, max_consecutive_batch=None, shuffle=True, padding=False)

Alternate way of constructing a Scheduler, from paths and durations instead of a FileMetadata list. Creates a full schedule where everything is read.

remove_files(files_to_remove)

Removes some files from the scheduler and reinitializes the schedule.

reschedule(max_consecutive_batch, num_tbins, delta_t, shuffle=True)

Returns a new schedule corresponding to the same files but some different parameters.

This is useful for curriculum learning, when you want to feed your model with sequences of increasing duration. Alternatively, if you don’t want to change any parameters, you can simply use the shuffle function.

Parameters
  • max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.

  • num_tbins (int) – Number of time bins in each batch (also the first dimension of the input tensor)

  • delta_t (int) – In us duration of a single time bin.

  • shuffle (boolean) – Whether to apply a random shuffle to the list of files. If max_consecutive_batch is not None, this is heavily recommended.

Returns

scheduler, a new Scheduler object.

shuffle(seed=None)

Shuffles the FileMetadata list held by the Scheduler and reconstructs a schedule.

Parameters

seed (int) – seed value to make shuffling deterministic.

metavision_ml.data.scheduler.get_duration(path)

Returns duration of a file

metavision_ml.data.transformations.transform_sequence(sequence, metadata, transforms, base_seed=0)

Applies a series of 2d transformations to each frame and each channel of a sequence.

The metadata of the sequence is used to provide a seed.

Parameters
  • sequence (torch.tensor) – feature tensor of shape (num_time_bins, num_channels, height, width)

  • metadata (FileMetadata) – object describing the metadata of the sequence to which the tensor belongs.

  • transforms (torchvision.transforms) – transform to be applied to each channel of each frame.

  • base_seed (int) – base seed added to the seed of the sequence in order to have additional randomness. However, it needs to be constant within an epoch.

Returns

feature tensor of shape (num_time_bins, num_channels, height, width)

Return type

sequence (torch.tensor)
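
The idea of seeding a transform from the sequence metadata can be sketched as follows. Everything here is an assumption made for illustration: the seed derivation (a CRC of the path and start timestamp plus base_seed), the helper names, and the use of nested lists instead of tensors. The point is only that the random decision is drawn once per sequence, so every frame of the sequence receives the same transformation, and that the result is reproducible for a given base_seed.

```python
import random
import zlib


def sequence_seed(path, start_ts, base_seed=0):
    """Derive a deterministic seed from sequence metadata (assumed scheme)."""
    return (zlib.crc32(f"{path}:{start_ts}".encode()) + base_seed) % (2 ** 32)


def flip_sequence(sequence, path, start_ts, base_seed=0):
    """Horizontally flip every frame of a sequence, or none of them."""
    rng = random.Random(sequence_seed(path, start_ts, base_seed))
    do_flip = rng.random() < 0.5  # drawn once per sequence, not once per frame
    if not do_flip:
        return sequence
    # sequence is nested as [time_bin][channel][row][column]
    return [[[row[::-1] for row in channel] for channel in frame] for frame in sequence]


# Two time bins, one channel, one row of three pixels.
seq = [[[[1, 2, 3]]], [[[4, 5, 6]]]]
out = flip_sequence(seq, "train/file1.dat", 0)
```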


Collections of functions to add bounding box loading capabilities to the SequentialDataLoader

metavision_ml.data.box_processing.bboxes_to_box_vectors(bbox)

Converts back EventBbox bounding boxes to plain numpy array.

Parameters

bbox – np.ndarray Nx1 dtype EventBbox (x1,y1,w,h,score,conf,track_id)

WARNING: Here class id must be in 0-C (-1: ignore, 0: background, [1,C]: classes)

Returns

torch.array Nx6 dtype (x1,y1,x2,y2,label,track_id)

Return type

out

metavision_ml.data.box_processing.box_vectors_to_bboxes(boxes, labels, scores=None, track_ids=None, ts=0)

Concatenates box vectors into a structured array of EventBbox.

Parameters
  • boxes (np.ndarray) – Bboxes coordinates (x1,y1,x2,y2).

  • labels (np.ndarray) – Class index for each box.

  • scores (np.ndarray) – Score for each box.

  • track_ids (np.ndarray) – Individual track id for each box.

  • ts (int) – Timestamp in us.

Returns

Box with EventBbox.

Return type

box_events (np.ndarray)

metavision_ml.data.box_processing.clip_boxes(box_events, width_orig, height_orig)

Clips boxes so that they belong to the viewport width and height. Discards those that end up being empty.

Parameters
  • box_events (structured np.ndarray) – Nx1 of dtype EventBbox

  • width_orig (int) – Original width of sensor for annotation

  • height_orig (int) – Original height of sensor for annotation

Returns

Nx1 of dtype EventBbox

Return type

box_events (structured np.ndarray)
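
The clipping rule can be sketched on plain (x, y, w, h) tuples. This is a hypothetical helper working on tuples rather than on EventBbox structured arrays, and it is not the library's implementation.

```python
def clip_box(x, y, w, h, width_orig, height_orig):
    """Clip an (x, y, w, h) box to the viewport; return None if it becomes empty."""
    x1 = min(max(x, 0), width_orig)
    y1 = min(max(y, 0), height_orig)
    x2 = min(max(x + w, 0), width_orig)
    y2 = min(max(y + h, 0), height_orig)
    if x2 <= x1 or y2 <= y1:
        return None
    return (x1, y1, x2 - x1, y2 - y1)


# 640 x 480 viewport: the third box lies entirely outside and is discarded.
boxes = [(-10, 20, 50, 40), (600, 400, 100, 100), (700, 10, 20, 20)]
clipped = [b for b in (clip_box(*box, 640, 480) for box in boxes) if b is not None]
```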

metavision_ml.data.box_processing.could_frame_contain_valid_gt(batch_start_time, duration, labelling_delta_t, num_tbins)

This function returns a np.array of num_tbins booleans, indicating whether a frame was labeled or not.

This is useful if our recordings are labeled at a fixed frame rate but we want to train at a higher frame rate (i.e. a small delta_t). The number of frames in a batch (num_tbins) is the duration of this batch divided by delta_t.

Note: If you train at faster frequency than your annotations it is also possible to interpolate your bounding box files offline to avoid this.

For example, given the following setup
  • num_tbins = 5 (number of frames in a batch)

  • delta_t = 50 (time of each frame)

  • labelling_delta_t = 120 (delta_t at which labels are provided)

  • duration = num_tbins * delta_t = 250

-> this function will be called several times, with batch_start_time = 0, then 250, then 500, etc. Each time this function is called, it returns an array of 5 booleans to indicate which frames could contain a label:

           GT            GT              GT            GT            GT            GT
|            120           240|            360           480|          600           720  |
|             |             | |             |             | |           |             |   |
|             v             v |             v             v |           v             v   |
|     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
0    50    100   150   200   250   300   350   400   450   500   550   600   650   700   750
|                             |                             |                             |
|< F > < F > < T > < F > < T >|< F > < F > < T > < F > < T >|< F > < T > < F > < F > < T >|
|                             |                             |                             |
|<-------- first call ------->|<------- second call ------->|<-------- third call ------->|
|                             |                             |                             |

Same setup as before, but now with labelling_delta_t = 100 instead of 120:

         GT          GT          GT          GT          GT           GT         GT
|          100         200    |    300         400         500          600        700    |
|           |           |     |     |           |           |           |           |     |
|           v           v     |     v           v           v           v           v     |
|     |     |     |     |     |     |     |     |     |     |     |     |     |     |     |
0    50    100   150   200   250   300   350   400   450   500   550   600   650   700   750
|                             |                             |                             |
|< F > < T > < F > < T > < F >|< T > < F > < T > < F > < T >|< F > < T > < F > < T > < F >|
|                             |                             |                             |
|<-------- first call ------->|<------- second call ------->|<-------- third call ------->|
|                             |                             |                             |

Note: if labelling_delta_t <= delta_t, all frames could contain a valid GT

Note: If the FileMetadata is a pure distractor file (with no label at all), it will have a 1 us labelling_delta_t, and therefore all the frames will be considered labeled.

Parameters
  • batch_start_time – Time from when to start loading (in us).

  • duration – Duration to load (in us).

  • labelling_delta_t – Period (in us) of your labelling annotation system.

  • num_tbins – Number of frames to load.

Returns

(boolean nd array): This boolean mask array of length num_tbins indicates whether the frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.

Return type

frame_could_contain_gt
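
The two timelines in the description of could_frame_contain_valid_gt can be reproduced with a few lines of integer arithmetic. This is a sketch under stated assumptions, not the library's implementation: the helper name is hypothetical, a time bin is taken to cover the interval (start, end], labels are assumed to occur at every multiple of labelling_delta_t, and a plain list is returned instead of a NumPy array.

```python
def frames_with_possible_gt(batch_start_time, duration, labelling_delta_t, num_tbins):
    """Return one boolean per time bin: True if a label timestamp falls in it."""
    delta_t = duration // num_tbins
    mask = []
    for i in range(num_tbins):
        start = batch_start_time + i * delta_t
        end = start + delta_t
        # A multiple of labelling_delta_t lies in (start, end] exactly when
        # the integer quotient increases between start and end.
        mask.append(end // labelling_delta_t > start // labelling_delta_t)
    return mask


first_call = frames_with_possible_gt(0, 250, 120, 5)    # [F, F, T, F, T]
third_call = frames_with_possible_gt(500, 250, 120, 5)  # [F, T, F, F, T]
```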

metavision_ml.data.box_processing.create_class_lookup(labelmap_path, wanted_keys=[])

Takes as argument a json path storing a dictionary with class_id as key and class_name as value for the ground truth. Takes also as argument a list of wanted keys (class_names that we want to select).

Parameters
  • labelmap_path (string) – Path to the label map. Example of JSON content: ‘{“0”: “pedestrian”, “1”: “two wheeler”, “2”: “car”, “3”: “truck”}’

  • wanted_keys (list) – List of classes to extract example: [‘car’, ‘pedestrian’]

Returns

class_lookup, a numpy array, for instance [1, -1, 2, -1].

In this example we get 0 for background, 1 for pedestrians and 2 for cars. In the end, new_label = class_lookup[gt_label] transforms an array of ground truth ids into an array of ids that fit your network. Reminder: the ground truth has no id for background, whereas the network uses id 0 for background and consecutive ids for the other classes.
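
The lookup table of this example can be rebuilt with a short sketch. The helper build_class_lookup is hypothetical: it takes the JSON content as a string rather than a file path, returns a plain list instead of a numpy array, and assumes that class ids in the label map are contiguous and start at 0.

```python
import json


def build_class_lookup(labelmap_json, wanted_keys):
    """Map ground-truth class ids to contiguous network ids (-1 for unwanted classes)."""
    label_map = json.loads(labelmap_json)
    lookup = []
    next_id = 1  # 0 is reserved for background
    for class_id in sorted(label_map, key=int):
        if label_map[class_id] in wanted_keys:
            lookup.append(next_id)
            next_id += 1
        else:
            lookup.append(-1)
    return lookup


labelmap_json = '{"0": "pedestrian", "1": "two wheeler", "2": "car", "3": "truck"}'
class_lookup = build_class_lookup(labelmap_json, ["car", "pedestrian"])
gt_labels = [2, 0, 3]  # car, pedestrian, truck in ground-truth ids
new_labels = [class_lookup[label] for label in gt_labels]
```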

metavision_ml.data.box_processing.filter_boxes(box_events, class_lookup, idx_to_filter, ignore_filtered)

Filters or ignores boxes in box_events according to idx_to_filter.

Ignored boxes are still present but are marked with a -1 class_id. At the loss computation stage this information can be used so that they don’t contribute to the loss. This is used when you don’t want the proposals matched with those ignored boxes to be considered as false positives in the loss. For instance, if you train on cars only in a dataset containing trucks, the trucks could be ignored.

Parameters
  • box_events (np.ndarray) – Box events.

  • class_lookup (int list) – Lookup table for converting class indices to contiguous int values.

  • idx_to_filter (np.ndarray) – Boxes indices to filter out or ignore (see below).

  • ignore_filtered (bool) – If True, the filtered boxes are ignored in the loss: they are marked with a -1 class_id so that they can be discarded in a loss.

Returns

Box_events with class_id translated using the class_lookup.

Return type

(np.ndarray)

metavision_ml.data.box_processing.filter_empty_tensor(tensor, box_events, area_box_filter=0.1, shift=0, time_per_bin=10000, batch_start_time=0, last_time_to_filter=None)

After a tensor of shape (num_bin, channels, h, w) has been built from the events, it is used to discard boxes that contain no non-null data. This is far more efficient than performing this computation on the events directly.

metavision_ml.data.box_processing.load_box_events(metadata, batch_start_time, duration)

Fetches box events from FileMetadata object, batch_start_time & duration.

Parameters
  • metadata (object) – Record details.

  • batch_start_time (int) – (us) Where to seek in the file to load corresponding bounding boxes

  • duration (int) – (us) How long to load events from bounding box file

Returns

Nx1 of dtype EventBbox

Return type

box_events (structured np.ndarray)

metavision_ml.data.box_processing.load_boxes(metadata, batch_start_time, duration, tensor, **kwargs)

Function to fetch boxes and preprocess them. Should be passed to a SequentialDataLoader.

Since this function has additional arguments compared to load_labels_stub, one has to specialize it:

Examples

>>> from functools import partial
>>> import numpy as np
>>> from metavision_ml.data.box_processing import load_boxes
>>> n_classes = 21
>>> class_lookup = np.arange(n_classes)  # each class is mapped to itself
>>> load_boxes_function = partial(load_boxes, class_lookup=class_lookup)

Parameters
  • metadata (object) – Record details.

  • batch_start_time (int) – (us) Where to seek in the file to load corresponding bounding boxes

  • duration (int) – (us) How long to load events from bounding box file

  • tensor (np.ndarray) – Current preprocessed input, can be used for data dependent preprocessing, for instance remove boxes without any features in them.

  • class_lookup (np.array) – Look up array for class indices.

  • labelling_delta_t (int) – Indicates the period of labelling in order to only consider time bins with actual labels when computing the loss.

  • min_box_diag (int) – Diagonal value under which boxes are not considered. Defaults to 60 pixels.

Returns

List of structured arrays of dtype EventBbox, one for each time bin.

frames_contain_gt (np.ndarray): This boolean mask array of length num_tbins indicates whether the frame contains a label. It is used to differentiate time bins that actually contain an empty label (for instance no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter time bins shouldn’t contribute to supervised losses used during training.

Return type

boxes (List[np.ndarray])

metavision_ml.data.box_processing.nms(box_events, scores, iou_thresh=0.5)

NMS on box_events

Parameters
  • box_events (np.ndarray) – nx1 with dtype EventBbox, the sorting order of those boxes is used as a criterion for the nms.

  • scores (np.ndarray) – nx1 dtype of plain dtype, needs to be argsortable.

  • iou_thresh (float) – if two boxes overlap with more than iou_thresh (intersection over union threshold) with each other, only the one with the highest criterion value is kept.

Returns

Indices of the box to keep in the input array.

Return type

keep (np.ndarray)
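
For readers unfamiliar with non-maximum suppression, here is a generic greedy NMS sketch on plain (x1, y1, x2, y2) tuples. It illustrates the technique and the role of iou_thresh only; it is not the implementation used by metavision_ml, and the helper names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_thresh=0.5):
    """Return the indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
keep = greedy_nms(boxes, scores)  # the second box overlaps the first (IoU of about 0.68)
```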

metavision_ml.data.box_processing.nms_by_class(box_events, scores, iou_thresh=0.5)

NMS on box_events done independently by class

Parameters
  • box_events (np.ndarray) – nx1 with dtype EventBbox, the sorting order of those boxes is used as a criterion for the nms.

  • scores (np.ndarray) – nx1 dtype of plain dtype, needs to be argsortable.

  • iou_thresh (float) – if two boxes overlap with more than iou_thresh (intersection over union threshold) with each other, only the one with the highest criterion value is kept.

Returns

Indices of the box to keep in the input array.

Return type

keeps (np.ndarray)

metavision_ml.data.box_processing.rescale_boxes(box_events, width_orig, height_orig, width_dst, height_dst)

Rescales boxes to new height and width.

Parameters
  • box_events (structured np.ndarray) – Array of length n of dtype EventBbox.

  • width_orig (int) – Original width of sensor for annotation.

  • height_orig (int) – Original height of sensor for annotation.

  • width_dst (int) – Destination width.

  • height_dst (int) – Destination height.

Returns

Array of length n of dtype EventBbox.

Return type

box_events (structured np.ndarray)
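
Rescaling amounts to multiplying horizontal quantities by width_dst / width_orig and vertical ones by height_dst / height_orig. The sketch below shows this on a single (x, y, w, h) tuple; the helper is hypothetical and does not operate on EventBbox arrays.

```python
def rescale_box(x, y, w, h, width_orig, height_orig, width_dst, height_dst):
    """Rescale an (x, y, w, h) box from the original to the destination resolution."""
    scale_x = width_dst / width_orig
    scale_y = height_dst / height_orig
    return (x * scale_x, y * scale_y, w * scale_x, h * scale_y)


# Annotations made at 640 x 480, features computed at 320 x 240.
rescaled = rescale_box(100, 50, 40, 20, 640, 480, 320, 240)
```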

metavision_ml.data.box_processing.split_boxes(box_events, batch_start_time, delta_t=None, num_tbins=None)

Splits box_events into a list of box events clustered by delta_t. Removes a bounding box from the input list box_events if there are fewer than min_box_area_thr*bbox_area events in the box and the timestamp of the bbox < last_time_to_filter.

Box times are in range(0, num_tbins*tbin)

Parameters
  • box_events (structured np.ndarray) – Box events inputs of type EventBbox

  • delta_t (optional int) – Duration of time bin in us. Used for chronological NMS.

  • num_tbins (optional int) – Number of time bins.

Returns

List of box_events of type EventBbox separated in time bins.

Return type

box_events (np.ndarray list)
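The time-bin clustering can be sketched as follows. This is only an illustration under stated assumptions: BOX_DTYPE is a hypothetical stand-in for EventBbox with a t field in us, split_by_time_bin is a made-up name, and boxes falling outside the num_tbins bins are simply dropped here, which may differ from the library's behaviour.

```python
import numpy as np

# Hypothetical stand-in for the EventBbox dtype; field names are assumptions.
BOX_DTYPE = np.dtype([("t", "<i8"), ("x", "<f4"), ("y", "<f4"), ("w", "<f4"), ("h", "<f4")])


def split_by_time_bin(boxes, batch_start_time, delta_t, num_tbins):
    """Return a list of num_tbins arrays, one per time bin of duration delta_t (us)."""
    relative_t = boxes["t"] - batch_start_time
    bin_index = relative_t // delta_t
    return [boxes[bin_index == i] for i in range(num_tbins)]


boxes = np.zeros(4, dtype=BOX_DTYPE)
boxes["t"] = [1000, 12000, 15000, 29999]
bins = split_by_time_bin(boxes, batch_start_time=0, delta_t=10000, num_tbins=3)
print([len(b) for b in bins])
```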


Iterator of feature tensors for a source of input events.

class metavision_ml.data.cd_processor_iterator.CDProcessorIterator(path, preprocess_function_name, mode='delta_t', start_ts=0, max_duration=None, delta_t=50000, n_events=10000, num_tbins=1, preprocess_kwargs={}, device=device(type='cpu'), height=None, width=None, transforms=None, base_seed=0, **kwargs)

Provides feature tensors (torch.Tensor) at regular intervals.

Relies on the EventsIterator class. The different behaviours of EventsIterator can be leveraged.

Parameters
  • path (string) – Path to the file to read, or empty for a camera.

  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • mode (str) – Streaming mode of the underlying EventsIterator (n_events, delta_t or mixed).

  • start_ts (int) – Start timestamp of the EventsIterator.

  • max_duration (int) – Total duration of the EventsIterator.

  • delta_t (int) – Duration of used events slice in us.

  • num_tbins (int) – Number of time bins.

  • preprocess_kwargs – dictionary of optional arguments to the preprocessing function. This can be used to override the default value of max_incr_per_pixel, for instance {"max_incr_per_pixel": 20} to clip and normalize tensors by 20 at full resolution.

  • device (torch.device) – Torch device (defaults to cpu).

  • height (int) – if None, the features are not downsampled; otherwise they are downsampled to height, which must be the sensor’s height divided by a power of 2.

  • width (int) – if None, the features are not downsampled; otherwise they are downsampled to width, which must be the sensor’s width divided by a power of 2.

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

  • base_seed (int) – seed to change the random transformation when applicable; if None, the time is used as seed.

  • **kwargs – Arbitrary keyword arguments passed to the underlying EventsIterator.
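The constraint on height and width (sensor size divided by a power of 2) can be checked before building the iterator. The helper below is a hypothetical sketch and not part of metavision_ml: it returns the exponent k such that the target size equals the sensor size divided by 2**k.

```python
def downsampling_exponent(sensor_size, target_size):
    """Return k such that target_size == sensor_size / 2**k, or raise ValueError.

    A target_size of None means "no downsampling", i.e. k == 0.
    """
    if target_size is None:
        return 0
    size, k = sensor_size, 0
    # Halve the size while it stays an exact integer division.
    while size > target_size and size % 2 == 0:
        size //= 2
        k += 1
    if size != target_size:
        raise ValueError(
            f"{target_size} is not {sensor_size} divided by a power of 2")
    return k


print(downsampling_exponent(480, 240))  # one halving
print(downsampling_exponent(640, 160))  # two halvings
```

For a 640x480 sensor, height=240 and width=320 would be accepted by this check, while height=200 would raise.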

mv_it

object used to read from the file or the camera.

Type

EventsIterator

array_dim

shape of the tensor (channel, height, width).

Type

tuple

cd_proc

class computing features from events into a preallocated memory array.

Type

CDProcessor

step

counter of iterations.

Type

int

event_input_height

original height of the sensor in pixels.

Type

int

event_input_width

original width of the sensor in pixels.

Type

int

base_seed

seed to change the random transformation when applicable.

Type

int

Examples

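>>> from metavision_ml.data.cd_processor_iterator import CDProcessorIterator
>>> path = "example.raw"
>>> for tensor in CDProcessorIterator(path, "event_cube", delta_t=10000):
...     # Each iteration yields a torch.Tensor of features.
...     print(tensor.shape)

get_time()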

Cut Inner Reader Time

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

class metavision_ml.data.cd_processor_iterator.HDF5Iterator(path, num_tbins=1, preprocess_kwargs={}, start_ts=0, device=device(type='cpu'), height=None, width=None, transforms=None, base_seed=0)

Provides feature tensors (torch.Tensor) at regular intervals from a precomputed HDF5 file.

Parameters
  • path (string) – Path to the HDF5 file containing precomputed features.

  • height (int) – if None, the features are not downsampled; otherwise they are downsampled to height, which must be the sensor’s height divided by a power of 2.

  • width (int) – if None, the features are not downsampled; otherwise they are downsampled to width, which must be the sensor’s width divided by a power of 2.

  • device (torch.device) – Torch device (defaults to cpu).

  • start_ts (int) – First timestamp to consider in us. (Must be a multiple of the HDF5 file delta_t)

  • transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.

  • base_seed (int) – seed to change the random transformation when applicable; if None, the time is used as seed.
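Since start_ts must be a multiple of the HDF5 file's delta_t, an arbitrary timestamp has to be aligned first. The helper below is a hypothetical convenience, not part of the library; it rounds down to the previous multiple, which is one possible choice among others.

```python
def align_start_ts(start_ts, delta_t):
    """Round start_ts (us) down to the nearest multiple of delta_t (us)."""
    if delta_t <= 0:
        raise ValueError("delta_t must be a positive duration in us")
    return (start_ts // delta_t) * delta_t


aligned = align_start_ts(125000, 50000)
print(aligned)
```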

dataset

HDF5 dataset containing the precomputed features.

Type

h5py.Dataset

array_dim

shape of the tensor (channel, height, width).

Type

tuple

preprocess_dict

dictionary of the parameters used.

Type

dictionary

step

counter of iterations.

Type

int

event_input_height

original height of the sensor in pixels.

Type

int

event_input_width

original width of the sensor in pixels.

Type

int

base_seed

seed to change the random transformation when applicable.

Type

int

Examples

>>> from metavision_ml.data.cd_processor_iterator import HDF5Iterator
>>> path = "example.h5"
>>> for tensor in HDF5Iterator(path, num_tbins=1):
...     # Each iteration yields a torch.Tensor of precomputed features.
...     print(tensor.shape)

checks(preprocess_function_name, delta_t)

Convenience function to assert precomputed parameters

Parameters
  • preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.

  • delta_t (int) – Duration of used events slice in us.

get_time()

Cut Inner Reader Time

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

This class allows streaming a dataset of .raw or .dat files.

This is yet another example of how to use data.multistream_dataloader. Here we go further and integrate it with the same interface as SequentialDataLoader.

class metavision_ml.data.cd_processor_dataset.CDProcessorDataLoader(files, mode, delta_t, n_events, max_duration, preprocess_function_name, num_tbins, batch_size, num_workers=2, height=None, width=None, preprocess_kwargs={}, load_labels=None, padding_mode='zeros', transforms=None)

Provides the same interface as SequentialDataLoader, but built on the multistream_dataloader implementation.

get_vis_func()

Returns the visualization function corresponding to the preprocessing being used.

class metavision_ml.data.cd_processor_dataset.CDProcessorDatasetIterator(path, height_out, width_out, load_labels, mode, n_events, delta_t, num_tbins, preprocess_function_name, preprocess_kwargs={}, start_ts=0, max_duration=None, transforms=None, base_seed=None)

This iterator reads events or preprocessed tensors, computes tensors, loads labels and returns them. The difference with sequential_dataset_v1 is that load_labels cannot be a pure function; it has to be a class.


This class simulates a moving box in translation and zoom in a frame.

class metavision_ml.data.moving_box.Animation(height, width, channels, max_stop=15, max_classes=1, max_objects=3)

Responsible for the endless animation of moving boxes. Base class that can be inherited for various drawings of moving objects.

Parameters
  • height – frame height

  • width – frame width

  • channels – frame channels (either 1 or 3)

  • max_stop – animation random pauses

  • max_classes – maximum number of classes

  • max_objects – maximum number of objects

class metavision_ml.data.moving_box.MovingSquare(h=300, w=300, max_stop=15, max_classes=3)

Responsible for the endless animation of a moving square.

Parameters
  • h – frame height

  • w – frame width

  • max_stop – randomly pause for this many steps

  • max_classes – maximum number of classes

reset()

Resets internal variables

reset_speed()

Resets Speed Variables

metavision_ml.data.moving_box.clamp_xyxy(x1, y1, x2, y2, width, height)

Clamps a box to a frame

Parameters
  • x1 – top left corner x

  • y1 – top left corner y

  • x2 – bottom right corner x

  • y2 – bottom right corner y

  • width – frame width

  • height – frame height

Returns

clamped positions
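Clamping a box to a frame can be sketched in a few lines. This is an illustration only: clamp_box is a hypothetical name, and limiting coordinates to [0, width - 1] and [0, height - 1] is an assumption about the convention, not a statement about the library's exact bounds.

```python
def clamp_box(x1, y1, x2, y2, width, height):
    """Clamp both corners of a box to the frame (assumed bounds: 0..size-1)."""
    x1 = min(max(x1, 0), width - 1)
    x2 = min(max(x2, 0), width - 1)
    y1 = min(max(y1, 0), height - 1)
    y2 = min(max(y2, 0), height - 1)
    return x1, y1, x2, y2


clamped = clamp_box(-5, 10, 700, 500, width=640, height=480)
print(clamped)
```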

metavision_ml.data.moving_box.move_box(x1, y1, x2, y2, vx, vy, vs, width, height, min_width, min_height)

Moves a bounding box around in a frame using the velocities vx, vy and vscale. It returns the moved box and a flag indicating whether the speed needs to be changed because the box collided with a wall.

Parameters
  • x1 – top left corner x

  • y1 – top left corner y

  • x2 – bottom right corner x

  • y2 – bottom right corner y

  • vx – x speed

  • vy – y speed

  • vs – scale speed

  • width – frame width

  • height – frame height

  • min_width – minimal box width

  • min_height – minimal box height

Returns

moved box and a flag indicating whether the speed needs to be changed
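One possible reading of this behaviour is sketched below. Several points are assumptions rather than facts about the library: vs is treated as a multiplicative scale factor applied around the box centre, a collision is reported when the moved box leaves the range [0, size - 1], and move_box_sketch is a hypothetical name.

```python
def move_box_sketch(x1, y1, x2, y2, vx, vy, vs, width, height, min_width, min_height):
    """Translate and rescale a box, then report whether it hit a frame border."""
    # Translation by the velocity.
    x1, x2 = x1 + vx, x2 + vx
    y1, y2 = y1 + vy, y2 + vy
    # Scaling around the centre (assumed multiplicative), with a minimal size.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = max((x2 - x1) * vs, min_width) / 2
    half_h = max((y2 - y1) * vs, min_height) / 2
    x1, x2 = cx - half_w, cx + half_w
    y1, y2 = cy - half_h, cy + half_h
    # Collision flag: any corner outside the frame.
    hit_wall = x1 < 0 or y1 < 0 or x2 > width - 1 or y2 > height - 1
    # Clamp the box back into the frame.
    x1, x2 = min(max(x1, 0), width - 1), min(max(x2, 0), width - 1)
    y1, y2 = min(max(y1, 0), height - 1), min(max(y2, 0), height - 1)
    return (x1, y1, x2, y2), hit_wall


free_box, free_hit = move_box_sketch(10, 10, 30, 30, 5, 0, 1.0, 100, 100, 4, 4)
wall_box, wall_hit = move_box_sketch(80, 10, 95, 30, 10, 0, 1.0, 100, 100, 4, 4)
print(free_box, free_hit)
print(wall_box, wall_hit)
```

A caller would typically draw new velocities whenever the flag is set, which matches the role of the flag described above.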

metavision_ml.data.moving_box.rotate(x, y, xo, yo, theta)

Rotates a point with respect to the origin (xo, yo).

Parameters
  • x – point x coordinate

  • y – point y coordinate

  • xo – origin x coordinate

  • yo – origin y coordinate

  • theta – rotation angle

Returns

rotated point
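The rotation of a point around an origin follows the standard 2D rotation formula. The sketch below assumes theta is expressed in radians and uses the usual mathematical sign convention; the library's unit and orientation are not stated here, and rotate_point is a hypothetical name.

```python
import math


def rotate_point(x, y, xo, yo, theta):
    """Rotate (x, y) around (xo, yo) by theta radians (assumed unit)."""
    dx, dy = x - xo, y - yo
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return xo + cos_t * dx - sin_t * dy, yo + sin_t * dx + cos_t * dy


rx, ry = rotate_point(1.0, 0.0, 0.0, 0.0, math.pi / 2)
print(round(rx, 6), round(ry, 6))
```

Note that in image coordinates, where y points downward, the same formula appears as a rotation in the opposite visual direction.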

Toy Problem Dataset that serves as an example of our streamer dataloader.

This displays moving digits from MNIST database. The digit varies in size and position.

The dataset both generates chained video clips and provides bounding boxes with the correct class id.

The dataset procedurally generates the video clips, so it is an “Iterable” kind of dataset.

class metavision_ml.data.moving_mnist.MovingMNISTDataset(tbins, num_workers, batch_size, height, width, max_frames_per_video, max_frames_per_epoch, train, dataset_dir='.')

Creates the dataloader for moving mnist

Parameters
  • tbins – number of steps per batch

  • num_workers – number of parallel workers

  • batch_size – number of animations

  • height – animation height

  • width – animation width

  • max_frames_per_video – maximum frames per animation (must be greater than tbins)

  • max_frames_per_epoch – maximum frames per epoch

  • train – use training part of MNIST dataset.

  • dataset_dir – directory where MNIST dataset is stored (will be downloaded if necessary)

class metavision_ml.data.moving_mnist.MovingMnist(idx, tbins, height, width, train, max_frames_per_video, channels=3, max_stop=15, max_objects=2, drop_labels_p=0, data_caching_path='.')

Moving Mnist Animation

Parameters
  • idx – unique id

  • tbins – number of steps delivered at once

  • height – frame height (must be at least 64 pix)

  • width – frame width (must be at least 64 pix)

  • max_stop – random pause in animation

  • max_objects – maximum number of objects per animation

  • train – use training/validation part of MNIST

  • max_frames_per_video – maximum frames per video before reset

  • drop_labels_p – probability to drop the annotation of certain frames (in which case it is marked in the mask)

  • data_caching_path – where to store the MNIST dataset

metavision_ml.data.moving_mnist.collate_fn(data_list)

Collates batch parts into a single dictionary.

Parameters

data_list – batch parts
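A collate function of this kind can be sketched as follows. The structure of each batch part is an assumption made for illustration (a dict with an "inputs" array of shape num_tbins x channel x height x width and a "labels" list with one entry per time bin), NumPy arrays stand in for tensors, and collate_parts is a hypothetical name.

```python
import numpy as np


def collate_parts(data_list):
    """Merge per-stream parts into one dictionary with a batch dimension."""
    # Stack along axis 1 to get num_tbins x batch_size x channel x height x width.
    inputs = np.stack([part["inputs"] for part in data_list], axis=1)
    num_tbins = inputs.shape[0]
    # labels[t][b] holds the labels of stream b at time bin t.
    labels = [[part["labels"][t] for part in data_list] for t in range(num_tbins)]
    return {"inputs": inputs, "labels": labels}


# Two hypothetical streams with 3 time bins of 1 x 4 x 4 frames each.
parts = [
    {"inputs": np.zeros((3, 1, 4, 4), dtype=np.float32), "labels": [[], [], []]},
    {"inputs": np.ones((3, 1, 4, 4), dtype=np.float32), "labels": [[], ["box"], []]},
]
batch = collate_parts(parts)
print(batch["inputs"].shape)
```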