SDK ML Data API
Subclasses the Torch Dataset to load DAT files and labels from events, and wraps them using the DataLoader class. It currently supports DAT and HDF5 files, although we recommend using the latter.
This class is agnostic to the type of labels: a function must be provided to load them.
- class metavision_ml.data.sequential_dataset.SequentialDataLoader(files, delta_t, preprocess_function_name, array_dim, load_labels=<function load_labels_stub>, durations=[], batch_size=8, num_workers=2, preprocess_kwargs={}, shuffle=False, padding=False, transforms=None)
SequentialDataLoader uses a pytorch DataLoader to read batches chronologically.
It is used simply as an iterator and returns a dictionary containing the following keys:
- inputs a torch.tensor of shape num_tbins x batch_size x channels x height x width.
Note that it is normalized to 1. The dtype depends on the preprocessing function used, but can be changed by specifying preprocess_kwargs.
- labels is the list of labels provided by the load_labels function.
- mask_keep_memory a float array of shape batch_size, with values in (0., 1.) indicating
whether memory is kept or reset at the beginning of the sequence.
- frame_is_labeled a boolean array of shape num_tbins x batch_size, indicating whether the
corresponding labels can be used for loss computation (i.e. whether the labels are valid or not).
- video_infos is a list of (FileMetadata, batch_start_time, duration) of size batch_size containing
information about each recording in the batch.
- batch_size
Number of sequences being read concurrently. This affects both batch loading time and gradient statistics.
- Type
int
- num_workers
Number of processes being used by the DataLoader, 0 means it uses Python’s main process. More processes help with speed but up to a point: too many processes can actually hurt loading times.
- Type
int
- max_consecutive_batch
Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.
- Type
int
- device
Indicates on which device (cpu or cuda for instance) the data will be put.
- Type
torch.device
- dataset
Instance of SequentialDataset that is used to load the data, or possibly change the scheduling. Note that if the dataset is changed, that change won’t take effect until the next iteration of the DataLoader.
- Type
SequentialDataset
- Parameters
files (list) – List of input files. Can be either DAT files or HDF5 files.
delta_t (int) – Timeslice delta_t in us.
preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.
array_dim (int list) – Dimension of feature tensors: (num_tbins, channels, sensor_height // 2^k, sensor_width // 2^k), where k is the downsampling factor.
load_labels – function providing labels (see load_labels_stub).
durations (int list) – Optionally, you can provide the durations in us of all the input files. This saves a bit of time when there are many files. If you provide a duration that is shorter than the actual duration of a sequence, only part of it will be read.
batch_size (int) – Number of sequences being read concurrently. This affects both batch loading time and gradient statistics.
num_workers (int) – Number of processes being used by the DataLoader, 0 means it uses Python’s main process. More processes help with speed but up to a point: too many processes can actually hurt loading times.
preprocess_kwargs – Dictionary of optional arguments to the preprocessing function. This can be used to override the default value of max_incr_per_pixel, for instance {"max_incr_per_pixel": 20} to clip and normalize tensors by 20.
shuffle (boolean) – If True, breaks the temporal continuity between batches. This should only be used when training a model without memory.
padding (boolean) – If True, at the end of an epoch the Dataset will run with incomplete batches when it can't read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True, so that no loss is computed on them. If False, the epoch stops after the last complete batch. This can be used to make sure that evaluation is computed on the whole test set, for example.
transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.
Examples
>>> array_dim = [5, 2, 480, 640]
>>> dataloader = SequentialDataLoader(['train/file1.dat', 'train/file1.dat'], 50000, "histo", array_dim)
>>> for ind, data_dic in enumerate(dataloader):
>>>     batch = data_dic["inputs"]
>>>     targets = data_dic["labels"]
>>>     mask = data_dic["mask_keep_memory"]
>>>     frame_is_labeled = data_dic["frame_is_labeled"]
- cpu()
Sets the SequentialDataLoader to leave tensors on CPU.
- cuda(device=device(type='cuda'))
Sets the SequentialDataLoader to copy tensors to GPU memory before returning them.
- Parameters
device (torch.device) – The destination GPU device. Defaults to the current CUDA device.
- get_vis_func()
Returns the visualization function corresponding to the preprocessing being used.
- show(viz_labels=None)
Visualizes batches of the DataLoader in parallel with OpenCV.
This returns a generator that draws the input, and also the labels if a "viz_labels" function is provided. A usage sketch is given after the parameter list below.
- Parameters
viz_labels (function) – Optional visualization function for labels. Its signature is: img (np.ndarray), an image of size (height, width, 3) and of dtype np.uint8; labels, as defined in your load_labels function.
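A minimal display loop (a sketch only: it assumes dataloader is a SequentialDataLoader and that the generator yields np.uint8 images of shape (height, width, 3), consistent with the viz_labels signature above):
>>> import cv2
>>> for frame in dataloader.show():
>>>     cv2.imshow("sequential dataloader", frame)  # display the drawn batch
>>>     if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to stop early
>>>         break
>>> cv2.destroyAllWindows()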
- to(device)
Sets the SequentialDataLoader to copy tensors to the given device before returning them.
- Parameters
device (torch.device) – The destination device, for instance torch.device('cpu') or torch.device('cuda').
- class metavision_ml.data.sequential_dataset.SequentialDataset(files, delta_t, preprocess_function_name, array_dim, load_labels=<function load_labels_stub>, durations=[], batch_size=8, preprocess_kwargs={}, padding=False, transforms=None)
Subclass of torch.utils.data.Dataset designed to stream batches of sequences chronologically.
It will read data sequentially from the same file until it jumps to another file, which will also be read sequentially.
Usually it is used in conjunction with the SequentialDataLoader, in which case this object is directly initialized by the SequentialDataLoader itself.
- Parameters
files (list) – List of input files. Can be either DAT files or HDF5 files.
delta_t (int) – Timeslice delta_t in us.
preprocess_function_name (string) – Name of the preprocessing function used to turn events into features. Can be any of the functions present in metavision_ml.preprocessing or one registered by the user.
array_dim (int list) – Dimension of feature tensors: (num_tbins, channels, sensor_height // 2^k, sensor_width // 2^k)
load_labels (function) – Function providing labels (see load_labels_stub).
batch_size (int) – Number of sequences being read concurrently. This affects both batch loading time and gradient statistics.
preprocess_kwargs – dictionary of optional arguments to the preprocessing function.
padding (boolean) – If True, at the end of an epoch the Dataset will run with incomplete batches when it can't read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True, so that no loss is computed on them. If False, the epoch stops after the last complete batch. This can be used to make sure that evaluation is computed on the whole test set, for example.
transforms (torchvision Transforms) – Transformations to be applied to each frame of a sequence.
- downsampling_factor
Parameter used to reduce the spatial dimension of the obtained features. The coordinates are multiplied by 2^(-downsampling_factor).
- Type
int
- get_batch_metadata(batch_idx)
Gets the metadata information of the batch obtained from the batch indices.
- Returns
List of tuples composed of (FileMetadata, start time of the sequence in us, duration of the sequence in us).
- get_size()
Returns height and width of histograms/features, i.e. size after downsampling_factor.
- get_size_original()
Returns height and width of input events before downscaling.
- get_unique_files()
Returns a deduplicated list of FileMetadata. It is useful in case of curriculum learning (launched using reschedule), where there are several occurrences of the same file with different start_ts.
- reschedule(max_consecutive_batch, shuffle=True)
Recomputes a new schedule corresponding to the same files but a different max_consecutive_batch parameter.
This is useful for curriculum learning, when you want to feed your model with sequences of increasing duration (see the sketch after this parameter list). Alternatively, if you don't want to change any parameters, you can simply use the shuffle function.
- Parameters
max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.
shuffle (boolean) – Whether to apply a random shuffle to the list of files. Setting it to True is recommended.
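A hedged curriculum-learning sketch (assuming dataloader is a SequentialDataLoader whose dataset attribute is this SequentialDataset; the schedule of max_consecutive_batch values is arbitrary):
>>> for epoch, max_consecutive_batch in enumerate([1, 2, 4, None]):
>>>     # reschedule between epochs; the change takes effect at the next iteration
>>>     dataloader.dataset.reschedule(max_consecutive_batch, shuffle=True)
>>>     for data_dic in dataloader:
>>>         pass  # training step goes here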
- shuffle(seed=None)
Shuffles the list of input files.
- metavision_ml.data.sequential_dataset.collate_fn(data_list)
Builds a batch from the results of the different __getitem__ calls of the Dataset. This function helps define the DataLoader behaviour.
It puts the temporal dimension (each time bin) first, and the batch dimension second.
- Parameters
data_list (tuple list) – List where each item is a tuple composed of a tensor, the labels, the keep memory mask and the frame_is_labeled mask.
- Returns
see SequentialDataLoader
- Return type
dictionary
- metavision_ml.data.sequential_dataset.load_labels_stub(metadata, start_time, duration, tensor)
This is a stub implementation of a function to load label data.
- This function doesn't actually load anything; it should be passed to the SequentialDataset for self-supervised training, when no actual labelling is required. A minimal example of a compatible function follows the Returns section below.
- Parameters
metadata (FileMetadata) – This class contains information about the sequence that is being read. Ideally the path for the labels should be deducible from metadata.path.
start_time (int) – Time in us in the file at which we start reading.
duration (int) – Duration in us of the data we need to read from said file.
tensor (torch.tensor) – Torch tensor of the features for which labels are loaded. It can be used, for instance, to filter out labels in areas where there are no events.
- Returns
- labels: should be indexable by time bin (to differentiate the labels of each time bin). It could therefore be a list of length num_tbins.
- (boolean nd array): This boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate between time bins that actually contain an empty label (for instance no bounding boxes) and time bins that weren't labeled due to cost constraints. The latter time bins shouldn't contribute to supervised losses used during training.
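For illustration, a minimal custom load_labels function (a sketch only; the function name and the empty labels are placeholders, not part of the API):
>>> import numpy as np
>>> def load_empty_labels(metadata, start_time, duration, tensor):
>>>     num_tbins = tensor.shape[0]  # one label entry per time bin
>>>     labels = [[] for _ in range(num_tbins)]  # here: empty labels
>>>     # mark every time bin as unlabeled so no loss is computed on them
>>>     frame_is_labeled = np.zeros((num_tbins,), dtype=bool)
>>>     return labels, frame_is_labeled
>>> dataloader = SequentialDataLoader(['train/file1.dat'], 50000, "histo", [5, 2, 480, 640], load_labels=load_empty_labels)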
Utilities for sequential datasets; these work for both sequential_dataset_map_style and sequential_dataset_iterable_style.
- metavision_ml.data.sequential_dataset_common.collate_fn(data_list)
Builds a batch from the results of the different __getitem__ calls of the Dataset. This function helps define the DataLoader behaviour.
It puts the temporal dimension (each time bin) first, and the batch dimension second.
- Parameters
data_list (tuple list) – List where each item is a tuple composed of a tensor, the labels, the keep memory mask and the frame_is_labeled mask.
- Returns
see SequentialDataLoader
- Return type
dictionary
- metavision_ml.data.sequential_dataset_common.load_labels_stub(metadata, start_time, duration, tensor)
This is a stub implementation of a function to load label data.
- This function doesn't actually load anything; it should be passed to the SequentialDataset for self-supervised training, when no actual labelling is required.
- Parameters
metadata (FileMetadata) – This class contains information about the sequence that is being read. Ideally the path for the labels should be deducible from metadata.path.
start_time (int) – Time in us in the file at which we start reading.
duration (int) – Duration in us of the data we need to read from said file.
tensor (torch.tensor) – Torch tensor of the features for which labels are loaded. It can be used, for instance, to filter out labels in areas where there are no events.
- Returns
- labels: should be indexable by time bin (to differentiate the labels of each time bin). It could therefore be a list of length num_tbins.
- (boolean nd array): This boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate between time bins that actually contain an empty label (for instance no bounding boxes) and time bins that weren't labeled due to cost constraints. The latter time bins shouldn't contribute to supervised losses used during training.
- metavision_ml.data.sequential_dataset_common.show_dataloader(dataloader, height, width, vis_func, viz_labels=None)
Visualizes batches of the DataLoader in parallel with OpenCV.
This returns a generator that draws the input and also the labels if a “viz_labels” function is provided.
- Parameters
dataloader (DataLoader) – Iterable of batches of sequential features.
height (int) – Height of the feature maps provided by the dataloader.
width (int) – Width of the feature maps provided by the dataloader.
vis_func (function) – The visualization function corresponding to the preprocessing being used. Takes a tensor of shape channels x height x width and turns it into an RGB image of shape height x width x 3 (np.uint8).
viz_labels (function) – Optional visualization function for labels. Its signature is: img (np.ndarray), an image of size (height, width, 3) and of dtype np.uint8; labels, as defined in your load_labels function.
Scheduler is a file-agnostic class that handles the scheduling of sequences for a dataloader.
- class metavision_ml.data.scheduler.FileMetadata(file, duration, delta_t, num_tbins, labels=None, start_ts=0, padding=False)
Metadata class describing a sequence.
- Parameters
file (str) – Path to the sequence file.
duration (int) – Sequence duration in us.
delta_t (int) – Duration of a time bin in us.
num_tbins (int) – Number of time bins read together.
labels (str) – Path to the label file for the sequence.
start_ts (int) – Timestamp in us at which we start reading the sequence (effectively cuts it).
padding (boolean) – Whether the object is padding (i.e. the FileMetadata is associated with no file or labels and is just there in case of incomplete batches).
- path
Path to the sequence file
- Type
str
- duration
Sequence duration in us
- Type
int
- delta_t
Duration of a time bin in us
- Type
int
- num_tbins
Number of time bins read together
- Type
int
- labels
Path to the label file for the sequence
- Type
str
- start_ts
Timestamp in us at which we start reading the sequence (effectively cuts it)
- Type
int
- padding
Whether the object is padding (i.e. the FileMetadata is associated with no file or labels and is just there in case of incomplete batches)
- Type
boolean
- get_original_size()
Returns the tuple (height, width) of a file before any optional downsampling.
This corresponds to the resolution of the imager used to record the original data.
- get_remaining_duration()
Returns the remaining duration, taking the starting point into account.
- is_padding()
Returns whether this is padding data.
- is_precomputed()
Returns whether the data is precomputed (stored in an HDF5 file).
- class metavision_ml.data.scheduler.Scheduler(filesmetadata, total_tbins_delta_t, batch_size, max_consecutive_batch=None, padding=False, base_seed=0)
File-agnostic class that handles the scheduling of sequences for a dataloader. Assumes a dataloader in non-shuffle mode for temporal continuity.
- Parameters
filesmetadata (FileMetadata list) – List of FileMetadata objects describing the dataset.
total_tbins_delta_t (int) – Duration in us of a sequence inside a minibatch.
batch_size (int) – Number of sequences being read concurrently.
max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x total_tbins_delta_t, the rest will be considered part of another sequence. If None, the full length of the sequence will be used. This is used for curriculum learning to vary how long sequences are.
padding (boolean) – If True, the Scheduler will run with incomplete batches when it can't read a complete one, until all data is read. The last incomplete batches will contain FileMetadata objects with padding = True, so that no loss is computed on them. If False, the Scheduler stops at the last complete batch.
base_seed (int) – Consistent random seed associated with each epoch.
- classmethod create_schedule(files, durations, delta_t, num_tbins, batch_size, labels=None, max_consecutive_batch=None, shuffle=True, padding=False)
Alternate way of constructing a Scheduler, with paths and durations instead of a FileMetadata list. Creates a full schedule where everything is read.
- remove_files(files_to_remove)
Removes some files from the scheduler and reinitializes the schedule.
- reschedule(max_consecutive_batch, num_tbins, delta_t, shuffle=True)
Returns a new schedule corresponding to the same files but some different parameters.
This is useful for curriculum learning, when you want to feed your model with sequences of increasing duration. Alternatively, if you don't want to change any parameters, you can simply use the shuffle function.
- Parameters
max_consecutive_batch (int) – Maximum number of consecutive batches allowed in a sequence. If a file is longer than max_consecutive_batch x num_tbins x delta_t the rest will be considered as part of another sequence. If None, the full length of the sequence will be used.
num_tbins (int) – Number of time bins in each batch (also the first dimension of the input tensor)
delta_t (int) – In us duration of a single time bin.
shuffle (boolean) – Whether to apply a random shuffle to the list of files. If max_consecutive_batch is not None, this is heavily recommended.
- Returns
scheduler, a new Scheduler object.
- shuffle(seed=None)
Shuffles the FileMetadata list held by the Scheduler and reconstructs a schedule.
- Parameters
seed (int) – seed value to make shuffling deterministic.
- metavision_ml.data.scheduler.get_duration(path)
Returns the duration of a file.
- metavision_ml.data.transformations.transform_ev_tensor(ev_tensor, file_path, transforms, base_seed=0)
Applies a series of 2D transformations to each frame and each channel of an ev_tensor.
- Parameters
ev_tensor (torch.tensor) – feature tensor of shape (num_ev_reps, num_channels, height, width).
file_path (string) – it will be used to calculate the seed.
transforms (torchvision.transforms) – transform to be applied to each channel of each frame.
base_seed (int) – Base seed added to the sequence seed in order to have additional randomness. It needs to be constant within an epoch.
- Returns
feature tensor of shape (num_ev_reps, num_channels, height, width).
- Return type
ev_tensor (torch.tensor)
- metavision_ml.data.transformations.transform_sequence(sequence, metadata, transforms, base_seed=0)
Applies a series of 2D transformations to each frame and each channel of a sequence.
The metadata of the sequence is used to provide a seed.
- Parameters
sequence (torch.tensor) – feature tensor of shape (num_time_bins, num_channels, height, width)
metadata (FileMetadata) – object describing the metadata of the sequence to which the tensor belongs.
transforms (torchvision.transforms) – transform to be applied to each channel of each frame.
base_seed (int) – Base seed added to the sequence seed in order to have additional randomness. It needs to be constant within an epoch.
- Returns
feature tensor of shape (num_time_bins, num_channels, height, width)
- Return type
sequence (torch.tensor)
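A minimal calling sketch (the FileMetadata values are placeholders, and we assume a torchvision transform that operates on tensors, e.g. RandomHorizontalFlip):
>>> import torch
>>> import torchvision.transforms as T
>>> from metavision_ml.data.scheduler import FileMetadata
>>> from metavision_ml.data.transformations import transform_sequence
>>> metadata = FileMetadata("seq.dat", 1000000, 50000, 5)  # placeholder sequence description
>>> sequence = torch.rand(5, 2, 240, 320)  # (num_time_bins, num_channels, height, width)
>>> out = transform_sequence(sequence, metadata, T.RandomHorizontalFlip(p=0.5), base_seed=0)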
Collection of functions to add bounding box loading capabilities to the SequentialDataLoader.
- metavision_ml.data.box_processing.bboxes_to_box_vectors(bbox)
Converts EventBbox bounding boxes back to a plain array.
- Parameters
bbox – np.ndarray Nx1 dtype EventBbox (x1,y1,w,h,score,conf,track_id)
WARNING: Here class id must be in 0-C (-1: ignore, 0: background, [1,C]: classes)
- Returns
torch.array Nx6 dtype (x1,y1,x2,y2,label,track_id)
- Return type
out
- metavision_ml.data.box_processing.box_vectors_to_bboxes(boxes, labels, scores=None, track_ids=None, ts=0)
Concatenates box vectors into a structured array of EventBbox.
- Parameters
boxes (np.ndarray) – Bbox coordinates (x1,y1,x2,y2).
labels (np.ndarray) – Class index for each box.
scores (np.ndarray) – Score for each box.
track_ids (np.ndarray) – Individual track id for each box.
ts (int) – Timestamp in us.
- Returns
Box with EventBbox.
- Return type
box_events (np.ndarray)
- metavision_ml.data.box_processing.clip_boxes(box_events, width_orig, height_orig)
Clips boxes so that they fit within the viewport width and height. Discards those that end up empty.
- Parameters
box_events (structured np.ndarray) – Nx1 of dtype EventBbox
width_orig (int) – Original width of sensor for annotation
height_orig (int) – Original height of sensor for annotation
- Returns
Nx1 of dtype EventBbox
- Return type
box_events (structured np.ndarray)
- metavision_ml.data.box_processing.could_frame_contain_valid_gt(batch_start_time, duration, labelling_delta_t, num_tbins)
This function returns a np.array of num_tbins booleans, indicating whether each frame was labeled or not.
This is useful if our recordings are labeled at a fixed frame rate but we want to train at a higher frame rate (i.e. a smaller delta_t). The number of frames in a batch (num_tbins) is the duration of this batch divided by delta_t.
Note: If you train at a higher frequency than your annotations, it is also possible to interpolate your bounding box files offline to avoid this.
- For example, given the following setup
num_tbins = 5 (number of frames in a batch)
delta_t = 50 (time of each frame)
labelling_delta_t = 120 (delta_t at which labels are provided)
duration = num_tbins * delta_t = 250
-> this function will be called several times, with batch_start_time = 0, then 250, then 500, etc. Each time this function is called, it returns an array of 5 booleans indicating which frames could contain a label:
first call (batch_start_time = 0): labels fall at 120 and 240 -> [F, F, T, F, T]
second call (batch_start_time = 250): labels fall at 360 and 480 -> [F, F, T, F, T]
third call (batch_start_time = 500): labels fall at 600 and 720 -> [F, T, F, F, T]
Same setup as before, but now with labelling_delta_t = 100 instead of 120:
first call (batch_start_time = 0): labels fall at 100 and 200 -> [F, T, F, T, F]
second call (batch_start_time = 250): labels fall at 300, 400 and 500 -> [T, F, T, F, T]
third call (batch_start_time = 500): labels fall at 600 and 700 -> [F, T, F, T, F]
Note: if labelling_delta_t <= delta_t, all frames could contain a valid GT
Note: If the FileMetadata is a pure distractor file (with no labels at all), it will have a 1 us labelling_delta_t, and therefore all frames will be considered labeled.
- Parameters
batch_start_time – Time from when to start loading (in us).
duration – Duration to load (in us).
labelling_delta_t – Period (in us) of your labelling annotation system.
num_tbins – Number of frames to load.
- Returns
- (boolean nd array): This boolean mask array of length num_tbins indicates
whether the frame contains a label. It is used to differentiate between time_bins that actually contain an empty label (for instance no bounding boxes) from time bins that weren’t labeled due to cost constraints. The latter timebins shouldn’t contribute to supervised losses used during training.
- Return type
frame_could_contain_gt
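In code, the first call of the 120 us example above could look like this (a sketch; arguments follow the parameter order documented above):
>>> from metavision_ml.data.box_processing import could_frame_contain_valid_gt
>>> # 5 bins of 50 us starting at t = 0, with labels every 120 us (at 120 and 240)
>>> mask = could_frame_contain_valid_gt(0, 250, 120, 5)
>>> # expected mask: [False, False, True, False, True]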
- metavision_ml.data.box_processing.create_class_lookup(labelmap_path, wanted_keys=[])
Takes as argument a JSON path storing a dictionary with class_id as key and class_name as value for the ground truth, as well as a list of wanted keys (the class_names we want to select).
- Parameters
labelmap_path (string) – Path to the label map, for example a JSON file containing:
'{"0": "pedestrian", "1": "two wheeler", "2": "car", "3": "truck"}'
wanted_keys (list) – List of classes to extract, for example: ['car', 'pedestrian']
- Returns
class_lookup, a numpy array, here [1, -1, 2, -1].
In this example we get 0 for background, 1 for pedestrians and 2 for cars. Then, with new_label = class_lookup[gt_label], you can transform an array of ground-truth ids into an array of ids that fit your network. Reminder: the ground truth has no id for background; for our network, id 0 is background and the other classes get consecutive ids.
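A usage sketch (the label map file and its content are just the example values above):
>>> import json
>>> import numpy as np
>>> from metavision_ml.data.box_processing import create_class_lookup
>>> with open("label_map.json", "w") as f:
>>>     json.dump({"0": "pedestrian", "1": "two wheeler", "2": "car", "3": "truck"}, f)
>>> class_lookup = create_class_lookup("label_map.json", wanted_keys=['car', 'pedestrian'])
>>> # class_lookup is [1, -1, 2, -1]: pedestrian -> 1, car -> 2, others ignored
>>> new_labels = class_lookup[np.array([0, 2, 3])]  # -> [1, 2, -1]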
- metavision_ml.data.box_processing.filter_boxes(box_events, class_lookup, idx_to_filter, ignore_filtered)
Filters or ignores boxes in box_events according to idx_to_filter.
Ignored boxes are still present but are marked with a -1 class_id. At the loss computation stage this information can be used so that they don't contribute to the loss. This is used when you don't want the proposals matched with those ignored boxes to be considered false positives in the loss. For instance, if you train on cars only in a dataset containing trucks, the trucks can be ignored.
- Parameters
box_events (np.ndarray) – Box events.
class_lookup (int list) – Lookup table for converting class indices to contiguous int values.
idx_to_filter (np.ndarray) – Boxes indices to filter out or ignore (see below).
ignore_filtered (bool) – If True, the filtered boxes are marked with a -1 class_id so that they can be discarded in the loss.
- Returns
Box_events with class_id translated using the class_lookup.
- Return type
(np.ndarray)
- metavision_ml.data.box_processing.filter_empty_tensor(array: numpy.array, box_events: numpy.array, area_box_filter: float = 0.1, shift: int = 0, time_per_bin: int = 10000, batch_start_time: int = 0, last_time_to_filter: Optional[int] = None) -> numpy.array
Preprocesses bounding boxes: discards bboxes with no event data inside them.
- Parameters
array – (T,C,H,W) event frame
box_events – numpy array of bbox
area_box_filter – minimum fraction of the bbox area that must contain events
shift – downsampling coefficient
time_per_bin – time interval per time bin along T axis
batch_start_time – starting time stamp of the array batch
last_time_to_filter – stop filtering bbox after this time stamp
- Returns
The filtered bounding boxes (np.ndarray).
- metavision_ml.data.box_processing.load_box_events(metadata, batch_start_time, duration)
Fetches box events from a FileMetadata object, batch_start_time and duration.
- Parameters
metadata (object) – Record details.
batch_start_time (int) – (us) Where to seek in the file to load corresponding bounding boxes
duration (int) – (us) How long to load events from bounding box file
- Returns
Nx1 of dtype EventBbox
- Return type
box_events (structured np.ndarray)
- metavision_ml.data.box_processing.load_boxes(metadata, batch_start_time, duration, tensor, **kwargs)
Function to fetch boxes and preprocess them. Should be passed to a SequentialDataLoader.
Since this function has additional arguments compared to load_labels_stub, one has to specialize it:
Examples
>>> from functools import partial
>>> n_classes = 21
>>> class_lookup = np.arange(n_classes)  # each class is mapped to itself
>>> load_boxes_function = partial(load_boxes, class_lookup=class_lookup)
- Parameters
metadata (FileMetadata) – Record details.
batch_start_time (int) – (us) Where to seek in the file to load corresponding bounding boxes
duration (int) – (us) How long to load events from bounding box file
tensor (np.ndarray) – Current preprocessed input, can be used for data dependent preprocessing, for instance remove boxes without any features in them.
**kwargs – dictionary containing:
class_lookup (np.array): Look-up array for class indices.
labelling_delta_t (int): Indicates the period of labelling, in order to only consider time bins with actual labels when computing the loss.
min_box_diag (int): Diagonal value under which boxes are not considered. Defaults to 60 pixels.
- Returns
boxes – List of structured arrays of dtype EventBbox corresponding to each time bin.
frames_contain_gt (np.ndarray) – This boolean mask array of length num_tbins indicates whether each frame contains a label. It is used to differentiate between time bins that actually contain an empty label (for instance no bounding boxes) and time bins that weren't labeled due to cost constraints. The latter time bins shouldn't contribute to supervised losses used during training.
- Return type
boxes (List[np.ndarray])
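Putting it together, a sketch of plugging the specialized function into a SequentialDataLoader (file name, delta_t and array_dim are placeholders):
>>> from functools import partial
>>> import numpy as np
>>> from metavision_ml.data.box_processing import load_boxes
>>> from metavision_ml.data.sequential_dataset import SequentialDataLoader
>>> class_lookup = np.arange(21)  # placeholder: each class mapped to itself
>>> load_boxes_fn = partial(load_boxes, class_lookup=class_lookup)
>>> dataloader = SequentialDataLoader(['train/file1.dat'], 50000, "histo", [5, 2, 480, 640], load_labels=load_boxes_fn)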
- metavision_ml.data.box_processing.nms(box_events, scores, iou_thresh=0.5)
NMS on box_events
- Parameters
box_events (np.ndarray) – nx1 with dtype EventBbox; the sorting order of those boxes is used as a criterion for the NMS.
scores (np.ndarray) – nx1 array of plain dtype; needs to be argsortable.
iou_thresh (float) – if two boxes overlap with more than iou_thresh (intersection over union threshold) with each other, only the one with the highest criterion value is kept.
- Returns
Indices of the box to keep in the input array.
- Return type
keep (np.ndarray)
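A small sketch (box values are made up; box_vectors_to_bboxes, documented above, builds the structured EventBbox array):
>>> import numpy as np
>>> from metavision_ml.data.box_processing import box_vectors_to_bboxes, nms
>>> boxes = np.array([[10., 10., 50., 50.], [12., 12., 52., 52.]])  # two overlapping boxes
>>> scores = np.array([0.9, 0.6])
>>> box_events = box_vectors_to_bboxes(boxes, np.array([1, 1]), scores=scores)
>>> keep = nms(box_events, scores, iou_thresh=0.5)  # keeps only the higher-scoring box
>>> box_events = box_events[keep]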
- metavision_ml.data.box_processing.nms_by_class(box_events, scores, iou_thresh=0.5)
NMS on box_events, done independently per class.
- Parameters
box_events (np.ndarray) – nx1 with dtype EventBbox; the sorting order of those boxes is used as a criterion for the NMS.
scores (np.ndarray) – nx1 array of plain dtype; needs to be argsortable.
iou_thresh (float) – if two boxes overlap with more than iou_thresh (intersection over union threshold) with each other, only the one with the highest criterion value is kept.
- Returns
Indices of the box to keep in the input array.
- Return type
keeps (np.ndarray)
- metavision_ml.data.box_processing.rescale_boxes(box_events, width_orig, height_orig, width_dst, height_dst)
Rescales boxes to new height and width.
- Parameters
box_events (structured np.ndarray) – Array of length n of dtype EventBbox.
width_orig (int) – Original width of sensor for annotation.
height_orig (int) – Original height of sensor for annotation.
width_dst (int) – Destination width.
height_dst (int) – Destination height.
- Returns
Array of length n of dtype EventBbox.
- Return type
box_events (structured np.ndarray)
- metavision_ml.data.box_processing.split_boxes(box_events, batch_start_time, delta_t=None, num_tbins=None)
Splits box_events into a list of box events clustered by time bin of duration delta_t.
Box times are in range(0, num_tbins * delta_t)
- Parameters
box_events (structured np.ndarray) – Box event inputs of dtype EventBbox.
batch_start_time (int) – Start time of the batch in us.
delta_t (optional int) – Duration of a time bin in us. Used for chronological NMS.
num_tbins (optional int) – Number of time bins.
- Returns
List of box_events of type EventBbox separated in time bins.
- Return type
box_events (np.ndarray list)
This class allows streaming a dataset of RAW or DAT files.
This is yet another example of how to use data.multistream_dataloader. Here we go further and integrate it with the same interface as the SequentialDataLoader.
- class metavision_ml.data.cd_processor_dataset.CDProcessorDataLoader(files, mode, delta_t, n_events, max_duration, preprocess_function_name, num_tbins, batch_size, num_workers=2, height=None, width=None, preprocess_kwargs={}, load_labels=None, padding_mode='zeros', transforms=None, base_seed=None)
Attempts to provide the same interface as the SequentialDataLoader, but using the multistream_dataloader implementation.
- get_base_seed(base_seed)
Assigns a base seed that is added to transforms when applicable.
Changing this seed once an epoch has ended allows differentiation between epochs while maintaining temporal coherency of the spatial transformations.
- Parameters
base_seed (int) – Seed added to the hash before drawing a random transformation. If None, the current time is used instead.
- get_vis_func()
Returns the visualization function corresponding to the preprocessing being used.
- class metavision_ml.data.cd_processor_dataset.CDProcessorDatasetIterator(path, height_out, width_out, load_labels, mode, n_events, delta_t, num_tbins, preprocess_function_name, preprocess_kwargs={}, start_ts=0, max_duration=None, transforms=None, base_seed=None)
This iterator reads events or preprocessed tensors, computes feature tensors, loads labels and retrieves them. The difference with sequential_dataset_v1 is that load_labels cannot be a pure function; it has to be a class.
This class simulates a box moving in translation and zoom within a frame.
- class metavision_ml.data.moving_box.Animation(height, width, channels, max_stop=15, max_classes=1, max_objects=3)
Responsible for the endless animation of moving boxes. Parent class that can be inherited for various drawings of moving objects.
- Parameters
height – frame height
width – frame width
channels – frame channels (either 1 or 3)
max_stop – maximum number of steps for a random pause in the animation
max_classes – maximum number of classes
max_objects – maximum number of objects
- class metavision_ml.data.moving_box.MovingSquare(h=300, w=300, max_stop=15, max_classes=3)
Responsible for an endless moving square animation.
- Parameters
h – frame height
w – frame width
max_stop – randomly pause for this many steps
max_classes – maximum number of classes
- reset()
Resets internal variables
- reset_speed()
Resets Speed Variables
- metavision_ml.data.moving_box.clamp_xyxy(x1, y1, x2, y2, width, height)
Clamps a box to a frame
- Parameters
x1 – top left corner x
y1 – top left corner y
x2 – bottom right corner x
y2 – bottom right corner y
width – frame width
height – frame height
- Returns
clamped positions
- metavision_ml.data.moving_box.move_box(x1, y1, x2, y2, vx, vy, vs, width, height, min_width, min_height)
Moves a bounding box around in a frame using velocities vx, vy & vs (scale). It returns the moved box and a flag saying whether the speed needs to change because the box collided with a wall.
- Parameters
x1 – top left corner x
y1 – top left corner y
x2 – bottom right corner x
y2 – bottom right corner y
vx – x speed
vy – y speed
vs – scale speed
width – frame width
height – frame height
min_width – minimal box width
min_height – minimal box height
- Returns
moved box
- metavision_ml.data.moving_box.rotate(x, y, xo, yo, theta)
Rotates a point w.r.t. the origin (xo, yo)
- Parameters
x – point x coordinate
y – point y coordinate
xo – origin x coordinate
yo – origin y coordinate
theta – rotation angle
- Returns
rotated point
Toy Problem Dataset that serves as an example of our streamer dataloader.
It displays moving digits from the MNIST database. The digits vary in size and position. To use this class, download MNIST.ZIP at https://kdrive.infomaniak.com/app/share/975517/3c529307-3cec-4fc6-bbb3-87a95c6ef6cf
The dataset both generates chained video clips and provides bounding boxes with the correct class id.
The dataset procedurally generates the video clips, so it is an “Iterable”-style dataset.
- class metavision_ml.data.moving_mnist.MovingMNISTDataset(tbins, num_workers, batch_size, height, width, max_frames_per_video, max_frames_per_epoch, train, dataset_dir='.')
Creates the dataloader for Moving MNIST.
- Parameters
tbins – number of steps per batch
num_workers – number of parallel workers
batch_size – number of animations
height – animation height
width – animation width
max_frames_per_video – maximum frames per animation (must be greater than tbins)
max_frames_per_epoch – maximum frames per epoch
train – use training part of MNIST dataset.
dataset_dir – directory where MNIST dataset is stored
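A construction sketch with placeholder values (dataset_dir must point to the MNIST data downloaded from the link above; parameter names follow the list above):
>>> from metavision_ml.data.moving_mnist import MovingMNISTDataset
>>> # tbins must be smaller than max_frames_per_video; height/width are animation sizes
>>> dataloader = MovingMNISTDataset(tbins=10, num_workers=2, batch_size=4, height=128, width=128, max_frames_per_video=100, max_frames_per_epoch=5000, train=True, dataset_dir='.')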
- class metavision_ml.data.moving_mnist.MovingMnist(idx, tbins, height, width, train, max_frames_per_video, channels=3, max_stop=15, max_objects=2, drop_labels_p=0, data_caching_path='.')
Moving MNIST animation.
- Parameters
idx – unique id
tbins – number of steps delivered at once
height – frame height (must be at least 64 pix)
width – frame width (must be at least 64 pix)
max_stop – random pause in animation
max_objects – maximum number of objects per animation
train – use the training (or validation) part of MNIST
max_frames_per_video – maximum frames per video before reset
drop_labels_p – probability to drop the annotation of certain frames (in which case it is marked in the mask)
data_caching_path – where to store the MNIST dataset
- metavision_ml.data.moving_mnist.collate_fn(data_list)
Collates batch parts into a single dictionary.
- Parameters
data_list – batch parts