SDK ML Preprocessing API
A collection of preprocessing functions that turn a buffer of events into a sequence of 2-dimensional features.
These functions only work on events represented as numpy structured arrays. They make intensive use of numba (http://numba.pydata.org/).
Examples
>>> delta = 100000
>>> initial_ts = record.current_time
>>> events = record.load_delta_t(delta) # load 100 milliseconds worth of events
>>> events['t'] -= int(initial_ts) # events timestamp should be reset
>>> output_array = np.zeros((1, 2, height, width)) # preallocate output array
>>> histo(events, output_array, delta)
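The example above assumes that a `record` object and the sensor dimensions already exist. As a stand-alone sketch, the snippet below builds a synthetic structured array of events and resets its timestamps the same way. The field layout (`x`, `y`, `p`, `t` with these integer types) and the helper `make_synthetic_events` are assumptions made for illustration; they are not taken from this page.

```python
import numpy as np

# Assumed event layout: pixel coordinates, polarity (0 = OFF, 1 = ON), timestamp in us.
EVENT_DTYPE = np.dtype([("x", "<u2"), ("y", "<u2"), ("p", "<i2"), ("t", "<i8")])


def make_synthetic_events(n, height, width, t_start, delta, seed=0):
    """Return n random events with sorted timestamps in [t_start, t_start + delta)."""
    rng = np.random.default_rng(seed)
    events = np.zeros(n, dtype=EVENT_DTYPE)
    events["x"] = rng.integers(0, width, size=n)
    events["y"] = rng.integers(0, height, size=n)
    events["p"] = rng.integers(0, 2, size=n)
    events["t"] = np.sort(rng.integers(t_start, t_start + delta, size=n))
    return events


delta = 100000
initial_ts = 2_500_000  # stand-in for record.current_time
events = make_synthetic_events(1000, height=480, width=640, t_start=initial_ts, delta=delta)
events["t"] -= int(initial_ts)  # timestamps now start at 0
```

An array built this way can be used to try out the sketches in this section without a recording.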
- metavision_ml.preprocessing.event_to_tensor.diff(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True, max_incr_per_pixel=5.0)
Computes the difference between the histogram of positive events and the histogram of negative events. output_array must have a signed dtype.
- Parameters
xypt (events) – structured array containing events
output_array – numpy float32 array [tbins, 1, height, width]
total_tbins_delta_t – duration of the timeslice
downsampling_factor (int) – Parameter used to reduce the spatial dimension of the obtained feature: event coordinates are multiplied by 2**(-downsampling_factor).
reset (boolean) – whether to reset output_array to 0 beforehand.
max_incr_per_pixel – maximum number of increments per pixel (expressed in initial resolution).
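To make the idea concrete, here is a minimal NumPy sketch of a polarity-difference histogram. It is not the numba implementation of the library: `diff_sketch` is a hypothetical name, `max_incr_per_pixel` is left out, and the mapping of polarity to +1/-1 and of timestamps to time bins is an assumption.

```python
import numpy as np


def diff_sketch(events, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True):
    """Accumulate +1 for ON events and -1 for OFF events into output_array[tbin, 0, y, x]."""
    if reset:
        output_array[...] = 0
    num_tbins = output_array.shape[0]
    # Time bin of each event; timestamps are assumed to lie in [0, total_tbins_delta_t).
    tbin = (events["t"].astype(np.int64) * num_tbins) // total_tbins_delta_t
    tbin = np.clip(tbin, 0, num_tbins - 1)
    # Spatial downsampling by a power of two.
    x = events["x"].astype(np.int64) >> downsampling_factor
    y = events["y"].astype(np.int64) >> downsampling_factor
    sign = np.where(events["p"] > 0, 1, -1).astype(output_array.dtype)
    # np.add.at accumulates correctly when the same cell appears several times.
    np.add.at(output_array, (tbin, 0, y, x), sign)
    return output_array


events = np.array(
    [(3, 2, 1, 10), (3, 2, 1, 20), (3, 2, 0, 30), (0, 0, 0, 60)],
    dtype=[("x", "<u2"), ("y", "<u2"), ("p", "<i2"), ("t", "<i8")],
)
out = np.zeros((2, 1, 4, 4), dtype=np.float32)
diff_sketch(events, out, total_tbins_delta_t=100)
```

The signed accumulation is the reason output_array needs a signed dtype: a pixel that receives more OFF events than ON events ends up negative.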
- metavision_ml.preprocessing.event_to_tensor.diff_quantized(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True, negative_bit_length=8, normalization=False)
Computes the quantized difference between the histogram of positive events and the histogram of negative events. output_array must have a signed dtype.
- Parameters
xypt (events) – structured array containing events
output_array (np.ndarray) – Preallocated numpy array of shape (num_tbins, 1, height, width). If downsampling_factor is not 0 or normalization is True, the dtype should be floating point. Otherwise it should be np.int8.
total_tbins_delta_t – duration of the timeslice
downsampling_factor (int) – Parameter used to reduce the spatial dimension of the obtained feature: event coordinates are multiplied by 2**(-downsampling_factor).
reset (boolean) – whether to reset output_array to 0 beforehand.
negative_bit_length (int) – negative bit length set by the camera settings. It controls the precision used to compute the tensor and must be an integer > 2 and <= 8 in Diff3D mode.
normalization (boolean) – If False, the dtype of output_array should be int8 and the data range is [-2 ** (negative_bit_length - 1), 2 ** (negative_bit_length - 1) - 1]. If True, the dtype of output_array should be floating point and the data range is [-1.0, 1.0].
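The value ranges described for normalization can be illustrated with a short sketch. `quantize_diff` is a hypothetical helper, and dividing by 2 ** (negative_bit_length - 1) to reach [-1.0, 1.0] is an assumption about how the scaling could be done, not a statement about the library's code.

```python
import numpy as np


def quantize_diff(counts, negative_bit_length=8, normalization=False):
    """Clip signed event counts to the range implied by negative_bit_length."""
    low = -(2 ** (negative_bit_length - 1))
    high = 2 ** (negative_bit_length - 1) - 1
    clipped = np.clip(counts, low, high)
    if normalization:
        # Assumed scaling: divide by |low| so that values fall within [-1.0, 1.0].
        return clipped.astype(np.float32) / float(-low)
    return clipped.astype(np.int8)


counts = np.array([-300, -5, 0, 7, 300])
q = quantize_diff(counts, negative_bit_length=8)
qn = quantize_diff(counts, negative_bit_length=8, normalization=True)
```

With negative_bit_length=8 the unnormalized range is exactly the int8 range, which is consistent with the np.int8 requirement on output_array.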
- metavision_ml.preprocessing.event_to_tensor.event_cube(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, split_polarity=True, reset=True, max_incr_per_pixel=5)
Takes xypt events within a timeslice of length total_tbins_delta_t and updates an array of shape (num_tbins,num_utbins*2,H,W). Microbins are the intervals used in the channels [Unsupervised Event-based Learning of Optical Flow, Zhu et al. 2018].
- Note: you should load delta_t * (num_tbins+1) worth of events to avoid artefacts on the last timeslice, because the support of the temporal bilinear kernel is 2 bins.
- Parameters
xypt (events) – structured array containing events
output_array (np.ndarray) – (num_tbins,num_utbins*2,H,W) dtype MUST be floating point!
total_tbins_delta_t (int) – Length in us of the interval in which events are taken
downsampling_factor – the event coordinates x and y are divided by 2**downsampling_factor. (WARNING): This differs from the paper, where bilinear weights are used for downsampling as well. A true event-based bilinear resize would contribute to 8 different cells in the result tensor.
split_polarity (bool) – whether to split polarity into 2 channels or to use p as a weight in {-1,1}
reset (bool) – whether to reset output_array; in most cases this should be set to True
max_incr_per_pixel – maximum number of increments per pixel (expressed in initial resolution).
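The note about the 2-bin kernel support can be illustrated with a triangular (bilinear) kernel in time, where each event is shared between two adjacent bins. This is a sketch under assumptions: the function names are hypothetical and the exact bin alignment used by event_cube is not described on this page.

```python
import numpy as np


def temporal_bilinear_weights(t, total_delta_t, num_bins):
    """Return (left_bin, left_weight, right_weight) for timestamps in [0, total_delta_t).

    Each event is shared between bin `left_bin` and bin `left_bin + 1` using the
    triangular kernel max(0, 1 - |distance|), so the kernel support spans 2 bins.
    """
    t_star = np.asarray(t, dtype=np.float64) * num_bins / total_delta_t
    left_bin = np.floor(t_star).astype(np.int64)
    right_weight = t_star - left_bin
    left_weight = 1.0 - right_weight
    return left_bin, left_weight, right_weight


def accumulate(t, total_delta_t, num_bins):
    """Accumulate weights into num_bins + 1 slots so no right-hand contribution is dropped."""
    left_bin, left_weight, right_weight = temporal_bilinear_weights(t, total_delta_t, num_bins)
    acc = np.zeros(num_bins + 1)
    np.add.at(acc, left_bin, left_weight)
    np.add.at(acc, left_bin + 1, right_weight)
    return acc


acc = accumulate(np.array([0, 25_000, 75_000]), total_delta_t=100_000, num_bins=2)
```

Because every event spreads over two neighbouring bins, the content of a boundary bin depends on events from the adjacent interval, which is the reason for loading one extra delta_t.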
- metavision_ml.preprocessing.event_to_tensor.histo(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True, max_incr_per_pixel=5.0)
Computes histogram on a sequence of events.
- Parameters
xypt (events) – structured array containing events
output_array (np.ndarray) – Preallocated numpy array of shape (num_tbins, 2, height, width)
total_tbins_delta_t (int) – Time interval of the extended time slice (with tbins).
downsampling_factor (int) – Parameter used to reduce the spatial dimension of the obtained feature: event coordinates are multiplied by 2**(-downsampling_factor).
reset (boolean) – whether to reset output_array to 0 beforehand.
max_incr_per_pixel – maximum number of increments per pixel (expressed in initial resolution).
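The effect of downsampling_factor on a two-channel histogram can be sketched in plain NumPy. `histo_sketch` is a hypothetical name, the channel order (0 = OFF, 1 = ON) is an assumption, and max_incr_per_pixel is ignored, so the values below are raw counts rather than what the library would produce.

```python
import numpy as np


def histo_sketch(events, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True):
    """Count events per (time bin, polarity channel, y, x) cell."""
    if reset:
        output_array[...] = 0
    num_tbins = output_array.shape[0]
    tbin = (events["t"].astype(np.int64) * num_tbins) // total_tbins_delta_t
    tbin = np.clip(tbin, 0, num_tbins - 1)
    # Multiplying coordinates by 2**(-downsampling_factor) is a right shift on integers.
    x = events["x"].astype(np.int64) >> downsampling_factor
    y = events["y"].astype(np.int64) >> downsampling_factor
    channel = (events["p"] > 0).astype(np.int64)  # assumed: 0 = OFF, 1 = ON
    np.add.at(output_array, (tbin, channel, y, x), 1)
    return output_array


events = np.array(
    [(5, 3, 1, 0), (4, 2, 1, 10), (5, 3, 0, 20)],
    dtype=[("x", "<u2"), ("y", "<u2"), ("p", "<i2"), ("t", "<i8")],
)
height, width = 4, 8
half_res = np.zeros((1, 2, height // 2, width // 2), dtype=np.float32)
histo_sketch(events, half_res, total_tbins_delta_t=100, downsampling_factor=1)
```

With downsampling_factor=1 the events at (x=5, y=3) and (x=4, y=2) fall into the same half-resolution cell (x=2, y=1), so output_array has to be allocated at the reduced size.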
- metavision_ml.preprocessing.event_to_tensor.histo_quantized(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True, negative_bit_length=4, total_bit_length=8, normalization=False)
Computes histogram on a sequence of events.
- Parameters
xypt (events) – structured array containing events
output_array (np.ndarray) – Preallocated numpy array of shape (num_tbins, 2, height, width). If downsampling_factor is not 0 or normalization is True, the dtype should be floating point. Otherwise it should be np.uint8.
total_tbins_delta_t (int) – Time interval of the extended time slice (with tbins).
downsampling_factor (int) – Parameter used to reduce the spatial dimension of the obtained feature: event coordinates are multiplied by 2**(-downsampling_factor).
reset (boolean) – whether to reset output_array to 0 beforehand.
negative_bit_length (int) – negative bit length set by the camera settings. It controls the precision used to compute the tensor and must be an integer > 0 and <= 8 in Histo3D mode.
total_bit_length – total number of bits used to store the data, total_bit_length = negative_bit_length + positive_bit_length
normalization (boolean) – If False, the dtype of output_array should be uint8 and the data range is [0, 2 ** negative_bit_length] for the negative channel and [0, 2 ** positive_bit_length] for the positive channel. If True, the dtype of output_array should be floating point and the data range is [0, 1.0].
- metavision_ml.preprocessing.event_to_tensor.multi_channel_timesurface(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True, **kwargs)
Computes a linear two-channel time surface n times per delta_t.
The number of channels of output_array must be even: with 4 channels there are 2 micro delta_t, with 6 channels there are 3, and so on. This feature contains precise time information and makes it possible to perceive phenomena at frequencies higher than 1 / delta_t. The dtype of output_array must be sufficient to hold values up to total_tbins_delta_t without overflow.
- Parameters
xypt (events) – structured array containing events
output_array (np.ndarray) – (num_tbins,num_utbins*2,H,W) dtype MUST be floating point!
total_tbins_delta_t (int) – length in us of the interval in which events are taken
downsampling_factor – the event coordinates x and y are divided by 2**downsampling_factor.
reset (bool) – whether to reset output_array to 0 before computing the feature.
- metavision_ml.preprocessing.event_to_tensor.timesurface(xypt, output_array, total_tbins_delta_t, downsampling_factor=0, reset=True, normed=True, **kwargs)
Computes a linear two-channel time surface. The dtype of output_array must be sufficient to hold values up to total_tbins_delta_t without overflow.
- Parameters
xypt (events) – structured array containing events
output_array (np.ndarray) – Preallocated numpy array of shape (num_tbins, 2, height, width)
total_tbins_delta_t (int) – Length in us of the interval in which events are taken
downsampling_factor (int) – Parameter used to reduce the spatial dimension of the obtained feature: event coordinates are multiplied by 2**(-downsampling_factor).
reset (boolean) – whether to reset output_array to 0 beforehand.
normed (boolean) – if True, scales the timesurface between 0 and 1.
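As a rough illustration of a linear time surface, the sketch below keeps the timestamp of the most recent event per pixel and polarity, then optionally scales it to [0, 1]. It handles a single time bin only; `timesurface_sketch`, the channel order, and the division by total_tbins_delta_t for `normed` are assumptions rather than the library's documented behaviour.

```python
import numpy as np


def timesurface_sketch(events, output_array, total_tbins_delta_t, reset=True, normed=True):
    """Keep, per polarity channel and pixel, the timestamp of the most recent event."""
    if reset:
        output_array[...] = 0
    for ev in events:  # assumed sorted by timestamp, so later events overwrite earlier ones
        channel = 1 if ev["p"] > 0 else 0  # assumed: channel 0 = OFF, channel 1 = ON
        output_array[0, channel, ev["y"], ev["x"]] = ev["t"]
    if normed:
        output_array /= total_tbins_delta_t  # a floating-point dtype is required here
    return output_array


events = np.array(
    [(1, 1, 1, 10), (2, 0, 0, 50), (1, 1, 1, 80)],
    dtype=[("x", "<u2"), ("y", "<u2"), ("p", "<i2"), ("t", "<i8")],
)
ts = np.zeros((1, 2, 3, 3), dtype=np.float32)
timesurface_sketch(events, ts, total_tbins_delta_t=100)
```

Before normalization the stored values are raw timestamps, which is why the dtype must be able to hold values up to total_tbins_delta_t.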
Write HDF5 feature files from event files
- metavision_ml.preprocessing.hdf5.generate_hdf5(paths, output_folder, preprocess, delta_t, height=None, width=None, start_ts=0, max_duration=None, n_processes=2, box_labels=[], store_as_uint8=False, mode='delta_t', n_events=0, preprocess_kwargs={})
Generates HDF5 files of frames at a regular interval for dataset caching.
It is meant to produce a dataset with a smaller memory footprint, that is therefore easier to load from disk. Generated files share the name of the input file but have an HDF5 extension.
- If max_duration is not None the naming of the output_file follows this pattern:
“{output_folder}/{path basename}_{index:d}.h5” where index allows you to distinguish the cut.
otherwise “{output_folder}/{path basename}.h5”
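The naming pattern above can be sketched as a small helper. `hdf5_output_paths` and its `num_cuts` argument are hypothetical, and two details are assumptions: that "path basename" means the file name without its extension, and that the cut index starts at 0.

```python
import os


def hdf5_output_paths(path, output_folder, num_cuts=None):
    """Return the expected output file names for one input file.

    num_cuts=None mimics max_duration=None (a single output file);
    otherwise one file per cut is listed, suffixed with its index.
    """
    base = os.path.splitext(os.path.basename(path))[0]
    if num_cuts is None:
        return [os.path.join(output_folder, f"{base}.h5")]
    return [os.path.join(output_folder, f"{base}_{index:d}.h5") for index in range(num_cuts)]


single = hdf5_output_paths("src_path/test/recording.dat", "dst_path/test")
cuts = hdf5_output_paths("src_path/test/recording.dat", "dst_path/test", num_cuts=2)
```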
Example
$ python3 metavision_ml/preprocessing/hdf5.py src_path/test/*.dat --o dst_path/test/ --height_width 480 640 --preprocess histo --box-labels src_path/test/*bbox.npy
- Parameters
paths (string list) – Paths of input files.
output_folder (string) – Folder path where data will be written.
preprocess (string) – Name of the preprocessing function to use.
delta_t (int) – Period at which tensors are computed (in us).
height (int) – If None, the features are not downsampled. Otherwise, features are downsampled to height, which must be the sensor’s height divided by a power of 2.
width (int) – If None, the features are not downsampled. Otherwise, features are downsampled to width, which must be the sensor’s width divided by a power of 2.
start_ts (int) – Timestamp (in microseconds) from which the computation begins. Either a single int for all files or a list containing exactly one int per file.
max_duration (int) – If not None, limit the duration of the file to max_duration (in us).
n_processes (int) – Number of processes writing files simultaneously.
box_labels (string list) – Paths to the npy box label files, one matching each input file. If start_ts or max_duration is specified, these files are cut accordingly.
store_as_uint8 (boolean) – if True, casts to byte before storing to save space. Only supports 0-1 normalized data.
mode (string) – Load by timeslice or number of events. Either “delta_t” or “n_events”.
n_events (int) – Number of events in the timeslice.
preprocess_kwargs (dictionary) – A dictionary containing the kwargs passed to the chosen preprocessing method.
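The constraint on height and width (the sensor dimension divided by a power of 2) can be checked before launching a long conversion. The helper below is a hypothetical sketch and not part of the package; it returns the power of 2 that relates the two sizes.

```python
def downsampling_factor_for(sensor_size, target_size):
    """Return k such that target_size == sensor_size / 2**k, or raise ValueError.

    target_size=None means "no downsampling" and yields 0.
    """
    if target_size is None:
        return 0
    k = 0
    size = sensor_size
    while size > target_size and size % 2 == 0:
        size //= 2
        k += 1
    if size != target_size:
        raise ValueError(
            f"{target_size} is not {sensor_size} divided by a power of 2"
        )
    return k


factor_h = downsampling_factor_for(480, 240)
factor_w = downsampling_factor_for(640, 320)
```

For a 640x480 sensor, requesting height=240 and width=320 corresponds to one halving in each dimension, whereas a height of 200 would be rejected.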
- metavision_ml.preprocessing.hdf5.split_label(box_path, output_folder, start_ts=0, max_duration=None)
Copies the box labels, cut according to start_ts and max_duration.
- metavision_ml.preprocessing.hdf5.write_to_hdf5(path, start_ts=0, output_folder='.', delta_t=50000, preprocess='histo', height=None, width=None, max_duration=None, store_as_uint8=False, mode='delta_t', n_events=0, preprocess_kwargs={})
Generates a single HDF5 file of frames at a regular interval for dataset caching.
It is meant to produce a dataset with a smaller memory footprint, that is therefore easier to load from disk. Generated files share the name of the input RAW file but have an HDF5 extension.
- The naming of the output_file follows this pattern:
“{output_folder}/{path basename}_{start_ts}.h5”
- Parameters
path (string) – Path of the input file.
output_folder (string) – Folder path where data will be written.
preprocess (string) – Name of the preprocessing function to use.
delta_t (int) – Period at which tensors are computed (in us).
height (int) – If None, the features are not downsampled. Otherwise, features are downsampled to height, which must be the sensor’s height divided by a power of 2.
width (int) – If None, the features are not downsampled. Otherwise, features are downsampled to width, which must be the sensor’s width divided by a power of 2.
start_ts (int) – Timestamp (in microseconds) from which the computation begins. Either a single int for all files or a list containing exactly one int per file.
max_duration (int) – If not None, limit the duration of the file to max_duration us.
store_as_uint8 (boolean) – if True, casts to byte before storing to save space. Only supports 0-1 normalized data.
mode (string) – Load by timeslice or number of events. Either “delta_t” or “n_events”.
n_events (int) – Number of events in the timeslice.
preprocess_kwargs (dictionary) – A dictionary containing the kwargs passed to the chosen preprocessing method.
A collection of visualization utilities for preprocessings
Examples
>>> delta = 100000
>>> initial_ts = record.current_time
>>> events = record.load_delta_t(delta) # load 100 milliseconds worth of events
>>> events['t'] -= int(initial_ts) # events timestamp should be reset
>>> output_array = np.zeros((1, 2, height, width)) # preallocate output array
>>> histo(events, output_array, delta)
>>> img = viz_histo(output_array[0])
>>> cv2.imshow('img', img)
>>> cv2.waitKey()
- metavision_ml.preprocessing.viz.filter_outliers(input_val, num_std=2)
Filter outliers in an array
- Parameters
input_val (np.ndarray) – Array of any shape
- Returns
Normalized array of same shape as input
- Return type
output_array (np.ndarray)
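The page does not describe how outliers are filtered. One common approach, shown here purely as an assumption, is to clip values that lie more than num_std standard deviations away from the mean. `filter_outliers_sketch` is a hypothetical name and may differ from what the library does.

```python
import numpy as np


def filter_outliers_sketch(input_val, num_std=2):
    """Clip values to the interval [mean - num_std * std, mean + num_std * std]."""
    values = np.asarray(input_val, dtype=np.float64)
    mean, std = values.mean(), values.std()
    return np.clip(values, mean - num_std * std, mean + num_std * std)


data = np.array([0.0] * 9 + [100.0])
filtered = filter_outliers_sketch(data, num_std=2)
```

Clipping of this kind keeps a few very active pixels from dominating the dynamic range of a visualization.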
- metavision_ml.preprocessing.viz.gray_to_rgb(im)
Repeats a single-channel image 3 times to form a 3-channel image.
- Parameters
im (np.ndarray) – Array of shape (H,W)
- Returns
array of shape (H,W,3)
- Return type
output_array
- metavision_ml.preprocessing.viz.normalize(im)
Normalizes image by min-max
- Parameters
im (np.ndarray) – Array of any shape
- Returns
Normalized array of same shape as input
- Return type
output_array (np.ndarray)
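Min-max normalization maps the smallest value of the array to 0 and the largest to 1. A minimal sketch follows; `normalize_sketch` is a hypothetical name, and returning zeros for a constant image is a choice made here to avoid dividing by zero, not documented behaviour.

```python
import numpy as np


def normalize_sketch(im):
    """Scale an array linearly so that its minimum maps to 0 and its maximum to 1."""
    im = np.asarray(im, dtype=np.float32)
    low, high = im.min(), im.max()
    if high == low:
        return np.zeros_like(im)  # constant image: nothing to stretch
    return (im - low) / (high - low)


normalized = normalize_sketch([[0, 5], [10, 20]])
```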
- metavision_ml.preprocessing.viz.viz_diff(im)
Visualize difference of histogram
- Parameters
im (np.ndarray) – Array of shape (H,W)
- Returns
Array of shape (H,W)
- Return type
output_array (np.ndarray)
- metavision_ml.preprocessing.viz.viz_diff_binarized(im)
Visualize binarized difference of events (“ON”-“OFF”)
- Parameters
im (np.ndarray) – Array of shape (H,W)
- Returns
Array of shape (H,W,3)
- Return type
output_array (np.ndarray)
- metavision_ml.preprocessing.viz.viz_event_cube_rgb(im, split_polarity=True)
Visualize 3 out of 6 channels in RGB mode.
- Parameters
im (np.ndarray) – Array of shape (T,H,W): either T single-channel images or T//2 groups of 2-channel images
split_polarity – whether each image has 2 channels (ON, OFF) or a single channel (ON-OFF).
- Returns
Array of shape (H,W,3)
- Return type
output_array (np.ndarray)
- metavision_ml.preprocessing.viz.viz_histo(im)
Visualize difference of histogram
- Parameters
im (np.ndarray) – Array of shape (2,H,W)
- Returns
Array of shape (H,W)
- Return type
output_array (np.ndarray)
- metavision_ml.preprocessing.viz.viz_histo_binarized(im)
Visualize binarized histogram of events
- Parameters
im (np.ndarray) – Array of shape (2,H,W)
- Returns
Array of shape (H,W,3)
- Return type
output_array (np.ndarray)
- metavision_ml.preprocessing.viz.viz_histo_filtered(im, val_max=0.5)
Visualizes a strongly filtered histo image with 3 channels.
- Parameters
im (np.ndarray) – Array of shape (2,H,W)
- Returns
Array of shape (H,W,3)
- Return type
output_array
- metavision_ml.preprocessing.viz.viz_histo_rgb(im)
Visualizes a histo image with 3 channels.
- Parameters
im (np.ndarray) – Array of shape (H,W)
- Returns
Array of shape (H,W,3)
- Return type
output_array
- metavision_ml.preprocessing.viz.viz_multichannel_timesurface(tensor, blend=False)
Visualizes three channels of a multi channel timesurface.
- Parameters
tensor (np.ndarray) – Array of shape (T,H,W): either T single-channel images or T//2 groups of 2-channel images
blend (boolean) – whether to blend different channels to visualize them as one.
- Returns
array of shape (H,W,3)
- Return type
output_array (np.ndarray)
- metavision_ml.preprocessing.viz.viz_timesurface(im)
Visualize timesurface
Note: To generate a timesurface, call event_to_tensor.timesurface. Typically, to see an exponential-decay timesurface, do not set the “reset” argument to True, so that the latest event’s timestamp is kept at each pixel.
Here we assume the timesurface has already been normalized between [0,1] either by min-max normalization or with exponential time-decay.
- Parameters
im (np.ndarray) – Array of shape (2,H,W) or (H,W)
- Returns
Array of shape (H,W,3)
- Return type
output_array (np.ndarray)
- class metavision_ml.preprocessing.CDProcessor(height, width, num_tbins=5, preprocessing='histo', downsampling_factor=0, preprocess_kwargs={})
Wrapper that simplifies the preprocessing.
- Parameters
height – image height
width – image width
num_tbins – number of time slices
preprocessing – name of method, must be registered first
downsampling_factor (int) – Parameter used to reduce the spatial dimension of the obtained feature: event coordinates are multiplied by 2**(-downsampling_factor).
- get_array_dim()
Returns the shape of the preprocessing tensor.
- Returns
shape of the tensor to be preprocessed.
- Return type
shape (int tuple)
Registering a new preprocessing function
If you write your own preprocessing function you can make it available to existing modules with this API.
- metavision_ml.preprocessing.register_new_preprocessing(preproc_name, n_input_channels, function, viz_function, kwargs={'max_incr_per_pixel': 5, 'preprocess_dtype': dtype('float32')})
Registers a new preprocessing function to be available across the package.
This must be done each time the python interpreter is invoked.
- Parameters
preproc_name (string) – Name of the preprocessing function, has to be unique.
n_input_channels (int) – Number of channels in the resulting tensor.
function (function) – Preprocessing function. Its signature must be function(events, tensor, **kwargs) -> None
viz_function (function) – Visualization function. Its signature must be viz_function(tensor) -> img (np.ndarray H x W x 3)
kwargs (dict) – Dictionary of optional arguments to be passed to the function.
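A pair of functions matching the two required signatures might look like the sketch below. `binary_presence` and `viz_binary_presence` are hypothetical examples, the tensor layout (num_tbins, channels, H, W) is assumed from the other functions on this page, and the registration call is left commented out because the package is not imported here.

```python
import numpy as np


def binary_presence(events, tensor, **kwargs):
    """Hypothetical preprocessing: mark pixels that received at least one event."""
    tensor[...] = 0
    tensor[0, 0, events["y"], events["x"]] = 1


def viz_binary_presence(tensor):
    """Turn a (1, H, W) tensor into an (H, W, 3) uint8 image."""
    gray = (np.clip(tensor[0], 0, 1) * 255).astype(np.uint8)
    return np.repeat(gray[..., None], 3, axis=2)


# With the package installed, registration would follow the signature documented above:
# from metavision_ml.preprocessing import register_new_preprocessing
# register_new_preprocessing("binary_presence", 1, binary_presence, viz_binary_presence)

events = np.array(
    [(3, 2, 1, 10), (0, 1, 0, 20)],
    dtype=[("x", "<u2"), ("y", "<u2"), ("p", "<i2"), ("t", "<i8")],
)
tensor = np.zeros((1, 1, 4, 4), dtype=np.float32)
binary_presence(events, tensor)
img = viz_binary_presence(tensor[0])
```

Since registration must be repeated each time the interpreter starts, such functions are best kept in a module that is imported before any code looks up the preprocessing by name.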
- metavision_ml.preprocessing.get_preprocess_dict(preproc_name)
Returns a mutable dict describing a preprocessing function.
- Parameters
preproc_name (string) – Name of the preprocessing function, either an existing one or a previously registered one.
- metavision_ml.preprocessing.get_preprocess_function_names()
Returns the names of all existing and registered preprocessing functions.