SDK Core ML Video to Event Simulator API

Video stream dataset

Image stream data loader

class metavision_core_ml.video_to_event.video_stream_dataset.VideoDatasetIterator(metadata, height, width, rgb, mode='frames', min_tbins=3, max_tbins=10, min_dt=3000, max_dt=50000, batch_times=1, pause_probability=0.5, max_optical_flow_threshold=2.0, max_interp_consecutive_frames=20, max_number_of_batches_to_produce=None, crop_image=False)

Dataset Iterator streaming images and timestamps

Parameters
  • metadata (object) – path to picture or video

  • height (int) – height of input images / video clip

  • width (int) – width of input images / video clip

  • rgb (bool) – stream RGB videos

  • mode (str) – batch sampling mode: 'frames', 'delta_t', or 'random'

  • min_tbins (int) – minimum number of frames per batch step

  • max_tbins (int) – maximum number of frames per batch step

  • min_dt (int) – minimum duration of a batch step, in microseconds

  • max_dt (int) – maximum duration of a batch step, in microseconds

  • batch_times (int) – number of timesteps of training sequences

  • pause_probability (float) – probability to add a pause (no events) (works only with PlanarMotionStream)

  • max_optical_flow_threshold (float) – maximum allowed optical flow between two consecutive frames (works only with PlanarMotionStream)

  • max_interp_consecutive_frames (int) – maximum number of interpolated frames between two consecutive frames (works only with PlanarMotionStream)

  • max_number_of_batches_to_produce (int) – maximum number of batches to produce

  • crop_image (bool) – if True, crop images to the target size; otherwise resize them

metavision_core_ml.video_to_event.video_stream_dataset.make_video_dataset(path, num_workers, batch_size, height, width, min_length, max_length, mode='frames', min_frames=5, max_frames=30, min_delta_t=5000, max_delta_t=50000, rgb=False, seed=None, batch_times=1, pause_probability=0.5, max_optical_flow_threshold=2.0, max_interp_consecutive_frames=20, max_number_of_batches_to_produce=None, crop_image=False)

Makes a video / moving picture dataset. A usage sketch follows the parameter list.

Parameters
  • path (str) – path to the dataset folder

  • batch_size (int) – number of video clips / batch

  • height (int) – height

  • width (int) – width

  • min_length (int) – min length of video

  • max_length (int) – max length of video

  • mode (str) – 'frames' or 'delta_t'

  • min_frames (int) – minimum number of frames per batch

  • max_frames (int) – maximum number of frames per batch

  • min_delta_t (int) – in microseconds, minimum duration per batch

  • max_delta_t (int) – in microseconds, maximum duration per batch

  • rgb (bool) – retrieve frames in RGB

  • seed (int) – seed for randomness

  • batch_times (int) – number of time steps in training sequence

  • pause_probability (float) – probability to add a pause during the sequence (works only with PlanarMotionStream)

  • max_optical_flow_threshold (float) – maximum allowed optical flow between two consecutive frames (works only with PlanarMotionStream)

  • max_interp_consecutive_frames (int) – maximum number of interpolated frames between two consecutive frames (works only with PlanarMotionStream)

  • max_number_of_batches_to_produce (int) – maximum number of batches to produce. Makes sure the stream will not produce more than this number of consecutive batches using the same image or video.
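A minimal usage sketch (the folder path, sizes, and lengths below are placeholders, not defaults): building the dataloader and iterating over batches, which are padded by pad_collate_fn below.

from metavision_core_ml.video_to_event.video_stream_dataset import make_video_dataset

dataloader = make_video_dataset(
    "path/to/videos",        # folder containing the input videos / pictures
    num_workers=2,
    batch_size=4,
    height=240,
    width=320,
    min_length=200,
    max_length=2000,
    mode='frames',
    min_frames=5,
    max_frames=30,
)

for batch in dataloader:
    # each batch holds image sequences and their timestamps, padded with the
    # last image / timestamp to a common length (see pad_collate_fn below)
    ...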

metavision_core_ml.video_to_event.video_stream_dataset.pad_collate_fn(data_list)

Pads each sequence with its last image / timestamp to produce a contiguous batch.

CPU Event simulator

EventSimulator: Load a .mp4 video and start streaming events

class metavision_core_ml.video_to_event.simulator.EventSimulator(height, width, Cp, Cn, refractory_period, sigma_threshold=0.0, cutoff_hz=0, leak_rate_hz=0, shot_noise_rate_hz=0, verbose=False)

Event Simulator

Implementation is based on the following publications:

  • Video to Events: Recycling Video Datasets for Event Cameras, Daniel Gehrig et al.

  • V2E: From video frames to realistic DVS event camera streams, Tobi Delbruck et al.

This object accumulates events as it is fed images and (increasing) timestamps. The events are returned as type EventCD (see the definition in event_io/dat_tools or metavision_sdk_base). A usage sketch follows the parameter list.

Parameters
  • Cp (float) – mean for ON threshold

  • Cn (float) – mean for OFF threshold

  • refractory_period (float) – min time between 2 events / pixel

  • sigma_threshold (float) – standard deviation for threshold array

  • cutoff_hz (float) – cutoff frequency for photodiode latency simulation

  • leak_rate_hz (float) – frequency of reference value leakage

  • shot_noise_rate_hz (float) – frequency for shot noise events
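A minimal sketch (not an official example; the file name and parameter values are placeholders): streaming grayscale frames from an OpenCV capture into the simulator and collecting the accumulated EventCD buffer.

import cv2
from metavision_core_ml.video_to_event.simulator import EventSimulator

cap = cv2.VideoCapture("my_video.mp4")            # hypothetical input video
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))

simu = EventSimulator(height, width, Cp=0.11, Cn=0.1, refractory_period=100)

frame_idx = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    ts_us = int(frame_idx * 1e6 / fps)            # timestamps must be increasing (us)
    total = simu.image_callback(gray, ts_us)      # returns current total number of events
    frame_idx += 1

events = simu.get_events()                        # accumulated EventCD events
cap.release()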

dynamic_moving_average(new_frame, ts, eps=1e-07)

Applies a nonlinear low-pass filter. The filter is a second-order low-pass IIR that uses two internal state variables to store the stages of cascaded first-order RC filters. The time constant of the filter is proportional to the intensity value (with an offset to handle DN=0).

Parameters
  • new_frame (np.ndarray) – new image

  • ts (int) – new timestamp (us)
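An illustrative sketch of the cascaded first-order update described above, with a fixed time constant for simplicity (the library's filter derives it from the per-pixel intensity); this is not the library's exact code.

import numpy as np

def lowpass_step(new_frame, state1, state2, dt_us, tau_us=1000.0):
    # discrete update factor for a first-order RC stage; in the real
    # simulator this factor is intensity-dependent per pixel
    alpha = np.clip(dt_us / tau_us, 0.0, 1.0)
    state1 = state1 + alpha * (new_frame - state1)   # first RC stage
    state2 = state2 + alpha * (state1 - state2)      # second RC stage
    return state1, state2                            # state2 is the filtered frame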

flush_events()

Erases the currently accumulated events.

get_events()

Grabs the accumulated events.

get_size()

Function returning the size of the imager which produced the events.

Returns

Tuple of int (height, width) which might be (None, None)

image_callback(img, img_ts)

Accumulates events into the internal buffer.

Parameters
  • img (np.ndarray) – uint8 gray image of shape (H,W)

  • img_ts (int) – timestamp in microseconds.

Returns

current total number of events

Return type

int

leak_events(delta_t)

Leak events: the switch in the differencing amplifier leaks at a rate equivalent to some Hz of ON events. The actual leak rate depends on each pixel's threshold. We want a nominal rate leak_rate_hz, so R_l = (dI/dt) / Theta_ON, hence dI/dt = R_l * Theta_ON, and dI = R_l * Theta_ON * dt.

Parameters

delta_t (int) – time between 2 images (us)
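A hedged numeric illustration of the formula above: with leak_rate_hz R_l = 0.1 Hz, an ON threshold Theta_ON = 0.15, and delta_t = 2 s, the reference values leak by dI = 0.1 * 0.15 * 2 = 0.03 log-intensity units over the interval.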

log_image_callback(log_img, img_ts)

For debugging: same as image_callback, but the caller supplies the image already converted to log.

reset()

Resets buffers

set_config(config='noisy')

Set configuration

Parameters

config (str) – name for configuration

shot_noise_events(event_buffer, ts, num_events, num_iters)

Adds temporal noise via a simple Poisson process with base rate self.shot_noise_rate_hz; when such a noise event occurs at a pixel, an event is output from that pixel.

The shot-noise rate varies with intensity: at the lowest intensities it rises to the nominal rate, and it is reduced by the factor SHOT_NOISE_INTEN_FACTOR at the brightest intensities.

Parameters
  • ts (int) – timestamp

  • num_events (int) – current number of events

  • num_iters (int) – max events per pixel since last round

metavision_core_ml.video_to_event.simulator.eps_log(x, eps=1e-05)

Takes the logarithm of an image; eps avoids taking the log of zero.

Parameters

x – uint8 gray frame

metavision_core_ml.video_to_event.single_image_make_events_cpu.make_events_cpu(events, ref_values, last_img, last_event_timestamp, log_img, last_img_ts, delta_t, Cps, Cns, refractory_period)

Produces events in AER format.

Parameters
  • events (np.ndarray) – array in format EventCD

  • ref_values (np.ndarray) – current log intensity state / pixel (H,W)

  • last_img (np.ndarray) – last image log intensity (H,W)

  • last_event_timestamp (np.ndarray) – last event timestamp emitted per pixel and polarity (2,H,W)

  • log_img (np.ndarray) – current log intensity image (H,W)

  • last_img_ts (int) – last image timestamp

  • delta_t (int) – current duration (us) since last image.

  • Cps (np.ndarray) – array of ON thresholds

  • Cns (np.ndarray) – array of OFF thresholds

  • refractory_period (int) – minimum time between 2 events / pixel

Simple Iterator built around the Metavision Reader classes.

class metavision_core_ml.video_to_event.simu_events_iterator.SimulatedEventsIterator(input_path, start_ts=0, mode='delta_t', delta_t=10000, n_events=10000, max_duration=None, relative_timestamps=False, height=-1, width=-1, Cp=0.11, Cn=0.1, refractory_period=0.001, sigma_threshold=0.0, cutoff_hz=0, leak_rate_hz=0, shot_noise_rate_hz=0, override_fps=0)

SimulatedEventsIterator is a small convenience class that generates an iterator of events from any video.

reader

Class handling the video (an iterator over the frames and their timestamps).

delta_t

Duration of served event slice in us.

Type

int

max_duration

If not None, maximal duration of the iteration in us.

Type

int

end_ts

If max_duration is not None, last time_stamp to consider.

Type

int

relative_timestamps

Whether the timestamps of served events are relative to the current reader timestamp, or since the beginning of the recording.

Type

boolean

Parameters
  • input_path (str) – Path to the file to read.

  • start_ts (int) – First timestamp to consider (in us).

  • mode (string) – Load by time slice or by number of events; either "delta_t" or "n_events".

  • delta_t (int) – Duration of served event slice in us.

  • n_events (int) – Number of events in the timeslice.

  • max_duration (int) – If not None, maximal duration of the iteration in us.

  • relative_timestamps (boolean) – Whether the timestamps of served events are relative to the current reader timestamp, or since the beginning of the recording.

  • Cp (float) – mean for ON threshold

  • Cn (float) – mean for OFF threshold

  • refractory_period (float) – min time between 2 events / pixel

  • sigma_threshold (float) – standard deviation for threshold array

  • cutoff_hz (float) – cutoff frequency for photodiode latency simulation

  • leak_rate_hz (float) – frequency of reference value leakage

  • shot_noise_rate_hz (float) – frequency for shot noise events

  • override_fps (int) – override fps of the input video.

Examples

>>> for ev in SimulatedEventsIterator("beautiful_record.mp4", delta_t=1000000, max_duration=1e6*60):
...     print("Rate : {:.2f} Mev/s".format(ev.size * 1e-6))

get_size()

Function returning the size of the imager which produced the events.

Returns

Tuple of int (height, width) which might be (None, None)

GPU Event Simulator

A more efficient reimplementation. The main differences are CUDA kernels and the possibility to stream the voxel grid directly.

class metavision_core_ml.video_to_event.gpu_simulator.GPUEventSimulator(batch_size, height, width, c_mu=0.1, c_std=0.022, refractory_period=10, leak_rate_hz=0, cutoff_hz=0, shot_noise_hz=0)

GPU Event Simulator of events from frames & timestamps.

Implementation is based on the following publications:

  • Video to Events: Recycling Video Datasets for Event Cameras, Daniel Gehrig et al.

  • V2E: From video frames to realistic DVS event camera streams, Tobi Delbruck et al.

Parameters
  • batch_size (int) – number of video clips / batch

  • height (int) – height

  • width (int) – width

  • c_mu (float or list) – threshold average; if a scalar, the same threshold is used for OFF and ON; if a list, it is interpreted as [ths_OFF, ths_ON]

  • c_std (float) – threshold standard deviation

  • refractory_period (float) – time before an event can be triggered again at the same pixel

  • leak_rate_hz (float) – frequency of reference voltage leakage

  • cutoff_hz (float) – frequency for photodiode latency

  • shot_noise_hz (float) – frequency of shot noise events

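A minimal sketch, assuming a CUDA device (the simulator relies on Numba CUDA kernels); the frame stacking layout, timestamp values, and flag dtypes below are illustrative assumptions, not an official example.

import torch
from metavision_core_ml.video_to_event.gpu_simulator import GPUEventSimulator

B, T, H, W = 2, 5, 120, 160                       # clips, frames per clip, size
simu = GPUEventSimulator(B, H, W, c_mu=0.1, c_std=0.022).cuda()

# frames stacked along the last dimension: (H, W, total_num_frames)
u8_frames = torch.randint(0, 256, (H, W, B * T), dtype=torch.uint8, device='cuda')
video_len = torch.full((B,), T, dtype=torch.long, device='cuda')
image_ts = (torch.arange(T, device='cuda') * 10000).repeat(B, 1)  # microseconds
first_times = torch.ones(B, device='cuda')        # every clip starts fresh

log_frames = simu.log_images(u8_frames)           # byte -> log intensity

# AER events as an (N, 5) tensor: batch_index, x, y, polarity, timestamp
events = simu.get_events(log_frames, video_len, image_ts, first_times)

# or go straight to a space-time quantized representation
voxels = simu.event_volume(log_frames, video_len, image_ts, first_times, nbins=5)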

count_events(log_images, video_len, image_ts, first_times, reset=True, persistent=True)

Estimates the number of events per pixel.

Parameters
  • log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames

  • video_len (Tensor) – shape (B,) len of each video in the batch.

  • image_ts (Tensor) – shape (B, max(video_len)); timestamp associated with each frame.

  • first_times (Tensor) – shape (B,); whether each video is a new one or the continuation of a previous one.

  • reset (bool) – whether to reset the count variable

Returns

per-pixel event counts, shape (B, H, W)

Return type

Tensor

dynamic_moving_average(images, num_frames, timestamps, first_times, min_pixel_range=20, max_pixel_incr=20, eps=1e-07)

Converts byte images to log and performs a pass-band (motion-blur) filtering of the incoming images. This simulates the latency of the photodiode w.r.t. the incoming light dynamics.

Parameters
  • images (torch.Tensor) – H,W,T byte or float images in the 0 to 255 range

  • num_frames (torch.Tensor) – shape (B,) len of each video in the batch.

  • timestamps (torch.Tensor) – B,T timestamps

  • first_times (torch.Tensor) – B flags

  • eps (float) – epsilon factor

event_volume(log_images, video_len, image_ts, first_times, nbins, mode='bilinear', split_channels=False)

Computes a volume of discretized images formed from the events, without storing the AER events themselves; we go directly from simulation to this space-time quantized representation. Setting mode to "bilinear" yields the event volume of [Unsupervised Event-based Learning of Optical Flow, Zhu et al. 2018]; setting mode to "nearest" yields a stack of histograms.

Parameters
  • log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames

  • video_len (Tensor) – shape (B,) len of each video in the batch.

  • image_ts (Tensor) – shape (B, max(video_len)); timestamp associated with each frame.

  • first_times (Tensor) – shape (B,); whether each video is a new one or the continuation of a previous one.

  • nbins (int) – number of time-bins for the voxel grid

  • mode (str) – bilinear or nearest

  • split_channels – if True, positive and negative events get distinct channels instead of being merged (by difference) into a single channel.

event_volume_sequence(log_images, video_len, image_ts, target_timestamps, first_times, nbins, mode='bilinear', split_channels=False)

Computes a volume of discretized images formed from the events, without storing the AER events themselves; we go directly from simulation to this space-time quantized representation. Setting mode to "bilinear" yields the event volume of [Unsupervised Event-based Learning of Optical Flow, Zhu et al. 2018]; setting mode to "nearest" yields a stack of histograms. Here, a sequence of target timestamps is also provided to cut the event volumes non-uniformly.

Parameters
  • log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames

  • video_len (Tensor) – shape (B,) len of each video in the batch.

  • image_ts (Tensor) – shape (B, max(video_len)); timestamp associated with each frame.

  • first_times (Tensor) – shape (B,); whether each video is a new one or the continuation of a previous one.

  • nbins (int) – number of time-bins for the voxel grid

  • mode (str) – bilinear or nearest

  • split_channels – if True, positive and negative events get distinct channels instead of being merged (by difference) into a single channel.

forward()

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_events(log_images, video_len, image_ts, first_times)

Retrieves the AER event list in a pytorch array.

Parameters
  • log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames

  • video_len (Tensor) – shape (B,) len of each video in the batch.

  • image_ts (Tensor) – shape (B, max(video_len)); timestamp associated with each frame.

  • first_times (Tensor) – shape (B,); whether each video is a new one or the continuation of a previous one.

Returns

an (N, 5) tensor with columns batch_index, x, y, polarity, timestamp (microseconds)

Return type

events

log_images(u8imgs, eps=1e-07)

Converts byte images to log

Parameters
  • u8imgs (torch.Tensor) – B,C,H,W,T byte images

  • eps (float) – epsilon factor

randomize_broken_pixels(first_times, video_proba=0.01, crazy_pixel_proba=0.0005, dead_pixel_proba=0.005)

Simulates dead & crazy pixels

Parameters
  • first_times – B video just started flags

  • video_proba – probability to simulate broken pixels

randomize_cutoff(first_times, cutoff_min=0, cutoff_max=900)

Randomizes the cutoff rates per video

Parameters
  • first_times – B video just started flags

  • cutoff_min – in Hz

  • cutoff_max – in Hz

randomize_leak(first_times, leak_min=0, leak_max=1)

Randomizes the leak rates per video

Parameters
  • first_times – B video just started flags

  • leak_min – in Hz

  • leak_max – in Hz

randomize_refractory_periods(first_times, ref_min=10, ref_max=1000)

Randomizes the refractory period per video

Parameters
  • first_times – B video just started flags

  • ref_min – in microseconds

  • ref_max – in microseconds

randomize_shot(first_times, shot_min=0, shot_max=1)

Randomizes the shot noise per video

Parameters
  • first_times – B video just started flags

  • shot_min – in Hz

  • shot_max – in Hz

randomize_thresholds(first_times, th_mu_min=0.05, th_mu_max=0.2, th_std_min=0.001, th_std_max=0.01)

Re-Randomizes thresholds per video

Parameters
  • first_times – B video just started flags

  • th_mu_min (scalar or list of scalars) – min average threshold; if a list, it is interpreted as [th_mu_min_OFF, th_mu_min_ON]

  • th_mu_max (scalar or list of scalars) – max average threshold; if a list, it is interpreted as [th_mu_max_OFF, th_mu_max_ON]

  • th_std_min – min threshold standard deviation

  • th_std_max – max threshold standard deviation
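The randomize_* methods above can be combined to re-draw sensor parameters for clips flagged as new, a common domain-randomization step during training. A hedged sketch, reusing simu and first_times from the GPUEventSimulator sketch above:

simu.randomize_thresholds(first_times)
simu.randomize_cutoff(first_times)
simu.randomize_leak(first_times)
simu.randomize_shot(first_times)
simu.randomize_refractory_periods(first_times)
simu.randomize_broken_pixels(first_times)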

GPU kernels, implemented with Numba CUDA, used to simulate events from images.

CPU and CUDA kernels for the GPU simulation of the cutoff filter.