SDK Core ML Video to Event Simulator API
Video stream dataset
Image stream data loader
- class metavision_core_ml.video_to_event.video_stream_dataset.VideoDatasetIterator(metadata, height, width, rgb, mode='frames', min_tbins=3, max_tbins=10, min_dt=3000, max_dt=50000, batch_times=1, pause_probability=0.5, max_optical_flow_threshold=2.0, max_interp_consecutive_frames=20, max_number_of_batches_to_produce=None, crop_image=False, saturation_max_factor=1.0)
Dataset Iterator streaming images and timestamps
- Parameters
metadata (object) – path to picture or video
height (int) – height of input images / video clip
width (int) – width of input images / video clip
rgb (bool) – stream rgb videos
mode (str) – batch sampling mode: 'frames', 'delta_t', or 'random'
min_tbins (int) – minimum number of frames per batch step
max_tbins (int) – maximum number of frames per batch step
min_dt (int) – minimum duration (us) of frames per batch step
max_dt (int) – maximum duration (us) of frames per batch step
batch_times (int) – number of timesteps of training sequences
pause_probability (float) – probability to add a pause (no events) (works only with PlanarMotionStream)
max_optical_flow_threshold (float) – maximum allowed optical flow between two consecutive frames (works only with PlanarMotionStream)
max_interp_consecutive_frames (int) – maximum number of interpolated frames between two consecutive frames (works only with PlanarMotionStream)
max_number_of_batches_to_produce (int) – maximum number of batches to produce
crop_image (bool) – if True, crop images to the target size; otherwise resize them
saturation_max_factor (float) – multiplicative factor for saturated pixels (16-bit TIFF images only; use 1.0 to disable)
- metavision_core_ml.video_to_event.video_stream_dataset.make_video_dataset(path, num_workers, batch_size, height, width, min_length, max_length, mode='frames', min_frames=5, max_frames=30, min_delta_t=5000, max_delta_t=50000, rgb=False, seed=None, batch_times=1, pause_probability=0.5, max_optical_flow_threshold=2.0, max_interp_consecutive_frames=20, max_number_of_batches_to_produce=None, crop_image=False, saturation_max_factor=1.0)
Makes a video / moving picture dataset.
- Parameters
path (str) – path to the dataset folder
num_workers (int) – number of data-loading worker processes
batch_size (int) – number of video clips / batch
height (int) – height
width (int) – width
min_length (int) – min length of video
max_length (int) – max length of video
mode (str) – ‘frames’ or ‘delta_t’
min_frames (int) – minimum number of frames per batch
max_frames (int) – maximum number of frames per batch
min_delta_t (int) – in microseconds, minimum duration per batch
max_delta_t (int) – in microseconds, maximum duration per batch
rgb (bool) – retrieve frames in rgb
seed (int) – seed for randomness
batch_times (int) – number of time steps in training sequence
pause_probability (float) – probability to add a pause during the sequence (works only with PlanarMotionStream)
max_optical_flow_threshold (float) – maximum allowed optical flow between two consecutive frames (works only with PlanarMotionStream)
max_interp_consecutive_frames (int) – maximum number of interpolated frames between two consecutive frames (works only with PlanarMotionStream)
max_number_of_batches_to_produce (int) – maximum number of batches to produce. Makes sure the stream will not produce more than this number of consecutive batches using the same image or video.
crop_image (bool) – if True, crop images to the target size; otherwise resize them
saturation_max_factor (float) – multiplicative factor for saturated pixels (16-bit TIFF images only; use 1.0 to disable)
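A rough usage sketch follows; the folder path is a placeholder, and the structure of each yielded batch (determined by pad_collate_fn below) is an assumption, not a documented contract.

from metavision_core_ml.video_to_event.video_stream_dataset import make_video_dataset

dataloader = make_video_dataset(
    "path/to/videos",        # hypothetical dataset folder
    num_workers=2, batch_size=4,
    height=240, width=320,
    min_length=200, max_length=1000,
    mode='frames', min_frames=5, max_frames=30)

for batch in dataloader:
    # each batch is padded to a common sequence length (see pad_collate_fn)
    pass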
- metavision_core_ml.video_to_event.video_stream_dataset.pad_collate_fn(data_list)
Pads each sequence with its last image/timestamp so that the batch is contiguous.
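For illustration, here is a minimal reimplementation of this padding idea, assuming each item of data_list is an (images, timestamps) pair with images of shape (T_i, C, H, W); this is a sketch, not the library function itself.

import torch

def pad_collate_sketch(data_list):
    # sequences may have different lengths T_i; pad each to the longest
    max_t = max(images.shape[0] for images, _ in data_list)
    imgs, tss = [], []
    for images, ts in data_list:
        missing = max_t - images.shape[0]
        if missing > 0:
            # repeat the last image / timestamp to reach the common length
            images = torch.cat([images, images[-1:].expand(missing, *images.shape[1:])])
            ts = torch.cat([ts, ts[-1:].expand(missing)])
        imgs.append(images)
        tss.append(ts)
    # stack into (T, B, C, H, W) and (T, B)
    return torch.stack(imgs, dim=1), torch.stack(tss, dim=1)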
CPU Event simulator
EventSimulator: Load a .mp4 video and start streaming events
- class metavision_core_ml.video_to_event.simulator.EventSimulator(height, width, Cp, Cn, refractory_period, sigma_threshold=0.0, cutoff_hz=0, leak_rate_hz=0, shot_noise_rate_hz=0, verbose=False)
Event Simulator
Implementation is based on the following publications:
Video to Events: Recycling Video Datasets for Event Cameras: Daniel Gehrig et al.
V2E: From video frames to realistic DVS event camera streams: Tobi Delbruck et al.
This object accumulates events as it is fed images with (increasing) timestamps. Events are returned as arrays of type EventCD (see the definition in event_io/dat_tools or metavision_sdk_base).
- Parameters
Cp (float) – mean for ON threshold
Cn (float) – mean for OFF threshold
refractory_period (float) – min time between 2 events / pixel
sigma_threshold (float) – standard deviation for threshold array
cutoff_hz (float) – cutoff frequency for photodiode latency simulation
leak_rate_hz (float) – frequency of reference value leakage
shot_noise_rate_hz (float) – frequency for shot noise events
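A minimal usage sketch with synthetic frames; parameter values are illustrative only.

import numpy as np
from metavision_core_ml.video_to_event.simulator import EventSimulator

height, width = 240, 320
simu = EventSimulator(height, width, Cp=0.11, Cn=0.1, refractory_period=100)

ts = 0
for _ in range(10):
    # synthetic gray frame standing in for a decoded video frame
    img = np.random.randint(0, 256, (height, width), dtype=np.uint8)
    ts += 10000                  # timestamps must be increasing (us)
    total = simu.image_callback(img, ts)

events = simu.get_events()       # accumulated EventCD array
simu.flush_events()              # erase the internal buffer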
- dynamic_moving_average(new_frame, ts, eps=1e-07)
Applies a nonlinear low-pass filter. The filter is a second-order low-pass IIR using two internal state variables that store the stages of cascaded first-order RC filters. Its time constant is proportional to the intensity value (with an offset to handle DN=0).
- Parameters
new_frame (np.ndarray) – new image
ts (int) – new timestamp (us)
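A rough sketch of the cascaded-RC idea, in the spirit of the V2E low-pass filter; the +20/275 intensity normalization is an assumed constant, not necessarily what this implementation uses.

import numpy as np

def iir_lowpass_sketch(new_frame, state1, state2, delta_t_us, cutoff_hz):
    # two cascaded first-order RC stages form the 2nd-order IIR lowpass
    tau = 1.0 / (2 * np.pi * cutoff_hz)
    # time constant shortens as intensity grows; the offset handles DN=0
    inten01 = (new_frame.astype(np.float64) + 20.0) / 275.0   # assumed constants
    update = np.clip(inten01 * (delta_t_us * 1e-6 / tau), 0.0, 1.0)
    state1 = state1 + update * (new_frame - state1)
    state2 = state2 + update * (state1 - state2)
    return state1, state2  # state2 is the filtered frame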
- flush_events()
Erases the currently accumulated events.
- get_events()
Returns the accumulated events.
- get_size()
Function returning the size of the imager which produced the events.
- Returns
Tuple of int (height, width) which might be (None, None)
- image_callback(img, img_ts)
Accumulates events into the internal buffer.
- Parameters
img (np.ndarray) – uint8 gray image of shape (H,W)
img_ts (int) – timestamp in micro-seconds.
- Returns
current total number of events
- Return type
int
- leak_events(delta_t)
Leak events: the switch in the differencing amplifier leaks at a rate equivalent to some Hz of ON events; the actual leak rate depends on each pixel's threshold. For a nominal leak rate leak_rate_hz: R_l = (dI/dt) / Theta_on, hence dI = R_l * Theta_on * dt.
- Parameters
delta_t (int) – time between 2 images (us)
- log_image_callback(log_img, img_ts)
Same as image_callback, but for debugging: the log conversion is done by the caller.
- reset()
Resets buffers
- set_config(config='noisy')
Set configuration
- Parameters
config (str) – name for configuration
- shot_noise_events(event_buffer, ts, num_events, num_iters)
Adds temporal noise via a simple Poisson process with base rate self.shot_noise_rate_hz; when a noise event fires, the corresponding pixel outputs an event.
The shot noise rate varies with intensity: it rises to the parameter value at the lowest intensities and is reduced by the factor SHOT_NOISE_INTEN_FACTOR at the brightest intensities.
- Parameters
event_buffer (np.ndarray) – EventCD buffer to fill with noise events
ts (int) – timestamp
num_events (int) – current number of events
num_iters (int) – max events per pixel since last round
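A sketch of the intensity-dependent noise rate described above; SHOT_NOISE_INTEN_FACTOR = 0.25 is an assumed value (following V2E), not read from this implementation.

import numpy as np

SHOT_NOISE_INTEN_FACTOR = 0.25  # assumed constant

def shot_noise_prob_sketch(shot_noise_rate_hz, delta_t_us, inten01):
    # per-pixel Poisson rate: half the base rate per polarity at the darkest
    # pixels, reduced by SHOT_NOISE_INTEN_FACTOR at the brightest (inten01 in [0,1])
    rate = (shot_noise_rate_hz / 2.0) * ((SHOT_NOISE_INTEN_FACTOR - 1.0) * inten01 + 1.0)
    return np.clip(rate * delta_t_us * 1e-6, 0.0, 1.0)

# ON and OFF noise events can then be sampled independently, e.g.
# on_noise = np.random.rand(*inten01.shape) < shot_noise_prob_sketch(10.0, 10000, inten01)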
- metavision_core_ml.video_to_event.simulator.eps_log(x, eps=1e-05)
Takes the log of an image, offset by eps to avoid log(0).
- Parameters
x – uint8 gray frame
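This is presumably equivalent to the following sketch (the float cast is an assumption):

import numpy as np

def eps_log_sketch(x, eps=1e-5):
    # log of a uint8 gray frame; eps avoids log(0) on black pixels
    return np.log(x.astype(np.float32) + eps)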
- metavision_core_ml.video_to_event.single_image_make_events_cpu.make_events_cpu(events, ref_values, last_img, last_event_timestamp, log_img, last_img_ts, delta_t, Cps, Cns, refractory_period)
Produces events in AER format.
- Parameters
events (np.ndarray) – array in format EventCD
ref_values (np.ndarray) – current log intensity state / pixel (H,W)
last_img (np.ndarray) – last image log intensity (H,W)
last_event_timestamp (np.ndarray) – last timestamp emitted per pixel and polarity (2,H,W)
log_img (np.ndarray) – current log intensity image (H,W)
last_img_ts (int) – timestamp of the last image
delta_t (int) – current duration (us) since last image.
Cps (np.ndarray) – array of ON thresholds
Cns (np.ndarray) – array of OFF thresholds
refractory_period (int) – minimum time between 2 events / pixel
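A simplified, non-vectorized sketch of the underlying per-pixel rule; the refractory-period gating and the exact interpolation of the real kernel are omitted, so treat this as an illustration of the technique rather than the actual implementation.

import numpy as np

def events_between_frames_sketch(ref_values, prev_log, curr_log,
                                 prev_ts, curr_ts, Cp, Cn):
    # each crossing of the contrast threshold relative to the stored reference
    # emits one event, timestamped by linear interpolation between the frames
    events = []
    height, width = curr_log.shape
    for y in range(height):
        for x in range(width):
            delta = curr_log[y, x] - ref_values[y, x]
            pol, C = (1, Cp) if delta >= 0 else (0, Cn)
            n_crossings = int(abs(delta) / C)
            frame_change = abs(curr_log[y, x] - prev_log[y, x]) + 1e-9
            for i in range(1, n_crossings + 1):
                frac = min(i * C / frame_change, 1.0)
                t = prev_ts + frac * (curr_ts - prev_ts)
                events.append((x, y, pol, int(t)))
            # move the reference by the emitted contrast steps
            ref_values[y, x] += np.sign(delta) * n_crossings * C
    return events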
Simple Iterator built around the Metavision Reader classes.
- class metavision_core_ml.video_to_event.simu_events_iterator.SimulatedEventsIterator(input_path, start_ts=0, mode='delta_t', delta_t=10000, n_events=10000, max_duration=None, relative_timestamps=False, height=-1, width=-1, Cp=0.11, Cn=0.1, refractory_period=0.001, sigma_threshold=0.0, cutoff_hz=0, leak_rate_hz=0, shot_noise_rate_hz=0, override_fps=0)
SimulatedEventsIterator is a small convenience class to generate an iterator of events from any video.
- reader
class handling the video (iterator of the frames and their timestamps).
- delta_t
Duration of served event slice in us.
- Type
int
- max_duration
If not None, maximal duration of the iteration in us.
- Type
int
- end_ts
If max_duration is not None, last time_stamp to consider.
- Type
int
- relative_timestamps
Whether the timestamps of served events are relative to the current reader timestamp, or since the beginning of the recording.
- Type
boolean
- Parameters
input_path (str) – Path to the file to read.
start_ts (int) – First timestamp to consider (in us).
mode (str) – load by time slice or by number of events; either 'delta_t' or 'n_events'
delta_t (int) – Duration of served event slice in us.
n_events (int) – Number of events in the timeslice.
max_duration (int) – If not None, maximal duration of the iteration in us.
relative_timestamps (boolean) – Whether the timestamps of served events are relative to the current reader timestamp, or since the beginning of the recording.
Cp (float) – mean for ON threshold
Cn (float) – mean for OFF threshold
refractory_period (float) – min time between 2 events / pixel
sigma_threshold (float) – standard deviation for threshold array
cutoff_hz (float) – cutoff frequency for photodiode latency simulation
leak_rate_hz (float) – frequency of reference value leakage
shot_noise_rate_hz (float) – frequency for shot noise events
override_fps (int) – override fps of the input video.
Examples
>>> for ev in SimulatedEventsIterator("beautiful_record.mp4", delta_t=1000000, max_duration=1e6*60):
>>>     print("Rate : {:.2f}Mev/s".format(ev.size * 1e-6))
- get_size()
Function returning the size of the imager which produced the events.
- Returns
Tuple of int (height, width) which might be (None, None)
GPU Event Simulator
A more efficient reimplementation. The main differences are CUDA kernels and the possibility to directly stream the voxel grid.
- class metavision_core_ml.video_to_event.gpu_simulator.GPUEventSimulator(batch_size, height, width, c_mu=0.1, c_std=0.022, refractory_period=10, leak_rate_hz=0, cutoff_hz=0, shot_noise_hz=0)
GPU Event Simulator of events from frames & timestamps.
Implementation is based on the following publications:
Video to Events: Recycling Video Datasets for Event Cameras: Daniel Gehrig et al.
V2E: From video frames to realistic DVS event camera streams: Tobi Delbruck et al.
- Parameters
batch_size (int) – number of video clips / batch
height (int) – height
width (int) – width
c_mu (float or list) – threshold average; if a scalar, the same OFF and ON thresholds are used; if a list, it is interpreted as [ths_OFF, ths_ON]
c_std (float) – threshold standard deviation
refractory_period (int) – time (us) before an event can be triggered again at a given pixel
leak_rate_hz (float) – frequency of reference voltage leakage
cutoff_hz (float) – frequency for photodiode latency
shot_noise_hz (float) – frequency of shot noise events
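A minimal usage sketch; the tensor shapes follow the parameter descriptions in this section, but the reshape after log_images is an assumption, and on a CUDA machine you would move the module and tensors to the GPU.

import torch
from metavision_core_ml.video_to_event.gpu_simulator import GPUEventSimulator

batch_size, height, width, tbins = 2, 120, 160, 5
simu = GPUEventSimulator(batch_size, height, width)

# videos in a batch are concatenated along the last (time) dimension
u8 = torch.randint(0, 256, (1, 1, height, width, batch_size * tbins), dtype=torch.uint8)
log_frames = simu.log_images(u8).reshape(height, width, -1)  # reshape is an assumption

video_len = torch.full((batch_size,), tbins, dtype=torch.long)
image_ts = torch.arange(tbins).repeat(batch_size, 1) * 10000  # (B, T) timestamps in us
first_times = torch.ones(batch_size)

events = simu.get_events(log_frames, video_len, image_ts, first_times)
# events: rows of (batch_index, x, y, polarity, timestamp in us)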
- count_events(log_images, video_len, image_ts, first_times, reset=True, persistent=True)
Estimates the number of events per pixel.
- Parameters
log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames
video_len (Tensor) – shape (B,) len of each video in the batch.
image_ts (Tensor) – shape (B, max(video_len)) timestamp associated with each frame.
first_times (Tensor) – shape (B) whether the video is a new one or the continuation of one.
reset (bool) – whether to reset the count variable
- Returns
counts of shape (B, H, W)
- Return type
Tensor
- dynamic_moving_average(images, num_frames, timestamps, first_times, min_pixel_range=20, max_pixel_incr=20, eps=1e-07)
Converts byte images to log intensities and applies a low-pass filter (motion blur) to the incoming images. This simulates the latency of the photodiode with respect to the incoming light dynamics.
- Parameters
images (torch.Tensor) – H,W,T byte or float images in the 0 to 255 range
num_frames (torch.Tensor) – shape (B,) len of each video in the batch.
timestamps (torch.Tensor) – B,T timestamps
first_times (torch.Tensor) – B flags
eps (float) – epsilon factor
- event_volume(log_images, video_len, image_ts, first_times, nbins, mode='bilinear', split_channels=False)
Computes a volume of discretized images formed from the events, without storing the AER events themselves: we go directly from simulation to this space-time quantized representation. Setting mode to 'bilinear' gives the event volume of [Unsupervised Event-based Learning of Optical Flow, Zhu et al. 2018]; setting it to 'nearest' gives a stack of histograms.
- Parameters
log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames
video_len (Tensor) – shape (B,) len of each video in the batch.
image_ts (Tensor) – shape (B, max(video_len)) timestamp associated with each frame.
first_times (Tensor) – shape (B) whether the video is a new one or the continuation of one.
nbins (int) – number of time-bins for the voxel grid
mode (str) – bilinear or nearest
split_channels (bool) – if True, positive and negative events get distinct channels instead of being accumulated with signs in a single channel.
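Continuing the sketch above, the voxel grid can be produced directly from the frames; the output shape noted in the comment is an assumption based on the nbins and split_channels descriptions.

voxel = simu.event_volume(log_frames, video_len, image_ts, first_times,
                          nbins=5, mode='bilinear')
# expected shape around (B, nbins, H, W); with split_channels=True the ON and
# OFF polarities occupy separate channels instead of being summed with signs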
- event_volume_sequence(log_images, video_len, image_ts, target_timestamps, first_times, nbins, mode='bilinear', split_channels=False)
Computes a volume of discretized images formed from the events, without storing the AER events themselves: we go directly from simulation to this space-time quantized representation. Setting mode to 'bilinear' gives the event volume of [Unsupervised Event-based Learning of Optical Flow, Zhu et al. 2018]; setting it to 'nearest' gives a stack of histograms. Here, a sequence of target timestamps is also given, to cut the event volumes non-uniformly.
- Parameters
log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames
video_len (Tensor) – shape (B,) len of each video in the batch.
image_ts (Tensor) – shape (B, max(video_len)) timestamp associated with each frame.
first_times (Tensor) – shape (B) whether the video is a new one or the continuation of one.
nbins (int) – number of time-bins for the voxel grid
mode (str) – bilinear or nearest
split_channels (bool) – if True, positive and negative events get distinct channels instead of being accumulated with signs in a single channel.
- forward()
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- get_events(log_images, video_len, image_ts, first_times)
Retrieves the AER event list as a PyTorch tensor.
- Parameters
log_images (Tensor) – shape (H, W, total_num_frames) tensor containing the video frames
video_len (Tensor) – shape (B,) len of each video in the batch.
image_ts (Tensor) – shape (B, max(video_len)) timestamp associated with each frame.
first_times (Tensor) – shape (B) whether the video is a new one or the continuation of one.
- Returns
events tensor of shape (N, 5) with columns batch_index, x, y, polarity, timestamp (microseconds)
- Return type
Tensor
- log_images(u8imgs, eps=1e-07)
Converts byte images to log intensities.
- Parameters
u8imgs (torch.Tensor) – B,C,H,W,T byte images
eps (float) – epsilon factor
- randomize_broken_pixels(first_times, video_proba=0.01, crazy_pixel_proba=0.0005, dead_pixel_proba=0.005)
Simulates dead & crazy pixels
- Parameters
first_times – B video just started flags
video_proba – probability, per video, to simulate broken pixels
- randomize_cutoff(first_times, cutoff_min=0, cutoff_max=900)
Randomizes the cutoff rates per video
- Parameters
first_times – B video just started flags
cutoff_min – in hz
cutoff_max – in hz
- randomize_leak(first_times, leak_min=0, leak_max=1)
Randomizes the leak rates per video
- Parameters
first_times – B video just started flags
leak_min – in hz
leak_max – in hz
- randomize_refractory_periods(first_times, ref_min=10, ref_max=1000)
Randomizes the refractory period per video
- Parameters
first_times – B video just started flags
ref_min – in microseconds
ref_max – in microseconds
- randomize_shot(first_times, shot_min=0, shot_max=1)
Randomizes the shot noise per video
- Parameters
first_times – B video just started flags
shot_min – in hz
shot_max – in hz
- randomize_thresholds(first_times, th_mu_min=0.05, th_mu_max=0.2, th_std_min=0.001, th_std_max=0.01)
Re-Randomizes thresholds per video
- Parameters
first_times – B video just started flags
th_mu_min (scalar or list of scalars) – min average threshold; if a list, interpreted as [th_mu_min_OFF, th_mu_min_ON]
th_mu_max (scalar or list of scalars) – max average threshold; if a list, interpreted as [th_mu_max_OFF, th_mu_max_ON]
th_std_min – min threshold standard deviation
th_std_max – max threshold standard deviation
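Continuing the earlier sketch, a typical training-time domain-randomization step calls the randomize_* methods above whenever new videos start in the batch (default randomization ranges are used here):

first_times = torch.ones(batch_size)  # flags: every video in the batch just (re)started
simu.randomize_thresholds(first_times)
simu.randomize_cutoff(first_times)
simu.randomize_leak(first_times)
simu.randomize_shot(first_times)
simu.randomize_refractory_periods(first_times)
simu.randomize_broken_pixels(first_times)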
Implementation in Numba CUDA of the GPU kernels used to simulate events from images.
CPU and CUDA kernels for GPU simulation of the cutoff filter.