SDK Core ML Data API
Synthetic video generation
Camera Pose Generator:
This module lets you define a trajectory of camera poses and generate a continuous series of homographies, interpolating additional poses whenever the maximum optical flow between two consecutive frames exceeds a predefined threshold.
- class metavision_core_ml.data.camera_poses.CameraPoseGenerator(height, width, max_frames=300, pause_probability=0.5, max_optical_flow_threshold=2.0, max_interp_consecutive_frames=20)
CameraPoseGenerator generates a series of continuous homographies with interpolation.
- Parameters
height (int) – height of image
width (int) – width of image
max_frames (int) – maximum number of poses
pause_probability (float) – probability that the sequence contains a pause
max_optical_flow_threshold (float) – maximum optical flow between two consecutive frames
max_interp_consecutive_frames (int) – maximum number of interpolated frames between two consecutive frames
- get_flow(rvec1, tvec1, rvec2, tvec2, height, width)
Computes the optical flow between two poses.
- Parameters
rvec1 (np.array) – rotation vector 1
tvec1 (np.array) – translation vector 1
rvec2 (np.array) – rotation vector 2
tvec2 (np.array) – translation vector 2
nt (np.array) – plane normal
depth (float) – depth from camera
height (int) – height of image
width (int) – width of image
- get_image_transform(rvec1, tvec1, rvec2, tvec2)
Gets the homography between two poses.
- Parameters
rvec1 (np.array) – rotation vector 1
tvec1 (np.array) – translation vector 1
rvec2 (np.array) – rotation vector 2
tvec2 (np.array) – translation vector 2
- metavision_core_ml.data.camera_poses.add_random_pause(signal, max_pos_size_ratio=0.3)
Adds a random pause in a multidimensional signal
- Parameters
signal (np.array) – TxD signal
max_pos_size_ratio (float) – maximum size of the pause relative to the signal length
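One plausible way to realize the behavior described above is to freeze the signal at a random position for a random number of steps while keeping the total length unchanged. This is a minimal sketch in plain NumPy, not the library's implementation:

```python
import numpy as np

def add_random_pause(signal, max_pos_size_ratio=0.3, rng=None):
    """Sketch: freeze a TxD signal at a random position.

    One sample is repeated for a random number of steps (at most
    max_pos_size_ratio * T), and the result is cropped back to length T.
    """
    rng = np.random.default_rng() if rng is None else rng
    T = len(signal)
    pause_len = int(rng.integers(1, max(2, int(max_pos_size_ratio * T))))
    pos = int(rng.integers(0, T - pause_len))
    frozen = np.repeat(signal[pos:pos + 1], pause_len, axis=0)
    # Insert the frozen segment and crop back to the original length T.
    return np.concatenate([signal[:pos], frozen, signal[pos:]], axis=0)[:T]
```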
- metavision_core_ml.data.camera_poses.generate_homographies(rvecs, tvecs, nt, d)
Generates multiple homographies from rotation vectors
- Parameters
rvecs (np.array) – N,3 rotation vectors
tvecs (np.array) – N,3 translation vectors
nt (np.array) – normal to camera
d (float) – depth
- metavision_core_ml.data.camera_poses.generate_homographies_from_rotation_matrices(rot_mats, tvecs, nt, depth)
Generates multiple homographies from rotation matrices
- Parameters
rot_mats (np.array) – N,3,3 rotation matrices
tvecs (np.array) – N,3 translation vectors
nt (np.array) – normal to camera
depth (float) – depth to camera
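The batched construction follows the textbook planar-homography formula, H_i = R_i - t_i · ntᵀ / depth. This self-contained NumPy sketch reuses the function name only for illustration; it is not the library's implementation:

```python
import numpy as np

def generate_homographies_from_rotation_matrices(rot_mats, tvecs, nt, depth):
    """Sketch: batched planar homographies H_i = R_i - t_i * nt^T / depth."""
    n = np.asarray(nt, dtype=float).reshape(1, 3)
    t = np.asarray(tvecs, dtype=float).reshape(-1, 3, 1)
    # (N,3,1) @ (1,1,3) broadcasts to (N,3,3) outer products t_i * nt^T.
    return rot_mats - (t @ n[None]) / depth
```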
- metavision_core_ml.data.camera_poses.generate_homography(rvec, tvec, nt, depth)
Generates a single homography
- Parameters
rvec (np.array) – rotation vector
tvec (np.array) – translation vector
nt (np.array) – normal to camera
depth (float) – depth to camera
- metavision_core_ml.data.camera_poses.generate_image_homographies_from_homographies(h, K, Kinv)
Multiplies each homography on the left by the intrinsic matrix and on the right by its inverse.
- Parameters
h (np.array) – homographies N,3,3
K (np.array) – intrinsic
Kinv (np.ndarray) – inverse intrinsic
- metavision_core_ml.data.camera_poses.generate_image_homography(rvec, tvec, nt, depth, K, Kinv)
Generates a single image homography
- Parameters
rvec (np.array) – rotation vector
tvec (np.array) – translation vector
nt (np.array) – normal to camera
depth (float) – depth
K (np.array) – intrinsic matrix
Kinv (np.array) – inverse intrinsic matrix
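Both constructions above follow the standard formulas from multiple-view geometry: H = R - t · ntᵀ / depth in normalized camera coordinates, then G = K · H · K⁻¹ to map pixels to pixels. A minimal self-contained NumPy sketch (with a hand-rolled Rodrigues conversion, not the library's actual code):

```python
import numpy as np

def rodrigues(rvec):
    """Rotation vector -> rotation matrix via Rodrigues' formula."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = np.asarray(rvec, dtype=float).ravel() / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def generate_homography(rvec, tvec, nt, depth):
    """Planar homography in normalized coordinates: H = R - t * nt^T / depth."""
    R = rodrigues(rvec)
    t = np.asarray(tvec, dtype=float).reshape(3, 1)
    n = np.asarray(nt, dtype=float).reshape(1, 3)
    return R - (t @ n) / depth

def generate_image_homography(rvec, tvec, nt, depth, K, Kinv):
    """Pixel-space homography: conjugate the geometric one by the intrinsics."""
    return K @ generate_homography(rvec, tvec, nt, depth) @ Kinv
```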
- metavision_core_ml.data.camera_poses.generate_smooth_signal(num_signals, num_samples, min_speed=0.0001, max_speed=0.1)
Generates a smooth signal
- Parameters
num_signals (int) – number of signals to generate
num_samples (int) – length of multidimensional signal
min_speed (float) – minimum rate of change
max_speed (float) – maximum rate of change
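One plausible construction matching this description (the actual implementation may differ): a random walk whose per-step increment magnitude lies in [min_speed, max_speed], smoothed by a moving average so the trajectory has no jerky direction changes:

```python
import numpy as np

def generate_smooth_signal(num_signals, num_samples,
                           min_speed=0.0001, max_speed=0.1, rng=None):
    """Sketch: smooth random walks, one per row, shape (num_signals, num_samples)."""
    rng = np.random.default_rng() if rng is None else rng
    speeds = rng.uniform(min_speed, max_speed, size=(num_signals, num_samples))
    signs = rng.choice([-1.0, 1.0], size=(num_signals, num_samples))
    steps = speeds * signs
    # A moving average keeps each increment within [-max_speed, max_speed].
    kernel = np.ones(5) / 5.0
    smooth_steps = np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, steps)
    return np.cumsum(smooth_steps, axis=1)
```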
- metavision_core_ml.data.camera_poses.get_flow(rvec1, tvec1, rvec2, tvec2, nt, depth, K, Kinv, height, width)
Computes the optical flow between two poses.
- Parameters
rvec1 (np.array) – rotation vector 1
tvec1 (np.array) – translation vector 1
rvec2 (np.array) – rotation vector 2
tvec2 (np.array) – translation vector 2
nt (np.array) – plane normal
depth (float) – depth from camera
K (np.array) – intrinsic matrix
Kinv (np.array) – inverse intrinsic matrix
height (int) – height of image
width (int) – width of image
infinite (bool) – plan is infinite or not
- metavision_core_ml.data.camera_poses.get_grid(height, width)
Computes a 2d meshgrid
- Parameters
height (int) – height of grid
width (int) – width of grid
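The meshgrid itself is one line of NumPy. Returning homogeneous coordinates is an assumption made here so the grid can be pushed directly through a 3x3 homography; the real return format may differ:

```python
import numpy as np

def get_grid(height, width):
    """Sketch: 2-D meshgrid of pixel coordinates, stacked in homogeneous form."""
    xx, yy = np.meshgrid(np.arange(width), np.arange(height))
    ones = np.ones_like(xx)
    return np.stack([xx, yy, ones], axis=-1)  # shape (height, width, 3)
```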
- metavision_core_ml.data.camera_poses.get_image_transform(rvec1, tvec1, rvec2, tvec2, nt, depth, K, Kinv)
Gets the image homography between two poses (includes camera intrinsics).
- Parameters
rvec1 (np.array) – rotation vector 1
tvec1 (np.array) – translation vector 1
rvec2 (np.array) – rotation vector 2
tvec2 (np.array) – translation vector 2
nt (np.array) – plane normal
depth (float) – depth from camera
K (np.array) – intrinsic
Kinv (np.ndarray) – inverse intrinsic
- metavision_core_ml.data.camera_poses.get_transform(rvec1, tvec1, rvec2, tvec2, nt, depth)
Gets the geometric homography between two poses.
- Parameters
rvec1 (np.array) – rotation vector 1
tvec1 (np.array) – translation vector 1
rvec2 (np.array) – rotation vector 2
tvec2 (np.array) – translation vector 2
nt (np.array) – plane normal
depth (float) – depth from camera
- metavision_core_ml.data.camera_poses.interpolate_poses(rvecs, tvecs, nt, depth, K, Kinv, height, width, opt_flow_threshold=2, max_frames_per_bin=20)
Interpolates the given poses.
- Parameters
rvecs (np.array) – N,3 rotation vectors
tvecs (np.array) – N,3 translation vectors
nt (np.array) – plane normal
depth (float) – depth to camera
K (np.array) – camera intrinsic
Kinv (np.array) – inverse camera intrinsic
height (int) – height of image
width (int) – width of image
opt_flow_threshold (float) – maximum flow threshold
max_frames_per_bin (int) – maximum number of pose interpolations between two consecutive poses of the original list of poses
- metavision_core_ml.data.camera_poses.interpolate_times_tvecs(tvecs, key_times, inter_tvecs, inter_times, nums)
Interpolates between key times and translation vectors.
- Parameters
tvecs (np.array) – key translation vectors (N, 3)
key_times (np.array) – key times (N, )
inter_tvecs (np.array) – interpolated translations (nums.sum(), 3)
inter_times (np.array) – interpolated times (nums.sum(),)
nums (np.array) – number of interpolation points between key points (N-1,); nums[i] is the number of points between key_times[i] (included) and key_times[i+1] (excluded). The minimum is 1, which corresponds to key_times[i] alone.
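The per-segment counting convention above can be sketched with piecewise-linear interpolation. Note the documented function fills preallocated output arrays (inter_tvecs, inter_times); this illustrative version returns fresh arrays instead:

```python
import numpy as np

def interpolate_times_tvecs(tvecs, key_times, nums):
    """Sketch: nums[i] points in [key_times[i], key_times[i+1]), the first
    being key_times[i] itself (nums[i] == 1 keeps only the key pose)."""
    inter_t, inter_times = [], []
    for i, n in enumerate(nums):
        alphas = np.arange(n) / n  # n points, endpoint excluded
        inter_times.append(key_times[i] + alphas * (key_times[i + 1] - key_times[i]))
        inter_t.append(tvecs[i] + alphas[:, None] * (tvecs[i + 1] - tvecs[i]))
    return np.concatenate(inter_t), np.concatenate(inter_times)
```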
6-DOF motion in front of the image plane, implemented entirely in NumPy + OpenCV. Applies continuous homographies to your image over time; you can also retrieve the optical flow for this motion.
- class metavision_core_ml.data.image_planar_motion_stream.PlanarMotionStream(image_filename, height, width, max_frames=1000, rgb=False, infinite=True, pause_probability=0.5, max_optical_flow_threshold=2.0, max_interp_consecutive_frames=20, crop_image=False, saturation_max_factor=1.0)
Generates a planar motion in front of the image
- Parameters
image_filename (str) – path to image
height (int) – desired height
width (int) – desired width
max_frames (int) – number of frames to stream
rgb (bool) – color images if True, grayscale otherwise
infinite (bool) – if True, the image border is mirrored
pause_probability (float) – probability to add a pause during the stream
max_optical_flow_threshold (float) – maximum optical flow between two consecutive frames
max_interp_consecutive_frames (int) – maximum number of interpolated frames between two consecutive frames
crop_image (bool) – crop images rather than resizing them
saturation_max_factor (float) – multiplicative factor for saturated pixels (only for 16-bit TIFF images; use 1.0 to disable)
6-DOF motion in front of the image plane that also returns corner positions, implemented entirely in NumPy + OpenCV. Applies continuous homographies to your image over time.
- class metavision_core_ml.data.corner_planar_motion_stream.CornerPlanarMotionStream(image_filename, height, width, max_frames=1000, rgb=False, infinite=True, pause_probability=0.5, draw_corners_as_circle=True)
Generates a planar motion in front of the image, returning both images and Harris corners
- Parameters
image_filename – path to image
height – desired height
width – desired width
max_frames – number of frames to stream
rgb – color images if True, grayscale otherwise
infinite – if True, the image border is mirrored
pause_probability – probability that the stream pauses
draw_corners_as_circle – if True, corners are drawn as 2-pixel circles
Video Streaming
Scheduling System for Videos
- class metavision_core_ml.data.scheduling.Metadata(path, start_frame, end_frame)
Represents part of a file to be read.
- Parameters
path (str) – path to video
start_frame (int) – first frame to seek to
end_frame (int) – last frame to read
- metavision_core_ml.data.scheduling.build_image_metadata(folder, min_size, max_size, denominator=1)
Builds Metadata from images
- Parameters
folder (str) – path to pictures
min_size (int) – minimum number of frames
max_size (int) – maximum number of frames
denominator (int) – num_frames will always be a multiple of denominator. It is used to avoid having batches that are missing some frames and need to be padded. This happens when the number of time steps is not a multiple of num_frames.
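The denominator constraint described above can be sketched as a small rounding helper. round_to_multiple is a hypothetical name used only for illustration; it is not part of the API, and how the library actually enforces the constraint may differ:

```python
def round_to_multiple(num_frames, denominator):
    """Sketch: floor num_frames to a multiple of denominator,
    never going below one full multiple."""
    return max(denominator, (num_frames // denominator) * denominator)
```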
- metavision_core_ml.data.scheduling.build_metadata(folder, min_length, max_length, denominator=1)
Builds Metadata for Videos and Images
- Parameters
folder (str) – path to videos or images
min_length (int) – minimum number of frames
max_length (int) – maximum number of frames
denominator (int) – denominator of number of frames for image metadata
- metavision_core_ml.data.scheduling.build_tiff_image_metadata(folder, min_size, max_size, denominator=1)
Builds Metadata from TIFF images
- Parameters
folder (str) – path to pictures
min_size (int) – minimum number of frames
max_size (int) – maximum number of frames
denominator (int) – num_frames will always be a multiple of denominator. It is used to avoid having batches that are missing some frames and need to be padded. This happens when the number of time steps is not a multiple of num_frames.
- metavision_core_ml.data.scheduling.build_video_metadata(folder)
Builds Metadata from videos
- Parameters
folder (str) – path to videos (only looks in current directory, not subfolders)
- metavision_core_ml.data.scheduling.split_video_metadata(metadatas, min_size, max_size)
Splits video Metadata objects into smaller ones.
- Parameters
metadatas (list) – list of metadata objects
min_size (int) – minimum number of frames
max_size (int) – maximum number of frames
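A sketch of how such a split could work, with Metadata simplified to (path, start_frame, end_frame) tuples; dropping trailing chunks shorter than min_size is an assumption, not documented behavior:

```python
def split_video_metadata(metadatas, min_size, max_size):
    """Sketch: cut each frame range into chunks of at most max_size frames,
    discarding trailing chunks shorter than min_size."""
    out = []
    for path, start, end in metadatas:
        pos = start
        while pos < end:
            stop = min(pos + max_size, end)
            if stop - pos >= min_size:
                out.append((path, pos, stop))
            pos = stop
    return out
```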
Iterator over .mp4 or .avi videos. The current backend is OpenCV.
- class metavision_core_ml.data.video_stream.TimedVideoStream(video_filename, height=-1, width=-1, start_frame=0, max_frames=0, rgb=False, override_fps=0)
Video iterator opening both a video stream and a file of timestamps. If the timestamp file does not exist, timestamps are generated at a regular period derived from the frequency in the video's metadata, or with period 1/override_fps if given. Timestamps are delivered in microseconds.
- Parameters
input_path (str) – Path to the file to read.
height (int) – Height of the output frames.
width (int) – Width of the output frames.
start_frame (int) – First frame to seek to
max_frames (int) – Maximum number of frames loaded (if greater than the number of frames in the video it will return all of them).
rgb (bool) – Whether the output should be in rgb or greyscale.
override_fps (int) – Frequency of the generated timestamps in Hz (used when no timestamp file is available). If equal to 0, the frequency is taken from the video's metadata.
- get_size()
Function returning the size of the imager which produced the events.
- Returns
Tuple of int (height, width) which might be (None, None)
Generic data Streaming
Module that enables Parallel Multistreaming.
We define an IterableDataset that streams several iterables. When fed to a PyTorch DataLoader with batch_size=None, this streams batches from one worker at a time, which enables parallel streaming.
The StreamDataLoader is a class built on top of DataLoader that fuses batch parts together so that batches are always temporally coherent.
Note that you can also skip this fusion, use a regular DataLoader, and keep multiple neural networks indexed by worker id.
- class metavision_core_ml.data.stream_dataloader.StreamDataLoader(dataset, num_workers, collate_fn)
Wraps around the DataLoader to handle the asynchronous batches.
- Parameters
dataset (StreamDataset) – dataset streaming multiple iterables
num_workers (int) – number of workers
collate_fn (function) – function to collate batch parts
- cpu()
Sets the StreamDataLoader to leave tensors on CPU.
- cuda(device=device(type='cuda'))
Sets the StreamDataLoader to copy tensors to GPU memory before returning them.
- Parameters
device (torch.device) – The destination GPU device. Defaults to the current CUDA device.
- to(device)
Sets the StreamDataLoader to copy tensors to the given device before returning them.
- Parameters
device (torch.device) – The destination GPU device. For instance torch.device(‘cpu’) or torch.device(‘cuda’).
- class metavision_core_ml.data.stream_dataloader.StreamDataset(stream_list, streamer, batch_size, padding_mode, fill_value, seed=None)
An IterableDataset zipping a group of iterable streams together.
- Parameters
stream_list (list) – list of streams (path/ metadata)
streamer (object) – an iterator (user defined)
batch_size (int) – total batch size
padding_mode (str) – "zeros", "data", or "none"; see the get_zip function
fill_value (object) – padding value
seed (int) – seed integer to make the dataloading deterministic
- metavision_core_ml.data.stream_dataloader.resample_to_batch_size(stream_list, batch_size)
Resamples list to fit batch_size iterators
- Parameters
stream_list (list) – list of streams
batch_size (int) – batch size
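One way such a resampling could work (the actual strategy is not documented here, so this is only a guess): pad the list with randomly chosen repeats until its length is a multiple of batch_size, so every iterator slot stays busy:

```python
import random

def resample_to_batch_size(stream_list, batch_size, seed=0):
    """Sketch: pad a non-empty stream list with random repeats until its
    length is a multiple of batch_size."""
    rng = random.Random(seed)
    out = list(stream_list)
    while len(out) % batch_size:
        out.append(rng.choice(stream_list))
    return out
```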
- metavision_core_ml.data.stream_dataloader.split_batch_size(batch_size, num_workers)
Returns the number of files to handle per worker
- Parameters
batch_size (int) – total batch size
num_workers (int) – number of workers
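The even split across workers can be sketched in a few lines; whether the library distributes the remainder to the first workers exactly like this is an assumption:

```python
def split_batch_size(batch_size, num_workers):
    """Sketch: distribute a total batch size across workers as evenly as
    possible, giving the remainder to the first workers."""
    base, rem = divmod(batch_size, num_workers)
    return [base + (1 if i < rem else 0) for i in range(num_workers)]
```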
- metavision_core_ml.data.stream_dataloader.split_dataset_sizes(stream_list, split_sizes)
Splits with different sizes proportional to the number of files each worker has to handle.
- Parameters
stream_list (list) – list of stream path
split_sizes (list) – batch size per worker
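A proportional split of the stream list can be sketched as follows; the rounding scheme (and giving the last worker the remainder) is an illustrative choice, not necessarily the library's:

```python
def split_dataset_sizes(stream_list, split_sizes):
    """Sketch: give each worker a slice of the stream list proportional
    to its share of the total batch size."""
    total = sum(split_sizes)
    out, start = [], 0
    for i, size in enumerate(split_sizes):
        if i == len(split_sizes) - 1:
            stop = len(stream_list)  # last worker takes the remainder
        else:
            stop = start + round(len(stream_list) * size / total)
        out.append(stream_list[start:stop])
        start = stop
    return out
```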