SDK ML Algorithms

class Metavision::CDProcessing

Processes CD events to compute the neural network input frame (a 3-dimensional tensor).

This is the base class. It handles the rescaling of the events if necessary and provides accessors to get the shape of the output tensor. Derived classes implement the computation. Calling operator() on this base class triggers the computation.

Public Functions

inline CDProcessing(timestamp delta_t, int network_input_width, int network_input_height, int event_input_width = 0, int event_input_height = 0, bool use_CHW = true)

Constructs a CDProcessing object to ease the generation of the neural network input frame.

  • delta_t – Delta time used to accumulate events inside the frame

  • network_input_width – Neural network input frame’s width

  • network_input_height – Neural network input frame’s height

  • event_input_width – Sensor’s width

  • event_input_height – Sensor’s height

  • use_CHW – Boolean defining the frame dimension order: true if the order is (Channel, Height, Width), false if it is (Height, Width, Channel)

inline size_t get_frame_size() const

Gets the frame size.


Frame size, i.e. the number of values in the frame (height * width * channels)

inline size_t get_frame_width() const

Gets the network’s input frame’s width.


Network input frame’s width

inline size_t get_frame_height() const

Gets the network’s input frame’s height.


Network input frame’s height

inline size_t get_frame_channels() const

Gets the number of channels in the network input frame.


Number of channels in the network input frame

inline bool is_CHW() const

Checks the tensor’s dimension order.


true if the dimension order is (channel, height, width)

inline std::vector<size_t> get_frame_shape() const

Gets the shape of the frame (3 dimensions, ordered either CHW or HWC).


Vector containing the size of each of the three dimensions
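The relation between the dimension order, the shape, and the position of a value in the frame buffer can be sketched as follows. This is only an illustration: `frame_shape` and `frame_offset` are hypothetical helpers, not SDK functions, and the sketch assumes the frame is stored as one contiguous row-major buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the SDK): shape of a 3-dimensional frame
// for the two dimension orders mentioned above.
inline std::vector<std::size_t> frame_shape(std::size_t channels, std::size_t height,
                                            std::size_t width, bool use_chw) {
    if (use_chw) {
        return {channels, height, width};
    }
    return {height, width, channels};
}

// Hypothetical helper: flat offset of value (c, y, x), assuming a contiguous
// row-major buffer of height * width * channels values.
inline std::size_t frame_offset(std::size_t c, std::size_t y, std::size_t x,
                                std::size_t channels, std::size_t height,
                                std::size_t width, bool use_chw) {
    assert(c < channels && y < height && x < width);
    return use_chw ? (c * height + y) * width + x   // CHW: channel planes
                   : (y * width + x) * channels + c; // HWC: interleaved channels
}
```

With 2 channels and a 3 x 4 frame, value (c=1, y=0, x=0) sits at offset 12 in CHW order and at offset 1 in HWC order.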

template<typename InputIt>
inline void operator()(const timestamp cur_frame_start_ts, InputIt begin, InputIt end, float *frame, int frame_size) const

Updates the frame depending on the input events.

Template Parameters

InputIt – type of input iterator (either a container iterator or raw pointer to EventCD)

  • cur_frame_start_ts – starting timestamp of the current frame

  • begin – Begin iterator

  • end – End iterator

  • frame – Pointer to the frame (input/output)

  • frame_size – Input frame size
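As an illustration of what a derived class might do behind this call operator, here is a minimal sketch that accumulates events into a two-channel histogram in CHW order. Everything in it is an assumption made for the example: `SketchEventCD` stands in for EventCD, `HistogramSketch` is a hypothetical class, and the actual preprocessing implemented by the SDK may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for EventCD: pixel position, polarity, timestamp.
struct SketchEventCD {
    int x;
    int y;
    int p;
    std::int64_t t;
};

// Hypothetical processor (not an SDK class): counts events per polarity and
// pixel over [start, start + delta_t), after rescaling to the network size.
class HistogramSketch {
public:
    HistogramSketch(std::int64_t delta_t, int net_w, int net_h, int sensor_w, int sensor_h)
        : delta_t_(delta_t), net_w_(net_w), net_h_(net_h),
          sensor_w_(sensor_w), sensor_h_(sensor_h) {}

    template<typename InputIt>
    void operator()(std::int64_t cur_frame_start_ts, InputIt begin, InputIt end,
                    float *frame, int frame_size) const {
        assert(frame_size == 2 * net_w_ * net_h_);
        (void)frame_size;
        for (auto it = begin; it != end; ++it) {
            // Skip events outside the accumulation window or the sensor area.
            if (it->t < cur_frame_start_ts || it->t >= cur_frame_start_ts + delta_t_) continue;
            if (it->x < 0 || it->x >= sensor_w_ || it->y < 0 || it->y >= sensor_h_) continue;
            const int x = it->x * net_w_ / sensor_w_; // rescale to network width
            const int y = it->y * net_h_ / sensor_h_; // rescale to network height
            const int c = it->p > 0 ? 1 : 0;          // one channel per polarity
            frame[(c * net_h_ + y) * net_w_ + x] += 1.0f;
        }
    }

private:
    std::int64_t delta_t_;
    int net_w_, net_h_, sensor_w_, sensor_h_;
};
```

The sketch accepts container iterators as well as raw pointers, since it only relies on `operator->`, and it adds to the existing content of the frame, which is why the frame pointer is both an input and an output.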

class Metavision::NonMaximumSuppressionWithRescaling

Rescales detected boxes from the network input format to the sensor’s size and suppresses non-maximum overlapping boxes.

Public Functions

inline NonMaximumSuppressionWithRescaling()

Builds a non-configured NonMaximumSuppressionWithRescaling object.

inline NonMaximumSuppressionWithRescaling(std::size_t num_classes, int events_input_width, int events_input_height, int network_input_width, int network_input_height, float iou_threshold)

Constructs an object that rescales detected boxes and suppresses non-maximum overlapping boxes.

  • num_classes – Number of possible classes returned by the neural network

  • events_input_width – Sensor’s width

  • events_input_height – Sensor’s height

  • network_input_width – Neural network input frame’s width

  • network_input_height – Neural network input frame’s height

  • iou_threshold – Threshold on the IOU metric above which two boxes are considered to match

template<typename InputIt, typename OutputIt>
inline void process_events(const InputIt it_begin, const InputIt it_end, OutputIt inserter)

Rescales and filters boxes.

Template Parameters
  • InputIt – Read-Only input iterator type

  • OutputIt – Read-Write output iterator type

  • it_begin – Iterator to the first box

  • it_end – Iterator to the past-the-end box

  • inserter – Output iterator or back inserter

inline void set_iou_threshold(float threshold)

Sets Intersection Over Union (IOU) threshold.


Intersection Over Union (IOU) is the ratio of the intersection area of two boxes to the area of their union.


threshold – Threshold on the IOU metric above which two boxes are considered to match
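For reference, the IOU of two axis-aligned boxes can be computed as sketched below. `SketchBox` and `iou` are hypothetical names introduced for this example; the sketch assumes boxes given by their top-left corner, width, and height, which may not match the fields of EventBbox.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical axis-aligned box: top-left corner (x, y), width w, height h.
struct SketchBox {
    float x;
    float y;
    float w;
    float h;
};

// Intersection area divided by union area; returns 0 for empty boxes.
inline float iou(const SketchBox &a, const SketchBox &b) {
    assert(a.w >= 0.f && a.h >= 0.f && b.w >= 0.f && b.h >= 0.f);
    // Overlap along each axis, clamped to 0 when the boxes do not intersect.
    const float ix = std::max(0.f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(0.f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni   = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}
```

Two 2 x 2 boxes shifted by one pixel horizontally share an area of 2 out of a union of 6, so their IOU is 1/3. Identical boxes give 1 and disjoint boxes give 0.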

inline void ignore_class_id(std::size_t class_id)

Configures the computation to ignore boxes with the given class identifier.


class_id – Identifier of the class to be ignored

Public Static Functions

static inline void compute_nms_per_class(std::list<EventBbox> &bbox_list, float iou_threshold)

Suppresses non-maximum overlapping boxes in a list of EventBbox.


The list is modified in-place. The result is sorted by confidence.

  • bbox_list – [in,out] List of EventBbox on which to apply the non-maximum suppression

  • iou_threshold – Threshold above which two boxes are considered to overlap
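A common way to obtain this behavior is greedy non-maximum suppression: sort by decreasing confidence, then let each kept box remove the lower-confidence boxes that overlap it too much. The sketch below shows that technique under stated assumptions. `SketchBbox`, `sketch_iou`, and `nms_sketch` are hypothetical names, all boxes are assumed to belong to the same class, and the SDK implementation may differ in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical stand-in for EventBbox: top-left corner, size, confidence.
struct SketchBbox {
    float x;
    float y;
    float w;
    float h;
    float confidence;
};

// Intersection over union of two axis-aligned boxes.
inline float sketch_iou(const SketchBbox &a, const SketchBbox &b) {
    const float ix = std::max(0.f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    const float iy = std::max(0.f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    const float inter = ix * iy;
    const float uni   = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS, in place: the list ends up sorted by decreasing confidence.
inline void nms_sketch(std::list<SketchBbox> &boxes, float iou_threshold) {
    boxes.sort([](const SketchBbox &a, const SketchBbox &b) {
        return a.confidence > b.confidence;
    });
    for (auto kept = boxes.begin(); kept != boxes.end(); ++kept) {
        for (auto other = std::next(kept); other != boxes.end();) {
            if (sketch_iou(*kept, *other) > iou_threshold) {
                other = boxes.erase(other); // overlaps a better box: drop it
            } else {
                ++other;
            }
        }
    }
}
```

A `std::list` suits this pattern because erasing an element does not invalidate iterators to the other elements.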

class Metavision::ObjectDetectorTorchJit

Public Functions

inline ObjectDetectorTorchJit(const std::string &directory, int frame_width, int frame_height, int network_input_width = 0, int network_input_height = 0, bool use_cuda = false, int ignore_first_n_prediction_steps = 0, int gpu_id = 0)

Constructor for ObjectDetectorTorchJit.


When network_input_width and network_input_height differ from frame_width and frame_height, the output bounding boxes are rescaled accordingly, so that the detections are still expressed in the original coordinate frame of the events.

  • directory – Name of the directory containing at least two files:

    • model.ptjit : PyTorch model exported using torch.jit

    • info_ssd_jit.json : JSON file containing information about the neural network (type of input features, dimensions, accumulation time, list of classes, default thresholds, etc.)

  • frame_width – Sensor’s width

  • frame_height – Sensor’s height

  • network_input_width – Neural network’s input width, which may be smaller than frame_width. In this case, the network works on a downscaled input

  • network_input_height – Neural network’s input height, which may be smaller than frame_height. In this case, the network works on a downscaled input

  • use_cuda – Boolean indicating whether the computations run on the GPU

  • ignore_first_n_prediction_steps – Number of neural network predictions discarded at the beginning of a sequence. Depending on initial conditions, recurrent models sometimes go through a transitory regime in which they produce unreliable detections before reaching their normal working regime.

  • gpu_id – GPU identification number, used to select the GPU when several are available.
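The rescaling of the output boxes described above amounts to multiplying coordinates by the ratio between the sensor size and the network input size. The following sketch illustrates the idea; `SketchRect` and `rescale_to_sensor` are hypothetical names, and the SDK may handle rounding or clamping differently.

```cpp
#include <cassert>

// Hypothetical box in pixel coordinates: top-left corner, width, height.
struct SketchRect {
    float x;
    float y;
    float w;
    float h;
};

// Maps a box expressed in network input coordinates back to sensor coordinates.
inline SketchRect rescale_to_sensor(const SketchRect &box, int network_input_width,
                                    int network_input_height, int frame_width,
                                    int frame_height) {
    assert(network_input_width > 0 && network_input_height > 0);
    const float sx = static_cast<float>(frame_width) / static_cast<float>(network_input_width);
    const float sy = static_cast<float>(frame_height) / static_cast<float>(network_input_height);
    return SketchRect{box.x * sx, box.y * sy, box.w * sx, box.h * sy};
}
```

For example, with a 640 x 480 sensor and a 320 x 240 network input, every coordinate and dimension of a detected box is doubled.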

inline void use_cpu()

Performs all computations on the CPU.

inline bool use_gpu_if_available(int gpu_id = 0)

Performs the computations on the GPU if there is one.


gpu_id – ID of the GPU on which the computations must be performed


true if the GPU identified by gpu_id is available, false otherwise

template<typename OutputIt>
inline void process(Frame_t &input, OutputIt bbox_first, timestamp ts)

Computes the detection given the provided input tensor.

  • input – Chunk of memory holding the input tensor

  • bbox_first – Output iterator used to add the detection boxes

  • ts – Timestamp of the current time step. Output boxes will have this timestamp

inline int get_network_height() const

Returns the input frame height.


Network input height in pixels

inline int get_network_width() const

Returns the input frame width.


Network input width in pixels

inline int get_network_input_channels() const

Returns the number of channels in the input frame.


Number of channels of the network input

inline int get_network_input_size() const

Returns the network input size.


Size of the input frame

inline Metavision::timestamp get_accumulation_time() const

Returns the time during which the events are accumulated to compute the NN input tensor.


Delta time used to generate the input frame

inline CDProcessing &get_cd_processor()

Returns the object responsible for computing the content of the input tensor.


CDProcessing to ease the input frame generation

inline const std::vector<std::string> &get_labels() const

Returns a vector of labels for the classes of the neural network.


Vector of strings containing labels

inline void set_ts(Metavision::timestamp ts)

Initializes the internal timestamp of the object detector.

This is needed in order to use the start_ts parameter in the pipeline to start at a timestamp greater than 0.


ts – Time at which the first time slice starts

inline void set_detection_threshold(float threshold)

Uses this detection threshold instead of the default value read from the JSON file.

This is the lower bound on the confidence score for a detection box to be accepted. It takes values in the range ]0;1[. A low value yields more detections; a high value yields fewer detections.


threshold – Lower bound on the detector confidence score
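The effect of this threshold can be illustrated with a simple filter on the confidence score. `SketchDetection` and `filter_by_confidence` are hypothetical names, and whether the SDK uses a strict or non-strict comparison is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical detection: class identifier and confidence score.
struct SketchDetection {
    int class_id;
    float confidence;
};

// Keeps the detections whose confidence reaches the threshold.
// Assumption: the comparison is non-strict (>=).
inline std::vector<SketchDetection> filter_by_confidence(
    const std::vector<SketchDetection> &detections, float threshold) {
    assert(threshold > 0.f && threshold < 1.f);
    std::vector<SketchDetection> kept;
    std::copy_if(detections.begin(), detections.end(), std::back_inserter(kept),
                 [threshold](const SketchDetection &d) { return d.confidence >= threshold; });
    return kept;
}
```

With confidences 0.2, 0.6, and 0.9, a threshold of 0.5 keeps two detections while a threshold of 0.1 keeps all three.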

inline void set_iou_threshold(float threshold)

Uses this IOU threshold for NMS instead of the default value read from the JSON file.

Non-Maximum Suppression discards detection boxes that are too similar to each other, keeping only the best one of each such group. The similarity criterion is the Intersection-Over-Union (IOU) between the considered boxes. This threshold is the upper bound on the IOU for two boxes to be considered distinct (and therefore not filtered out by the Non-Maximum Suppression). It takes values in the range ]0;1[. A low value keeps fewer overlapping boxes; a high value keeps more overlapping boxes.


threshold – Upper bound on the IOU for two boxes to be considered distinct

inline void reset()

Resets the memory cells of the neural network.

Neural networks used as object detectors are usually RNNs (typically LSTMs). Use this function to reset the memory of the neural network when feeding new inputs unrelated to the previous ones: call reset() before applying the same object detector to a new sequence.