This Python sample has a corresponding C++ sample.

Inference Pipeline of Detection and Tracking using Python

This sample allows you to quickly set up an inference pipeline for object detection and tracking. You can use our pre-trained TorchScript model to detect and track vehicles and pedestrians. Check our pre-trained models page to find out how to retrieve the model for your Metavision Intelligence Package.


Note that the network was trained on a dataset recorded by a camera positioned on top of a car facing forward. Its performance might be quite degraded in other settings.

The source code of this sample can be found in <install-prefix>/share/metavision/sdk/ml/python_samples/detection_and_tracking_pipeline when installing Metavision SDK from installer or packages. For other deployment methods, check the page Path of Samples.

Expected Output

The pipeline takes events as input and outputs detected objects with bounding boxes and their corresponding confidence level.

The detected and tracked bounding boxes will be shown in two windows displayed side by side: the detection is shown on the left pane, with colors indicating the class membership; the tracking is drawn on the right, with colors indicating the track ID and confidence level.

Setup & requirements

To run the script, you will need:

  • a pre-trained TorchScript model with a JSON file of hyperparameters. Check our pre-trained models

  • an event-based camera or a RAW or DAT file. We suggest starting with driving_sample.raw, downloadable from our Sample Recordings

How to start

To run this sample, make sure you have followed the Machine Learning Module Dependencies section of the installation guide, which requires deploying PyTorch and other Python packages.

To start the script based on recorded data, you need to provide the full path to a RAW or DAT file and the path to the pre-trained model:

Linux:

python3 detection_and_tracking_pipeline.py --object_detector_dir /path/to/model --record_file <RAW file to process> --display

Windows:

python detection_and_tracking_pipeline.py --object_detector_dir /path/to/model --record_file <RAW file to process> --display

The script comes with extensive functionality covering the following aspects:

  • Input: Define the input source, sampling period, start and end timestamp

  • --object_detector_dir: path to a folder containing a model.ptjit TorchScript model and an info_ssd_jit.json file containing a few hyperparameters.

  • Output: Produce an inference video (.mp4), export detected and tracked bounding boxes (.npy)

  • if --output_video_filename is set, the corresponding file is created

  • if --output_detections_filename is set, the corresponding file is created. It contains the output boxes of the object detector (the neural network). The format is a numpy structured array of dtype: metavision_sdk_ml.EventBbox, which is: dtype({'names':['t','x','y','w','h','class_id','track_id','class_confidence'], ... }).

  • if --output_tracks_filename is set, the corresponding file is created. It contains the output boxes of the tracking. The format is a numpy structured array of dtype: metavision_sdk_ml.EventTrackedBox, which is: dtype({'names':['t','x','y','w','h','class_id','track_id','class_confidence','tracking_confidence','last_detection_update_time','nb_detections'], ... }).

  • Object Detector: Define the pre-trained detection model and its calibrated hyperparameters; Set up inference thresholds for detection confidence and NMS-IoU level

  • Geometric Preprocessing: Apply geometric preprocessing to the event stream: input transposition, filtering of events outside a RoI

  • Noise Filtering: Trail and STC filters

  • Data Association: Define matching thresholds for tracking confidence and NMS-IoU level, together with other association parameters
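To give a feel for the exported .npy formats described above, here is a minimal sketch that builds a toy structured array with the EventBbox field names listed above and filters it by confidence. The field types used here are illustrative assumptions, not the SDK's exact definition; a real file produced by the pipeline would simply be read with np.load.

```python
import numpy as np

# Structured dtype matching the EventBbox field names listed above.
# The field types are assumptions for illustration only.
EVENT_BBOX_DTYPE = np.dtype([
    ("t", np.int64),
    ("x", np.float32), ("y", np.float32),
    ("w", np.float32), ("h", np.float32),
    ("class_id", np.uint8),
    ("track_id", np.uint32),
    ("class_confidence", np.float32),
])

# Toy detections; a real export would be loaded with:
#   dets = np.load("detections.npy")
dets = np.array(
    [(10000, 12.0, 30.0, 40.0, 20.0, 1, 0, 0.92),
     (10000, 80.0, 55.0, 25.0, 18.0, 2, 0, 0.41)],
    dtype=EVENT_BBOX_DTYPE,
)

# Keep only confident detections by indexing on a named field
confident = dets[dets["class_confidence"] > 0.5]
print(confident["class_id"])
```

The same pattern applies to the tracking export, whose dtype adds the tracking_confidence, last_detection_update_time and nb_detections fields.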
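The detection and data-association options above both rely on NMS-IoU thresholds. As a rough sketch of what such a threshold controls (this is a generic greedy NMS, not the SDK's implementation), here is an IoU computation on (x, y, w, h) boxes followed by a simple suppression pass:

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop overlapping ones."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping boxes and one far away: the weaker overlap is dropped
kept = nms([(0, 0, 10, 10), (1, 0, 10, 10), (50, 50, 5, 5)],
           [0.9, 0.8, 0.7], iou_thresh=0.5)
print(kept)
```

A lower NMS-IoU threshold suppresses more aggressively; a higher one lets more overlapping boxes through to the tracker.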

To find the full list of options, run:

Linux:

python3 detection_and_tracking_pipeline.py -h

Windows:

python detection_and_tracking_pipeline.py -h