Note
This C++ sample has a corresponding Python sample.
Inference Pipeline of Detection and Tracking using C++
This sample allows you to quickly set up an inference pipeline for object detection and tracking. You can use our pre-trained TorchScript model for detection and tracking of vehicles and pedestrians. Check our pre-trained models page to find out how to retrieve the model depending on your Metavision SDK Package.
The source code of this sample can be found in <install-prefix>/share/metavision/sdk/ml/cpp_samples/metavision_detection_and_tracking when Metavision SDK is installed from the installer or packages. For other deployment methods, check the page Path of Samples.
Expected Output
The pipeline takes events as input and outputs detected objects with bounding boxes and their corresponding confidence level.
The detected and tracked bounding boxes will be shown in two windows set side by side: the detection is shown on the left pane, with colors indicating the class membership; the tracking is drawn on the right with colors indicating the trackID and confidence level.
Note
This sample is highly sensitive to the scene and settings. Please review and keep the following items in mind:
the network was trained on a dataset recorded by a camera positioned on top of a car facing forward. Its performance might be quite degraded in other settings.
light conditions play an important role in performance, as latency is lower in high-light conditions than in low-light conditions. Beware also of flickering lights, which can cause a surge in the event rate (you can use AFK and/or tune biases to mitigate such a situation)
make sure your camera is properly focused and check the minimum working distance of your lens (for example, the standard lens of the EVK4 has a minimum working distance of 10 cm, so you won’t achieve good focus on objects closer than 10 cm with that lens)
you should also adjust the parameters of the algorithm to improve the detection and/or processing time (“Data Association options” is a good start)
Setup & requirements
To run the sample, you will need:
a pre-trained TorchScript model with a JSON file of hyperparameters. Check our pre-trained models
an event-based camera or an event file (RAW, DAT or HDF5). We suggest starting with driving_sample.raw, downloadable from our Sample Recordings
How to start
First, you need to compile the sample.
Here, we assume you followed the Machine Learning Module Dependencies section of the installation guide, which requires deploying libtorch in a LIBTORCH_DIR_PATH directory. If so, use the following cmake commands to compile:
cmake .. -DCMAKE_PREFIX_PATH=<LIBTORCH_DIR_PATH> -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
To start the sample on recorded data, you need to provide the full path to an event file and the path to the pre-trained model. Here, we use the file driving_sample.raw and the model red_event_cube_05_2020:
Linux
metavision_detection_and_tracking --record-file driving_sample.raw --detector-model red_event_cube_05_2020/model.ptjit --display
Windows
metavision_detection_and_tracking.exe --record-file driving_sample.raw --detector-model red_event_cube_05_2020/model.ptjit --display
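Before launching the sample, it can help to check that the model file and its metadata are both in place, since the --detector-model option expects a JSON file with the same basename next to the .ptjit file. Below is a minimal sketch of such a check using std::filesystem (C++17). The helper names expected_metadata_path and model_files_present are hypothetical and are not part of the Metavision SDK.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper: returns the path of the JSON metadata file expected
// next to a model file, assuming it shares the model's basename
// (for example model.ptjit -> model.json).
fs::path expected_metadata_path(const fs::path &model_path) {
    fs::path json_path = model_path;
    json_path.replace_extension(".json");
    return json_path;
}

// Hypothetical helper: returns true only if the path has a .ptjit extension
// and both the model file and its JSON metadata exist as regular files.
bool model_files_present(const fs::path &model_path) {
    return model_path.extension() == ".ptjit" &&
           fs::is_regular_file(model_path) &&
           fs::is_regular_file(expected_metadata_path(model_path));
}
```

With the command above, model_files_present("red_event_cube_05_2020/model.ptjit") would look for red_event_cube_05_2020/model.json alongside the model file.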
The sample comes with extensive functionalities covering the following aspects:
Input: Define the input source, sampling period, start and end timestamp
--detector-model: path to a file containing the object detector model. Depending on the backend ML framework used by the model, the folder containing this file should also contain the metadata file(s) expected by the model. For instance, a torch model folder should contain 1) the .ptjit file and 2) a JSON file with the same basename containing the model metadata.
Output: Produce an inference video (.avi), export detected and tracked bounding boxes (in CSV format)
if --output-video-filename is set, the corresponding file is created.
if --output-detections-filename is set, the corresponding file is created. It contains the output boxes of the object detector (the neural network). The format is a CSV with one detection box per line, each line containing the following fields (separated by spaces): timestamp, class_id, 0, x, y, width, height, class_confidence
if --output-tracks-filename is set, the corresponding file is created. It contains the output boxes of the tracking. The format is a CSV with one tracked box per line, each line containing the following fields (separated by commas): timestamp, class_id, track_id, x, y, width, height, class_confidence, tracking_confidence, last_detection_update_time, nb_detections
Object Detector: Define the pre-trained detection model and its calibrated hyperparameters; Set up inference thresholds for detection confidence and NMS-IoU level
Geometric Preprocessing: Apply geometric preprocessing to the event stream: input transposition, filtering out events outside of a RoI
Noise Filtering: Trail and STC filters
Data Association: Define matching thresholds for tracking confidence and NMS-IoU level, together with other association parameters
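The file written by --output-tracks-filename can be post-processed with a few lines of C++. Below is a minimal sketch that parses one line into a struct following the documented field order. The struct name TrackedBox, the function name parse_track_line and the C++ type chosen for each field are assumptions made for illustration; they are not SDK types. Because the detections file is described as space-separated and the tracks file as comma-separated, the sketch accepts both separators.

```cpp
#include <algorithm>
#include <optional>
#include <sstream>
#include <string>

// One line of the tracks file. Field order follows the documented format;
// the C++ types are assumptions made for this sketch.
struct TrackedBox {
    long long timestamp;
    int class_id;
    int track_id;
    float x, y, width, height;
    float class_confidence;
    float tracking_confidence;
    long long last_detection_update_time;
    int nb_detections;
};

// Parses one line of the tracks file.
// Returns std::nullopt if any of the 11 fields is missing or malformed.
std::optional<TrackedBox> parse_track_line(std::string line) {
    // Treat commas as whitespace so that both separators are accepted.
    std::replace(line.begin(), line.end(), ',', ' ');
    std::istringstream in(line);
    TrackedBox b{};
    if (in >> b.timestamp >> b.class_id >> b.track_id
           >> b.x >> b.y >> b.width >> b.height
           >> b.class_confidence >> b.tracking_confidence
           >> b.last_detection_update_time >> b.nb_detections) {
        return b;
    }
    return std::nullopt;
}
```

A caller would typically read the file with std::getline and skip any line for which parse_track_line returns std::nullopt. The detections file has fewer fields (the third one is always 0), so it would need a similar but shorter struct.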
To find the full list of options, run:
Linux
metavision_detection_and_tracking -h
Windows
metavision_detection_and_tracking.exe -h
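Several of the options listed by the help command are confidence and NMS-IoU thresholds. To make their effect concrete, here is a minimal sketch of a generic confidence filter followed by greedy non-maximum suppression (NMS) based on Intersection over Union (IoU). This is a textbook formulation, not the implementation used by the sample. The names Box, iou and nms are hypothetical, the sketch ignores classes, and it assumes x and y designate the top-left corner of a box.

```cpp
#include <algorithm>
#include <vector>

// Axis-aligned box. Assumption: (x, y) is the top-left corner.
struct Box {
    float x, y, width, height;
    float confidence;
};

// Intersection over Union of two boxes, in the range [0, 1].
float iou(const Box &a, const Box &b) {
    const float x1 = std::max(a.x, b.x);
    const float y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.width, b.x + b.width);
    const float y2 = std::min(a.y + a.height, b.y + b.height);
    const float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    const float uni = a.width * a.height + b.width * b.height - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

// Greedy NMS: drop boxes whose confidence is below min_confidence, then visit
// the rest from most to least confident, keeping a box only if its IoU with
// every box already kept does not exceed iou_threshold.
std::vector<Box> nms(std::vector<Box> boxes, float min_confidence, float iou_threshold) {
    boxes.erase(std::remove_if(boxes.begin(), boxes.end(),
                               [min_confidence](const Box &b) {
                                   return b.confidence < min_confidence;
                               }),
                boxes.end());
    std::sort(boxes.begin(), boxes.end(),
              [](const Box &a, const Box &b) { return a.confidence > b.confidence; });
    std::vector<Box> kept;
    for (const Box &candidate : boxes) {
        bool suppressed = false;
        for (const Box &k : kept) {
            if (iou(candidate, k) > iou_threshold) {
                suppressed = true;
                break;
            }
        }
        if (!suppressed) {
            kept.push_back(candidate);
        }
    }
    return kept;
}
```

In this formulation, raising the confidence threshold removes weak boxes before suppression, while lowering the IoU threshold makes suppression more aggressive, so fewer overlapping boxes survive.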