Note
This C++ sample has a corresponding Python sample.
Inference Pipeline of Gesture Classification in C++
This sample allows you to quickly set up an inference pipeline for gesture classification.
The source code of this sample can be found in <install-prefix>/share/metavision/sdk/ml/cpp_samples/metavision_gesture_classification
when installing Metavision SDK from installer or packages. For other deployment methods, check the page
Path of Samples.
Expected Output
The sample takes an event stream as input and generates a sequence of predictions.
The demo below shows a live Rock-Paper-Scissors game based on this inference pipeline:
Setup & requirements
To run the sample, you will need:
a pre-trained TorchScript model with a JSON file of hyperparameters (e.g. convRNN_chifoumi from our pre-trained models)
event data using one of these formats:
an event-based camera
an event file: RAW, DAT or HDF5 event file (you can find event files in our Sample Recordings page)
Note
Since an HDF5 tensor file contains preprocessed features, you need to make sure that the same preprocessing method was used for the classification model and for the HDF5 file.
For instance, our trained classification model uses the histo_quantized method, so if you want to use HDF5 tensor files as input, they need to be processed with histo_quantized as well (a conceptual sketch of this preprocessing follows below).
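The following is a minimal, self-contained C++ sketch of what a histogram-style feature such as histo_quantized conceptually computes: per-polarity event counts accumulated into a frame, saturated to fit an 8-bit tensor. It is only an illustration, not the SDK's implementation; the actual preprocessing (normalization, quantization step, channel layout) may differ.

// Conceptual sketch only: accumulating events into a 2-channel
// (one channel per polarity) count tensor, the general idea behind
// histogram-style features such as histo_quantized. Not SDK code.
#include <cstdint>
#include <iostream>
#include <vector>

struct Event {
    uint16_t x, y; // pixel coordinates
    int16_t p;     // polarity: 0 (negative) or 1 (positive)
    uint64_t t;    // timestamp in microseconds
};

int main() {
    const int width = 6, height = 4;
    // Flattened [polarity][y][x] tensor of event counts
    std::vector<uint8_t> histo(2 * width * height, 0);

    const std::vector<Event> events = {{1, 1, 1, 10}, {1, 1, 1, 20}, {2, 3, 0, 30}};
    const uint8_t max_count = 255; // saturate instead of overflowing
    for (const auto &ev : events) {
        const size_t idx = static_cast<size_t>(ev.p) * width * height + ev.y * width + ev.x;
        if (histo[idx] < max_count)
            ++histo[idx];
    }

    std::cout << "positive events at (1,1): " << int(histo[1 * width * height + 1 * width + 1]) << std::endl;
    std::cout << "negative events at (2,3): " << int(histo[0 * width * height + 3 * width + 2]) << std::endl;
    return 0;
}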
Note
The model might be sensitive to the input frame resolution. If you downsampled the tensor input during training, it's better to pass the same tensor resolution during inference as well, using the --height-width argument.
How to start
First, you need to compile the sample.
Here, we assume you followed the Machine Learning Module Dependencies section of the installation guide, which requires deploying libtorch in a LIBTORCH_DIR_PATH directory. If so, use the following cmake commands to compile:
cmake .. -DCMAKE_PREFIX_PATH=`LIBTORCH_DIR_PATH` -DCMAKE_BUILD_TYPE=Release
cmake --build . --config Release
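The sample relies on libtorch to deserialize and run the TorchScript model. As a minimal sketch of that dependency (this is not the sample's actual source code, only an illustration of the libtorch API it builds on), loading a .ptjit file looks like this:

// Minimal sketch: loading a TorchScript model with libtorch.
// Illustration of the libtorch API only, not the sample's source code.
#include <torch/script.h>
#include <iostream>

int main(int argc, char *argv[]) {
    if (argc < 2) {
        std::cerr << "Usage: " << argv[0] << " /path/to/rnn_model_classifier.ptjit" << std::endl;
        return 1;
    }
    try {
        // Deserialize the TorchScript module exported at training time
        torch::jit::script::Module model = torch::jit::load(argv[1]);
        model.eval(); // switch to inference mode
        std::cout << "Model loaded: " << argv[1] << std::endl;
    } catch (const c10::Error &e) {
        std::cerr << "Error loading the model: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}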
To start the sample based on recorded data, you need to provide the full path to the pre-trained model and the path to the input file. Leave the file path empty if you want to use a live camera. For example:
Linux
metavision_gesture_classification --record-file /path/to/input_file --detector-model convRNN_chifoumi/rnn_model_classifier.ptjit --display
Windows
metavision_gesture_classification.exe --record-file /path/to/input_file --detector-model convRNN_chifoumi/rnn_model_classifier.ptjit --display
Warning
Normally, you should set the accumulation time interval (--delta-t) to the same value as the one used during training.
But if bandwidth constraints make it difficult to run live at that setting, you can try increasing the value, at a potential loss of accuracy.
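For instance, assuming the model was trained with an accumulation time of 10000 us (an illustrative value; use the value from your own training configuration), you would run:
metavision_gesture_classification --record-file /path/to/input_file --detector-model convRNN_chifoumi/rnn_model_classifier.ptjit --delta-t 10000 --display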
To find the full list of options, run:
Linux
metavision_gesture_classification -h
Windows
metavision_gesture_classification.exe -h