Using ML models with events
We provide some models for the Core ML and ML modules, as explained in Pre-trained Models.
This section describes how to use them through both the C++ and Python Metavision SDK APIs.
C++ inference
C++ classes
The C++ API provides a generic Model class that allows loading a variety of ML models for inference.
Currently, the SDK supports two inference backends, Torch and ONNX, which are widely used frameworks
for deploying machine learning models. To instantiate a model, use the create_model static function,
which expects a map of parameters (indexed by their name) that will be used by the concrete backend
implementation. Some parameters are common to all backends, and some are specific to each backend.
Here is a breakdown of the parameters common to all backends:
Parameter | Type | Description
---|---|---
model-path | string | The path to the model file
use-cuda | boolean | A boolean to enable or disable CUDA support
gpu-id | integer | The GPU ID to use for CUDA support
backend | string | The backend to use for the inference
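As an illustration, here is a hedged sketch of how a Torch model could be instantiated with these parameters. The string-valued std::map and the qualified name Metavision::Model::create_model are assumptions, not confirmed by this page; check the SDK headers for the actual signature.

// Hypothetical instantiation sketch: map and return types are assumptions
std::map<std::string, std::string> params{
    {"model-path", "/path/to/model"}, // path to the model file
    {"use-cuda", "true"},             // enable CUDA support
    {"gpu-id", "0"},                  // GPU ID to use for CUDA support
    {"backend", "torch"}              // backend to use for the inference
};
auto model = Metavision::Model::create_model(params);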
And here are the parameters specific to the ONNX backend:
Parameter | Type | Description
---|---|---
optimization-level | string | The optimization level to use for the ONNX model (one of the graph optimization levels defined by ONNX Runtime)
intra-op-num-threads | integer | The number of threads to use for intra-op parallelism
inter-op-num-threads | integer | The number of threads to use for inter-op parallelism
with-xnnpack | boolean | A boolean to enable or disable XNNPACK support
profile | string | The path to the profile file to use for the ONNX model
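Similarly, here is a hedged sketch of a parameter map selecting the ONNX backend; the key names come from the tables above, while the values and the map type are illustrative assumptions:

// Hypothetical ONNX parameter map (values are illustrative)
std::map<std::string, std::string> onnx_params{
    {"model-path", "/path/to/model.onnx"},
    {"backend", "onnx"},
    {"intra-op-num-threads", "4"},    // threads for intra-op parallelism
    {"with-xnnpack", "false"}         // disable XNNPACK support
};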
Note
For more information about the ONNX backend parameters, please refer to the official ONNX Runtime documentation.
The SDK is delivered as pre-compiled binaries with support for Torch models.
Support for ONNX models can be enabled by compiling the SDK from source with the options -DUSE_ONNXRUNTIME=ON and -DONNXRUNTIME_DIR=<ONNX_FOLDER>.
Note
The proposed Model class is provided to demonstrate the feasibility of applying ML models to event data (integrated within a Tensor structure). These classes were designed for flexibility and simplicity of integration rather than for speed.
The Model class provides a get_input function to retrieve the inputs to be provided to the model at inference time, as well as a get_output function to retrieve the data produced by the model. Both come as std::unordered_map structures containing Values, which are a generic way to host Tensor instances at various depths (some models may provide several layers of inferred data).
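For instance, here is a minimal sketch of listing the inputs a model expects; apart from get_input and the std::unordered_map structure, the accessors shown are assumptions:

// Hedged sketch: list the input names the model expects (e.g. "cls_input")
auto &inputs = model->get_input();
for (const auto &named_value : inputs)
    std::cout << "expected input: " << named_value.first << std::endl;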
Note
The Values might provide Tensors with -1 dimension values. This is the case when a model has a variable dimension, such as the dimensions of the images used to fill the tensor. In that case, those dimensions should be fixed before using the model, either at model production time or before inference (as is done in the Optical Flow sample):
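// Excerpt from the Optical Flow sample: the variable output width is fixed
// from the known network input width before running inference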
output_flow_width_ = network_input_width_;
Most importantly, the infer method runs the actual inference from the input data. The output map is updated and can be used afterwards.
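Putting it together, here is a hedged sketch of one inference step; only the function names get_input, infer and get_output come from this page, and whether infer takes the inputs explicitly is an assumption:

// Hedged sketch of one inference step
auto &inputs = model->get_input();          // 1. fill these Tensors with preprocessed events
model->infer();                             // 2. run the model (signature may take inputs explicitly)
const auto &outputs = model->get_output();  // 3. the updated output map can now be read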
A few samples make use of the Model class for different types of inference.
Model JSON file
Another important point is that this class expects the model to come along with a JSON file of the same name that describes its inputs and outputs. This JSON file also contains the preprocessing parameters, so it is important to be able to load the model parameters from it. This can be done with the parse_preprocessors_params function, as in the example below.
const auto root_node = Metavision::get_tree_from_file(network_json_filename);
Metavision::parse_preprocessors_params(root_node.get_child("input").get_child("preprocessing"), preprocess_maps);

float width_scaling  = 1.f;
float height_scaling = 1.f;
// Create a rescaler if the resolution of the camera/recording doesn't match the network input
if (network_input_height_ != sensor_height_ || network_input_width_ != sensor_width_) {
    // Assumed completion of this truncated excerpt: derive the scaling factors
    width_scaling  = static_cast<float>(network_input_width_) / sensor_width_;
    height_scaling = static_cast<float>(network_input_height_) / sensor_height_;
}
To generalize, the JSON file should be provided along with the model. It should contain:

- An informative model_type
- An input section describing the inputs that should be provided to the model. It itself contains:
  - an inputs section with the various inputs the model might expect. Each input should define its dimensions, along with their names (in the form of a concatenated string, e.g. "NHWC") and its type;
  - a preprocessing section describing the event preprocessing to be applied. This section contains the parameters described here.
- An output section describing the outputs of the model. It contains:
  - an outputs section which defines all the outputs provided by the model, similarly to the inputs (dimension [name, value] pairs and their type);
  - other relevant information such as labels, post-processing parameters, etc., which can be used with a custom JSON loader.
An example is displayed below:
{
"model_type": "classifier",
"input": {
"inputs": {
"cls_input": {
"dimensions": {
"name": "NHWC",
"dim": [
1,
120,
160,
2
]
},
"type": "UINT8"
},
"scale_factor": {
"dimensions": {
"name": "W",
"dim": [
1
]
},
"type": "FLOAT32"
}
},
"preprocessing": {
"type": "hardware_histo",
"delta_t": 10000,
"neg_saturation": 15,
"pos_saturation": 15,
"input_names": [
"cls_input"
]
}
},
"output": {
"outputs": {
"cls_output": {
"dimensions": {
"name": "NC",
"dim": [
1,
4
]
},
"type": "FLOAT32"
}
},
"labels": [
"background",
"paper",
"rock",
"scissor"
]
}
}
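For extra fields such as labels, a small custom loader can be written on top of the parsed tree. Here is a minimal sketch, assuming get_tree_from_file returns a boost::property_tree::ptree (an assumption not stated on this page):

// Hypothetical custom loader for the "labels" array of the example above,
// assuming the parsed tree is a boost::property_tree::ptree
std::vector<std::string> labels;
for (const auto &item : root_node.get_child("output").get_child("labels"))
    labels.push_back(item.second.get_value<std::string>());
// labels now holds {"background", "paper", "rock", "scissor"}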
Python inference
For inference in Python, please refer to this page, which provides inference tools based on the Python API of the Metavision SDK.