Using ML models with events
We provide some models for the Core ML and ML modules, as explained in Pre-trained Models.
This section describes how to use them through both the C++ and Python Metavision SDK APIs.
C++ inference
C++ classes
The C++ API provides a generic Model class that allows loading a variety of ML models for inference.
Currently, the SDK supports two inference backends, Torch and ONNX, which are widely used frameworks
for deploying machine learning models. To instantiate a model, use the create_model static function,
which expects a map of parameters (indexed by their name) that will be used by the concrete backend
implementation. Some parameters are common to all backends, and some are specific to each backend.
Here is a breakdown of the parameters common to all backends:
Parameter | Type | Description
---|---|---
model-path | string | The path to the model file
use-cuda | boolean | A boolean to enable or disable CUDA support
gpu-id | integer | The GPU ID to use for CUDA support
backend | string | The backend to use for the inference
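As an illustration, here is a hedged sketch of how a Torch model could be instantiated with these parameters. The string-valued std::map and the qualified name Metavision::Model::create_model are assumptions, not confirmed by this page; check the SDK headers for the actual signature.

// Hypothetical instantiation sketch: map and return types are assumptions
std::map<std::string, std::string> params{
    {"model-path", "/path/to/model"}, // path to the model file
    {"use-cuda", "true"},             // enable CUDA support
    {"gpu-id", "0"},                  // GPU ID to use for CUDA support
    {"backend", "torch"}              // backend to use for the inference
};
auto model = Metavision::Model::create_model(params);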
And here are the parameters specific to the ONNX backend:
Parameter | Type | Description
---|---|---
optimization-level | string | The optimization level to use for the ONNX model (one of the graph optimization levels defined by ONNX Runtime)
intra-op-num-threads | integer | The number of threads to use for intra-op parallelism
inter-op-num-threads | integer | The number of threads to use for inter-op parallelism
with-xnnpack | boolean | A boolean to enable or disable XNNPACK support
profile | string | The path to the profile file to use for the ONNX model
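Similarly, here is a hedged sketch of a parameter map selecting the ONNX backend; the key names come from the tables above, while the values and the map type are illustrative assumptions:

// Hypothetical ONNX parameter map (values are illustrative)
std::map<std::string, std::string> onnx_params{
    {"model-path", "/path/to/model.onnx"},
    {"backend", "onnx"},
    {"intra-op-num-threads", "4"},    // threads for intra-op parallelism
    {"with-xnnpack", "false"}         // disable XNNPACK support
};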
Note
For more information about the ONNX backend parameters, please refer to the official ONNX Runtime documentation.
The SDK is delivered as pre-compiled binaries with support for Torch models.
Support for ONNX models can be enabled by compiling the SDK from source with the options -DUSE_ONNXRUNTIME=ON and -DONNXRUNTIME_DIR=<ONNX_FOLDER>.
Note
The proposed Model class is provided to demonstrate the feasibility of applying ML models to event data (integrated within a Tensor structure). These classes were designed for flexibility and simplicity of integration rather than for speed.
The Model class provides a get_input function to retrieve the inputs to be provided to the model at inference time, as well as a get_output function to retrieve the data produced by the model. Both come as std::unordered_map structures containing Values, which are a generic way to host Tensor instances at various depths (some models may provide several layers of inferred data).
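For instance, here is a minimal sketch of listing the inputs a model expects; apart from get_input and the std::unordered_map structure, the accessors shown are assumptions:

// Hedged sketch: list the input names the model expects (e.g. "cls_input")
auto &inputs = model->get_input();
for (const auto &named_value : inputs)
    std::cout << "expected input: " << named_value.first << std::endl;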
Note
The Values might provide Tensors with -1 dimension values. This is the case when a model has a variable dimension, such as the dimensions of the images used to fill the tensor. In that case, those dimensions should be fixed before using the model, either at model production time or before inference (as is done in the Optical Flow sample):
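// Excerpt from the Optical Flow sample: the variable output width is fixed
// from the known network input width before running inference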
output_flow_width_ = network_input_width_;
Most importantly, the infer method runs the actual inference from the input data. The output map is updated and can be used afterwards.
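Putting it together, here is a hedged sketch of one inference step; only the function names get_input, infer and get_output come from this page, and whether infer takes the inputs explicitly is an assumption:

// Hedged sketch of one inference step
auto &inputs = model->get_input();          // 1. fill these Tensors with preprocessed events
model->infer();                             // 2. run the model (signature may take inputs explicitly)
const auto &outputs = model->get_output();  // 3. the updated output map can now be read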
A few samples make use of the Model class for different types of inference.
Model JSON file
Another important point is that this class expects the model to come along with a JSON file of the same name that describes its inputs and outputs. This JSON file also contains the preprocessing parameters, so it is important to be able to load the model parameters from it. This can be done with the parse_preprocessors_params function, as in the example below.
const auto root_node = Metavision::get_tree_from_file(network_json_filename);
Metavision::parse_preprocessors_params(root_node.get_child("input").get_child("preprocessing"), preprocess_maps);

float width_scaling  = 1.f;
float height_scaling = 1.f;
// Create a rescaler if the resolution of the camera/recording doesn't match the network input
if (network_input_height_ != sensor_height_ || network_input_width_ != sensor_width_) {
    // Assumed completion of this truncated excerpt: derive the scaling factors
    width_scaling  = static_cast<float>(network_input_width_) / sensor_width_;
    height_scaling = static_cast<float>(network_input_height_) / sensor_height_;
}
To generalize, the JSON file should be provided along with the model. It should contain:

- An informative model_type
- An input section describing the inputs that should be provided to the model. It itself contains:
  - an inputs section with the various inputs the model might expect. Each input should define its dimensions, along with their names (in the form of a concatenated string, e.g. "NHWC") and its type;
  - a preprocessing section describing the event preprocessing to be applied. This section contains the parameters described here.
- An output section describing the outputs of the model. It contains:
  - an outputs section which defines all the outputs provided by the model, similarly to the inputs (dimension [name, value] pairs and their type);
  - other relevant information such as labels, post-processing parameters, etc., which can be used with a custom JSON loader.
An example is displayed below:
{
"model_type": "classifier",
"input": {
"inputs": {
"cls_input": {
"dimensions": {
"name": "NHWC",
"dim": [
1,
120,
160,
2
]
},
"type": "UINT8"
},
"scale_factor": {
"dimensions": {
"name": "W",
"dim": [
1
]
},
"type": "FLOAT32"
}
},
"preprocessing": {
"type": "hardware_histo",
"delta_t": 10000,
"neg_saturation": 15,
"pos_saturation": 15,
"input_names": [
"cls_input"
]
}
},
"output": {
"outputs": {
"cls_output": {
"dimensions": {
"name": "NC",
"dim": [
1,
4
]
},
"type": "FLOAT32"
}
},
"labels": [
"background",
"paper",
"rock",
"scissor"
]
}
}
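For extra fields such as labels, a small custom loader can be written on top of the parsed tree. Here is a minimal sketch, assuming get_tree_from_file returns a boost::property_tree::ptree (an assumption not stated on this page):

// Hypothetical custom loader for the "labels" array of the example above,
// assuming the parsed tree is a boost::property_tree::ptree
std::vector<std::string> labels;
for (const auto &item : root_node.get_child("output").get_child("labels"))
    labels.push_back(item.second.get_value<std::string>());
// labels now holds {"background", "paper", "rock", "scissor"}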
Python inference
For inference in Python, please refer to this page, which provides inference tools based on the Python API of the Metavision SDK.