Quick Start

In this section, we will walk through the steps required to create a simple trained model for object detection and use it.

We will:

  • download a dataset

  • convert the dataset to a format compatible with training

  • train a model

  • evaluate the obtained model

  • create a simple pipeline

We assume you have already installed Metavision SDK. If not, then go through the installation instructions.

Note that some parts of this tutorial are computationally intensive: training a model requires significant resources. We suggest using a computer with a dedicated GPU.


This tutorial will use a simple dataset, thus the resulting model will not reach the quality required for a real application and will not give representative results that can be obtained using machine learning on event-based data.


The first step in any machine learning project is data collection. In this example, we will not collect the data ourselves, as this would require a significant amount of work. We will instead use a simplified mini-dataset (for larger-scale datasets, check the Recordings and Datasets page).

First download and unzip the dataset in a convenient directory on your computer (beware that this dataset, while called “mini”, is still 24G uncompressed, so make sure you have enough bandwidth and disk space).

The dataset is composed of a JSON file and three directories. The JSON file label_map_dictionary.json contains the mapping between the classes that are present in the dataset and their numerical IDs. In our simple dataset, this is the content of the file:

{"0": "pedestrian", "1": "two wheeler", "2": "car", "3": "truck", "4": "bus", "5": "traffic sign", "6": "traffic light"}

The three folders contain the data for training, validation and testing. Each folder contains a list of twin files, in this format:

  • <record_name>.dat, which contains the event-based data

  • <record_name>_bbox.npy, which contains the ground truth labels for supervised training

These files are examples of what you could obtain by recording your own data using Metavision Studio and creating the labels manually.

Now that we have a dataset, we need to convert the data in a format that is compatible with our ML tools.

Convert the dataset

Converting a dataset means transforming the RAW/DAT files into tensors containing a preprocessed representation of event-based data. More details can be found in the preprocessing chapter. We store the data into HDF5 tensor files, which is a storage format commonly used in machine learning.

Let’s convert the data, by running the generate_hdf5.py script:


cd <path to generate_hdf5.py>
python3 generate_hdf5.py <path to dataset>/*.dat --preprocess histo -o <path to the output folder> --height_width 360 640


cd <path to generate_hdf5.py>
python generate_hdf5.py <path to dataset>/*.dat --preprocess histo -o <path to the output folder> --height_width 360 640

We use the histo preprocessing to create histograms of events at a fixed frequency in the output directory. Remember to copy the label and dictionary files to the same output directory. We also scale the input data by a factor of 2 using --height_width 360 640. Only power of 2 of the sensor original resolutions are available. In this case the sensor resolution is 720p or (720 x 1280).

We can visualize the converted dataset using the viz_data.py script. This will allow us to have a look at the dataset, verifying that it is opened correctly by our tools and that the ground truth is correct. Add show-bbox to visualize the GT.


cd <path to viz_data.py>
python3 viz_data.py <path to dataset> [--show-bbox]


cd <path to viz_data.py>
python viz_data.py <path to dataset> [--show-bbox]

This is an example of the output of viz_data.py:

the output of viz_data.py

Now we have the .h5 files which can be used for training.


Some precomputed datasets for automative detection are listed on the Datasets page and are available for download.


Training can be done using our pre-available training script. This includes a network topology which is well suited for object detection.

To verify that everything is correctly set up and your data is in the expected format, you can do a debug run. We can do it by asking the training tool to only consider a subset of the full dataset and run only few epochs. For example, using only 1% of the dataset and two epochs will allow us to verify quickly that everything is working as expected:


cd <path to train_detection.py>
python3 train_detection.py <path to output directory> <path to dataset> --limit_train_batches 0.01 --limit_val_batches 0.01 --limit_test_batches 0.01 --max_epochs 2


cd <path to train_detection.py>
python train_detection.py <path to output directory> <path to dataset> --limit_train_batches 0.01 --limit_val_batches 0.01 --limit_test_batches 0.01 --max_epochs 2

This debug training run did not create any usable model, but it lasted only few minutes and can be a useful tool to avoid surprises.

Finally, we can run a complete training using:


cd <path to train_detection.py>
python3 train_detection.py <path to output directory> <path to dataset>


cd <path to train_detection.py>
python train_detection.py <path to output directory> <path to dataset>

This is an example of the console output. You can monitor the status of the training epoch and get KPIs at the end of each epoch. In this image you can see the KPIs of the epoch 1 and the progress of the epoch 2:

a sample of the console output

You can also supervise the training using TensorBoard by executing the following command and then opening http://localhost:6006/:

tensorboard --logdir <path to output directory>

This is an example of the TensorBoard page at the end of the training:

a sample of TensorBoard

Periodically, and at the end of the training, the script will generate checkpoints, which are savepoints of the trained model, and videos as a random collection of validation data to visually validate the training process.

After the specified number of epochs, the training process ends. The last created checkpoint will be your final model. You can find the model in <path to output directory>/checkpoints.

Now that we have the trained model, we can compute the metrics on the test set and use it in a pipeline.

As you can see, the results we get are poor. This is expected, we are using only 4 training videos, which means that the network does not have enough data to learn.

To obtain better results, you should use a larger dataset. However, this will require enough space to store all the data locally and might require several hours of processing.


Finally, we can use our trained model in a pipeline.

You can use it in our sample Python pipeline or C++ pipeline.

To run our C++ pipeline:

  • First, export the model as a torchjit model:


    cd <path to export_detector.py>
    python3 export_detector.py <path to checkpoint> <output directory for the exported model>


    cd <path to export_detector.py>
    python export_detector.py <path to checkpoint> <output directory for the exported model>
  • Launch the Metavision Detection and Tracking Pipeline (note that you need to compile it first as explained in the sample documentation)


    metavision_detection_and_tracking_pipeline --record-file <RAW file to process> --object-detector-dir <path to the exported model> --display


    metavision_detection_and_tracking_pipeline.exe --record-file <RAW file to process> --object-detector-dir <path to the exported model> --display

Next steps

In this tutorial, we used a labeled dataset to train a model and then, the resulted model was used in a pipeline to detect objects. You have seen how it is possible to benefit from machine learning using event-based data. To make the most of machine learning and event-based data and obtain the best results for your application, we suggest you to continue exploring the ML documentation and start enjoying the power of Metavision ML.