Training an EB classification model


In this tutorial, we will show you how to train a Rock-paper-scissors classifier with the supervised classification module of Metavision ML. We will walk you through the whole training pipeline, from data acquisition to model deployment.


We use Chifoumi dataset, recorded with Prophesee EVK1 Gen3.1 (VGA). The dataset contains in total 2576 samples, each of which captures a motion sequence of gesture entering and leaving the scene. The data is collected under the following conditions:

  • 3 different speed: slow, normal, fast

  • with left and right hand

  • 10-50 cm distance to the camera

  • 20 recordings contains flickering

The data was first auto-labeled based on the event rate and then corrected manually. Let’s take a quick look at the data.

import os
import numpy as np
import torch
from metavision_core.event_io import EventsIterator
from metavision_sdk_core import BaseFrameGenerationAlgorithm
import cv2
zstandard not installed, defaults to zlib
  1. Load one data sample

DAT_FILE = "../"*6 + "/datasets/metavision_ml/classification/paper_200211_101249_0_0_td.dat" # path to your chifoumi data sample
label_file = DAT_FILE.split("_td.dat")[0] + "_bbox.npy" # leveraging default EventBbox format, no bbox is required
assert os.path.isfile(label_file), "check your label input path!"
assert os.path.isfile(DAT_FILE), "check your DAT input path!"
LABEL_DICT = {"0": "paper", "1": "rock", "2": "scissor"}
VIZ_WINDOW = "Data Sample Visualization"
  1. Construct a frame based on annotation frequency

def get_delta_t(npy_data):
    deltaTs = npy_data["ts"][1:] - npy_data['ts'][:-1]
    delta_t =  np.unique(deltaTs).item()
    return delta_t
labels = np.load(label_file)
deltaT = get_delta_t(labels)
mv_it = EventsIterator(DAT_FILE, delta_t=deltaT)
height, width = mv_it.get_size()
img = np.zeros((height, width, 3), dtype=np.uint8)

for idx, ev in enumerate(mv_it):
    BaseFrameGenerationAlgorithm.generate_frame(ev, img)
    t = mv_it.get_current_time()
    if t in labels["ts"]:
        label = LABEL_DICT[str(labels[labels['ts']==t]['class_id'].item())]
        cv2.putText(img, label, (10, img.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 2, (0,0, 255))
    cv2.imshow(VIZ_WINDOW, img)

## Model architecture Fa The model architecture is quite similar to the detection model described in Train Detection Tutorial, except that we only need a classifer head in the end.

Train with Pytorch lightning

To reduce your development time, we’ve created a Python sample, which allows you to train, fine-tune an EB classification model based on precomputed HDF5 dataset. The training pipeline is set up based on PyTorch Lightning framework. You need to split your whole dataset into 3 subsets: train, dev, and test. See details in Python sample

Let’s see what you can do with this training module:

rel_path = "../"*6+"sdk/modules/ml/python_extended/samples/train_classification" # path to your train_classification folder
%run $rel_path/ --help

## KPIs

Once the model is trained with train/val dataset, it will automatically run an inference evaluation on your test dataset with the following KPIs: confusion matrix, PR-curve and ROC curve.

The KPIs of the pretrained Chifoumi model on the test dataset are illustrated below.








Once the model is trained, you can run inference test with our Python sample It allows you to run live inference with Prophesee EB camera, RAW/DAT recordings, and precomputed HDF5 files.

rel_path = "../"*6+"sdk/modules/ml/python_extended/samples" # path to your python samples
%run $rel_path/classification_inference/ --help
usage: [-h] [-p PATH] [--delta-t DELTA_T]
                                   [--start-ts START_TS]
                                   [--max-duration MAX_DURATION]
                                   [--height-width HW HW] [-t CLS_THRESHOLD]
                                   [--cpu] [-s SAVE_H5] [-w WRITE_VIDEO]
                                   [--max_incr_per_pixel MAX_INCR_PER_PIXEL]
                                   [--max_low_activity_tensor MAX_LOW_ACTIVITY_TENSOR]
                                   [--max_low_activity_nb_frames MAX_LOW_ACTIVITY_NB_FRAMES]

Perform inference with the classification module

positional arguments:
  checkpoint            path to the checkpoint containing the neural network

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  RAW, HDF5 or DAT filename, leave blank to use a
                        camera.Warning if you use a HDF5 file the parameters
                        used for pre-computation must match those of the
                        model. (default: )
  --delta-t DELTA_T     duration of timeslice (in us) in which events are
                        accumulated to compute features. (default: 50000)
  --start-ts START_TS   timestamp (in microseconds) from which the computation
                        begins. (default: 0)
  --max-duration MAX_DURATION
                        maximum duration of the inference file in us.
                        (default: None)
  --height-width HW HW  if set, downscale the feature tensor to the requested
                        resolution using interpolation Possible values are
                        only power of two of the original resolution.
                        (default: None)
                        classification threshold (default: 0.7)
  --cpu                 run on CPU (default: False)
  -s SAVE_H5, --save SAVE_H5
                        Path of the directory to save the result in a hdf5
                        format (default: )
  -w WRITE_VIDEO, --write-video WRITE_VIDEO
                        Path of the directory to save the visualization in a
                        .mp4 video. (default: )
  --no-display          if set, deactivate the display Window (default: True)
  --max_incr_per_pixel MAX_INCR_PER_PIXEL
                        Maximum number of increments (events) per pixel. This
                        value needs to be consistent with that of the training
                        (default: 2)
  --max_low_activity_tensor MAX_LOW_ACTIVITY_TENSOR
                        Maximum tensor value for a frame to be considered as
                        low activity (default: 0.15)
  --max_low_activity_nb_frames MAX_LOW_ACTIVITY_NB_FRAMES
                        Maximum number of low activity frames before the model
                        internal state is reset (default: 5)
                        Displays when network is reset (low activity)
                        (default: False)