# Training a Classification Model

## Introduction

In this tutorial, we will show you how to train a Rock-paper-scissors classifier with the supervised classification module of Metavision ML. We will walk you through the whole training pipeline, from data acquisition to model deployment.

## Dataset

We use the Chifoumi dataset, recorded with a Prophesee EVK1 Gen3.1 (VGA) camera. The dataset contains 2576 samples in total, each of which captures a motion sequence of a gesture entering and leaving the scene. The data was collected under the following conditions:

  • 3 different speeds: slow, normal, fast

  • with left and right hands

  • 10-50 cm distance to the camera

  • 20 recordings contain flickering

The data was first auto-labeled based on the event rate and then corrected manually; the sketch below illustrates the idea.
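
Here is a minimal, hypothetical sketch of event-rate-based auto-labeling. The function name, bin size, and threshold are ours for illustration, not Prophesee's actual labeling tool: a time bin is flagged as containing a gesture when its event count exceeds a threshold.

```python
import numpy as np

def auto_label(ev_ts_us, delta_t=10000, min_events=500):
    """Flag time bins whose event count suggests a gesture is present."""
    bins = (np.asarray(ev_ts_us) // delta_t).astype(int)
    counts = np.bincount(bins)  # events per delta_t timeslice
    return counts >= min_events  # True where a gesture is likely present

# Fake timestamps (in us) just to exercise the function
ts = np.sort(np.random.randint(0, 1_000_000, size=20000))
print(auto_label(ts)[:10])
```

Now let's take a quick look at the data.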

```python
import os

import cv2
import numpy as np

from metavision_core.event_io import EventsIterator
from metavision_sdk_core import BaseFrameGenerationAlgorithm
```

1. Load one data sample

```python
DAT_FILE = "/path/to/chifoumi/scissors_right_close_slow_standing_recording_020_2021-09-14_15-10-27_cd.dat"
# labels reuse the default EventBbox format; the boxes themselves are not needed for classification
label_file = DAT_FILE.split("_cd.dat")[0] + "_bbox.npy"
assert os.path.isfile(label_file), "check your label input path!"
assert os.path.isfile(DAT_FILE), "check your DAT input path!"
LABEL_DICT = {"0": "paper", "1": "rock", "2": "scissor"}
VIZ_WINDOW = "Data Sample Visualization"
```
2. Construct a frame based on annotation frequency

```python
def get_delta_t(npy_data):
    # Annotations are generated at a fixed frequency, so consecutive
    # timestamps differ by a single, constant delta_t
    delta_ts = npy_data["ts"][1:] - npy_data["ts"][:-1]
    return np.unique(delta_ts).item()


labels = np.load(label_file)
delta_t = get_delta_t(labels)
mv_it = EventsIterator(DAT_FILE, delta_t=delta_t)
height, width = mv_it.get_size()
img = np.zeros((height, width, 3), dtype=np.uint8)
cv2.namedWindow(VIZ_WINDOW)

for ev in mv_it:
    # draw the events of the current timeslice into the frame buffer
    BaseFrameGenerationAlgorithm.generate_frame(ev, img)
    t = mv_it.get_current_time()
    if t in labels["ts"]:
        # overlay the class name whenever an annotation exists at this timestamp
        label = LABEL_DICT[str(labels[labels["ts"] == t]["class_id"].item())]
        cv2.putText(img, label, (10, img.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 0, 255))
    cv2.imshow(VIZ_WINDOW, img)
    cv2.waitKey(50)
cv2.destroyWindow(VIZ_WINDOW)
```
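
The _bbox.npy label file loaded above follows Metavision's default EventBbox record layout; for classification, only the timestamp and class fields are used. Here is a toy example of writing and reading such a label array (the two-field dtype below is our simplification, not the full EventBbox dtype):

```python
import numpy as np

# Simplified stand-in for the EventBbox dtype: only the two fields
# the code above actually reads ("ts" and "class_id")
toy_dtype = [("ts", "<u8"), ("class_id", "<u4")]
toy_labels = np.array([(10000, 0), (20000, 0), (30000, 2)], dtype=toy_dtype)
np.save("toy_bbox.npy", toy_labels)

loaded = np.load("toy_bbox.npy")
print(loaded["ts"])        # annotation timestamps in microseconds
print(loaded["class_id"])  # 0 = paper, 1 = rock, 2 = scissor
```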

## Model architecture

The model architecture is quite similar to the detection model described in the Train Detection Tutorial, except that it only needs a classifier head at the end.
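
For intuition, here is a minimal, self-contained sketch of such an architecture. It is not Metavision's actual ConvRNNClassifier, just an illustration of the idea: a convolutional feature extractor, a recurrent layer to aggregate the time bins of the event tensor, and a linear classifier head. The class name and tensor shape (batch, num_tbins, channels, height, width) are our assumptions.

```python
import torch
import torch.nn as nn

class ToyConvRNNClassifier(nn.Module):
    def __init__(self, in_channels=2, feature_channels=128, num_classes=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to one vector per time bin
        )
        self.rnn = nn.GRU(32, feature_channels, batch_first=True)
        self.head = nn.Linear(feature_channels, num_classes)

    def forward(self, x):  # x: (batch, num_tbins, channels, height, width)
        b, t = x.shape[:2]
        feats = self.features(x.flatten(0, 1)).flatten(1).view(b, t, -1)
        out, _ = self.rnn(feats)
        return self.head(out)  # per-time-bin class logits


logits = ToyConvRNNClassifier()(torch.randn(4, 12, 2, 120, 160))
print(logits.shape)  # torch.Size([4, 12, 3])
```

The recurrent layer lets the classifier accumulate evidence across time bins, which suits gestures that unfold over time.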

## Train with PyTorch Lightning

To reduce your development time, we provide the Python sample train_classification.py, which lets you train or fine-tune an event-based classification model on a precomputed HDF5 dataset. The training pipeline is built on the PyTorch Lightning framework. You need to split your whole dataset into 3 subsets: train, val, and test. See the Python sample train_classification.py for details.
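
One plausible on-disk layout for the dataset path is sketched below; this structure is an assumption for illustration, so check the sample's documentation for the exact layout it expects:

```
chifoumi_hdf5/
├── train/   # precomputed HDF5 tensor files + *_bbox.npy labels
├── val/
└── test/
```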

Let’s see what you can do with this training module:

```
python3 train_classification.py --help
usage: train_classification.py [-h] [--lr LR]
                               [--lr_scheduler_step_gamma LR_SCHEDULER_STEP_GAMMA]
                               [--batch_size BATCH_SIZE] [--delta-t DELTA_T]
                               [--wd WD] [-w WEIGHTS [WEIGHTS ...]]
                               [--val_every VAL_EVERY]
                               [--demo_every DEMO_EVERY]
                               [--save_every SAVE_EVERY]
                               [--max_epochs MAX_EPOCHS] [--just_val]
                               [--just_test] [--just_demo] [--resume]
                               [--checkpoint CHECKPOINT]
                               [--precision PRECISION] [--cpu]
                               [--limit_train_batches LIMIT_TRAIN_BATCHES]
                               [--limit_val_batches LIMIT_VAL_BATCHES]
                               [--limit_test_batches LIMIT_TEST_BATCHES]
                               [--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES]
                               [--fast_dev_run] [--finetune]
                               [--height_width HEIGHT_WIDTH HEIGHT_WIDTH]
                               [--num_tbins NUM_TBINS]
                               [--classes CLASSES [CLASSES ...]]
                               [--num_workers NUM_WORKERS] [--skip_us SKIP_US]
                               [--label_delta_t LABEL_DELTA_T]
                               [--use_nonzero_labels]
                               [--disable_data_augmentation]
                               [--train_plus_val]
                               [--models {ConvRNNClassifier,LeNetClassifier}]
                               [--feature_base FEATURE_BASE]
                               [--feature_channels_out FEATURE_CHANNELS_OUT]
                               [--no_window]
                               [--show_plots {cm,pr,roc} [{cm,pr,roc} ...]]
                               [--inspect_result]
                               root_dir dataset_path

positional arguments:
  root_dir              logging directory
  dataset_path          path to the dataset

optional arguments:
  -h, --help            show this help message and exit
  --lr LR               learning rate (default: 0.0001)
  --lr_scheduler_step_gamma LR_SCHEDULER_STEP_GAMMA
                        learning rate scheduler step param (disabled if None)
                        (default: None)
  --batch_size BATCH_SIZE
                        batch size (default: 4)
  --delta-t DELTA_T     timeslice duration in us (default: 10000)
  --wd WD               weight decay (default: 1e-05)
  -w WEIGHTS [WEIGHTS ...], --weights WEIGHTS [WEIGHTS ...]
                        list of weights for each class (default: [])
  --val_every VAL_EVERY
                        validate every X epochs (default: 1)
  --demo_every DEMO_EVERY
                        run demo every X epoch (default: 2)
  --save_every SAVE_EVERY
                        save checkpoint every X epochs (default: 1)
  --max_epochs MAX_EPOCHS
                        train for X epochs (default: 200)
  --just_val            run validation on val set (default: False)
  --just_test           run validation on test set (default: False)
  --just_demo           run demo video with trained network (default: False)
  --resume              resume from latest checkpoint (default: False)
  --checkpoint CHECKPOINT
                        resume from specific checkpoint (default: )
  --precision PRECISION
                        mixed precision training default to float16 (default:
                        16)
  --cpu                 use cpu (default: False)
  --limit_train_batches LIMIT_TRAIN_BATCHES
                        limit train batches to fraction of dataset (default:
                        1.0)
  --limit_val_batches LIMIT_VAL_BATCHES
                        limit val batches to fraction of dataset (default:
                        1.0)
  --limit_test_batches LIMIT_TEST_BATCHES
                        limit test batches to fraction of dataset (default:
                        1.0)
  --accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
                        accumulate gradient for more than a single batch
                        (default: 1)
  --fast_dev_run        run a single batch (default: False)
  --finetune            finetune from checkpoint (default: False)
  --height_width HEIGHT_WIDTH HEIGHT_WIDTH
                        if set, downsize the feature tensor to the
                        corresponding resolution using interpolation (default:
                        None)
  --num_tbins NUM_TBINS
                        timesteps per batch for truncated backprop (default:
                        12)
  --classes CLASSES [CLASSES ...]
                        subset of classes to use (default: [])
  --num_workers NUM_WORKERS
                        number of threads (default: 2)
  --skip_us SKIP_US     skip this amount of microseconds (default: 0)
  --label_delta_t LABEL_DELTA_T
                        delta_t of annotation in (us) (default: 10000)
  --use_nonzero_labels  if set, only use labels which are non-zero (default:
                        True)
  --disable_data_augmentation
                        Disable data augmentation during training (default:
                        True)
  --train_plus_val      if set, train using train+val, test on test (default:
                        False)
  --models {ConvRNNClassifier,LeNetClassifier}
                        model architecture type (default: ConvRNNClassifier)
  --feature_base FEATURE_BASE
                        growth factor of feature extractor (default: 16)
  --feature_channels_out FEATURE_CHANNELS_OUT
                        number of channels per feature-map (default: 128)
  --no_window           Disable output window during demo (only write a video)
                        (default: False)
  --show_plots {cm,pr,roc} [{cm,pr,roc} ...]
                        select evaluation plots on test datasets (default:
                        ['cm', 'pr', 'roc'])
  --inspect_result      Inspect sequential results together with their
                        recording frames (default: False)
```
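
For example, a typical training run could look like this (the paths and option values are placeholders to adapt to your setup):

```
python3 train_classification.py /path/to/logs /path/to/chifoumi_hdf5 --max_epochs 20 --batch_size 4
```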

## KPIs

Once the model is trained with the train/val datasets, you can evaluate it on your test dataset using the --just_test parameter. The reported KPIs are: confusion matrix, PR curve, and ROC curve.
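
For instance (paths are placeholders):

```
python3 train_classification.py /path/to/logs /path/to/chifoumi_hdf5 --just_test --checkpoint /path/to/checkpoint.ckpt --show_plots cm pr roc
```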

The KPIs of the pre-trained Chifoumi model on the test dataset are illustrated below.

[Figures: Confusion Matrix, ROC Curve, PR Curve]

## Inference

Once the model is trained, you can run inference with our Python sample classification_inference.py. It lets you run live inference with a Prophesee event-based camera, or on RAW/DAT recordings and precomputed HDF5 tensor files.

```
python3 classification_inference.py --help
usage: classification_inference.py [-h] [-p PATH] [--delta-t DELTA_T]
                                   [--start-ts START_TS]
                                   [--max-duration MAX_DURATION]
                                   [--height-width HW HW] [-t CLS_THRESHOLD]
                                   [--cpu] [-s SAVE_H5] [-w WRITE_VIDEO]
                                   [--no-display]
                                   [--max_incr_per_pixel MAX_INCR_PER_PIXEL]
                                   [--max_low_activity_tensor MAX_LOW_ACTIVITY_TENSOR]
                                   [--max_low_activity_nb_frames MAX_LOW_ACTIVITY_NB_FRAMES]
                                   [--display_reset_memory]
                                   checkpoint

Perform inference with the classification module

positional arguments:
  checkpoint            path to the checkpoint containing the neural network
                        name.

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  RAW, HDF5 or DAT filename, leave blank to use a
                        camera. Warning if you use an HDF5 tensor file the parameters
                        used for pre-computation must match those of the
                        model. (default: )
  --delta-t DELTA_T     duration of timeslice (in us) in which events are
                        accumulated to compute features. (default: 50000)
  --start-ts START_TS   timestamp (in microseconds) from which the computation
                        begins. (default: 0)
  --max-duration MAX_DURATION
                        maximum duration of the inference file in us.
                        (default: None)
  --height-width HW HW  if set, downscale the feature tensor to the requested
                        resolution using interpolation Possible values are
                        only power of two of the original resolution.
                        (default: None)
  -t CLS_THRESHOLD, --threshold CLS_THRESHOLD
                        classification threshold (default: 0.7)
  --cpu                 run on CPU (default: False)
  -s SAVE_H5, --save SAVE_H5
                        Path of the directory to save the result in an HDF5
                        format (default: )
  -w WRITE_VIDEO, --write-video WRITE_VIDEO
                        Path of the directory to save the visualization in a
                        .mp4 video. (default: )
  --no-display          if set, deactivate the display Window (default: True)
  --max_incr_per_pixel MAX_INCR_PER_PIXEL
                        Maximum number of increments (events) per pixel. This
                        value needs to be consistent with that of the training
                        (default: 2)
  --max_low_activity_tensor MAX_LOW_ACTIVITY_TENSOR
                        Maximum tensor value for a frame to be considered as
                        low activity (default: 0.15)
  --max_low_activity_nb_frames MAX_LOW_ACTIVITY_NB_FRAMES
                        Maximum number of low activity frames before the model
                        internal state is reset (default: 5)
  --display_reset_memory
                        Displays when network is reset (low activity)
                        (default: False)
```
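
For example, to run on a RAW recording with a trained checkpoint (paths are placeholders):

```
python3 classification_inference.py /path/to/checkpoint.ckpt --path /path/to/recording.raw
```

Omit --path to stream from a live camera instead.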