Object Detection

Object detection with the FCOS model.

This document explains the arguments of each script.

You can find the tutorial for fine-tuning a pretrained model on a custom dataset in the tutorial folder, at tutorial/README.md.

An IPython notebook version of the tutorial is also provided in the tutorial folder as tutorial/tutorial.ipynb. You can upload and run this notebook on Google Colab.

Prerequisites

Python 3.6 or 3.7

Installation

To install the dependencies, run

$ pip install -U pip
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace

Dataset & Preparation

Standard Datasets

Our training script accepts the standard PASCAL VOC and MS COCO datasets. You can download the datasets using the following links:

Download 2012 PASCAL VOC Dataset
Download 2017 MS COCO Dataset

Custom Datasets

You can also train the model on a custom dataset. The custom dataset is expected to follow the YOLO format; see the YOLOv5 documentation for more details.
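As a rough illustration of the YOLO label format: each image has a text file with one line per object, written as `class_id x_center y_center width height`, with the four coordinates normalized to [0, 1] by the image size. The sketch below converts one such line to pixel corner coordinates. The function name `yolo_line_to_pixel_box` is hypothetical and is not part of this repository.

```python
def yolo_line_to_pixel_box(line, img_width, img_height):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got %d" % len(parts))
    class_id = int(parts[0])
    # Center and size are fractions of the image width/height.
    x_c, y_c, w, h = (float(v) for v in parts[1:])
    left = (x_c - w / 2.0) * img_width
    top = (y_c - h / 2.0) * img_height
    right = (x_c + w / 2.0) * img_width
    bottom = (y_c + h / 2.0) * img_height
    return class_id, left, top, right, bottom
```

For example, the line `1 0.5 0.5 0.5 0.5` on a 100x200 image describes a box of class 1 centered in the image and covering half of each dimension.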

Annotation Tools

You can use makesense.ai to create bounding boxes and labels for your images. For more details, see the makesense.ai documentation. The tutorial document also includes an example of annotating custom data with makesense.ai.

dataset.yaml

For the COCO dataset, prepare a yaml file and save it as ./data/coco.yaml. The yaml file is expected to have the following format:

data_root: path to coco dataset directory

# type of dataset
dataset_type: coco

val_set_name: val2017
train_set_name: train2017
train_annotations_path: path to coco training annotations
val_annotations_path: path to coco validation annotations

For the PASCAL VOC dataset, prepare a yaml file and save it as ./data/pascal.yaml. The yaml file is expected to have the following format:

data_root: path_to_voc_dataset/VOCdevkit/VOC2012
train: 'trainval'
val: 'val'

# type of dataset
dataset_type: pascal

For a custom dataset, prepare a yaml file and save it under ./data/. The yaml file is expected to have the following format (the same as YOLOv5):

train: path to training dataset directory 
val: path to validation dataset directory   

nc: number of classes

names: list of class names
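A minimal sketch of generating the custom dataset yaml with the four keys listed above, using only the standard library. The helper name `build_custom_dataset_yaml` and the example paths are assumptions for illustration; the repository does not provide this function.

```python
import json


def build_custom_dataset_yaml(train_dir, val_dir, class_names):
    """Return yaml text with the train, val, nc, and names keys."""
    lines = [
        # json.dumps produces double-quoted strings, which YAML also accepts.
        "train: " + json.dumps(train_dir),
        "val: " + json.dumps(val_dir),
        "",
        "nc: %d" % len(class_names),
        "",
        # A JSON array is also a valid YAML flow sequence.
        "names: " + json.dumps(list(class_names)),
    ]
    return "\n".join(lines) + "\n"


# Hypothetical usage: the paths and class names below are placeholders.
yaml_text = build_custom_dataset_yaml("data/train", "data/val", ["cat", "dog"])
print(yaml_text)
```

Deriving `nc` from the length of the class list keeps the two fields from drifting apart when classes are added or removed.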

Train

All outputs (log files and checkpoints) are saved to the snapshot directory specified by --snapshot-path. To train, run the following command in the fcos directory:

python train.py --backbone backbone_model_name --snapshot path_to_pretrained_model --freeze-backbone --batch-size 4 --gpu 0 --data path_to_data_yaml_file

--backbone Which backbone model to use.

--snapshot The path to the pretrained model.

--freeze-backbone Whether to freeze the backbone when a pretrained model is used (True/False)

--gpu Which GPU to use. (-1 if cpu)

--batch-size Batch size. (Default: 4)

--epochs Number of epochs to train. (Default: 100)

--steps Number of steps per epoch. (Default: 5000)

--lr Learning rate. (Default: 1e-4)

--fpn The type of FPN model. Options: bifpn, dla, fpn, pan, simple (Default: simple) (Recommended: simple or pan)

--reg-func The type of regression function. Options: exp, simple (Default: simple)

--stage The number of stages. Options: 3, 5 (Default: 3)

--head-type The type of head. Options: ori, simple (Default: simple)

--centerness-pos Centerness branch position. Options: cls, reg (Default: reg)

--snapshot-path Path to store snapshots of models during training (Default: 'snapshots/{}'.format(today))

--input-size Input size of the model (Default: (512, 512))

--data The path to the data yaml file.

If the validation mAP does not improve for 5 consecutive epochs, early stopping is triggered and training terminates.
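The early stopping rule can be sketched as a small counter over per-epoch validation mAP values. This is an illustration only: the class name `EarlyStopping` is hypothetical, and it assumes "stops increasing" means no improvement over the best mAP seen so far, which may differ from what train.py actually does.

```python
class EarlyStopping:
    """Signal a stop after `patience` epochs without a new best validation mAP."""

    def __init__(self, patience=5):
        self.patience = patience
        self.best_map = float("-inf")
        self.epochs_without_improvement = 0

    def update(self, val_map):
        """Record this epoch's validation mAP; return True if training should stop."""
        if val_map > self.best_map:
            # New best score: remember it and reset the counter.
            self.best_map = val_map
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Illustrative mAP values, not real training results.
stopper = EarlyStopping(patience=5)
for epoch, val_map in enumerate([0.30, 0.35, 0.34, 0.35, 0.33, 0.35, 0.34]):
    if stopper.update(val_map):
        print("stopping after epoch", epoch)
        break
```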

Inference

For model inference on a single image:

python inference.py --snapshot path_to_pretrained_model --input-shape model_input_size --gpu 0 --class-id-path path_to_class_id_mapping_file --img-path path_to_image --save-path path_to_saved_image

--snapshot The path to the pretrained model.

--gpu Which GPU to use. (-1 if cpu) (Default: -1)

--input-shape Input shape of the model (Default: (512, 512))

--img-path Path to the image.

--save-path Path to save the image with the bounding boxes drawn on it.

--save-preds-path Path to save the inference bbox results.

--class-id-path Path to the class id mapping file. (Default: COCO class id mapping)

--max-objects The maximum number of objects in the image. (Default: 100)

--score-thres The score threshold of bounding boxes. (Default: 0.6)

--iou-thres The IoU threshold for NMS. (Default: 0.5)

--max-objects Whether to use non-maximum suppression (NMS). (Default: 1)

You can find the preprocessing and postprocessing code in fcos/utils/fcos_det_preprocess.py and fcos/utils/fcos_det_postprocess.py.
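To show how the score threshold, IoU threshold, and maximum object count typically interact, here is a minimal pure-Python sketch of score filtering followed by greedy NMS. It is not the code in fcos_det_postprocess.py: the function names, the (x1, y1, x2, y2, score, class_id) box layout, and the per-class suppression are all assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_thres=0.6, iou_thres=0.5, max_objects=100):
    """Keep high-scoring boxes, suppressing same-class boxes that overlap a kept box.

    detections: list of (x1, y1, x2, y2, score, class_id) tuples.
    """
    # Drop low-confidence boxes, then visit the rest from highest score down.
    candidates = sorted(
        (d for d in detections if d[4] >= score_thres),
        key=lambda d: d[4],
        reverse=True,
    )
    kept = []
    for det in candidates:
        if len(kept) >= max_objects:
            break
        suppressed = any(
            k[5] == det[5] and iou(k[:4], det[:4]) > iou_thres for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept
```

Visiting candidates in descending score order means that, among heavily overlapping boxes of the same class, only the most confident one survives.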

Convert to ONNX

Pull the latest ONNX converter from GitHub, and refer to its latest documentation for details on converting models to ONNX. Run the following command in the ONNX_Convertor/keras-onnx folder:

python generated_onnx.py -o outputfile.onnx inputfile.h5

Evaluation

Evaluation Metric

We use mean Average Precision (mAP) for evaluation. You can find the script for computing mAP in utils/eval.py.

mAP: mAP is the average of Average Precision (AP). AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

$AP = \sum_n (R_n-R_{n-1})P_n$

where $R_n$ and $P_n$ are the recall and precision at the $n$th threshold. mAP is computed by matching detected boxes against the ground-truth bounding boxes. The higher the score, the more accurate the model's detections.
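The AP formula can be written directly as a short function. This is a sketch of the formula only, assuming $R_0 = 0$ and recall values listed in increasing order; utils/eval.py may use a different variant (for example, interpolated precision), and the function names here are hypothetical.

```python
def average_precision(recalls, precisions):
    """AP = sum over n of (R_n - R_{n-1}) * P_n, taking R_0 = 0."""
    if len(recalls) != len(precisions):
        raise ValueError("recalls and precisions must have the same length")
    ap = 0.0
    prev_recall = 0.0
    for recall, precision in zip(recalls, precisions):
        # Weight each precision by the increase in recall since the last threshold.
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(ap_per_class):
    """mAP is the plain average of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)
```

For instance, with recalls [0.5, 1.0] and precisions [1.0, 0.5], the sum is 0.5 * 1.0 + 0.5 * 0.5 = 0.75.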

Evaluation on a Dataset

To evaluate a trained model on a dataset:

python utils/eval.py --snapshot path_to_pretrained_model --gpu 0 --input-shape model_input_size --data path_to_data_yaml_file

--snapshot Path to the pretrained model.

--gpu Which GPU to use. (-1 if cpu) (Default: -1)

--input-shape Input shape of the model (Default: (512, 512))

--class-id-path Path to the class id mapping file.

--data The path to the data yaml file.

End-to-End Evaluation

If you would like to perform an end-to-end test with an image dataset, you can use inference_e2e.py under the directory fcos to obtain the prediction results. You have to prepare an initial parameter yaml file for the inference runner. You may check utils/init_params.json for the format.

python inference_e2e.py --img-path path_to_dataset_folder --params path_to_init_params_file --save-path path_to_save_json_file

--img-path Path to the dataset directory

--params Path to initial parameter yaml file for the inference runner

--save-path Path to save the prediction to a json file

--gpu GPU id (-1 if cpu) (Default: -1)

The predictions will be saved into a json file that has the following structure:

[
    {"img_path": image_path_1,
     "bbox": [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id]]
    },
    {"img_path": image_path_2,
     "bbox": [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id]]
    },
    ...
]
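A minimal sketch of reading the prediction file back, assuming it is standard JSON with the `img_path` and `bbox` fields shown above and that each box is `[l, t, w, h, score, class_id]` with `l, t` as the top-left corner. The function name `parse_predictions` and the output dictionary keys are hypothetical.

```python
import json


def parse_predictions(json_text, min_score=0.0):
    """Map each image path to a list of boxes with corner coordinates."""
    results = {}
    for record in json.loads(json_text):
        boxes = []
        for l, t, w, h, score, class_id in record["bbox"]:
            if score < min_score:
                continue
            boxes.append({
                "left": l,
                "top": t,
                # Convert width/height to the bottom-right corner.
                "right": l + w,
                "bottom": t + h,
                "score": score,
                "class_id": int(class_id),
            })
        results[record["img_path"]] = boxes
    return results
```

To use it on a saved file, read the file contents first, for example `parse_predictions(open(path_to_save_json_file).read(), min_score=0.5)`.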

Models

Backbone	Input Size	FPN Type	FPS on 520	FPS on 720	Model Size
darknet53s	512	simple	5.96303	36.6844	25.3M
darknet53s	416	pan	7.27369	48.8437	33.9M
darknet53ss	416	simple	20.6361	136.093	6.9M
darknet53ss	320	simple	33.9502	252.713	6.9M
resnet18	512	simple	5.75156	33.9144	25.2M
resnet18	416	simple	8.04252	52.9392	25.2M
resnet18	320	simple	13.0232	94.5782	25.2M
resnet18	512	pan	4.88634	30.1866	33.8M
resnet18	416	pan	6.8977	46.9993	33.8M
resnet18	320	pan	10.9281	82.4277	33.8M

\	darknet53s
mAP	44.8%