Object Detection

Object detection with the FCOS model.

This document explains the arguments of each script.

A tutorial on fine-tuning a pretrained model on a custom dataset is available in the tutorial folder at tutorial/README.md.

An IPython notebook version of the tutorial is also provided at tutorial/tutorial.ipynb. You can upload and run this notebook on Google Colab.



To install the dependencies, run

$ pip install -U pip
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace

Dataset & Preparation

Standard Datasets

Our training script accepts the standard PASCAL VOC and MS COCO datasets, which can be downloaded from their official websites.

Custom Datasets

You can also train the model on a custom dataset, which is expected to follow the YOLO format. See the YOLOv5 documentation for more details.
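As a rough illustration of the YOLO label format (one .txt file per image, one line per object: class_id x_center y_center width height, with coordinates normalized by the image width and height), the sketch below converts a pixel box to a label line and back. The helpers to_yolo_line and from_yolo_line are hypothetical and not part of this repository; the YOLOv5 documentation remains the authoritative reference for the format.

```python
def to_yolo_line(class_id, left, top, box_w, box_h, img_w, img_h):
    """Convert a pixel box (left, top, width, height) into a YOLO label line."""
    # YOLO stores the box center and size, normalized to [0, 1] by the image size.
    x_center = (left + box_w / 2) / img_w
    y_center = (top + box_h / 2) / img_h
    return (f"{class_id} {x_center:.6f} {y_center:.6f} "
            f"{box_w / img_w:.6f} {box_h / img_h:.6f}")


def from_yolo_line(line, img_w, img_h):
    """Parse a YOLO label line back into (class_id, left, top, width, height) in pixels."""
    cls, xc, yc, w, h = line.split()
    box_w = float(w) * img_w
    box_h = float(h) * img_h
    left = float(xc) * img_w - box_w / 2
    top = float(yc) * img_h - box_h / 2
    return int(cls), left, top, box_w, box_h


if __name__ == "__main__":
    # A 200x100 box with its top-left corner at (100, 50) in a 640x480 image, class 0.
    line = to_yolo_line(0, 100, 50, 200, 100, 640, 480)
    print(line)  # → 0 0.312500 0.208333 0.312500 0.208333
    print(from_yolo_line(line, 640, 480))
```

The round trip is only approximate because the label line is written with six decimal places.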

Annotation Tools

You can use makesense.ai to draw bounding boxes and assign labels to your images; see the makesense.ai website and its documentation for details. An example of annotating custom data with makesense.ai is also provided in the tutorial.


For the COCO dataset, prepare a YAML file and save it as ./data/coco.yaml. The file is expected to have the following format:

data_root: path to COCO dataset directory

# type of dataset
dataset_type: coco

val_set_name: val2017
train_set_name: train2017
train_annotations_path: path to COCO training annotations file
val_annotations_path: path to COCO validation annotations file

For the Pascal VOC dataset, prepare a YAML file and save it as ./data/pascal.yaml. The file is expected to have the following format:

data_root: path_to_voc_dataset/VOCdevkit/VOC2012
train: 'trainval'
val: 'val'

# type of dataset
dataset_type: pascal

For a custom dataset, prepare a YAML file and save it under ./data/. The file is expected to have the following format (the same as YOLOv5):

train: path to training dataset directory
val: path to validation dataset directory

nc: number of classes

names: list of class names
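Once a data YAML file has been loaded into a Python dict (for example with PyYAML's yaml.safe_load), a check like the sketch below can confirm that it contains the keys listed for each of the three layouts above. REQUIRED_KEYS and check_data_config are hypothetical helpers for illustration, not part of the training script, which may validate its input differently.

```python
# Required keys per dataset type, taken from the three YAML layouts described above.
REQUIRED_KEYS = {
    "coco": {"data_root", "dataset_type", "train_set_name", "val_set_name",
             "train_annotations_path", "val_annotations_path"},
    "pascal": {"data_root", "dataset_type", "train", "val"},
    "custom": {"train", "val", "nc", "names"},
}


def check_data_config(cfg):
    """Return the dataset kind of a parsed data YAML (a dict) or raise ValueError.

    Assumption: a config without a dataset_type key is treated as a custom
    (YOLOv5-style) dataset, since the custom layout has no such key.
    """
    kind = cfg.get("dataset_type", "custom")
    if kind not in REQUIRED_KEYS:
        raise ValueError(f"unknown dataset_type: {kind!r}")
    missing = REQUIRED_KEYS[kind] - set(cfg)
    if missing:
        raise ValueError(f"{kind} config is missing keys: {sorted(missing)}")
    # For custom datasets, nc must match the number of class names.
    if kind == "custom" and len(cfg["names"]) != cfg["nc"]:
        raise ValueError(f"nc={cfg['nc']} but {len(cfg['names'])} names given")
    return kind


if __name__ == "__main__":
    custom = {"train": "data/train", "val": "data/val", "nc": 2, "names": ["cat", "dog"]}
    print(check_data_config(custom))  # → custom
```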


All outputs (log files and checkpoints) are saved to the snapshot directory specified by --snapshot-path. To train, run the following command in the fcos directory:

python train.py --backbone backbone_model_name --snapshot path_to_pretrained_model --freeze-backbone --batch-size 4 --gpu 0 --data path_to_data_yaml_file 

--backbone Which backbone model to use.

--snapshot Path to the pretrained model.

--freeze-backbone Whether to freeze the backbone when a pretrained model is used (True/False)

--gpu Which GPU to use (-1 for CPU)

--batch-size Batch size. (Default: 4)

--epochs Number of epochs to train. (Default: 100)

--steps Number of steps per epoch. (Default: 5000)

--lr Learning rate. (Default: 1e-4)

--fpn The type of FPN model. Options: bifpn, dla, fpn, pan, simple (Default: simple) (Recommended: simple or pan)

--reg-func The type of regression function. Options: exp, simple (Default: simple)

--stage The number of stages. Options: 3, 5 (Default: 3)

--head-type The type of head. Options: ori, simple (Default: simple)

--centerness-pos Position of the centerness branch. Options: cls, reg (Default: reg)

--snapshot-path Path to store model snapshots during training (Default: 'snapshots/{}'.format(today))

--input-size Input size of the model (Default: (512, 512))

--data Path to the data YAML file

If the validation mAP does not improve for 5 consecutive epochs, early stopping is triggered and training terminates.
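The early-stopping rule can be sketched in plain Python: track the best validation mAP and stop once 5 consecutive epochs pass without improvement. This is an illustrative sketch, not the code in train.py; details such as whether an equal mAP counts as an improvement are assumptions (here it does not).

```python
def early_stop_epoch(val_maps, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never stops.

    val_maps: validation mAP measured after each epoch, in order.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, m in enumerate(val_maps, start=1):
        if m > best:
            # New best mAP: reset the patience counter.
            best = m
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


if __name__ == "__main__":
    history = [0.10, 0.20, 0.30, 0.30, 0.29, 0.28, 0.30, 0.25]
    print(early_stop_epoch(history))  # → 8
```

In the example, the best mAP of 0.30 is reached at epoch 3, and epochs 4 through 8 bring no improvement, so training stops after epoch 8.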


For model inference on a single image:

python inference.py --snapshot path_to_pretrained_model --input-shape model_input_size --gpu 0 --class-id-path path_to_class_id_mapping_file --img-path path_to_image --save-path path_to_saved_image

--snapshot Path to the pretrained model

--gpu Which GPU to use (-1 for CPU) (Default: -1)

--input-shape Input shape of the model (Default: (512, 512))

--img-path Path to the image.

--save-path Path to save the image with the predicted bounding boxes drawn on it.

--save-preds-path Path to save the predicted bounding boxes.

--class-id-path Path to the class id mapping file. (Default: COCO class id mapping)

--max-objects The maximum number of objects in the image. (Default: 100)

--score-thres The score threshold of bounding boxes. (Default: 0.6)

--iou-thres The IoU threshold for NMS. (Default: 0.5)

--max-objects Whether to use non-maximum suppression (NMS) (Default: 1)

The preprocessing and postprocessing steps are implemented in fcos/utils/fcos_det_preprocess.py and fcos/utils/fcos_det_postprocess.py.
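The score threshold, IoU threshold, maximum object count and NMS options above correspond to a common postprocessing pattern: drop low-scoring boxes, apply greedy non-maximum suppression, then cap the number of detections. The sketch below shows that pattern on boxes in (x1, y1, x2, y2) format. It is a generic illustration, not the code in fcos_det_postprocess.py; iou and filter_detections are hypothetical names, and per-class suppression and an inclusive score threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, classes, score_thres=0.6, iou_thres=0.5,
                      max_objects=100, use_nms=True):
    """Return indices of kept detections, highest score first.

    1. Drop boxes scoring below score_thres.
    2. Greedy per-class NMS: skip a box if it overlaps an already kept box
       of the same class with IoU above iou_thres.
    3. Keep at most max_objects boxes.
    """
    candidates = [i for i, s in enumerate(scores) if s >= score_thres]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        if len(keep) >= max_objects:
            break
        if use_nms and any(classes[j] == classes[i] and iou(boxes[i], boxes[j]) > iou_thres
                           for j in keep):
            continue
        keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30), (0, 0, 10, 10)]
    scores = [0.9, 0.8, 0.7, 0.5]
    classes = [0, 0, 0, 0]
    print(filter_detections(boxes, scores, classes))  # → [0, 2]
```

Box 1 overlaps box 0 with an IoU of 0.81 and is suppressed, box 3 falls below the 0.6 score threshold, and box 2 does not overlap anything.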

Convert to ONNX

Pull the latest ONNX converter from GitHub and read its documentation for details on converting models to ONNX. Then run the following command in the ONNX_Convertor/keras-onnx folder:

python generated_onnx.py -o outputfile.onnx inputfile.h5


Evaluation Metric

We use mean Average Precision (mAP) for evaluation. The script for computing mAP is utils/eval.py.

mAP: mAP is the mean of the Average Precision (AP) over all classes. AP summarizes a precision-recall curve as the weighted mean of the precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

$$\mathrm{AP} = \sum_n (R_n - R_{n-1})\, P_n$$

where P_n and R_n are the precision and recall at the nth threshold. mAP is computed by matching the detected boxes against the ground-truth boxes; the higher the score, the more accurate the model's detections.
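The AP definition above can be sketched as follows, assuming each detection has already been matched to the ground truth (for example by an IoU threshold) and labeled as a true or false positive. Treating every detection as one threshold, the code sums the precision at each threshold weighted by the increase in recall. This is an uninterpolated variant written for illustration; utils/eval.py may use a different matching rule or interpolation, and average_precision and mean_average_precision are hypothetical names.

```python
def average_precision(is_tp, num_gt):
    """AP as the sum of precision weighted by the increase in recall.

    is_tp: 1 (true positive) or 0 (false positive) for each detection,
           sorted by descending confidence; each detection is one threshold.
    num_gt: number of ground-truth boxes for this class.
    """
    if num_gt == 0:
        return 0.0
    ap, tp, prev_recall = 0.0, 0, 0.0
    for k, hit in enumerate(is_tp, start=1):
        tp += hit
        precision = tp / k        # precision over the top-k detections
        recall = tp / num_gt      # recall over the top-k detections
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def mean_average_precision(per_class):
    """mAP: mean of per-class AP. per_class maps class -> (is_tp, num_gt)."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # 4 detections for a class with 5 ground-truth boxes; the third is a false positive.
    print(round(average_precision([1, 1, 0, 1], num_gt=5), 4))  # → 0.55
```

Here recall rises by 0.2 at the first, second and fourth detections, with precisions 1.0, 1.0 and 0.75, giving 0.2 + 0.2 + 0.15 = 0.55.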

Evaluation on a Dataset

To evaluate a trained model on a dataset:

python utils/eval.py --snapshot path_to_pretrained_model --gpu 0 --input-shape model_input_size --data path_to_data_yaml_file

--snapshot Path to the pretrained model

--gpu Which GPU to use (-1 for CPU) (Default: -1)

--input-shape Input shape of the model (Default: (512, 512))

--class-id-path Path to the class id mapping file.

--data Path to the data YAML file

End-to-End Evaluation

To run an end-to-end test on an image dataset, use inference_e2e.py in the fcos directory to obtain the prediction results. You need to prepare an initial parameter file for the inference runner; see utils/init_params.json for the format.

python inference_e2e.py --img-path path_to_dataset_folder --params path_to_init_params_file --save-path path_to_save_json_file

--img-path Path to the dataset directory

--params Path to the initial parameter file for the inference runner

--save-path Path to save the prediction to a json file

--gpu GPU id (-1 for CPU) (Default: -1)

The predictions are saved to a JSON file with the following structure, where each box is given as [left, top, width, height, score, class_id]:

    'bbox': [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id]]
    'bbox': [[l,t,w,h,score,class_id], [l,t,w,h,score,class_id]]
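To consume the saved predictions, a minimal sketch like the one below converts one bbox list into corner coordinates. It assumes each row is [left, top, width, height, score, class_id], as the l,t,w,h naming suggests, and that the file uses standard double-quoted JSON; the outer keys of the file are not assumed. parse_boxes is a hypothetical helper and the sample values are made up.

```python
import json


def parse_boxes(entry, min_score=0.0):
    """Turn [l, t, w, h, score, class_id] rows into corner-format dicts.

    entry: one object with a "bbox" list, as in the structure above.
    """
    boxes = []
    for left, top, width, height, score, class_id in entry["bbox"]:
        if score < min_score:
            continue
        boxes.append({
            "x1": left, "y1": top,
            "x2": left + width, "y2": top + height,  # right and bottom edges
            "score": score, "class_id": int(class_id),
        })
    return boxes


if __name__ == "__main__":
    sample = json.loads('{"bbox": [[10, 20, 30, 40, 0.9, 1], [0, 0, 5, 5, 0.3, 2]]}')
    print(parse_boxes(sample, min_score=0.5))
    # → [{'x1': 10, 'y1': 20, 'x2': 40, 'y2': 60, 'score': 0.9, 'class_id': 1}]
```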


Backbone     Input Size  FPN Type  FPS on 520  FPS on 720  Model Size
darknet53s   512         simple    5.96303     36.6844     25.3M
darknet53s   416         pan       7.27369     48.8437     33.9M
darknet53ss  416         simple    20.6361     136.093     6.9M
darknet53ss  320         simple    33.9502     252.713     6.9M
resnet18     512         simple    5.75156     33.9144     25.2M
resnet18     416         simple    8.04252     52.9392     25.2M
resnet18     320         simple    13.0232     94.5782     25.2M
resnet18     512         pan       4.88634     30.1866     33.8M
resnet18     416         pan       6.8977      46.9993     33.8M
resnet18     320         pan       10.9281     82.4277     33.8M
Backbone     mAP
darknet53s   44.8%