Object Detection
Object detection with the FCOS model.
This document explains the arguments of each script. A tutorial for fine-tuning a pretrained model on a custom dataset can be found in the tutorial folder, at tutorial/README.md. An IPython notebook version is also provided as tutorial/tutorial.ipynb, which you can upload and run on Google Colab.
Prerequisites
- Python 3.6 or 3.7
Installation
To install the dependencies, run
```
$ pip install -U pip
$ pip install -r requirements.txt
$ python setup.py build_ext --inplace
```
Dataset & Preparation
Standard Datasets
Our training script accepts the standard PASCAL VOC and MS COCO datasets. You can download them from the following links:
- Download 2012 PASCAL VOC Dataset
- Download 2017 MS COCO Dataset
Custom Datasets
You can also train the model on a custom dataset. The custom dataset is expected to follow the YOLO format, as shown below. You may visit the YOLOv5 documentation for more details.
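In the YOLO format, each image has a companion .txt label file with one object per line, in the order `class_id x_center y_center width height`, with all coordinates normalized to [0, 1] relative to the image size. For example, a label file describing two objects:

```
0 0.481 0.634 0.690 0.713
1 0.254 0.402 0.106 0.180
```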
Annotation Tools
You can use makesense.ai to create bounding boxes and labels for your images. For more details, visit makesense.ai and check its documentation. An example of using makesense.ai to annotate custom data is also provided in the tutorial document.
dataset.yaml
For the COCO dataset, prepare the yaml file and save it as ./data/coco.yaml. The yaml file is expected to have the following format:

```yaml
data_root: path to coco dataset directory
# type of dataset
dataset_type: coco
val_set_name: val2017
train_set_name: train2017
train_annotations_path: path to coco training annotations
val_annotations_path: path to coco validation annotations
```
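For example, a filled-in coco.yaml might look like the following (the dataset location is illustrative; the annotation file names are the standard COCO ones):

```yaml
data_root: /data/coco
dataset_type: coco
val_set_name: val2017
train_set_name: train2017
train_annotations_path: /data/coco/annotations/instances_train2017.json
val_annotations_path: /data/coco/annotations/instances_val2017.json
```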
For the Pascal VOC dataset, prepare the yaml file and save it as ./data/pascal.yaml. The yaml file is expected to have the following format:

```yaml
data_root: path_to_voc_dataset/VOCdevkit/VOC2012
train: 'trainval'
val: 'val'
# type of dataset
dataset_type: pascal
```
For a custom dataset, prepare the yaml file and save it under ./data/. The yaml file is expected to have the following format (the same as yolov5):

```yaml
train: path to training dataset directory
val: path to validation dataset directory
nc: number of classes
names: list of class names
```
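For example, a two-class custom dataset could be described as follows (paths and names are illustrative):

```yaml
train: /data/custom/images/train
val: /data/custom/images/val
nc: 2
names: ['cat', 'dog']
```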
Train
All outputs (log files and checkpoints) will be saved to the snapshot directory, which is specified by --snapshot-path. For training, execute the following command in the fcos directory:

```
python train.py --backbone backbone_model_name --snapshot path_to_pretrained_model --freeze-backbone --batch-size 4 --gpu 0 --data path_to_data_yaml_file
```
--backbone
Which backbone model to use.
--snapshot
The path to the pretrained model.
--freeze-backbone
Whether to freeze the backbone when a pretrained model is used (True/False).
--gpu
Which GPU to run on (-1 for CPU).
--batch-size
Batch size. (Default: 4)
--epochs
Number of epochs to train. (Default: 100)
--steps
Number of steps per epoch. (Default: 5000)
--lr
Learning rate. (Default: 1e-4)
--fpn
The type of FPN. Options: bifpn, dla, fpn, pan, simple (Default: simple; recommended: simple or pan)
--reg-func
The type of regression function. Options: exp, simple (Default: simple)
--stage
The number of stages. Options: 3, 5 (Default: 3)
--head-type
The type of head. Options: ori, simple (Default: simple)
--centerness-pos
Position of the centerness branch. Options: cls, reg (Default: reg)
--snapshot-path
Path to store snapshots of the model during training (Default: 'snapshots/{}'.format(today))
--input-size
Input size of the model (Default: (512, 512))
--data
The path to the data yaml file.
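For example, a typical fine-tuning run on Pascal VOC with a frozen resnet18 backbone might look like this (the snapshot path is illustrative):

```
python train.py --backbone resnet18 --snapshot snapshots/pretrained_resnet18.h5 --freeze-backbone --batch-size 4 --gpu 0 --data data/pascal.yaml
```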
When the validation mAP stops improving for 5 epochs, early stopping is triggered and training terminates.
Inference
For model inference on a single image:

```
python inference.py --snapshot path_to_pretrained_model --input-shape model_input_size --gpu 0 --class-id-path path_to_class_id_mapping_file --img-path path_to_image --save-path path_to_saved_image
```
--snapshot
The path to the pretrained model.
--gpu
Which GPU to run on (-1 for CPU). (Default: -1)
--input-shape
Input shape of the model (Default: (512, 512))
--class-id-path
Path to the class id mapping file. (Default: COCO class id mapping)
--img-path
Path to the image.
--save-path
Path to draw and save the image with bounding boxes.
--save-preds-path
Path to save the inference bbox results.
--max-objects
The maximum number of objects in the image. (Default: 100)
--score-thres
The score threshold for bounding boxes. (Default: 0.6)
--iou-thres
The IoU threshold for NMS. (Default: 0.5)
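For example, to detect objects in a single image with mostly default settings (paths are illustrative):

```
python inference.py --snapshot snapshots/model.h5 --gpu 0 --img-path examples/test.jpg --save-path examples/test_out.jpg
```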
You can find the preprocessing and postprocessing code in fcos/utils/fcos_det_preprocess.py and fcos/utils/fcos_det_postprocess.py.
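The score thresholding and NMS controlled by --score-thres and --iou-thres behave roughly like the following minimal NumPy sketch of the standard greedy algorithm (an illustration, not the repo's actual implementation):

```python
import numpy as np

def greedy_nms(boxes, scores, iou_thres=0.5):
    """Greedy NMS over [l, t, r, b] boxes; returns indices of kept boxes."""
    order = scores.argsort()[::-1]  # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the top box with the remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thres]  # drop boxes that overlap too much
    return keep
```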
Convert to ONNX
Pull the latest ONNX converter from GitHub; you may read its latest documentation on GitHub for converting models to ONNX. Execute the following command in the ONNX_Convertor/keras-onnx folder:

```
python generated_onnx.py -o outputfile.onnx inputfile.h5
```
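After conversion, you can optionally sanity-check the exported graph with the onnx Python package (assuming it is installed; the file name matches the -o argument above):

```python
import onnx

# Load the converted model and verify the graph is well-formed
model = onnx.load("outputfile.onnx")
onnx.checker.check_model(model)
print(onnx.helper.printable_graph(model.graph))
```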
Evaluation
Evaluation Metric
We use mean Average Precision (mAP) for evaluation. You can find the script for computing mAP in utils/eval.py.
mAP: mAP is the average of Average Precision (AP) across classes. AP summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold, with the increase in recall from the previous threshold used as the weight:

$$AP = \sum_n (R_n - R_{n-1}) \, P_n$$

where $P_n$ and $R_n$ are the precision and recall at the nth threshold. The mAP compares the ground-truth bounding boxes to the detected boxes and returns a score. The higher the score, the more accurate the model's detections.
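As a quick illustration of the formula, here is a minimal NumPy sketch (not the utils/eval.py implementation) that computes AP from precision/recall arrays ordered by threshold:

```python
import numpy as np

def average_precision(precisions, recalls):
    """AP = sum_n (R_n - R_{n-1}) * P_n, with R_0 taken as 0."""
    precisions = np.asarray(precisions, dtype=float)
    recalls = np.asarray(recalls, dtype=float)
    deltas = np.diff(recalls, prepend=0.0)  # R_n - R_{n-1}
    return float(np.sum(deltas * precisions))

# Example with three thresholds:
print(average_precision([1.0, 0.8, 0.6], [0.2, 0.5, 0.9]))  # ~0.68
```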
Evaluation on a Dataset
To evaluate the trained model on a dataset:

```
python utils/eval.py --snapshot path_to_pretrained_model --gpu 0 --input-shape model_input_size --data path_to_data_yaml_file
```
--snapshot
Path to the pretrained model.
--gpu
Which GPU to run on (-1 for CPU). (Default: -1)
--input-shape
Input shape of the model (Default: (512, 512))
--class-id-path
Path to the class id mapping file.
--data
The path to the data yaml file.
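For example, evaluating a trained snapshot on the COCO dataset described by data/coco.yaml (the snapshot path is illustrative):

```
python utils/eval.py --snapshot snapshots/model.h5 --gpu 0 --data data/coco.yaml
```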
End-to-End Evaluation
If you would like to perform an end-to-end test with an image dataset, you can use inference_e2e.py under the fcos directory to obtain the prediction results.
You have to prepare an initial parameter yaml file for the inference runner. You may check utils/init_params.json for the format.
```
python inference_e2e.py --img-path path_to_dataset_folder --params path_to_init_params_file --save-path path_to_save_json_file
```
--img-path
Path to the dataset directory
--params
Path to initial parameter yaml file for the inference runner
--save-path
Path to save the prediction to a json file
--gpu
GPU id (-1 for CPU). (Default: -1)
The predictions will be saved into a json file that has the following structure:

```json
[
    {
        "img_path": "image_path_1",
        "bbox": [[l, t, w, h, score, class_id], [l, t, w, h, score, class_id]]
    },
    {
        "img_path": "image_path_2",
        "bbox": [[l, t, w, h, score, class_id], [l, t, w, h, score, class_id]]
    },
    ...
]
```
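The saved file can be consumed directly with the standard json module; for example (the file name is illustrative):

```python
import json

with open("predictions.json") as f:
    predictions = json.load(f)

# Each entry holds one image path and its detected boxes
for entry in predictions:
    for l, t, w, h, score, class_id in entry["bbox"]:
        if score >= 0.6:  # same role as --score-thres at inference time
            print(entry["img_path"], int(class_id), round(score, 3), (l, t, w, h))
```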
Models
| Backbone | Input Size | FPN Type | FPS on 520 | FPS on 720 | Model Size |
|---|---|---|---|---|---|
| darknet53s | 512 | simple | 5.96303 | 36.6844 | 25.3M |
| darknet53s | 416 | pan | 7.27369 | 48.8437 | 33.9M |
| darknet53ss | 416 | simple | 20.6361 | 136.093 | 6.9M |
| darknet53ss | 320 | simple | 33.9502 | 252.713 | 6.9M |
| resnet18 | 512 | simple | 5.75156 | 33.9144 | 25.2M |
| resnet18 | 416 | simple | 8.04252 | 52.9392 | 25.2M |
| resnet18 | 320 | simple | 13.0232 | 94.5782 | 25.2M |
| resnet18 | 512 | pan | 4.88634 | 30.1866 | 33.8M |
| resnet18 | 416 | pan | 6.8977 | 46.9993 | 33.8M |
| resnet18 | 320 | pan | 10.9281 | 82.4277 | 33.8M |
|  | darknet53s |
|---|---|
| mAP | 44.8% |