YOLOv3 Step by Step

In this document, we walk step by step through using our tools to compile and test a newly downloaded YOLOv3 model.

This document is written for toolchain v0.30.0. If any description is not consistent with the latest toolchain, please refer to the main toolchain manual.

Tricks for deploying YOLO-type detection models and anchor-based detection models

Set quantization config (for 730 only)
1. input 8bit
2. const input 16bit
3. output 8bit
4. weight mixlight or 8bit
5. data mixlight or mixbalance

Model structure
1. An anchor-based detection model has the following outputs:
  1. class scores with shape 1 x class# x pixel# at scale 0, 1 x class# x pixel# at scale 1, ..., 1 x class# x pixel# at scale S.
  2. bbox coordinates with shape 1 x 4 x pixel# at scale 0, 1 x 4 x pixel# at scale 1, ..., 1 x 4 x pixel# at scale S.
  3. Trick: Do NOT concat class scores at different scales. Output class scores for each scale separately.
  4. Trick: Do NOT concat class score and coordinates at the same scale. Output class scores and bbox coordinates separately.
  5. Trick: Do NOT concat bbox coordinates at different scales. Output bbox coordinates for each scale separately.
  6. Trick: Typically, class scores need to pass through activation functions such as exp, sigmoid or even softmax. Make sure these activation functions are in the model so that the quantization algorithm can optimize the quantization settings accordingly.
  7. Trick: Sometimes, bbox coordinates need to pass through exp or another activation function. Make sure these activation functions are in the model so that the quantization algorithm can optimize the quantization settings accordingly.
  8. Trick: Do NOT concat some outputs and then split them in the model. Make sure the computations of all these outputs are separate. If these computations need to use the same op, the quantization algorithm can detect this situation and share the weights of the same op.
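
To make the output layout above concrete, here is a minimal sketch that lists the shapes a model following these tricks would expose, one class-score tensor and one bbox tensor per scale, with nothing concatenated. The helper name expected_output_shapes is hypothetical, and it assumes that "pixel#" means height times width of the feature map at that scale. The 13x13 and 26x26 grids are only illustrative values.

```python
def expected_output_shapes(num_classes, grid_sizes):
    """Return per-scale output shapes, kept separate for every scale.

    Assumption: pixel# is height * width of the feature map at each scale.
    """
    shapes = []
    for scale, (height, width) in enumerate(grid_sizes):
        pixels = height * width
        shapes.append({
            "scale": scale,
            "class_scores": (1, num_classes, pixels),  # not concatenated with bbox
            "bbox": (1, 4, pixels),                    # not concatenated across scales
        })
    return shapes


# Illustrative grid sizes; a real model may use different ones.
shapes = expected_output_shapes(num_classes=80, grid_sizes=[(13, 13), (26, 26)])
for entry in shapes:
    print(entry)
```

With two scales this gives four separate model outputs, which is the layout the tricks above recommend.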

Step 0: Prepare environment and data

We need to download the latest toolchain docker image which contains all the tools we need.

docker pull kneron/toolchain:latest

Start the docker with a local folder mounted into the docker.

docker run --rm -it -v /your/folder/path/for/docker_mount:/data1 kneron/toolchain:latest

Go to our mounted folder and download a public keras based YOLOv3 model from Github https://github.com/qqwweee/keras-yolo3

cd /data1 && git clone https://github.com/qqwweee/keras-yolo3.git keras_yolo3

Switch to the base conda environment.

conda activate base

Follow the model's document to save the pretrained model as an h5 file:

cd keras_yolo3
wget https://data.pjreddie.com/files/yolov3-tiny.weights
python convert.py yolov3-tiny.cfg yolov3-tiny.weights /data1/yolo.h5

We now have yolo.h5 under our mounted folder /data1.

We also need to prepare some images under the mounted folder. We have provided some example input images at http://doc.kneron.com/docs/toolchain/res/test_image10.zip.

Here is how you can get it:

cd /data1
wget http://doc.kneron.com/docs/toolchain/res/test_image10.zip --no-check-certificate
unzip test_image10.zip

Now we have images in folder test_image10/ at /data1; these are needed for quantization.

We also need some extra images for accuracy testing. To keep this document simple, we use only one image, which is already included in the toolchain docker, for testing.

cd /data1
cp /workspace/E2E_Simulator/app/test_image_folder/yolo/000000350003.jpg ./.

Now we have image 000000350003.jpg at /data1 for testing.

We also provide a ready-made ONNX model, already converted from the Keras model, for testing in the following steps:

wget https://doc.kneron.com/docs/toolchain/res/yolo.opt.onnx --no-check-certificate

Step 1: Import KTC and required libraries in the Python shell

Now, we go through the whole toolchain flow with KTC (Kneron Toolchain), using its Python API in the Python shell.

Run "python" or "ipython" to open the Python shell:

Figure 1. python shell

Import KTC and other necessary modules

import ktc
import numpy as np
import os
import onnx
from PIL import Image

Step 2: IP Evaluation

To make sure the onnx model is as expected, we should check the onnx model's performance and see if there are any unsupported operators (or CPU nodes).

# npu (only) performance simulation
m = onnx.load('/data1/yolo.opt.onnx')
km = ktc.ModelConfig(33, "0001", "720", onnx_model=m)
eval_result = km.evaluate()
print("\nNpu performance evaluation result:\n" + str(eval_result))

The estimated FPS (NPU only) report on your terminal should look similar to this:

    ***** Warning: this model has 1 CPU ops which may cause that the report's fps is different from the actual fps *****
    ***** Warning: CPU ops types: KneronResize.

    [Evaluation Result]
    estimate FPS float = 22.5861
    total time = 44.2751 ms
    total theoretical covolution time = 16.7271 ms
    average DRAM bandwidth = 0.279219 GB/s
    MAC efficiency to total time = 37.7799 %
    MAC idle time = 3.85105 ms
    MAC running time = 40.424 ms

There are two things to take note of in this report:

1. One CPU node, 'KneronResize', is found in our model.
2. The estimated FPS is 22.5861; the report is for the NPU only.
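
The numbers in the report above appear to be related to each other in a simple way. The following sketch checks those relations using only the values printed in the report. The relations themselves (FPS as the inverse of total time, MAC efficiency as convolution time over total time, idle plus running time equal to total time) are inferred from the figures, not taken from the toolchain manual, so treat them as an assumption.

```python
# Values copied from the evaluation report above.
total_time_ms = 44.2751
conv_time_ms = 16.7271
mac_idle_ms = 3.85105
mac_running_ms = 40.424

# Assumed relations, inferred from the report numbers.
fps = 1000.0 / total_time_ms
mac_efficiency_pct = 100.0 * conv_time_ms / total_time_ms
idle_plus_running_ms = mac_idle_ms + mac_running_ms

print("estimated FPS:", round(fps, 4))
print("MAC efficiency (%):", round(mac_efficiency_pct, 4))
print("idle + running (ms):", round(idle_plus_running_ms, 4))
```

If these relations hold for your model as well, the total time is the figure to watch: CPU nodes such as KneronResize are not included in it, so the FPS measured on the device may be lower.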

At the same time, a folder called compiler will be generated in your docker mounted folder (/data1); the evaluation result will be found in this folder. One important thing is to check the 'ioinfo.csv' in /data1/compiler, which looks like this:

    i,0,input_1_o0,3,416,416
    c,0,up_sampling2d_1_o0_kn,128,26,26
    o,0,conv2d_10_o0,255,13,13
    o,1,conv2d_13_o0,255,26,26

This file gives information about the special nodes in the ONNX. Each line shows the information of each node, and the first element shows the type of the special node.

type explanation:

i: input node

o: output node

c: cpu node

We can see that, under KL720, there is one CPU node called up_sampling2d_1_o0_kn in our ONNX model.
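
As a convenience, the file can also be read programmatically. The sketch below parses the example content shown above with the standard csv module. The function name parse_ioinfo is hypothetical, and the interpretation of the columns (type, index, node name, then the shape values) is inferred from the example, not from a format specification.

```python
import csv
import io

NODE_TYPES = {"i": "input", "o": "output", "c": "cpu"}


def parse_ioinfo(text):
    """Parse ioinfo.csv content into a list of dictionaries.

    Assumed column layout: type, index, node name, then shape values.
    """
    nodes = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip empty lines
        nodes.append({
            "type": NODE_TYPES.get(row[0].strip(), row[0].strip()),
            "index": int(row[1]),
            "name": row[2].strip(),
            "shape": tuple(int(value) for value in row[3:]),
        })
    return nodes


# The example content from this document; for a real run, read the file
# /data1/compiler/ioinfo.csv instead.
sample = """i,0,input_1_o0,3,416,416
c,0,up_sampling2d_1_o0_kn,128,26,26
o,0,conv2d_10_o0,255,13,13
o,1,conv2d_13_o0,255,26,26
"""

nodes = parse_ioinfo(sample)
cpu_nodes = [node["name"] for node in nodes if node["type"] == "cpu"]
print("CPU nodes:", cpu_nodes)
```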

Step 3: Check that the ONNX model, preprocess and postprocess work correctly

If the ONNX model, together with the provided preprocess and postprocess functions, gives a correct detection result, everything is set up correctly.

First, we need to check the preprocess and postprocess methods. Here is the relevant code. Before we start, we need to move to the keras_yolo3 folder so that the preprocess and postprocess functions can be imported.

The following is the extracted preprocess:

from yolo3.utils import letterbox_image

def preprocess(pil_img):
    model_input_size = (416, 416)  # to match our model input size when converting
    boxed_image = letterbox_image(pil_img, model_input_size)
    np_data = np.array(boxed_image, dtype='float32')

    np_data /= 255.
    # Insert batch dimension and transpose to match model's input.
    np_data = np.expand_dims(np_data, 0)
    np_data = np.transpose(np_data, (0, 3, 1, 2))
    return np_data
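
To illustrate what the letterbox step does to the image geometry, here is a small sketch in pure Python. It mirrors how letterbox_image in keras_yolo3 is understood to work (scale the image to fit while keeping the aspect ratio, then center it on a padded canvas). The helper name letterbox_geometry is hypothetical, and yolo3/utils.py remains the reference for the exact behavior.

```python
def letterbox_geometry(image_size, target_size):
    """Return (new_width, new_height, x_offset, y_offset) for a letterboxed image.

    image_size and target_size are (width, height) tuples, like PIL's Image.size.
    """
    image_w, image_h = image_size
    target_w, target_h = target_size
    # Use the smaller ratio so the whole image fits inside the target.
    scale = min(target_w / image_w, target_h / image_h)
    new_w, new_h = int(image_w * scale), int(image_h * scale)
    # Center the resized image; the remaining area is padding.
    x_offset, y_offset = (target_w - new_w) // 2, (target_h - new_h) // 2
    return new_w, new_h, x_offset, y_offset


# An 832x624 image is scaled by 0.5 and padded at the top and bottom.
geometry = letterbox_geometry((832, 624), (416, 416))
print(geometry)  # (416, 312, 0, 52)
```

After this step, preprocess divides by 255 and rearranges the data, so the array passed to the model has shape (1, 3, 416, 416).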

This is the extracted postprocess:

import tensorflow as tf
import pathlib
import sys
sys.path.append(str(pathlib.Path("keras_yolo3").resolve()))
from yolo3.model import yolo_eval

def postprocess(inf_results, ori_image_shape):
    tensor_data = [tf.convert_to_tensor(data, dtype=tf.float32) for data in inf_results]
    tensor_data = [tf.transpose(data, perm=[0, 2, 3, 1]) for data in tensor_data]   # expects bhwc data

    # get anchor info
    anchors_path = "/data1/keras_yolo3/model_data/tiny_yolo_anchors.txt"
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    anchors = np.array(anchors).reshape(-1, 2)

    # post process
    num_classes = 80
    boxes, scores, classes = yolo_eval(tensor_data, anchors, num_classes, ori_image_shape)
    with tf.Session() as sess:
        boxes = boxes.eval()
        scores = scores.eval()
        classes = classes.eval()

    return boxes, scores, classes
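
The postprocess above relies on a fixed relation between the anchors, the number of classes and the channel count of each model output. The sketch below works through that arithmetic. It assumes three anchors per scale and the usual YOLO layout of 4 box values plus 1 objectness score plus the class scores per anchor. The anchor string is an illustrative example in the same comma-separated format, not a copy read from the file.

```python
def yolo_head_channels(num_classes, anchors_per_scale=3):
    """Channels of one YOLO output: anchors * (4 box + 1 objectness + classes)."""
    return anchors_per_scale * (5 + num_classes)


# Illustrative anchor text in the same comma-separated format as the anchors file.
anchors_text = "10,14, 23,27, 37,58, 81,82, 135,169, 344,319"
values = [float(x) for x in anchors_text.split(",")]
anchors = list(zip(values[0::2], values[1::2]))  # (width, height) pairs

num_scales = 2  # this model has two outputs (13x13 and 26x26)
anchors_per_scale = len(anchors) // num_scales
channels = yolo_head_channels(num_classes=80, anchors_per_scale=anchors_per_scale)

print("anchors per scale:", anchors_per_scale)
print("channels per output:", channels)
```

The result, 255, matches the channel count of the two output nodes listed in ioinfo.csv in Step 2, which is a quick way to confirm that the anchors file and num_classes fit the model.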

Now, we can check the ONNX inference result with the API 'ktc.kneron_inference'.

## onnx model check

input_image = Image.open('/data1/000000350003.jpg')

# resize and normalize input data
in_data = preprocess(input_image)

# onnx inference
out_data = ktc.kneron_inference([in_data], onnx_file="/data1/yolo.opt.onnx", input_names=["input_1_o0"])

# onnx output data processing
det_res = postprocess(out_data, [input_image.size[1], input_image.size[0]])

print(det_res)

The result will be displayed on your terminal like this:

(array([[258.8878 , 470.29474, 297.01447, 524.3069 ],
       [233.62653, 218.19923, 306.79245, 381.78162]], dtype=float32), array([0.9248918, 0.786504 ], dtype=float32), array([2, 7], dtype=int32))

This result looks good.

Note that we only use one image as example. Using more data to check accuracy is a good idea.

Step 4: Quantization

Let us use the same preprocess on our quantization data and put it in a list:

# load and normalize all image data from folder
img_list = []
for (dir_path, _, file_names) in os.walk("/data1/test_image10"):
    for f_n in file_names:
        fullpath = os.path.join(dir_path, f_n)
        print("processing image: " + fullpath)

        image = Image.open(fullpath)
        img_data = preprocess(image)
        img_list.append(img_data)
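
The loop above opens every file found by os.walk, so a stray non-image file in the folder would make Image.open fail. A possible variant, sketched below with only the standard library, first collects the image paths by extension and sorts them so that the order is reproducible. The helper name list_image_files and the extension list are assumptions for illustration.

```python
import os
import tempfile

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")  # assumed set of image types


def list_image_files(root):
    """Return a sorted list of image file paths found under root."""
    paths = []
    for dir_path, _, file_names in os.walk(root):
        for name in file_names:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dir_path, name))
    return sorted(paths)


# Small self-contained demonstration with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.JPG", "a.png", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = [os.path.basename(path) for path in list_image_files(tmp)]

print(found)  # ['a.png', 'b.JPG']
```

In the quantization loop, the paths returned by list_image_files("/data1/test_image10") could then be opened and passed to preprocess one by one.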

Then, perform quantization. The BIE model will be generated at /data1/output.bie.

# fix point analysis
bie_model_path = km.analysis({"input_1_o0": img_list})
print("\nFix point analysis done. Save bie model to '" + str(bie_model_path) + "'")

Step 5: Check if BIE model accuracy is good enough

After quantization, a slight drop in model accuracy is expected. We should check whether the accuracy is still good enough to use.

The toolchain API ktc.kneron_inference can help us check. Its usage is similar to Step 3, with two differences:

The model parameter is changed from onnx_file to bie_file.
The target platform is passed through the platform parameter.

## bie model check
input_image = Image.open('/data1/000000350003.jpg')

# resize and normalize input data
in_data = preprocess(input_image)

# bie inference
out_data = ktc.kneron_inference([in_data], bie_file=bie_model_path, input_names=["input_1_o0"], platform=720)

# bie output data processing
det_res = postprocess(out_data, [input_image.size[1], input_image.size[0]])
print(det_res)

The result will be displayed on your terminal like this:

(array([[258.51468, 467.71683, 293.07394, 529.15967]], dtype=float32), array([0.8253723], dtype=float32), array([2], dtype=int32))

This is slightly different from the result in Step 3: we lost one bounding box after quantization. Note that this loss is acceptable after quantization.

If you are running the example using 720 as the hardware platform, there might be one extra bounding box. This is normal.
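
One way to put a number on "slightly different" is the intersection over union (IoU) between a box from the ONNX result and the matching box from the BIE result. The sketch below uses the two class-2 boxes printed in Step 3 and in this step. It assumes the box order is [top, left, bottom, right]; the IoU value is the same if the order is [x_min, y_min, x_max, y_max].

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as [top, left, bottom, right]."""
    top = max(box_a[0], box_b[0])
    left = max(box_a[1], box_b[1])
    bottom = min(box_a[2], box_b[2])
    right = min(box_a[3], box_b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    intersection = max(0.0, bottom - top) * max(0.0, right - left)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


onnx_box = [258.8878, 470.29474, 297.01447, 524.3069]   # from Step 3
bie_box = [258.51468, 467.71683, 293.07394, 529.15967]  # from this step

overlap = iou(onnx_box, bie_box)
print("IoU between ONNX and BIE box:", round(overlap, 3))
```

A high IoU together with the same class index suggests the quantized model still detects the same object at nearly the same location.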

Step 6: Compile

The final step is to compile the BIE model into an NEF model.

# compile
nef_model_path = ktc.compile([km])
print("\nCompile done. Save Nef file to '" + str(nef_model_path) + "'")

You can find the NEF file under /data1/batch_compile/models_720.nef. models_720.nef is the final compiled model.

(optional) Step 7: Check NEF model

The toolchain API ktc.kneron_inference also supports NEF model inference. Its usage is similar to Step 3 and Step 5, with one minor difference:

The model parameter is changed from bie_file to nef_file.

# nef model check
input_image = Image.open('/data1/000000350003.jpg')

# resize and normalize input data
in_data = preprocess(input_image)

# nef inference
out_data = ktc.kneron_inference([in_data], nef_file=nef_model_path, input_names=["input_1_o0"], platform=720)

# nef output data processing
det_res = postprocess(out_data, [input_image.size[1], input_image.size[0]])
print(det_res)

The result will be displayed on your terminal like this:

(array([[258.51468, 467.71683, 293.07394, 529.15967]], dtype=float32), array([0.8253723], dtype=float32), array([2], dtype=int32))

Note: the NEF model results should be exactly the same as the BIE model results.
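
Because the two results are expected to match exactly, the comparison can be automated instead of read off the terminal. The following sketch compares two (boxes, scores, classes) results element by element. The helper names to_plain and same_detections are hypothetical, and the sample values are the ones printed in this document, written as plain lists so that the sketch runs without the toolchain.

```python
def to_plain(value):
    """Convert a NumPy array (or anything with tolist) into plain Python lists."""
    return value.tolist() if hasattr(value, "tolist") else list(value)


def same_detections(res_a, res_b):
    """Return True when both (boxes, scores, classes) results match exactly."""
    if len(res_a) != len(res_b):
        return False
    return all(to_plain(a) == to_plain(b) for a, b in zip(res_a, res_b))


# Values copied from the printed results in this document.
bie_res = ([[258.51468, 467.71683, 293.07394, 529.15967]], [0.8253723], [2])
nef_res = ([[258.51468, 467.71683, 293.07394, 529.15967]], [0.8253723], [2])
onnx_res = ([[258.8878, 470.29474, 297.01447, 524.3069],
             [233.62653, 218.19923, 306.79245, 381.78162]],
            [0.9248918, 0.786504], [2, 7])

print("BIE vs NEF identical:", same_detections(bie_res, nef_res))
print("BIE vs ONNX identical:", same_detections(bie_res, onnx_res))
```

The same helper accepts the tuples returned by postprocess directly, since NumPy arrays provide tolist.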

Step 8: Prepare Kneron PLUS (do not do this in the toolchain docker)

To run the NEF on KL720, we need help from Kneron PLUS:

Connect the KL720 USB dongle to your computer.
Follow the instructions in the Kneron PLUS document to set up the environment (Note: the Python usage document is at "kneron_plus/python/README.md" in the Kneron PLUS folder).

Step 9: Run our YOLO NEF on KL720 with Kneron PLUS

We leverage the example code provided in Kneron PLUS to run our YOLO NEF.

Replace kneron_plus/res/models/KL720/tiny_yolo_v3/models_720.nef with our YOLO NEF.
Modify line 20 of kneron_plus/python/example/KL720DemoGenericInferencePostYolo.py: change the input image from "bike_cars_street_224x224.bmp" to "bike_cars_street_416x416.bmp".

Figure 2. modify input image in example

Modify line 105: change the normalization method in the preprocess config from "Kneron" mode to "Yolo" mode.

Figure 3. modify normalization method in example

Run example KL720DemoGenericInferencePostYolo.py

    cd kneron_plus/python/example
    python KL720DemoGenericInferencePostYolo.py

Then, you should see that the YOLO NEF detection result is saved to "./output_bike_cars_street_416x416.bmp":

Figure 4. detection result

Appendix

The whole model conversion process from ONNX to NEF (Steps 1-7) can be combined into one Python script:

import ktc
import os
import onnx
from PIL import Image
import numpy as np

###  post process function  ###
import tensorflow as tf
import pathlib
import sys
sys.path.append(str(pathlib.Path("keras_yolo3").resolve()))
from yolo3.model import yolo_eval

def postprocess(inf_results, ori_image_shape):
    tensor_data = [tf.convert_to_tensor(data, dtype=tf.float32) for data in inf_results]
    tensor_data = [tf.transpose(data, perm=[0, 2, 3, 1]) for data in tensor_data]   # expects bhwc data

    # get anchor info
    anchors_path = "/data1/keras_yolo3/model_data/tiny_yolo_anchors.txt"
    with open(anchors_path) as f:
        anchors = f.readline()
    anchors = [float(x) for x in anchors.split(',')]
    anchors = np.array(anchors).reshape(-1, 2)

    # post process
    num_classes = 80
    boxes, scores, classes = yolo_eval(tensor_data, anchors, num_classes, ori_image_shape)
    with tf.Session() as sess:
        boxes = boxes.eval()
        scores = scores.eval()
        classes = classes.eval()

    return boxes, scores, classes

###  pre process function  ###
from yolo3.utils import letterbox_image

def preprocess(pil_img):
    model_input_size = (416, 416)  # to match our model input size when converting
    boxed_image = letterbox_image(pil_img, model_input_size)
    np_data = np.array(boxed_image, dtype='float32')

    np_data /= 255.
    # Insert batch dimension and transpose to match model's input.
    np_data = np.expand_dims(np_data, 0)
    np_data = np.transpose(np_data, (0, 3, 1, 2))
    return np_data


# # convert h5 model to onnx
# m = ktc.onnx_optimizer.keras2onnx_flow("/data1/yolo.h5", input_shape = [1,416,416,3])
# m = ktc.onnx_optimizer.onnx2onnx_flow(m)
# onnx.save(m,'yolo.opt.onnx')


# setup ktc config
m = onnx.load('/data1/yolo.opt.onnx')
km = ktc.ModelConfig(33, "0001", "720", onnx_model=m)

# npu(only) performance simulation
eval_result = km.evaluate()
print("\nNpu performance evaluation result:\n" + str(eval_result))


## onnx model check
input_image = Image.open('/data1/000000350003.jpg')
in_data = preprocess(input_image)
out_data = ktc.kneron_inference([in_data], onnx_file="/data1/yolo.opt.onnx", input_names=["input_1_o0"])
det_res = postprocess(out_data, [input_image.size[1], input_image.size[0]])
print(det_res)

# load and normalize all image data from folder
img_list = []
for (dir_path, _, file_names) in os.walk("/data1/test_image10"):
    for f_n in file_names:
        fullpath = os.path.join(dir_path, f_n)
        print("processing image: " + fullpath)

        image = Image.open(fullpath)
        img_data = preprocess(image)
        img_list.append(img_data)


# fix point analysis
bie_model_path = km.analysis({"input_1_o0": img_list})
print("\nFix point analysis done. Save bie model to '" + str(bie_model_path) + "'")


# bie model check
input_image = Image.open('/data1/000000350003.jpg')
in_data = preprocess(input_image)
out_data = ktc.kneron_inference([in_data], bie_file=bie_model_path, input_names=["input_1_o0"], platform=720)
det_res = postprocess(out_data, [input_image.size[1], input_image.size[0]])
print(det_res)


# compile
nef_model_path = ktc.compile([km])
print("\nCompile done. Save Nef file to '" + str(nef_model_path) + "'")

# nef model check
input_image = Image.open('/data1/000000350003.jpg')
in_data = preprocess(input_image)
out_data = ktc.kneron_inference([in_data], nef_file=nef_model_path, input_names=["input_1_o0"],  platform=720)
det_res = postprocess(out_data, [input_image.size[1], input_image.size[0]])
print(det_res)