1. Introduction

 

In this document, we will introduce the KDP2 architecture comprising new software(SW) and firmware(FW) design for KL520.

In comparison with the previous SW/FW architecture, this aims to simplify the design flow for AI applications development.

When referring to a complete AI application development, actually three parts are involved:

 

Below diagram depicts three parts of development in a big picture.

 

This Getting Started document only focuses on host software usage with the AI dongle to perform following functionality.

 

For model development, please refer to the Toolchain Docker part.

For firmware development, it is more advanced, please contact Kneron.

 

Below gives some definitions regarding the KDP2 host_lib:

 

host_lib (after v0.9.0) contains KDP2 software code and examples, prebuilt KDP2 firmware and some demonstrative NEF files; and with the AI dongle it can give some edge AI examples.

Here we will demonstrate below diagram scenario with followings:

   

2. Build host_lib

 

*Note: here we use Linux-based OS Ubuntu.

Download the latest host_lib_vXXX.zip (after v0.9.0) into Ubuntu from https://www.kneron.com/tw/support/developers/, it is located at KNEON Stem(USB Dongle)/host_lib/.

Before building code, some build tools and packages must be set up for the first time.

 

Install CMake, libusb-1.0.0-dev and build-essential.

$ sudo apt install cmake
$ sudo apt install libusb-1.0-0-dev
$ sudo apt install build-essential

 

Decompress the host_lib_v0.9.0.zip. (or after v0.9.0)

$ unzip host_lib_v0.9.0.zip

 

Build code.

$ cd host_lib/
$ mkdir build ; cd build/
$ cmake ..
$ make -j

 

Once build is done, the libkdp2.so will be in build/kdp2/src/

Example executables will be in build/bin_kdp2/

Check if KDP2 examples are built successuflly.

$ ls bin_kdp2/

ex_kdp2_connect_test
ex_kdp2_generic_inference_raw
ex_kdp2_update_firmware
ex_kdp2_scan_devices
...

You should see some executables named in prefix 'kdp2_'.

   

3. Update firmware

 

Before running KDP2 examples, users need to make the AI dongle running with the KDP2 firmware.

Plug the AI dongle in the Ubuntu host and confirm it with the 'lsusb' command.

$ lsusb
...
Bus 001 Device 002: ID 3231:0100
...

If not seeing "3231:0100" (represents KL520 device), it may have a connectivity problem or some device failures.

 

Kneron provides different firmware for different applications on KL520.

It needs the KDP2 firmware to work with the host_lib KDP2 API; the firmware binaries are located at app_binaries/KL520/kdp2/.

 

Update the AI donge with the KDP2 firmware by the 'ex_kdp2_update_firmware'.

For multiple dongles, users can specify which device to be updated by the '-s' parameter.

$ cd bin_kdp2/ 
$ sudo ./ex_kdp2_update_firmware 

-h : help
-s : [device index] = '1'
-p : [firmware folder path] = '../../app_binaries/KL520/kdp2/'

start to update SCPU and NCPU firmwares to the device with port_path = 1-1

update SCPU firmware from file ../../app_binaries/KL520/kdp2//fw_scpu.bin
update SCPU firmware OK

update NCPU firmware from file ../../app_binaries/KL520/kdp2//fw_ncpu.bin
update NCPU firmware OK

   

4. Connect device tests

 

While one or multiple AI dongles are plugged into the host, they can be scanned to get some basic device information.

$ sudo ./ex_kdp2_scan_devices 

scanning kneron devices ...
number of Kneron devices found: 2

listing devices infomation as follows:

[1] scan_index: '1'
[1] product_id: '0x100' (KL520)
[1] USB link speed: 'High-Speed'
[1] USB port path: '1-2'
[1] kn_number: '0xC506203C' 
[1] Connectable: 'True'

[2] scan_index: '2'
[2] product_id: '0x100' (KL520)
[2] USB link speed: 'High-Speed'
[2] USB port path: '1-1'
[2] kn_number: '0xCC0A282C' 
[2] Connectable: 'True'

Above shows that it founds two KL520 devices, a brief descript listed below.

 

The 'ex_kdp2_connect_test' can be used to test device connection, and it needs parameters for selecing the way to connect a Kneron device.

Normally we use the scan index.

$ sudo ./ex_kdp2_connect_test

usage:
./ex_kdp2_connect_test -s [device scan index] (can be known by kdp2_scan_devices())
./ex_kdp2_connect_test -p [product_name] (KL520, KL720 or after)
./ex_kdp2_connect_test -k [kn_number] (ex: 0x7F1A0000)

 

For example, use '-s 1' to connect the first scanned device.

It device is connectable, it will make a connection to it and then pause until pressing 'enter'.

$ sudo ./ex_kdp2_connect_test -s 1

connecting to device by scan_index = 1 ...

[1] scan_index: '1'
[1] product_id: '0x100' (KL520)
[1] USB link speed: 'High-Speed'
[1] USB port path: '1-2'
[1] kn_number: '0xC506203C' 

>> press 'enter' to disconnect the device ...

disconnecting device ...

   

5. Run inference examples

 

The KDP2 host_lib provides two categories of API set for model inference.

One is the 'APP inference' category, providing some decent functions for specific applications with specified NEF models and it is designed to be used in a easy way.

Another is the 'Generic inference' category which is intended for advanced users who are interested in developing their models and implement corresponding post-processing code.

Below will demonstrate both usage in two examples.

 

App inference example

 

The 'ex_kdp2_tiny_yolo_v3' example utilizes the APP inference API and the Tiny Yolo V3 model to perform object detection.

When no parameters, it takes input_models/KL520/tiny_yolo_v3/models_520.nef as the inference model, input_images/street_cars_bike.bmp as the input image in BMP format and repeats 20 loops to calculate performance then prints inference results.

$ sudo ./ex_kdp2_tiny_yolo_v3 

-h : help
-s : [device index] = '1'
-i : [image file path] (.bmp 24bits) = '../../input_images/street_cars_bike.bmp'
-l : [test loops] = '20'


image resolution 752x504, format RGB565
(device will convert it to 224x224, RGBA8888, only support downscaling)

starting doing inference, loop = 20
....................

total inference 20 images
time spent: 1.96 secs, FPS = 10.2

box count : 4
Box 0 (x1, y1, x2, y2, score, class) = 298.0, 161.0, 422.0, 392.0, 0.904121, 0
Box 1 (x1, y1, x2, y2, score, class) = 11.0, 171.0, 57.0, 299.0, 0.256544, 0
Box 2 (x1, y1, x2, y2, score, class) = 447.0, 176.0, 719.0, 370.0, 0.987464, 2
Box 3 (x1, y1, x2, y2, score, class) = 39.0, 178.0, 311.0, 372.0, 0.976890, 2

drew bounding box(es) on './ex_kdp2_tiny_yolo_v3.bmp'

disconnecting device ...

 

Besides output results in the screen console, it also draws detected objects in a new-created ex_kdp2_tiny_yolo_v3.bmp.

The key features of APP inference are listed below: - Specified model NEF. - Normally post-process is done in SoC. - Simplfied function parameters.

 

Generic inference example

 

Generic inference API is intended for users who have their own models and applications.

It needs more complex input parameters and normally the post-process is implemented by users in host side.

 

The 'ex_kdp2_generic_inference_raw' is an example for showing how it work.

By default, it runs with a Tiny Yolo v3 model NEF and takes an BMP image as input and does post-process in host side.

Below shortly explains the key input parameters:


$ sudo ./ex_kdp2_generic_inference_raw 

-h : help
-s : [device index] = '1'
-m : [model file path] (.nef) = '../../input_models/KL520/tiny_yolo_v3/models_520.nef'
-i : [image file path] (.bmp 24bits) = '../../input_images/one_bike_many_cars_224x224.bmp'
-d : [inference model ID] = '19'
-c : [image format] (RGB565, RGBA8888) = 'RGB565'
-n : [normalize mode] ('none', '0_1', '-0.5_0.5', '-1_1') = '-0.5_0.5'
-p : [post process] ('bypass', 'yolo_v3') = 'yolo_v3'
-l : [test loops] = '20'


connect device ... OK
load model(s) ... OK
time spent: 0.57 secs

this NEF contains 1 model(s):
[1] model ID = 19
[1] model raw input width = 224
[1] model raw input height = 224
[1] model input channel = 3
[1] model raw image format = RGBA8888
[1] model raw output size = 430224

target inference model ID : 19
allocate memory 430224 bytes for RAW output
image resolution 224x224, format RGB565
(device will convert it to 224x224, RGBA8888, only support downscaling)

starting doing inference, loop = 20
....................

total inference 20 images
time spent: 0.71 secs, FPS = 28.2

================== [Post Process] 'yolo_v3' =========================

class count : 80
box count : 6
Box 0 (x1, y1, x2, y2, score, class) = 104.0, 78.0, 141.0, 217.0, 0.949131, 0
Box 1 (x1, y1, x2, y2, score, class) = 106.0, 145.0, 137.0, 214.0, 0.253075, 1
Box 2 (x1, y1, x2, y2, score, class) = 35.0, 49.0, 131.0, 118.0, 0.989020, 2
Box 3 (x1, y1, x2, y2, score, class) = 199.0, 48.0, 221.0, 131.0, 0.916225, 2
Box 4 (x1, y1, x2, y2, score, class) = 129.0, 46.0, 197.0, 95.0, 0.887819, 2
Box 5 (x1, y1, x2, y2, score, class) = 12.0, 30.0, 64.0, 79.0, 0.618796, 2

drew bounding box(es) on './ex_kdp2_generic_inference_raw.bmp'

===========================================================================

disconnecting device ...

 

From the console output, it can be observed that the information of models in the NEF is printed, including model ID, raw resolution, intput channel, raw image format and raw output size.

Raw output size indicates that a buffer of the size should be prepared to receive the output directly from the AI dongle, and it cannot be used until converting the raw output to well-structed floating point values.

 

If [post process] is set to 'yolo_v3', it draws detected objects in a new-created ex_kdp2_generic_inference_raw.bmp.

 

Otherwise if [post process] is set to 'bypass', the example dumps floating point values into .txt files for each output node.

$ sudo ./ex_kdp2_generic_inference_raw -p bypass

-h : help
-s : [device index] = '1'
-m : [model file path] (.nef) = '../../input_models/KL520/tiny_yolo_v3/models_520.nef'
-i : [image file path] (.bmp 24bits) = '../../input_images/one_bike_many_cars_224x224.bmp'
-d : [inference model ID] = '19'
-c : [image format] (RGB565, RGBA8888) = 'RGB565'
-n : [normalize mode] ('none', '0_1', '-0.5_0.5', '-1_1') = '-0.5_0.5'
-p : [post process] ('bypass', 'yolo_v3') = 'bypass'
-l : [test loops] = '20'


connect device ... OK
load model(s) ... OK
time spent: 0.48 secs

this NEF contains 1 model(s):
[1] model ID = 19
[1] model raw input width = 224
[1] model raw input height = 224
[1] model input channel = 3
[1] model raw image format = RGBA8888
[1] model raw output size = 430224

target inference model ID : 19
allocate memory 430224 bytes for RAW output
image resolution 224x224, format RGB565
(device will convert it to 224x224, RGBA8888, only support downscaling)

starting doing inference, loop = 20
....................

total inference 20 images
time spent: 0.53 secs, FPS = 37.6

================== [Post Process] 'bypass' =========================

number of output node : 2

node 0:
width: 7:
height: 7:
channel: 255:
number of data (float): 12495:
first 20 data:
        1.699, 0.340, -0.170, -0.340, -0.340, -0.340,
        -1.019, 1.189, 1.189, 0.679, 0.510,
        0.340, 0.340, 0.510, -0.170, 0.000,
        0.170, 0.170, 0.170, 0.000,

node 1:
width: 14:
height: 14:
channel: 255:
number of data (float): 49980:
first 20 data:
        1.048, 0.000, -0.699, 1.048, 0.175, -0.175,
        0.349, -0.175, 0.349, -0.524, 0.175,
        -0.175, 0.000, -0.349, 1.573, 1.223,
        0.699, 0.874, 0.699, 0.349,

dumped node 0 output to 'node0_7x7x255.txt'
dumped node 1 output to 'node1_14x14x255.txt'

===========================================================================

disconnecting device ...

 

For example, 'node0_7x7x255.txt' represents node 0, width x height x channel = 7x7x255, and its content looks like below.

1.699, 0.340, -0.170, -0.340, -0.340, -0.340, -1.019, 1.189, 1.189, 0.679, 0.510, 0.340, 0.340, 0.510, -0.170, 0.000, 0.170, 0.170, 0.170, 0.000, -0.340, -0.340, -0.679, -0.679, -0.849, -0.679, -0.510, 0.000, -7.984, -4.586, -4.247, -4.586, -5.266, -5.266, -8.153, -2.718, -4.416, -4.416, -4.077, -3.397, -3.737, -3.058, -6.115, -7.984, -8.833, -8.663, -9.003, -9.682, -8.323, -1.019, -1.019, -1.529, -1.529, -1.869, -3.227, -4.077, -5.096, -6.115, -7.304, -6.964, -7.304, -8.323, -8.153, -7.304, -10.192, -10.701, -10.022, -9.512, -8.153, -6.285, -1.529, -1.359, -1.189, -1.019, -0.510, 0.170, 0.170, -5.266, -7.134, -7.474, -6.795, -6.625, -5.606, -4.416, 0.340, 0.679, 0.510, -0.170, -0.679, -0.679, -0.510, -7.474, -10.192, -11.041, -10.192, -10.192, -8.833, -6.625, -5.775, -7.134, -6.625, -5.775, -5.436, -5.096, -5.096, -7.304, -9.512, -10.362, -9.852, -9.512, -9.512, -8.323, -9.173, -10.362, -10.532, -9.852, -9.512, -8.663, -7.644, -7.984, -9.512, -9.512, -8.833, -8.493, -8.493, -7.814, -5.606, -6.964, -6.625, -6.115, -6.285, -6.625, -6.115, -8.663, -11.041, -11.721, -10.871, -10.871, -10.871, -9.343, -8.663, -9.852, -9.682, -9.343, -9.173, -10.022, -9.682, -7.474, -8.833, -9.003, -8.833, -8.663, -9.852, -9.512, -9.003, -11.551, -12.230, -11.551, -11.721, -11.211, -9.343, -8.323, -10.022, -10.701, -10.192, -9.852, -9.512, -8.493, -8.663, -9.682, -10.022, -9.512, -10.022, -10.022, -9.003, -6.964, -7.304, -7.134, -6.964, -6.795, -7.304, -6.795, -8.153, -9.343, -9.682, -9.173, -9.512, -10.022, -9.512, -7.474, -9.512, -10.701, -10.022, -10.532, -10.022, -8.493, -5.775, -7.474, -7.814, -6.795, -6.795, -6.795, -5.096, -8.663, -10.701, -11.041, -10.701, -9.852, -9.682, -9.003, -6.115, -8.493, -8.493, -7.984, -6.964, -5.436, -3.907, -8.663, -10.362, -10.362, -9.852, -9.003, -8.493, -7.304, -9.003, -10.192, -10.532, -9.512, -9.512, -10.192, -10.362, -7.984, -10.532, -10.701, -10.192, -9.173, -8.663, -7.644, -11.041, -12.740, -12.910, -12.570, -12.060, -11.721, -10.532, -7.644, -10.362, -11.381, -11.041, -10.532, -10.871, -10.022, -8.493, -11.211, -11.381, -10.362, -11.041, -11.041, -9.852, -7.814, -9.512, -10.022, -9.512, -8.663, -8.663, -7.814, -6.795, -9.512, -10.022, -9.343, -9.343, -8.663, -7.134, -8.833, -10.871, -11.211, -11.041, -10.701, -10.192, -9.003, -9.852, -11.551, -13.080, -12.400, -11.890, -11.721, -10.701, -8.323, -10.532, -10.701, -10.362, -10.362, -9.852, -9.003, -10.192, -12.570, -12.400, -11.721, -11.721, -11.551, -9.682, -8.153, -10.532, -11.041, -10.701, -10.192, -10.022, -8.323, -7.474, -9.173, -9.343, -8.323, -7.984, -7.814, -7.304, -9.173, -11.551, -12.740, -11.721, -11.041, -10.192, -8.153, -9.343, -11.890, -12.740, -11.721, -11.211, -10.362, -8.493, -8.153, -10.701, -11.381, -10.362, -10.022, -9.512, -8.663, -8.493, -10.362, -10.362, -9.343, -9.003, -8.663, -7.814, -9.003, -11.211, -11.721, -10.701, -10.701, -10.532, -9.512, -9.852, -10.022, -9.852, -9.343, -9.512, -9.003, -8.323, -9.682, -12.060, -12.400, -11.890, -11.381, -10.871, -9.852, -8.663, -10.192, -10.532, -10.192, -9.512, -9.173, -8.493, -10.532, -12.400, -12.400, -11.721, -10.701, -10.192, -8.833, -7.304, -9.173, -9.512, -8.663, -7.814, -7.474, -6.795, -5.945, -6.964, -6.795, -6.455, -5.945, -6.625, -6.455, -7.644, -9.343, -9.512, -9.173, -8.663, -7.814, -7.474, -10.871, -12.910, -13.419, -12.570, -11.721, -10.532, -9.852, -10.701, -12.570, -12.740, -12.400, -11.890, -11.551, -10.022, -7.304, -8.663, -9.682, -8.833, -8.323, -8.153, -7.304, -10.701, -12.910, -13.080, -12.400, -11.381, -11.041, -9.512, -7.304, -9.852, -9.343, -8.323, -7.984, -7.984, -6.964, -8.493, -10.701, -10.362, -9.852, -9.512, -9.682, -8.663, -4.756, -6.625, -6.625, -5.606, -4.416, -5.096, -4.416, -9.852, -11.551, -11.890, -11.041, -10.701, -10.871, -9.852, -8.323, -10.532, -9.682, -8.833, -8.833, -9.512, -8.323, -8.153, -10.022, -10.022, -8.833, -8.323, -8.323, -7.814, -8.153, -10.362, -10.022, -9.343, -8.833, -7.984, -6.455, -9.682, -11.551, -11.211, -10.871, -10.362, -9.512, -8.323, -10.532, -12.400, -12.910, -12.400, -11.721, -11.381, -10.532, -9.343, -11.890, -12.230, -11.551, -11.211, -10.532, -9.512, -8.663, -10.532, -10.192, -9.852, -9.173, -8.833, -8.153, -9.173, -10.532, -11.041, -10.362, -10.022, -8.833, -8.323, -7.984, -8.663, -8.323, -8.323, -8.153, -7.814, -6.795, -7.814, -8.833, -8.663, -8.323, -8.153, -8.153, -7.474, -9.852, -10.192, -10.362, -9.852, -9.682, -9.343, -8.663, -9.682, -11.551, -11.211, -10.532, -10.022, -10.022, -8.833, -7.304, -9.173, -8.493, -7.814, -7.304, -6.625, -5.096, -7.984, -9.852, -10.022, -9.512, -9.512, -9.343, -8.153, -9.003, -10.701, -11.381, -10.701, -10.871, -9.852, -8.153, -9.173, -11.041, -11.381, -10.192, -9.173, -9.003, -7.474, -9.003, -10.871, -11.381, -10.362, -10.022, -9.852, -9.343, -8.323, -10.022, -10.362, -10.022, -9.512, -10.192, -9.512, -10.362, -11.381, -11.721, -11.041, -10.701, -10.701, -10.192, -11.041, -12.910, -12.740, -12.060, -11.381, -11.211, -10.701, -0.170, 0.170, -0.340, -0.679, -0.340, -0.679, -0.170, 0.000, 0.510, 0.849, 0.849, 0.849, 0.510, 0.340, -0.170, 0.340, 0.340, 0.510, 0.510, 0.170, -0.510, -0.510, -0.679, -0.679, -0.679 ....