Lite-HRNet: A Lightweight High-Resolution Network

Introduction

Pose estimation task with Lite-HRNet model.

Prerequisites

Python 3.6 or above
PyTorch 1.3 or above
CUDA 9.2 or above
GCC 5+

Important: Please note that CUDA training is not supported in Kneron docker. You are expected to use your own GPUs and have correct cuda version installed.

Installation

For installing Pytorch, you have to check your CUDA version and select the correct Pytorch version. You can check your CUDA version by executing nvidia-smi in your terminal. For example, install Pytorch 1.7.0 with CUDA 11.0:

$ conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch

Install all necessary packages in the requirements.txt:

$ pip install -r requirements.txt

Install mmcv with the version 1.3.3:

$ pip install mmcv-full==1.3.3 -f https://download.openmmlab.com/mmcv/dist/{cu_version}/{torch_version}/index.html

Replace {cu_version} and {torch_version} in the url to your desired versions. For example, to install mmcv-full==1.3.3 with CUDA 10.2 and PyTorch 1.6.0, use the following command:

pip install mmcv-full==1.3.3 -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.6.0/index.html

See here for different versions of MMCV compatible to different PyTorch and CUDA versions. Important: You need to run pip uninstall mmcv first if you have mmcv installed. If mmcv and mmcv-full are both installed, there will be ModuleNotFoundError.

Dataset & Preparation

It is recommended to symlink the dataset root to litehrnet/data. To create symlink, run:

ln -s source_file symbolic_link

Replace source_file with the name of the existing file for which you want to create the symbolic link and symbolic_link with the name of the symbolic link.

If your folder structure is different, you may need to change the corresponding paths in config files.

For COCO data, please download from COCO download, 2017 Train/Val is needed for COCO keypoints training and validation. HRNet-Human-Pose-Estimation provides person detection result of COCO val2017 to reproduce the multi-person pose estimation results. Please download from OneDrive Download and extract them under litehrnet/data, and make them look like this:

lite_hrnet
├── configs
├── models
├── tools
`── data
    │── coco
        │-- annotations
        │   │-- person_keypoints_train2017.json
        │   |-- person_keypoints_val2017.json
        |-- person_detection_results
        |   |-- COCO_val2017_detections_AP_H_56_person.json
        │-- train2017
        │   │-- 000000000009.jpg
        │   │-- 000000000025.jpg
        │   │-- 000000000030.jpg
        │   │-- ...
        `-- val2017
            │-- 000000000139.jpg
            │-- 000000000285.jpg
            │-- 000000000632.jpg
            │-- ...

For MPII data, please download from MPII Human Pose Dataset. The original annotation files have been converted into json format, please download them from mpii_annotations. Extract them under $LITE_HRNET/data, and make them look like this:

lite_hrnet
├── configs
├── models
├── tools
`── data
    │── mpii
        |── annotations
        |   |── mpii_gt_val.mat
        |   |── mpii_test.json
        |   |── mpii_train.json
        |   |── mpii_trainval.json
        |   `── mpii_val.json
        `── images
            |── 000001163.jpg
            |── 000003072.jpg

Modify MMPose for Kneron PPP

To use Kneron pre-post-processing during training and testing, you have to replace some files in the mmpose package in your python/anaconda env. You can use python -m site to check you env. Specific files are:

site-packages/mmpose/core/post_processing/post_transforms.py
site-packages/mmpose/datasets/pipelines/top_down_transform.py
site-packages/mmpose/datasets/pipelines/loading.py
site-packages/mmpose/datasets/pipelines/shared_transform.py

You may replace these files by the cooresponding files in the mmpose_replacement folder.

Moreover, you need copy and paste prepostprocess/kneron_preprocessing/ to your python/anaconda env site-packages.

Train

A configuration file is needed for training of Lite-HRNet. We prepared several config files in /litehrnet/configs/top_down/lite_hrnet for different settings.

All outputs (log files and checkpoints) will be saved to the working directory, which is specified by work_dir as an optional argument (default: /litehrnet/work_dirs/).

By default, we evaluate the model on the validation set after each epoch, you can change the evaluation interval by modifying the interval argument in the config file CONFIG_FILE.

# train with a signle GPU
python train.py ${CONFIG_FILE} [optional arguments]

Optional arguments are:

CONFIG_FILE (required) Path to config file.
--no-validate: Not perform evaluation at every k epochs during the training.
--work-dir ${WORK_DIR}: Override the working directory specified in the config file.
--gpus ${GPU_NUM}: Number of gpus to use.
--gpu-ids: IDs of gpus to use.
--deterministic: If specified, it will set deterministic options for CUDNN backend.

Difference between resume-from and load-from in CONFIG_FILE: resume-from loads both the model weights and optimizer status, and the epoch is also inherited from the specified checkpoint. It is usually used for resuming the training process that is interrupted accidentally. load-from only loads the model weights and the training epoch starts from 0. It is usually used for finetuning.

Convert to ONNX

To export onnx model, we have to modify a forward function in the mmpose package. The specific file is site-packages/mmpose/models/detectors/top_down.py in your python/anaconda env. You can use python -m site to check you env. Change the forward function in line 81 from:

def forward(self,
            img,
            target=None,
            target_weight=None,
            img_metas=None,
            return_loss=True,
            return_heatmap=False,
            **kwargs):
    """Calls either forward_train or forward_test depending on whether
    return_loss=True. Note this setting will change the expected inputs.
    When `return_loss=True`, img and img_meta are single-nested (i.e.
    Tensor and List[dict]), and when `resturn_loss=False`, img and img_meta
    should be double nested (i.e.  List[Tensor], List[List[dict]]), with
    the outer list indicating test time augmentations.

    Note:
        batch_size: N
        num_keypoints: K
        num_img_channel: C (Default: 3)
        img height: imgH
        img width: imgW
        heatmaps height: H
        heatmaps weight: W

    Args:
        img (torch.Tensor[NxCximgHximgW]): Input images.
        target (torch.Tensor[NxKxHxW]): Target heatmaps.
        target_weight (torch.Tensor[NxKx1]): Weights across
            different joint types.
        img_metas (list(dict)): Information about data augmentation
            By default this includes:
            - "image_file: path to the image file
            - "center": center of the bbox
            - "scale": scale of the bbox
            - "rotation": rotation of the bbox
            - "bbox_score": score of bbox
        return_loss (bool): Option to `return loss`. `return loss=True`
            for training, `return loss=False` for validation & test.
        return_heatmap (bool) : Option to return heatmap.

    Returns:
        dict|tuple: if `return loss` is true, then return losses.
          Otherwise, return predicted poses, boxes, image paths
              and heatmaps.
    """
    if return_loss:
        return self.forward_train(img, target, target_weight, img_metas,
                                  **kwargs)
    return self.forward_test(
        img, img_metas, return_heatmap=return_heatmap, **kwargs)

def forward(self,
            img,
            target=None,
            target_weight=None,
            img_metas=None,
            return_loss=True,
            return_heatmap=False,
            **kwargs):
    """Calls either forward_train or forward_test depending on whether
    return_loss=True. Note this setting will change the expected inputs.
    When `return_loss=True`, img and img_meta are single-nested (i.e.
    Tensor and List[dict]), and when `resturn_loss=False`, img and img_meta
    should be double nested (i.e.  List[Tensor], List[List[dict]]), with
    the outer list indicating test time augmentations.

    Note:
        batch_size: N
        num_keypoints: K
        num_img_channel: C (Default: 3)
        img height: imgH
        img width: imgW
        heatmaps height: H
        heatmaps weight: W

    Args:
        img (torch.Tensor[NxCximgHximgW]): Input images.
        target (torch.Tensor[NxKxHxW]): Target heatmaps.
        target_weight (torch.Tensor[NxKx1]): Weights across
            different joint types.
        img_metas (list(dict)): Information about data augmentation
            By default this includes:
            - "image_file: path to the image file
            - "center": center of the bbox
            - "scale": scale of the bbox
            - "rotation": rotation of the bbox
            - "bbox_score": score of bbox
        return_loss (bool): Option to `return loss`. `return loss=True`
            for training, `return loss=False` for validation & test.
        return_heatmap (bool) : Option to return heatmap.

    Returns:
        dict|tuple: if `return loss` is true, then return losses.
          Otherwise, return predicted poses, boxes, image paths
              and heatmaps.
    """
    return self.forward_dummy(img)

Then, execute the following command under the directory litehrnet:

python export2onnx.py ${CONFIG_FILE} ${CHECKPOINT_FILE}

Next, pull the latest ONNX converter from github. You may read the latest document from Github for converting ONNX model. Execute commands in the folder ONNX_Convertor/optimizer_scripts: (reference: https://github.com/kneron/ONNX_Convertor/tree/master/optimizer_scripts)

python pytorch_exported_onnx_preprocess.py output.onnx output_optimized.onnx

Inference

Before model inference, we assume that the model has been converted to onnx model as in the previous section. Create yaml files containing the initial parameter information. Some yaml files are provided in utils folder. For model inference on a single image, execute commands under the folder litehrnet:

python inference.py --img-path ${IMAGE_PATH} --yolov5_params ${YOLOV5_INIT_PARAMS} --rsn_affine_params ${RSN_AFFINE_INIT_PARAMS} --lite_hrnet_params ${LITEHRNET_INIT_PARAMS}

Evaluation

You can use the following commands to test a dataset.

# single-gpu testing
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRIC}] [--average_clips ${AVG_TYPE}]

Optional arguments:

CONFIG_FILE (required) Path to config file.
CHECKPOINT_FILE (required) Path to pretrained model.
RESULT_FILE: Filename of the output results. If not specified, the results will not be saved to a file.
EVAL_METRIC: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., "mAP" for MSCOCO.
NUM_PROC_PER_GPU: Number of processes per GPU. If not specified, only one process will be assigned for a single gpu.
AVG_TYPE: Items to average the test clips. If set to prob, it will apply softmax before averaging the clip scores. Otherwise, it will directly average the clip scores.

End-to-End Evaluation

If you would like to perform an end-to-end test with an image dataset, you can use inference_e2e.py under the directory litehrnet to obtain the prediction results. Here, yolov5 is used for detecting person bbox. You have to prepare an initial parameter yaml file for each model runner. You may check utils/yolov5_init_params.yaml for the format.

python inference_e2e.py --img-path ${IMAGE_PATH} --yolov5_params ${YOLOV5_INIT_PARAMS} --rsn_affine_params ${RSN_AFFINE_INIT_PARAMS} --lite_hrnet_params ${LITEHRNET_INIT_PARAMS} --save-path ${OUTPUT_JSON_FILE}

The predictions will be saved into a json file that has the following structure:

[
    {'img_path':image_path_1
    'lmk_coco_body_17pts': [...]
    },
    {'img_path':image_path_2
    'lmk_coco_body_17pts': [...]
    },
    ...
]

Note that your image path has to be the same as the image path in ground truth json.

Model

Backbone	Input Size	FPS on 520	FPS on 720	Model Size	mAP
litehrnet_no_shuffle_no_avgpool.py	256x192	8.81063	119.38	8M	87.4%