Getting started with Tengine

Tengine

A cross-platform machine learning deployment solution.

Tengine enables fast and efficient deployment of deep learning neural network models on embedded devices.

It uses a separated front-end/back-end design: a single model can be ported and deployed onto multiple hardware platforms such as CPU, GPU and NPU.

Code once, run anywhere, and take advantage of hardware acceleration.

The typical workflow is to develop on x86 and then deploy to ARM and NPU accelerators.


Installation

Follow these steps to install Tengine.

Make a workspace, then download the Tengine code. Cloning into a folder named Tengine-Lite matches the paths used in the rest of this guide:

cd ~
mkdir Tengine
cd Tengine
git clone -b tengine-lite https://github.com/OAID/Tengine.git Tengine-Lite

Then set up a build directory to hold the compiled files:

cd Tengine-Lite
mkdir build 
cd build

Then configure and build. Passing -j 12 to make parallelizes compilation across 12 threads to speed it up; change 12 to match the number of threads/cores the PC has.

cmake ..
make -j 12
make install
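Rather than hardcoding the thread count, it can be queried from the system. A small sketch, assuming a Linux build machine with GNU coreutils available:

```shell
# nproc (GNU coreutils) reports the number of available processing units.
CORES=$(nproc)
echo "building with ${CORES} threads"
# Then build with:  make -j "${CORES}"
```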

Confirm the installation succeeded by listing the install directory with tree.

sudo apt-get install tree
tree install

The output should look like this:

install
├── bin
│   ├── tm_classification
│   ├── tm_classification_int8
│   ├── tm_classification_uint8
│   ├── tm_efficientdet
│   ├── tm_efficientdet_uint8
│   ├── tm_landmark
│   ├── tm_landmark_uint8
│   ├── tm_mobilefacenet
│   ├── tm_mobilefacenet_uint8
│   ├── tm_mobilenet_ssd
│   ├── tm_mobilenet_ssd_uint8
│   ├── tm_retinaface
│   ├── tm_ultraface
│   ├── tm_yolofastest
│   └── tm_yolov5
├── include
│   └── tengine
│       ├── c_api.h
│       └── defines.h
└── lib
    ├── libtengine-lite.so
    └── libtengine-lite-static.a

Test inference

The examples page walks through running Tengine demos.

First, make a folder to store the models in the root of the Tengine-Lite directory.

cd ~/Tengine/Tengine-Lite
mkdir models

Download the efficientdet.tmfile model from the Google Drive model zoo and save it into the models directory.

Create another folder to store our test images.

mkdir images

Then download an image to run detection on, for example:

curl https://camo.githubusercontent.com/beb822ba942ae1904a1355586fd964b8a2374a6ebf31a1e10c1cf41243e3d784/68747470733a2f2f7a332e617831782e636f6d2f323032312f30362f33302f5242566471312e6a7067 --output images/ssd_dog.jpg

Use the command from the example: it first exports the library path so the loader can find the Tengine library, then runs the detector.

export LD_LIBRARY_PATH=./build/install/lib
./build/install/bin/tm_efficientdet -m models/efficientdet.tmfile -i images/ssd_dog.jpg -r 1 -t 1
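A note on the export line: LD_LIBRARY_PATH tells the dynamic linker where to look for shared libraries before the default system paths, which is how the freshly built libtengine-lite.so is found at run time. The command above overwrites any existing value; a slightly safer sketch prepends instead:

```shell
# Prepend the Tengine lib directory, keeping any entries already set.
export LD_LIBRARY_PATH="./build/install/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
echo "$LD_LIBRARY_PATH"
```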

You should see output like the following, confirming the detector ran successfully.

dan@antec:~/Tengine/Tengine-Lite$ ./build/install/bin/tm_efficientdet -m models/efficientdet.tmfile -i images/ssd_dog.jpeg -r 1 -t 1

Image height not specified, use default 512
Image width not specified, use default  512
Scale value not specified, use default  0.017, 0.018, 0.017
Mean value not specified, use default   123.7, 116.3, 103.5
tengine-lite library version: 1.5-dev

model file : models/efficientdet.tmfile
image file : images/ssd_dog.jpeg
img_h, img_w, scale[3], mean[3] : 512 512 , 0.017 0.018 0.017, 123.7 116.3 103.5
Repeat 1 times, thread 1, avg time 512.70 ms, max_time 512.70 ms, min_time 512.70 ms
--------------------------------------
17:  80%, [ 132,  222,  315,  535], dog
 7:  73%, [ 467,   74,  694,  169], truck
 1:  42%, [ 103,  119,  555,  380], bicycle
 2:  29%, [ 687,  113,  724,  156], car
 2:  25%, [  57,   77,  111,  124], car
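Each detection line is a class id, a confidence, a bounding box [x0, y0, x1, y1] and a label. As a quick sanity check, a saved log of this output can be filtered down to confident detections; a small sketch, using lines from the run above as sample input:

```shell
# Keep only detections at or above 50% confidence. The second
# whitespace-separated field is the confidence ("80%,"): strip
# everything from the "%" on, then compare numerically.
awk '{conf = $2; sub(/%.*/, "", conf); if (conf + 0 >= 50) print}' <<'EOF'
17:  80%, [ 132,  222,  315,  535], dog
 7:  73%, [ 467,   74,  694,  169], truck
 1:  42%, [ 103,  119,  555,  380], bicycle
EOF
```

This prints only the dog and truck lines.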

Compiling OpenPose for GPU and Python

First install CUDA 10.1. Go to NVIDIA's CUDA release archive and download the installer for CUDA 10.1 on Windows 10.

After the installer has finished, download cuDNN 7.5 from the NVIDIA archive and install it into CUDA 10.1. To install, extract the cuDNN download, then copy the contents of the cuDNN bin folder into the CUDA 10.1 bin directory, the cuDNN include folder into the CUDA include directory, and the cuDNN lib folder into the CUDA lib directory.

Now open PowerShell in the C: drive, or wherever you want to install OpenPose, and enter:

git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose
cd openpose

A few of the submodules in the OpenPose code (such as the pybind, clang and Caffe modules) do not download automatically, so an extra command is needed to fetch them.

git submodule update --init --recursive

OpenPose is configured and compiled with CMake (I used 3.17), so download and install that. The Visual Studio 2017 compiler tools are also needed; if you already have Visual Studio installed, you should be able to add the 2017 toolchain through the Visual Studio Installer. Finally, for the Python API to work, a Python interpreter must be installed. I have Anaconda, and CMake should find it automatically.

CMake configures and prepares the software for our system before it is run through a compiler and turned into executables. Open CMake and click the "Browse Source" button to set the location of the OpenPose code, e.g. E:\OpenPoseDemo\openpose. Then click "Browse Build" to specify where the compiled code should be saved. Standard practice is to build into a folder called "build" inside the root of the OpenPose code: make a new folder called "build" in the E:\OpenPoseDemo\openpose directory, then select E:\OpenPoseDemo\openpose\build as the location to build the binaries.


Click the Configure button and select Visual Studio 2017 as the generator and x64 as the platform; leave the last entry blank and use the default native compilers. CMake will then use the project's build files to configure the build and download any extra dependencies and files.

When the configuration is done, CMake will be populated with the parameters of the build. To build the Python API, find the BUILD_PYTHON option and enable its tickbox. I also wanted the COCO 18-point estimation model, so I checked the DOWNLOAD_BODY_COCO_MODEL option. Then click Generate to prepare the software for compilation; downloading the extra models will take a while.
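For reference, the same configuration can also be done from the command line. This is a sketch run from the build folder, with the generator and option names matching the GUI steps above:

```shell
cmake .. -G "Visual Studio 15 2017" -A x64 \
      -DBUILD_PYTHON=ON \
      -DDOWNLOAD_BODY_COCO_MODEL=ON
```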


Assuming there were no errors, click the Open Project button, which opens Visual Studio with the configured OpenPose code ready for compilation. Next to the green play button, use the drop-downs to select "Release" and "x64", then in the toolbar click Build -> Build Solution.


Assuming that ran without errors, we can now test the code. Back in PowerShell, change into the build\examples\tutorial_api_python directory and run an example; you should see some positive output.

cd build\examples\tutorial_api_python
python 01_body_from_image.py