
How do you deploy a deep learning application developed in MATLAB to an NVIDIA Jetson Xavier NX?

MATLAB • 3 years ago • 293 views
Suppose I have a MATLAB example that grabs live frames from a camera and classifies them. How do I get it running on an NVIDIA Jetson Xavier NX?

    ◆  

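For example, YOLOv3 or YOLOv4 imported from Darknet: yolov3-yolov4-matlab (https://ww2.mathworks.cn/matlabcentral/fileexchange/75305-yolov3-yolov4-matlab), or see the script at the end of this post.
Or something simpler, such as image classification from a webcam: Deployment and Classification of Webcam Images on NVIDIA Jetson TX2 Platform https://ww2.mathworks.cn/help/gpucoder/ug/deployment-classification-webcam-images-on-NVIDIA-Jetson-TX2.html
The second example is not complicated in itself and runs if you follow the MATLAB documentation. Problems usually come from the environment setup.
So today we will focus on environment setup.
As for the first example and the complete workflow (including problems you may hit along the way), leave a comment if you run into difficulties and we will reply periodically.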

    ◆  

Desktop PC software installation and setup:
1. MATLAB R2021a, running on Windows 10 Enterprise
2. CUDA 11.1
3. cuDNN 8.0.4
4. TensorRT 7.2.2
5. Visual Studio 2017
Installation guide: https://ww2.mathworks.cn/help/gpucoder/gs/install-prerequisites.html

Installing Prerequisite Products

Setup guide: https://ww2.mathworks.cn/help/gpucoder/gs/setting-up-the-toolchain.html

Setting Up the Prerequisite Products

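Install the MATLAB add-ons and hardware support packages. Everything listed in the screenshot can be installed:
[Screenshot: list of add-ons and hardware support packages]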

Required:
1. MATLAB Coder Interface for Deep Learning Libraries
2. GPU Coder Interface for Deep Learning Libraries
3. MATLAB Coder Support Package for NVIDIA Jetson and NVIDIA DRIVE Platforms
Strongly recommended:
1. Deep Learning Toolbox Model Quantization Library
2. Simulink Coder Support Package for NVIDIA Jetson and NVIDIA DRIVE Platforms
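If anything else is missing, later steps should raise an error that tells you what to install.
If you have trouble installing add-ons or hardware support packages, see:
Experience Sharing | Mastering MATLAB Add-On and Hardware Support Package Installation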

    ◆  

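Installing and setting up the NVIDIA Jetson Xavier NX:
Initial steps
1. Download the latest Jetson Xavier NX Developer Kit SD Card Image from http://nvidia.com/JetsonXavierNX-start
2. Follow the steps to write the image to the SD card and install it
3. Connect a monitor, keyboard, mouse, and network cable, then power on
4. After the first boot, configure the host name and user account, and update the system while you are at it
5. Confirm that you can log in remotely over SSH (for example with PuTTY or MobaXterm)
Install the required libraries and set the environment variables on the Xavier NX
Reference: https://ww2.mathworks.cn/help/releases/R2021a/supportpkg/nvidia/ug/install-and-setup-prerequisites.html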

Installing Prerequisite Products

1. Install the required libraries
sudo apt-get install libsdl1.2-dev v4l-utils sox libsox-fmt-all libsox-dev
2. Set the environment variables
Edit /etc/environment as follows:
PATH="/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
LD_LIBRARY_PATH="/usr/local/cuda/lib64/"
3. OpenCV
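Recent system images ship with OpenCV 4, which MATLAB R2021a supports.
You do need to distinguish between the opencv (OpenCV 3) and opencv4 references, otherwise the generated code may fail to compile.
The second example mentioned at the start of this post includes a script that handles this.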

Deployment and Classification of Webcam Images on NVIDIA Jetson TX2 Platform

https://ww2.mathworks.cn/help/gpucoder/ug/deployment-classification-webcam-images-on-NVIDIA-Jetson-TX2.html
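If you would still rather go back to OpenCV 3, someone has already done the work and you can use it directly; see Build OpenCV 3.4 on NVIDIA Jetson AGX Xavier Developer Kit (https://www.jetsonhacks.com/2018/11/08/build-opencv-3-4-on-nvidia-jetson-agx-xavier-developer-kit/). Uninstalling OpenCV 4 and then building and installing OpenCV 3 takes a long time, so be patient.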
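The choice between the two pkg-config module names can be made from the OpenCV version string. The following is a minimal sketch, not the script from the MathWorks example: `opencvLinkFlags` is a hypothetical helper name, and it assumes that OpenCV 4 installs register the pkg-config module as `opencv4` while OpenCV 3 installs use `opencv`, which matches the two alternatives in the script at the end of this post.

```matlab
function flags = opencvLinkFlags(versionStr)
% opencvLinkFlags  Hypothetical helper: map an OpenCV version string such
% as '4.1.1' to the pkg-config link flags used for the generated code.
% Assumption: OpenCV 4 registers the pkg-config module 'opencv4';
% OpenCV 3 and earlier register it as 'opencv'.
major = sscanf(versionStr, '%d', 1);   % leading integer, e.g. 4
if isempty(major)
    error('opencvLinkFlags:badVersion', ...
        'Cannot parse OpenCV version "%s".', versionStr);
end
if major >= 4
    module = 'opencv4';
else
    module = 'opencv';
end
flags = sprintf('`pkg-config --cflags --libs %s`', module);
end
```

Calling `opencvLinkFlags('4.1.1')` gives the `opencv4` variant and `opencvLinkFlags('3.4.0')` gives the `opencv` variant. The resulting string is what would be pasted into the `opencv_link_flags` line of the entry-point function.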

    ◆  

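A puzzling conflict
The NVIDIA Jetson Xavier NX system image comes with the following software preinstalled:
CUDA Version : 10.2
cuDNN Version : 8.0
TensorRT Version : 7.1
OpenCV Version : 4.1.1

On this system, apart from the path to nvcc and the corresponding library path, there is no need to specify paths for TensorRT or cuDNN.

However, the setup steps for the development machine set the environment variables NVIDIA_TENSORRT and NVIDIA_CUDNN. Testing showed that the former causes deployment of TensorRT-based code to the Jetson to fail, so that environment variable has to be removed.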
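One way to inspect these variables from inside MATLAB on the development machine is sketched below. This is an illustration rather than the procedure used in the original setup: `getenv` and `setenv` are standard MATLAB functions, but `setenv` only affects the current MATLAB session, and whether a session-level change is enough for a given code generation run is an assumption to verify. Removing the variable in the Windows system settings is the permanent fix.

```matlab
% Print the two variables mentioned above.
names = {'NVIDIA_TENSORRT', 'NVIDIA_CUDNN'};
for k = 1:numel(names)
    value = getenv(names{k});
    if isempty(value)
        fprintf('%s is not set.\n', names{k});
    else
        fprintf('%s = %s\n', names{k}, value);
    end
end

% Clear NVIDIA_TENSORRT for the current MATLAB session only.
% NVIDIA_CUDNN is left untouched.
setenv('NVIDIA_TENSORRT', '');
tensorrtCleared = isempty(getenv('NVIDIA_TENSORRT'));
```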

    ◆  

Check that the environment is ready
1. Connect to the device
>> hwobj= jetson('jetson-host','user','password')
Checking for CUDA availability on the Target...
Checking for 'nvcc' in the target system path...
Checking for cuDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for prerequisite libraries is complete.
Gathering hardware details...
Checking for third-party library availability on the Target...
Gathering hardware details is complete.
Board name : NVIDIA Jetson AGX Xavier
CUDA Version : 10.2
cuDNN Version : 8.0
TensorRT Version : 7.1
GStreamer Version : 1.14.5
V4L2 Version : 1.14.2-1
SDL Version : 1.2
OpenCV Version : 4.1.1
Available Webcams :
Available GPUs : Xavier

hwobj =

jetson with properties:

DeviceAddress: 'sha-xaviernx'
Port: 22
BoardName: 'NVIDIA Jetson AGX Xavier'
CUDAVersion: '10.2'
cuDNNVersion: '8.0'
TensorRTVersion: '7.1'
SDLVersion: '1.2'
V4L2Version: '1.14.2-1'
GStreamerVersion: '1.14.5'
OpenCVVersion: '4.1.1'
GPUInfo: [1×1 struct]
WebcamList: []
2. Check cuDNN-based code generation
>> envCfg = coder.gpuEnvConfig('jetson');
envCfg.DeepLibTarget = 'cudnn';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 0;
envCfg.HardwareObject = hwobj;
coder.checkGpuInstall(envCfg)
Compatible GPU : PASSED
CUDA Environment : PASSED
Runtime : PASSED
cuFFT : PASSED
cuSOLVER : PASSED
cuBLAS : PASSED
cuDNN Environment : PASSED
Deep Learning (cuDNN) Code Generation: PASSED

ans =

struct with fields:

gpu: 1
cuda: 1
cudnn: 1
tensorrt: 0
basiccodegen: 0
basiccodeexec: 0
deepcodegen: 1
deepcodeexec: 0
tensorrtdatatype: 0
profiling: 0

3. Check TensorRT-based code generation

>> envCfg = coder.gpuEnvConfig('jetson');
envCfg.DeepLibTarget = 'tensorrt';
envCfg.DeepCodegen = 1;
envCfg.Quiet = 0;
envCfg.HardwareObject = hwobj;
coder.checkGpuInstall(envCfg)
Compatible GPU : PASSED
CUDA Environment : PASSED
Runtime : PASSED
cuFFT : PASSED
cuSOLVER : PASSED
cuBLAS : PASSED
cuDNN Environment : PASSED
TensorRT Environment : PASSED (Warning: Deep learning code generation has been tested with TensorRT v7.2. The provided TensorRT library v7.1 may not be fully compatible.)
Deep Learning (TensorRT) Code Generation: PASSED

ans =

struct with fields:

gpu: 1
cuda: 1
cudnn: 1
tensorrt: 1
basiccodegen: 0
basiccodeexec: 0
deepcodegen: 1
deepcodeexec: 0
tensorrtdatatype: 1
profiling: 0
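
The struct returned by `coder.checkGpuInstall` can also be checked programmatically. The sketch below uses a hypothetical helper, `deepCodegenReady`, and reads only field names that appear in the output above. Treating `gpu`, `cuda`, the selected library field, and `deepcodegen` as the relevant flags is an assumption based on the checks requested here; fields such as `basiccodegen` appear to be 0 because those checks were not run in this configuration.

```matlab
function ok = deepCodegenReady(results, target)
% deepCodegenReady  Hypothetical helper: true when the fields relevant to
% deep learning code generation are all set in the struct returned by
% coder.checkGpuInstall. target is 'cudnn' or 'tensorrt'.
required = {'gpu', 'cuda', target, 'deepcodegen'};
ok = true;
for k = 1:numel(required)
    % Short-circuit so a missing field yields false instead of an error.
    ok = ok && isfield(results, required{k}) ...
            && logical(results.(required{k}));
end
end
```

With the two results shown above, `deepCodegenReady(ans, 'tensorrt')` would be true for the TensorRT check and false for the cuDNN check, where the `tensorrt` field is 0.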

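Everything is ready. You can now run the examples above or any other MATLAB example.

When it runs, the display may look like this (without hand-writing a single line of C/C++/CUDA code):
[Screenshot of the running application]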

    ◆  

Appendix: the yolov3_detection script
%% Object Detection Using YOLO v3 608x608
function out = yolov3_detection()
%% Update buildinfo with the OpenCV library flags.
%opencv_link_flags = '`pkg-config --cflags --libs opencv`'; % opencv 3
opencv_link_flags = '`pkg-config --cflags --libs opencv4`'; % opencv 4
coder.updateBuildInfo('addLinkFlags',opencv_link_flags);
%coder.inline('never');

% Connect to webcam
hwobj = jetson;
wcam = webcam(hwobj, 1, '1280x720');
img_w = 1280;
img_h = 720;
player = imageDisplay(hwobj);

%%
orgImg = snapshot(wcam);
image(player, orgImg);

%%
imgSize = 608;
out = zeros([img_h img_w 3], 'uint8');

ratio = min(imgSize/img_w, imgSize/img_h);

% Image height and width after resizing image
w = round(img_w * ratio);
h = round(img_h * ratio);
st_h = round((imgSize - h)/2) + 1;
st_w = round((imgSize - w)/ 2) + 1;

fps = 0;
while true
    orgImg = snapshot(wcam);
    orgImg = fliplr(orgImg);
    in = im2single(orgImg);
    % img = imadjust(img, stretchlim(img,[0.01,0.80]));
    % img = histeq(img);
    % Create a gray background and paste the resized frame into its center
    in3 = ones(imgSize, imgSize, 3, 'like', in) * 0.5;
    in2 = imresize(in, [h, w]); %,'Method','bilinear','AntiAliasing',false);
    in3(st_h:st_h+h-1, st_w:st_w+w-1, :) = in2;

    % yolov3_detect and postProcess are helper functions that are not
    % listed in this post.
    tic; % Count FPS
    predictions = yolov3_detect(in3);
    elapsedTime = toc;
    fps = .9*fps + .1*(1/elapsedTime); % exponentially smoothed FPS

    % post-processing and display the results
    out = postProcess(predictions, orgImg, w, h);
    out = insertText(out, [1, 1], sprintf('FPS %2.2f', fps), 'FontSize', 26, 'BoxColor', [0,150,0]);
    out = imresize(out, [img_h img_w]);
    image(player, out);
end
end
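
The letterbox arithmetic in the script is easy to check by hand. The snippet below repeats the same formulas for the 1280x720 webcam frame and the 608x608 network input, using the variable names from the script; `rows` and `cols` are added here for illustration only.

```matlab
% Letterbox geometry for a 1280x720 frame and a 608x608 network input.
img_w = 1280;
img_h = 720;
imgSize = 608;

% Scale so that the longer side fits the network input exactly.
ratio = min(imgSize/img_w, imgSize/img_h);   % 0.475, limited by the width

% Size of the resized frame inside the square canvas.
w = round(img_w * ratio);                    % 608
h = round(img_h * ratio);                    % 342

% One-based start indices that center the frame on the canvas.
st_h = round((imgSize - h)/2) + 1;           % 134
st_w = round((imgSize - w)/2) + 1;           % 1

% Index ranges filled by the frame; the rest keeps the gray value 0.5.
rows = st_h:st_h+h-1;                        % 134:475
cols = st_w:st_w+w-1;                        % 1:608
fprintf('frame occupies rows %d-%d, columns %d-%d\n', ...
    rows(1), rows(end), cols(1), cols(end));
```

The resized frame is therefore centered vertically, with 133 gray rows above it and 133 below.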
Appendix: generating TensorRT FP16 CUDA code from yolov3_detection and deploying it to the NVIDIA Jetson Xavier NX:
%% connect hardware
hwobj = jetson('host-name','user','password');

%% Generate CUDA Code for the Target Using GPU Coder
% To generate a CUDA executable that can be deployed on to a NVIDIA
% target, create a GPU code configuration object for generating an executable.
cfg = coder.gpuConfig('exe');
cfg.GenerateReport = true;
cfg.Hardware = coder.hardware('NVIDIA Jetson');
cfg.DeepLearningConfig = coder.DeepLearningConfig('tensorrt');
cfg.DeepLearningConfig.DataType = 'fp16';
cfg.GpuConfig.ComputeCapability = '7.0'; % value used in the original post; the Xavier GPU itself is compute capability 7.2
cfg.Hardware.BuildDir = '~/remoteBuildDir';
cfg.GpuConfig.SelectCudaDevice = 0;
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';

codegen('-config ',cfg,'yolov3_detection', '-report')

%% Run the YOLO v3 detection on the target
% Run the generated executable on the target.
%
pid = hwobj.runApplication('yolov3_detection');

Article URL: http://www.python88.com/topic/112151