
Operators supported by the Rockchip RKNN NPU (Rockchip RK3566) - iT社区


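Note: this post draws on the official open-source yolov8 code, Rockchip's official documentation, and Horizon's official documentation. If anything here infringes, please let me know and I will remove it. Thanks.

The model and the complete simulation test code are on GitHub (reference link: model and code).

Keeping pace with the technology: a first on-board chip deployment of yolov8.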

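1 Model and training

The training code follows the official open-source yolov8 training code. Because SiLU is not yet supported on some on-board chips, it is replaced with ReLU.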
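For reference, here is a minimal plain-Python sketch of the two activations, to show what the swap changes numerically. The function names `silu` and `relu` are illustrative and are not taken from the yolov8 code base.

```python
import math


def silu(x: float) -> float:
    """SiLU (also called swish): x * sigmoid(x). Needs an exponential."""
    return x / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """ReLU: max(0, x). A single comparison, no exponential."""
    return max(0.0, x)


if __name__ == "__main__":
    # The two agree closely for large positive inputs and differ
    # most around zero and for negative inputs.
    for v in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={v:+.1f}  silu={silu(v):+.4f}  relu={relu(v):+.4f}")
```

Because the two functions differ for small and negative inputs, the network is trained with ReLU in place rather than having its activation swapped after training.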

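2 Exporting yolov8 to onnx

Some post-processing operators are inefficient or unsupported on on-board chips, so the ONNX export has to avoid the operators that the chip handles poorly or not at all. The parts modified for ONNX export are described below. (Note: the order of the following steps must not be changed. Some steps will raise an error when run, but as long as the corresponding file is generated, the error can be ignored.)

Step 1: Save only the weights from the .pt file. Add the following code.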

    # save the weights only (added where self.model is available)
    import torch
    self.model.fuse()
    self.model.eval()
    torch.save(self.model.state_dict(), './weights/Yolov8_dict.pt')
    # self.model.load_state_dict(torch.load('./weights/Yolov8_dict.pt', map_location='cpu'))

After making this change, run the following code:

    from ultralytics import YOLO

    model = YOLO('./weights/yolov8n_coco128.pt')
    results = model(task='detect', mode='predict', source='./images/test.jpg', line_thickness=3, show=True, save=True, device='cpu')

Step 2: Export ONNX with the unneeded operators removed. The modified code is as follows.

    # heads
    class Detect(nn.Module):
        # YOLOv8 Detect head for detection models
        dynamic = False  # force grid reconstruction
        export = False  # export mode
        shape = None
        anchors = torch.empty(0)  # init
        strides = torch.empty(0)  # init

        def __init__(self, nc=80, ch=()):  # detection layer
            super().__init__()
            self.nc = nc  # number of classes
            self.nl = len(ch)  # number of detection layers
            self.reg_max = 16  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
            self.no = nc + self.reg_max * 4  # number of outputs per anchor
            self.stride = torch.zeros(self.nl)  # strides computed during build

            c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], self.nc)  # channels
            self.cv2 = nn.ModuleList(
                nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch)
            self.cv3 = nn.ModuleList(
                nn.Sequential(Conv(x, c3, 3), Conv(c3, c3, 3), nn.Conv2d(c3, self.nc, 1)) for x in ch)
            self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()

            # added for ONNX export: fixed 1x1 conv with weights 0..15
            self.conv1x1 = nn.Conv2d(16, 1, 1, bias=False).requires_grad_(False)
            x = torch.arange(16, dtype=torch.float)
            self.conv1x1.weight.data[:] = nn.Parameter(x.view(1, 16, 1, 1))

        def forward(self, x):
            shape = x[0].shape  # BCHW

            # added for ONNX export: return the per-level box and class outputs directly
            y = []
            for i in range(self.nl):
                t1 = self.cv2[i](x[i])
                t2 = self.cv3[i](x[i])
                y.append(self.conv1x1(t1.view(t1.shape[0], 4, 16, -1).transpose(2, 1).softmax(1)))
                # y.append(t2.sigmoid())
                y.append(t2)
            return y

            # original code, no longer reached because of the return above
            for i in range(self.nl):
                x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
            if self.training:
                return x
            elif self.dynamic or self.shape != shape:
                self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
                self.shape = shape

            box, cls = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2).split((self.reg_max * 4, self.nc), 1)
            dbox = dist2bbox(self.dfl(box), self.anchors.unsqueeze(0), xywh=True, dim=1) * self.strides
            y = torch.cat((dbox, cls.sigmoid()), 1)
            return y if self.export else (y, x)

        def bias_init(self):
            # Initialize Detect() biases, WARNING: requires stride availability
            m = self  # self.model[-1]  # Detect() module
            # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1
            # ncf = math.log(0.6 / (m.nc - 0.999999)) if cf is None else torch.log(cf / cf.sum())  # nominal class frequency
            for a, b, s in zip(m.cv2, m.cv3, m.stride):  # from
                a[-1].bias.data[:] = 1.0  # box
                b[-1].bias.data[:m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
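The `conv1x1` added to the head has fixed weights 0 to 15 and is applied to a softmax over the 16 DFL bins, so at each position it computes the expected bin index, which is the predicted distance for one box side. The plain-Python sketch below reproduces that arithmetic for a single box side. The helper names `softmax` and `dfl_expectation` are illustrative and do not appear in the original code.

```python
import math

REG_MAX = 16  # number of DFL bins per box side, as in the head above


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dfl_expectation(logits):
    """Softmax over the 16 bins, then a dot product with weights 0..15.

    A 1x1 convolution with fixed weights arange(16) and no bias
    computes exactly this weighted sum at every position.
    """
    assert len(logits) == REG_MAX
    probs = softmax(logits)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    flat = [0.0] * REG_MAX      # no preference: expectation is the mean bin index
    peaked = [0.0] * REG_MAX
    peaked[3] = 20.0            # strong preference for bin 3
    print(dfl_expectation(flat))
    print(dfl_expectation(peaked))
```

Expressing the weighted sum as a convolution keeps the exported graph to operators that the target chips are more likely to support.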

Add the code that saves the ONNX model, as follows:

    # added for ONNX export
    import torch
    self.model.fuse()
    self.model.eval()
    self.model.load_state_dict(torch.load('./weights/Yolov8_dict.pt', map_location='cpu'), strict=False)

    print("===========  onnx =========== ")
    dummy_input = torch.randn(1, 3, 640, 640)
    input_names = ["data"]
    # output names are assigned in order to the tensors returned by Detect.forward
    output_names = ["cls1", "reg1", "cls2", "reg2", "cls3", "reg3"]
    torch.onnx.export(self.model, dummy_input, "./weights/yolov8n_ZQ.onnx", verbose=False, input_names=input_names, output_names=output_names, opset_version=11)
    print("======================== convert onnx Finished! .... ")
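As a sanity check on the exported model, the shapes of the six outputs can be worked out by hand from the modified `forward`. The sketch below assumes a 640x640 input, strides of 8, 16 and 32 for the three detection levels, and 80 classes; the stride values are an assumption, not something stated in the code above, and `head_output_shapes` is a hypothetical helper.

```python
def head_output_shapes(img_size=640, strides=(8, 16, 32), nc=80, batch=1):
    """Shapes of the tensors in the order the modified forward() appends them.

    Per level, forward() appends the box branch first and the class branch second.
    """
    shapes = []
    for s in strides:
        h = w = img_size // s
        # box branch: (B, 64, H, W) -> view (B, 4, 16, H*W) -> transpose (B, 16, 4, H*W)
        # -> softmax over dim 1 -> conv1x1 (16 -> 1 channel) -> (B, 1, 4, H*W)
        shapes.append((batch, 1, 4, h * w))
        # class branch: raw logits, sigmoid is left to post-processing
        shapes.append((batch, nc, h, w))
    return shapes


if __name__ == "__main__":
    for shape in head_output_shapes():
        print(shape)
```

Since `torch.onnx.export` assigns `output_names` positionally, it is worth comparing these shapes against the named outputs of the exported file to confirm which name carries which branch.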

After the modifications above, run the following code (note that, unlike the first run, this time a yaml file is loaded):

    from ultralytics import YOLO

    model = YOLO('./ultralytics/models/v8/yolov8n.yaml')
    results = model(task='detect', mode='predict', source='./images/test3.jpg', line_thickness=3, show=False, save=True, device='cpu')

3 yolov8 onnx test results

The ONNX model and the complete test code are on GitHub (link: code). Note: the test image comes from coco128.
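Because the exported head returns raw class logits and per-side distances, the sigmoid and the box decoding have to happen in post-processing. The following is a minimal plain-Python sketch of that decoding for a single grid cell, following the `dist2bbox` logic and the 0.5 anchor offset of the original head. It is not the author's test code; `cell_of`, `decode_cell` and `best_class` are hypothetical helper names, and score thresholding and NMS are omitted.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cell_of(flat_index, grid_w):
    """Map a flattened H*W index back to (row, col), assuming row-major order."""
    return divmod(flat_index, grid_w)


def decode_cell(dist, col, row, stride):
    """Convert (left, top, right, bottom) distances in grid units
    into (x1, y1, x2, y2) in input-image pixels."""
    left, top, right, bottom = dist
    cx, cy = col + 0.5, row + 0.5  # anchor point is the cell centre (offset 0.5)
    return ((cx - left) * stride, (cy - top) * stride,
            (cx + right) * stride, (cy + bottom) * stride)


def best_class(logits):
    """Return (class index, score); sigmoid is applied here, not in the model."""
    idx = max(range(len(logits)), key=lambda i: logits[i])
    return idx, sigmoid(logits[idx])


if __name__ == "__main__":
    row, col = cell_of(410, 80)  # cell 410 on an 80x80 grid
    print(row, col)
    print(decode_cell((1.5, 2.0, 2.5, 1.0), col=col, row=row, stride=8))
    print(best_class([-2.0, 0.0, 3.0]))
```

The sketch returns corner coordinates; the original head instead calls `dist2bbox` with `xywh=True`, which yields centre and size.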

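4 Exporting yolov8 to Rockchip rknn and Horizon, with simulation tests

4.1 Rockchip rknn simulation

For the Rockchip environment setup and the detailed steps, see the previous post, 【瑞芯微RKNN模型转换和PC端仿真】 (Rockchip RKNN model conversion and PC-side simulation). For the code that exports yolov8 to an rknn model, and for the post-processing, see yolov8_rknn.

4.2 Horizon simulation

For the Horizon environment setup and the detailed steps, see the previous post, 【地平线Horizon模型转换和PC端仿真测试】 (Horizon model conversion and PC-side simulation test). For the code that exports yolov8 to a Horizon model, and for the post-processing, see yolov8_horizon.

5 Rockchip rknn and Horizon simulation tests using the official ONNX export

See: simulation test deployment of the official yolov8 model on Rockchip RKNN and Horizon chips.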

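6 rknn on-board C++ deployment

See: complete C++ deployment code and model example.

The model and the timing from the on-board C++ code are also posted for reference. The chip used is rk3588.

7 yolov8seg deployment

If you want to try yolov8seg, take a look at the reference link: complete yolov8seg deployment code and model example.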