
Research on an Operator Library Based on a Reconfigurable Processor


Presentation Transcript


  1. Research on an Operator Library Based on a Reconfigurable Processor

  2. Contents • OpenMP parallel programming model • Vision processing operator library • Harris corner detection algorithm optimization • Next steps

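  3. OpenMP Parallel Programming Model. OpenMP is a multithreaded parallel programming model for shared-memory multiprocessor and multicore systems. It is an application programming interface (API) for explicitly directing multithreaded execution, and it is highly portable.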
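To make "explicitly directing multithreaded execution" concrete, here is a minimal sketch of an OpenMP work-sharing loop in C. The function name `vector_add` and its parameters are illustrative assumptions and do not come from the slides; only the `#pragma omp parallel for` directive is standard OpenMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the slides): out[i] = a[i] + b[i].
 * The directive explicitly asks the OpenMP runtime to split the loop
 * iterations across threads; a, b and out all live in shared memory. */
void vector_add(const int *a, const int *b, int *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);

    long i; /* signed index, accepted by every OpenMP version */
    #pragma omp parallel for
    for (i = 0; i < (long)n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Built with an OpenMP flag such as `-fopenmp`, the loop runs on several threads. Without the flag the pragma is ignored and the same source runs serially, which is one practical side of the portability mentioned above.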

  4. OpenMP Parallel Programming Model

  5. OpenMP Parallel Programming Model • The four stages of designing a parallel algorithm • Partitioning • Communication • Agglomeration • Mapping
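The four stages can be illustrated with a small reduction. The sketch below is an assumption-laden example, not the authors' code: the function `chunked_sum` and the constant `CHUNK` are hypothetical names, and summing an array is chosen only because each stage is easy to point at.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 64 /* hypothetical agglomeration size, chosen for illustration */

/* Sum an array, with comments marking the four design stages. */
long chunked_sum(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long total = 0;
    long n_chunks = (long)((n + CHUNK - 1) / CHUNK);
    long c;

    /* Mapping: schedule(static) assigns whole chunks to threads.
     * Communication: reduction(+) combines the per-thread sums. */
    #pragma omp parallel for schedule(static) reduction(+:total)
    for (c = 0; c < n_chunks; c++) {
        size_t begin = (size_t)c * CHUNK;
        size_t end = (begin + CHUNK < n) ? begin + CHUNK : n;
        long partial = 0;

        /* Agglomeration: one task covers up to CHUNK primitive tasks. */
        for (size_t k = begin; k < end; k++) {
            /* Partitioning: adding one element is the primitive task. */
            partial += data[k];
        }
        total += partial;
    }
    return total;
}
```

Calling `chunked_sum(data, n)` returns the same value as a serial loop; only the grouping of work changes. A larger `CHUNK` means fewer, coarser tasks and less scheduling overhead, at the cost of coarser load balancing.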

  6. OpenMP Parallel Programming Model

  7. Vision Processing Operator Library. The core operator expression is: CoreOperator(IN1,IN2,IN3,OUT1)_MN

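  8. Vision Processing Operator Library • concurrent (ccrt) • ccrt[CoreOperatorA(in1,in2,in3,out1)_MN,CoreOperatorB(in1,in2,in3,out1)_MN] • loop • loop[CoreOperator(in1,in2,in3,out1)_MN,iteration count,increment 1,increment 2,increment 3] • fork join • fork • first sub-expression • fork • second sub-expression • join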

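  9. Vision Processing Operator Library. The extended operator expression is: {CoreOperatorA(in1,in2,in3,out1)_MN;CoreOperatorB(in1,in2,in3,out1)_MN} Example, ADD4(a,b,c,d,out): {ccrt[ADD(a,b,,out1)_11,ADD(c,d,,out2)_12];ADD(ab,cd,,out)_11} where ab denotes the sum of a and b, and cd denotes the sum of c and d.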
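One possible reading of the ADD4 expression in OpenMP terms is sketched below. It assumes that `ccrt[...]` means the two ADD operators may run at the same time and that `;` means sequential composition. The names `add_op` and `add4` are hypothetical, and the `_MN` placement suffixes are not modelled because the slides do not define them.

```c
/* Core operator ADD: out = in1 + in2.
 * The third input is unused, as in ADD(a,b,,out1). */
static int add_op(int in1, int in2)
{
    return in1 + in2;
}

/* Extended operator ADD4(a,b,c,d,out), returned as a value here. */
int add4(int a, int b, int c, int d)
{
    int ab = 0; /* out1 of the first ADD */
    int cd = 0; /* out2 of the second ADD */

    /* ccrt[...]: the two ADDs are independent, so each gets a section. */
    #pragma omp parallel sections
    {
        #pragma omp section
        ab = add_op(a, b);

        #pragma omp section
        cd = add_op(c, d);
    }
    /* ';': the implicit barrier above guarantees ab and cd are ready. */
    return add_op(ab, cd);
}
```

For example, `add4(1, 2, 3, 4)` computes 3 and 7 in the two sections and then returns their sum. `parallel sections` is only one candidate mapping; tasks or a fixed thread-to-PE assignment would express the same dependence structure.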

  10. Vision Processing Operator Library

  11. Vision Processing Operator Library

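  12. Vision Processing Operator Library • Core operators: • SUB: T = a - b; Out2 = T[32], so Out2 = 0 if a >= b and Out2 = 1 if a < b • CO: R = Out2 ? a : b, so R = b when Out2 = 0 and R = a when Out2 = 1 • Extended operator: • {SUB;CO}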
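Read together, {SUB;CO} selects the smaller of a and b. The sketch below checks that reading in C. It assumes 32-bit signed operands and interprets T[32] as bit 32 of the 33-bit difference (the sign/borrow bit); neither assumption is stated on the slide, and all function names are hypothetical.

```c
#include <stdint.h>

/* SUB: T = a - b, computed wide enough to keep bit 32.
 * Returns Out2 = T[32]: 1 when a < b, 0 when a >= b. */
static unsigned sub_out2(int32_t a, int32_t b)
{
    int64_t t = (int64_t)a - (int64_t)b;
    /* Convert to unsigned before shifting so the shift is well defined. */
    return (unsigned)(((uint64_t)t >> 32) & 1u);
}

/* CO: R = Out2 ? a : b. */
static int32_t co_select(unsigned out2, int32_t a, int32_t b)
{
    return out2 ? a : b;
}

/* Extended operator {SUB;CO}: SUB runs first, then CO uses Out2. */
int32_t sub_co(int32_t a, int32_t b)
{
    unsigned out2 = sub_out2(a, b);
    return co_select(out2, a, b);
}
```

With a = 88 and b = 99, the difference is negative, Out2 is 1, and the result is a. With a = 30 and b = 22, Out2 is 0 and the result is b. In both cases the output is the minimum of the two inputs.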

  13. Vision Processing Operator Library
      int m[4][256]={0};   // shared memory
      int n1[16]={0};      // router channel 1 in each of the 16 PEs
      int n2[16]={0};      // router channel 2 in each of the 16 PEs
      int n1_buf[16]={0};  // buffer for router channel 1
      int n2_buf[16]={0};  // buffer for router channel 2
      int r[16][16]={0};   // register file in each PE
      // input data
      m[0][0]=88; m[1][0]=99; m[2][0]=1;  m[3][0]=0;
      m[0][1]=22; m[1][1]=30; m[2][1]=77; m[3][1]=6;
      m[0][2]=2;
      omp_set_num_threads(16);  // one OpenMP thread per PE

  14. Harris Corner Detection Algorithm Optimization

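  15. Harris Corner Detection Algorithm Optimization. C(i+x, j+y) = [expression missing]; N(i, j) = [expression missing]; x, y: [range missing], where x and y are not both 0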

  16. Harris Corner Detection Algorithm Optimization

  17. Harris Corner Detection Algorithm Optimization
      [1] E. Rosten and T. Drummond, "Fusing Points and Lines for High Performance Tracking," Proc. 10th IEEE Int'l Conf. Computer Vision, vol. 2, pp. 1508-1515, 2005.
      [2] E. Rosten, G. Reitmayr, and T. Drummond, "Real-Time Video Annotations for Augmented Reality," Proc. Int'l Symp. Visual Computing, 2005.
      [3] E. Rosten and T. Drummond, "Machine Learning for High Speed Corner Detection," Proc. Ninth European Conf. Computer Vision, vol. 1, pp. 430-443, 2006.
      [4] E. Rosten, R. Porter, and T. Drummond, "Faster and Better: A Machine Learning Approach to Corner Detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 32, pp. 105-119, 2010.
      [5] G. Chenguang, L. Xianglong, Z. Linfeng, and L. Xiang, "A Fast and Accurate Corner Detector Based on Harris Algorithm," Proc. Third Int. Symposium Intelligent Information Technology Application, Nanchang, pp. 49-52, 2009.

  18. Harris Corner Detection Algorithm Optimization
      [6] A. Neubeck and L. Van Gool, "Efficient Non-Maximum Suppression," Proc. IEEE Int. Conf. Patt. Recog., vol. 3, pp. 850-855, Sep. 2006.
      [7] P. Mainali, Q. Yang, G. Lafruit, R. Lauwereins, and L. V. Gool, "LOCOCO: Low Complexity Corner Detector," Proc. IEEE Int. Conf. Acou., Speech Signal Process., pp. 810-813, Mar. 2010.
      [8] P. Mainali, Q. Yang, G. Lafruit, L. Gool, and R. Lauwereins, "Robust Low Complexity Corner Detector," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 4, pp. 435-445, Apr. 2011.

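  19. Next Steps • Complete the first draft of the short paper. (5 weeks) • Complete the decomposition and mapping of Harris corner detection and NCC matching. (4 weeks) • Use Perl to implement the translation from operator expressions to OpenMP. (2 weeks) • Write a patent application based on the short paper. (1 week)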

  20. Thank You!
