
Application Performance through Hardware Acceleration

Dan Legorreta, Moshe Looks, Shobana Padmanabhan. CSE 560, Oct 2005.


Presentation Transcript


  1. Dan Legorreta, Moshe Looks, Shobana Padmanabhan. CSE 560, Oct 2005. Application Performance through Hardware Acceleration

  2. [Hierarchical] Clustering [in Hardware]
     • Clustering
        • Assign points in a space to non-overlapping clusters
        • Minimize intra-cluster distances
        • Maximize inter-cluster distances
     • Hierarchical Clustering
        • Cluster the clusters; generates a tree (dendrogram) showing the hierarchical structure of the data
        • Agglomerative (bottom-up) or partitioning (top-down)
     • Why do it in hardware?
        • Clustering is often applied to biology or internet data with millions of items to cluster and thousands of dimensions
        • Clustering may be applied to high-volume data streams
        • Clustering algorithms are slow: ~O(n²d) or worse
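To make the O(n²d) cost concrete, here is a minimal sketch in C of the inner step of a naive agglomerative pass: finding the closest pair among n points of d dimensions. The function names `sq_dist` and `closest_pair` and the row-major `double` layout are assumptions for illustration; they are not taken from the presentation.

```c
#include <stddef.h>
#include <float.h>

/* Squared Euclidean distance between two d-dimensional points: O(d). */
static double sq_dist(const double *a, const double *b, size_t d)
{
    double sum = 0.0;
    for (size_t k = 0; k < d; k++) {
        double diff = a[k] - b[k];
        sum += diff * diff;
    }
    return sum;
}

/* Find the closest pair among n points stored row-major in pts (n x d).
 * Examines n(n-1)/2 pairs at O(d) each, so O(n^2 d) overall.
 * Hypothetical helper: the slides do not show the authors' code. */
static double closest_pair(const double *pts, size_t n, size_t d,
                           size_t *best_i, size_t *best_j)
{
    double best = DBL_MAX;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            double dist = sq_dist(pts + i * d, pts + j * d, d);
            if (dist < best) {
                best = dist;
                *best_i = i;
                *best_j = j;
            }
        }
    }
    return best;
}
```

A bottom-up clustering would repeat a search like this after every merge, which is why the per-pair distance kernel dominates the runtime for large n and d.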

  3. What’s Been Done?
     • K-means, the most popular flat clustering algorithm, has been implemented in hardware:
        • M. Estlick, M. Leeser, J. Theiler, and J. J. Szymanski, “Algorithmic Transformations in the Implementation of K-means Clustering on Reconfigurable Hardware” (FPGA 2001).
        • 17 citations, incl. other hardware implementations of flat clustering algorithms
     • Hierarchical Clustering
        • M. Y. Niamat, D. Bitter, and M. M. Jamali, “FPGA Implementation of Hierarchical Clustering Algorithms” (ISCAS 1998).
        • Simple agglomerative clustering on 8 Xilinx 4003APC84 FPGAs
        • They just coded in VHDL and simulated it; no results given!
        • No other papers found
     • No known experimental results or implementations of top-down hierarchical clustering in hardware!

  4. Liquid architecture platform
     [Diagram. Labels: Workstation; program; gcc; FPX; FPGA; LEON; Core; I-CACHE; D-CACHE; Cache Controller; Address/Data bus; AHB; Memory Controller; SRAM / SDRAM; Command Controller; Control S/W Interface; Clustering application.]
     • LEON: SPARC V8 compatible, open soft core

  5. Application runtime
     • Non-intrusive, cycle-accurate profiling from hardware implementation
     [Diagram. Labels: Workstation; FPX; FPGA; LEON; Core; I-CACHE; D-CACHE; Cache Controller; Statistics Module; Address/Data bus; AHB; Memory Controller; SRAM / SDRAM; Command Controller; Control S/W Interface; Request; Timings. Profile annotation: dotproduct 70%.]
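For reference, the software kernel that the profile singles out can be sketched as follows. This is a minimal baseline, assuming 32-bit integer vector elements and a 64-bit accumulator; the slides do not state the element type, and `dot_product` is a name chosen here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Software baseline: dot product of two length-n integer vectors.
 * Element type and accumulator width are assumptions, not from the slides. */
static int64_t dot_product(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t k = 0; k < n; k++) {
        /* Widen before multiplying so the product cannot overflow 32 bits. */
        acc += (int64_t)a[k] * b[k];
    }
    return acc;
}
```

The loop is one multiply-accumulate per element with no data dependence between products, which is the kind of structure that maps well onto a dedicated hardware unit.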

  6. Improve performance through hardware implementation
     [Diagram label: + dot product]
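As a rough bound on what this can buy, Amdahl's law applies, assuming the 70% figure on the profiling slide is the fraction p of total runtime spent in the dot product. The unit speedup s = 10 below is an illustrative value, not a measured one.

```latex
S(s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad p = 0.70

S(10) = \frac{1}{0.30 + 0.07} \approx 2.7, \qquad
\lim_{s \to \infty} S(s) = \frac{1}{1 - p} = \frac{1}{0.30} \approx 3.3
```

Under that assumption, even an infinitely fast dot product unit cannot make the whole application more than about 3.3 times faster, and any per-call bus overhead reduces the gain further.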

  7. Improve performance through hardware implementation
     [Diagram. Labels: Workstation; program; gcc; FPX; FPGA; LEON; Core; I-CACHE; D-CACHE; Cache Controller; + dot product; Address/Data bus; AHB; APB; Memory Controller; SRAM / SDRAM; Command Controller; Control S/W Interface.]

  8. Hardware acceleration
     [Diagram. Labels: Workstation; program; gcc; FPX; FPGA; LEON; Core; I-CACHE; D-CACHE; Cache Controller; + dot product; Address/Data bus; AHB; APB; Memory Controller; SRAM / SDRAM; Command Controller; Control S/W Interface.]

  9. Dot product implementation
     [Diagram. Labels: FPGA; LEON; Core; I-CACHE; D-CACHE; Cache Controller; Address/Data bus; AHB; APB; Memory Controller; Command Controller; Dot product circuit.]
     Register addresses shown on the slide:
     • 0x800000D0: labels #1, #2, #3, bitV
     • 0x800000D4: labels #1, #2, #3, bitV
     • 0x800000D8: labels #1, #2, #3, bitV
     • 0x800000DC: stat re result
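From the software side, a device like this is driven with plain loads and stores to the register addresses. The sketch below is an assumption-heavy illustration: only the four addresses come from the slide. That the first three registers take input words and the fourth returns the result is a guess from the slide layout, and `DOTPROD_BASE` and `dotprod_submit` are hypothetical names.

```c
#include <stdint.h>

/* First register address from the slide; the other three follow at +4 each. */
#define DOTPROD_BASE 0x800000D0UL

/* Write three input words, then read back the fourth register.
 * ASSUMPTION: registers 0..2 are inputs and register 3 holds the result.
 * The slide does not define the register semantics or any status bits,
 * so no ready-polling is shown here. */
static uint32_t dotprod_submit(volatile uint32_t *regs,
                               uint32_t w0, uint32_t w1, uint32_t w2)
{
    regs[0] = w0;   /* 0x800000D0 on the target */
    regs[1] = w1;   /* 0x800000D4 */
    regs[2] = w2;   /* 0x800000D8 */
    return regs[3]; /* 0x800000DC */
}

/* On the target, the base pointer would be formed like this:
 *   volatile uint32_t *regs = (volatile uint32_t *)DOTPROD_BASE;
 *   uint32_t r = dotprod_submit(regs, a, b, c);
 */
```

The `volatile` qualifier keeps the compiler from reordering or eliminating the accesses, which is what lets this work with an unmodified gcc. Passing the base pointer as a parameter also allows the routine to be exercised on a workstation against an ordinary array.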

  10. Plan
     • Changes:
        • APB device with memory-mapped registers, instead of changing the compiler.
        • Because of the APB overhead, we also plan to look at the co-processor interface.
     • New schedule:
        • APB implementation, including dot product, this week.
        • Co-processor interface, as much as possible, starting next week.
