Application Performance through Hardware Acceleration

Dan Legorreta, Moshe Looks, Shobana Padmanabhan. CSE 560, Oct 2005.

Presentation Transcript


  1. Dan Legorreta, Moshe Looks, Shobana Padmanabhan. CSE 560, Oct 2005. Application Performance through Hardware Acceleration

  2. [Hierarchical] Clustering [in Hardware]
     • Clustering
        • Assign points in a space to non-overlapping clusters
        • Minimize intra-cluster distances
        • Maximize inter-cluster distances
     • Hierarchical clustering
        • Cluster the clusters; this generates a tree (dendrogram) showing the hierarchical structure of the data
        • Agglomerative (bottom-up) or divisive/partitioning (top-down)
     • Why do it in hardware?
        • Clustering is often applied to biology or Internet data with millions of items to cluster and thousands of dimensions
        • Clustering may be applied to high-volume data streams
        • Clustering algorithms are slow: ~O(n²d) or worse, for n items in d dimensions
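The agglomerative (bottom-up) variant can be illustrated with a short software sketch. This is a minimal example under stated assumptions, not the authors' implementation: it assumes Euclidean distance and single linkage (the distance between two clusters is the distance between their closest members), and the function names `euclidean` and `single_linkage` are hypothetical. It returns the merge sequence that defines the dendrogram.

```python
import math
from typing import Dict, List, Tuple


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two d-dimensional points (O(d) work)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points: List[List[float]]) -> List[Tuple[int, int, float]]:
    """Naive bottom-up (agglomerative) clustering with single linkage.

    Returns the merges (cluster_a, cluster_b, distance) in order; together
    they describe the dendrogram. Original points are clusters 0..n-1 and
    the k-th merge (0-based) creates cluster n + k.
    """
    n = len(points)
    # Step 1: all n(n-1)/2 pairwise distances, O(d) each -> O(n^2 d).
    dist: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        for j in range(i + 1, n):
            dist[(i, j)] = euclidean(points[i], points[j])

    def key(p: int, q: int) -> Tuple[int, int]:
        # Distances are stored once per unordered pair, smaller id first.
        return (p, q) if p < q else (q, p)

    active = set(range(n))
    merges: List[Tuple[int, int, float]] = []
    next_id = n
    # Step 2: n-1 merges, each scanning all active pairs -> O(n^3) overall.
    while len(active) > 1:
        a, b = min(((p, q) for p in active for q in active if p < q),
                   key=lambda pq: dist[pq])
        d = dist[(a, b)]
        active -= {a, b}
        for k in active:
            # Single linkage: distance to the merged cluster is the smaller
            # of the distances to its two parts. k < next_id always holds.
            dist[(k, next_id)] = min(dist[key(a, k)], dist[key(b, k)])
        active.add(next_id)
        merges.append((a, b, d))
        next_id += 1
    return merges


print(single_linkage([[0.0], [1.0], [5.0], [6.5]]))
# [(0, 1, 1.0), (2, 3, 1.5), (4, 5, 4.0)]
```

On the four 1-D points above it first merges points 0 and 1 (distance 1.0), then points 2 and 3 (1.5), then the two resulting clusters 4 and 5 (4.0). The pairwise-distance step alone is O(n²d): for n = 10^6 items and d = 10^3 dimensions that is about n²/2 · d ≈ 5 × 10^14 coordinate differences, and this naive merge loop adds O(n³) on top.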

  3. What’s Been Done?
     • K-means, the most popular flat clustering algorithm, has been implemented in hardware:
        • M. Estlick, M. Leeser, J. Theiler, and J. J. Szymanski, “Algorithmic Transformations in the Implementation of K-means Clustering on Reconfigurable Hardware” (FPGA 2001).
        • Cited by 17 papers, including other hardware implementations of flat clustering algorithms
     • Hierarchical clustering
        • M. Y. Niamat, D. Bitter, and M. M. Jamali, “FPGA Implementation of Hierarchical Clustering Algorithms” (ISCAS 1998).
        • Simple agglomerative clustering on 8 Xilinx 4003APC84 FPGAs
        • The design was only coded in VHDL and simulated; no results were reported!
        • No other papers found
     • No known experimental results or implementations of top-down hierarchical clustering in hardware!

  4. What we plan to do:
     • Start with a software implementation.
     • Profile it to find out where most of the runtime is spent.
     • Implement that function in hardware as a special instruction.
     • Interface it with the processor.
     • Modify the compiler to use it during code generation.
     • Study and report the performance speedup.
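The profiling step of this plan can be illustrated with Python's standard `cProfile` and `pstats` modules. This is only a software analogue of the cycle-accurate Liquid Architecture profiler, not the tool the authors use; the toy workload and the names `squared_distance`, `all_pairs`, and `hottest_function` are hypothetical. The sketch ranks functions by self time (time spent in the function body, excluding callees), which is a natural criterion for picking a candidate special instruction. `Stats.get_stats_profile()` requires Python 3.9 or later.

```python
import cProfile
import pstats
import random
from typing import Callable, List, Tuple


def squared_distance(a: List[float], b: List[float]) -> float:
    """Inner kernel of many clustering codes: O(d) work per call."""
    total = 0.0
    for i in range(len(a)):
        diff = a[i] - b[i]
        total += diff * diff
    return total


def all_pairs(points: List[List[float]]) -> List[List[float]]:
    """Full n x n squared-distance matrix: n^2 calls to squared_distance."""
    n = len(points)
    return [[squared_distance(points[i], points[j]) for j in range(n)]
            for i in range(n)]


def hottest_function(func: Callable, *args) -> Tuple[str, float]:
    """Run func(*args) under cProfile and return the name and self time
    (seconds, excluding callees) of the most expensive function."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    profile = pstats.Stats(profiler).get_stats_profile()  # Python 3.9+
    name, stats = max(profile.func_profiles.items(),
                      key=lambda item: item[1].tottime)
    return name, stats.tottime


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical workload: 60 points in 256 dimensions.
    pts = [[random.random() for _ in range(256)] for _ in range(60)]
    name, seconds = hottest_function(all_pairs, pts)
    print(f"hot spot: {name} ({seconds:.3f} s self time)")
```

On a workload like this the per-pair distance kernel should dominate, consistent with the O(n²d) cost of clustering; a small, arithmetic-heavy function of this kind is the sort of candidate the plan would move into hardware.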

  5. How we plan to do it:
     • Use the Liquid Architecture platform developed by the Reconfigurable Network Group (RNG) of ARL and the DOC group at Washington University.
        • It provides a cycle-accurate, non-intrusive profiler and an easy-to-use web interface. Profiling can be done per method, and any microarchitecture parameter can be profiled.
        • It uses LEON, an open-source soft-core processor (SPARC V8), on the FPX platform developed by the RNG.
        • The FPX comprises a Xilinx Virtex-E FPGA, network connectivity, and off-chip SRAM (4 MB).
        • Limitation: there is no operating system to run the application on.
     • Interface the hardware instruction through LEON’s AMBA bus interface.

  6. Plan

  7. References for hierarchical clustering in hardware
     • Hierarchical Clustering in Hardware: Papers
        1. John W. Lockwood, Stephen G. Eick, Doyle J. Weishar, Ron Loui, James Moscola, Chip Kastner, Andrew Levine, Mike Attig. Transformation Algorithms for Data Streams. http://www.arl.wustl.edu/~lockwood/publications/WashU-AERO_2005-AFE_Summer_Experiment_Paper.pdf
        2. James Moscola, John Lockwood, Ronald P. Loui, Michael Pachos. Implementation of a Content-Scanning Module for an Internet Firewall. http://www.arl.wustl.edu/projects/fpx/references/FCCM03/wu-content_scanning_firewall-FCCM_03-paper.pdf
        3. James Moscola, Michael Pachos, John Lockwood, Ronald P. Loui. FPsed: A Streaming Content Search-and-Replace Module for an Internet Firewall. http://www.arl.wustl.edu/~lockwood/publications/hoti11_fpsed.pdf
        4. James P. Davis, Sreesa Akella, Peter Waddell. Methods and Architectures for Realizing Fast Phylogenetic Computation Engines Using VLSI Array Based Logic. http://www.cse.sc.edu/~jimdavis/Research/Papers-PDF/Bioinformatics02-Davis-Akella-Waddell%5B1%5D.pdf
        5. M. Y. Niamat, D. Bitter, M. M. Jamali. FPGA Implementation of Hierarchical Clustering Algorithms. http://ieeexplore.ieee.org/iel4/5627/15118/00694410.pdf?arnumber=694410
        6. Clark F. Olson. Parallel Algorithms for Hierarchical Clustering. http://citeseer.ist.psu.edu/olson95parallel.html
        7. Dan Hammerstrom. Digital VLSI for Neural Networks. http://www.cecs.pdx.edu/~strom/papers/hammerstrom_draft2.pdf
        8. J. Ambros-Ingerson, R. Granger, G. Lynch. Simulation of paleocortex performs hierarchical clustering. http://www.jstor.org/view/00368075/di002048/00p0487f/0#&origin=sfx%3Asfx
        9. Mike Estlick, Miriam Leeser, James Theiler, John J. Szymanski. Algorithmic Transformations in the Implementation of K-means Clustering on Reconfigurable Hardware. http://delivery.acm.org/10.1145/370000/360311/p103-estlick.pdf?key1=360311&key2=4848397211&coll=GUIDE&dl=ACM&CFID=54014978&CFTOKEN=84411848
        10. James Theiler, Miriam Leeser, Michael Estlick, John J. Szymanski. Design Issues for Hardware Implementation of an Algorithm for Segmenting Hyperspectral Imagery. http://mrfrench.lanl.gov/~jt/Papers/kmeans-spie-00.ps
        11. Andres Perez-Uribe, Eduardo Sanchez. FPGA Implementation of a Network of Neuronlike Adaptive Elements. http://lslwww.epfl.ch/~aperez/ps/PerezSanchez_icann97.ps.gz
        12. Moshe Sipper, Eduardo Sanchez, Daniel Mange, Marco Tomassini, Andres Perez-Uribe, Andre Stauffer. A Phylogenetic, Ontogenetic, and Epigenetic View of Bio-Inspired Hardware Systems. http://www.cs.virginia.edu/bio/Sipper_POEmodel_97.pdf

  8. References for custom instructions
     • Gaisler Research. http://www.gaisler.com
     • Shobana Padmanabhan, Phillip Jones, et al. Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures. In Workshop on Compilers and Tools for Constrained Embedded Systems, at the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Washington, DC, Sep 2004.
     • John W. Lockwood. The Field-programmable Port Extender (FPX). http://www.arl.wustl.edu/arl/projects/fpx/, December 2003.
     • Paolo Ienne, Kubilay Atasu, Laura Pozzi. Automatic application-specific instruction-set extensions under microarchitectural constraints. Int’l Symp. on Field Programmable Gate Arrays, pages 190–199, 2004.
     • Michael Gschwind. Instruction set selection for ASIP design. In Proc. of the 7th Int’l Symp. on Hardware/Software Codesign, pages 7–11, May 1999.
     • N. Clark, W. Tang, S. Mahlke. Automatically Generating Custom Instruction Set Extensions. Workshop on Application Specific Processors, Nov 2002, Istanbul, Turkey.
     • A. K. Verma, K. Atasu, M. Vuletić, L. Pozzi, P. Ienne. Automatic Application-Specific Instruction-Set Extensions under Microarchitectural Constraints. Nov 2002, Istanbul, Turkey.
     • Kenshu Seto, Kojima Yoshihisa, Masahiro Fujita. Compiler Techniques for Field Modifiable Architectures. In Workshop on Compilers and Tools for Constrained Embedded Systems, at the International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Washington, DC, Sep 2004.
