
Application Performance through Hardware Acceleration

Shobana Padmanabhan, Dan Legorreta, Moshe Looks. CSE 560, Oct 2005.


Presentation Transcript


  1. Shobana Padmanabhan, Dan Legorreta, Moshe Looks. CSE 560, Oct 2005. Application Performance through Hardware Acceleration

  2. Application Performance: Architecture, Compiler, Algorithm

  3. Liquid architecture platform • LEON - SPARC V8 compatible • Open soft core [Diagram labels: Workstation (program, gcc, binary image); FPX FPGA; LEON core with I-cache, D-cache and cache controller; AHB address/data bus; memory controller; SRAM / SDRAM; command controller; control S/W interface; clustering application]

  4. Application runtime • Slow! Where is time spent? [Diagram: the platform from slide 3 (workstation, FPX FPGA, LEON core, caches, AHB bus, memory controller, SRAM / SDRAM, command controller, control S/W interface), annotated "Results & Timing"]

  5. Can profile all aspects of micro-architecture [Figure: per-function profile of the .text section (main, findMatch, addQuery, computeKey, computeBase, coreLoop, fillQuery, Rnd) showing time / cycles, pipeline stalls, branch prediction, and cache read / write hits and misses]

  6. Cycle-accurate profiling for free • Example timings: findMatch 500 ms, coreLoop 300 ms [Diagram: the platform from slide 3 with a Statistics Module added, annotated "Request" and "Timings"]
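
The per-function timings on slide 6 (findMatch 500 ms, coreLoop 300 ms) suggest how such a profile might be post-processed on the workstation. The following is only a minimal sketch: the `profile_entry` layout, the function names `cycles_to_ms` and `hottest`, and any clock frequency passed in are hypothetical and are not taken from the FPX or LEON interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of a per-function profile. The layout is hypothetical;
 * the slides do not describe the statistics module's output format. */
struct profile_entry {
    const char *function;  /* function name, e.g. "findMatch" */
    uint64_t cycles;       /* cycles attributed to that function */
};

/* Convert a cycle count to milliseconds for a clock frequency in Hz. */
double cycles_to_ms(uint64_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1000.0 / clock_hz;
}

/* Return the entry with the largest cycle count, or NULL if n is 0. */
const struct profile_entry *hottest(const struct profile_entry *entries,
                                    size_t n)
{
    const struct profile_entry *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if (best == NULL || entries[i].cycles > best->cycles) {
            best = &entries[i];
        }
    }
    return best;
}
```

With an assumed 25 MHz clock, 12,500,000 cycles in findMatch would correspond to 500 ms, and `hottest` would point at findMatch as the first candidate for acceleration.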

  7. Improve application performance • By reconfiguring the processor • By creating special hardware instructions

  8. Reconfigure architecture

  9. Special hardware instruction [Diagram: the platform from slide 3 (workstation with program and gcc, FPX FPGA, LEON core, caches, AHB bus, memory controller, SRAM / SDRAM, command controller, control S/W interface) with "+ dot product" added beside the core]

  10. Special hardware instruction [Diagram: as on slide 9, with "+ dot product" beside the core and a second binary image shown]
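
Slides 9 and 10 name a dot product as the candidate for a special hardware instruction. For reference, a plain C software baseline might look like the sketch below. The element type, the accumulator width and the function name `dot_product` are assumptions; the slides do not show the application's code or the instruction's operand format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software baseline for a dot product: one multiply and one add per
 * element. A custom instruction would aim to replace this loop body.
 * The 32-bit elements and 64-bit accumulator are assumptions. */
int64_t dot_product(const int32_t *a, const int32_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen before multiplying so the product cannot overflow. */
        acc += (int64_t)a[i] * (int64_t)b[i];
    }
    return acc;
}
```

Profiling a loop like this before and after adding the instruction would show whether the multiply-accumulate itself or the memory traffic dominates the runtime.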

  11. Related work
      • Gaisler Research. http://www.gaisler.com
      • Lesley Shannon and Paul Chow. Using reconfigurability to achieve real-time profiling for hardware/software codesign. In Proc. ACM Int’l Symp. on Field Programmable Gate Arrays, pages 190–199, 2004.
      • T. Vinod Kumar Gupta, Roberto E. Ko, and Rajeev Barua. Compiler-directed customization of ASIP cores. In Proc. of the 10th Int’l Symp. on Hardware/Software Codesign, pages 97–102, May 2002.
      • Shobana Padmanabhan, Phillip Jones, et al. Extracting and Improving Microarchitecture Performance on Reconfigurable Architectures. In Workshop on Compilers and Tools for Constrained Embedded Systems, at Int’l Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Washington DC, Sep 2004.
      • Stretch, Inc. http://www.stretchinc.com
      • Tensilica, Inc. http://www.tensilica.com
      • John W. Lockwood. The Field-programmable Port Extender (FPX). http://www.arl.wustl.edu/arl/projects/fpx/, December 2003.
      • Paolo Ienne, Kubilay Atasu, Laura Pozzi. Automatic application-specific instruction-set extensions under microarchitectural constraints. Int’l Symp. on Field Programmable Gate Arrays, pages 190–199, 2004.
      • Michael Gschwind. Instruction set selection for ASIP design. In Proc. of the 7th Int’l Symp. on Hardware/Software Codesign, pages 7–11, May 1999.
      • N. Clark, W. Tang, S. Mahlke. Automatically Generating Custom Instruction Set Extensions. Workshop on Application Specific Processors, Nov 2002, Istanbul, Turkey.
      • A. K. Verma, K. Atasu, M. Vuletić, L. Pozzi, P. Ienne. Automatic Application-Specific Instruction-Set Extensions under Microarchitectural Constraints. Nov 2002, Istanbul, Turkey.
      • Kenshu Seto, Kojima Yoshihisa, Masahiro Fujita. Compiler Techniques for Field Modifiable Architectures. In Workshop on Compilers and Tools for Constrained Embedded Systems, at Int’l Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES), Washington DC, Sep 2004.

  12. Related work, continued • Hierarchical Clustering in Hardware - Papers
      1. Transformation Algorithms for Data Streams. John W. Lockwood, Stephen G. Eick, Doyle J. Weishar, Ron Loui, James Moscola, Chip Kastner, Andrew Levine, Mike Attig. http://www.arl.wustl.edu/~lockwood/publications/WashU-AERO_2005-AFE_Summer_Experiment_Paper.pdf
      2. Implementation of a Content-Scanning Module for an Internet Firewall. James Moscola, John Lockwood, Ronald P. Loui, Michael Pachos. http://www.arl.wustl.edu/projects/fpx/references/FCCM03/wu-content_scanning_firewall-FCCM_03-paper.pdf
      3. FPsed: A Streaming Content Search-and-Replace Module for an Internet Firewall. James Moscola, Michael Pachos, John Lockwood, Ronald P. Loui. http://www.arl.wustl.edu/~lockwood/publications/hoti11_fpsed.pdf
      4. Methods and Architectures for Realizing Fast Phylogenetic Computation Engines Using VLSI Array Based Logic. James P. Davis, Sreesa Akella, Peter Waddell. http://www.cse.sc.edu/~jimdavis/Research/Papers-PDF/Bioinformatics02-Davis-Akella-Waddell%5B1%5D.pdf
      5. FPGA Implementation of Hierarchical Clustering Algorithms. Niamat, M.Y., Bitter, D., Jamali, M.M. http://ieeexplore.ieee.org/iel4/5627/15118/00694410.pdf?arnumber=694410
      6. Parallel Algorithms for Hierarchical Clustering. Clark F. Olson. http://citeseer.ist.psu.edu/olson95parallel.html
      7. Digital VLSI for Neural Networks. Dan Hammerstrom. http://www.cecs.pdx.edu/~strom/papers/hammerstrom_draft2.pdf
      8. Simulation of paleocortex performs hierarchical clustering. J Ambros-Ingerson, R Granger, G Lynch. http://www.jstor.org/view/00368075/di002048/00p0487f/0#&origin=sfx%3Asfx
      9. Algorithmic Transformations in the Implementation of K-means Clustering on Reconfigurable Hardware. Mike Estlick, Miriam Leeser, James Theiler, John J. Szymanski. http://delivery.acm.org/10.1145/370000/360311/p103-estlick.pdf?key1=360311&key2=4848397211&coll=GUIDE&dl=ACM&CFID=54014978&CFTOKEN=84411848
      10. Design Issues for Hardware Implementation of an Algorithm for Segmenting Hyperspectral Imagery. James Theiler, Miriam Leeser, Michael Estlick, and John J. Szymanski. http://mrfrench.lanl.gov/~jt/Papers/kmeans-spie-00.ps
      11. FPGA Implementation of a Network of Neuronlike Adaptive Elements. Andres Perez-Uribe and Eduardo Sanchez. http://lslwww.epfl.ch/~aperez/ps/PerezSanchez_icann97.ps.gz
      12. A Phylogenetic, Ontogenetic, and Epigenetic View of Bio-Inspired Hardware Systems. Moshe Sipper, Eduardo Sanchez, Daniel Mange, Marco Tomassini, Andres Perez-Uribe, and Andre Stauffer. http://www.cs.virginia.edu/bio/Sipper_POEmodel_97.pdf

  13. Plan

  14. Reconfigurable architecture • Generic processor - cheap but application-agnostic; compilers exist; compiler optimization is the key • Reconfigurable logic - subject of our study; architecture and compiler research are the key • Customized logic - ideal for an application but expensive; logic/architecture research is the key [Slide labels: Pentium, FPGA, Custom]
