
Efficient Implementation of a String Matching Algorithm for SRC and Cray Reconfigurable Computers

Esam El-Araby¹, Mohamed Taher¹, Tarek El-Ghazawi¹, Mohamed Abouellail¹, Nandakishore Sastry², and Kris Gaj². ¹The George Washington University, ²George Mason University.


Presentation Transcript


  1. Efficient Implementation of a String Matching Algorithm for SRC and Cray Reconfigurable Computers
     Esam El-Araby¹, Mohamed Taher¹, Tarek El-Ghazawi¹, Mohamed Abouellail¹, Nandakishore Sastry², and Kris Gaj²
     ¹The George Washington University, ²George Mason University

  2. Outline • Introduction • SRC Hardware & Software • Cray XD1 Hardware & Software • String Matching Algorithms • Implementation Methodology • Results and Comparisons • Conclusions

  3. Introduction
     [Figure: a Microprocessor System (several µP nodes, each with its own µP memory, attached to an Interface with I/O) shown next to a Reconfigurable Processor System (several FPGA nodes, each with its own FPGA memory, attached to an Interface with I/O)]

  4. Outline • Introduction • SRC Hardware & Software • Cray XD1 Hardware & Software • String Matching Algorithms • Implementation Methodology • Results and Comparisons • Conclusions

  5. SRC Architecture (Hi-Bar™ Based Systems)
     [Figure labels: SRC Hi-Bar Switch, Common Memory, SNAP™, Memory, MAP®, µP, Chaining GPIO, SRC-6, Gig Ethernet, PCI-X, Disk, Customers’ Existing Networks (Wide Area Network, Local Area Network, Storage Area Network)]
     • Hi-Bar sustains 1.4 GB/s per port with 180 ns latency per tier
     • Up to 256 input and 256 output ports with two tiers of switch
     • Common Memory (CM) has a controller with DMA capability
     • The controller can perform other functions such as scatter/gather
     • Up to 8 GB DDR SDRAM supported per CM node

  6. SRC Reconfigurable Processor

  7. SRC Programming Environment
     [Figure labels: HDL (VHDL), HLL (C), FPGA system, µP system]

  8. SRC Programming Environment (cont’d)
     [Figure: compilation flow. Labels: Application sources (.c or .f files), µP Compiler, MAP Compiler, Object files (.o files), Linker, Application executable; User Macro HDL sources (.vhd or .v files), Logic synthesis (.edf files), Place & Route (.bin files), Configuration bitstreams]

  9. SRC Programming Environment (cont’d)
     Program in C or Fortran:
     • Main program: calls Function_1(a, d, e) and Function_2(d, e, f)
     • Function_1: Macro_1(a, b, c), Macro_2(b, d), Macro_2(c, e)
     • Function_2: Macro_3(s, t), Macro_1(n, b), Macro_4(t, k)
     [Figure: FPGA contents after the Function_1 call. Input a enters Macro_1, whose outputs b and c each feed a Macro_2 instance, producing d and e]

  10. Outline • Introduction • SRC Hardware & Software • Cray XD1 Hardware & Software • String Matching Algorithms • Implementation Methodology • Results and Comparisons • Conclusions

  11. Cray XD1 System Architecture (One Chassis)
      [Figure: RapidArray components in a Cray XD1 chassis. The FPGA and 2nd RAP are on the Expansion Module]
      • Compute
          • 12 AMD Opteron 32/64-bit x86 processors
          • High Performance Linux
      • RapidArray Interconnect
          • 12 communications processors
          • 1 Tb/s switch fabric
      • Active Management
          • Dedicated processor
      • Application Acceleration
          • 6 co-processors

  12. Cray XD1 Application Acceleration Interfaces
      [Figure: Virtex-II Pro FPGA containing User Logic, the RapidArray Transport Core (TX and RX links to the RAP) and the QDR RAM Interface Core (four QDR II SRAM ports, each with ADDR(20:0), D(35:0) and Q(35:0))]
      • XC2VP30-50 running at up to 200 MHz
      • 4 QDR II RAM with over 400 HSTL-I I/O at 200 MHz DDR (400 MTransfers/s)
      • 16-bit simplified HyperTransport I/F at 400 MHz DDR (800 MTransfers/s)
      • QDR and HT I/F take up <20% of the XC2VP30. The rest is available for user applications

  13. Cray XD1 Development Flow
      [Figure labels: Hardware Flow, Software Flow, Standard Hardware Flow]

  14. Cray XD1 Hardware Development Flow
      [Figure labels: Standard Flow, Additional High-Level Tools]

  15. Design Methodology using Cray XD1
      • Write the application in C for the system microprocessor
      • Identify computation-intensive routine(s)
      • Generate a bitstream using Cray Cores (RT & QDR II) and the language of choice
          • Create module in HDL (Verilog, VHDL)
          • Create module using High Level Language tools
          • Validate module
          • Synthesize (XST, Leonardo, Synplify Pro)
          • Create bitstream using Xilinx place & route tools
      • Replace routines with Cray API calls
      • Run application

  16. Outline • Introduction • SRC Hardware & Software • Cray XD1 Hardware & Software • String Matching Algorithms • Implementation Methodology • Results and Comparisons • Conclusions

  17. String Matching - Introduction
      • String matching: detecting the occurrence of a particular substring, called the pattern, in another string, called the text
      • Types of string matching:
          • Exact string matching
          • Approximate string matching
      • Exact string matching:
          • Matches the pattern only where it occurs completely, that is, unbroken and with no irrelevant data between any letters
          • Numerous applications: network intrusion detection systems (NIDS), text editing, etc.
      • Approximate string matching:
          • The pattern rarely matches the text completely
          • Finds application in computational biology (DNA matching), image detection, handwriting recognition, etc.
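
The difference between the two matching types on this slide can be sketched in a few lines of Python. This is only an illustration: the function names `find_exact` and `edit_distance` are hypothetical, and Levenshtein edit distance is used here as one common measure of approximate similarity, not as the method the presentation implements (the slides use Smith-Waterman).

```python
def find_exact(pattern, text):
    """Return every start index where pattern occurs unbroken in text."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        # Search again from the next position so overlapping hits are found.
        start = text.find(pattern, start + 1)
    return hits


def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, 1):
        curr = [i]                           # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,                 # delete ca
                curr[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),    # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(find_exact("ana", "bananas"))          # [1, 3]
print(edit_distance("kitten", "sitting"))    # 3
```

Exact matching answers "where does the pattern occur?", while an approximate measure answers "how far apart are these two strings?", which is the kind of question DNA matching needs.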

  18. DNA Matching Basics
      Why align two protein or DNA sequences?
      • Determine whether they are descended from a common ancestor (homologous)
      • Infer a common function
      • Locate functional elements
      • Infer protein structure, if the structure of one of the sequences is known
      Problem: find the best pairwise alignment of GAATC and CATAC
      Candidate alignments:
        GAATC    GAAT-C    -GAAT-C    GAATC-    GAAT-C    GA-ATC
        CATAC    C-ATAC    C-A-TAC    CA-TAC    CA-TAC    CATA-C
      • We need a way to measure the quality of a candidate alignment
      • Alignment scores consist of two parts:
          • substitution matrix
          • gap penalty

  19. DNA Matching Basics (cont’d)
      A hypothetical substitution matrix:

             A    C    G    T
        A   10   -5    0   -5
        C   -5   10   -5    0
        G    0   -5   10   -5
        T   -5    0   -5   10

      Scoring aligned bases
      • Transition (cheap): A↔G or C↔T, score 0 in this matrix
      • Transversion (expensive): any other substitution, score -5 in this matrix
      • Example, with the gap columns not yet scored:
          GAAT-C
          CA-TAC
          -5 + 10 + ? + 10 + ? + 10 = ?
      Scoring gaps
      • Linear gap penalty: every gap receives a score of d
          GAAT-C    d = -4
          CA-TAC
          -5 + 10 + -4 + 10 + -4 + 10 = 17
      • Affine gap penalty: opening a gap receives a score of d; extending a gap receives a score of e
          G--AATC   d = -4
          CATA--C   e = -1
          -5 + -4 + -1 + 10 + -4 + -1 + 10 = 5
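
The two worked sums on this slide can be reproduced with a short Python sketch. The substitution values, d = -4, and e = -1 come from the slide; the names `SUBST` and `score_alignment` are hypothetical, and treating a gap as "extended" only when the previous column had a gap in the same row is an assumption about how the affine rule is applied.

```python
# Hypothetical substitution matrix from the slide (symmetric).
SUBST = {
    "A": {"A": 10, "C": -5, "G": 0, "T": -5},
    "C": {"A": -5, "C": 10, "G": -5, "T": 0},
    "G": {"A": 0, "C": -5, "G": 10, "T": -5},
    "T": {"A": -5, "C": 0, "G": -5, "T": 10},
}


def score_alignment(top, bottom, gap_open, gap_extend=None):
    """Score a gapped pairwise alignment column by column.

    With gap_extend omitted, every gap column costs gap_open (linear
    penalty). Otherwise the first column of a gap run costs gap_open and
    each following column in the same row costs gap_extend (affine).
    """
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    if gap_extend is None:
        gap_extend = gap_open
    total = 0
    prev_gap_row = None                      # row that held a gap in the previous column
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            total += gap_extend if prev_gap_row == row else gap_open
            prev_gap_row = row
        else:
            total += SUBST[x][y]
            prev_gap_row = None
    return total


print(score_alignment("GAAT-C", "CA-TAC", gap_open=-4))                   # 17
print(score_alignment("G--AATC", "CATA--C", gap_open=-4, gap_extend=-1))  # 5
```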

  20. Approximate String Matching Algorithm (Smith-Waterman Algorithm)
      [Flowchart, in order]
      • Read sequences A and B into two arrays
      • Set the traceback and similarity matrices to size (A+1) × (B+1), where A and B here stand for the two sequence lengths
      • Set the 1st row and column of the similarity matrix to 0
      • Initialize the traceback arrays by setting them to -1 (default value)
      • Compute similarity matrix cell [i][j]
      • Update the traceback array
      • Similarity matrix complete? If no, compute the next cell; if yes, continue
      • Trace back for the best alignments
      NOTE: The traceback array carries the coordinates of one of the three cells involved in the calculation of cell [i][j] in the similarity matrix
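
The flowchart steps can be sketched in software as follows. This is a minimal Python illustration, not the authors' FPGA design: the function name `smith_waterman` is hypothetical, the scores (match 10, mismatch -5, linear gap -4) are assumptions borrowed from the hypothetical values on the previous slide, and only one best local alignment is traced back.

```python
def smith_waterman(a, b, match=10, mismatch=-5, gap=-4):
    """Return (best score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # Similarity matrix: first row and column stay 0.
    H = [[0] * cols for _ in range(rows)]
    # Traceback array: -1 is the default "no predecessor" value;
    # otherwise it holds the coordinates of the cell the score came from.
    T = [[-1] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (H[i - 1][j - 1] + s, (i - 1, j - 1)),   # diagonal: align a[i-1] with b[j-1]
                (H[i - 1][j] + gap, (i - 1, j)),         # up: gap in b
                (H[i][j - 1] + gap, (i, j - 1)),         # left: gap in a
            ]
            score, origin = max(candidates, key=lambda c: c[0])
            if score > 0:                                # local alignment: never go below 0
                H[i][j], T[i][j] = score, origin
                if score > best:
                    best, best_pos = score, (i, j)

    # Trace back from the best cell until a cell with no predecessor.
    top, bottom = [], []
    i, j = best_pos
    while T[i][j] != -1:
        pi, pj = T[i][j]
        top.append(a[i - 1] if pi < i else "-")
        bottom.append(b[j - 1] if pj < j else "-")
        i, j = pi, pj
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTACGTT", "GGACGGG"))   # (30, 'ACG', 'ACG')
```

Each cell depends only on its upper, left, and upper-left neighbours, which is the property that lets hardware implementations update many cells of an anti-diagonal in the same clock cycle.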

  21. Outline • Introduction • SRC Hardware & Software • Cray XD1 Hardware & Software • String Matching Algorithms • Implementation Methodology • Results and Comparisons • Conclusions

  22. C function for P P System Software Only Implementation C function for MAP VHDL Macro Software/Hardware Implementation FPGA System VHDL Hardware Only Implementation Implementation Schemes in SRC

  23. Operational Scenarios for Cray XD1
      [Figure labels: Operational Environment, FPGA-Initiated Transfers, Write-Only Transfers, µP-Initiated Transfers]

  24. Outline • Introduction • SRC Hardware & Software • Cray XD1 Hardware & Software • String Matching Algorithms • Implementation Methodology • Results and Comparisons • Conclusions

  25. Performance Results
      • Rate = (FPGA freq.) × (cells/cycle per SWPE) × (# SWPEs)
      • Opteron implementation (SSEARCH34)*
          • 100 million cell updates per second (CUPS)
      • Cray Inc. implementation*
          • Current unoptimized design: 80 MHz × 1 × 32 = 2.56 billion CUPS (GCUPS)
          • With optimization: 100 MHz × 1 × 50 = 5.0 GCUPS
          • With future Virtex 4 FPGA: 100 MHz × 1 × 150 = 15 GCUPS
          • 25x speedup vs. Opteron
      • Our implementation
          • SRC-6, current unoptimized design: 100 MHz × 1 × (16×16) = 25.6 GCUPS
              • 10x speedup vs. Cray
              • 256x speedup vs. Opteron
          • Cray XD1, current unoptimized design: 200 MHz × 1 × (16×16) = 51.2 GCUPS
              • 20x speedup vs. Cray
              • 512x speedup vs. Opteron
      *CUG’05, New Mexico, May 2005
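
The arithmetic on this slide can be checked with a few lines of Python. All frequencies, per-cycle factors, and SWPE counts are taken from the slide; the function and dictionary names are hypothetical, and the middle factor is read as cell updates per clock cycle per SWPE (it is 1 in every case, so the results do not depend on that reading).

```python
def cell_updates_per_second(freq_hz, cells_per_cycle, num_swpes):
    """Peak rate in CUPS = clock frequency x cells per cycle x number of SWPEs."""
    return freq_hz * cells_per_cycle * num_swpes


OPTERON_CUPS = 100_000_000        # SSEARCH34 on Opteron, as quoted on the slide

designs = {
    "Cray Inc., unoptimized": (80_000_000, 1, 32),
    "Cray Inc., optimized":   (100_000_000, 1, 50),
    "Cray Inc., Virtex 4":    (100_000_000, 1, 150),
    "Ours, SRC-6":            (100_000_000, 1, 16 * 16),
    "Ours, Cray XD1":         (200_000_000, 1, 16 * 16),
}

rates = {name: cell_updates_per_second(*cfg) for name, cfg in designs.items()}
cray_baseline = rates["Cray Inc., unoptimized"]

for name, rate in rates.items():
    print(f"{name}: {rate / 1e9:.2f} GCUPS, "
          f"{rate / OPTERON_CUPS:.1f}x vs. Opteron, "
          f"{rate / cray_baseline:.1f}x vs. Cray unoptimized")
```

The 10x and 20x "vs. Cray" figures follow when the baseline is the 2.56 GCUPS unoptimized Cray Inc. design, and the quoted 25x speedup vs. Opteron matches that same design (2.56 GCUPS / 0.1 GCUPS = 25.6).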

  26. Conclusions
      • The Smith-Waterman sequence alignment algorithm has been implemented on both SRC-6 and Cray XD1 systems
      • Similarities and differences are highlighted with regard to:
          • System hardware architecture
          • Ease of programming
              • Programming model
              • Development time
              • Hardware/software libraries
          • Performance
              • The speed-up vs. microprocessor is reported
              • Primary bottlenecks limiting the performance of both systems are recognized
      • The capability to share and port applications between the SRC and Cray systems is explored
