Fast Exact String Matching On the GPU

Michael C. Schatz and Cole Trapnell

May 8, 2007

CMSC 740 Computer Graphics

String Matching Applications
  • A very common problem in computational biology is to find all occurrences (or approximate occurrences) of one string in another string
    • Genome Assembly, Gene Finding, Comparative Genomics, Functional analysis of proteins, Motif discovery, SNP analysis, Phylogenetic analysis, Primer Design
    • Short Read Resequencing: 200 Million 50bp reads
  • Sequence databases are huge, and growing exponentially
    • We need ever faster methods for string matching
Suffix Trees to the Rescue
  • Tree of all suffixes of string S
    • Suffix i encoded on path to leaf i
    • Nodes: positions where suffixes diverge
    • Edges: substrings of S
    • Leaves: starting position of suffix
    • Suffix Links: traverse to next suffix
  • O(n) Construction
    • Ukkonen’s Algorithm
    • Exploits inter-suffix relationships and suffix links
  • O(k) Query Match, for a query Q of length k
    • Every substring S[i,j] is a prefix of suffix i.
    • Walk from root following the characters in the query Q.
    • One leaf below the end of the walk for each occurrence of Q in S.

Suffix tree of “ACATAC$”

*858E Algorithms for Biosequence Analysis

Suffix Tree Search
[Figure: suffix tree of “ACATAC$”, repeated for each search step; edges are labeled with substrings and leaves with suffix start positions 1 to 7]
  • Searching for “ATA”: found at position 3
  • Searching for “AC”: found at positions 1 & 5
  • Searching for “ACT”: “falls off” the tree => not in S
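The three searches on “ACATAC$” can be reproduced with a small sketch. It builds the tree by naive insertion of one suffix at a time, which takes quadratic time, so it is not Ukkonen's O(n) algorithm and has no suffix links. The names `Node`, `build_suffix_tree` and `find_all` are illustrative and do not come from Cmatch.

```python
class Node:
    def __init__(self):
        self.children = {}  # first character -> (edge label, child Node)
        self.leaf = None    # 1-based start position of the suffix, leaves only


def build_suffix_tree(s):
    """Naive O(n^2) construction; s must end in a unique terminator such as '$'."""
    root = Node()
    for i in range(len(s)):
        node, suffix = root, s[i:]
        while True:
            first = suffix[0]
            if first not in node.children:
                leaf = Node()
                leaf.leaf = i + 1
                node.children[first] = (suffix, leaf)
                break
            label, child = node.children[first]
            k = 0  # length of the common prefix of edge label and suffix
            while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
                k += 1
            if k < len(label):
                # Suffixes diverge inside this edge: split it at offset k.
                mid = Node()
                mid.children[label[k]] = (label[k:], child)
                node.children[first] = (label[:k], mid)
                child = mid
            node, suffix = child, suffix[k:]
    return root


def find_all(root, query):
    """Return sorted 1-based positions of query, or [] if it falls off the tree."""
    node, rest = root, query
    while rest:
        entry = node.children.get(rest[0])
        if entry is None:
            return []
        label, child = entry
        n = min(len(label), len(rest))
        if label[:n] != rest[:n]:
            return []
        node, rest = child, rest[n:]
    # Every leaf below the point where the walk ended is one occurrence.
    positions, stack = [], [node]
    while stack:
        cur = stack.pop()
        if cur.leaf is not None:
            positions.append(cur.leaf)
        stack.extend(child for _, child in cur.children.values())
    return sorted(positions)


tree = build_suffix_tree("ACATAC$")
print(find_all(tree, "ATA"))  # [3]
print(find_all(tree, "AC"))   # [1, 5]
print(find_all(tree, "ACT"))  # []
```

The walk itself touches at most one edge character per query character, which is where the O(k) match bound comes from; collecting the leaves adds time proportional to the number of occurrences.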

GPGPU Programming
  • Utilize the highly parallel SIMD architecture of the GPU
    • Nominally used for parallel triangle rendering and texture application
    • Each processor executes same kernel
    • Dramatic runtime improvement for scientific applications
  • CUDA Architecture
    • API and runtime library to implement C style programming of stream processors
  • nVidia GeForce 8800 GTX (G80)
    • 16 multiprocessors w/ 8 processors each
      • 128 stream processors @ 1.35 GHz
    • 768 MB total on board RAM
    • 2D Texture Cache for large read-only data

*Image from CUDA Programming Guide

Cmatch GPU Algorithm
  • Load Reference String
  • Create Suffix Tree
  • Load Query Strings
  • Transfer data to GPU
  • Execute Query Kernel
    • Up to 128 simultaneous matches on GPU
  • Fetch Results from GPU
  • Output results
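The order of these steps can be sketched as a host-side driver loop. This is only an illustration of the control flow: the "kernel" here is a plain Python function that uses `str.find` instead of a suffix tree walk, the batches run sequentially rather than in parallel, and the names `BATCH`, `match_kernel` and `run_queries` are hypothetical.

```python
BATCH = 128  # mirrors the "up to 128 simultaneous matches" on the slide


def match_kernel(reference, query):
    """Stand-in for the GPU kernel: 1-based positions of query in reference."""
    hits = []
    start = reference.find(query)
    while start != -1:
        hits.append(start + 1)
        start = reference.find(query, start + 1)
    return hits


def run_queries(reference, queries):
    """Process the queries batch by batch and return one result list per query."""
    results = []
    for lo in range(0, len(queries), BATCH):
        batch = queries[lo:lo + BATCH]  # "transfer" one batch of queries
        results.extend(match_kernel(reference, q) for q in batch)  # "execute"
    return results  # "fetch results"


print(run_queries("ACATAC", ["ATA", "AC", "ACT"]))  # [[3], [1, 5], []]
```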
Data Structures on the GPU
  • Suffix tree nodes => 2D Texture
    • Encode node information & children pointers as RGBA color of texel
    • Arrange nodes in 32x32 blocks along space filling curve
    • Optimize the layout near the root for caching across threads, and further down the tree for caching within an individual thread.
  • Reference String => 2D Texture
    • Access many successive characters along an edge
  • Query Strings => On Board RAM
    • Array of |Q| offsets into one large array holding all query strings
  • Results buffer => On Board RAM
    • Array of |Q| entries; entry i holds the id of the last visited node for query i
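The query and results buffers can be pictured with a short sketch. It packs all queries into one flat byte buffer with an offsets array and allocates one result slot per query. The NUL terminator between queries and the names `pack_queries`, `unpack_query` and `results` are assumptions made for illustration; the slides do not describe how Cmatch delimits queries.

```python
def pack_queries(queries):
    """Concatenate queries into one byte buffer and record each start offset."""
    buffer = bytearray()
    offsets = []
    for q in queries:
        offsets.append(len(buffer))
        buffer += q.encode("ascii") + b"\0"  # assumed NUL terminator
    return bytes(buffer), offsets


def unpack_query(buffer, offsets, i):
    """Read query i back out of the flat buffer."""
    end = buffer.index(b"\0", offsets[i])
    return buffer[offsets[i]:end].decode("ascii")


queries = ["ATA", "AC", "ACT"]
buffer, offsets = pack_queries(queries)
results = [0] * len(queries)  # one slot per query, e.g. id of last visited node

print(offsets)                           # [0, 4, 7]
print(unpack_query(buffer, offsets, 1))  # AC
```

A flat buffer plus offsets lets each thread find its own query with a single index lookup, whatever the query lengths are.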
Experimental Protocol
  • Comparing running time of (serial) CPU versus (parallel) GPU programs
    • CPU: 3.0 GHz Intel Xeon
    • GPU: nVidia GeForce 8800 GTX (128 processors @ 1.35 GHz)
  • Simulate short read resequencing projects by extracting substrings of reference sequences
    • References
      • Genome of Bacillus anthracis (5.20 Mbp)
      • Genome of Yersinia pestis (4.6 Mbp)
      • BAC-sized portion of Human Chromosome 2 (200 kbp)
    • Query sets (250 Mbp total per set)
      • 10 million x 25 bp
      • 5 million x 50 bp
      • 1.25 million x 200 bp
      • 312,500 x 800 bp
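A minimal sketch of this kind of read simulation follows. It draws substrings at uniformly random start positions, which is an assumption: the slides do not say how the substrings were chosen. The names `QUERY_SETS` and `sample_reads` are illustrative. The block also checks that each listed query set multiplies out to 250 Mbp.

```python
import random

# (number of reads, read length in bp) for the four query sets on the slide
QUERY_SETS = [(10_000_000, 25), (5_000_000, 50), (1_250_000, 200), (312_500, 800)]


def sample_reads(reference, read_len, count, seed=0):
    """Extract `count` substrings of length `read_len` at random positions."""
    last_start = len(reference) - read_len
    if last_start < 0:
        raise ValueError("read length exceeds reference length")
    rng = random.Random(seed)  # seeded so a run can be repeated
    starts = [rng.randint(0, last_start) for _ in range(count)]
    return [reference[s:s + read_len] for s in starts]


# Each set holds the same number of bases.
for count, length in QUERY_SETS:
    assert count * length == 250_000_000

reads = sample_reads("ACATACGGTACCATGA", read_len=5, count=3)
print(reads)
```

Because every read is an exact substring of the reference, every query in such a set has at least one match.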
Query Time Results

Speedup of the GPU match kernel versus CPU match program.

Long Read Query Time Results

Future work to improve cache hit rate for longer reads.

Processing Time

GPU Cmatch running time is bounded by the time to construct the suffix tree and by I/O processing time.

Conclusions
  • We have reduced the computation time for short read resequencing from hours to minutes.
    • Make sure you have sufficient cooling available
  • Low arithmetic intensity GPGPU programs can have dramatic performance improvements (35x) over CPU execution
    • Utilizing the texture cache with careful node placement and minimizing register use were essential to high performance
  • A single GPU can supply the same processing power as a small computer cluster at a fraction of the cost
    • Installing GPUs into an existing cluster can provide an order of magnitude increase in computing capacity.
  • More information:
    • http://www.cbcb.umd.edu/software/cmatch
Texture Space filling curve
[Figure: texture layout with cells numbered 1 to 29]
  • Texture cache organized in 2x2 blocks.
  • Try to place all children of a node in the same cache block.
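To show how a space filling curve can keep related nodes in one 2x2 cache block, here is a sketch using Morton (Z-order) indexing. The slides do not name the curve Cmatch uses, so Morton order is an assumption chosen for illustration, and `morton_decode` and `cache_block` are hypothetical names.

```python
def morton_decode(index):
    """Map a curve index to texture coordinates (x, y) by de-interleaving bits."""
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit          # even bits of the index form x
        y |= ((index >> 1) & 1) << bit   # odd bits of the index form y
        index >>= 2
        bit += 1
    return x, y


def cache_block(index, block=2):
    """Return which block-by-block cache tile the texel at `index` falls in."""
    x, y = morton_decode(index)
    return x // block, y // block


for i in range(8):
    print(i, morton_decode(i), cache_block(i))
```

Under this ordering, any four consecutive indices that start at a multiple of 4 land in the same 2x2 tile, so a node whose children are stored in such a group can have all of them fetched by one cache block.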