Fast Sequence Alignment Methods Using CUDA-enabled GPU

Presentation Transcript


  1. Fast Sequence Alignment Methods Using CUDA-enabled GPU Authors: Yeim-Kuan Chang, De-Yu Chen Presenter: Pei-Hua Huang Date: 2014/06/04 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

  2. INTRODUCTION • Sequence alignment is the first step in bioinformatics research; it calculates the degree of similarity between two sequences • A large amount of biological data is stored in the form of DNA, RNA and protein sequences, made of their respective letter sets shown below: • DNA : {A, C, G, T} • RNA : {A, C, G, U} • Protein : {A, R, N, D, C, Q, E, G, H, I, L, K, M, F, P, S, T, W, Y, V}

  3. INTRODUCTION • Only a selected set of operations is allowed at each aligning position: • match one letter with another, • mismatch one letter with another, or • align one letter with a gap symbol (denoted by "-") to increase similarity

  4. INTRODUCTION • Sequence alignment can be further classified as: • Global alignment: align every letter of both sequences • Local alignment: find the most similar region within the two given sequences

  5. Smith-Waterman algorithm • To quantify an alignment, the SW algorithm uses a scoring mechanism with three types of score, as follows: • Match score if S1[i] = S2[j] • Mismatch score if S1[i] ≠ S2[j] • Gap penalty score if S1[i] = "-" or S2[j] = "-", or both • The match and mismatch scores are obtained by looking up a substitution matrix in which each cell contains the score for the corresponding pair of bases

  6. Smith-Waterman algorithm Given two sequences S1 and S2 of lengths m and n, let H be a score matrix of size (m+1)×(n+1). The recursive formula is defined as shown below, where 1 ≤ i ≤ m, 1 ≤ j ≤ n, sbt(S1[i], S2[j]) gives the match or mismatch score, and g is the gap penalty score. H[i][0] and H[0][j] for 0 ≤ i ≤ m and 0 ≤ j ≤ n are initialized to 0.
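
For reference, the standard Smith-Waterman recurrence matching the definitions above is:

```latex
H[i][j] = \max \begin{cases}
  0 \\
  H[i-1][j-1] + sbt(S_1[i], S_2[j]) \\
  H[i-1][j] + g \\
  H[i][j-1] + g
\end{cases}
```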

  7. Smith-Waterman algorithm

  8. Smith-Waterman algorithm (worked example of the score matrix, with i indexing rows and j indexing columns)

  9. Parallelism mechanisms The first mechanism exploits the parallel characteristics of the cells in the anti-diagonals: each cell of anti-diagonal i depends only on the two preceding anti-diagonals (i-1) and (i-2), so the cells of one anti-diagonal can be computed independently of one another, as sketched below.
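
A minimal CUDA sketch of this anti-diagonal (wavefront) scheme, assuming one block per alignment, one thread per matrix row, a simple match/mismatch/gap scoring (not the paper's setup), and the full H matrix kept in global memory:

```cuda
// Hedged sketch: anti-diagonal (wavefront) Smith-Waterman scoring.
// One block, m threads; thread i computes row i+1, one cell per anti-diagonal,
// so all cells of a diagonal are filled in parallel.
#include <cstdio>
#include <cuda_runtime.h>

#define MATCH     2   // illustrative scores, not the paper's configuration
#define MISMATCH -1
#define GAP      -1

__global__ void sw_antidiagonal(const char *q, int m, const char *s, int n,
                                int *H /* (m+1)*(n+1), zero-initialized */,
                                int *best)
{
    int i = threadIdx.x + 1;                  // this thread's row (1-based)
    int local_best = 0;
    for (int d = 1; d <= m + n - 1; ++d) {    // sweep anti-diagonals i + j = d + 1
        int j = d - i + 1;                    // column handled by this thread
        if (i <= m && j >= 1 && j <= n) {
            int sub  = (q[i - 1] == s[j - 1]) ? MATCH : MISMATCH;
            int diag = H[(i - 1) * (n + 1) + (j - 1)] + sub;
            int up   = H[(i - 1) * (n + 1) + j] + GAP;
            int left = H[i * (n + 1) + (j - 1)] + GAP;
            int h = max(0, max(diag, max(up, left)));
            H[i * (n + 1) + j] = h;
            local_best = max(local_best, h);
        }
        __syncthreads();                      // diagonal d finished before d + 1
    }
    atomicMax(best, local_best);              // best local alignment score
}

int main()
{
    const char q[] = "GATTACA", s[] = "GCATGCA";
    const int m = 7, n = 7;
    int *d_H, *d_best, h_best = 0;
    char *d_q, *d_s;
    cudaMalloc((void **)&d_H, (m + 1) * (n + 1) * sizeof(int));
    cudaMemset(d_H, 0, (m + 1) * (n + 1) * sizeof(int));
    cudaMalloc((void **)&d_best, sizeof(int));
    cudaMemset(d_best, 0, sizeof(int));
    cudaMalloc((void **)&d_q, m); cudaMemcpy(d_q, q, m, cudaMemcpyHostToDevice);
    cudaMalloc((void **)&d_s, n); cudaMemcpy(d_s, s, n, cudaMemcpyHostToDevice);
    sw_antidiagonal<<<1, m>>>(d_q, m, d_s, n, d_H, d_best);
    cudaMemcpy(&h_best, d_best, sizeof(int), cudaMemcpyDeviceToHost);
    printf("best local score = %d\n", h_best);
    return 0;
}
```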

  10. Parallelism mechanisms The second mechanism computes the matrix vector by vector, with each vector oriented parallel to the query sequence.

  11. Parallelism mechanisms The third mechanism processes a vector parallel to the query sequence and, at the same time, processes the vector from the next column in the anti-diagonal direction.

  12. Proposed method The recursive formula of the SW algorithm is redefined so that one row of the matrix can be calculated in parallel.

  13. Proposed method In order to compute the scores in parallel, we have the following new formula:

  14. Proposed method

  15. Proposed method The formula is expanded as follows.

  16. Proposed method Next, the expanded formula is as follows.

  17. Proposed method Next, the expanded formula is as follows.

  18. Proposed method where 1 ≤ i ≤ m, 1 ≤ j ≤ n. Formula (a) can be performed by the threads assigned to the cells of a row in parallel. After this first step is finished, formulas (b) and (c) are then executed.
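
One way to realize the three-step decomposition described here, assuming a linear gap penalty g < 0, is the following sketch (consistent with formulas (a)-(c) and with the prefix max scan of the next slide, but not necessarily the authors' exact notation):

```latex
\text{(a)}\quad T[i][j] = \max\bigl(0,\; H[i-1][j-1] + sbt(S_1[i], S_2[j]),\; H[i-1][j] + g\bigr)
\text{(b)}\quad P[i][j] = \max_{1 \le k \le j}\bigl(T[i][k] - k \cdot g\bigr)
\text{(c)}\quad H[i][j] = \max\bigl(T[i][j],\; P[i][j-1] + j \cdot g\bigr)
```

Formula (a) has no dependency inside row i, so all of its cells can be computed in parallel; formula (b) is an inclusive prefix max scan over the current row; and formula (c) recovers the original recurrence, since unrolling H[i][j-1] + g gives H[i][j] = max(T[i][j], max over k < j of T[i][k] + (j-k)·g), with P[i][0] taken as -∞ so that H[i][1] = T[i][1].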

  19. Optimization • A prefix max scan is used to speed up the computation of formula (b). The prefix max scan is defined as: Input: array [a1, a2, ..., an]. Output: array [a1, M(a1, a2), M(a1, a2, a3), ..., M(a1, ..., an)], where M denotes the maximum. The prefix max scan is performed in k steps, where k = log2(n). For step i = 0 to k - 1, the element array[j] is compared with array[j - 2^i] for j = 2^i + 1 to n in parallel, and the larger value is stored in array[j].
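
A minimal single-block CUDA sketch of this log-step scan (the kernel name, the 1024-element shared buffer, and the read-then-write synchronization pattern are choices of this sketch, not taken from the paper); it uses 0-based indexing, so step i combines element j with element j - 2^i for j ≥ 2^i:

```cuda
#include <cstdio>
#include <climits>
#include <cuda_runtime.h>

__global__ void prefix_max_scan(const int *in, int *out, int n)
{
    __shared__ int buf[1024];                 // assumes n <= 1024, one block
    int j = threadIdx.x;
    if (j < n) buf[j] = in[j];
    __syncthreads();
    for (int offset = 1; offset < n; offset <<= 1) {   // k = ceil(log2(n)) steps
        int v = (j < n && j >= offset) ? buf[j - offset] : INT_MIN;
        __syncthreads();                      // finish reading before writing
        if (j < n && v > buf[j]) buf[j] = v;
        __syncthreads();
    }
    if (j < n) out[j] = buf[j];               // out[j] = max(in[0..j])
}

int main()
{
    int h_in[8] = {3, 1, 4, 1, 5, 9, 2, 6}, h_out[8];
    int *d_in, *d_out;
    cudaMalloc((void **)&d_in, sizeof(h_in));
    cudaMalloc((void **)&d_out, sizeof(h_in));
    cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
    prefix_max_scan<<<1, 8>>>(d_in, d_out, 8);
    cudaMemcpy(h_out, d_out, sizeof(h_in), cudaMemcpyDeviceToHost);
    for (int i = 0; i < 8; ++i) printf("%d ", h_out[i]);   // 3 3 4 4 5 9 9 9
    printf("\n");
    return 0;
}
```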

  20. Optimization (worked example: prefix max scan in Steps 1-3, H’[1][1])

  21. CUDA implementation One block is used to process one alignment task.

  22. CUDA implementation Each thread in the block is used to process one cell of the current row of the matrix, as sketched below.
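
Putting the pieces together, a hedged end-to-end sketch of this mapping (one block per alignment task, one thread per cell of the current row), using the reconstructed formulas (a)-(c) and a shared-memory prefix max scan for step (b); the scoring constants, the MAX_N buffer size, and the way the database sequences are packed are illustrative assumptions, not the authors' implementation:

```cuda
// Hedged sketch: one block per alignment task, one thread per cell of the
// current matrix row; rows are processed one after another using steps (a)-(c).
#include <cstdio>
#include <climits>
#include <cuda_runtime.h>

#define MATCH     2    // illustrative scores, not the paper's BLOSUM62 setup
#define MISMATCH -1
#define GAP      -1    // linear gap penalty g
#define MAX_N  1024    // subject length limit for the shared-memory buffers

__global__ void sw_row_parallel(const char *query, int m,
                                const char *subjects, int n, int *scores)
{
    const char *s = subjects + blockIdx.x * n;   // this block's alignment task
    int j = threadIdx.x + 1;                     // this thread's column (1-based)

    __shared__ int prev[MAX_N + 1];   // H[i-1][0..n]
    __shared__ int scan[MAX_N + 1];   // T[i][k] - k*g, then its prefix max P[i][k]
    __shared__ int best;

    if (threadIdx.x == 0) { prev[0] = 0; best = 0; }
    prev[j] = 0;
    __syncthreads();

    for (int i = 1; i <= m; ++i) {
        // (a) per-cell term with no dependency inside row i
        int sub = (query[i - 1] == s[j - 1]) ? MATCH : MISMATCH;
        int t = max(0, max(prev[j - 1] + sub, prev[j] + GAP));
        scan[j] = t - j * GAP;
        __syncthreads();

        // (b) inclusive prefix max scan over scan[1..n]
        for (int off = 1; off < n; off <<= 1) {
            int v = (j > off) ? scan[j - off] : INT_MIN;
            __syncthreads();
            if (v > scan[j]) scan[j] = v;
            __syncthreads();
        }

        // (c) H[i][j] = max(T[i][j], P[i][j-1] + j*g); for j = 1, H[i][1] = T[i][1]
        int h = (j > 1) ? max(t, scan[j - 1] + j * GAP) : t;
        atomicMax(&best, h);
        __syncthreads();

        prev[j] = h;                  // row i becomes the "previous" row
        __syncthreads();
    }
    if (threadIdx.x == 0) scores[blockIdx.x] = best;
}

int main()
{
    const char q[]  = "GATTACA";             // query, m = 7
    const char db[] = "GCATGCAGATTACA";      // two subjects of length n = 7
    const int m = 7, n = 7, tasks = 2;
    int h_scores[tasks];
    char *d_q, *d_db; int *d_scores;
    cudaMalloc((void **)&d_q, m);
    cudaMemcpy(d_q, q, m, cudaMemcpyHostToDevice);
    cudaMalloc((void **)&d_db, tasks * n);
    cudaMemcpy(d_db, db, tasks * n, cudaMemcpyHostToDevice);
    cudaMalloc((void **)&d_scores, tasks * sizeof(int));
    sw_row_parallel<<<tasks, n>>>(d_q, m, d_db, n, d_scores);  // 1 block per task
    cudaMemcpy(h_scores, d_scores, tasks * sizeof(int), cudaMemcpyDeviceToHost);
    printf("scores: %d %d\n", h_scores[0], h_scores[1]);
    return 0;
}
```

Launching sw_row_parallel<<<tasks, n>>> gives exactly the mapping described on slides 21-22: the grid dimension is the number of alignment tasks (one block each), and the block dimension is the number of cells in one matrix row (one thread each).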

  23. Experimental results The experiments are executed on an NVIDIA C1060 graphics card, with 30 multiprocessors (MPs) comprising 240 streaming processors (SPs) and 4 GB of RAM, installed in a PC with an Intel Core 2 6420 2.13 GHz processor running the Ubuntu OS.

  24. Experimental results Five databases are produced by Seq-Gen [16]. Each database includes 100 sequences of equal length, with lengths ranging from 128 to 8192. The substitution matrix BLOSUM62 is used with a gap penalty of -4.

  25. Experimental results

  26. Experimental results
