Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA

Haixiang Shi, Bertil Schmidt, Weiguo Liu, Wolfgang Müller-Wittig. Presenter: Erkan Okuyan.

Presentation Transcript


  1. Accelerating Error Correction in High-Throughput Short-Read DNA Sequencing Data with CUDA
     Haixiang Shi, Bertil Schmidt, Weiguo Liu, Wolfgang Müller-Wittig
     Presenter: Erkan Okuyan

  2. Motivation
     • Massive amounts of sequencing data from Illumina, 454, and SOLiD platforms: short reads with high error rates
     • Assembly is sensitive to errors in reads, so sequencing errors need to be corrected before assembly
     • At this data volume, error correction is itself computationally demanding

  3. Definitions
     - Let $R = \{r_1, r_2, \ldots, r_k\}$ be a set of $k$ reads with $|r_i| = L$.
     - Let $r_i \in \{A, C, G, T\}^L$ for all $1 \le i \le k$.
     - Let $m$ (multiplicity) and $l$ (length) satisfy $m > 1$ and $l < L$.
     • Definition 1 (Solid and Weak): An $l$-tuple (a DNA string of length $l$) is called solid with respect to $R$ and $m$ if it is a substring of at least $m$ reads in $R$, and weak otherwise.
       • Rationale: an $l$-tuple replicated in at least $m$ reads is probably correct.
     • Definition 2 (Spectrum): The spectrum of $R$ with respect to $m$ and $l$, denoted $T_{m,l}(R)$, is the set of all solid $l$-tuples with respect to $R$ and $m$.
       • $T_{m,l}(R)$ is therefore used as an estimate of the set of all correct $l$-tuples.
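Definition 1 can be checked directly by counting how many reads contain a given l-tuple. The C sketch below uses hypothetical function names and a naive scan of every read; it only illustrates the definition and is far slower than the Bloom-filter membership test described later in the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the l-tuple t occurs as a substring of read r (length L). */
static int read_contains(const char *r, size_t L, const char *t, size_t l)
{
    for (size_t i = 0; i + l <= L; i++) {
        if (memcmp(r + i, t, l) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Definition 1: t is solid w.r.t. R and m if at least m of the k reads
   contain it, and weak otherwise. Each read counts at most once. */
static int is_solid(const char *t, size_t l, const char *const *reads,
                    size_t k, size_t m)
{
    assert(m > 1);                      /* the slide requires m > 1 */
    size_t count = 0;
    for (size_t i = 0; i < k; i++) {
        count += (size_t)read_contains(reads[i], strlen(reads[i]), t, l);
        if (count >= m) {
            return 1;                   /* multiplicity reached: stop early */
        }
    }
    return 0;
}
```

With reads ACGTAC, TTACGT, GGGGGG and m = 2, ACG and TAC are solid, while GGG is weak: it appears four times, but only in one read.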

  4. Definitions
     - Let $R = \{r_1, r_2, \ldots, r_k\}$ be a set of $k$ reads with $|r_i| = L$.
     - Let $r_i \in \{A, C, G, T\}^L$ for all $1 \le i \le k$.
     - Let $m$ (multiplicity) and $l$ (length) satisfy $m > 1$ and $l < L$.
     • Definition 3 (T-string): A DNA string $s$ is called a $T_{m,l}(R)$-string if every $l$-tuple in $s$ is an element of $T_{m,l}(R)$.
     • Definition 4 (SAP, spectral alignment problem): Given a DNA string $s$ and a spectrum $T_{m,l}(R)$, find a $T_{m,l}(R)$-string $s^*$ that minimizes the distance $d(s, s^*)$ among all $T_{m,l}(R)$-strings.
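Definitions 3 and 4 can be made concrete with a small C sketch. It assumes the spectrum is given as an explicit list of l-tuples (searched linearly) and uses Hamming distance as one possible choice of d; the slide does not fix d, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linear-search membership in an explicitly listed spectrum of l-tuples. */
static int in_spectrum(const char *t, size_t l,
                       const char *const *spectrum, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(spectrum[i]) == l && memcmp(spectrum[i], t, l) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Definition 3: s is a T-string if every l-tuple of s lies in the spectrum.
   (A string shorter than l has no l-tuples and is accepted vacuously.) */
static int is_t_string(const char *s, size_t l,
                       const char *const *spectrum, size_t n)
{
    assert(l > 0);
    size_t len = strlen(s);
    for (size_t i = 0; i + l <= len; i++) {
        if (!in_spectrum(s + i, l, spectrum, n)) {
            return 0;
        }
    }
    return 1;
}

/* Hamming distance between two strings of equal length len: one possible
   choice for d(s, s*) in Definition 4 (an assumption, not from the slide). */
static size_t hamming(const char *a, const char *b, size_t len)
{
    size_t d = 0;
    for (size_t i = 0; i < len; i++) {
        d += (a[i] != b[i]);
    }
    return d;
}
```

For the spectrum {ACG, CGT, GTA} and l = 3, ACGTA is a T-string while ACCTA is not (ACC is missing). The two differ in one position, so under Hamming distance ACGTA is an optimal s* for s = ACCTA (possibly not the only one).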

  5. CUDA (Compute Unified Device Architecture)
     • Integrated host + device application program
     • Serial or modestly parallel parts in host C code
     • Highly parallel parts in device SPMD kernel C code
     • Figure: execution alternates between host and device; each launch runs nBlk thread blocks of nTid threads each
         Serial code (host)
         Parallel kernel (device): KernelA<<<nBlk, nTid>>>(args);
         Serial code (host)
         Parallel kernel (device): KernelB<<<nBlk, nTid>>>(args);

  6. CUDA Execution
     • A GPU device
       • is a coprocessor to the CPU (the host)
       • has its own DRAM (device memory)
       • runs many threads in parallel
     • Data-parallel portions of an application are expressed as device kernels, which run on many threads
     • Differences between GPU and CPU threads
       • GPU threads are extremely lightweight
       • Very little creation overhead
       • A GPU needs thousands of threads for full efficiency

  7. Parallel Error Correction with CUDA
     • Each kernel thread is responsible for correcting a single read $r_i$
     • Voting-based algorithm
       • Step 1: calculation of the voting matrix
       • Step 2: single-mutation fixing, trimming, or discarding of the read
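The one-thread-per-read mapping can be illustrated without a GPU. The C sketch below is a sequential CPU emulation of a launch such as KernelA<<<nBlk, nTid>>>(args) from slide 5, not CUDA code. The names emulate_one_thread_per_read, read_fn, and count_a are hypothetical, and storing fixed-length reads back to back in one array is an assumption that the fixed read length L makes convenient.

```c
#include <assert.h>
#include <stddef.h>

/* Work done for one read (hypothetical signature for this sketch). */
typedef int (*read_fn)(const char *r, size_t L);

/* Sequential emulation of kernel<<<nBlk, nTid>>>: every (block, thread)
   pair runs the same body (SPMD), and the global thread id selects which
   read to process. Reads are stored back to back: k * L bases in total. */
static void emulate_one_thread_per_read(const char *reads, size_t k, size_t L,
                                        size_t nTid, read_fn fn, int *results)
{
    assert(nTid > 0);
    size_t nBlk = (k + nTid - 1) / nTid;            /* ceil(k / nTid) blocks */
    for (size_t blockIdx = 0; blockIdx < nBlk; blockIdx++) {
        for (size_t threadIdx = 0; threadIdx < nTid; threadIdx++) {
            size_t tid = blockIdx * nTid + threadIdx; /* global thread id */
            if (tid >= k) {
                continue;      /* surplus threads in the last block do nothing */
            }
            results[tid] = fn(reads + tid * L, L);  /* thread tid handles read r_tid */
        }
    }
}

/* Toy per-read routine used only to exercise the mapping: counts 'A' bases. */
static int count_a(const char *r, size_t L)
{
    int c = 0;
    for (size_t i = 0; i < L; i++) {
        c += (r[i] == 'A');
    }
    return c;
}
```

With k = 3 reads of length L = 4 and nTid = 2, the emulation runs nBlk = 2 blocks, and the fourth thread (tid = 3) has no read and does nothing. On a GPU the threads run concurrently, which is safe here because each thread writes only results[tid].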

  8. Step 1: Voting Matrix Calculation
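Below is a C sketch of one common way to fill the voting matrix of a single read; it is an assumption, not necessarily the authors' exact procedure. V has one row per read position and one column per base, and a linear search over an explicit spectrum stands in for the Bloom-filter membership test. MAX_TUPLE and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TUPLE 64                    /* hypothetical upper bound on l */

static const char BASES[4] = {'A', 'C', 'G', 'T'};

/* Linear-search stand-in for the spectrum membership test. */
static int in_spectrum(const char *t, size_t l, const char *const *spec, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(spec[i]) == l && memcmp(spec[i], t, l) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Fills V[pos][b] (b indexes BASES) for one read r of length L.
   For every l-tuple that is NOT in the spectrum, each of the 3 alternative
   bases is tried at each of its l positions; every substitution that turns
   the tuple into a spectrum member casts one vote for (position, base). */
static void voting_matrix(const char *r, size_t L, size_t l,
                          const char *const *spec, size_t n, unsigned V[][4])
{
    char t[MAX_TUPLE];
    assert(l > 0 && l <= MAX_TUPLE);
    memset(V, 0, L * sizeof V[0]);
    for (size_t i = 0; i + l <= L; i++) {
        if (in_spectrum(r + i, l, spec, n)) {
            continue;                   /* solid tuple: casts no votes */
        }
        memcpy(t, r + i, l);
        for (size_t j = 0; j < l; j++) {
            char orig = t[j];
            for (int b = 0; b < 4; b++) {
                if (BASES[b] == orig) {
                    continue;
                }
                t[j] = BASES[b];
                if (in_spectrum(t, l, spec, n)) {
                    V[i + j][b]++;
                }
            }
            t[j] = orig;                /* restore before the next position */
        }
    }
}
```

For the read ACCTA with spectrum {ACG, CGT, GTA} and l = 3, all three weak tuples (ACC, CCT, CTA) vote for G at position 2 (0-based), so V[2][2] = 3 and every other entry is 0, pointing at the single substitution ACCTA to ACGTA.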

  9. Step 2: Fixing/Trimming/Discarding Reads
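Step 2 can be sketched the same hedged way, taking the voting matrix V from Step 1 as given. The policy below is assumed for illustration: keep a read whose l-tuples are all in the spectrum; otherwise apply the single highest-voted substitution and keep it if the read becomes a T-string; otherwise trim to the longest prefix covered by solid l-tuples if it has at least min_len bases (a hypothetical parameter); otherwise discard. Names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char BASES[4] = {'A', 'C', 'G', 'T'};

enum outcome { KEPT, FIXED, TRIMMED, DISCARDED };

/* Linear-search stand-in for the spectrum membership test. */
static int in_spectrum(const char *t, size_t l, const char *const *spec, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(spec[i]) == l && memcmp(spec[i], t, l) == 0) {
            return 1;
        }
    }
    return 0;
}

/* 1 if every l-tuple of r[0..len) is in the spectrum (a T-string). */
static int all_solid(const char *r, size_t len, size_t l,
                     const char *const *spec, size_t n)
{
    for (size_t i = 0; i + l <= len; i++) {
        if (!in_spectrum(r + i, l, spec, n)) {
            return 0;
        }
    }
    return 1;
}

/* Applies the assumed fix/trim/discard policy to read r of length *L. */
static enum outcome fix_trim_discard(char *r, size_t *L, size_t l, unsigned V[][4],
                                     const char *const *spec, size_t n, size_t min_len)
{
    assert(l > 0 && l <= *L);
    if (all_solid(r, *L, l, spec, n)) {
        return KEPT;
    }
    size_t best_pos = 0;                /* (position, base) with the most votes */
    int best_b = -1;
    unsigned best = 0;
    for (size_t p = 0; p < *L; p++) {
        for (int b = 0; b < 4; b++) {
            if (V[p][b] > best) {
                best = V[p][b];
                best_pos = p;
                best_b = b;
            }
        }
    }
    if (best_b >= 0) {
        char old = r[best_pos];
        r[best_pos] = BASES[best_b];    /* try the single mutation */
        if (all_solid(r, *L, l, spec, n)) {
            return FIXED;
        }
        r[best_pos] = old;              /* it did not help: undo it */
    }
    size_t i = 0;                       /* count leading solid l-tuples */
    while (i + l <= *L && in_spectrum(r + i, l, spec, n)) {
        i++;
    }
    size_t prefix = (i == 0) ? 0 : i + l - 1;  /* bases covered by them */
    if (prefix >= min_len) {
        *L = prefix;
        return TRIMMED;
    }
    return DISCARDED;
}
```

With spectrum {ACG, CGT, GTA}, l = 3, and min_len = 3: ACCTA with votes for G at position 2 becomes ACGTA (FIXED); ACGTT with no votes is trimmed to ACGT; TTTTT has no solid prefix and is discarded.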

  10. Fast Membership Tests
     • The first kernel (Step 1) dominates the runtime
     • It requires $(L-l)\cdot(l + 3pl)$ membership tests, where $p$ is the number of $l$-tuples that do not belong to the spectrum
     • A space-efficient Bloom filter speeds up membership tests against the spectrum
     • The Bloom filter is computed on the CPU and stored in texture memory on the device (read-only memory accessed through a fast cache)
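Factoring the count from the slide makes its scaling explicit: for fixed L and l it grows linearly in the number p of weak l-tuples. The values L = 36, l = 20, p = 2 below are hypothetical, chosen only to make the arithmetic concrete.

```latex
\begin{aligned}
(L-l)\,(l + 3pl) &= (L-l)\,l\,(1 + 3p) \\
L = 36,\; l = 20,\; p = 2:\quad (36-20)\cdot 20 \cdot (1 + 3\cdot 2) &= 16 \cdot 20 \cdot 7 = 2240
\end{aligned}
```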

  11. Bloom Filter
     • Probabilistic data structure
     • No false negatives
     • Small percentage of false positives
     • Space efficient and fast
     • Uses a bit array $B$ of length $m$ and $d$ hash functions $h_1, \ldots, h_d$
       • To insert $x$, set $B[h_i(x)] = 1$ for $i = 1, \ldots, d$
       • To query $y$, check whether $B[h_i(y)] = 1$ for all $i = 1, \ldots, d$
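A minimal C sketch of the bit-array Bloom filter described above. The hash family is an assumption: the d indices are derived by double hashing from two 64-bit FNV-1a hashes with different seeds, and they run from 0 to d-1 rather than 1 to d. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    uint8_t *bits;      /* bit array B */
    size_t m;           /* number of bits */
    unsigned d;         /* number of hash functions */
} bloom_filter;

/* 64-bit FNV-1a over len bytes, starting from offset basis h. */
static uint64_t fnv1a(const char *s, size_t len, uint64_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;                          /* FNV 64-bit prime */
    }
    return h;
}

/* Two base hashes; h_i(x) = (h1 + i * h2) mod m for i = 0..d-1. */
static void bloom_seeds(const char *x, size_t len, uint64_t *h1, uint64_t *h2)
{
    *h1 = fnv1a(x, len, 14695981039346656037ULL);       /* standard offset basis */
    *h2 = fnv1a(x, len, 0x9e3779b97f4a7c15ULL) | 1u;    /* arbitrary second seed */
}

static int bloom_init(bloom_filter *bf, size_t m, unsigned d)
{
    assert(m > 0 && d > 0);
    bf->bits = calloc((m + 7) / 8, 1);                  /* all bits start at 0 */
    bf->m = m;
    bf->d = d;
    return bf->bits != NULL;
}

static void bloom_insert(bloom_filter *bf, const char *x, size_t len)
{
    uint64_t h1, h2;
    bloom_seeds(x, len, &h1, &h2);
    for (unsigned i = 0; i < bf->d; i++) {
        size_t k = (size_t)((h1 + (uint64_t)i * h2) % bf->m);
        bf->bits[k / 8] |= (uint8_t)(1u << (k % 8));    /* B[h_i(x)] = 1 */
    }
}

static int bloom_query(const bloom_filter *bf, const char *y, size_t len)
{
    uint64_t h1, h2;
    bloom_seeds(y, len, &h1, &h2);
    for (unsigned i = 0; i < bf->d; i++) {
        size_t k = (size_t)((h1 + (uint64_t)i * h2) % bf->m);
        if (!(bf->bits[k / 8] & (1u << (k % 8)))) {
            return 0;                   /* a 0 bit: y was never inserted */
        }
    }
    return 1;                           /* all bits 1: present or a false positive */
}

static void bloom_free(bloom_filter *bf)
{
    free(bf->bits);
    bf->bits = NULL;
}
```

Querying an inserted element always returns 1; querying one that was never inserted usually returns 0 but may return 1, which is the false-positive case.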

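  12. Bloom Filter Example
     • Elements a and b are inserted into a Bloom filter with $m = 10$ bits, $n = 2$ elements, and $d = 3$ hash functions
     • Querying element c returns false, since at least one of its bits is 0
     • Querying element d (not to be confused with the hash-function count $d = 3$) returns true, since all of its bits are 1, even though it was never inserted: a false positive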
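How likely a false positive like the one for element d is can be estimated with the standard approximation for a Bloom filter with m bits, d hash functions, and n inserted elements, which assumes independent, uniformly distributed hash values. Here m is the bit-array length, not the multiplicity m from the definitions slides.

```latex
p_{\mathrm{fp}} \approx \left(1 - e^{-dn/m}\right)^{d}
  = \left(1 - e^{-3 \cdot 2/10}\right)^{3}
  = \left(1 - e^{-0.6}\right)^{3}
  \approx (0.4512)^{3} \approx 0.092
```

That is roughly a 9% false-positive rate for this deliberately tiny example; for fixed n and d, a larger bit array lowers it.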

  13. Overall Algorithm
     • Pre-computation on the CPU: program the Bloom filter bit vector (via a counting Bloom filter) by hashing each $l$-tuple present in the reads $R$
     • Data transfer from CPU to GPU: allocate device memory and transfer the Bloom filter and the reads
     • Execute the CUDA kernel
     • Data transfer from GPU to CPU: transfer the set of corrected/trimmed reads
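The CPU pre-computation step can be sketched as two passes over the reads, under stated assumptions: a counting Bloom filter with saturating 16-bit counters estimates how often each l-tuple occurs, and every tuple whose estimate reaches m is then inserted into the bit-vector filter that would be copied to the GPU. NSLOTS, NHASH, the FNV-1a-based double hashing, and all function names are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 4096u    /* hypothetical filter size (counters and bits) */
#define NHASH 3u        /* hypothetical number of hash functions d */

static uint64_t fnv1a(const char *s, size_t len, uint64_t h)
{
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)s[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* i-th slot of an l-tuple by double hashing (assumed hash family). */
static size_t slot(const char *t, size_t l, unsigned i)
{
    uint64_t h1 = fnv1a(t, l, 14695981039346656037ULL);
    uint64_t h2 = fnv1a(t, l, 0x9e3779b97f4a7c15ULL) | 1u;
    return (size_t)((h1 + (uint64_t)i * h2) % NSLOTS);
}

/* Pass 1 counts every l-tuple occurrence in a counting Bloom filter.
   Pass 2 inserts each tuple whose estimated count (minimum of its NHASH
   counters, never below the true count) is >= m into the bit vector. */
static void build_spectrum_filter(const char *const *reads, size_t k, size_t L,
                                  size_t l, unsigned m, uint8_t *bits)
{
    assert(m > 1 && l > 0 && l <= L);
    uint16_t counters[NSLOTS] = {0};
    memset(bits, 0, NSLOTS / 8);
    for (size_t r = 0; r < k; r++) {
        for (size_t p = 0; p + l <= L; p++) {
            for (unsigned i = 0; i < NHASH; i++) {
                size_t s = slot(reads[r] + p, l, i);
                if (counters[s] < UINT16_MAX) {
                    counters[s]++;              /* saturating counter */
                }
            }
        }
    }
    for (size_t r = 0; r < k; r++) {
        for (size_t p = 0; p + l <= L; p++) {
            unsigned est = UINT16_MAX;
            for (unsigned i = 0; i < NHASH; i++) {
                unsigned c = counters[slot(reads[r] + p, l, i)];
                if (c < est) {
                    est = c;
                }
            }
            if (est >= m) {
                for (unsigned i = 0; i < NHASH; i++) {
                    size_t s = slot(reads[r] + p, l, i);
                    bits[s / 8] |= (uint8_t)(1u << (s % 8));
                }
            }
        }
    }
}

/* Membership test against the bit vector (what GPU threads would query). */
static int spectrum_query(const uint8_t *bits, const char *t, size_t l)
{
    for (unsigned i = 0; i < NHASH; i++) {
        size_t s = slot(t, l, i);
        if (!(bits[s / 8] & (1u << (s % 8)))) {
            return 0;
        }
    }
    return 1;
}
```

For example, with reads ACGTAC, TACGTA, GGGGCC, l = 3, and m = 2, the tuple GGG occurs twice inside the single read GGGGCC, so this sketch marks it solid, whereas Definition 1, which counts reads, would call it weak. Counting Bloom filters can also overestimate counts, so a few weak tuples may be admitted; neither effect ever loses a solid tuple.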

  14. Performance Evaluation
     • System parameters
       • NVIDIA GeForce GTX 280 with 1 GB memory
       • AMD Opteron dual-core 2.2 GHz CPU with 2 GB memory
     • Datasets
       • Artificial sets (1%, 2%, 3% error rates)
         • Yeast chromosomes (S.cer5, S.cer7)
         • Bacterial genomes (H.inf, E.col)
       • Real set
         • Staphylococcus aureus strain MW2 (H.Aci) (error rate ~1%)

  15. Performance Evaluation

  16. Performance Evaluation

  17. Discussion/Conclusion (GOOD)
     • Runtime savings (speedups) of 10 to 19 times are reported
     • Larger datasets are not an issue as long as the Bloom filter fits in texture memory, since reads can be loaded and corrected in more than one round
     • Further parallelization on distributed-memory GPU clusters ("GPU farms") is possible

  18. Discussion/Conclusion (BAD)
     • Does not exploit the fast shared memory within thread blocks: each read $r_i$ does not have to be handled by a single thread, since its voting matrix could be constructed in parallel, so a further speedup is possible
     • The predetermined read length $L$ is somewhat restrictive

  19. Thank You
