
Modeling Inter-Motif Dependence without increasing the complexity



  1. Modeling Inter-Motif Dependence without increasing the complexity Zhizhuo Zhang

  2. PWM Model
  TTGACT TCGACT TTGACT TTGAAA ATGAGG TTGAAA GTGAAA TTGACT TTGAGG TTGAAA
  Positional Weight Matrix (PWM)
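  As a concrete illustration (an addition, not part of the original deck), here is a minimal C++ sketch that estimates the PWM column by column from the ten aligned sites above; each column stores P(base) at that position:

    #include <array>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Map a base to a row index 0..3 (A, C, G; anything else is treated as T,
    // which is safe for this all-ACGT input).
    static int baseIndex(char b) {
        switch (b) { case 'A': return 0; case 'C': return 1;
                     case 'G': return 2; default: return 3; }
    }

    int main() {
        // The ten aligned binding sites from the slide.
        std::vector<std::string> sites = {
            "TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
            "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA" };
        size_t k = sites[0].size();
        // pwm[j][b] = P(base b at position j); value-initialized to zero.
        std::vector<std::array<double, 4>> pwm(k);
        for (const auto& s : sites)
            for (size_t j = 0; j < k; ++j)
                pwm[j][baseIndex(s[j])] += 1.0 / sites.size();
        for (size_t j = 0; j < k; ++j)
            std::printf("pos %zu: A=%.1f C=%.1f G=%.1f T=%.1f\n",
                        j + 1, pwm[j][0], pwm[j][1], pwm[j][2], pwm[j][3]);
        return 0;
    }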

  3. High-Order Dependency
  • 1st-order
  TTGACT TCGACT TTGACT TTGAAA ATGAGG TTGAAA GTGAAA TTGACT TTGAGG TTGAAA
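  To make the 1st-order idea concrete (again an added sketch, not from the deck): in the sites above, the bases at positions 5 and 6 determine each other exactly, which a position-independent PWM cannot capture. Estimating the conditional P(base at 6 | base at 5):

    #include <cstdio>
    #include <string>
    #include <vector>

    static int idx(char b) { return b == 'A' ? 0 : b == 'C' ? 1 : b == 'G' ? 2 : 3; }

    int main() {
        std::vector<std::string> sites = {
            "TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
            "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA" };
        // Joint counts of the base pair at positions 5 and 6 (1-based).
        double joint[4][4] = {}, marginal[4] = {};
        for (const auto& s : sites) {
            joint[idx(s[4])][idx(s[5])] += 1;
            marginal[idx(s[4])] += 1;
        }
        const char* B = "ACGT";
        // Prints P(A|A) = 1.00, P(T|C) = 1.00, P(G|G) = 1.00:
        // perfect dependence between the two positions.
        for (int a = 0; a < 4; ++a)
            for (int b = 0; b < 4; ++b)
                if (joint[a][b] > 0)
                    std::printf("P(%c at 6 | %c at 5) = %.2f\n",
                                B[b], B[a], joint[a][b] / marginal[a]);
        return 0;
    }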

  4. High-Order Dependency
  • Assume only one dependence group

  5. Two Modeling Principles
  • Inter-dependent bases exist only at the diverged positions.
  • There is no inter-dependence relationship across a conserved base.

  6. Principle One
  • KL-divergence is commonly used to measure the dissimilarity between two probability distributions.
  • We show that the KL-divergence between the (K+1)-order distribution and the product of the K-order and 0-order distributions is small when the base at position K+1 is highly conserved.

  7. Principle One
  The KL-divergence between the (K+1)-order distribution and the product of the K-order and 0-order distributions is as follows:
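  (The formula itself did not survive the transcript; the following is a standard reconstruction from the definitions: the divergence equals the mutual information between the first K positions and position K+1, which is bounded by the entropy of position K+1.)

    D_{KL}\Big( P(x_1,\dots,x_{K+1}) \,\Big\|\, P(x_1,\dots,x_K)\,P(x_{K+1}) \Big)
      = \sum_{x_1,\dots,x_{K+1}} P(x_1,\dots,x_{K+1})
        \log \frac{P(x_1,\dots,x_{K+1})}{P(x_1,\dots,x_K)\,P(x_{K+1})}
      = I(X_1,\dots,X_K;\, X_{K+1}) \;\le\; H(X_{K+1})

  Since H(X_{K+1}) approaches 0 as position K+1 becomes fully conserved, the divergence is necessarily small, which is Principle One.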

  8. Principle Two
  Example: the Cys2His2 Zinc Finger DNA-binding family, which is the largest known DNA-binding family in multi-cellular organisms.
  (Figure: bases contacted by different zinc fingers are independent.)

  9. Control the complexity
  • The larger the dependence group, the more parameters, and the easier it is to overfit.
  • We want to model the k-order dependence using the same number of parameters as a PWM of k+1 independent positions, i.e., 4 per position, 4k+4 parameters in total.

  10. Control the complexity
  Dependence Positional Weight Matrix (DPWM)
  TTGACT TCGACT TTGACT TTGAAA ATGAGG TTGAAA GTGAAA TTGACT TTGAGG TTGAAA

  11. Control the complexity
  • Model the problem:
  • Given a set of binding-site sequences X (each of length k), find a DPWM Ω that maximizes the likelihood P(X|Ω) (or, equivalently, minimizes the KL-divergence) using 4k parameters.
  • We can prove that taking the top 4k-1 k-mer probabilities as the first 4k-1 parameter values is the optimal solution (a sketch of the construction follows).
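  A minimal sketch of that construction (an added illustration; the proof itself is not reproduced, and tie-breaking among equal-probability k-mers is arbitrary here): count the k-mers, keep the 4k-1 most frequent as explicit parameters, and let all remaining k-mers share the leftover probability mass as the 4k-th parameter.

    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
        std::vector<std::string> sites = {
            "TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
            "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA" };
        size_t k = sites[0].size();   // k = 6, so the DPWM gets 4k = 24 parameters
        std::map<std::string, double> freq;
        for (const auto& s : sites) freq[s] += 1.0 / sites.size();

        // Sort the observed k-mers by probability, highest first.
        std::vector<std::pair<std::string, double>> sorted(freq.begin(), freq.end());
        std::sort(sorted.begin(), sorted.end(),
                  [](const std::pair<std::string, double>& a,
                     const std::pair<std::string, double>& b) { return a.second > b.second; });

        // The top 4k-1 k-mers become explicit parameters; everything else
        // shares the remaining probability mass (the 4k-th parameter).
        size_t keep = std::min(sorted.size(), 4 * k - 1);
        double rest = 0;
        for (size_t i = 0; i < sorted.size(); ++i) {
            if (i < keep)
                std::printf("P(%s) = %.2f\n", sorted[i].first.c_str(), sorted[i].second);
            else
                rest += sorted[i].second;
        }
        std::printf("shared leftover mass: %.2f\n", rest);
        return 0;
    }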

  12. Exhaustive Search for Dependence
  • Naive method
  • Enumerate all combinations of dependence groups and pick the maximum-likelihood one.
  • Example: length 5
  • 1,2,3,4,5
  • (1,2,3),4,5
  • (1,2),3,(4,5)
  • (1,2,4,5),3
  • (1,2,3,4,5)
  • …

  13. Exhaustive Search for Dependence
  • Improved method:
  • Enumerate only single dependence groups.
  • If D1 and D2 are two independent groups, then the scores of {D1} and {D2} can be combined to compute {D1,D2}.
  • In effect, a greedy search (a scoring sketch follows this list).
  • Example: combinations sorted by log-likelihood:
  • (1,2),3,4,5: -32
  • (1,2,3),4,5: -44
  • 1,2,3,(4,5): -50
  • …
  • 1,2,3,4,5: -100
  • The best: (1,2),3,(4,5)
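  To make the scoring step concrete, here is a hedged C++ sketch (scoreGroup is a hypothetical helper, not the authors' code) that computes the log-likelihood of one candidate: positions inside the group are modeled jointly, all other positions independently; candidates can then be ranked exactly as in the sorted list above.

    #include <cmath>
    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // Log-likelihood of the sites when the (0-based) positions in `group`
    // are modeled jointly and every other position independently.
    double scoreGroup(const std::vector<std::string>& sites,
                      const std::vector<int>& group) {
        size_t n = sites.size(), k = sites[0].size();
        std::vector<bool> inGroup(k, false);
        for (int p : group) inGroup[p] = true;

        double ll = 0;
        // Joint term for the dependence group.
        std::map<std::string, int> joint;
        for (const auto& s : sites) {
            std::string key;
            for (int p : group) key += s[p];
            ++joint[key];
        }
        for (const auto& kv : joint)
            ll += kv.second * std::log(double(kv.second) / n);
        // Independent (0-order) terms for the remaining positions.
        for (size_t j = 0; j < k; ++j) {
            if (inGroup[j]) continue;
            std::map<char, int> cnt;
            for (const auto& s : sites) ++cnt[s[j]];
            for (const auto& kv : cnt)
                ll += kv.second * std::log(double(kv.second) / n);
        }
        return ll;
    }

    int main() {
        std::vector<std::string> sites = {
            "TTGACT", "TCGACT", "TTGACT", "TTGAAA", "ATGAGG",
            "TTGAAA", "GTGAAA", "TTGACT", "TTGAGG", "TTGAAA" };
        // Grouping positions 5 and 6 (0-based 4 and 5) vs. a plain PWM.
        std::printf("(5,6) grouped: %.2f\n", scoreGroup(sites, {4, 5}));
        std::printf("no group:      %.2f\n", scoreGroup(sites, {}));
        return 0;
    }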

  14. Result
  • Ran MEME, Cisfinder, Amadeus, ChIPMunk, HMS, Trawler, Weeder, JPomoda on 15 ES ChIP-seq datasets.
  • One half of the ChIP-seq peaks is used to learn the de novo PWM, and the other half to validate its performance.

  15. Result

  16. Adjacent Dependency
  • MEME CTCF motif
  • Dependence groups: 1-2-3, 10-11
  • AUC results:
  • MEME: 0.9809
  • Dependence: 0.9854

  17. Large Dependency Group
  • MEME SOX2 motif
  • Dependence groups: 1-2-3-4-5-7, 14-15
  • AUC results:
  • MEME: 0.845
  • Dependence: 0.884

  18. Long Dependency
  • MEME NMYC motif
  • Dependence groups: 10-21, 11-12
  • AUC results:
  • MEME: 0.7785
  • Dependence: 0.7803

  19. New Server Configuration

  20. Model & Price
  Hostname: biogpu (1U server)
  • 2 x Intel Xeon X5680 processors (6 cores each)
  • 2 x M2050 GPUs
  • 48 GB RAM
  • 3 x 2 TB SATA2 disks
  • 2 x 1G network interfaces
  • Price: 18k SGD
  Hostname: genome (3U server)
  • 2 x Intel Xeon X5680 processors (6 cores each)
  • 144 GB RAM
  • 16 x 2 TB SAS disks
  • 2 x 1G network interfaces
  • Price: 20k SGD

  21. File System
  • genome: RAID-6, 28 TB, CentOS 5.5
  • Home: 23 TB
  • biogpu: RAID-5, 4 TB, Yellow Dog Linux (CentOS 5.4)
  • Home: 3 TB

  22. Server Software
  • NIS: the same accounts on both servers
  • NFS:
  • Home directory: genome server
  • Public_html: biogpu server
  • Shared software: /cluster/biogpu/programs/bin/
  • Apache: biogpu server
  • MySQL: genome server

  23. Current Problems
  Filesystem                       Size  Used Avail Use% Mounted on
  /dev/mapper/VolGroup00-LogVol02  393G  4.7G  368G   2% /
  /dev/mapper/VolGroup00-LogVol00  2.0T  199M  1.9T   1% /tmp
  /dev/sdb2                         23T   23T  439G  99% /home
  /dev/sda1                        920M   47M  826M   6% /boot
  tmpfs                             71G     0   71G   0% /dev/shm
  • I/O killer

  24. To do • Swap backup • Connect to Tembusu • Install SGE

  25. GPU computing

  26. Fermi M2050

  27. Code Example to Add Two Arrays: A CUDA Kernel
  • CUDA C program

    // Device code
    __global__ void addMatrixG( float *a, float *b, float *c, int N )
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        int index = i + j * N;
        if ( i < N && j < N )
            c[index] = a[index] + b[index];
    }

    // Host code
    int main()
    {
        ......
        dim3 dimBlk( 16, 16 );
        dim3 dimGrd( N/dimBlk.x, N/dimBlk.y );
        addMatrixG<<<dimGrd, dimBlk>>>( a, b, c, N );
    }

  28. CUDA Memory Model
  • Each thread can:
  • R/W per-thread registers
  • R/W per-thread local memory
  • R/W per-block shared memory
  • R/W per-grid global memory
  • RO per-grid constant memory
  • RO per-grid texture memory
  • The host can R/W global, constant, and texture memory

  29. CUDA Memory Hierarchy
  • The CUDA platform has three primary memory types:
  • Local memory: per-thread memory for automatic variables and register spilling.
  • Shared memory: per-block, low-latency memory for intra-block data sharing and synchronization. Threads can safely share data through this memory and can perform barrier synchronization through __syncthreads().
  • Global memory: device-level memory that may be shared between blocks or grids.
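  A small illustrative kernel (an addition; it assumes blocks of exactly 256 threads and an input length that is a multiple of 256): each block stages its tile in shared memory, synchronizes, then writes the tile out reversed, so every thread reads an element loaded by a different thread.

    __global__ void reverseTile( float *d_out, const float *d_in )
    {
        __shared__ float tile[256];           // per-block shared memory
        int t = threadIdx.x;
        int g = blockIdx.x * blockDim.x + t;
        tile[t] = d_in[g];                    // stage the tile in shared memory
        __syncthreads();                      // barrier: the whole tile is loaded
        d_out[g] = tile[blockDim.x - 1 - t];  // read another thread's element
    }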

  30. Moving Data…
  • CUDA allows us to copy data from one memory type to another, including to and from the host's memory (main system RAM).
  • To facilitate this data movement, CUDA provides cudaMemcpy().

  31. CUDA Example 1 – Vector Addition (1)

    // Device code (N is assumed to be a globally defined constant)
    __global__ void VecAdd( float *A, float *B, float *C )
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if ( i < N )
            C[i] = A[i] + B[i];
    }

    // Host code
    int main()
    {
        // Allocate vectors in device memory
        size_t size = N * sizeof(float);
        float *d_A; cudaMalloc( (void**)&d_A, size );
        float *d_B; cudaMalloc( (void**)&d_B, size );
        float *d_C; cudaMalloc( (void**)&d_C, size );

  32. CUDA Example 1 – Vector Addition (2)

        // Copy vectors from host memory to device memory
        // h_A and h_B are input vectors stored in host memory
        cudaMemcpy( d_A, h_A, size, cudaMemcpyHostToDevice );
        cudaMemcpy( d_B, h_B, size, cudaMemcpyHostToDevice );

        // Invoke kernel
        int threadsPerBlock = 256;
        int blocksPerGrid = (N + threadsPerBlock - 1) / threadsPerBlock;
        VecAdd<<<blocksPerGrid, threadsPerBlock>>>( d_A, d_B, d_C );

        // Copy result from device memory to host memory
        // h_C contains the result in host memory
        cudaMemcpy( h_C, d_C, size, cudaMemcpyDeviceToHost );

        // Free device memory
        cudaFree( d_A );
        cudaFree( d_B );
        cudaFree( d_C );
    }

  33. Optimization
  • Minimize divergent paths (if/else, …)
  • Coalesce global memory accesses
  • Convert scattering to gathering (a toy contrast follows this list)
  • Use shared memory as much as possible
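  A toy contrast for the scatter-to-gather rewrite (added sketch; scatterAdd/gatherAdd and the CSR-style start/src arrays are hypothetical names, not from the slides). In the scatter version each thread writes to a data-dependent location and needs atomics; in the gather version each thread owns one output element and pulls the inputs mapped to it, so the writes are conflict-free and coalesced.

    // Scatter: thread i pushes its value into a data-dependent slot,
    // so concurrent writes may collide and atomics are required.
    __global__ void scatterAdd( float *out, const float *val, const int *dst, int n )
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if ( i < n ) atomicAdd( &out[dst[i]], val[i] );
    }

    // Gather: thread j owns out[j] and pulls every value mapped to it.
    // start/src list, per output j, the inputs that map to it (CSR style:
    // indices src[start[j]] .. src[start[j+1]-1]).
    __global__ void gatherAdd( float *out, const float *val,
                               const int *start, const int *src, int m )
    {
        int j = blockIdx.x * blockDim.x + threadIdx.x;
        if ( j >= m ) return;
        float sum = 0.0f;
        for ( int p = start[j]; p < start[j+1]; ++p ) sum += val[src[p]];
        out[j] = sum;
    }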

  34. Compiling Code
  Linux
  • Command line: CUDA provides nvcc (an NVIDIA "compiler-driver"). Use it instead of gcc:
    nvcc -O3 -o <exe> <input> -I/usr/local/cuda/include -L/usr/local/cuda/lib -lcudart
  • It separates the code compiled for the CPU from the code compiled for the GPU and compiles both; a regular C compiler is still needed for the CPU side. Makefiles are also provided.
  Windows
  • NVIDIA suggests using Microsoft Visual Studio.

  35. CUDA Too Hard?
  • Use other software with CUDA acceleration
  • Use a wrapper library

  36. CUDA-Accelerated Software
  • cuBLAS, cudaLAPACK
  • CudaR, CudaPy
  • CUDA bioinformatics software:

  37. Thrust
  • Reordering: Partitioning, Stream Compaction
  • Prefix Sums: Segmented Prefix Sums, Transformed Prefix Sums
  • Set Operations
  • Sorting
  • Transformations: Filling, Modifying, Replacing
  • Searching: Binary Search, Vectorized Searches
  • Copying: Gathering, Scattering
  • Reductions: Counting, Comparisons, Extrema, Transformed Reductions, Logical, Predicates

  38. Example 1

    #include <thrust/count.h>
    #include <thrust/device_vector.h>
    ...
    // put three 1s in a device_vector
    thrust::device_vector<int> vec(5, 0);
    vec[1] = 1;
    vec[3] = 1;
    vec[4] = 1;
    // count the 1s
    int result = thrust::count(vec.begin(), vec.end(), 1);
    // result is three

  39. Example 2

    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <thrust/device_vector.h>
    #include <thrust/host_vector.h>
    #include <cmath>
    #include <iostream>  // added: needed for std::cout

    // square<T> computes the square of a number f(x) -> x*x
    template <typename T>
    struct square
    {
        __host__ __device__
        T operator()(const T& x) const { return x * x; }
    };

    int main(void)
    {
        // initialize host array
        float x[4] = { 1.0, 2.0, 3.0, 4.0 };
        // transfer to device
        thrust::device_vector<float> d_x(x, x + 4);
        // setup arguments
        square<float> unary_op;
        thrust::plus<float> binary_op;
        float init = 0;
        // compute norm
        float norm = std::sqrt( thrust::transform_reduce(d_x.begin(), d_x.end(),
                                                         unary_op, init, binary_op) );
        std::cout << norm << std::endl;
        return 0;
    }

  40. References
  • NVIDIA CUDA Programming Guide, Version 2.3
    http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_Programming_Guide_2.3.pdf
  • NVIDIA CUDA C Programming Best Practices Guide, Version 2.3
    http://developer.download.nvidia.com/compute/cuda/2_3/toolkit/docs/NVIDIA_CUDA_BestPracticesGuide_2.3.pdf
  • Thrust documentation: http://code.google.com/p/thrust/wiki/Documentation
