Applying haplotype models to association study design

Finding Bit Patterns: Applying haplotype models to association study design. Natalie Castellana, Kedar Dhamdhere, Russell Schwartz. August 16, 2005.

Presentation Transcript


  1. Finding Bit Patterns: Applying haplotype models to association study design. Natalie Castellana, Kedar Dhamdhere, Russell Schwartz. August 16, 2005

  2. Problem: Applying haplotype models • Input: a binary matrix, for example the rows 10000010100010010, 00010100101101001, 01101101001000010, 10101011111000010 • Output: a set of recurring patterns of the form (start column, end column, pattern), for example (14,17,“0010”), which appears in the first, third, and fourth rows
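Slide 2 does not define when a pattern counts as "recurring". The sketch below assumes a pattern recurs when the same (start column, end column, pattern) window appears in at least `min_rows` rows; the function name and both thresholds are hypothetical. Columns are 1-indexed as on the slide, rows are reported 0-indexed. It enumerates every window, so it takes O(M·N²) time for M rows of N columns.

```python
from collections import defaultdict

def recurring_patterns(rows, min_rows=2, min_len=1):
    """Map each (start, end, pattern) window (1-indexed, inclusive columns)
    to the set of rows containing it, keeping windows found in >= min_rows rows."""
    n = len(rows[0])
    found = defaultdict(set)
    for r, row in enumerate(rows):
        assert len(row) == n, "all rows must have the same length"
        for start in range(1, n + 1):
            for end in range(start + min_len - 1, n + 1):
                found[(start, end, row[start - 1:end])].add(r)
    return {k: v for k, v in found.items() if len(v) >= min_rows}

rows = ["10000010100010010",
        "00010100101101001",
        "01101101001000010",
        "10101011111000010"]
patterns = recurring_patterns(rows, min_rows=3, min_len=4)
print(patterns[(14, 17, "0010")])  # {0, 2, 3}
```

The second row reads "1001" in columns 14 to 17, so it is the only row that does not share the slide's example pattern.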

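  3. Background • Each SNP is written as one bit, recording whether the sample carries the major or the minor allele; a haplotype is the resulting bit string, e.g. 1000011010110100000010. • Association test: given that this sample has haplotype 1101, does it have the disease?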

  4. Genetic Variation • Mutation: …1001001… becomes …1000001… (a single bit changes). • Recombination: …1000101… and …1110011… swap the segments after their fourth site, giving …1000011… and …1110101… • Because of recombination, similar genetic variation can be found within closely linked regions.
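The sequences on slide 4 can be reproduced with two tiny operations. This is an illustrative sketch only: `mutate` flips one bit, and `recombine` assumes a single crossover (the slide shows just one exchange) that swaps the suffixes of two sequences after a given number of sites. Both names are hypothetical.

```python
def mutate(seq, pos):
    """Point mutation: flip the bit at 1-indexed position pos."""
    i = pos - 1
    flipped = "1" if seq[i] == "0" else "0"
    return seq[:i] + flipped + seq[i + 1:]

def recombine(a, b, cut):
    """Single crossover: exchange the suffixes of a and b after `cut` sites."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

print(mutate("1001001", 4))                # 1000001
print(recombine("1000101", "1110011", 4))  # ('1000011', '1110101')
```

Sites on the same side of the crossover stay together, which is why closely linked regions tend to share variation.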

  5. Data Sets • Cases: download from HapMap.org, then apply a disease model. • Controls: generate using MS. • Apply the haplotype model, then perform association tests. [Figure: example data 10010011101, 01100101101, 10010010101, 10001110100; input 1001001010110, 1001001110100, 0110010110100, 1000111010010.]

  6. Testing individual SNPs • Go through each SNP and determine which SNPs accurately predict which samples have the disease and which do not. Example, Case: 0 0 1 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 0 Control: 0 0 0 0 1 0 1 0 0 0 1 0 0 1 1 0 1 1 1 0 0 0 0 1
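Slide 6 does not say which statistic decides whether a SNP "accurately predicts" disease status. As one common choice, this sketch assumes a Pearson chi-square test (no continuity correction) on the 2×2 table of case/control status against the allele bit in each column, with one haplotype per row. The function names and toy data are hypothetical; larger scores mean stronger association.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]
    (no continuity correction); returns 0.0 if any margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def snp_scores(cases, controls):
    """Score every SNP column by comparing how often allele '1' appears
    in case haplotypes versus control haplotypes."""
    scores = []
    for j in range(len(cases[0])):
        a = sum(row[j] == "1" for row in cases)      # cases carrying 1
        b = len(cases) - a                           # cases carrying 0
        c = sum(row[j] == "1" for row in controls)   # controls carrying 1
        d = len(controls) - c                        # controls carrying 0
        scores.append(chi_square_2x2(a, b, c, d))
    return scores

cases = ["110", "100", "111"]
controls = ["010", "000", "011"]
print(snp_scores(cases, controls))  # [6.0, 0.0, 0.0]
```

Only the first column separates cases from controls here, so it alone gets a nonzero score.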

  7. Haplotype block method • Instead of looking at each individual SNP, we can look at groups of contiguous SNPs. Example rows: 1101000000…11…, 1101100100…01…, 0111000000…10…, 1101100100…00…
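In the block method every row is cut at the same column boundaries. The following sketch assumes the boundaries are already known (choosing them is a separate problem) and counts the distinct haplotypes inside each block. `block_haplotypes` and the four toy rows are hypothetical; the slide's own example rows are truncated.

```python
from collections import Counter

def block_haplotypes(rows, boundaries):
    """Cut every row at the same (start, end) column ranges (1-indexed,
    inclusive) and count the distinct haplotypes seen in each block."""
    return [Counter(row[s - 1:e] for row in rows) for s, e in boundaries]

rows = ["110100", "110110", "011100", "110110"]
blocks = block_haplotypes(rows, [(1, 4), (5, 6)])
print(blocks[0])  # Counter({'1101': 3, '0111': 1})
```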

  8. Haplotype motif method • Like the block method, treats each sequence as a concatenation of segments, but does not require segment boundaries to be conserved across sequences. Example rows: 1101000000…, 1100100100…, 0111000000…, 1101100111…
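To make the motif idea concrete, this sketch assumes motifs are position-specific (start column, end column, pattern) tuples, as on slide 17, and checks whether a row can be written as a concatenation of motifs from a given set, with each row free to be cut at different columns. The dynamic program prefers the tiling with the fewest segments, a tie-breaking choice of ours; `tile_row` and the motif set are hypothetical.

```python
def tile_row(row, motifs):
    """Return the fewest (start, end, pattern) motifs, 1-indexed and inclusive,
    that concatenate to `row`, or None if the motif set cannot cover it."""
    n = len(row)
    best = [None] * (n + 1)      # best[k]: shortest tiling of columns 1..k
    best[0] = []
    for k in range(1, n + 1):
        for (s, e, p) in motifs:
            if e == k and best[s - 1] is not None and row[s - 1:e] == p:
                cand = best[s - 1] + [(s, e, p)]
                if best[k] is None or len(cand) < len(best[k]):
                    best[k] = cand
    return best[n]

motifs = [(1, 2, "11"), (3, 4, "01"), (1, 3, "110"), (4, 4, "0")]
print(tile_row("1101", motifs))  # [(1, 2, '11'), (3, 4, '01')]
print(tile_row("1100", motifs))  # [(1, 3, '110'), (4, 4, '0')]
```

The two rows share a prefix but are cut after different columns, which the block method would not allow.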

  9. Approximation Algorithm • General idea: pick the best partition, minimizing the number of motifs needed to explain all the data. [Figure: example rows 10000100…, 00011100…, 11011110…, 01010110…, with column markers labeled c.]

  10. Finding Motifs [Figure: node C with branches labeled 0 and 1, example patterns 000…000, 111…111, 000…100, and a sample bit sequence.]

  11. Problems • Really, really, really slow: took over a week to partition our biggest data set. • Added a ‘max leaves explored’ feature. • Useless for larger c.

  12. Real Data

  13. Simulated Data

  14. False Positives

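  15. General Linear Program
$$
\begin{aligned}
\text{minimize}\quad & x + y + z \\
\text{subject to}\quad & x + y \le 2 \\
& x + 2z \le 5 \\
& 0 \le x \le 3,\quad 0 \le y \le \infty,\quad -\infty \le z \le 0
\end{aligned}
$$
In matrix form: $\begin{bmatrix}1 & 1 & 0 \\ 1 & 0 & 2\end{bmatrix}\begin{bmatrix}x \\ y \\ z\end{bmatrix} \le \begin{bmatrix}2 \\ 5\end{bmatrix}$.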

  16. A Linear Program • Input: a matrix with M rows and N columns • Output: the minimum number of motifs

  17. Variables • X’s: each x corresponds to a motif, defined by a tuple (start column, end column, string pattern). • Y’s: each y corresponds to a row partition, defined by a set of motifs {(1,e1,“…”),(e1+1,e2,“…”),...,(en+1,N,“…”)}.
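Both variable families can be enumerated directly for a single row. In this sketch (function names are hypothetical) motifs use 1-indexed, inclusive columns, and each partition comes from choosing a subset of the N-1 cut points between adjacent columns. On the row 10001101 from slide 19 it produces both Y_1 and Y_2.

```python
from itertools import combinations

def row_motifs(row):
    """All (start, end, pattern) motifs of one row; columns are 1-indexed, inclusive."""
    n = len(row)
    return [(s, e, row[s - 1:e]) for s in range(1, n + 1) for e in range(s, n + 1)]

def row_partitions(row):
    """Yield every partition of a row into consecutive motifs. A partition is
    fixed by which of the N-1 gaps between columns are cut points."""
    n = len(row)
    for k in range(n):                           # number of cut points: 0 .. N-1
        for cuts in combinations(range(1, n), k):
            bounds = [0, *cuts, n]               # segment i: columns bounds[i]+1 .. bounds[i+1]
            yield tuple((bounds[i] + 1, bounds[i + 1], row[bounds[i]:bounds[i + 1]])
                        for i in range(len(bounds) - 1))

row = "10001101"
parts = list(row_partitions(row))
print(len(row_motifs(row)), len(parts))            # 36 128
print(((1, 1, "1"), (2, 8, "0001101")) in parts)   # True (Y_1)
```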

  18. Constraints • Exactly one partition must be chosen per row. • If a motif used in a row partition is not chosen, then the row partition may not be chosen. Objective: minimize the sum of all X’s.
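For very small matrices the integer program behind these constraints can be solved by exhaustive search, which is handy for checking an LP implementation. The sketch below picks one partition per row, takes the chosen X’s to be exactly the motifs those partitions use, and minimizes their number. It is exponential in both M and N and is not the authors’ method; the names and toy rows are hypothetical.

```python
from itertools import combinations, product

def partitions(row):
    """Every way to cut a row into consecutive (start, end, pattern) motifs."""
    n = len(row)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            b = [0, *cuts, n]
            yield frozenset((b[i] + 1, b[i + 1], row[b[i]:b[i + 1]])
                            for i in range(len(b) - 1))

def min_motifs_bruteforce(rows):
    """Choose exactly one partition per row, select exactly the motifs those
    partitions use, and minimize the number of selected motifs."""
    best = None
    for choice in product(*(list(partitions(r)) for r in rows)):
        used = frozenset().union(*choice)
        if best is None or len(used) < len(best):
            best = used
    return len(best), sorted(best)

rows = ["000", "001", "010", "100", "101", "110"]
count, chosen = min_motifs_bruteforce(rows)
print(count)  # 5
```

Here sharing motifs pays off: cutting every row after column 1 uses two one-bit prefixes and three two-bit suffixes, 5 motifs in total, while keeping each row whole needs 6.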

  19. Example: row 10001101 • X’s: (1,1,“1”),(1,2,“10”),(1,3,“100”), etc. • Y’s: Y_1 = (1,1,“1”),(2,8,“0001101”); Y_2 = (1,2,“10”),(3,3,“0”),(4,8,“01101”)

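  20. Constraint Matrix (1): exactly one row partition must be chosen per row.
$$
\begin{array}{l|cccc|ccc|c}
 & x_{(1,1,\text{“1”})} & x_{(1,1,\text{“0”})} & \cdots & x_{(1,2,\text{“10”})} & Y_1 & Y_2 & \cdots & \\ \hline
\text{Row } 1 & 0 & 0 & \cdots & 0 & 1 & 1 & \cdots & = 1 \\
\text{Row } 2 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & = 1 \\
\text{Row } 3 & 0 & 0 & \cdots & 0 & 1 & 1 & \cdots & = 1 \\
\vdots & & & & & & & & \vdots \\
\text{Row } M & 0 & 0 & \cdots & 0 & \cdots & \cdots & \cdots & = 1
\end{array}
$$
where Y_1 := (1,1,“1”),(2,8,“0001101”) and Y_2 := (1,2,“10”),(3,3,“0”),(4,8,“01101”).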

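  21. Constraint Matrix (2): if a motif used in a row partition is not chosen, then the row partition may not be chosen. For row i:
$$
\begin{array}{l|cccc|ccc|c}
 & x_{(1,1,\text{“1”})} & x_{(1,1,\text{“0”})} & \cdots & x_{(1,2,\text{“10”})} & Y_1 & Y_2 & \cdots & \\ \hline
(1,1,\text{“1”}) & 1 & 0 & \cdots & 0 & -1 & 0 & \cdots & \ge 0 \\
(1,2,\text{“10”}) & 0 & 0 & \cdots & 1 & 0 & -1 & \cdots & \ge 0 \\
(1,3,\text{“100”}) & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & \ge 0 \\
\vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
(8,8,\text{“1”}) & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & \ge 0
\end{array}
$$
Each row of this block reads $x_m - \sum_{p \ni m} Y_p \ge 0$, summing over the partitions $p$ of row $i$ that use motif $m$. Here Y_1 := (1,1,“1”),(2,8,“0001101”) and Y_2 := (1,2,“10”),(3,3,“0”),(4,8,“01101”).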

  22. Constraint Matrix: columns 1 to K hold the x’s and columns K+1 to K+P hold the y’s.
  • Constraint 1 (one row for each matrix row 1, …, M): 0 under every x; 1 under each y whose partition belongs to that row; right-hand side == 1.
  • Constraint 2 (for each matrix row i, one row per motif 1, …, K_i): 1 under that motif’s x; -1 under each y whose partition uses that motif; right-hand side >= 0.
  Where K is the number of unique motifs, K_i is the number of motifs appearing in row i, and P is the number of unique partitions.
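Building both constraint blocks as dense 0/±1 rows for a tiny input is a quick way to check the layout. Assumptions, all ours: Y columns are indexed by unique partitions (matching P above), a partition belongs to a row when its patterns concatenate to that row, and the function names are hypothetical. A practical version would use a sparse matrix and would not enumerate every partition (see slide 23).

```python
from itertools import combinations

def partitions(row):
    """Every partition of a row into consecutive (start, end, pattern) motifs."""
    n = len(row)
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            b = [0, *cuts, n]
            yield tuple((b[i] + 1, b[i + 1], row[b[i]:b[i + 1]]) for i in range(len(b) - 1))

def build_constraints(rows):
    """Return (columns, eq_rows, ge_rows): the K motif (x) columns followed by
    the P partition (y) columns, the constraint-1 rows (each must equal 1)
    and the constraint-2 rows (each must be >= 0)."""
    n = len(rows[0])
    motifs = sorted({(s, e, r[s - 1:e]) for r in rows
                     for s in range(1, n + 1) for e in range(s, n + 1)})
    parts = sorted({p for r in rows for p in partitions(r)})
    col = {("x", m): j for j, m in enumerate(motifs)}
    col.update({("y", p): len(motifs) + j for j, p in enumerate(parts)})
    width = len(col)
    eq_rows, ge_rows = [], []
    for r in rows:
        own = [p for p in parts if "".join(seg[2] for seg in p) == r]
        # Constraint 1: the y's of this row's partitions sum to exactly 1.
        eq = [0] * width
        for p in own:
            eq[col[("y", p)]] = 1
        eq_rows.append(eq)
        # Constraint 2: for each motif m of this row, x_m - sum(y_p : m in p) >= 0.
        for s in range(1, n + 1):
            for e in range(s, n + 1):
                m = (s, e, r[s - 1:e])
                ge = [0] * width
                ge[col[("x", m)]] = 1
                for p in own:
                    if m in p:
                        ge[col[("y", p)]] = -1
                ge_rows.append(ge)
    return motifs + parts, eq_rows, ge_rows

cols, eq, ge = build_constraints(["10", "11"])
print(len(cols), len(eq), len(ge))  # 9 2 6
```

For these two rows K = 5 (the motif (1,1,“1”) is shared), P = 4, and each row contributes N(N+1)/2 = 3 constraint-2 rows.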

  23. Problems • Each row has N(N+1)/2 motifs, so the number of X’s is polynomial. Good! • Each row can be partitioned in 2^(N-1) ways, so the number of Y’s is exponential. Bad! • Solution: column generation
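Both counts on slide 23 follow from a short argument: a motif is fixed by its start s and end e with 1 ≤ s ≤ e ≤ N, and a partition is fixed by choosing which of the N-1 gaps between adjacent columns are cut points.

$$
\#\text{motifs per row} = \sum_{s=1}^{N} (N - s + 1) = \sum_{\ell=1}^{N} \ell = \frac{N(N+1)}{2},
\qquad
\#\text{partitions per row} = \sum_{k=0}^{N-1} \binom{N-1}{k} = 2^{N-1}.
$$

For the 8-column row of slide 19 this gives 36 motifs but 128 partitions, and the gap grows quickly: at N = 30 it is 465 motifs against 2^29 = 536,870,912 partitions.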

  24. Column generation • Solve the problem restricted to all X’s and only some of the Y’s. • Then check whether adding any Y would improve the solution; if so, add it and re-solve, stopping when no Y can improve it.
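Slide 24 leaves the pricing step implicit. Reading off the constraint matrix of slides 20 to 22, a new column y_p for row i has objective coefficient 0, a 1 in row i’s constraint-1 equality (dual u_i) and a -1 in the constraint-2 inequality of each motif m it uses (duals w_{i,m} ≥ 0, assuming the usual sign convention for a minimization problem). Its reduced cost is therefore Σ_{m∈p} w_{i,m} - u_i, so the most promising partition is a shortest path over cut points. This is a sketch: the duals would come from an LP solver run on the restricted problem (not shown), and `price_row` is a hypothetical name.

```python
import math

def price_row(row, w, u):
    """Find the partition of `row` with the most negative reduced cost.

    w maps (start, end, pattern) motifs of this row to their constraint-2
    duals (missing motifs count as dual 0); u is the row's constraint-1 dual.
    Returns (partition, reduced_cost) if the reduced cost is negative, else None."""
    n = len(row)
    dist = [math.inf] * (n + 1)   # dist[k]: cheapest tiling of columns 1..k
    prev = [None] * (n + 1)       # prev[k]: last motif of that tiling
    dist[0] = 0.0
    for e in range(1, n + 1):
        for s in range(1, e + 1):
            m = (s, e, row[s - 1:e])
            cand = dist[s - 1] + w.get(m, 0.0)
            if cand < dist[e]:
                dist[e], prev[e] = cand, m
    path, k = [], n
    while k > 0:                  # walk back through the chosen motifs
        path.append(prev[k])
        k = prev[k][0] - 1
    path.reverse()
    reduced_cost = dist[n] - u
    return (tuple(path), reduced_cost) if reduced_cost < -1e-9 else None

w = {(1, 1, "1"): 0.2, (1, 2, "10"): 1.0, (1, 3, "101"): 1.0,
     (2, 2, "0"): 1.0, (2, 3, "01"): 0.3, (3, 3, "1"): 1.0}
print(price_row("101", w, u=1.0))  # (((1, 1, '1'), (2, 3, '01')), -0.5)
```

If a row returns a partition, it becomes a new Y column and the restricted LP is re-solved; when no row yields a negative reduced cost, the restricted optimum is also optimal for the LP over all Y’s.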

  25. Where are we now? • Where are we going?
