
Genotype Calling



Presentation Transcript


  1. Genotype Calling Matt Schuerman

  2. Biological Problem • How do we know an individual’s SNP values (genotype)? • Each SNP can have two values (A/B) • Each individual has two copies of the SNP • Probes can be used to measure how well a particular SNP matches each value • Need a reliable way to declare values based on probe measurements

  3. Example Probe Reads

  4. Computational Problem • Given a set of data points how can we partition them to maximize similarity within subsets? • The clustering problem • Similarity function arbitrary, but often based on statistical or distance measures • Several accepted algorithms

  5. Standard Solutions • Algorithms exist which call HapMap genotypes with >99% accuracy • Not general, many hidden parameters tuned to work on existing data • Other algorithms require prior knowledge such as how many clusters are present • Again, not general

  6. My Solution • Wanted a more general method with few tuned parameters • Mine has almost no “tuned” parameters • Wanted a fast solution • Many accepted clustering algorithms have exponential run times • Mine is O(n²), but closer to linear in practice

  7. My Solution • Convolve a Gaussian kernel over the data to find initial cluster candidates • Iteratively re-calculate cluster parameters and then re-assign data points to clusters • Assign calls to clusters based on ratio of probe measurements

  8. Phase 1: Initial clusters • Bin data points onto a grid • Convolve with a 5x5 Gaussian kernel • All peaks are considered potential clusters
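The Phase 1 steps can be sketched roughly as follows. This is a minimal illustration, not the presenter's code: the slides do not give the grid resolution, the kernel width, or the exact peak test, so `bins`, `sigma`, the 8-neighbour local-maximum rule, and the function names are all assumptions.

```python
import numpy as np

def gaussian_kernel_5x5(sigma=1.0):
    """Normalized 5x5 Gaussian kernel (sigma is an assumed value)."""
    offsets = np.arange(-2, 3)
    g = np.exp(-(offsets ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def find_initial_peaks(points, bins=20, sigma=1.0):
    """Bin 2-D points, smooth the counts, return centers of peak cells."""
    # Step 1: bin the data points onto a grid of counts.
    hist, xedges, yedges = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    rows, cols = hist.shape

    # Step 2: convolve with the 5x5 kernel (zero padding at the border).
    kernel = gaussian_kernel_5x5(sigma)
    padded = np.pad(hist, 2, mode="constant")
    smooth = np.zeros_like(hist)
    for i in range(5):
        for j in range(5):
            smooth += kernel[i, j] * padded[i:i + rows, j:j + cols]

    # Step 3: a cell is a peak if it is non-empty and not lower than
    # any of its 8 neighbours (assumed definition of "peak").
    bordered = np.pad(smooth, 1, mode="constant", constant_values=-np.inf)
    is_peak = smooth > 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if (di, dj) != (0, 0):
                is_peak &= smooth >= bordered[1 + di:1 + di + rows,
                                              1 + dj:1 + dj + cols]

    # Convert peak cell indices to coordinates of the cell centers.
    idx = np.argwhere(is_peak)
    x_centers = (xedges[:-1] + xedges[1:]) / 2.0
    y_centers = (yedges[:-1] + yedges[1:]) / 2.0
    return np.column_stack([x_centers[idx[:, 0]], y_centers[idx[:, 1]]])
```

Because every peak is kept, noisy data will usually produce more candidates than true clusters; the merge step of Phase 2 is what is expected to remove the extras.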

  9. Phase 2: Cluster Iteration • While the clusters are changing … • Calculate the mean position and covariance matrix of each cluster • Merge clusters within 3 standard deviations of each other using Mahalanobis distance • Assign each data point to the cluster with the shortest Mahalanobis distance
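A rough sketch of this loop is shown below. Several details are assumptions because the slide does not state them: the first assignment uses plain Euclidean distance to the candidate centers, two clusters are merged when either cluster's mean lies within `merge_dist` Mahalanobis units of the other, clusters with fewer than `min_size` points are dropped, and a small `ridge` term keeps each covariance matrix invertible. All function and parameter names are hypothetical.

```python
import itertools
import numpy as np

def mahalanobis(x, mean, cov):
    """Mahalanobis distance of each row of x from (mean, cov)."""
    diff = np.atleast_2d(x) - mean
    inv = np.linalg.inv(cov)
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv, diff))

def cluster_stats(points, labels, ridge=1e-6, min_size=3):
    """Mean and covariance of every cluster with enough members."""
    stats = {}
    for k in np.unique(labels):
        members = points[labels == k]
        if len(members) >= min_size:
            cov = np.cov(members, rowvar=False) + ridge * np.eye(points.shape[1])
            stats[k] = (members.mean(axis=0), cov)
    return stats

def iterate_clusters(points, init_centers, merge_dist=3.0, max_iter=100):
    # Assumed starting point: nearest candidate center by Euclidean distance.
    d0 = np.linalg.norm(points[:, None, :] - init_centers[None, :, :], axis=2)
    labels = d0.argmin(axis=1)
    stats = {}
    for _ in range(max_iter):
        # Recompute parameters and merge until no pair is close enough.
        merged = True
        while merged:
            merged = False
            stats = cluster_stats(points, labels)
            for a, b in itertools.combinations(list(stats), 2):
                d_ab = mahalanobis(stats[b][0], *stats[a])[0]
                d_ba = mahalanobis(stats[a][0], *stats[b])[0]
                if min(d_ab, d_ba) < merge_dist:
                    labels = np.where(labels == b, a, labels)
                    merged = True
                    break
        if not stats:
            break
        # Reassign every point to the cluster with the shortest distance.
        keys = np.array(list(stats))
        dists = np.stack([mahalanobis(points, m, c) for m, c in stats.values()],
                         axis=1)
        new_labels = keys[dists.argmin(axis=1)]
        if np.array_equal(new_labels, labels):
            break  # clusters stopped changing
        labels = new_labels
    return labels, stats
```

The pairwise merge check is quadratic in the number of clusters, not in the number of points, so with only a handful of candidate clusters each pass is dominated by the linear reassignment step.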

  10. Phase 2: Cluster Iteration Iteration 1 …

  11. Phase 2: Cluster Iteration Iteration 2 …

  12. Phase 2: Cluster Iteration Iteration 3 …

  13. Phase 2: Cluster Iteration Iteration 4, no change so done!

  14. Phase 3: Assigning calls • Based on the ratio of x to y at the center of each cluster • If y/x ~ 1.3, then call as BB • If y/x ~ 1, then call as AB • If y/x ~ 0.7, then call as AA • If 2 or 3 clusters are present, then find which is closest to these values
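One way to read the call rule is sketched below. The target ratios 0.7, 1.0 and 1.3 come from the slide; the rest is an interpretation: clusters are sorted by y/x and given distinct genotypes in the same order, choosing the combination with the smallest total deviation from the targets. The function name is hypothetical, and cluster centers are assumed to have x > 0.

```python
from itertools import combinations

# Target y/x ratios from the slide, ordered from AA to BB.
TARGETS = (("AA", 0.7), ("AB", 1.0), ("BB", 1.3))

def call_clusters(centers):
    """Return one genotype call per cluster center (x, y)."""
    if not 1 <= len(centers) <= len(TARGETS):
        raise ValueError("expected between 1 and 3 clusters")
    ratios = [y / x for x, y in centers]
    # Cluster indices ordered by increasing ratio.
    order = sorted(range(len(ratios)), key=ratios.__getitem__)

    def cost(combo):
        # Total distance between each cluster ratio and its target.
        return sum(abs(ratios[i] - TARGETS[t][1]) for i, t in zip(order, combo))

    # combinations() yields increasing target indices, so the ordering
    # AA < AB < BB is preserved and no call is used twice.
    best = min(combinations(range(len(TARGETS)), len(order)), key=cost)
    calls = [None] * len(ratios)
    for i, t in zip(order, best):
        calls[i] = TARGETS[t][0]
    return calls

print(call_clusters([(10.0, 13.2), (10.0, 6.8), (10.0, 10.1)]))
# prints ['BB', 'AA', 'AB']
```

Forcing distinct calls avoids labelling two clusters with the same genotype, but it cannot repair a cluster that was wrongly split or merged earlier.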

  15. Results • Clustering works much better when done within populations • Algorithm’s performance is comparable across all populations • Calls for 1111 SNPs tested in the Affy 100K XBA CEU dataset were 96.47% accurate

  16. Results: Example Assignment Ignore point at (10,10). One incorrect call in black.

  17. Results • Sometimes assigning calls is problematic • Sometimes clusters get improperly split • Sometimes clusters get improperly merged • Sometimes the grouping is right, but one of the clusters was miscalled • Miscalls could probably be fixed by setting the call ratios more precisely

  18. Results: Sample Split Error

  19. Results: Sample Merge Error

  20. Conclusions • Accuracy is close to that of best published algorithms • Faster run time • Simpler approach with less tuning • Need to run more data
