
Bioinformatics Tools for Genotyping

Frances Tong. Dr. Garry Larson, Ph.D. City of Hope Department of Molecular Medicine. Southern California Bioinformatics Institute, Summer 2003. Funded by the National Science Foundation and the National Institutes of Health.

Presentation Transcript


  1. Bioinformatics Tools for Genotyping • Frances Tong • Dr. Garry Larson, Ph.D. • City of Hope Department of Molecular Medicine • Southern California Bioinformatics Institute, Summer 2003 • Funded by the National Science Foundation and the National Institutes of Health

  2. Overview of Summer Program • Learn ASP and VBScript • Learn the biology • Programming Project I : writing code for mining of online genetic data • Programming Project II : writing a program to graph linkage disequilibrium data

  3. Intro to ASP & VBScript • ASP : Microsoft Active Server Pages * server generated web pages * similar to CGI but easier * works well with databases • VBScript : Microsoft Visual Basic Scripting * scripting language to enhance HTML web pages * default language of ASP

  4. Hello World! • Sample ASP file (one line only!): <% Response.Write("Hello, World!") %>

  5. Genetic Mapping of ASPs • ASPs : affected sibling pairs • Identification of genes associated with cancer in patients and siblings who both have cancer (breast, prostate, lung or colon) • Determine allele sharing statistics of susceptibility genes • Look at gene-gene interactions => Provide information on a person’s genetic risk of developing cancer

  6. DNA Marker Genotyping • Genetic marker : a polymorphic gene or section of DNA with an identifiable physical location on a chromosome, used to trace inheritance • Examples: microsatellite and SNP markers

  7. Programming Project I : Tag Selection for Markers • Need a unique way to identify markers (like social security numbers for people) • Chromosome locations are relative and change frequently (UCSC) • Use ASP to automate data mining and ease the generation of a unique 50 base-pair tag for each marker in the database • Tags will be used to locate markers in the genome

  8. UCSC Genome Browser

  9. Marker Tag Selection (input form) • Submit accession number for microsatellite • Submit accession number for SNP • Submit sequence surrounding simple repeat

  10. Output • Chromosome • Sequence start position • Sequence end position • Link to UCSC browser • Inputted sequence with repeats highlighted in blue

  11. Choosing a 50bp tag • Send sequence to UCSC • Copy and paste the chosen tag here
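
The tag-choosing step can be illustrated with a minimal sketch. It is written in Python, not the ASP/VBScript used in the project, and it is not the author's code. It assumes a tag is taken from the sequence immediately flanking the simple repeat; the function name `flanking_tags` and its parameters are hypothetical. A candidate returned here would still need to be checked for uniqueness in the genome, for example with UCSC Blat.

```python
def flanking_tags(sequence, repeat_start, repeat_end, tag_len=50):
    """Return candidate tags directly upstream and downstream of a repeat.

    repeat_start and repeat_end are 0-based, end-exclusive positions of the
    repeat within sequence (an assumed convention for this sketch).
    A side is None when the flank is shorter than tag_len.
    """
    seq = sequence.upper()
    upstream = None
    downstream = None
    if repeat_start >= tag_len:
        # The tag_len bases that end where the repeat begins.
        upstream = seq[repeat_start - tag_len:repeat_start]
    if len(seq) - repeat_end >= tag_len:
        # The tag_len bases that begin where the repeat ends.
        downstream = seq[repeat_end:repeat_end + tag_len]
    return upstream, downstream


# Made-up sequence: 60 A's, a (CA)5 repeat, then 50 G's.
example = "A" * 60 + "CACACACACA" + "G" * 50
up_tag, down_tag = flanking_tags(example, 60, 70)
print(len(up_tag), len(down_tag))
```

Returning None for a short flank keeps the caller from silently using a tag of the wrong length.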

  12. UCSC Blat Results • Blat is similar to BLAST : it searches for alignments in the genome

  13. List of markers and their tags

  14. Convert to FASTA format • FASTA format: a ">name" header line followed by the sequence • The program converts the marker tag file into FASTA format automatically
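
As an illustration of the conversion described on this slide, here is a minimal Python sketch (not the author's ASP program). It assumes the marker tag list is available as (name, sequence) pairs; the function name `to_fasta`, the `line_width` parameter, and the marker names in the example are hypothetical.

```python
def to_fasta(markers, line_width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record is a '>name' header line followed by the sequence,
    wrapped at line_width characters per line.
    """
    records = []
    for name, seq in markers:
        seq = seq.strip().upper()
        lines = [seq[i:i + line_width] for i in range(0, len(seq), line_width)]
        records.append(">" + name + "\n" + "\n".join(lines) + "\n")
    return "".join(records)


# Hypothetical marker names and made-up sequences, for illustration only.
print(to_fasta([("marker_1", "acgtacgt"), ("marker_2", "GGCCTTAA")]), end="")
```

A 50 bp tag fits on one line at the default width, so each marker produces exactly two lines of output.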

  15. Check tag selection • The program sends the FASTA file to UCSC Blat

  16. Linkage Disequilibrium • A condition where two polymorphisms are found together on the same chromosome at a greater frequency than that predicted from the product of their individual frequencies.

  17. Two SNPs and their base frequencies • SNP 1 (G/A): G = 0.88, A = 0.12 • SNP 2 (T/C): T = 0.75, C = 0.25 • Expected frequencies: G-T = (0.88)(0.75) = 0.66 ; G-C = (0.88)(0.25) = 0.22 ; A-T = (0.12)(0.75) = 0.09 ; A-C = (0.12)(0.25) = 0.03
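
The expected frequencies on this slide are simply products of the individual base frequencies. The short Python sketch below reproduces that arithmetic; the function name `expected_haplotype_freqs` is hypothetical, and the input values are the ones given on the slide.

```python
from itertools import product


def expected_haplotype_freqs(snp1, snp2):
    """Expected frequency of each two-SNP combination under independence.

    snp1 and snp2 map each base to its frequency; the result maps the
    two-letter combination (for example 'GT') to the product of the two.
    """
    return {
        base1 + base2: freq1 * freq2
        for (base1, freq1), (base2, freq2) in product(snp1.items(), snp2.items())
    }


expected = expected_haplotype_freqs({"G": 0.88, "A": 0.12}, {"T": 0.75, "C": 0.25})
for haplotype, freq in expected.items():
    print(haplotype, round(freq, 2))
```

Because each SNP's base frequencies sum to 1, the four expected frequencies also sum to 1.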

  18. If the observed frequency of 2 variants together > the expected frequency => LINKAGE DISEQUILIBRIUM • Example: A and T together are in linkage disequilibrium

  19. A Quantitative Measure of LD • One of the most common measures of linkage disequilibrium is r² • It is a squared correlation coefficient => the correlation of alleles at two sites. • Special case: r² = 1 (“perfect LD”) ~ Exactly two out of the four possible haplotypes are observed. ~ Markers NOT separated by recombination
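
The squared correlation coefficient for two biallelic sites is commonly written r² and computed as D² / (pA(1 − pA) pB(1 − pB)), where D = pAB − pA·pB is the difference between the observed and expected haplotype frequency. The Python sketch below applies that standard formula; the function name `r_squared` and the example frequencies are illustrative assumptions, not values from the presentation.

```python
def r_squared(p_ab, p_a, p_b):
    """Squared allelic correlation between two biallelic sites.

    p_ab: observed frequency of the haplotype carrying allele A at site 1
          and allele B at site 2
    p_a, p_b: frequencies of allele A and allele B at their own sites
    """
    d = p_ab - p_a * p_b  # observed minus expected haplotype frequency
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Alleles always found together (only two haplotypes observed): perfect LD.
print(round(r_squared(p_ab=0.3, p_a=0.3, p_b=0.3), 6))
# Observed frequency equal to the product of allele frequencies: no LD.
print(round(r_squared(p_ab=0.09, p_a=0.12, p_b=0.75), 6))
```

The first call models the special case on the slide, where each allele at one site always travels with the same allele at the other site.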

  20. Programming Project II • Program that helps visualize linkage disequilibrium by graphing LD scores • Each pair of markers has such a score => pairwise comparisons • [Figure: matrix of pairwise scores among Marker 1, Marker 2, and Marker 3, with the values 0.7, 1, and 0.2 each appearing on both sides of the diagonal. Symmetric!]
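
The symmetric layout on this slide can be sketched in Python as follows (this is not the author's ASP code). Each pairwise score is stored once and mirrored across the diagonal. The function name `pairwise_matrix` is hypothetical, and the assignment of the values 0.7, 1, and 0.2 to particular marker pairs is an assumption made only for illustration.

```python
def pairwise_matrix(markers, scores):
    """Build a symmetric matrix of pairwise scores.

    markers: list of marker names, in display order
    scores:  dict mapping (marker_a, marker_b) to a score
    Cells with no score, including the diagonal, are left as None.
    """
    index = {name: i for i, name in enumerate(markers)}
    size = len(markers)
    matrix = [[None] * size for _ in range(size)]
    for (first, second), value in scores.items():
        i, j = index[first], index[second]
        matrix[i][j] = value
        matrix[j][i] = value  # mirror across the diagonal
    return matrix


markers = ["Marker 1", "Marker 2", "Marker 3"]
scores = {
    ("Marker 1", "Marker 2"): 0.7,  # illustrative pairing
    ("Marker 1", "Marker 3"): 1.0,  # illustrative pairing
    ("Marker 2", "Marker 3"): 0.2,  # illustrative pairing
}
for name, row in zip(markers, pairwise_matrix(markers, scores)):
    print(name, row)
```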

  21. Sample data for graphing • Read data by row: the pairwise comparison of marker 1 and marker 7 results in two different kinds of measurements

  22. GOLD – Graphical Overview of Linkage Disequilibrium • Existing program from the Univ. of Michigan to graph linkage disequilibrium http://www.sph.umich.edu/csg/abecasis/GOLD/ • Graphs based on a chromosomal position scale • Works very well for long range pattern analysis, but hard to distinguish each specific measurement.

  23. Comparison of Program Output (same input file) • Output from GOLD: difficult to see individual points on graph • Output from LD Color (my program): easier to distinguish individual points

  24. LD Color Program • Program written in ASP to graphically depict linkage disequilibrium in human genetic data • Color coded for specific numerical ranges of different measures of each pair-wise comparison of markers • Complete program: 4 files ; >1,000 lines of code
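
The color coding described on this slide amounts to looking up each score in a list of numerical ranges. The Python sketch below shows one way to do that; it is not the LD Color source, and the function name `color_for`, the range boundaries, and the color names are all hypothetical.

```python
def color_for(score, ranges, default="white"):
    """Return the color of the first range that contains score.

    ranges: list of (low, high, color) tuples, both bounds inclusive.
    A score on a shared boundary takes the color of the range listed first.
    """
    for low, high, color in ranges:
        if low <= score <= high:
            return color
    return default


# Hypothetical user-defined ranges, strongest LD first.
ranges = [
    (0.8, 1.0, "red"),
    (0.5, 0.8, "orange"),
    (0.0, 0.5, "yellow"),
]
for score in (1.0, 0.7, 0.2):
    print(score, color_for(score, ranges))
```

Keeping the ranges as plain data is what makes them user-definable: the lookup does not change when the colors or cut-offs do.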

  25. Program Features • Data input : file uploading or text pasting • Allows for variable file formats for input • User defined colors and ranges • Switch between different measures of LD • View actual data on graph or just the colors • Change size of graph • Option to select specific rows of data

  26. Upload your file Paste data

  27. Specify marker columns

  28. Choose label for numerical data inputted

  29. Choose measure of linkage disequilibrium • Specify the column in which the data is located

  30. Same as before => used to specify data for other side of diagonal

  31. Choose to display data on graph

  32. Choose different sizes for the graph

  33. Select only the markers you want graphed by choosing rows • Default : all are graphed

  34. Specify the ranges for the colors you want graphed.

  35. Manual

  36. Color Legend

  37. Sample: Symmetric

  38. Sample: Big Size!

  39. Sample: Data On, Asymmetric

  40. Sample: Row Select

  41. Future Directions • LD Color • Mouseover tag to each cell on graph to show marker id (Javascript) • Ability to accept more kinds of file formats • Better form validation and error checking • More functionality and linking to outside sources

  42. Acknowledgements • Dr. Garry Larson, Ph.D • Dave Ko City of Hope Senior Programmer Analyst • Louis Geller City of Hope Senior Research Associate • Dr. Ted Krontiris, M.D.,Ph.D Principal Investigator • The rest of the Krontiris Lab • Southern California Bioinformatics Institute: Dr. Jamil Momand, Dr. Nancy Warter-Perez, Dr. Sandra Sharp & Dr. Wendie Johnston, Jackie Leung & rest of SoCalBSI staff • Fellow interns • NSF & NIH
