
Sudhindra R. Gadagkar, Ph.D.

Sudhindra R. Gadagkar, Ph.D. Computational Biology University of Dayton. Some background material…. BS in Fisheries Science from University of Agricultural Sciences, Bangalore, India MS in Fisheries Science (Statistics). Tilapia ( Oreochromis niloticus ). Genetics of fish behavior.


Presentation Transcript


  1. Sudhindra R. Gadagkar, Ph.D. Computational Biology University of Dayton

  2. Some background material… • BS in Fisheries Science from University of Agricultural Sciences, Bangalore, India • MS in Fisheries Science (Statistics)

  3. Tilapia (Oreochromis niloticus)

  4. Genetics of fish behavior

  5. Ph.D. research (contd.) • Complex behaviors are heritable (behaviors are governed by genes) • Behavior and growth rate are correlated at the genetic level (either the same gene(s) are responsible for both traits, or the genes involved are closely linked)

  6. Post-doctoral research in Bioinformatics at Arizona State University

  7. What I do now • There is information in DNA and this information is used by the body. Source for image: www.nigms.nih.gov/.../ genetics/science.html

  8. DNA is an incredibly long strand, made up of four different molecules (called nucleotides), abbreviated as A, C, G and T. • For example, the DNA from the longest human chromosome is 12 cm long!

  9. Each cell of the human body contains DNA. • The total length of all this DNA is >3 billion nucleotides! • That’s a large number!

  10. Let’s get some perspective • A DNA sequence can look like the following: • ACTGTTTGAAATTGACCCAGCACTTCTCCCTCGCGCAGACAGAGAGCAGTGTAGACGGAGCCTTAATCGCTAGAGCGAATCCCGATGCCCCACCTTCCGTCGGTGCATAAGTCGCACGGCGTCTCCCCCCCGTATGTGGTCTTAGGTAACCGCCGCCGGGCGTAGGGTTCACGGTCGAGGATGAAGATGGCGATTCGTCACCTCGCCAACGGGAGGGACCTCATTCGATCGATCCGCAAGTCTTCGCGGGAGCTCGTCATGCGGAACGCAGGAGACAACACTCTGCGTCGGATGCGCGCCGTATCAGTCGGGTGAGGCACGCCTAGCGATTCGACCTTAATTCCCGGACGCGACGCGAGGAGTTGGGAGATTGCTGCCCAAACCGGTCCGCGCTACTTAGGCTGCCGGACCCTTCTCGCCCCACGGGTGGCGGTGGTAATAGAGTTGGCCCGCCCTCTATGTGTCGGAAAGGGGGAGCCGGGGGCCGTGAGGATGCCCACACTGTCGGCGAGACCATGCTATCGAGCCTCCCTGGGACCCTCGGGGACTTTAGTTCCCACTCGGTTGGGGATTCAGTAGCCACGAATCAGACCGCCCCGGGTGGGGGCTTCGTCGTCTTGTCTTTCCAGCCCCCCTCTACTCTTCCTACTACGCCCGTCTGTCGAGGGTGCCGAGCGCGCAGTGTGCTCCCAGCGGCTCGTGCCAGGTTAGGTAGCCATATGTATTTATCGGCTGAGGACCGCCCGCCGTGTACCGACGATTTTGTTATAATTCTAGAGATGGGCTGGCACTTACCTGCTAGGTTTCTTGTCTGCTATGACTCGTGCGAACAGTCTTACTCTTGGCACAGCCGCGATGGCGATGGTTTAGCGGTTCCCATGGGGGGAATCGCGCGACGGCACCCAGTTCTGTTTCGACCGGACCCTGCTTACTCCTGGCCGAGAGGCCTCATTCTCGTTCGAGTCGATCGCTTATGTTATCGCGCCATTGGGAGTGCTCTGACCAATTACCGACCCGGAGTGTG

  11. Let’s get some perspective • What if we try to write down the entire sequence (all 3 billion-plus nucleotides)? • After all, we now know what the entire human DNA sequence is.

  12. Let’s get some perspective • Let’s see…if we can fit 75 letters in each line and if there are 50 lines in a page, then a page will contain 3,750 nucleotides. • That does sound like a lot (the earlier slide had 1024).

  13. Let’s get some perspective • A book that contains 100 pages can hold 100 x 3,750 = 375,000 nucleotides. • That is a lot! • How thick do you think a book of 100 pages might be? • An inch maybe. • We need to write down at least 3 billion letters.

  14. Let’s get some perspective • Therefore, we need (3,000,000,000)/375,000 • = 8,000 books, each about an inch thick • = 8,000 inches • ≈ 667 feet.
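The arithmetic on slides 12 to 14 can be checked with a short sketch. The constants are the figures from the slides; the function name `stack_height_feet` is only illustrative, and the one-inch thickness per book is the slides' own rough guess.

```python
# Figures taken from the slides; the one-inch book thickness is a rough guess.
LETTERS_PER_LINE = 75
LINES_PER_PAGE = 50
PAGES_PER_BOOK = 100
INCHES_PER_BOOK = 1
GENOME_LENGTH = 3_000_000_000  # "at least 3 billion letters"


def stack_height_feet(genome_length=GENOME_LENGTH):
    """Height of the stack of books needed to print the sequence, in feet."""
    letters_per_page = LETTERS_PER_LINE * LINES_PER_PAGE   # 3,750
    letters_per_book = letters_per_page * PAGES_PER_BOOK   # 375,000
    books = genome_length / letters_per_book               # 8,000
    inches = books * INCHES_PER_BOOK
    return inches / 12                                     # 12 inches per foot


print(round(stack_height_feet()))  # 667
```

Changing any one assumption (say, a thinner book) scales the result proportionally, so the comparison with a 555-foot monument is an order-of-magnitude illustration rather than a precise figure.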

  15. The Washington Monument Source of image: epod.usra.edu/archive/ epodviewer.php3?oid=158368

  16. Let’s get some perspective • ... is 555 feet! • So imagine a stack of books taller than the Washington Monument crammed with letters – no spaces, no commas, no paragraphs.

  17. Let’s get some perspective • And we would have written down the data for one strand in one cell of one human being! • We need to understand this data. • Remember, there are no words, no punctuation, no “parts of speech” in this “text”. • Yet, we have to make sense of this information.

  18. Another example • This is the evolutionary tree of primates. • There are 10 species here whose evolutionary relationship we are interested in. Source for image: locus.umdnj.edu/nigms/ special/primate.html

  19. How many possible trees? • Do you know how many possible ways there are to draw the evolutionary history (“tree”) for 10 species? • [Formula not preserved in this transcript], where n is the number of species
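The formula on slide 19 did not survive in this transcript, so the following is only a sketch of the standard textbook counts, not necessarily the expression the slide showed. It assumes strictly bifurcating trees: the number of rooted trees for n species is (2n - 3)!! and the number of unrooted trees is (2n - 5)!!, where !! denotes the product of odd numbers. The function names are illustrative.

```python
def odd_double_factorial(k):
    """Return 1 * 3 * 5 * ... * k for an odd integer k >= 1."""
    result = 1
    for i in range(3, k + 1, 2):
        result *= i
    return result


def rooted_trees(n):
    """Number of rooted bifurcating trees for n species: (2n - 3)!!."""
    if n < 2:
        raise ValueError("need at least 2 species")
    return odd_double_factorial(2 * n - 3)


def unrooted_trees(n):
    """Number of unrooted bifurcating trees for n species: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 species")
    return odd_double_factorial(2 * n - 5)


print(rooted_trees(10))    # 34459425
print(unrooted_trees(10))  # 2027025
```

Under either assumption, 10 species already give millions of candidate trees, and the count grows faster than exponentially as species are added.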

  20. How many trees! • [Figure: number of possible trees (logarithmic axis, up to 10^1200) plotted against number of sequences (0 to 400), with reference values for scale: 10^79 atoms in the universe; 10^37 atoms in the bodies of all humans by year 2035; 5 x 10^30 prokaryotes living today; 5 x 10^11 stars in the Milky Way] • How many trees represent the true relationship?

  21. And only one of them is the correct tree because evolution has happened only once. • And we need to find it!

  22. One final example

  23. Pairwise Alignment – contd. • Consider these two DNA sequences • AATCTATA • AAGATA • We want to compare them site by site, so we need to align them by introducing gaps. • Gaps can be introduced in various places, and in various combinations, as shown next.

  24. Pairwise Alignment – contd.
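The alignments pictured on this slide are not preserved in the transcript. As a rough sketch of what "gaps in various places and combinations" means for the two sequences from slide 23, the code below enumerates every way of padding the shorter sequence with gaps to the length of the longer one. Two simplifying assumptions that are not from the slides: gaps are placed only in the shorter sequence, and an alignment is scored simply by counting matching sites. The names `gapped_versions` and `matches` are illustrative.

```python
from itertools import combinations


def gapped_versions(shorter, target_length):
    """Yield every way to pad `shorter` with '-' up to `target_length`."""
    n_gaps = target_length - len(shorter)
    for gap_positions in combinations(range(target_length), n_gaps):
        gap_set = set(gap_positions)
        letters = iter(shorter)
        # Walk the columns left to right, emitting a gap or the next letter.
        yield "".join(
            "-" if i in gap_set else next(letters)
            for i in range(target_length)
        )


def matches(a, b):
    """Count the sites at which two equal-length strings agree."""
    return sum(x == y for x, y in zip(a, b))


seq1 = "AATCTATA"
seq2 = "AAGATA"

candidates = list(gapped_versions(seq2, len(seq1)))
best = max(candidates, key=lambda c: matches(seq1, c))

print(len(candidates))      # 28
print(seq1)
print(best)
print(matches(seq1, best))  # 5
```

Even under these restrictions there are 28 candidate alignments for an 8-base and a 6-base sequence, and several of them tie for the best score.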

  25. Pairwise Alignment – contd. • Clearly, if the sequences are long, introducing gaps by hand becomes impractical; we would need a computer to help us find the optimal gaps. • But let us first see what is involved in asking the computer to do this. • One way, the looooooooong way, is to: • introduce gaps in every possible position.

  26. The Brute Force Method (the Perspiration approach) • For the long way, to get an idea of what is involved, let us first look at the first position. • There are three possible choices: • gap in the first sequence, • gap in the second sequence, or • gap in neither sequence. • That is, reading each column top to bottom: • - A A • A - A

  27. The Brute Force Method (the Perspiration approach) • These options are the same for every position. • Therefore, the number of possible paths, y, for a pair of sequences of length 1 base is 3. • If the sequences are 2 bases long, it is 3^2 = 9.

  28. The Brute Force Method (the Perspiration approach) • In general, if they are n bases long, then there are 3^n paths. • If n = 20, then y = 3^20 ≈ 3.5 x 10^9

  29. The Brute Force Method (the Perspiration approach) • If n = 200, then y = 3^200 ≈ 2.66 x 10^95 • If one path takes 1 nanosecond (10^-9 seconds), then for a pair of sequences that is 200 bases long, the computer will need about • 8.4 x 10^78 years!!
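The brute-force estimate on slides 27 to 29 can be reproduced with a few lines. This sketch uses the slides' own model (three choices per position, so 3^n paths, at one nanosecond per path); the 365.25-day year and the function names are assumptions made here for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed length of a year


def brute_force_paths(n):
    """Number of paths under the slides' model: three choices per position."""
    return 3 ** n


def years_needed(n, seconds_per_path=1e-9):
    """Time to examine every path at the given speed, in years."""
    return brute_force_paths(n) * seconds_per_path / SECONDS_PER_YEAR


print(brute_force_paths(2))          # 9
print(brute_force_paths(20))         # 3486784401
print(f"{years_needed(200):.1e}")    # 8.4e+78
```

Making the computer a billion times faster would only remove nine of the 78 zeros, which is why a smarter method matters more than raw speed.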

  30. Let’s get some perspective • Needs a super-human effort, eh? • That’s absolutely right! • That super-human is the computer. • But it’s not enough to just use the computer to solve such problems. • The computer does not have to work hard. It needs to work smart!

  31. Need Computer!
