
The rest of bioinformatics

The rest of bioinformatics. Prof. William Stafford Noble, Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington. thabangh@gmail.com. One-minute responses.


Presentation Transcript


  1. The rest of bioinformatics Prof. William Stafford Noble, Department of Genome Sciences, Department of Computer Science and Engineering, University of Washington thabangh@gmail.com

  2. One-minute responses
     • I always like it when we ask questions and you first say good question, even though the question is not good.
     • I liked the lecture although the concepts were a bit advanced for me.
     • I understood about 90% of everything.
     • The Python is more challenging but it is good to get confused sometimes.
     • Python was more interesting!
     • The comprehension of Python is improved at 95%.
     • Today’s program (first one) was really challenging. I thought the second one was easier to understand.
     • Python problem 3 was really challenging for me.
     • The Python today was completely different from the rest and needed more time.
     • Do your students at home write one-minute responses for the whole semester every day?
         • Yes.
     • How did we discover the first mutation?
         • I am not sure I understand the question. We can observe mutations happening in microorganisms in the lab by sequencing their DNA from one generation to the next.
     • Are you going to be readily available in future for consultations in case I get stuck?
         • Yes, you can always email me at thabangh@gmail.com.
     • I do not think species are related because I believe in creation.

  3. Outline • Parsimony • Distance methods • Computing distances • Finding the tree • Maximum likelihood

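  4. Revision • How do we compute the probability of observing this column, given this tree and an assumed model of evolution? [Figure: an alignment of four sequences (ACGCGTTGGG, ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA) and a tree whose leaves carry one column of the alignment (T, T, A, G); the target quantity is Pr(column|tree,model)]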

  5. Revision • We enumerate all possible assignments to the internal nodes, compute the probability of each assigned tree, and sum. [Figure: several copies of the tree with leaves T, T, A, G, each with a different assignment of nucleotides to the internal nodes]

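  6. Revision • How do we compute the probability of observing this column, given this assigned tree and an assumed model of evolution? [Figure: the same four-sequence alignment (ACGCGTTGGG, ACGCGTTGGG, ACGCAATGAA, ACACAGGGAA) and the tree with leaves T, T, A, G, now with the internal nodes assigned A, T, and A; the target quantity is Pr(column|tree,model)]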

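  7. Revision • We use our evolutionary model to assign a probability to each branch, and then take the product of the probabilities of the branches. • L(tree) = L0 × L1 × L2 × L3 × L4 × L5 × L6 [Figure: the assigned tree with root A, internal nodes T and A, and leaves T, T, A, G; the factors L0 through L6 and the base frequencies πA, πC, πG, πT are marked on the tree]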
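The product on slide 7 can be sketched in Python. The slides do not name the evolutionary model, the branch lengths, or which node each branch connects, so everything specific here is an assumption for illustration: the Jukes-Cantor model (`jc_prob`), a uniform root prior (`PI`), a single branch length `t=0.1`, and a topology in which the root A has children T and A, the internal T leads to leaves T and T, and the internal A leads to leaves A and G.

```python
import math

# Assumed uniform base frequencies (the Jukes-Cantor equilibrium distribution).
PI = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

def jc_prob(parent, child, t):
    """Jukes-Cantor probability of `child` at the end of a branch of length t
    (expected substitutions per site) that starts with `parent`."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if parent == child else 0.25 - 0.25 * e

def assigned_tree_likelihood(root, branches, t=0.1):
    """Root prior times one factor per branch: L0 * L1 * ... * L6."""
    likelihood = PI[root]  # the root term
    for parent, child in branches:
        likelihood *= jc_prob(parent, child, t)
    return likelihood

# Hypothetical wiring of the six branches (parent state, child state).
branches = [("A", "T"), ("A", "A"),   # root to the two internal nodes
            ("T", "T"), ("T", "T"),   # internal T to leaves T, T
            ("A", "A"), ("A", "G")]   # internal A to leaves A, G

print(assigned_tree_likelihood("A", branches))
```

Under these assumptions four branches keep their nucleotide and two change it, so the result is the root prior times four "same" factors and two "different" factors.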

  8. Revision
     • In maximum likelihood estimation, are mutations that occur on branches of a single tree considered independent or mutually exclusive events?
         • Independent.
     • What do different labelings of internal nodes of a tree represent?
         • Different possible evolutionary histories.
     • Are the different labelings independent or mutually exclusive?
         • Mutually exclusive.
     • Are the columns of a multiple alignment considered independent or mutually exclusive?
         • Independent.

  9. Maximum likelihood revisited
     for each possible tree
       for each column of the alignment
         for each assignment of internal nodes
           for each branch
             compute the probability of that branch
           assigned tree probability ← multiply branch probabilities
         column probability ← sum assigned tree probabilities
       tree probability ← multiply column probabilities
     return the tree with the highest probability
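The three inner loops of this pseudocode can be sketched for one fixed tree as follows. This is an illustration under stated assumptions rather than the course's implementation: the Jukes-Cantor model, a shared branch length `t=0.1`, and a four-leaf topology (root to internal nodes `u` and `v`, `u` to leaves 0 and 1, `v` to leaves 2 and 3) are all hypothetical choices. The outermost loop over possible trees is omitted; it would call `alignment_log_likelihood` once per candidate topology and keep the best. Column probabilities are combined as a sum of logarithms, which is equivalent to multiplying them but avoids numerical underflow on long alignments.

```python
import itertools
import math

BASES = "ACGT"

def jc_prob(parent, child, t):
    """Jukes-Cantor branch probability (an assumed model, not from the slides)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if parent == child else 0.25 - 0.25 * e

def column_probability(leaves, t=0.1):
    """Sum the assigned-tree probabilities over all internal-node labelings."""
    total = 0.0
    for root, u, v in itertools.product(BASES, repeat=3):
        # Assumed topology: root -> u, v; u -> leaves 0, 1; v -> leaves 2, 3.
        branches = [(root, u), (root, v),
                    (u, leaves[0]), (u, leaves[1]),
                    (v, leaves[2]), (v, leaves[3])]
        p = 0.25  # uniform prior on the root nucleotide
        for parent, child in branches:
            p *= jc_prob(parent, child, t)  # branches are independent: multiply
        total += p  # labelings are mutually exclusive: add
    return total

def alignment_log_likelihood(seqs, t=0.1):
    """Columns are independent, so their log probabilities add."""
    return sum(math.log(column_probability(col, t)) for col in zip(*seqs))

seqs = ["ACGCGTTGGG", "ACGCGTTGGG", "ACGCAATGAA", "ACACAGGGAA"]
print(column_probability("TTAG"))
print(alignment_log_likelihood(seqs))
```

Enumerating every labeling costs 4^k evaluations for k internal nodes, which is fine for this four-leaf example but grows quickly with tree size.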

  10. Sequence analysis tasks • Protein structure prediction • Remote homology detection • Gene finding

  11. Protein structure prediction • Given: amino acid sequence • Return: protein structure [Figure: a complex of earthworm hemoglobin, comprised of 144 globin chains. Source: Protein Data Bank.]

  12. Remote homology detection • The hidden Markov model generalizes the PSSM used by PSI-BLAST. • The model is trained using expectation-maximization. [Figure: profile hidden Markov model with begin state B, match states M1 to M8, insert states I0 to I8, delete states D1 to D8, and end state E]

  13. Gene finding Pedersen and Hein, Bioinformatics 2003.

  14. Mass spectrometry • Spectrum identification • Protein inference • Biomarker discovery

  15. EAMPK GDIFYPGYCPDVK LPLENENQGK ASVYNSFVSNGVK YVMTFK ENQGVVNR

  16. Biological networks • Functional networks • Protein-protein interaction networks • Metabolic networks • Regulatory networks

  17. Adai et al. JMB 340:179-190 (2004).

  18. Protein-protein interactions • Each node is a protein. • Each edge is a physical interaction. • Edges are measured via • Yeast two-hybrid • TAP tagging plus MS/MS Jeong et al. Nature. 2001.

  19. Regulatory networks • Mammalian cell cycle. • Colors represent different types of interactions • Black: binding • Red: covalent modifications and gene expression • Green: enzyme actions • Blue: stimulations and inhibitions Kohn. Mol Cell Biol. 1999

  20. Metabolic networks • Nodes are enzymes or metabolites. • Edges represent interactions. • This network represents the Arabidopsis TCA cycle.

  21. Gene expression • Clustering • Predictive modeling • Clinical applications

  22. Gene expression matrix The matrix entry at (i, j) is the expression level of gene i in experiment j. [Figure: matrix with genes along the rows and experiments along the columns]

  23. Cholesterol biosynthesis Cell cycle Immediate-early response Signaling and angiogenesis Wound healing and tissue remodeling Fibroblast gene clustering Iyer et al. “The transcriptional program in the response of human fibroblasts to serum.” Science. 283:83-7, 1999.

  24. Achieves >75% accuracy.

  25. Next generation sequencing [Video: next generation sequencing]

  26. Spaced seed alignment • Tags and tag-sized pieces of reference are cut into small “seeds.” • Pairs of spaced seeds are stored in an index. • Look up spaced seeds for each tag. • For each “hit,” confirm the remaining positions. • Report results to the user.
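The index, look-up, and confirm steps above can be illustrated with a deliberately simplified sketch. It uses a single spaced-seed mask applied to the start of each tag, not the paired-seed scheme the slide describes, and the mask `MASK`, the function names, the toy reference, and the mismatch limit are all hypothetical.

```python
from collections import defaultdict

MASK = "1101"  # hypothetical seed pattern: 1 = position is compared, 0 = skipped

def seed_key(s, mask=MASK):
    """Keep only the characters of s at the mask's 1-positions."""
    return "".join(c for c, m in zip(s, mask) if m == "1")

def build_index(reference, mask=MASK):
    """Map each spaced-seed key to the reference positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(reference) - len(mask) + 1):
        index[seed_key(reference[i:i + len(mask)], mask)].append(i)
    return index

def align_tag(tag, reference, index, max_mismatches=1, mask=MASK):
    """Look up the tag's seed, then confirm every position of each hit."""
    hits = []
    for pos in index.get(seed_key(tag, mask), []):
        window = reference[pos:pos + len(tag)]
        if len(window) < len(tag):
            continue  # the tag would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(tag, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGGACGT"
index = build_index(reference)
print(align_tag("ACTTAC", reference, index))  # mismatch falls on the skipped position
```

Because the third position of the mask is skipped, a tag with a substitution there still finds its seed in the index; the confirmation step then counts the mismatch.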

  27. Burrows-Wheeler • Store entire reference genome. • Align tag base by base from the end. • When tag is traversed, all active locations are reported. • If no match is found, then back up and try a substitution.
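A minimal sketch of aligning a tag base by base from its end, using a Burrows-Wheeler index, is given below. It covers exact matching only; the "back up and try a substitution" step is left out. The index is built naively by sorting all suffixes, which is fine for a toy string but not for a genome, and the function names and the toy reference are hypothetical.

```python
def build_fm_index(reference):
    """Build a suffix array, first-column offsets, and occurrence counts."""
    text = reference + "$"  # "$" sorts before A, C, G, T and marks the end
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    counts = {}
    for c in text:
        counts[c] = counts.get(c, 0) + 1
    first, total = {}, 0
    for c in sorted(counts):
        first[c] = total  # number of characters in text smaller than c
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ

def backward_search(tag, sa, first, occ):
    """Extend the match one base at a time, starting from the tag's last base."""
    lo, hi = 0, len(sa)  # half-open range of suffixes that still match
    for c in reversed(tag):
        if c not in first:
            return []
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return []  # no exact match; a full aligner would backtrack here
    return sorted(sa[lo:hi])  # all active locations once the tag is traversed

reference = "ACGTACGGACGT"
sa, first, occ = build_fm_index(reference)
print(backward_search("ACG", sa, first, occ))
```

Each step narrows a range of suffix-array rows, so the search time depends on the tag length, not on the reference length.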

  28. Spliced-read mapping • Used for processed mRNA data. • Reports reads that span introns. • Examples: TopHat, ERANGE

  29. Beyond the genome • Epigenetics • Chromatin state assignment • Genome 3D architecture

  30. Next generation assays ENCODE Project Consortium 2011. PLoS Biol 9:e1001046

  31. Rediscovering genes

  32. Population genetics • Genotype to phenotype • Human disease genetics • Population history

  33. Human migrations jbiol.com

  34. Other topics • Natural language processing • Image analysis
