Bioinformatics Predrag Radivojac Indiana University
Basics of Molecular Biology Can we understand how cells function? Eukaryotic cell
Bioinformatics is multidisciplinary!
• What is Bioinformatics?
  • Integrates: computer science, statistics, chemistry, physics, and molecular biology
  • Goal: organize and store huge amounts of biological data and extract knowledge from it
• Major areas of research
  • Genomics
  • Proteomics
  • Databases
• Practical discipline
Some major applications
• Drug design
• Evolutionary studies
• Genome characterization
Interesting Problems • Sequence alignment
Interesting Problems • Sequence assembly Goal: solve the puzzle, i.e. connect the pieces into one genomic sequence
Interesting Problems • Proteomics Mass spectrometry
Interesting Problems • Microarray data
Interesting Problems • Gene Regulation • Functional Genomics
Diseases are interconnected… Goh et al. PNAS, 104: 8685 (2007).
Disease
• Development of tools that can be used to understand and treat human disease
• Prediction of disease-associated genes
• Important from
  • biological standpoint
  • medical standpoint
  • computational standpoint
• Background
  • human genome
  • low-throughput data
  • high-throughput data
  • ontologies for protein function at multiple levels
The Time is Right!
www.cancer.gov
Alzheimer’s disease Top PhenoPred hits: 1) CDK5 2) NTN1 AUC = 77.5%
Loss/Gain of function and disease
Sickle Cell Disease (mutation E6V; structures shown: 4hhb, 2hbs)
• Autosomal recessive disorder
• E6V in HBB causes interaction with F85 and L88
• Formation of amyloid fibrils
• Abnormally shaped red blood cells, leading to sickle cell anemia
• Manifestation of the disease varies widely across patients
Pauling et al. Science 110: 543 (1949). Chui & Dover. Curr Opin Pediatr, 13: 22 (2001). http://gingi.uchicago.edu/hbs2.html
Proteins = chains of amino acids
• biomolecule, macromolecule
• more than 50% of the dry weight of cells is proteins
• polymer of amino acids connected into linear chains
• strings of symbols
• machinery of life
• play central role in the structure and function of cells
• regulate and execute many biological functions
Figure: a) amino acid, b) amino acid chain (Introduction to Protein Structure by Branden and Tooze)
Protein structure
• peptide bonds are planar and strong
• by rotating at each amino acid, proteins adopt structure
Figure source: Introduction to Protein Structure by Branden and Tooze
Protein function
• Multi-level phenomenon
  • biochemical function
  • biological function
  • phenotypical function
• Example: kinase
  • biochemical function – transferase
  • biological function – cell cycle regulation
  • phenotypical function – disease
• Function is everything that happens to or through a protein (Rost et al. 2003)
Protein contact graph
• Myoglobin, 1.4 Å X-ray structure, PDB: 2jho, 153 residues
• Edge between two residues if their Cα-Cα distance is < 6 Å
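The contact-graph construction can be sketched in a few lines. This is a minimal illustration, assuming one Cα coordinate per residue is already available as an (x, y, z) tuple in ångströms; the function name `contact_graph` and the toy coordinates are hypothetical and are not taken from the 2jho structure.

```python
import math
from itertools import combinations

def contact_graph(ca_coords, cutoff=6.0):
    """Link residues i and j when their C-alpha atoms are closer than `cutoff` angstroms.

    Returns a dict mapping each residue index to the set of its neighbors.
    """
    adj = {i: set() for i in range(len(ca_coords))}
    for i, j in combinations(range(len(ca_coords)), 2):
        if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
            adj[i].add(j)
            adj[j].add(i)
    return adj

# Hypothetical toy input: four points on a line, 3.8 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_graph = contact_graph(toy_coords)
```

The all-pairs loop is quadratic in the number of residues, which is acceptable for a chain of a few hundred residues; a spatial index would be the usual choice for much larger structures.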
Residue neighborhood
Example: S113 of isocitrate dehydrogenase
Notation:
• $G = (V, E)$
• $f: V \to \mathcal{A}$, where $\mathcal{A} = \{A, C, D, \ldots, W, Y\}$
• $g: V \to \{-1, +1\}$
Graphlets are small non-isomorphic connected graphs. Different positions of the pivot vertex with respect to the graphlet correspond to the graph-theoretical concept of automorphism orbits, or orbits. Przulj et al. Bioinformatics 20: 3508 (2004).
Key insight: Efficient combinatorial enumeration of graphlets / orbits over 7 disjoint cases
• 2-graphlets: 01
• 3-graphlets: 011, 012
• 4-graphlets: 0111, 0112, 0122, 0123
• breadth-first search
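To make the seven cases concrete, here is a brute-force sketch. It assumes that each digit string on the slide lists the sorted breadth-first distances from the pivot to the vertices of the graphlet, measured inside the graphlet; that reading is an assumption, and the helper names are hypothetical. The code tries every vertex subset containing the pivot, so it illustrates the case split only and is not the efficient enumeration the slide refers to.

```python
from collections import Counter, deque
from itertools import combinations

def bfs_distances(adj, source, allowed):
    """Breadth-first distances from `source`, walking only through vertices in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def signature_counts(adj, pivot, max_size=4):
    """Count connected induced subgraphs containing `pivot`, grouped by distance signature."""
    counts = Counter()
    others = [v for v in adj if v != pivot]
    for size in range(2, max_size + 1):
        for subset in combinations(others, size - 1):
            allowed = set(subset) | {pivot}
            dist = bfs_distances(adj, pivot, allowed)
            if len(dist) == size:  # all chosen vertices reached, so the subgraph is connected
                counts["".join(str(d) for d in sorted(dist.values()))] += 1
    return counts

# Hypothetical toy graph: the path 0 - 1 - 2 - 3, stored as adjacency sets.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
end_counts = signature_counts(path, 0)     # pivot at the end of the path
inner_counts = signature_counts(path, 1)   # pivot next to the end
```

Under this reading, a single signature can cover several orbits (for example, a triangle and the middle of a 3-vertex path both give 011), so separating orbits still requires inspecting the edges among the non-pivot vertices.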
Number of labelings per orbit:
• o1: $|A|$
• o2: $|A|^2$
• o5, o6, o11: $|A|^3$
• o3, o4: ?
Symmetric positions share labelings:
• A = {0, 1}: 00, 01 = 10, 11 (3)
• A = {0, 1, 2}: 00, 11, 22, 01 = 10, 02 = 20, 12 = 21 (6)
• binomial (multinomial) coefficients
• |A| = 20, dimensionality = 1,062,420
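The two small examples (3 labelings for a 2-letter alphabet, 6 for a 3-letter alphabet) are counts of unordered selections with repetition, which is where the binomial coefficients come from. A short sketch, assuming k interchangeable vertices labeled from an alphabet of n symbols, so that the count is the multiset coefficient C(n + k - 1, k); the function names are hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def unordered_labelings(alphabet, k):
    """List every way to label k interchangeable vertices, ignoring order."""
    return ["".join(c) for c in combinations_with_replacement(alphabet, k)]

def count_unordered(alphabet_size, k):
    """Multiset coefficient C(n + k - 1, k): unordered selections of k labels from n."""
    return comb(alphabet_size + k - 1, k)

pairs_binary = unordered_labelings("01", 2)    # 00, 01, 11
pairs_ternary = unordered_labelings("012", 2)  # 00, 01, 02, 11, 12, 22
pairs_amino = count_unordered(20, 2)           # two symmetric vertices, 20 amino acid labels
```

This covers only the symmetric part of an orbit; the sketch does not attempt to reproduce the total dimensionality quoted for |A| = 20.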
Graphlet kernel
Inner product between vectors of counts of labeled orbits:
$$K(x, y) = \langle \varphi(x), \varphi(y) \rangle = \sum_i \varphi_i(x)\, \varphi_i(y)$$
where
• $\varphi_i(x)$ is the number of times labeled orbit $i$ occurs in the graph
• $K$ is a kernel because matrices of inner products are symmetric and positive definite (proof due to David Haussler)
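Because most labeled orbits never occur around a given residue, the count vectors are sparse, and the inner product can be computed over the orbits that are actually present. A minimal sketch, assuming the counts are stored as dictionaries keyed by labeled orbit; the key format and the example counts are made up for illustration.

```python
from collections import Counter

def graphlet_kernel(counts_x, counts_y):
    """Inner product of two sparse count vectors keyed by labeled orbit."""
    # Iterate over the smaller dictionary; missing keys contribute zero.
    if len(counts_x) > len(counts_y):
        counts_x, counts_y = counts_y, counts_x
    return sum(c * counts_y.get(key, 0) for key, c in counts_x.items())

# Hypothetical labeled-orbit counts for two residue neighborhoods.
phi_x = Counter({("o1", "SA"): 3, ("o2", "SAG"): 1, ("o3", "SAA"): 2})
phi_y = Counter({("o1", "SA"): 2, ("o3", "SAA"): 5, ("o4", "SGG"): 7})

k_xy = graphlet_kernel(phi_x, phi_y)  # 3*2 + 2*5
k_xx = graphlet_kernel(phi_x, phi_x)  # 3*3 + 1*1 + 2*2
```

Only the two shared keys contribute to `k_xy`, so the cost scales with the number of non-zero counts rather than with the full dimensionality of the feature space.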