Introduction to bioinformatics Barbera van Schaik b.d.vanschaik@amc.uva.nl Bioinformatics Laboratory, KEBB, AMC http://www.bioinformaticslaboratory.nl/
What is bioinformatics? • A set of software tools for molecular sequence analysis • The use of computers to collect, analyze, and interpret biological information at the molecular level. • The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information
Bioinformatics [diagram] Contributing disciplines: biology, informatics, mathematics, statistics, database technology. Related areas: biomedical research, genomics, proteomics, metabolomics, data management.
Molecular biology 1933 1953 1961 1980
What is genomics? The application of high-throughput automated technologies to molecular biology. OR The experimental study of complete genomes.
High throughput sequencing • Platforms: Roche (454), Illumina (Solexa), Applied Biosystems (SOLiD) • 454, one run: 7.5 hours, 400,000 sequences, 200-300 bases per sequence = about 100,000,000 bases per run • Later in 2008: 400 bases per sequence
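The per-run figure for the 454 follows from simple multiplication. A minimal sketch, assuming the midpoint of the quoted 200-300 base range (250 bases) as the typical read length; that midpoint is an assumption, not a number from the slide:

```python
# Back-of-the-envelope throughput for one 454 run, using the slide's figures.
reads_per_run = 400_000
read_length_range = (200, 300)   # bases per sequence, as quoted on the slide
run_hours = 7.5

# Assumption: take the midpoint of the quoted range as the typical read length.
typical_read_length = sum(read_length_range) / 2

bases_per_run = reads_per_run * typical_read_length
bases_per_hour = bases_per_run / run_hours

print(f"{bases_per_run:,.0f} bases per run")
print(f"{bases_per_hour:,.0f} bases per hour")
```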
Confused by genomics? Genomics Transcriptomics Proteomics Metabolomics Nutrigenomics Pharmacogenomics Epigenomics Infectomics Patientomics other 'omics'
Institutes that provide support • National Center for Biotechnology Information (NCBI, USA) http://www.ncbi.nlm.nih.gov/ • European Bioinformatics Institute (EBI, UK) http://www.ebi.ac.uk/ • Weizmann Institute of Science (Israel) http://bioportal.weizmann.ac.il/ • Swiss Institute of Bioinformatics (SIB) http://www.expasy.org/ • University of California Santa Cruz (UCSC) http://genome.ucsc.edu/
Bioinformatics in the Netherlands Universities: -> * Universiteit Leiden (1575) -> * Rijksuniversiteit Groningen (1614) -> * Universiteit Utrecht (1636) -> * Universiteit van Amsterdam (1632) -> * Technische Universiteit Delft (1842) -> * Vrije Universiteit Amsterdam (1880) * Theologische Universiteit Apeldoorn (1894) -> * Erasmus Universiteit Rotterdam (1913) -> * Wageningen Universiteit (1918) -> * Radboud Universiteit Nijmegen (1923) * Universiteit van Tilburg (1927) * Nyenrode Business Universiteit (1946) * Theologische Universiteit Kampen (Oudestraat) (1854) * Theologische Universiteit Kampen (Broederweg) (1854) * Universiteit voor Humanistiek (1946) -> * Technische Universiteit Eindhoven (1956) -> * Universiteit Twente (1961) * Katholieke Theologische Universiteit (1967) -> * Universiteit Maastricht (1976) * Open Universiteit Nederland (1984)
Bioinformatics in the Netherlands http://www.nbic.nl/
Subjects in bioinformatics • Databases and ontologies • Genome analysis • Data and text mining • Sequence analysis • Phylogenetics • Systems biology • Structural bioinformatics • Genetics and population analysis • Gene expression (Source: scope guidelines, Bioinformatics journal)
Sequence analysis • Function prediction (similarity, sequence search) • Localisation (gene finding) • Grouping (genes, protein families) • Conservation (motifs, functional blocks) • SNPs and mutations (variations)
Sequence analysis • Pairwise alignment: inexact matching of 2 sequences • Multiple sequence alignment: inexact matching of more than 2 sequences
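One common way to do inexact matching of two sequences is global alignment by dynamic programming (Needleman-Wunsch). The sketch below is illustrative only: the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity, not parameters from the slides.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = global_alignment("ACGT", "AGT")
print(score)
print(top)
print(bottom)
```

Multiple sequence alignment generalises the same idea to more than two sequences, but exact dynamic programming becomes impractical there, so dedicated tools use heuristics.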
Phylogenetics • Evolution = mutation of DNA (and protein) sequences • Can we define evolutionary relationships between organisms by comparing DNA sequences? • Lots of methods and software; what is the "correct" analysis?
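As a rough illustration of comparing DNA sequences to estimate relatedness, the sketch below computes the p-distance (the proportion of differing sites) between already-aligned sequences. This is only the simplest of many possible measures, and the taxon names and sequences are invented for the example.

```python
from itertools import combinations


def p_distance(s, t):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if not s or len(s) != len(t):
        raise ValueError("sequences must be non-empty and aligned to the same length")
    return sum(x != y for x, y in zip(s, t)) / len(s)


# Hypothetical aligned sequences, invented for illustration only.
aligned = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTTC",
    "taxon_C": "ACGAACGTTC",
    "taxon_D": "TCGAACCTTC",
}

# Pairwise distance table; a tree-building method would take this as input.
distances = {
    (a, b): p_distance(aligned[a], aligned[b])
    for a, b in combinations(aligned, 2)
}

for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 2))
```

Different distance measures, substitution models and tree-building methods can give different trees from the same data, which is why the question of the "correct" analysis has no single answer.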
Phylogenetics Ciccarelli (2006), Science
Genome analysis Genome assembly http://www.wiley.com/legacy/college/boyer/0470003790/cutting_edge/shotgun_seq/shotgun.htm
Hierarchical shotgun sequencing (HGP) • Genome • Physical mapping • Minimal tiling set • For each BAC in tiling (~33 000 for human): shotgun sequencing • Fragment assembly
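The fragment assembly step can be illustrated with a toy greedy overlap merger. This is a sketch of the general idea only, not the assembler used in the HGP: it ignores sequencing errors, repeats and reverse complements, and the reads and the `min_len` threshold are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the largest suffix/prefix overlap."""
    contigs = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Hypothetical reads taken from the made-up sequence ATGCGTACGTTAGC.
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

In the hierarchical approach this kind of assembly is done per BAC, so each assembly problem stays small and its position in the genome is already known from the physical map.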
Gene annotation: key concepts • Gene prediction: usually the CDS is predicted, not a gene • Gene annotation: alternative splicing, UTRs, pseudogenes, known vs novel genes, etc.
3 classes of 'gene' prediction • Ab-initio: Genscan, Grail, FgenesH, Genie, GeneId, Genefinder, Glimmer, etc. • Homology based: GeneID, Genomescan, Twinscan, etc. • Identity based: Genewise, Sim4, Spidey, etc.
Ab-initio prediction CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA exon exon Example: Genscan
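Ab-initio predictors work from the sequence alone, using signals such as start codons and splice sites together with statistical models of coding sequence. The sketch below only locates candidate signals in the sequence shown on the slide. It is a deliberately naive illustration and not how Genscan works; the choice of motifs (ATG, GT, AG) is an assumption made for the example.

```python
def find_signal(sequence, signal):
    """Return the 0-based start positions of every occurrence of signal."""
    positions = []
    start = sequence.find(signal)
    while start != -1:
        positions.append(start)
        start = sequence.find(signal, start + 1)
    return positions


seq = "CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA"  # sequence shown on the slide

# Candidate signals only: most GT/AG dinucleotides are not real splice sites.
signals = {
    "start codon (ATG)": "ATG",
    "candidate splice donor (GT)": "GT",
    "candidate splice acceptor (AG)": "AG",
}
for label, motif in signals.items():
    print(label, find_signal(seq, motif))
```

A real predictor scores such candidates with probabilistic models and combines them into a consistent exon/intron structure, which is where the prediction difficulty lies.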
Homology assisted prediction CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA EST exon exon Example: Genie, Grail
Identity based prediction homology known mRNA prediction Example: estToGenome, sim4
Automated gene annotation [diagram labels: human, prediction, homology; Genscan; IGI/IPI, OTTO, humans]
Genome analysis Comparative genomics Thomas et al (2003), Nature
Gene structure in TranscriptView Provided by Jan Koster, Human Genetics, AMC
Discovery of new variant Valentijn et al. (2005), Genomics
Genetics and population analysis http://www.hapmap.org/
Copy number variation The Human Genome Structural Variation Working Group, Nature 2007
Gene expression analysis • Statistical analysis of differential gene expression • Expression-based classifiers • Regulatory networks / pathway analysis • Integration of expression data • Use genes, gene sets
Gene expression analysis: high throughput techniques • EST sequencing • Microarrays • Serial Analysis of Gene Expression (SAGE) • Genome tiling arrays • High throughput sequencing
Microarray analysis • Normalisation: correct for systematic bias • Differential gene expression • Clustering: grouping genes/samples • Classification: signatures
Normalisation • DNA microarray data combine three contributions: systematic effects resulting from the biological process (this is what we are interested in), random measurement noise, and systematic effects resulting from array technology • Systematic array effects result in false positives and false negatives • Remove these effects by normalisation
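The slide does not name a specific normalisation method. As one simple possibility, the sketch below median-centres the log ratios of each array, which removes a constant array-wide offset. The method choice and the example values are assumptions made for illustration.

```python
import statistics


def median_center(log_ratios):
    """Subtract the array's median log ratio so the centred values have median 0."""
    centre = statistics.median(log_ratios)
    return [value - centre for value in log_ratios]


# Hypothetical log2 ratios. array_2 is array_1 plus a constant offset of 0.8,
# standing in for a systematic effect of the array technology.
array_1 = [-1.0, -0.25, 0.0, 0.25, 2.0]
array_2 = [value + 0.8 for value in array_1]

normalised_1 = median_center(array_1)
normalised_2 = median_center(array_2)

print(normalised_1)
print(normalised_2)
```

After centring, the two arrays agree again, so the remaining differences between genes are no longer confounded with the array-wide offset. Intensity-dependent or spatial biases need more elaborate methods than this.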
Contributions to measured gene expression level • ANOVA: analysis of variance • Gene expression level (y) of 'Gene A': $y_{ijkg} = \mu + A_i + G_g + (VG)_{kg} + (AG)_{ig} + (DG)_{jg} + \varepsilon_{ijkg}$ • Term labels on the slide: expression level, array/gene effect, spot effect, dye effect, noise • ANOVA: carefully consider experimental design
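A short worked example may show why the experimental design matters. It assumes the usual reading of the indices for this kind of model (i = array, j = dye, k = variety or sample, g = gene), which the slide does not spell out, and a two-array dye-swap design, which is a hypothetical choice for illustration. Within one array, both channels share $\mu$, $A_i$, $G_g$ and $(AG)_{ig}$, so these cancel in the difference between channels:

```latex
\begin{align*}
% Array 1: variety 1 on dye 1, variety 2 on dye 2
y_{111g} - y_{122g} &= \big[(VG)_{1g} - (VG)_{2g}\big] + \big[(DG)_{1g} - (DG)_{2g}\big]
                       + (\varepsilon_{111g} - \varepsilon_{122g}) \\
% Array 2 (dyes swapped): variety 1 on dye 2, variety 2 on dye 1
y_{221g} - y_{212g} &= \big[(VG)_{1g} - (VG)_{2g}\big] - \big[(DG)_{1g} - (DG)_{2g}\big]
                       + (\varepsilon_{221g} - \varepsilon_{212g}) \\
% Averaging the two differences removes the gene-specific dye effect
\tfrac{1}{2}\big[(y_{111g} - y_{122g}) + (y_{221g} - y_{212g})\big]
                    &= (VG)_{1g} - (VG)_{2g} + \text{noise}
\end{align*}
```

Under these assumptions the averaged difference estimates $(VG)_{1g} - (VG)_{2g}$, the term of interest, free of the dye effect. Without the dye swap, the dye and variety contributions could not be separated.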