
Introduction to bioinformatics

Barbera van Schaik

[email protected]

Bioinformatics Laboratory, KEBB, AMC

http://www.bioinformaticslaboratory.nl/


What is bioinformatics?

  • A set of software tools for molecular sequence analysis

  • The use of computers to collect, analyze, and interpret biological information at the molecular level.

  • The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information


Bioinformatics

[Diagram labels: biology, informatics, mathematics, statistics, database technology; biomedical research, genomics, proteomics, metabolomics, data management]


History


The internet


Molecular biology

[Timeline figure with the years 1933, 1953, 1961 and 1980]


What is genomics?

The application of high-throughput automated technologies to molecular biology.

OR

The experimental study of complete genomes.


DNA microarrays


Automated DNA sequencing


High throughput sequencing

Roche, 454

Illumina, Solexa

Applied Biosystems, SOLiD

454, one run:

  • 7.5 hours

  • 400,000 sequences

  • 200-300 bases per sequence

  • about 100,000,000 bases per run

Later in 2008: 400 bases per sequence
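
The per-run figure on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes an average read length of 250 bases (the midpoint of the quoted 200-300 range); the variable names are illustrative only.

```python
# Figures quoted for one 454 run; the mean read length is an assumed midpoint.
reads_per_run = 400_000
read_length_range = (200, 300)   # bases per sequence
run_hours = 7.5

mean_read_length = sum(read_length_range) / 2        # assumed average read length
bases_per_run = reads_per_run * mean_read_length
bases_per_hour = bases_per_run / run_hours

print(f"{bases_per_run:,.0f} bases per run")
print(f"{bases_per_hour:,.0f} bases per hour")
```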


Applications of high throughput sequencing


Sample storage


Confused by genomics?

Genomics

Transcriptomics

Proteomics

Metabolomics

Nutrigenomics

Pharmacogenomics

Epigenomics

Infectomics

Patientomics

other 'omics'


image credit: Digital Vision, PhotoDisc, Matt Ray/EHP


Institutes that provide support

  • National Center for Biotechnology Information (NCBI, USA)

    http://www.ncbi.nlm.nih.gov/

  • European Bioinformatics Institute (EBI, UK)

    http://www.ebi.ac.uk/

  • Weizmann Institute of Science (Israel)

    http://bioportal.weizmann.ac.il/

  • Swiss Institute of Bioinformatics (SIB)

    http://www.expasy.org/

  • University of California Santa Cruz (UCSC)

    http://genome.ucsc.edu/


Bioinformatics in the Netherlands

Universities:

-> * Universiteit Leiden (1575)

-> * Rijksuniversiteit Groningen (1614)

-> * Universiteit Utrecht (1636)

-> * Universiteit van Amsterdam (1632)

-> * Technische Universiteit Delft (1842)

-> * Vrije Universiteit Amsterdam (1880)

* Theologische Universiteit Apeldoorn (1894)

-> * Erasmus Universiteit Rotterdam (1913)

-> * Wageningen Universiteit (1918)

-> * Radboud Universiteit Nijmegen (1923)

* Universiteit van Tilburg (1927)

* Nyenrode Business Universiteit (1946)

* Theologische Universiteit Kampen (Oudestraat) (1854)

* Theologische Universiteit Kampen (Broederweg) (1854)

* Universiteit voor Humanistiek (1946)

-> * Technische Universiteit Eindhoven (1956)

-> * Universiteit Twente (1961)

* Katholieke Theologische Universiteit (1967)

-> * Universiteit Maastricht (1976)

* Open Universiteit Nederland (1984)


Bioinformatics in the Netherlands

http://www.nbic.nl/


Subjects in bioinformatics

  • Databases and ontologies

  • Genome analysis

  • Data and text mining

  • Sequence analysis

  • Phylogenetics

  • Systems biology

  • Structural bioinformatics

  • Genetics and population analysis

  • Gene expression

Scope guidelines, Bioinformatics journal


Sequence analysis

Function prediction (similarity, sequence search)

Localisation (gene finding)

Grouping (genes, protein families)

Conservation (motifs, functional blocks)

SNPs and mutations (variations)


Sequence analysis

Pairwise alignment: inexact matching of 2 sequences

Multiple sequence alignment: inexact matching of >2 sequences
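
To make "inexact matching of 2 sequences" concrete, here is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration, not values from the slides, and this is not how BLAST itself searches.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best score for aligning a and b end to end (Needleman-Wunsch)."""
    # prev[j] holds the best score for a[:i-1] against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against nothing: i gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))
print(global_alignment_score("ACGT", "AGT"))
```

Only two rows of the score matrix are kept, which is enough for the score; recovering the alignment itself would need the full matrix and a traceback.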


BLAST output


BLAST output - alignments


Phylogenetics

  • Evolution = mutation of DNA (and protein) sequences

  • Can we define evolutionary relationships between organisms by comparing DNA sequences?

    • Lots of methods and software; what is the "correct" analysis?
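
One of the simplest ways to compare DNA sequences is the p-distance: the fraction of aligned positions that differ. The sketch below uses made-up, already aligned toy sequences (the names `species_A` to `species_C` are hypothetical) and only finds the closest pair; real tree-building methods go well beyond this.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical, already aligned toy sequences (not real data).
aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTAT",
    "species_C": "ACGAACGGAT",
}

distances = {
    (s, t): p_distance(aligned[s], aligned[t])
    for s, t in combinations(aligned, 2)
}
closest_pair = min(distances, key=distances.get)
print(closest_pair, distances[closest_pair])
```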


Phylogenetics

Ciccarelli (2006), Science


Genome analysis

Genome assembly

http://www.wiley.com/legacy/college/boyer/0470003790/cutting_edge/shotgun_seq/shotgun.htm


Hierarchical shotgun sequencing (HGP)

Genome

  • Physical mapping

  • Minimal tiling set

  • For each BAC in the tiling (~33 000 for human): shotgun sequencing

  • Fragment assembly
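
The fragment assembly step can be illustrated with a toy greedy assembler that repeatedly merges the two fragments with the longest suffix-prefix overlap. This is a sketch under strong assumptions (error-free reads, no repeats, one strand, a hypothetical minimum overlap of 3 bases); production assemblers are far more elaborate.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            break  # no overlaps left: several contigs remain
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]
    return contigs


# Hypothetical reads sampled from one short toy sequence.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))
```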


Gene annotation: key concepts

Gene prediction:

  • Usually the CDS is predicted, not a gene

Gene annotation:

  • Alternative splicing

  • UTR

  • Pseudogenes

  • Known vs novel genes

  • etc.


3 classes of 'gene' prediction

Ab initio: Genscan, Grail, FgenesH, Genie, GeneID, Genefinder, Glimmer, etc.

Homology based: GeneID, Genomescan, Twinscan, etc.

Identity based: Genewise, Sim4, Spidey, etc.


Ab initio prediction

CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA

[Figure labels: exon, exon]

Example: Genscan
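
Ab initio methods use only signals in the sequence itself. The toy scan below enumerates two-exon models of the form ATG ... GT-intron-AG ... stop codon whose spliced reading frame stays open. It is a deliberately naive sketch: Genscan relies on a probabilistic model, not on this exhaustive search, and the exon positions drawn on the original slide are not recoverable from the transcript.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def two_exon_models(seq):
    """Enumerate (exon1, exon2) pairs: ATG start, GT...AG intron, in-frame stop in exon 2."""
    models = []
    for start in range(len(seq) - 2):
        if seq[start:start + 3] != "ATG":
            continue
        for donor in range(start + 3, len(seq) - 1):
            if seq[donor:donor + 2] != "GT":
                continue  # an intron must begin with GT
            for acceptor_end in range(donor + 4, len(seq) + 1):
                if seq[acceptor_end - 2:acceptor_end] != "AG":
                    continue  # an intron must end with AG
                exon1 = seq[start:donor]
                spliced = exon1 + seq[acceptor_end:]
                # Read codons from the ATG until the first stop codon.
                for i in range(0, len(spliced) - 2, 3):
                    if spliced[i:i + 3] in STOP_CODONS:
                        if i >= len(exon1):  # stop codon lies entirely in exon 2
                            models.append((exon1, spliced[len(exon1):i + 3]))
                        break
    return models


seq = "CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA"
for exon1, exon2 in two_exon_models(seq):
    print(exon1, exon2)
```

For the sequence on the slide, one of the reported models is ATGCGGTGGCGC followed by TGGAAAGTGTAA. A signal-only scan may report more than one candidate, which is part of why real gene finders score models statistically.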


Homology assisted prediction

CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA

[Figure labels: EST, exon, exon]

Example: Genie, Grail


Identity based prediction

[Figure labels: homology, known mRNA, prediction]

Example: estToGenome, sim4


Automated gene annotation

[Figure labels: human prediction, homology, Genscan]

IGI/IPI, OTTO, humans


Genome analysis

Comparative genomics

Thomas et al (2003), Nature


Gene structure in TranscriptView

Provided by Jan Koster, Human Genetics, AMC


Discovery of new variant

Valentijn et al. (2005), Genomics


Genetics and population analysis

http://www.hapmap.org/


Copy number variation

The Human Genome Structural Variation Working Group, Nature 2007


Gene expression analysis

Statistical analysis of differential gene expression

Expression-based classifiers

Regulatory networks / Pathway analysis

Integration of expression data

Use genes, gene sets


Gene expression analysis

High throughput techniques

EST sequencing

Microarrays

Serial Analysis of Gene Expression (SAGE)

Genome tiling arrays

High throughput sequencing


Microarray analysis

Normalisation: correct for systematic bias

Differential gene expression

Clustering: grouping genes/samples

Classification: signatures
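
As an illustration of the normalisation step, the sketch below removes a constant array-wide offset by subtracting each array's median log intensity. Median centring is only one simple option, chosen here as an assumption; the intensities are hypothetical values, not data from the lecture.

```python
from statistics import median


def median_centre(log_values):
    """Subtract the array's median so that every array is centred on zero."""
    centre = median(log_values)
    return [v - centre for v in log_values]


# Hypothetical log2 intensities for the same four genes on two arrays;
# array_2 is uniformly brighter by 1.0, a purely technical offset.
array_1 = [8.0, 9.0, 10.0, 11.0]
array_2 = [9.0, 10.0, 11.0, 12.0]

norm_1 = median_centre(array_1)
norm_2 = median_centre(array_2)

# After centring, the technical offset no longer looks like differential expression.
log_ratios = [b - a for a, b in zip(norm_1, norm_2)]
print(log_ratios)
```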


Normalisation

DNA microarray data contain:

  • systematic effects resulting from the biological process: this is what we are interested in

  • random measurement noise

  • systematic effects resulting from array technology: these result in false positives and false negatives; remove these effects by normalisation


Contributions to measured gene expression level

ANOVA: analysis of variance

$y_{ijkg} = \mu + A_i + G_g + (VG)_{kg} + (AG)_{ig} + (DG)_{jg} + \varepsilon_{ijkg}$

  • $y_{ijkg}$: gene expression level (y) of 'Gene A'

  • $(VG)_{kg}$: expression level

  • $A_i$, $G_g$: array/gene effect

  • $(AG)_{ig}$: spot effect

  • $(DG)_{jg}$: dye effect

  • $\varepsilon_{ijkg}$: noise

ANOVA: carefully consider experimental design
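
Why the experimental design matters can be seen from the model itself. Within one spot the two dye channels share μ, the array effect, the gene effect and the spot effect, so those terms cancel in the red-minus-green difference, leaving the (VG) difference plus the dye bias (DG). A dye-swap design, shown in the sketch below, removes the dye bias as well. The numbers are hypothetical log2 effects and noise is left out for clarity.

```python
# Hypothetical effect sizes on a log2 scale for one gene (not real data).
vg = {"tumor": 1.2, "normal": 0.2}   # (VG)_kg: sample-specific expression
dg = {"red": 0.5, "green": 0.0}      # (DG)_jg: dye-specific bias for this gene


def log_ratio(red_sample, green_sample):
    """Within-spot red minus green signal; shared array, gene and spot terms cancel."""
    return (vg[red_sample] + dg["red"]) - (vg[green_sample] + dg["green"])


m_forward = log_ratio("tumor", "normal")   # tumor labelled red
m_swapped = log_ratio("normal", "tumor")   # dyes swapped on a second array

# A single array confounds expression with dye bias; the dye swap cancels the bias.
estimate = (m_forward - m_swapped) / 2
print(m_forward, estimate)
```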


Classification with selected genes

78 sporadic breast tumors

70 prognostic marker genes

[Figure labels: good prognosis, bad prognosis]

Validation set: 2 out of 19 incorrect

Van ‘t Veer et al, Nature 415: 530-536 (2002)
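
A signature-based classifier can be sketched as a comparison of a tumor's expression profile with a reference profile. The example below assigns a label by Pearson correlation with a "good prognosis" template; the template, the two tumor profiles and the 0.4 threshold are all hypothetical, and this is an illustration of the idea, not the published procedure. The last lines compute the accuracy implied by the validation figures on the slide.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def classify(profile, good_template, threshold=0.4):
    """Label a profile by its correlation with the good-prognosis template."""
    if pearson(profile, good_template) > threshold:
        return "good prognosis"
    return "bad prognosis"


# Hypothetical expression values for four marker genes.
good_template = [1.0, 0.8, -0.5, -1.0]
tumor_a = [0.9, 0.7, -0.4, -1.1]
tumor_b = [-1.0, -0.6, 0.7, 0.9]

print(classify(tumor_a, good_template))
print(classify(tumor_b, good_template))

# Validation figures from the slide: 2 of 19 samples classified incorrectly.
validation_accuracy = (19 - 2) / 19
print(f"{validation_accuracy:.1%}")
```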


Gene set analysis

  • Single gene measurements are quite noisy

  • Lists of (differentially expressed) genes do not teach us much about the underlying biological process

  • Analysis in terms of sets of genes (=modules):

    • pathways

    • gene ontology (molecular function, biological process, cellular component)

    • chromosomal region
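
A common way to analyse a gene list in terms of gene sets is an over-representation test: given a list of differentially expressed genes, how surprising is its overlap with a pathway or ontology category? The sketch below uses the hypergeometric upper tail. Note that this is a simpler technique than GSEA, which works on a ranked list; the gene counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, set_size, list_size, overlap):
    """P(at least `overlap` set members in a random list): hypergeometric upper tail."""
    upper = min(set_size, list_size)
    favourable = sum(
        comb(set_size, k) * comb(total_genes - set_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, list_size)


# Hypothetical numbers: 1000 measured genes, a pathway of 50 genes,
# 100 differentially expressed genes, 12 of which belong to the pathway.
p = enrichment_p_value(total_genes=1000, set_size=50, list_size=100, overlap=12)
print(p)
```

With these numbers about 5 pathway genes would be expected by chance, so an overlap of 12 yields a small p-value.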


Gene set enrichment analysis (GSEA)


http://nar.oupjournals.org/


Integration of information: understanding disease

Integration of expression data with pathways

http://www.genome.jp/kegg/


Integration of information: knowledge base

[Figure labels: literature, primary ‘databases’, biological databases, domain experts, background information, Wikipedia]

  • Huge amount of information

  • Heterogeneous

  • Different formats

  • Different ‘standards’

  • Redundant / erroneous

  • Not always curated

  • Difficult to obtain integrated view of domain

Provided by Marcel Willemsen, Bioinformatics Laboratory, AMC


Pseudo-neonatal adrenoleukodystrophy (ACOX1; EC 1.3.3.6)

[Pathway figure. Compartments: peroxisome, mitochondrion. Steps: activation, oxidation, hydration, dehydrogenation, thiolysis]

Different substrates use different enzymes

Peroxisomal beta-oxidation ceases at octanoyl-CoA

Background knowledge: what is thiolysis?


Knowledge base

Information selection

Knowledge acquisition

Curation (committee)

Maintenance


Knowledge base

Storage of knowledge as an ontology

Chihuahua is a Dog; Dog is an Animal
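
The "is a" chain on this slide maps directly onto a small data structure. The sketch below stores each term's parent in a dictionary and walks upward to answer questions such as "is a Chihuahua an Animal?". It assumes a single parent per term for simplicity; ontologies such as the Gene Ontology allow several parents, which would need a graph traversal instead.

```python
# Each term points to its direct parent via an "is a" relation.
IS_A = {"Chihuahua": "Dog", "Dog": "Animal"}


def ancestors(term, is_a=IS_A):
    """Follow "is a" links upward and return every more general term."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found


def is_a_kind_of(term, candidate):
    """True if `candidate` is reachable from `term` through "is a" links."""
    return candidate in ancestors(term)


print(ancestors("Chihuahua"))
print(is_a_kind_of("Chihuahua", "Animal"))
```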


Knowledge Visualization

Bioinformatics Laboratory

[Series of figures; labels: D131, D137]


Knowledge base

Knowledge editor

Insight and overview

Export/Import

Genomics data in KB context

Access

Integration clinic

Integration

Maintenance

Education


Co-occurrence of protein names

http://www.cytoscape.org/
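
Co-occurrence mining can be sketched as counting how often two names appear in the same sentence; each counted pair then becomes a weighted edge in a network that a tool such as Cytoscape can display. The sentences and the name list below are hypothetical stand-ins for literature abstracts and a protein dictionary, and the tokenisation is deliberately crude.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(sentences, names):
    """Count how often each pair of names appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        words = set(sentence.replace(",", " ").replace(".", " ").split())
        present = sorted(n for n in names if n in words)
        counts.update(combinations(present, 2))
    return counts


# Hypothetical sentences, standing in for literature abstracts.
sentences = [
    "MYCN regulates MDM2 in neuroblastoma.",
    "MDM2 binds TP53.",
    "MYCN and MDM2 are co-expressed.",
]
edges = co_occurrence(sentences, {"MYCN", "MDM2", "TP53"})
for (a, b), weight in edges.items():
    print(a, b, weight)
```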


Topics computer exercises

Query public databases

Sequence format conversion

Automatic RNA-to-protein translation

Primer design

Sequence alignment

Gene finding

Tomorrow 13:30-15:00 L-007

http://www.bioinformaticslaboratory.nl/
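
As a preview of the translation exercise, here is a minimal sketch of RNA-to-protein translation with the standard genetic code. It reads a single frame starting at the first base and stops at the first stop codon; the example RNA is a made-up sequence, and the exercise itself may well use existing tools instead.

```python
# Standard genetic code, built from the conventional UCAG ordering of the codon table.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(rna):
    """Translate from the first base, codon by codon, until a stop codon or the end."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical RNA sequence used only for illustration.
print(translate("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG"))
```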


Extra


Structural bioinformatics - proteins

http://cwx.prenhall.com/horton/medialib/media_portfolio/04.html


Structural bioinformatics - proteins

Distributed computing - e.g. [email protected]

http://folding.stanford.edu/


Structural bioinformatics - RNA

http://folding.stanford.edu/

http://tigger.uic.edu/classes/phys/phys461/phys450/ANJUM04/


Systems biology

Study of complete pathways / organelles / cells / organisms

Strategies:

  • Modelling details

  • Complete network analysis

Complete systems biology is not feasible at the moment


Systems biology

CellDesigner

http://www.celldesigner.org/


Systems biology

Cytoscape

http://www.cytoscape.org/


Cancer research tool: Human Transcriptome Map

Gene expression data (SAGE / microarrays)

[Figure: chromosomes 1-22, X and Y]


N-myc in Neuroblastoma


[Figure panels: All tissues, Neuroblastoma, Glioblastoma, Brain Normal, Breast Cancer, Colon tumor, Colon Normal, Ovary Cancer, Ovary Normal, Prostate Cancer, Prostate Normal]


[Figure panels: All tissues, Neuroblastoma, Medulloblastoma, Glioblastoma, Brain Normal, Breast Cancer, Breast Normal, Colon tumor, Colon Normal, Ovary Cancer, Ovary Normal, Prostate Cancer, Prostate Normal]

Median Expression Levels, All tissues, #11. cM: 156; cR: 446; Genes: 1208; Tags/gene

Caron et al (2001), Science


Fundamental insight in the human genome

[Figure: chromosome 12. Tracks: expression, gene density, GC content, intron length^-1]

Versteeg et al (2003), Genome Research


Conserved synteny between the human and mouse genomes


Synteny and orthologs

[Figure labels: mouse; "reconstructed" human chromosome 2]

Provided by Ramin Monajemi, Bioinformatics Laboratory, AMC


Gene expression with clinical information


Time series MycN off/on


Gene expression studies: high throughput sequencing

Full-length transcripts

EST sequencing

5' transcript ends (5'-RATE, CAGE)

SAGE ditag sequencing

SAGE-like 3' end sequencing

Nebulized fragments

ncRNA sequencing

