
Introduction to bioinformatics

Barbera van Schaik

[email protected]

Bioinformatics Laboratory, KEBB, AMC

http://www.bioinformaticslaboratory.nl/


What is bioinformatics?

  • A set of software tools for molecular sequence analysis

  • The use of computers to collect, analyze, and interpret biological information at the molecular level.

  • The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information


Bioinformatics

[Figure: diagram relating bioinformatics to biology, informatics, mathematics, statistics and database technology; other labels: biomedical research, genomics, proteomics, metabolomics, data management]




Molecular biology

[Figure: timeline with milestones at 1933, 1953, 1961 and 1980]


What is genomics?

The application of high-throughput automated technologies to molecular biology.

OR

The experimental study of complete genomes.



Automated DNA sequencing


High throughput sequencing

454, one run:

  • 7.5 hours
  • 400,000 sequences
  • 200-300 bases per sequence
  • = 100,000,000 bases per run

Later in 2008: 400 bases per sequence

Platforms:

  • Roche, 454
  • Illumina, Solexa
  • Applied Biosystems, SOLiD
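The per-run total quoted for the 454 platform follows from the read count and read length. The short calculation below reproduces it; taking the midpoint of the 200-300 base range as the average read length, and keeping the read count unchanged for the later 400-base reads, are assumptions made only for this illustration.

```python
# Figures quoted on the slide for a single 454 run.
sequences_per_run = 400_000
read_length_range = (200, 300)  # bases per sequence
run_hours = 7.5

# Assumption: use the midpoint of the range as the average read length.
mean_read_length = sum(read_length_range) / 2
bases_per_run = sequences_per_run * mean_read_length
bases_per_hour = bases_per_run / run_hours

print(f"{bases_per_run:,.0f} bases per run")
print(f"{bases_per_hour:,.0f} bases per hour")

# Assumption: same number of reads once read length reaches 400 bases.
bases_per_run_400 = sequences_per_run * 400
print(f"{bases_per_run_400:,} bases per run at 400 bases per sequence")
```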


Applications of high throughput sequencing



Confused by genomics?

Genomics

Transcriptomics

Proteomics

Metabolomics

Nutrigenomics

Pharmacogenomics

Epigenomics

Infectomics

Patientomics

other 'omics'



Institutes that provide support

  • National Center for Biotechnology Information (NCBI, USA)

    http://www.ncbi.nlm.nih.gov/

  • European Bioinformatics Institute (EBI, UK)

    http://www.ebi.ac.uk/

  • Weizmann Institute of Science (Israel)

    http://bioportal.weizmann.ac.il/

  • Swiss Institute of Bioinformatics (SIB)

    http://www.expasy.org/

  • University of California Santa Cruz (UCSC)

    http://genome.ucsc.edu/


Bioinformatics in the Netherlands

Universities:

-> * Universiteit Leiden (1575)

-> * Rijksuniversiteit Groningen (1614)

-> * Universiteit Utrecht (1636)

-> * Universiteit van Amsterdam (1632)

-> * Technische Universiteit Delft (1842)

-> * Vrije Universiteit Amsterdam (1880)

* Theologische Universiteit Apeldoorn (1894)

-> * Erasmus Universiteit Rotterdam (1913)

-> * Wageningen Universiteit (1918)

-> * Radboud Universiteit Nijmegen (1923)

* Universiteit van Tilburg (1927)

* Nyenrode Business Universiteit (1946)

* Theologische Universiteit Kampen (Oudestraat) (1854)

* Theologische Universiteit Kampen (Broederweg) (1854)

* Universiteit voor Humanistiek (1946)

-> * Technische Universiteit Eindhoven (1956)

-> * Universiteit Twente (1961)

* Katholieke Theologische Universiteit (1967)

-> * Universiteit Maastricht (1976)

* Open Universiteit Nederland (1984)


Bioinformatics in the Netherlands

http://www.nbic.nl/


Subjects in bioinformatics

  • Databases and ontologies
  • Genome analysis
  • Data and text mining
  • Sequence analysis
  • Phylogenetics
  • Systems biology
  • Structural bioinformatics
  • Genetics and population analysis
  • Gene expression

Source: scope guidelines of the journal Bioinformatics




Sequence analysis

Function prediction (similarity, sequence search)

Localisation (gene finding)

Grouping (genes, protein families)

Conservation (motifs, functional blocks)

SNPs and mutations (variations)


Sequence analysis

Pairwise alignment: inexact matching of 2 sequences

Multiple sequence alignment: inexact matching of >2 sequences
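To make "inexact matching of 2 sequences" concrete, here is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). The slide does not name an algorithm or a scoring scheme, so the choice of method and the match, mismatch and gap values are assumptions for illustration.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b (Needleman-Wunsch)."""
    # prev[j] is the best score for aligning the first i-1 letters of a
    # with the first j letters of b; row 0 is all gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # first i letters of a against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Two toy sequences that differ by one deleted base.
print(global_alignment_score("GATTACA", "GATACA"))
```

Only the score is returned here; recovering the alignment itself would require keeping the full table and tracing back through it.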






Phylogenetics

  • Evolution = mutation of DNA (and protein) sequences

  • Can we define evolutionary relationships between organisms by comparing DNA sequences?

    • Lots of methods and software: what is the "correct" analysis?
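One of the simplest ways to compare DNA sequences for this purpose is the proportion of differing positions between aligned sequences (the p-distance). The sketch below computes a pairwise distance table; the sequences and species names are made up, and p-distance is only one of the many possible methods the slide alludes to.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical, already aligned toy sequences.
aligned = {
    "species_A": "ACGTACGTAC",
    "species_B": "ACGTACGTTC",
    "species_C": "ACGAACCTTC",
    "species_D": "TCGAACCTTA",
}

distances = {
    (m, n): p_distance(aligned[m], aligned[n])
    for m, n in combinations(aligned, 2)
}
closest = min(distances, key=distances.get)

for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, d)
```

A distance table like this is the input to distance-based tree building methods; character-based methods work on the alignment directly, which is one reason different analyses can disagree.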



Phylogenetics

Ciccarelli (2006), Science




Genome analysis

Genome assembly

http://www.wiley.com/legacy/college/boyer/0470003790/cutting_edge/shotgun_seq/shotgun.htm


Hierarchical shotgun sequencing (HGP)

  • Genome
  • Physical mapping
  • Minimal tiling set
  • For each BAC in the tiling (~33 000 for human): shotgun sequencing, then fragment assembly
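The fragment assembly step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-prefix overlap. This is a simplified sketch, not the assembler used in the Human Genome Project; the reads, the minimum overlap of 3 bases and the absence of sequencing errors and repeats are all assumptions.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads pairwise by largest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical error-free reads covering a 14-base toy "genome".
reads = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
print(greedy_assemble(reads))
```

Repeats longer than the reads make this greedy strategy ambiguous on real genomes, which is one motivation for first dividing the genome into mapped BACs and assembling each one separately.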


Gene annotation: key concepts

Gene prediction:

Usually the CDS is predicted, not a gene

Gene annotation:

Alternative splicing

UTR

Pseudogenes

Known vs novel genes

etc.


3 classes of 'gene' prediction

  • Ab initio: Genscan, Grail, FgenesH, Genie, GeneId, Genefinder, Glimmer, etc.
  • Homology based: GeneID, Genomescan, Twinscan, etc.
  • Identity based: Genewise, Sim4, Spidey, etc.


Ab-initio prediction

CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA

exon

exon

Example: Genscan
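Ab initio prediction uses only signals in the sequence itself. The toy sketch below enumerates two-exon models in the slide's example sequence: a start codon, an intron that begins with GT and ends with AG, and an in-frame stop codon after splicing. Reading the two "exon" labels as such a model, and the GT/AG rule with no scoring at all, are assumptions for illustration; Genscan itself relies on a probabilistic model of gene structure and is not reproduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def coding_sequence(spliced: str):
    """Return spliced up to and including the first in-frame stop, else None."""
    for i in range(0, len(spliced) - 2, 3):
        if spliced[i:i + 3] in STOP_CODONS:
            return spliced[:i + 3]
    return None


def two_exon_models(seq: str):
    """List (start, donor, acceptor_end, cds) for ATG..GT-intron-AG..stop models."""
    models = []
    start = seq.find("ATG")
    while start != -1:
        for donor in range(start + 3, len(seq) - 1):
            if seq[donor:donor + 2] != "GT":
                continue
            exon1 = seq[start:donor]
            # The intron seq[donor:acceptor_end] must be at least "GTAG" long.
            for acceptor_end in range(donor + 4, len(seq) + 1):
                if seq[acceptor_end - 2:acceptor_end] != "AG":
                    continue
                cds = coding_sequence(exon1 + seq[acceptor_end:])
                # Keep only models whose stop codon ends in the second exon.
                if cds is not None and len(cds) > len(exon1):
                    models.append((start, donor, acceptor_end, cds))
        start = seq.find("ATG", start + 1)
    return models


slide_sequence = "CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA"
for model in two_exon_models(slide_sequence):
    print(model)
```

More than one model satisfies these rules even in this short sequence, which is why real gene finders score candidate structures instead of accepting every consistent one.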


Homology-assisted prediction

CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA

EST

exon

exon

Example: Genie, Grail


Identity-based prediction

homology

known mRNA

prediction

Example: estToGenome, sim4


Automated gene annotation

[Figure: automated gene annotation; labels: human prediction, homology, Genscan, IGI/IPI, OTTO, humans]


Genome analysis

Comparative genomics

Thomas et al (2003), Nature


Gene structure in TranscriptView

Provided by Jan Koster, Human Genetics, AMC


Discovery of new variant

Valentijn et al. (2005), Genomics




Genetics and population analysis

http://www.hapmap.org/


Copy number variation

The Human Genome Structural Variation Working Group, Nature 2007




Gene expression analysis

Statistical analysis of differential gene expression

Expression-based classifiers

Regulatory networks / Pathway analysis

Integration of expression data

Use genes, genesets


Gene expression analysis

High throughput techniques

EST sequencing

Microarrays

Serial Analysis of Gene Expression (SAGE)

Genome tiling arrays

High throughput sequencing


Microarray analysis

Normalisation: correct for systematic bias

Differential gene expression

Clustering: grouping genes/samples

Classification: signatures
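As an illustration of the differential expression step, the sketch below ranks genes by a two-sample Welch t-statistic between two groups of samples. The gene names and expression values are invented, and the t-statistic is only one possible choice; the slide does not specify a test.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t-statistic (unequal variances allowed)."""
    return (mean(x) - mean(y)) / sqrt(variance(x) / len(x) + variance(y) / len(y))


# Hypothetical normalised log expression values: (tumour samples, normal samples).
expression = {
    "gene_up":    ([8.1, 7.9, 8.3, 8.0], [6.0, 6.2, 5.9, 6.1]),
    "gene_flat":  ([5.0, 5.2, 4.9, 5.1], [5.1, 5.0, 5.2, 4.9]),
    "gene_noisy": ([6.0, 9.0, 5.0, 8.0], [6.5, 7.5, 6.0, 7.0]),
}

t_values = {gene: welch_t(tumour, normal)
            for gene, (tumour, normal) in expression.items()}
ranked = sorted(t_values, key=lambda gene: abs(t_values[gene]), reverse=True)

for gene in ranked:
    print(gene, round(t_values[gene], 2))
```

With thousands of genes tested at once, the resulting p-values would also need a multiple-testing correction, which this sketch leaves out.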


Normalisation

DNA microarray data contain:

  • systematic effects resulting from the biological process (this is what we are interested in)
  • systematic effects resulting from the array technology (these result in false positives and false negatives; remove these effects by normalisation)
  • random measurement noise
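A very simple way to remove an array-wide technical offset is to centre each array on its own median. The sketch below does this for two made-up arrays of log ratios; it assumes that most genes are unchanged, so that the median reflects array technology and not biology. The slide does not name a normalisation method, so this choice is an assumption.

```python
from statistics import median


def median_center(log_ratios):
    """Subtract the array median so the typical gene has log ratio 0."""
    m = median(log_ratios)
    return [value - m for value in log_ratios]


# Hypothetical log ratios for the same five genes on two arrays.
# array_1 carries a constant technical offset relative to array_2.
arrays = {
    "array_1": [0.9, 1.1, 1.0, 3.0, 1.2],
    "array_2": [-0.1, 0.1, 0.0, 2.0, 0.2],
}

normalised = {name: median_center(values) for name, values in arrays.items()}

for name, values in normalised.items():
    print(name, [round(v, 2) for v in values])
```

After centring, the fourth gene stands out on both arrays while the offset between the arrays is gone. Intensity-dependent or dye-dependent biases need more elaborate corrections than this.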


Contributions to measured gene expression level

ANOVA: analysis of variance

Gene expression level (y) of 'Gene A':

$$y_{ijkg} = \mu + A_i + G_g + (VG)_{kg} + (AG)_{ig} + (DG)_{jg} + \varepsilon_{ijkg}$$

Terms labelled on the slide: expression level, array/gene effect, spot effect, dye effect, noise

ANOVA: carefully consider experimental design


Classification with selected genes

  • 78 sporadic breast tumors
  • 70 prognostic marker genes
  • Classes: good prognosis, bad prognosis
  • Validation set: 2 out of 19 incorrect

Van ‘t Veer et al, Nature 415: 530-536 (2002)
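The idea of classifying a tumour from a set of selected marker genes can be sketched with a nearest-centroid rule: average the marker profiles per class and assign a new sample to the class whose average profile it correlates with best. The three-gene profiles below are invented, and this is a generic illustration, not a reimplementation of the published 70-gene classifier.

```python
from statistics import mean


def centroid(profiles):
    """Average expression per marker gene across the samples of one class."""
    return [mean(column) for column in zip(*profiles)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def classify(profile, centroids):
    """Label of the class centroid with the highest correlation."""
    return max(centroids, key=lambda label: pearson(profile, centroids[label]))


# Hypothetical training profiles over three marker genes.
training = {
    "good prognosis": [[1.0, 0.2, -0.8], [0.8, 0.1, -1.0]],
    "bad prognosis": [[-0.9, 0.0, 1.1], [-1.1, 0.3, 0.9]],
}
centroids = {label: centroid(samples) for label, samples in training.items()}

print(classify([0.7, 0.0, -0.6], centroids))
print(classify([-0.5, 0.2, 0.8], centroids))

# Error rate corresponding to the validation result quoted on the slide.
validation_error = 2 / 19
print(f"validation error: {validation_error:.1%}")
```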


Gene set analysis

  • Single gene measurements are quite noisy

  • Lists of (differentially expressed) genes do not teach us much about the underlying biological process

  • Analysis in terms of sets of genes (=modules):

    • pathways

    • gene ontology (molecular function, biological process, cellular component)

    • chromosomal region
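A common way to analyse a gene list in terms of gene sets is an over-representation test: given how many genes were measured, how many belong to a pathway and how many were selected, how surprising is the observed overlap? The sketch below computes the one-sided hypergeometric p-value. The numbers in the example are hypothetical, and over-representation is only one of several gene set methods.

```python
from math import comb


def enrichment_p_value(n_genes: int, n_in_set: int,
                       n_selected: int, n_overlap: int) -> float:
    """P(overlap >= n_overlap) when n_selected genes are drawn at random."""
    total = comb(n_genes, n_selected)
    upper = min(n_in_set, n_selected)
    tail = sum(
        comb(n_in_set, k) * comb(n_genes - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 measured genes, a pathway of 100 genes,
# 200 differentially expressed genes, 8 of which are in the pathway.
p = enrichment_p_value(20_000, 100, 200, 8)
print(f"p = {p:.2e}")
```

Only one gene would be expected in the overlap by chance here, so an overlap of 8 yields a small p-value. When many gene sets are tested, the p-values need a multiple-testing correction.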






Integration of information: understanding disease

Integration of expression data with pathways

http://www.genome.jp/kegg/


Integration of information: knowledge base

Sources: literature, primary ‘databases’, biological databases, domain experts, background information (Wikipedia)

  • Huge amount of information
  • Heterogeneous
  • Different formats
  • Different ‘standards’
  • Redundant / erroneous
  • Not always curated
  • Difficult to obtain integrated view of domain

Provided by Marcel Willemsen, Bioinformatics Laboratory, AMC


Pseudo-neonatal adrenoleukodystrophy (ACOX1; EC1.3.3.6)

[Figure: peroxisomal beta-oxidation pathway; labels: activation, oxidation, dehydrogenation, hydration, thiolysis, peroxisome, mitochondrion]

  • Different substrates use different enzymes
  • Peroxisomal beta-oxidation ceases at octanoyl-CoA
  • Background knowledge: what is thiolysis?


Knowledge base

Information selection

Knowledge acquisition

Curation (committee)

Maintenance


Knowledge base

Storage of knowledge as an ontology

Chihuahua is a Dog; Dog is an Animal
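The "is a" relations from the example can be stored as a simple child-to-parent mapping and queried transitively. This is a minimal sketch under the assumption that every term has at most one parent; real ontologies such as the Gene Ontology allow several parents per term and several relation types.

```python
# Child -> parent "is a" relations taken from the slide's example.
PARENT = {
    "Chihuahua": "Dog",
    "Dog": "Animal",
}


def ancestors(term: str, parent=PARENT):
    """All terms reachable from term by following 'is a' links upwards."""
    chain = []
    while term in parent:
        term = parent[term]
        chain.append(term)
    return chain


def is_kind_of(term: str, candidate: str, parent=PARENT) -> bool:
    """True if term is (directly or indirectly) a kind of candidate."""
    return candidate in ancestors(term, parent)


print(ancestors("Chihuahua"))
print(is_kind_of("Chihuahua", "Animal"))
```

Because the relation is transitive, knowledge attached to "Animal" automatically applies to "Chihuahua" without being stored twice.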


Knowledge visualization

[Figures: knowledge visualization, Bioinformatics Laboratory; labels: D131, D137]


Knowledge base

Knowledge editor

Insight and overview

Export/Import

Genomics data in KB context

Access

Integration clinic

Integration

Maintenance

Education




Co-occurrence of protein names

http://www.cytoscape.org/
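Co-occurrence mining can be sketched as counting how often two protein names appear in the same sentence. The sentences and the name list below are toy inputs chosen for illustration; real text mining also has to deal with synonyms, ambiguous names and much larger corpora.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(sentences, names):
    """Count sentence-level co-occurrences of the given protein names."""
    counts = Counter()
    for sentence in sentences:
        # Very naive tokenisation: strip commas and full stops, split on spaces.
        words = set(sentence.replace(",", " ").replace(".", " ").split())
        present = sorted(name for name in names if name in words)
        counts.update(combinations(present, 2))
    return counts


# Toy sentences and a hypothetical dictionary of protein names.
sentences = [
    "MDM2 binds TP53 and inhibits it.",
    "TP53 and BRCA1 respond to DNA damage.",
    "MDM2 regulates TP53 stability.",
]
names = ["TP53", "MDM2", "BRCA1"]

edges = cooccurrence(sentences, names)
for (a, b), n in edges.most_common():
    print(a, b, n)
```

Each counted pair can be treated as a weighted edge, so the result is a network that could be explored in a graph viewer such as Cytoscape.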




Topics computer exercises

Query public databases

Sequence format conversion

Automatic RNA to protein translation

Primer design

Sequence alignment

Gene finding

Tomorrow 13:30-15:00 L-007

http://www.bioinformaticslaboratory.nl/
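As a preview of the translation exercise, the sketch below translates an RNA sequence into a protein using the standard genetic code. It is not the tool used in the course; the function name and the example sequences are made up, and translation simply starts at the first base and stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate from the first base until a stop codon or the end of the sequence."""
    rna = rna.upper().replace("T", "U")  # also accept DNA-style input
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGCGGUGGCGCUGGAAAGUGUAA"))
```

Ambiguity codes such as N are not handled and would raise a KeyError; a real tool would also offer the other reading frames and alternative genetic codes.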





Structural bioinformatics - proteins

http://cwx.prenhall.com/horton/medialib/media_portfolio/04.html


Structural bioinformatics - proteins

Distributed computing - e.g. Folding@home

http://folding.stanford.edu/


Structural bioinformatics - RNA

http://folding.stanford.edu/

http://tigger.uic.edu/classes/phys/phys461/phys450/ANJUM04/




Systems biology

Study of complete pathways / organelles / cells / organisms

Strategies:

  • Modelling details
  • Complete network analysis

Complete systems biology: not feasible at the moment


Systems biology

CellDesigner

http://www.celldesigner.org/


Systems biology

Cytoscape

http://www.cytoscape.org/




Cancer research tool: Human Transcriptome Map

Gene expression data (SAGE / microarrays)

[Figure: median expression levels along chromosomes 1-22, X and Y. Tissue panels: all tissues, prostate cancer, prostate normal, neuroblastoma, medulloblastoma, breast cancer, breast normal, colon tumor, colon normal, glioblastoma, brain normal, ovary cancer, ovary normal. Panel labels: #11; cM: 156; cR: 446; Genes: 1208; Tags/gene]

Caron et al (2001), Science


Fundamental insight in the human genome

[Figure: chromosome 12 profiles of expression, gene density, GC content and intron length^-1]

Versteeg et al (2003), Genome Research
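One of the profiled quantities, GC content, is easy to compute along a sequence with a sliding window. The sketch below is a minimal illustration; the toy sequence, window size and step are arbitrary choices and far smaller than anything used for a chromosome-scale profile.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def windowed_gc(seq: str, window: int, step: int):
    """(start, GC fraction) for each full window along the sequence."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


# Toy sequence with a GC-rich first half and an AT-rich second half.
toy = "GCGCGGCCGCATATTAATAT"
for start, gc in windowed_gc(toy, window=10, step=5):
    print(start, round(gc, 2))
```

Plotting such window values against chromosome position, next to gene density and expression computed over the same windows, gives the kind of side-by-side profile the figure describes.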


Conserved synteny between the human and mouse genomes


Synteny and orthologs

Mouse

"Reconstructed" human chromosome 2

Provided by Ramin Monajemi, Bioinformatics Laboratory, AMC




Gene expression with clinical information



Gene expression studies: high throughput sequencing

Full-length transcripts

EST sequencing

5' transcript ends (5'-RATE, CAGE)

SAGE ditag sequencing

SAGE-like 3' end sequencing

Nebulized fragments

ncRNA sequencing

