Introduction to bioinformatics

Barbera van Schaik
Bioinformatics Laboratory, KEBB, AMC
http://www.bioinformaticslaboratory.nl/

What is bioinformatics?
  • A set of software tools for molecular sequence analysis
  • The use of computers to collect, analyze, and interpret biological information at the molecular level.
  • The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information
Bioinformatics

[Diagram: bioinformatics in relation to biomedical research, genomics, proteomics, metabolomics, biology, informatics, mathematics, statistics, database technology and data management]

Molecular biology

[Timeline figure: 1933, 1953, 1961, 1980]

What is genomics?
  • The application of high-throughput automated technologies to molecular biology, or
  • The experimental study of complete genomes.

High throughput sequencing
Platforms: Roche (454); Illumina (Solexa); Applied Biosystems (SOLiD)

454, one run:
  • 7.5 hours
  • 400,000 sequences
  • 200-300 bases per sequence
  • ≈ 100,000,000 bases per run (400,000 sequences × ~250 bases)
  • Later in 2008: 400 bases per sequence
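The yield per run is simply the number of reads times the read length. A minimal sketch, assuming an average read length of 250 bases (the midpoint of the 200-300 range on the slide):

```python
def bases_per_run(n_reads: int, mean_read_length: int) -> int:
    """Total sequenced bases in one run: number of reads times average read length."""
    return n_reads * mean_read_length

# 454 figures from the slide; 250 is an assumed midpoint of 200-300 bases
print(bases_per_run(400_000, 250))  # 100000000

# Range implied by 200-300 bases per read
low, high = bases_per_run(400_000, 200), bases_per_run(400_000, 300)
print(low, high)  # 80000000 120000000
```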
Confused by genomics?
  • Genomics
  • Transcriptomics
  • Proteomics
  • Metabolomics
  • Nutrigenomics
  • Pharmacogenomics
  • Epigenomics
  • Infectomics
  • Patientomics
  • other 'omics'

Institutes that provide support
  • National Center for Biotechnology Information (NCBI, USA): http://www.ncbi.nlm.nih.gov/
  • European Bioinformatics Institute (EBI, UK): http://www.ebi.ac.uk/
  • Weizmann Institute of Science (Israel): http://bioportal.weizmann.ac.il/
  • Swiss Institute of Bioinformatics (SIB): http://www.expasy.org/
  • University of California Santa Cruz (UCSC): http://genome.ucsc.edu/

Bioinformatics in the Netherlands

Universities:

-> * Universiteit Leiden (1575)

-> * Rijksuniversiteit Groningen (1614)

-> * Universiteit Utrecht (1636)

-> * Universiteit van Amsterdam (1632)

-> * Technische Universiteit Delft (1842)

-> * Vrije Universiteit Amsterdam (1880)

* Theologische Universiteit Apeldoorn (1894)

-> * Erasmus Universiteit Rotterdam (1913)

-> * Wageningen Universiteit (1918)

-> * Radboud Universiteit Nijmegen (1923)

* Universiteit van Tilburg (1927)

* Nyenrode Business Universiteit (1946)

* Theologische Universiteit Kampen (Oudestraat) (1854)

* Theologische Universiteit Kampen (Broederweg) (1854)

* Universiteit voor Humanistiek (1946)

-> * Technische Universiteit Eindhoven (1956)

-> * Universiteit Twente (1961)

* Katholieke Theologische Universiteit (1967)

-> * Universiteit Maastricht (1976)

* Open Universiteit Nederland (1984)

Subjects in bioinformatics
(Scope guidelines, Bioinformatics journal)
  • Databases and ontologies
  • Genome analysis
  • Data and text mining
  • Sequence analysis
  • Phylogenetics
  • Systems biology
  • Structural bioinformatics
  • Genetics and population analysis
  • Gene expression

Sequence analysis
  • Function prediction (similarity, sequence search)
  • Localisation (gene finding)
  • Grouping (genes, protein families)
  • Conservation (motifs, functional blocks)
  • SNPs and mutations (variations)

Sequence analysis
  • Pairwise alignment: inexact matching of 2 sequences
  • Multiple sequence alignment: inexact matching of >2 sequences
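Pairwise alignment is usually computed by dynamic programming. A minimal sketch of global alignment (Needleman-Wunsch) that returns only the optimal score; the scoring scheme (match +1, mismatch -1, linear gap penalty -2) is an assumption chosen for illustration, not one prescribed by the slides:

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # a[i-1] aligned to a gap in b
            left = score[i][j - 1] + gap  # b[j-1] aligned to a gap in a
            score[i][j] = max(diag, up, left)
    return score[n][m]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7: identical sequences
print(global_alignment_score("GATTACA", "GATACA"))   # 4: six matches, one gap
```

Multiple sequence alignment generalises the same idea to more than two sequences, but exact dynamic programming quickly becomes too expensive, which is why heuristic tools are used in practice.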

Phylogenetics
  • Evolution = mutation of DNA (and protein) sequences
  • Can we define evolutionary relationships between organisms by comparing DNA sequences?
      • Lots of methods and software: what is the "correct" analysis?
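One of the simplest ways to turn a comparison of two DNA sequences into an evolutionary distance is the Jukes-Cantor correction of the observed proportion p of differing sites: d = -(3/4) ln(1 - (4/3)p). This is a minimal sketch of just one of the many methods mentioned above. It assumes that the sequences are already aligned, gap-free and of equal length:

```python
import math

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned sites at which the two sequences differ."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    diffs = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return diffs / len(seq1)

def jukes_cantor(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance: estimated substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        # Saturation: the logarithm's argument is <= 0, so the distance is undefined
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)

print(p_distance("ACGTACGTAC", "ACGTACGTTC"))  # 0.1
print(round(jukes_cantor("ACGTACGTAC", "ACGTACGTTC"), 4))
```

The correction is always at least as large as p because it accounts for multiple substitutions at the same site. A matrix of such pairwise distances is the input to distance-based tree methods.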
Phylogenetics

[Figure from Ciccarelli et al. (2006), Science]

Genome analysis

Genome assembly
http://www.wiley.com/legacy/college/boyer/0470003790/cutting_edge/shotgun_seq/shotgun.htm

Hierarchical shotgun sequencing (HGP)
  • Genome
  • Physical mapping
  • Minimal tiling set
  • For each BAC in the tiling (~33 000 for human): shotgun sequencing, then fragment assembly
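Fragment assembly reconstructs a longer sequence from overlapping reads. A toy sketch of the greedy overlap strategy, assuming error-free reads and no repeats. Real assemblers, including those used for the HGP, are far more elaborate, so treat this only as an illustration of the idea. The reads are hypothetical fragments of the example sequence from the gene prediction slides:

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: several contigs remain
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

reads = ["ATGCGGTG", "CGGTGGCGC", "GGCGCGTAAG"]
print(greedy_assemble(reads))  # ['ATGCGGTGGCGCGTAAG']
```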

Gene annotation: key concepts

Gene prediction:
  • Usually the CDS is predicted, not a gene

Gene annotation:
  • Alternative splicing
  • UTR
  • Pseudogenes
  • Known vs. novel genes
  • etc.

3 classes of 'gene' prediction
  • Ab-initio: Genscan, Grail, FgenesH, Genie, GeneId, Genefinder, Glimmer, etc.
  • Homology based: GeneID, Genomescan, Twinscan, etc.
  • Identity based: Genewise, Sim4, Spidey, etc.

Ab-initio prediction

CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA
[Figure: two exons marked on the sequence]

Example: Genscan
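Real ab-initio predictors such as Genscan use probabilistic models of exons, introns and splice signals. A much simpler example of prediction from the sequence alone is to scan the forward strand for open reading frames (ATG up to an in-frame stop codon). Restricting the scan to the forward strand and requiring a minimum ORF length are simplifying assumptions of this sketch:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Forward-strand ORFs as 0-based (start, end) pairs, end exclusive.

    An ORF runs from an ATG to the first in-frame stop codon, inclusive;
    min_codons counts the stop codon too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

print(find_orfs("CCATGAAATTTTAGCC"))  # [(2, 14)]: ATG AAA TTT TAG
```

This ignores introns entirely, which is exactly why eukaryotic gene finders need richer models.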

Homology assisted prediction

CCGTGATGCGGTGGCGCGTAAGGCGCAGTGGAAAGTGTAAGA
[Figure: an EST and two exons marked on the sequence]

Example: Genie, Grail

Identity based prediction

[Figure labels: homology, known mRNA, prediction]

Example: estToGenome, sim4

Automated gene annotation

[Figure labels: homology, Genscan, human prediction]

IGI/IPI, OTTO, humans

Genome analysis

Comparative genomics
[Figure from Thomas et al. (2003), Nature]

Gene structure in TranscriptView

Provided by Jan Koster, Human Genetics, AMC

Discovery of a new variant

Valentijn et al. (2005), Genomics

Genetics and population analysis

http://www.hapmap.org/

Copy number variation

The Human Genome Structural Variation Working Group, Nature 2007

Gene expression analysis
  • Statistical analysis of differential gene expression
  • Expression-based classifiers
  • Regulatory networks / pathway analysis
  • Integration of expression data
  • Use of genes and gene sets

Gene expression analysis

High-throughput techniques:
  • EST sequencing
  • Microarrays
  • Serial Analysis of Gene Expression (SAGE)
  • Genome tiling arrays
  • High throughput sequencing

Microarray analysis
  • Normalisation: correct for systematic bias
  • Differential gene expression
  • Clustering: grouping genes/samples
  • Classification: signatures
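For the differential gene expression step, a common starting point is a per-gene two-sample t-statistic. Below is a minimal sketch using Welch's t, which does not assume equal variances. The choice of Welch's test and the toy log2 values are assumptions. When thousands of genes are tested at once, a multiple-testing correction is also needed:

```python
import math
from statistics import mean, variance

def welch_t(group1: list[float], group2: list[float]) -> float:
    """Welch's two-sample t-statistic for one gene (each group needs >= 2 values)."""
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)  # sample variances (n - 1 denominator)
    se = math.sqrt(v1 / len(group1) + v2 / len(group2))
    return (m1 - m2) / se

# Hypothetical log2 expression values for one gene in two conditions
tumour = [8.1, 8.4, 8.0, 8.3]
normal = [7.0, 7.2, 6.9, 7.1]
print(round(welch_t(tumour, normal), 2))  # 10.29
```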

Normalisation

DNA microarray data consist of:
  • systematic effects resulting from the biological process (this is what we are interested in)
  • systematic effects resulting from the array technology (these result in false positives and false negatives; remove them by normalisation)
  • random measurement noise
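One simple way to remove an array-wide systematic effect, such as a dye bias, from two-colour data is global median normalisation of the log2 ratios. This method assumes that most genes are not differentially expressed. The sketch below uses that method with toy intensities; both are assumptions for illustration. Intensity-dependent schemes such as loess normalisation are common alternatives:

```python
import math
from statistics import median

def median_normalise(red: list[float], green: list[float]) -> list[float]:
    """Return log2(red/green) ratios shifted so that their median is 0."""
    log_ratios = [math.log2(r / g) for r, g in zip(red, green)]
    centre = median(log_ratios)  # estimated systematic (array-wide) bias
    return [m - centre for m in log_ratios]

# Hypothetical spot intensities: the red channel reads ~2x too bright overall,
# and only the last gene is truly up-regulated
red = [2000.0, 400.0, 1600.0, 800.0, 12800.0]
green = [1000.0, 200.0, 800.0, 400.0, 1600.0]
print(median_normalise(red, green))  # [0.0, 0.0, 0.0, 0.0, 2.0]
```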

Contributions to measured gene expression level

ANOVA: analysis of variance

$$y_{ijkg} = \mu + A_i + G_g + (VG)_{kg} + (AG)_{ig} + (DG)_{jg} + \varepsilon_{ijkg}$$

  • $y_{ijkg}$: measured gene expression level (y) of a gene, e.g. 'Gene A'
  • $A_i + G_g$: array and gene effects
  • $(VG)_{kg}$: expression level (the effect of interest)
  • $(AG)_{ig}$: spot effect
  • $(DG)_{jg}$: dye effect
  • $\varepsilon_{ijkg}$: noise

ANOVA: carefully consider experimental design

Classification with selected genes
  • 78 sporadic breast tumors
  • 70 prognostic marker genes
  • Good prognosis vs. bad prognosis
  • Validation set: 2 out of 19 incorrect

van 't Veer et al., Nature 415: 530-536 (2002)
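A minimal sketch of correlation-based classification with a gene signature. A new tumour profile is compared with the average profile of good-prognosis training tumours, and it is called "good" if the Pearson correlation exceeds a threshold. This is a generic correlation-to-centroid scheme and is not claimed to be the exact procedure of van 't Veer et al. The threshold, the 4-gene profiles and the helper names are hypothetical:

```python
import math
from statistics import mean

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def classify(profile: list[float], good_profiles: list[list[float]],
             threshold: float = 0.4) -> str:
    """Call 'good' if the profile correlates with the mean good-prognosis profile."""
    centroid = [mean(values) for values in zip(*good_profiles)]
    return "good" if pearson(profile, centroid) > threshold else "bad"

# Hypothetical log-ratios for a 4-gene signature (the published signature has 70 genes)
good_training = [[1.0, -0.5, 0.8, -1.0], [0.8, -0.7, 1.0, -0.8]]
print(classify([0.9, -0.6, 0.7, -0.9], good_training))  # good
print(classify([-0.8, 0.6, -0.9, 1.0], good_training))  # bad
```

On an independent validation set, the error rate is simply the number of misclassified tumours divided by the set size. For example, 2 incorrect out of 19 gives about 11%.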

Gene set analysis
  • Single gene measurements are quite noisy
  • Lists of (differentially expressed) genes do not tell us much about the underlying biological process
  • Analysis in terms of sets of genes (= modules):
    • pathways
    • gene ontology (molecular function, biological process, cellular component)
    • chromosomal region
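A common way to analyse a list of differentially expressed genes in terms of a gene set is an over-representation test. Suppose N genes were measured, K of them belong to the set (e.g. a pathway or a GO term), and n genes are differentially expressed, of which k fall in the set. The one-sided hypergeometric p-value is then P(X ≥ k). This is a minimal sketch with made-up counts. Over-representation analysis is only one of several gene set approaches, and it is an assumption that it is the one intended here:

```python
from math import comb

def overrepresentation_p(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K in the set, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1)) / total

# Hypothetical: 10,000 genes measured, 100 in a pathway,
# 200 differentially expressed genes, 10 of them in the pathway (2 expected by chance)
p = overrepresentation_p(N=10_000, K=100, n=200, k=10)
print(f"{p:.2e}")
```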
Integration of information: understanding disease

Integration of expression data with pathways
http://www.genome.jp/kegg/

Integration of information: knowledge base

Sources: literature, primary ‘databases’, biological databases, domain experts, background information (e.g. Wikipedia)

  • Huge amount of information
  • Heterogeneous: different formats, different ‘standards’
  • Redundant / erroneous
  • Not always curated
  • Difficult to obtain an integrated view of the domain

Provided by Marcel Willemsen, Bioinformatics Laboratory, AMC

Pseudo-neonatal adrenoleukodystrophy (ACOX1; EC 1.3.3.6)

[Figure: peroxisomal beta-oxidation; labels: peroxisome, mitochondrion, activation, oxidation, hydration, dehydrogenation, thiolysis]
  • Different substrates use different enzymes
  • Peroxisomal beta-oxidation ceases at octanoyl-CoA
  • Background knowledge: what is thiolysis?

Knowledge base
  • Information selection
  • Knowledge acquisition
  • Curation (committee)
  • Maintenance

Knowledge base

Storage of knowledge as an ontology, e.g.:
  • Chihuahua is a Dog
  • Dog is an Animal
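Storing knowledge as an ontology lets a program infer facts that were never stated explicitly. Because "is a" is transitive, a Chihuahua is also an Animal. A minimal sketch, assuming a hypothetical dictionary that maps each concept to its direct parents:

```python
def ancestors(concept: str, is_a: dict[str, list[str]]) -> set[str]:
    """All concepts reachable from `concept` via one or more 'is a' links."""
    found: set[str] = set()
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found

# The example from the slide, stored as direct "is a" relations
is_a = {"Chihuahua": ["Dog"], "Dog": ["Animal"]}
print(sorted(ancestors("Chihuahua", is_a)))  # ['Animal', 'Dog']
```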

Knowledge Visualization (Bioinformatics Laboratory)

Knowledge base
  • Knowledge editor
  • Insight and overview
  • Export/Import
  • Genomics data in KB context
  • Access
  • Integration clinic
  • Integration
  • Maintenance
  • Education

Co-occurrence of protein names

http://www.cytoscape.org/
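A co-occurrence network can be built by counting how often two protein names appear in the same sentence or abstract. The counts become edge weights that a tool such as Cytoscape can display. A minimal sketch, assuming the protein names have already been recognised in each text unit (the names below are placeholders):

```python
from collections import Counter
from itertools import combinations

def cooccurrence(units: list[set[str]]) -> Counter:
    """Count, for each unordered pair of names, the text units mentioning both."""
    counts: Counter = Counter()
    for names in units:
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical protein names recognised in three abstracts
abstracts = [{"P1", "P2"}, {"P1", "P2", "P3"}, {"P2", "P3"}]
for (a, b), n in sorted(cooccurrence(abstracts).items()):
    print(a, b, n)  # edge list: name, name, weight
```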

Topics computer exercises
  • Query public databases
  • Sequence format conversion
  • Automatic RNA-to-protein translation
  • Primer design
  • Sequence alignment
  • Gene finding

Tomorrow 13:30-15:00 L-007
http://www.bioinformaticslaboratory.nl/
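To illustrate the RNA-to-protein translation topic, here is a minimal sketch that translates a DNA or RNA coding sequence with the standard genetic code. The compact 64-character table encoding is a common trick and is not necessarily the method used in the exercises:

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG; '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq: str, to_stop: bool = False) -> str:
    """Translate a DNA/RNA sequence in frame 0; a trailing partial codon is ignored."""
    dna = seq.upper().replace("U", "T")  # accept RNA input
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # 'X' for codons with ambiguous bases
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)

print(translate("AUGUUUGGCUAA"))                # MFG*
print(translate("ATGTTTGGCTAA", to_stop=True))  # MFG
```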

Structural bioinformatics - proteins

http://cwx.prenhall.com/horton/medialib/media_portfolio/04.html

Structural bioinformatics - proteins

Distributed computing - e.g. Folding@home
http://folding.stanford.edu/

Structural bioinformatics - RNA

http://folding.stanford.edu/
http://tigger.uic.edu/classes/phys/phys461/phys450/ANJUM04/

Systems biology

Study of complete pathways / organelles / cells / organisms

Strategies:
  • Modelling details
  • Complete network analysis

Complete systems biology: not feasible at the moment

Systems biology

Celldesigner

http://www.celldesigner.org/

Systems biology

Cytoscape

http://www.cytoscape.org/

Cancer research tool: Human Transcriptome Map

Gene expression data (SAGE / microarrays)
[Figure: expression data mapped onto chromosomes 1-22, X and Y]

[Figure: transcriptome map panels for prostate cancer, prostate normal, neuroblastoma, breast cancer, colon normal, glioblastoma, brain normal, ovary normal, ovary cancer, colon tumor and all tissues]

[Figure: median expression levels (tags/gene) along chromosome #11 (156 cM, 446 cR, 1208 genes) for medulloblastoma, prostate normal, prostate cancer, neuroblastoma, breast normal, breast cancer, colon normal, ovary normal, ovary cancer, glioblastoma, brain normal, colon tumor and all tissues]

Caron et al. (2001), Science
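Median expression levels along a chromosome can be obtained by smoothing per-gene values with a running median over genes in chromosomal order. Regions of high or low expression then stand out. A minimal sketch of such smoothing; the window size and the tag counts are hypothetical and are not the parameters used by Caron et al.:

```python
from statistics import median

def running_median(values: list[float], window: int) -> list[float]:
    """Median over each gene's neighbourhood of `window` genes (odd); shrinks at the ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    return [median(values[max(0, i - half): i + half + 1]) for i in range(len(values))]

# Hypothetical SAGE tags/gene for 7 genes in chromosomal order
tags_per_gene = [2, 3, 40, 35, 50, 1, 2]
print(running_median(tags_per_gene, window=3))  # [2.5, 3, 35, 40, 35, 2, 1.5]
```

A median rather than a mean keeps a single very highly expressed gene from dominating its neighbourhood.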

Fundamental insight into the human genome

[Figure: expression, gene density, GC content and intron length^-1 along chromosome 12]

Versteeg et al. (2003), Genome Research

Synteny and orthologs

[Figure labels: Mouse; "Reconstructed" human chromosome 2]

Provided by Ramin Monajemi, Bioinformatics Laboratory, AMC

Gene expression studies: high throughput sequencing
  • Full-length transcripts
  • EST sequencing
  • 5' transcript ends (5'-RATE, CAGE)
  • SAGE ditag sequencing
  • SAGE-like 3' end sequencing
  • Nebulized fragments
  • ncRNA sequencing
