Genomics, Bioinformatics and the Revolution in Biology

Genomics, Bioinformatics and the Revolution in Biology. Jonathan Pevsner, Ph.D. Kennedy Krieger Institute/ Johns Hopkins School of Medicine. Outline. Three views of bioinformatics and genomics Informatics From small to large From genotype to phenotype The chromosomes


Presentation Transcript


Genomics, Bioinformatics and the Revolution in Biology

Jonathan Pevsner, Ph.D.

Kennedy Krieger Institute / Johns Hopkins School of Medicine



Outline

Three views of bioinformatics and genomics

Informatics

From small to large

From genotype to phenotype

The chromosomes

SNPs, HapMap, and the 1000 Genomes project


Definitions of bioinformatics and genomics

  • Bioinformatics is the interface of biology and computers. It is the analysis of proteins, genes and genomes using computer algorithms and databases.

  • Genomics is the analysis of genomes, including the nature of genetic elements on chromosomes.

  • The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.

  • Genetics is the study of the origin and expression of individual uniqueness.



Three views of bioinformatics and genomics

1. The field of informatics

2. From small to large

3. From genotype to phenotype


[Diagram labels: bioinformatics; medical informatics; public health informatics; algorithms; databases; infrastructure; genomics; tool-users; tool-makers]


DNA → RNA → protein → phenotype


Rapid growth of DNA sequences

[Figure: total number of DNA base pairs in GenBank/WGS by year, 1982 to 2008; vertical axes: base pairs (billions, 0 to 200) and sequences (millions)]


[Diagram labels: time of development; body region, physiology, pharmacology, pathology]



The Origin of Species (1859)

It is interesting to contemplate a tangled bank, clothed with many plants of many kinds, with birds singing on the bushes, with various insects flitting about, and with worms crawling through the damp earth, and to reflect that these elaborately constructed forms, so different from each other, and dependent upon each other in so complex a manner, have all been produced by laws acting around us.

Source: Origin of Species, Chapter 15


Eukaryotes (Baldauf et al. 2000)

[Figure: eukaryotic phylogeny; labeled groups: fungi, animals, slime mold, plants, Paramecium, Plasmodium, Trypanosoma, Giardia, Trichomonas]

Wolfe et al. (1999)

[Figure: 8 chromosomes (5,000 genes) → 16 chromosomes (10,000 genes) → 16 chromosomes (6,000 genes)]



Paramecium tetraurelia: a ciliate with two nuclei, 40,000 genes, and three whole-genome duplications


Phylogenetic footprinting

Phylogenetic shadowing

Population shadowing



DNA → RNA → protein → pathway → cell → organism → population


We see 500 inpatients and 13,000 outpatients per year at the Kennedy Krieger Institute. Why do children engage in self-injurious behavior? In many cases, there are chromosomal insults.

Phenotype


From genotype… to phenotype


DNA → RNA → protein → cellular phenotype → clinical phenotype


DNA → RNA → protein

Central dogma of molecular biology: DNA is transcribed into RNA, which is translated into protein.

Central dogma of bioinformatics/genomics: the genome is transcribed into the transcriptome, which is translated into the proteome.
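The two steps of the central dogma can be sketched in a few lines of Python. This is only an illustration: the DNA string is a hypothetical sequence constructed for this example (it is not a real gene), and the codon table lists only the codons that the example needs rather than all 64.

```python
# Minimal sketch of the central dogma: DNA -> RNA -> protein.
# Assumption: CODON_TABLE is deliberately partial; a real table has 64 entries.
CODON_TABLE = {
    "AUG": "M", "UGU": "C", "UAU": "Y", "AUU": "I",
    "CAA": "Q", "AAU": "N", "CCU": "P", "CUU": "L", "GGU": "G",
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical coding sequence: a start codon, nine codons, and a stop codon.
dna = "ATGTGTTATATTCAAAATTGTCCTCTTGGTTAA"
mrna = transcribe(dna)
print(mrna)             # AUGUGUUAUAUUCAAAAUUGUCCUCUUGGUUAA
print(translate(mrna))  # MCYIQNCPLG
```

At genome scale the same two mappings, applied to every gene, give the transcriptome and the proteome.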


[Figure: growth of DNA sequence data by year, 1982 to 2008; vertical axis 0 to 200]

Over 200 billion base pairs of DNA have now been sequenced, from >165,000 organisms.


Scope of bioinformatics

Sequence analysis
  • Pairwise alignment
  • Multiple sequence alignment
  • Phylogeny
  • Database searching (e.g. BLAST)

Functional genomics
  • RNA studies; gene expression profiling
  • Proteomics; protein structure
  • Gene function


Pairwise alignments in the 1950s

β-corticotropin (sheep)   ala gly glu asp asp glu
Corticotropin A (pig)     asp gly ala glu asp glu

Oxytocin      CYIQNCPLG
Vasopressin   CYFQNCPRG
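The oxytocin and vasopressin sequences above are the same length, so they can be compared position by position without introducing gaps. The sketch below counts identities and lists the differing positions; the function names are chosen for this illustration, and it is not a general alignment algorithm (which would also need to handle gaps and use a scoring matrix).

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of positions with identical residues in two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch only compares sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def mismatches(seq_a: str, seq_b: str) -> list:
    """Return (1-based position, residue in seq_a, residue in seq_b) for each difference."""
    return [(i + 1, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]


oxytocin = "CYIQNCPLG"
vasopressin = "CYFQNCPRG"

print(round(percent_identity(oxytocin, vasopressin), 1))  # 77.8
print(mismatches(oxytocin, vasopressin))  # [(3, 'I', 'F'), (8, 'L', 'R')]
```

The two peptides are identical at seven of nine positions and differ at positions 3 and 8.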


Early example of sequence alignment: globins (1961)

[Figure labels: globins: α-, β-, myoglobin]

H.C. Watson and J.C. Kendrew, “Comparison Between the Amino-Acid Sequences of Sperm Whale Myoglobin and of Human Hæmoglobin.” Nature 190:670-672, 1961.



LAGAN

2e Fig. 5.21


Multiple sequence alignment of five globins:

  • ClustalW

  • Praline

  • MUSCLE

  • ProbCons

  • T-Coffee




Four bases: A, G, C, T arranged in base pairs along a double helix (1953).

Human genome project: sequencing all ~3 billion base pairs (2003).


1995: first genome sequence (a bacterium)

2000: fruit fly genome, plant

2003: human genome

2008:

  • two individual human genomes finished

  • 1,000 human genomes (launched)

  • SNPs used to study chromosomes



Genotype

Phenotype



Outline

Three views of bioinformatics and genomics

Informatics

From small to large

From genotype to phenotype

The chromosomes

SNPs, HapMap, and the 1000 Genomes project


Eukaryotic genomes are organized into chromosomes

Genomic DNA is organized in chromosomes. The diploid number of chromosomes is constant in each species (e.g. 46 in human). Chromosomes are distinguished by a centromere and telomeres.

The chromosomes are routinely visualized by karyotyping (imaging the chromosomes during metaphase, when each chromosome is a pair of sister chromatids).



Fig. 16.19

Page 565


[Figure: human chromosome 21 at NCBI; labels: nucleolar organizing center, centromere]

[Figure: human chromosome 21 at www.ensembl.org; labels: nucleolar organizing center, centromere]

[Figure: human chromosome 21 at UCSC Genome Browser (two slides); label: centromere]


First P.G. mitosis in polar view. Tradescantia virginiana, Commelinaceae, n = 9 (from aberrant plant with 22 chromosomes). 2 BE - CV smears. x 1200. Printed on multigrade paper.

Darlington.



First P.G. mitosis in Paris quadrifolia, Liliaceae, showing all stages from prophase to telophase. n = 10 (cf. Darlington 1937, 1941)

2 BE – CV smear, 8mm. objective. x 800

Darlington.



Root tip squashes showing anaphase separation. Fritillaria pudica, 3x = 39, spiral structure of chromatids revealed by pressure after cold treatment.

2 BD – Feulgen; x 3000

Darlington.


Cleavage mitosis in the morula of the teleostean fish, Coregonus clupeoides, in the middle of anaphase. Spindle structure revealed by slow fixation. Section cut at 10 µ. x 4000. Strong Flemming, haematoxylin. Prep. and photo by P.C. Koller.

Darlington.


The eukaryotic chromosome: Robertsonian fusion creates one metacentric by fusion of two acrocentrics

ordinary male house mouse (Mus musculus, 2n = 40)

male tobacco mouse (Mus poschiavinus, 2n = 26)

Ohno (1970) Plate II


The spectrum of variation

Category of variation                     Size          Type
Single base pair changes                  1 bp          SNPs, point mutations
Small insertions/deletions                1–50 bp
Short tandem repeats                      1–500 bp      microsatellites
Fine-scale structural variation           50 bp–5 kb    del, dup, inv, tandem repeats
Retroelement insertions                   0.3–10 kb     SINEs, LINEs, LTRs, ERVs
Intermediate-scale structural variation   5 kb–50 kb    del, dup, inv, tandem repeats
Large-scale structural variation          50 kb–5 Mb    del, dup, inv, large tandem repeats
Chromosomal variation                     >>5 Mb        aneuploidy

Adapted from Sharp AJ et al. (2006) Annu Rev Genomics Hum Genet 7:407-42
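The size ranges in the table overlap, so the length of a variant alone does not determine its category. As a rough sketch, the lookup below returns every category whose size range includes a given length. The list and function names are made up for this illustration, and reading ">>5 Mb" as "5 Mb or more" is an assumption.

```python
# (category, minimum size in bp, maximum size in bp); None means no upper bound.
# Ranges are transcribed from the table; ">>5 Mb" is treated as ">= 5 Mb" (assumption).
SIZE_CLASSES = [
    ("Single base pair changes", 1, 1),
    ("Small insertions/deletions", 1, 50),
    ("Short tandem repeats", 1, 500),
    ("Fine-scale structural variation", 50, 5_000),
    ("Retroelement insertions", 300, 10_000),
    ("Intermediate-scale structural variation", 5_000, 50_000),
    ("Large-scale structural variation", 50_000, 5_000_000),
    ("Chromosomal variation", 5_000_000, None),
]


def size_compatible_categories(length_bp: int) -> list:
    """Return all categories whose size range contains length_bp."""
    return [name for name, low, high in SIZE_CLASSES
            if low <= length_bp and (high is None or length_bp <= high)]


print(size_compatible_categories(2_000))
# ['Fine-scale structural variation', 'Retroelement insertions']
print(size_compatible_categories(10_000_000))
# ['Chromosomal variation']
```

Deciding between the candidate categories requires the sequence content and the type of event (deletion, duplication, inversion, repeat), not just the size.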


Across the genome, there are four possible SNP calls:

[1] homozygous (AA)

[2] homozygous (BB)

[3] heterozygous (AB)

[4] no call

In a deleted region, there are three possible SNP calls:

[1] A (interpreted as AA)

[2] B (interpreted as BB)

[3] no call
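Because a deleted region cannot produce a heterozygous (AB) call, a long stretch of consecutive SNPs with no AB calls is one hint of a deletion. The sketch below finds the longest such stretch in an ordered list of calls. The function name, the "NC" code for a no call, and the example calls are all hypothetical. A run of homozygous calls can also have other causes, so this only flags a candidate region.

```python
def longest_run_without_heterozygotes(calls):
    """Return (start, end) list indices, end exclusive, of the longest run
    of consecutive SNP calls that contains no heterozygous "AB" call."""
    best = (0, 0)
    start = 0
    # A sentinel "AB" at the end closes the final run.
    for i, call in enumerate(list(calls) + ["AB"]):
        if call == "AB":
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = i + 1
    return best


# Hypothetical SNP calls ordered by chromosomal position ("NC" = no call).
calls = ["AA", "AB", "BB", "AB", "AA", "BB", "NC", "AA", "BB", "BB", "AB", "AA"]
start, end = longest_run_without_heterozygotes(calls)
print(start, end)        # 4 10
print(calls[start:end])  # ['AA', 'BB', 'NC', 'AA', 'BB', 'BB']
```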


Single nucleotide polymorphisms (SNPs) to investigate chromosomes: A case of 7p deletion

[Figure: SNP calls in a case of 7p deletion; labels AA, AB, BB on the first slide, with A and B added on the second]


A case of 7p deletion

[Figure labels: A, B]

  • Deletions (and duplications) such as these are called copy number variants (CNVs).

  • CNVs commonly occur in normal individuals.

  • When found in individuals with disease, we can tell if they are inherited (likely to be benign) or occur de novo (more likely to be disease-associated) by comparison to the parents’ genotypes.

  • Recent papers report many CNVs in disease.


A case of trisomy 21 (Down syndrome)

[Figure labels: AAA, AAB, ABB, BBB]
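The genotype labels on these slides follow a simple pattern: with two alleles (A and B) and n copies of a region, there are n + 1 possible unordered genotypes. A short sketch, with a function name chosen for this illustration:

```python
from itertools import combinations_with_replacement


def genotype_classes(copy_number: int, alleles: str = "AB") -> list:
    """List the unordered genotypes possible at a given copy number."""
    return ["".join(combo)
            for combo in combinations_with_replacement(alleles, copy_number)]


print(genotype_classes(1))  # ['A', 'B']                   (deleted region)
print(genotype_classes(2))  # ['AA', 'AB', 'BB']           (normal diploid)
print(genotype_classes(3))  # ['AAA', 'AAB', 'ABB', 'BBB'] (trisomy)
```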



Three cases of 10q deletion



Deafness gene?



The International HapMap Project

► A catalog of common genetic variants that occur in humans

► The project’s goal is to compare the genetic sequences of different individuals to identify chromosomal regions where genetic variants are shared

► An initial focus has been on four groups (n=270):

CEU      European ancestry (30 trios): Utah residents
YRI      African ancestry (30 trios): Yoruba in Ibadan, Nigeria
JPT/CHB  Asian ancestry (90 individuals): Japanese in Tokyo, Japan; Han Chinese in Beijing, China

► Phase I (2005): > 1 million SNPs

► Phase II (2007): added 2.1 million SNPs
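The total of n=270 follows from the group sizes listed above, counting three individuals per trio. A quick check, with variable names chosen for this illustration:

```python
# Group sizes as listed on the slide; each trio is father, mother and child.
trios = {"CEU": 30, "YRI": 30}
individuals = {"JPT/CHB": 90}

total = sum(3 * n for n in trios.values()) + sum(individuals.values())
print(total)  # 270
```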


The International HapMap Project

► In addition to CEU, YRI, and JPT/CHB, additional populations have been genotyped, including:

Maasai in Kinyawa, Kenya

Luhya in Webuye, Kenya

Gujarati Indians in Houston, TX

Toscani in Italy

Mexican ancestry in Los Angeles

African ancestry in southwestern US


The ENCODE project

► The ENCyclopedia Of DNA Elements (ENCODE) project was launched in 2003.

► Pilot phase: devise and test high-throughput approaches to identify functional elements. Efforts center on 44 DNA targets. These cover about 1 percent of the human genome, or about 30 million base pairs.

► Second phase: technology development.

► Third phase: production. Expand the ENCODE project to analyze the remaining 99 percent of the human genome.



The ENCODE project

Goal of ENCODE: build a list of all sequence-based functional elements in human DNA. This includes:

► protein-coding genes

► non-protein-coding genes

► regulatory elements involved in the control of gene transcription

► DNA sequences that mediate chromosomal structure and dynamics.


ENCODE data at the UCSC Genome Browser: beta globin

(50,000 base pairs including HBB, HBD, HBG1, HBG2, HBE1)


ENCODE tracks available at the UCSC Genome Browser


EGASP: the human ENCODE Genome Annotation Assessment Project

EGASP goals:

[1] Assess the accuracy of computational methods to predict protein-coding genes. Eighteen groups competed to make blind gene predictions; these were evaluated relative to reference annotations generated by the GENCODE project.

[2] Assess the completeness of the current human genome annotations as represented in the ENCODE regions.
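One common way to score a gene prediction against a reference annotation is at the nucleotide level: sensitivity is the fraction of reference coding bases that were predicted, and specificity (in the gene-finding sense, the fraction of predicted bases that are correct) measures overprediction. The slides do not say which measures EGASP used, so the sketch below is a generic illustration with made-up exon coordinates and function names.

```python
def covered_positions(intervals):
    """Return the set of positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in intervals:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(predicted, reference):
    """Return (sensitivity, specificity) at the nucleotide level.

    sensitivity = correctly predicted bases / reference bases
    specificity = correctly predicted bases / predicted bases
    """
    pred = covered_positions(predicted)
    ref = covered_positions(reference)
    true_positives = len(pred & ref)
    sensitivity = true_positives / len(ref) if ref else 0.0
    specificity = true_positives / len(pred) if pred else 0.0
    return sensitivity, specificity


# Hypothetical exon coordinates for one gene.
reference_exons = [(100, 200), (300, 400)]
predicted_exons = [(120, 200), (300, 450)]

sn, sp = nucleotide_accuracy(predicted_exons, reference_exons)
print(round(sn, 3), round(sp, 3))  # 0.9 0.783
```

Building position sets is fine for a 50 kb region such as the globin locus; whole-genome comparisons would use interval arithmetic instead.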


UCSC: tracks for Gencode and for various gene prediction algorithms (focus on 50 kb encompassing five globin genes)

[Figure: track labels include Gencode and JIGSAW]



On bioinformatics

“Science is about building causal relations between natural phenomena (for instance, between a mutation in a gene and a disease). The development of instruments to increase our capacity to observe natural phenomena has, therefore, played a crucial role in the development of science - the microscope being the paradigmatic example in biology. With the human genome, the natural world takes an unprecedented turn: it is better described as a sequence of symbols. Besides high-throughput machines such as sequencers and DNA chip readers, the computer and the associated software becomes the instrument to observe it, and the discipline of bioinformatics flourishes.”



“However, as the separation between us (the observers) and the phenomena observed increases (from organism to cell to genome, for instance), instruments may capture phenomena only indirectly, through the footprints they leave. Instruments therefore need to be calibrated: the distance between the reality and the observation (through the instrument) needs to be accounted for. This issue of Genome Biology is about calibrating instruments to observe gene sequences; more specifically, computer programs to identify human genes in the sequence of the human genome.”

Martin Reese and Roderic Guigó, Genome Biology 2006 7(Suppl I):S1, introducing EGASP, the Encyclopedia of DNA Elements (ENCODE) Genome Annotation Assessment Project



The 1000 Genomes Project

Goal: To create a deep catalog of human genetic variation in multiple populations.

[1] Discover variants (SNPs, copy number variants, insertions/deletions). Include ~all variants with allele frequencies >1% across the genome (and >0.1-0.5% in gene regions)

[2] Estimate the frequencies of variant alleles


The 1000 Genomes Project

Secondary goals:

  • Characterize SNPs

  • Improve the human reference sequence

  • Study regions under selection

  • Study variation across populations

  • Study mutation and recombination



The 1000 Genomes Project

Current approaches include sequencing two HapMap trios (one from YRI, one CEU; father/mother/child) at 20X depth using next generation sequencing technology.

For one individual, 20X depth = 60 gigabases

For one trio, 20X depth = 180 gigabases

In another approach, sequence many individuals (n=1000) from the extended HapMap collection at lighter coverage.
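The gigabase figures above are the sequencing depth multiplied by the genome size (about 3 billion base pairs) and the number of individuals. A small sketch of that arithmetic follows; the function name is chosen for this illustration, and the 2X depth in the last line is a hypothetical value, since the slide says only "lighter coverage".

```python
GENOME_SIZE_GB = 3.0  # human genome, ~3 billion base pairs


def bases_needed_gb(depth: float, n_individuals: int = 1,
                    genome_size_gb: float = GENOME_SIZE_GB) -> float:
    """Total sequence to generate, in gigabases, for a given depth of coverage."""
    return depth * genome_size_gb * n_individuals


print(bases_needed_gb(20))                      # 60.0   (one individual at 20X)
print(bases_needed_gb(20, n_individuals=3))     # 180.0  (one trio at 20X)
print(bases_needed_gb(2, n_individuals=1000))   # 6000.0 (hypothetical 2X depth)
```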



Conclusions

We briefly surveyed the fields of bioinformatics and genomics. Bioinformatics serves biology, and genomics depends on the tools of bioinformatics.

There are rapid advances in available technologies, such as next generation sequencing, that allow us to address fundamental biological questions at unprecedented resolution. These questions include the nature of variation within and between genomes of individuals, groups (gender, ethnicity, disease status), and across species. Other questions, posed decades ago, concern biological processes such as development, metabolism, adaptation, and function.

