
The Human Genome, impact in the biomedical domain

Sonia ABDELHAK, PhD

Molecular Investigation of Genetic Orphan Disorders

Institut Pasteur de Tunis


Human Genome Project

  • Historical context.

  • Goals of the HGP.

  • Strategy.

  • Results.

  • Impact on Biomedical domain.

  • Discussion.


February 2001: draft sequence

April 1953 to April 2003: from the DNA double helix to the « finished » sequence


Brief history of HGP

1984 to 1986 – first proposed at US DOE meetings

1988 – endorsed by US National Research Council (funded by NIH and US DOE; $3 billion set aside)

1990 – Human Genome Project started (NHGRI)

Later – UK, France, Japan, Germany, China join

1998 – Celera announces a 3-year plan to complete the project years early

2001 – First draft published in Science and Nature in February

2003 – Human Genome sequence declared finished in April; the finished euchromatic sequence was published in Nature in 2004


Challenges

  • Genome Attributes

    • Size

    • Polymorphism

    • Repeats (smaller repeats are technically difficult to sequence; some sequences are repeated all over the genome, so how can they be placed?)

  • Available Technology

    • About 600 bp per “read” (sequencing works by extension from a primer followed by gel electrophoresis; read length is limited by the resolution of the gel)

    • Error (~1 error per 600 bases; sequencing a region multiple times decreases error because the same error is unlikely in multiple reads; 10x coverage gives an error rate of ~1/10,000)

    • Relies on cloning (some regions, such as heterochromatin, are difficult to clone; some sequences rearrange or are deleted when cloned)
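To give a feel for the scale implied by 600 bp reads and 10x coverage, here is a minimal arithmetic sketch. The function names are illustrative, and the uncovered-fraction estimate assumes reads land uniformly at random (the Lander-Waterman idealisation), which real cloning-based libraries only approximate.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Number of reads required so that total read length = coverage x genome size."""
    return math.ceil(coverage * genome_bp / read_bp)


def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read, assuming uniformly random reads."""
    return math.exp(-coverage)


if __name__ == "__main__":
    # 3 billion bp genome, 600 bp reads, 10x coverage (figures from the slides)
    print(reads_needed(3_000_000_000, 600, 10))
    print(fraction_uncovered(10))
```

Under these assumptions a 3 billion bp genome needs about 50 million reads for 10x coverage, which is why automation of the sequencing step was a goal in itself.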


Goals of HGP

  • Create a genetic and physical map of the 24 human chromosomes (22 autosomes, X & Y)

  • Identify the entire set of genes & map them all to their chromosomes

  • Determine the nucleotide sequence of the estimated 3 billion base pairs

  • Analyze genetic variation among humans

  • Map and sequence the genomes of model organisms


Model organisms

  • Bacteria (E. coli, Haemophilus influenzae, several others)

  • Yeast (Saccharomyces cerevisiae)

  • Plant (Arabidopsis thaliana)

  • Roundworm (Caenorhabditis elegans)

  • Fruit fly (Drosophila melanogaster)

  • Mouse (Mus musculus)


Goals of HGP (II)

  • Develop new laboratory and computing technologies to make all this possible

  • Disseminate genome information

  • Consider ethical, legal, and social issues associated with this research



Identification of microsatellite-type polymorphisms by sequence analysis

IL-12p35AC (primers IL-12p35AC F and IL-12p35AC R), 78.57%

tggtggcagaaatcattgtctgaaaagtaattgttttacttttattcttttcgtgtgtgtgtgtgt
gtgtgtgtgtgtgtgtgtgtgtgtgtgtgtgcatgtgccagatttcttgtttgaaaggcaat
gagcttcatccaagtatcaa

IL-12p40AC (primers IL-12p40AC F and IL-12p40AC R), 69.23%

atttcaggtgtgagccactgtgcctggccagaactttttcaatgaatattcaagataattgtatacacattttatatatatatatatatatacacacacacacacacacacatatgtatacacacattatatatataatccatgttatatacatctctacattatatatatccactatatatattttacttatacatatagattttatttttatgaactaggatcaaattgta

Gel: lanes 1 2 3 4 5; fragment sizes 174, 170, 166
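Runs such as the (GT)n stretch in the IL-12p35 sequence can be located automatically. The following is a minimal sketch using a regular expression; the function name and the thresholds (unit length 2 to 4, at least 6 repeats) are assumptions chosen for illustration, not the parameters used by the author.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=4, min_repeats=6):
    """Return (start, unit, repeat_count) for each perfect tandem repeat found.

    Only perfect repeats on the given strand are reported; homopolymer runs
    (e.g. AAAA...) are skipped.
    """
    seq = seq.upper()
    # Group 2 is the repeat unit; \2{n,} requires n further copies of it.
    pattern = re.compile(
        r"(([ACGT]{%d,%d}?)\2{%d,})" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(2)
        if len(set(unit)) == 1:
            continue  # homopolymer, not a microsatellite in the sense used here
        hits.append((match.start(), unit, len(match.group(1)) // len(unit)))
    return hits


if __name__ == "__main__":
    print(find_microsatellites("TTC" + "GT" * 8 + "GCAT"))
```

Because the number of repeat units varies between individuals, the PCR product amplified between the flanking primers differs in length, which is what the gel lanes resolve.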


EST Division: Expressed Sequence Tags

dbEST http://www.ncbi.nlm.nih.gov/dbEST/

  • Nucleus: 80-100,000 genes

  • 80-100,000 RNA gene products

  • Make cDNA library: 80-100,000 unique cDNA clones in library

  • Isolate unique clones

  • Sequence once from each end: each clone (clone xyz) yields two ESTs (sequence1 TAGTCA, sequence2 CGTACT)

>IMAGE:275615 5' mRNA sequence
GACAGCATTCGGGCCGAGATGTCTCGCTCCGTGGCCTTAGCTGTGCTCGCGCTACTCTCTCTTTCTGGCC
TGGAGGTATCCAGCGTACTCCAAAGATTCAGGTTTACTCACGTCATCCAGCAGAGAATGGAAAGTCAAAT
TTCCTGAATTGCTATGTGTCTGGGTTTCATCCATCCGACATTGAAGTTGACTTACTGAAGAATGGAGAGA
GAATTGAAAAAGTGGAGCATTCAGACTTGTCTTTCAGCAAGGACTGGTCTTTCTATCTCTTGTACTACAC
TGAATTCACCCCCACTGAAAAAGATGAGTATGCCTGCCGTGTTGAACCATGTNGACTTTGTCACAGNCCC
AAGTTNAGTTTAAGTGGGNATCGAGACATGTAAGGCAGGCATCATGGGAGGTTTTGAAGNATGCCGCNTT
TTGGATTGGGATGAATTCCAAATTTCTGGTTTGCTTGNTTTTTTAATATTGGATATGCTTTTG

>IMAGE:275615 3', mRNA sequence
NNTCAAGTTTTATGATTTATTTAACTTGTGGAACAAAAATAAACCAGATTAACCACAACCATGCCTTACT
TTATCAAATGTATAAGANGTAAATATGAATCTTATATGACAAAATGTTTCATTCATTATAACAAATTTCC
AATAATCCTGTCAATNATATTTCTAAATTTTCCCCCAAATTCTAAGCAGAGTATGTAAATTGGAAGTTAA
CTTATGCACGCTTAACTATCTTAACAAGCTTTGAGTGCAAGAGATTGANGAGTTCAAATCTGACCAAGAT
GTTGATGTTGGATAAGAGAATTCTCTGCTCCCCACCTCTANGTTGCCAGCCCTC
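EST records like the two IMAGE:275615 reads are single-pass sequences, so they contain ambiguous base calls (N). A minimal sketch of reading FASTA-formatted text and measuring the fraction of ambiguous bases follows; the helper names are illustrative and the parser handles only the plain ">header / sequence lines" layout.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence} (insertion order kept)."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def ambiguous_fraction(seq):
    """Fraction of positions called as N (unknown base)."""
    return seq.count("N") / len(seq) if seq else 0.0


if __name__ == "__main__":
    demo = ">read1\nACGTN\nNACG\n>read2\nGGCC\n"  # made-up toy records
    for name, seq in parse_fasta(demo).items():
        print(name, len(seq), round(ambiguous_fraction(seq), 3))
```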


Sequencing chemistry: Dye Terminator (6)

  • Sequencing reaction: DNA template (T C G A T A ...), primer, Taq polymerase; extended strand A G C T A T ...

  • Electrophoresis (slab gel / capillary): loading, then detection

  • Automated analysis: base calls A G C T A T


Two Competing Strategies for Human Genome

  • Hierarchical shotgun [Public Human Genome Project]

  • Whole-genome Shotgun [Celera project]


Sequencing

BAC: Bacterial Artificial Chromosome clone

Contig: joined overlapping collection of sequences or clones.


Whole-genome shotgun sequencing: used by the private company Celera to sequence the whole human genome

  • Whole genome randomly sheared three times

    • Plasmid library constructed with ~ 2kb inserts

    • Plasmid library with ~10 kb inserts

    • BAC library with ~ 200 kb inserts

  • Computer program assembles sequences into chromosomes

  • No physical map construction

  • Only one BAC library

  • Reduces problems of repeat sequences
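The assembly step ("computer program assembles sequences") can be illustrated with a toy greedy overlap merger. This is only a sketch of the general idea on error-free reads from one strand; it is not Celera's assembler, and the function names and the minimum-overlap threshold are assumptions.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

A greedy merger like this is easily misled by repeated sequence, because two reads from different copies of a repeat overlap perfectly. Reads from both ends of inserts of known size (2 kb, 10 kb, 200 kb) provide the distance constraints that help place repeats.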


Steps in sequence analysis

  • Sequence quality check

  • Removal of contaminating sequences: Blastn against databases of vectors, bacteria, yeasts, …

  • Assembly: Phred, Phrap, Consed

  • Identification of potentially coding sequences: comparison with sequence databases; exon prediction software
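The quality check relies on per-base quality scores. Phred-style scores are defined as Q = -10 log10(P), where P is the estimated probability that the base call is wrong. A minimal sketch of the conversion in both directions, with illustrative function names:

```python
import math


def error_probability(q: float) -> float:
    """Error probability for a Phred-scaled quality score Q."""
    return 10 ** (-q / 10)


def phred_quality(p: float) -> float:
    """Phred-scaled quality score for an error probability P (0 < P <= 1)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, error_probability(q))
    # roughly one error per 600 bases, the raw read accuracy quoted earlier
    print(round(phred_quality(1 / 600), 1))
```

On this scale Q20 corresponds to one error in 100 bases and Q40 to one error in 10,000, the accuracy the slides associate with 10x coverage.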


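International nucleotide sequence databases (diagram)

  • GenBank (NCBI, NIH): submissions and updates; queried through Entrez

  • EMBL (EBI): submissions and updates; queried through SRS

  • DDBJ (CIB, NIG): submissions and updates; queried through getentry

The three databases exchange their records with one another.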


HTG Division: High Throughput Genome Records

  • Phase 1 (HTG): Acc = AC008701, gi = 6601005

  • Phase 2 (HTG): Acc = AC008701, gi = 6671909

  • Phase 3 (PRI): Acc = AC008701, gi = 7328720

Record size: 40,000 to > 350,000 bp


Euchromatic genome size: about 2.88 Gbp, of which 2,851,330,913 bp are in the finished sequence


Gene prediction

  • Easy for prokaryotes (single-celled) – one gene, one protein

  • More difficult for eukaryotes (often multicellular) – one gene, many proteins

  • Very difficult for humans – short exons separated by long non-coding introns
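The prokaryotic case is easy because a gene is essentially one uninterrupted open reading frame (ORF). A minimal sketch of an ORF scan on the forward strand follows; the function name and the minimum-length threshold are assumptions, and a real gene finder would also scan the reverse strand and score the candidates.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) of ATG...stop ORFs in the three forward frames.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    start and stop codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    dna = "CCATGGCTGCTTAAGG"
    for begin, end in find_orfs(dna):
        print(begin, end, dna[begin:end])
```

The same scan fails on human genomic DNA because a coding sequence is split into short exons: a single gene no longer appears as one long ORF, and intron sequence introduces stop codons in every frame.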


Gene recognition

  • Coding region and non-coding region have different sequence profiles

    • coding region is “protected” from mutation and is less random

  • Gene recognition by sequence alignment

  • Gene prediction by Hidden Markov Model trained by set of known genes

  • Many genes are homologs – similar in vastly different organisms
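The hidden Markov model idea can be sketched with a two-state toy model decoded by the Viterbi algorithm. Everything numerical here is invented for illustration: state "C" (coding-like) is assumed to favour G and C, state "N" (non-coding-like) to favour A and T, and states are assumed to persist with probability 0.9. Real gene finders use many more states (exons, introns, splice sites) and parameters trained on known genes.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable state path for a non-empty `seq` (log-space Viterbi)."""
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]])
               for s in states}]
    backpointers = []
    for symbol in seq[1:]:
        row, pointer = {}, {}
        for s in states:
            # best previous state from which to enter s
            prev = max(states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            row[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                      + math.log(emit_p[s][symbol]))
            pointer[s] = prev
        scores.append(row)
        backpointers.append(pointer)
    # trace back from the best final state
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    return path[::-1]


# Hypothetical parameters, chosen only to make the example readable.
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {"N": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
        "C": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4}}

if __name__ == "__main__":
    print("".join(viterbi("ATATATATGCGCGCGC", STATES, START, TRANS, EMIT)))
```

Training would replace the invented tables with frequencies estimated from a set of known genes, which is the step the slide refers to.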


Two predictions disagree

“…predicted transcripts collectively contain partial matches to nearly all known genes, but the novel genes predicted by both groups are largely non-overlapping.”

John B. Hogenesch et al., Cell, Vol. 106, 413–415, August 24, 2001


Human genome content

  • The Human Genome

  • Total length 3000 Mb

  • ~ 40,000 genes (coding seq)

  • Gene sequences < 5%

    • Exons ~ 1.5% (coding)

    • Introns ~ 3.5% (noncoding)

  • Intergenic regions (junk) > 95%

  • Repeats > 50%


Global properties

  • Pericentromeric and subtelomeric regions of chromosomes filled with large recent transposable elements

  • Marked decline in the overall activity of transposable elements or transposons

  • Male mutation rate about twice female

    • most mutation occurs in males

  • Recombination rates much higher in distal regions of chromosomes and on shorter chromosome arms

    • > one crossover per chromosome arm in each meiosis


Fig 17: transposable elements

Interspersed repeats: fixed transposable elements copied to non-homologous regions. Total 45%.

Classes of transposable elements: LINE, long interspersed element; SINE, short interspersed element.


Fig 21

Genes are sometimes protected from repeats

Two regions of about 1 Mb on chromosomes 2 and 22. Red bars, interspersed repeats; blue bars, exons of known genes. Note the deficit of repeats in the HoxD cluster, which contains a collection of genes with complex, interrelated regulation.


Important features of Human proteome

  • 30,000–40,000 protein-coding genes

  • Proteome (full set of proteins) more complex than those of invertebrates.

    • pre-existing components arranged into richer architectures.

  • Hundreds of genes seem to come from horizontal transfer from bacteria (questionable)

  • Dozens of genes seem to come from transposable elements.


Noncoding RNA genes

  • Transfer RNAs (tRNAs) – adaptors that translate triplet code of RNA into amino acid sequence of proteins

  • Ribosomal RNAs (rRNAs) – components of ribosome

  • Small nucleolar RNAs (snoRNAs) – RNA processing and base modification in nucleolus

  • Small nuclear RNAs (snRNAs) – components of spliceosomes


Human races have similar genes

  • Genome sequence centers have sequenced significant portions of at least three races

  • Range of polymorphisms within a race can be much greater than the range of differences between any two individuals of different races

  • Very few genes are race specific



Fig 35a

Size distributions of exons in Human, Worm and Fly. Humans have shorter exons.


Fig 35c

Size distributions of introns in Human, Worm and Fly. Humans have longer introns.


Combinatorial strategies

  • At DNA level – T-cell receptor genes are encoded by a multiplicity of gene segments

Fig. 10.21

  • At RNA level – alternative splicing joins exons in different combinations


Yeast

  • 70 human genes are known to complement (rescue) mutations in yeast

  • Much of what we know about the cell cycle and cancer comes from studies of yeast

  • Advantages:

    • fewer genes (6000)

    • few introns

    • 31% of yeast genes give same products as human homologues


Drosophila

  • Much of what we know about how mutations affect gene function comes from Drosophila studies

  • We share 50% of their genes

    • 61% of genes mutated in 289 human diseases are found in fruit flies

    • 68% of genes associated with cancers are found in fruit flies

  • Knockout mutants

  • Homeobox genes


C. elegans

  • 959 somatic cells in the adult hermaphrodite, 302 of them neurons

  • 131 cells are programmed to die by apoptosis during development

  • apoptosis involved in several human genetic neurological disorders

    • Alzheimer's

    • Huntington's

    • Parkinson's


Mouse

  • known as “mini” humans

    • Very similar physiological systems

    • Share 90% of their genes


Questions Remain about the Human Genome

  • Difficult to precisely estimate number of genes at this time

    • Small genes are hard to identify

    • Some genes are rarely expressed and do not have normal codon usage patterns – thus hard to detect



Applications to medicine and biology

  • Disease genes

    • human genomic sequence in public databases allows rapid identification of disease genes in silico

  • Drug targets

    • pharmaceutical industry has depended upon a limited set of drug targets to develop new therapies

    • now can find new target in silico

  • Basic biology

    • basic physiology, cell biology…




Autosomal recessive inheritance

(Pedigree figure: disease-locus genotypes MM, Mm and mm; marker genotypes A1A1, A1A2 and A2A2.)
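The expected genotype proportions among the children of two carriers can be worked out with a small Punnett-square sketch. It assumes, as an interpretation of the figure labels, that "m" denotes the recessive disease allele and "M" the normal allele; the function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Offspring genotype probabilities for a single-locus cross.

    Each parent is a two-letter genotype such as "Mm"; every allele is
    transmitted with probability 1/2.
    """
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    for genotype, prob in cross("Mm", "Mm").items():
        print(genotype, prob)
```

For two heterozygous (Mm) parents, one child in four is expected to be mm and affected, one in two an unaffected carrier, and one in four MM.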


Point mutations

Creation of a stop codon: CAG (Gln) → TAG (stop)
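The effect of a single-base substitution on a codon can be checked against the standard genetic code. The sketch below builds the standard codon table and classifies a change as silent, missense or nonsense; the function name is illustrative, and changes that start from a stop codon are not treated as a separate class.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a codon change as 'silent', 'missense' or 'nonsense'."""
    ref = CODON_TABLE[ref_codon.upper()]
    alt = CODON_TABLE[alt_codon.upper()]
    if ref == alt:
        return "silent"
    if alt == "*":
        return "nonsense"  # a stop codon is created
    return "missense"      # a different amino acid is encoded


if __name__ == "__main__":
    print(classify_substitution("CAG", "TAG"))  # Gln -> stop
    print(classify_substitution("GAG", "GTG"))  # Glu -> Val
```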


Positional cloning of genes

(Diagram: two routes linking Disease, Chromosomal localisation, Gene and Function/Protein.)


  • Family recruitment or cytogenetic abnormality: phenotype determination, DNA collection

  • Genetic mapping: chromosomal localisation, fine mapping

  • Physical mapping and isolation of specific clones

  • Gene isolation

  • Mutation screening: normal ...CCT GAG GAG... versus mutated ...CCT GTG GAG...

  • Functional study: normal ...Pro Glu Glu... versus mutated ...Pro Val Glu...

1 to 10 years!


EYA1 gene structure in Branchio-Oto-Renal Syndrome (figure, panels a to c)




From in vivo to in vitro to in silico



Family EBDD-I, under the dominant model (pedigree figure, generations I to V, with marker haplotypes and M/m status)

Disease with incomplete penetrance and variable expressivity

  • Individual 1: genotype G1, affected

  • Individual 2: genotype G1, healthy

  • Environment?


G1/1, G1/2: alternative splicing, nonsense-mediated mRNA decay (post-transcriptional regulation mechanisms)

G2, G3: modifier genes


Complex/common disorders: multifactorial

Environmental factors

Genetic factors


Complex Diseases: Genes & Environment

(Figure: conditions placed along a spectrum from Environmental Effect to Genetic Component.)

  • Hemophilia

  • Cystic Fibrosis

  • Stroke

  • Asthma

  • Lung Cancer

  • Skin Cancer

  • Alzheimer’s

  • Cardiovascular Disease

  • Motor Vehicle Accident

  • Schizophrenia

  • Familial Colon or Breast Cancer

  • Type 2 Diabetes

  • Bipolar Disorder


The potential benefits of identifying genes/variations involved in disease

  • Improve the understanding of disease etiology and mechanism

  • Early disease risk assessment

  • Discover new drug targets

  • Disease prevention

  • Population or ethnic group variability

Predictive medicine: Predisposition, Targeted screening, Prevention, Diagnosis, Therapy


Pharmacogenomics: The Promise of Personalized Medicine



Acknowledgement: this presentation has been prepared on the basis of

  • Internet resources.

  • International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  • Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

  • International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).


Thank you

