Human Genome Sequence and Variability

Gabor T. Marth, D.Sc.

Department of Biology, Boston College

marth@bc.edu

Medical Genomics Course – Debrecen, Hungary, May 2006

slide2

Lecture overview

1. Genome sequencing strategies, sequencing informatics

2. Genome annotation, functional and structural features in the human genome

3. Genome variability, DNA nucleotide, structural, and epigenetic variations

slide5

The genome sequence

  • the primary template onto which the functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.) are mapped
slide6

Completed genomes

[Figure: completed genomes by size: ~3,000 Mb, >100 Mb, ~100 Mb, ~1 Mb]

slide7

Main genome sequencing strategies

Clone-based shotgun sequencing: Human Genome Project

Whole-genome shotgun sequencing: Celera Genomics, Inc.
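Both strategies rely on sampling the target many times over with random reads. The classic Lander-Waterman (Poisson) model gives a quick feel for how much sequencing that takes: coverage is c = N·L/G, and the expected fraction of bases hit by no read is e^(-c). The sketch below assumes uniformly random read placement; the 500 bp read length and the 8x target are hypothetical values, not figures from the lecture.

```python
import math


def redundancy(num_reads: int, read_len: int, genome_len: int) -> float:
    """Average sequence coverage c = N * L / G."""
    return num_reads * read_len / genome_len


def reads_needed(target_c: float, read_len: int, genome_len: int) -> int:
    """Number of reads N required to reach an average coverage of target_c."""
    return math.ceil(target_c * genome_len / read_len)


def fraction_uncovered(c: float) -> float:
    """Poisson approximation: probability that a base is covered by zero reads, e^(-c)."""
    return math.exp(-c)


if __name__ == "__main__":
    genome = 3_000_000_000   # ~3,000 Mb human genome
    read_len = 500           # hypothetical read length
    n = reads_needed(8, read_len, genome)
    c = redundancy(n, read_len, genome)
    print(n, "reads;", c, "x coverage;", fraction_uncovered(c), "of bases unsampled")
```

At 8x the model leaves roughly 0.03% of bases unsampled, before accounting for cloning bias or repeats.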

slide8

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001
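The last step, sequence reconstruction, merges overlapping reads into longer contiguous sequences (contigs). The toy below illustrates the idea with a greedy longest-overlap merge; it ignores base qualities, sequencing errors, reverse complements and repeats, all of which production assemblers (such as Phrap, widely used for BAC assemblies) must handle. The function names and the `min_len` threshold are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of a matching suffix
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the longest overlap; returns the resulting contigs."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Reads fully contained in another read add no new sequence.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads are separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    reads = ["ATGGCACCACCG", "CACCGATGTCTA", "GTCTACGTGGTAG"]
    print(greedy_assemble(reads))  # ['ATGGCACCACCGATGTCTACGTGGTAG']
```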

slide11

Shotgun subclone library construction

[Figure: a BAC primary clone (insert in a cloning vector) is broken into subclone inserts, which are carried in a sequencing vector]
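Subcloning breaks copies of the BAC insert at random positions and keeps only fragments within a chosen size range for cloning into the sequencing vector. The simulation below is a hypothetical sketch of that shearing and size-selection step; the ~150 kb insert, the number of breaks per copy and the 1,500-2,500 bp window are illustrative assumptions, not protocol values from the lecture.

```python
import random


def shear(insert: str, num_breaks: int, rng: random.Random) -> list[str]:
    """Cut `insert` at num_breaks distinct random positions; return the pieces in order."""
    cuts = sorted(rng.sample(range(1, len(insert)), num_breaks))
    bounds = [0, *cuts, len(insert)]
    return [insert[s:e] for s, e in zip(bounds, bounds[1:])]


def size_select(fragments: list[str], lo: int, hi: int) -> list[str]:
    """Keep fragments whose length lies in [lo, hi], mimicking gel size selection."""
    return [f for f in fragments if lo <= len(f) <= hi]


def subclone_library(insert: str, copies: int, num_breaks: int,
                     lo: int, hi: int, seed: int = 0) -> list[str]:
    """Shear many copies of the same insert independently and pool the size-selected pieces."""
    rng = random.Random(seed)
    library = []
    for _ in range(copies):
        library.extend(size_select(shear(insert, num_breaks, rng), lo, hi))
    return library


if __name__ == "__main__":
    rng = random.Random(42)
    bac = "".join(rng.choice("ACGT") for _ in range(150_000))  # random stand-in for a BAC insert
    lib = subclone_library(bac, copies=20, num_breaks=75, lo=1_500, hi=2_500)
    print(len(lib), "subclone inserts")
```

Because each copy breaks at different positions, the pooled inserts overlap one another, which is what later makes assembly possible.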

slide14

Robotic automation

Lander et al. Nature 2001

slide15

Base calling

PHRED

[Figure: example PHRED base call, base = A, Q = 40]
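PHRED attaches to every called base a quality value on a logarithmic scale, Q = -10·log10(P), where P is the estimated probability that the call is wrong. A Q = 40 call therefore corresponds to an error probability of 1 in 10,000. The helper names below are illustrative; they only convert between the two scales.

```python
import math


def phred_to_error(q: float) -> float:
    """Error probability implied by a Phred quality value: P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality value for an error probability p: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(f"Q{q}: P(error) = {phred_to_error(q):g}")
```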

slide20

Sequence completion (finishing)

[Figure: assembly with a region of low sequence coverage and/or quality, and a gap]

CONSED, AUTOFINISH
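Finishing begins by locating the weak spots in an assembly. A minimal sketch, assuming per-base read depth is already available as a list of integers: zero-depth runs are reported as gaps and runs below a hypothetical `min_depth` threshold as low-coverage regions. Tools such as CONSED and AUTOFINISH also take base quality into account, which this sketch ignores.

```python
from typing import Callable, Iterator


def find_runs(values: list[int], predicate: Callable[[int], bool]) -> Iterator[tuple[int, int]]:
    """Yield half-open (start, end) intervals of consecutive positions where predicate holds."""
    start = None
    for i, v in enumerate(values):
        if predicate(v):
            if start is None:
                start = i
        elif start is not None:
            yield (start, i)
            start = None
    if start is not None:
        yield (start, len(values))


def finishing_targets(depth: list[int], min_depth: int = 3) -> dict[str, list[tuple[int, int]]]:
    """Classify weak regions: 'gap' where depth is 0, 'low' where 0 < depth < min_depth."""
    return {
        "gap": list(find_runs(depth, lambda d: d == 0)),
        "low": list(find_runs(depth, lambda d: 0 < d < min_depth)),
    }


if __name__ == "__main__":
    depth = [5, 6, 4, 2, 1, 0, 0, 0, 2, 5, 7, 7]
    print(finishing_targets(depth))  # {'gap': [(5, 8)], 'low': [(3, 5), (8, 9)]}
```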

slide22

Genome annotation – Goals

repetitive elements

protein coding genes

RNA genes

GC content
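GC content is the simplest of these annotation tracks to compute: the fraction of G and C among the bases, usually reported in windows along the sequence. The sketch below assumes non-overlapping windows of a hypothetical size and skips ambiguous characters (anything other than A, C, G, T, such as the X in the sample sequence) when computing each fraction.

```python
from typing import Optional


def gc_fraction(seq: str) -> Optional[float]:
    """Fraction of unambiguous bases that are G or C; None if the window has none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else None


def gc_windows(seq: str, window: int) -> list[tuple[int, Optional[float]]]:
    """GC fraction in consecutive non-overlapping windows, keyed by window start."""
    return [(i, gc_fraction(seq[i:i + window])) for i in range(0, len(seq), window)]


if __name__ == "__main__":
    seq = "AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA"
    for start, gc in gc_windows(seq, 15):
        print(start, None if gc is None else round(gc, 2))
```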

slide23

The starting material

AGCGTGGTAGCGCGAGTTTGCGAGCTAGCTAGGCTCCGGATGCGA

CCAGCTTTGATAGATGAATATAGTGTGCGCGACTAGCTGTGTGTT

GAATATATAGTGTGTCTCTCGATATGTAGTCTGGATCTAGTGTTG

GTGTAGATGGAGATCGCGTAGCGTGGTAGCGCGAGTTTGCGAGCT

AGCTAGGCTCCGGATGCGACCAGCTTTGATAGATGAATATAGTGT

GCGCGACTAGCTGTGTGTTGAATATATAGTGTGTCTCTCGATATGT

AGTCTGGATCTAGTGTTGGTGTAGATGGAGATCGCGTGCTTGAG

TCGTTCGTTTTTTTATGCTGATGATATAAATATATAGTGTTGGTG

GGGGGTACTCTACTCTCTCTAGAGAGAGCCTCTCAAAAAAAAAGCT

CGGGGATCGGGTTCGAAGAAGTGAGATGTACGCGCTAGXTAGTAT

ATCTCTTTCTCTGTCGTGCTGCTTGAGATCGTTCGTTTTTTTATGCT

GATGATATAAATATATAGTGTTGGTGGGGGGTACTCTACTCTCTCT

AGAGAGAGCCTCTCAAAAAAAAAGCTCGGGGATCGGGTTCGAAGA

AGTGAGATGTACGCGCTAGXTAGTATATCTCTTTCTCTGTCGTGCT

slide24

Coding genes – ab initio predictions

ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA

[Figure labels: start codon, stop codon, PolyA signal; Open Reading Frame = ORF]
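The most basic ab initio signal is the open reading frame: a start codon (ATG) followed, in the same frame, by a stop codon (TAA, TAG or TGA). A minimal sketch that scans the three forward frames only; real gene finders such as Genscan model exons, introns and splice signals probabilistically, so this is just the first building block. The function name and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """(start, end) of forward-strand ORFs; `end` is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # the first ATG opens the ORF; later in-frame ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


if __name__ == "__main__":
    seq = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
    for s, e in find_orfs(seq):
        print(s, e, seq[s:e])  # 0 27 ATGGCACCACCGATGTCTACGTGGTAG
```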

slide25

Ab initio predictions

Gene structure

slide26

Ab initio predictions

…AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG…

[Figure labels: splice acceptor site, splice donor site]
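Most introns obey the GT-AG rule: they begin with GT at the splice donor site and end with AG at the splice acceptor site. The hypothetical helper below lists every GT...AG span as a candidate intron, which shows how weak the dinucleotides are on their own; ab initio predictors additionally score the surrounding sequence context. The default `min_len` is an illustrative assumption.

```python
def candidate_introns(seq: str, min_len: int = 60) -> list[tuple[int, int]]:
    """All (start, end) spans that begin with GT and end with AG; `end` is exclusive."""
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [j + 2 for j in range(len(seq) - 1) if seq[j:j + 2] == "AG"]
    # Require at least 4 bases so the GT and AG dinucleotides cannot overlap.
    return [(d, a) for d in donors for a in acceptors if a - d >= max(min_len, 4)]


if __name__ == "__main__":
    seq = "AGAATAGGGCGCGTACCTTCCAACGAAGACTGGG"
    for s, e in candidate_introns(seq, min_len=10):
        print(s, e, seq[s:e])
```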

slide27

Ab initio predictions

Genscan

Grail

Genie

GeneFinder

Glimmer

etc…

EST_genome

Sim4

Spidey

EXALIN

slide28

Homology-based predictions

known coding sequence from another organism

expressed sequence

ACGGAAGTCT

GGACTATAAA

ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA

genes predicted by homology

GenomeScan

Twinscan

etc…
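Homology-based methods begin by finding where a known coding or expressed sequence matches the genome. Real tools use gapped, spliced alignments, and GenomeScan and Twinscan combine such similarity evidence with an ab initio gene model. The hypothetical sketch below only slides a query along the genome and reports the ungapped window with the fewest mismatches, enough to show that a cross-species match need not be exact.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_ungapped_hit(query: str, genome: str) -> tuple[int, int]:
    """(position, mismatches) of the best ungapped placement of `query` on `genome`."""
    if len(query) > len(genome):
        raise ValueError("query is longer than the genome sequence")
    k = len(query)
    hits = ((i, hamming(query, genome[i:i + k])) for i in range(len(genome) - k + 1))
    return min(hits, key=lambda hit: hit[1])  # first window with the fewest mismatches


if __name__ == "__main__":
    genome = "ATGGCACCACCGATGTCTACGTGGTAGGGGACTATAAAAAAAAAAA"
    for query in ("ACGGAAGTCT", "GGACTATAAA"):
        print(query, best_ungapped_hit(query, genome))
```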

slide29

Consolidation – gene prediction systems

Sim4

dbEST

GeneWise

Grail

Genscan

FgenesH

Ensembl

Otto

slide30

ncRNA genes

prediction based on structure (e.g. tRNAs)

for other novel ncRNAs, only homology-based predictions have been successful
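Structure-based ncRNA prediction relies on functional RNAs such as tRNAs folding through intramolecular base pairing. As a minimal illustration of scoring structure, the sketch below implements the classic Nussinov dynamic programme, which maximises the number of nested Watson-Crick and G-U pairs. It is a teaching example only: tRNA finders such as tRNAscan-SE use covariance models, and practical folding tools minimise free energy instead. The `min_loop` default of 3 is an assumed minimum hairpin size.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs, leaving >= min_loop unpaired bases in each hairpin."""
    seq = seq.upper().replace("T", "U")  # accept DNA input
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAACCC"))  # 3: a three-pair stem closing an AAA loop
```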

slide31

Repeat annotations

Repeat annotations are based on sequence similarity to known repetitive elements in a repeat sequence library
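A common setup pairs RepeatMasker with the Repbase library: it compares the genome against consensus sequences of known repeats and masks the matching regions. The sketch below replaces alignment with exact k-mer matching and marks repeats by lowercase soft-masking. The `k` value and the one-entry library are hypothetical, so this illustrates the idea, not RepeatMasker's algorithm.

```python
def soft_mask(genome: str, repeat_library: list[str], k: int = 12) -> str:
    """Lowercase every genome base covered by a k-mer that also occurs in a library repeat."""
    library_kmers = {
        rep[i:i + k].upper()
        for rep in repeat_library
        for i in range(len(rep) - k + 1)
    }
    upper = genome.upper()
    masked = [False] * len(genome)
    for i in range(len(genome) - k + 1):
        if upper[i:i + k] in library_kmers:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(b.lower() if m else b for b, m in zip(genome, masked))


if __name__ == "__main__":
    library = ["TTTTTTTT"]  # hypothetical toy repeat library
    print(soft_mask("ACG" + "T" * 8 + "ACG", library, k=5))  # ACGttttttttACG
```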

slide33

Gene annotations – # of coding genes

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide34

Gene annotations – gene length

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide35

Gene annotations – gene function

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide36

GC content and coding potential

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide37

ncRNAs

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide38

Segmental duplications

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide39

Repeat elements

Lander et al. Initial sequencing and analysis of the human genome, Nature, 2001

slide41

Physical vs. genetic map (Mb/cM)

[Figure: adjacent intervals with genetic distances of 0.4, 1.3 and 0.7 cM and physical distances of 0.4, 0.7 and 0.3 Mb]
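Relating the two maps comes down to computing a local recombination rate (genetic distance per physical distance, here in cM/Mb) and converting observed recombination fractions into map distance. The sketch below uses Haldane's map function, which assumes no crossover interference. The input values are hypothetical; they are not read off the figure.

```python
import math


def haldane_cm(r: float) -> float:
    """Map distance in cM from recombination fraction r (Haldane: d = -0.5 * ln(1 - 2r) Morgans)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)


def cm_per_mb(genetic_cm: float, physical_mb: float) -> float:
    """Local recombination rate of an interval in cM/Mb."""
    return genetic_cm / physical_mb


if __name__ == "__main__":
    d = haldane_cm(0.01)  # hypothetical 1% recombination between two markers
    print(round(d, 3), "cM")
    print(round(cm_per_mb(d, 0.5), 2), "cM/Mb over a hypothetical 0.5 Mb interval")
```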

slide43

DNA sequence variations

  • an individual's genome sequence is about 99.9% identical to the reference human genome sequence
  • sequence variations make our genetic makeup unique
  • the most abundant human variations are single-nucleotide polymorphisms (SNPs); about 10 million SNPs are currently known

SNP
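The 99.9% figure translates directly into numbers: 0.1% of a ~3,000 Mb genome is on the order of 3 million positions at which one genome differs from the reference. Given two already aligned, gap-free sequences, listing single-nucleotide differences is a position-by-position comparison, as in this hypothetical helper. Real SNP discovery must also separate true variants from sequencing errors, for example by using base quality values.

```python
def snp_positions(ref: str, alt: str) -> list[tuple[int, str, str]]:
    """(position, reference base, alternate base) wherever two aligned sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, r, a)
        for i, (r, a) in enumerate(zip(ref.upper(), alt.upper()))
        if r != a and "N" not in (r, a)  # skip positions with an unknown base
    ]


if __name__ == "__main__":
    genome_bp = 3_000_000_000
    print(f"~{genome_bp * 0.001:,.0f} expected differences at 99.9% identity")
    print(snp_positions("ACGTACGTAC", "ACGTATGTAC"))  # [(5, 'C', 'T')]
```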

slide44

DNA sequence variations

insertion-deletion (INDEL) polymorphisms
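In a gapped alignment against the reference, an INDEL appears as a run of gap characters in one of the two rows. The hypothetical function below scans such an alignment (gaps written as '-') and reports insertions and deletions relative to the reference, using 0-based reference coordinates; an insertion is reported at the reference position it precedes.

```python
def find_indels(ref_row: str, alt_row: str) -> list[tuple[str, int, str]]:
    """(kind, reference position, bases) for each gap run in a pairwise alignment."""
    if len(ref_row) != len(alt_row):
        raise ValueError("alignment rows must have equal length")
    n = len(ref_row)
    indels = []
    ref_pos = 0  # reference bases consumed so far
    i = 0
    while i < n:
        r, a = ref_row[i], alt_row[i]
        if r == "-" and a == "-":
            i += 1  # empty column; ignore it
        elif r == "-" or a == "-":
            kind = "insertion" if r == "-" else "deletion"
            gap_row, base_row = (ref_row, alt_row) if r == "-" else (alt_row, ref_row)
            start = i
            while i < n and gap_row[i] == "-" and base_row[i] != "-":
                i += 1
            indels.append((kind, ref_pos, base_row[start:i]))
            if kind == "deletion":
                ref_pos += i - start  # deleted bases still occupy reference coordinates
        else:
            ref_pos += 1
            i += 1
    return indels


if __name__ == "__main__":
    print(find_indels("ACGT--ACGT", "ACGTTTACGT"))  # [('insertion', 4, 'TT')]
    print(find_indels("ACGTACGT", "AC--ACGT"))      # [('deletion', 2, 'GT')]
```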

slide45

Structural variations

Speicher & Carter, NRG 2005

slide46

Structural variations

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767

slide47

Detection of structural variants

Feuk et al. Nature Reviews Genetics 7, 85–97 (February 2006) | doi:10.1038/nrg1767
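Paired-end mapping is one approach used to detect structural variants. Both ends of a clone with a known insert size are sequenced and mapped to the reference, and pairs that map too far apart or too close together point to a deletion or an insertion in the sampled genome. The sketch below flags such discordant pairs. The `ReadPair` type, the 40 kb expected span (roughly a fosmid insert) and the tolerance are hypothetical parameters.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    name: str
    left: int   # reference coordinate where the leftmost end maps
    right: int  # reference coordinate where the rightmost end ends


def classify_pairs(pairs: list[ReadPair], expected: int = 40_000,
                   tolerance: int = 8_000) -> dict[str, str]:
    """Label each pair 'concordant', 'deletion?' (span too long) or 'insertion?' (span too short)."""
    calls = {}
    for p in pairs:
        span = p.right - p.left
        if span > expected + tolerance:
            calls[p.name] = "deletion?"   # reference holds sequence missing from the sample
        elif span < expected - tolerance:
            calls[p.name] = "insertion?"  # sample holds sequence missing from the reference
        else:
            calls[p.name] = "concordant"
    return calls


if __name__ == "__main__":
    pairs = [ReadPair("a", 0, 40_500), ReadPair("b", 0, 60_000), ReadPair("c", 0, 25_000)]
    print(classify_pairs(pairs))
```

Pairs with unexpected orientation (suggesting inversions) or ends on different chromosomes are also informative but are left out of this sketch.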