Large-scale genome projects

Presentation Transcript
Strategy

Libraries

Sequencing

Assembly

Closure

Annotation

Release

Large-scale genome projects
  • Sequencing DNA molecules in the Mb size range
  • All strategies employ the same underlying principles:
  • Random Shotgun sequencing
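
The random shotgun principle can be illustrated with a small simulation. This is only a sketch under stated assumptions: reads of a fixed length are placed uniformly at random on a linear target, and the covered fraction is compared with the standard Lander-Waterman style estimate 1 - exp(-c), where c = N * L / G. The function names and the example sizes are hypothetical, not taken from the slides.

```python
import math
import random

def simulate_shotgun(genome_len, read_len, n_reads, seed=0):
    """Place reads uniformly at random; return the fraction of bases covered."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = [False] * genome_len
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        for pos in range(start, start + read_len):
            covered[pos] = True
    return sum(covered) / genome_len

def expected_covered_fraction(genome_len, read_len, n_reads):
    """Estimate 1 - exp(-c), with average coverage c = N * L / G."""
    coverage = n_reads * read_len / genome_len
    return 1.0 - math.exp(-coverage)

if __name__ == "__main__":
    # Hypothetical example: 100 kb target, 500 bp reads, 1000 reads (5x coverage)
    G, L, N = 100_000, 500, 1_000
    print(simulate_shotgun(G, L, N), expected_covered_fraction(G, L, N))
```

Even at several-fold coverage a small fraction of the target stays unsequenced, which is why the workflow ends with a dedicated finishing stage.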
[Figure: random shotgun workflow. Genomic DNA → shearing/sonication → subclone and sequence → shotgun reads → assembly → contigs → finishing reads → finishing → complete sequence.]

Strategies for sequencing
  • How big can you go?
  • Large-insert clones
    • cosmids 30-40 kb
    • BACs/PACs 50 - 100 kb
  • Whole chromosomes
  • Whole genomes

Genome size and sequencing strategies

[Figure: genome size on a log scale (log Mb, 0 to 4) set against sequencing strategy. Genomes shown: H. sapiens (3000 Mb), D. melanogaster (170 Mb), C. elegans (100 Mb), P. falciparum (30 Mb), S. cerevisiae (14 Mb), E. coli (4 Mb). Strategies shown: whole genome shotgun (WGS), clone-by-clone, whole chromosome shotgun (WCS), and whole genome shotgun (WGS) with clone 'skims'.]
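
To give a feel for how data volume scales with genome size, the sketch below converts the genome sizes listed above to the log scale used on the figure axis and estimates the number of reads needed. The 8x target coverage and the 500 bp read length are assumptions chosen for illustration, not values from the slides.

```python
import math

# Genome sizes in Mb, as quoted on the slide
GENOME_SIZES_MB = {
    "H. sapiens": 3000,
    "D. melanogaster": 170,
    "C. elegans": 100,
    "P. falciparum": 30,
    "S. cerevisiae": 14,
    "E. coli": 4,
}

def reads_needed(genome_mb, coverage=8.0, read_len_bp=500):
    """Reads required for a target average coverage: N = c * G / L."""
    return math.ceil(coverage * genome_mb * 1_000_000 / read_len_bp)

for name, size_mb in GENOME_SIZES_MB.items():
    print(f"{name:16s} log10(Mb) = {math.log10(size_mb):.2f}  "
          f"reads ~ {reads_needed(size_mb):,}")
```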

Strategies for sequencing
  • Size and GC composition of genome
    • Volume of data
    • Ease of cloning
    • Ease of sequencing
  • Genome complexity
    • dispersed repetitive sequence
    • telomeres & centromeres
  • Politics/Funding

Strategies: Clone by Clone
  • Simple (0.5 - 2 K reads)
  • Few problems with repeats
  • Relatively simple informatics
  • Scalability
  • Quality of physical map
    • Fingerprint / STS maps
    • End sequencing
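
The clone-by-clone approach depends on picking a small set of overlapping mapped clones that spans the region. A minimal sketch of that selection, assuming each clone is a hypothetical `(name, start, end)` tuple in map coordinates, is the classic greedy interval cover below. It is an illustration only, not the procedure used by any particular mapping package.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover over (name, start, end) clones.

    At each step, pick the clone that starts at or before the current
    position and extends furthest to the right.
    """
    path = []
    position = region_start
    remaining = sorted(clones, key=lambda c: c[1])  # sort by start coordinate
    i = 0
    while position < region_end:
        best = None
        # Consider every not-yet-scanned clone that starts within covered ground
        while i < len(remaining) and remaining[i][1] <= position:
            if best is None or remaining[i][2] > best[2]:
                best = remaining[i]
            i += 1
        if best is None or best[2] <= position:
            raise ValueError(f"gap in physical map at position {position}")
        path.append(best[0])
        position = best[2]
    return path

# Hypothetical clones, coordinates in kb
clones = [("A", 0, 40), ("B", 30, 70), ("C", 35, 90), ("D", 80, 120), ("E", 60, 100)]
print(minimal_tiling_path(clones, 0, 120))
```

A gap in the map raises an error instead of returning a partial path, which mirrors the point on the slide that the result is only as good as the physical map.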

Strategies: Whole Chromosome shotgun (WCS)
  • Requires chromosome isolation
  • Moderate complexity (tens of thousands of reads)
  • Problems with repeats
  • Complex informatics
  • Inefficient in isolation
  • Quality of physical map
    • Skims of mapped clones

Strategies: Whole Genome shotgun (WGS)
  • Moderate to high complexity (tens to hundreds of thousands of reads)
  • Problems with repeats
  • Complex informatics
  • Quality of physical map
    • Fingerprint map
    • STS markers
    • End-sequences
    • Skims of mapped clones

Sequencing my genome

[Figure: project stages (Politics, Production, Finishing, Annotation) set against TIME and MONEY.]

What do you get?

DATA!!, DATA!!, and more DATA!!

  • Sequence
    • incomplete v complete
  • First-pass annotation
    • Gene discovery
    • Full annotation
  • A starting point for research

Sequencing

  • Library construction
  • Colony picking
  • DNA preparation
  • Sequencing reactions
  • Electrophoresis
  • Tracking/Base calling

Libraries

  • Essentially Sub-cloning
  • Generation of small insert libraries in a well characterised vector.
    • Ease of propagation
    • Ease of DNA purification
    • e.g. pUC18, M13

Libraries - testing

  • Simple concepts
    • Insert/Vector ratio
  • Real data
    • Insert size
    • Sequence ….
    • Simple analysis
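
The "simple analysis" of a test library could look something like the sketch below. It assumes a hypothetical input of one estimated insert size per picked colony, with 0 meaning an empty vector, and a hypothetical 500 bp cutoff for counting a clone as carrying a usable insert. It reports the insert-containing fraction and the mean insert size.

```python
from statistics import mean

def summarise_test_picks(insert_sizes_bp, min_insert_bp=500):
    """Summarise a test pick; insert_sizes_bp has one value per colony."""
    with_insert = [s for s in insert_sizes_bp if s >= min_insert_bp]
    return {
        "picked": len(insert_sizes_bp),
        "insert_fraction": len(with_insert) / len(insert_sizes_bp),
        "mean_insert_bp": mean(with_insert) if with_insert else 0,
    }

# Hypothetical test pick of eight colonies
print(summarise_test_picks([0, 1500, 2000, 0, 1800, 2200, 100, 1500]))
```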

Sequence generation

  • Pick colonies
  • Template preparation
  • Sequence reactions
    • Standard terminator chemistry
    • pUC libraries sequenced with forward and reverse primers

Sequence generation

  • Electrophoresis of products
    • Old style - slab gels, 32 > 64 > 96 lanes
    • New style - capillary gels, 96 lanes
  • Transfer of gel image to UNIX
    • Sequencing machines use a slave Mac/PC
    • Move data to centralised storage area for processing

Gel image processing

  • Light-to-Dye estimation
  • Lane tracking
  • Lane editing
  • Trace extraction
  • Trace standardisation
    • Mobility correction
    • Background subtraction

Pre-processing

  • Base calling using Phred
    • modifies SCF file
  • Quality clipping
  • Vector clipping
    • Sequencing vector
    • Cloning vector
  • Screen for contaminants
  • Feature mark up (repeats/transposons)
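
Quality clipping can be sketched from the definition of a Phred score, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The example below keeps the segment that maximises the sum of (cutoff - P) over its bases, a maximum-scoring-segment approach similar in spirit to common trimming schemes for Phred-quality reads. The 0.05 cutoff and the function names are assumptions for illustration, not the settings of any specific pipeline.

```python
def error_probability(q):
    """Phred quality Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)

def quality_clip(quals, max_error=0.05):
    """Return (start, end) of the segment maximising sum(max_error - P_i).

    end is exclusive; (0, 0) means no base passes.
    """
    best_sum, best_span = 0.0, (0, 0)
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += max_error - error_probability(q)
        if run_sum <= 0:
            # Running score exhausted: restart after this base
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_span = run_sum, (run_start, i + 1)
    return best_span

# Hypothetical read: poor ends, good middle
print(quality_clip([5, 6, 8, 30, 35, 40, 40, 38, 30, 9, 7, 5]))
```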

Finishing
  • Assembly: the process of turning raw single-pass reads into contiguous consensus sequences (contigs)
  • Closure: the process of ordering and merging those consensus sequences into a single contiguous sequence
  • Finished is defined as sequenced on both strands using multiple clones. In the absence of multiple clones the clone must be sequenced with multiple chemistries. The overall error rate is estimated at less than 1 error per 10 kb
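
The error-rate target can be related to Phred-style quality values: 1 error per 10 kb is an error probability of 1e-4 per base, which corresponds to Q40. The sketch below, with hypothetical function names, estimates the expected number of errors from per-base consensus qualities and checks it against the quoted threshold.

```python
import math

def phred_from_error_rate(p):
    """Convert an error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)

def expected_errors(consensus_quals):
    """Sum of per-base error probabilities implied by the quality values."""
    return sum(10 ** (-q / 10) for q in consensus_quals)

def meets_finished_standard(consensus_quals, max_errors_per_10kb=1.0):
    """True if the estimated error rate is below the per-10-kb threshold."""
    per_10kb = expected_errors(consensus_quals) / len(consensus_quals) * 10_000
    return per_10kb < max_errors_per_10kb

print(phred_from_error_rate(1e-4))  # quality equivalent of 1 error per 10 kb
```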

Genome Assembly
  • Pre-assembly
  • Assembly
  • Automated appraisal
  • Manual review

Pre-Assembly
  • Convert to CAF format
    • flatfile text format
    • choice of assembler
    • choice of post-assembly modules
    • choice of assembly editor

www.sanger.ac.uk/Software/CAF

Assembly
  • Assemble using Phrap
  • Read fasta & quality scores from CAF file
  • Merge existing Phrap .ace file as necessary
  • Adjust clipping
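
To show what "assembling reads into contigs" means in the simplest terms, here is a toy greedy overlap-merge. It is explicitly not Phrap's algorithm, which works with quality values and sequence alignments. This sketch assumes error-free reads on one strand and exact suffix-prefix overlaps, and all names are hypothetical.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest exact overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # no overlaps left: what remains are the contigs
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]

# Hypothetical reads drawn from ATGGCGTACGTTAGC
print(greedy_assemble(["ATGGCGTAC", "GCGTACGTT", "TACGTTAGC"]))
```

Greedy merging of this kind is easily misled by repeats, which is one reason the strategy slides list repeats as a problem for shotgun assemblies.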

Assembly appraisal
  • auto-edit
    • removes 70% of read discrepancies
  • Remove cloning vector
  • Mark up sequence features
  • finish
    • Identify low-quality regions
    • Cover using ‘re-runs’ and ‘long-runs’
  • Compare with current databases
    • plate contamination
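
Identifying low-quality regions can be sketched as a scan over per-base consensus quality values. The threshold of 30 and the function name are assumptions for illustration. A real appraisal step would also take read depth and strand coverage into account.

```python
def low_quality_regions(quals, threshold=30, min_len=1):
    """Return [start, end) spans where consensus quality is below threshold."""
    regions, start = [], None
    for i, q in enumerate(quals):
        if q < threshold and start is None:
            start = i  # a low-quality run begins
        elif q >= threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence
    if start is not None and len(quals) - start >= min_len:
        regions.append((start, len(quals)))
    return regions

print(low_quality_regions([40, 40, 20, 15, 40, 40, 10]))
```

Each span returned is a candidate for extra sequencing such as the re-runs and long-runs mentioned above.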

Manual Assembly appraisal
  • Use a sequence editor (GAP/consed)
  • Tools to identify Internal joins
  • Tools to identify and import data from overlapping projects
  • Tools to check failed or mis-assembled reads for inclusion in project

Manual editing
  • Sanger uses a 100% edit strategy
  • Where additional data is required:
    • Check clipping
    • Additional sequencing
      • Template / Primer / Chemistry
  • Assemble new data into project
    • GAP4 Auto-assemble
    • Repeat whole process

Manual Quality Checks
  • Force annotation tag consistency
  • All unedited data is re-assembled using Phrap
  • All high-quality discrepancies are reviewed
  • Confirm restriction digest (clones)
  • Check for inverted repeats
  • Manually check:
    • Areas of high-density edits
    • Areas with no supporting unedited data
    • Areas of low read coverage
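
Confirming a restriction digest amounts to comparing the fragment sizes predicted from the finished sequence with those seen on a gel. A minimal in-silico digest is sketched below, assuming EcoRI (site GAATTC, cut after the first G) as the example enzyme. The function name and parameters are hypothetical. Because the site is palindromic, searching one strand is enough.

```python
def digest_fragment_sizes(sequence, site="GAATTC", cut_offset=1, circular=False):
    """Fragment lengths from cutting at every occurrence of a palindromic site."""
    sequence = sequence.upper()
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = sequence.find(site, pos + 1)
    if not cuts:
        return [len(sequence)]  # uncut molecule
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(sequence) - cuts[-1] + cuts[0])  # fragment spanning the origin
    else:
        bounds = [0] + cuts + [len(sequence)]
        sizes = [b - a for a, b in zip(bounds, bounds[1:])]
    return sorted(sizes, reverse=True)

# Hypothetical 47 bp sequence with two EcoRI sites
seq = "A" * 10 + "GAATTC" + "C" * 20 + "GAATTC" + "T" * 5
print(digest_fragment_sizes(seq), digest_fragment_sizes(seq, circular=True))
```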

Gap closure
  • Read pairs
  • PCR reactions (long-range / combinatorial)
  • Small-insert libraries
  • Transposon-insertion libraries

Gap closure - contig ordering
  • Read pair consistency
  • STS mapping
    • Physical mapping
    • Genetic mapping
    • Optical mapping
  • Large-insert clone
    • skims
    • end-sequencing
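
Read-pair consistency can be used to suggest which contigs are neighbours: if the forward and reverse reads from the same subclone land in different contigs, those contigs are probably adjacent. The sketch below counts such links. The data structures, read names and the minimum of two supporting pairs are assumptions for illustration.

```python
from collections import Counter

def contig_links(read_to_contig, read_pairs, min_links=2):
    """Count read pairs whose two mates sit in different contigs."""
    links = Counter()
    for fwd, rev in read_pairs:
        a, b = read_to_contig.get(fwd), read_to_contig.get(rev)
        if a is not None and b is not None and a != b:
            links[tuple(sorted((a, b)))] += 1
    # Keep only contig pairs supported by enough independent read pairs
    return {pair: n for pair, n in links.items() if n >= min_links}

# Hypothetical placements: read name -> contig
placements = {
    "r1.f": "c1", "r1.r": "c2",
    "r2.f": "c1", "r2.r": "c2",
    "r3.f": "c2", "r3.r": "c3",
    "r4.f": "c1", "r4.r": "c1",
}
pairs = [("r1.f", "r1.r"), ("r2.f", "r2.r"), ("r3.f", "r3.r"), ("r4.f", "r4.r")]
print(contig_links(placements, pairs))
```

Requiring more than one supporting pair guards against chimeric clones and misplaced reads. The resulting links also indicate which subclones span a gap and could serve as templates for closing it.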

Annotation
  • DNA features (repeats/similarities)
  • Gene finding
  • Peptide features
  • Initial role assignment
  • Others: regulatory regions

Annotation of eukaryotic genomes

[Figure: from genomic DNA to function. Genomic DNA → (transcription) → unprocessed RNA → (RNA processing) → mature mRNA with 5' cap and poly(A) tail → (translation) → nascent polypeptide → (folding) → active enzyme → function (Reactant A → Product B). Annotation approaches marked on the diagram: ab initio gene prediction, comparative gene prediction, functional identification.]

DNA features
  • Similarity features
  • mapping repeats
    • simple tandem and inverted
    • repeat families
  • mapping DNA similarities
    • EST/mRNAs in eukaryotes
    • Duplications
    • RNAs
  • mapping peptide similarities
    • protein similarities
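
Simple tandem repeats are the easiest of these features to map. A minimal sketch using a regular-expression back-reference is shown below. The unit length range and minimum copy number are assumptions, and only exact repeats are found. Dedicated repeat finders also handle degenerate copies and inverted repeats.

```python
import re

def simple_tandem_repeats(sequence, unit_min=1, unit_max=6, min_copies=3):
    """Return (start, end, unit) for exact tandem repeats; end is exclusive."""
    # Shortest possible unit first, then require at least min_copies - 1 further copies
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (unit_min, unit_max, min_copies - 1)
    )
    return [(m.start(), m.end(), m.group(1)) for m in pattern.finditer(sequence.upper())]

# Hypothetical sequence containing a (CA)5 dinucleotide repeat
print(simple_tandem_repeats("GG" + "CA" * 5 + "TT"))
```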

Gene finding
  • ORF finding (simple but messy)
  • ab initio prediction
    • Measures of codon bias
    • Simple statistical frequencies
  • Comparative prediction
    • Using similarity data
    • Using cross-species similarities
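
ORF finding, the "simple but messy" option, can be sketched as a six-frame scan for ATG...stop stretches. The example assumes the standard stop codons and a hypothetical minimum length. It makes no attempt at splicing, codon bias or alternative start codons, which is where the ab initio and comparative methods come in.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(sequence, min_codons=30):
    """Return (strand, frame, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based on the scanned strand; end is exclusive and
    includes the stop codon. min_codons counts codons before the stop.
    """
    orfs = []
    seq = sequence.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs

# Hypothetical toy sequence with one short ORF
print(find_orfs("CC" + "ATG" + "GCT" * 4 + "TAA" + "GG", min_codons=3))
```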

Peptide features
  • Peptide features
    • low-complexity regions
    • trans-membrane regions
    • structural information (coiled-coil)
  • Similarities and alignments
  • Protein families (InterPro/COGs)
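
Low-complexity regions can be flagged with a sliding-window measure of compositional diversity. The sketch below uses Shannon entropy as a simplified stand-in for dedicated low-complexity filters. The window length and entropy cutoff are assumptions chosen for illustration, not established defaults.

```python
import math
from collections import Counter

def shannon_entropy(window):
    """Residue entropy of a window, in bits."""
    counts = Counter(window)
    n = len(window)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def low_complexity_windows(peptide, window=12, max_entropy=2.2):
    """Start positions of windows whose entropy is at or below max_entropy."""
    return [
        i for i in range(len(peptide) - window + 1)
        if shannon_entropy(peptide[i:i + window]) <= max_entropy
    ]

# Hypothetical peptide: a diverse N-terminus followed by a poly-Q tract
print(low_complexity_windows("MKTLVAGHRDEW" + "Q" * 12))
```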

Initial role assignment
  • Simple attempt to describe the functional identity of a peptide
  • Uses data from:
    • peptide similarities
    • protein families
  • Vital for data mining
  • Large number of predicted genes remain hypothetical or unknown

Other regulatory features
  • Ribosomal binding sites
  • Promoter regions

Data Release
  • DNA release
    • Unfinished
    • Finished
  • Nucleotide databases
    • GenBank/EMBL/DDBJ
  • Peptide databases
    • Swiss-Prot/TrEMBL/GenPept
  • Others