Introduction to Genomic Sequencing, Assembly, and Annotation
Diego Martinez

Presentation Transcript
Why do we want to know the sequence of an entire genome?

To know all the genes – then proteins, then pathways…

We can understand the biochemistry of the organism

We can understand diseases

Evolution

Regulation

– know all the upstream/downstream regions where proteins bind to control transcription

Convinced? Good. Let's sequence!
  • 2 ways
    • Map (Public)
      • Several long steps – tiling
      • Expensive because they generated a complete map
    • Whole Genome Shotgun (Private)
      • Direct: cuts out some steps, but missed 103 genes
      • Repeats!
    • Synthesis approach
Core principles
  • BAC-end sequences show which stretches of sequence are physically linked
  • More coverage leads to better sequence
  • Better algorithms make it easier
  • Longer sequence reads are critical
Figure 2.5 Relationships of chromosomes to genome sequencing markers

Sixteen overlapping clones represent 1,408 BACs needed to span the 163 Mb X chromosome. (Avg insert 146 kb)
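
The numbers in this caption can be checked with a little arithmetic. The sketch below only multiplies out the figures quoted above (1,408 BACs, 146 kb average insert, 163 Mb chromosome); the variable names are illustrative, and the result is a rough average, not a statement about any individual clone.

```python
# Rough redundancy of the BAC tiling path, using only the caption's numbers.
num_bacs = 1408          # BACs needed to span the chromosome
avg_insert_kb = 146      # average insert size, in kb
chromosome_mb = 163      # size of the X chromosome, in Mb

total_insert_mb = num_bacs * avg_insert_kb / 1000   # kb -> Mb
redundancy = total_insert_mb / chromosome_mb        # 1.0 would mean no overlap

print(f"Total BAC sequence: {total_insert_mb:.1f} Mb")
print(f"Average redundancy: {redundancy:.2f}x")
```

A ratio above 1 reflects the overlap between neighbouring clones that lets them be ordered along the chromosome.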

Assembly: put it all back together

Assemble:

BAC assemblies: Phrap (Phil Green, UW)

Celera, WGS: Celera Assembler

Both find overlaps between reads that share the same sequence and build contiguous regions (contigs)

Put contigs together using paired-end information: order and orient them into large scaffolds (also called supercontigs)

Whole Genome Shotgun: automated, without tiling

Finishing
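
The "find overlaps, build contigs" step can be illustrated with a toy example. This is a minimal greedy sketch, not how Phrap or the Celera Assembler actually work: it assumes error-free reads all on the same strand, ignores repeats and paired-end information, and the function names and reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Toy reads (hypothetical) that tile a short sequence.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

Real genomes break this simple scheme at repeats, where several different reads overlap equally well; that is where paired-end information and finishing come in.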

Which method worked best?
  • WGS failed with highly repetitive regions
  • WGS, however, reduced overall workload for sequencing
  • Use hybrid approach
    • WGS used for 6-fold coverage
    • Reduced the number of BACs that needed to be sequenced by 93%
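
To get a feel for what "6-fold coverage" means, here is a small sketch based on the standard idealized model in which reads land uniformly at random on the genome (Lander-Waterman style Poisson statistics). Under that assumption, coverage is c = N × L / G and the expected fraction of bases hit by no read is about e^(-c). The read count, read length, and genome size below are hypothetical, and real projects deviate from the model because of cloning bias and repeats.

```python
import math


def coverage(num_reads, read_length, genome_size):
    """Average fold coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def expected_uncovered_fraction(c):
    """Expected fraction of bases with zero reads, assuming Poisson sampling."""
    return math.exp(-c)


# Hypothetical project: 6 million reads of 500 bp on a 500 Mb genome.
c = coverage(6_000_000, 500, 500_000_000)
print(f"coverage = {c:.1f}x")

for fold in (1, 3, 6, 10):
    missed = expected_uncovered_fraction(fold)
    print(f"{fold:>2}x coverage: about {missed:.4%} of bases uncovered")
```

Even when almost every base is covered, the repeats problem noted above remains, which is one reason a hybrid with BACs was still useful.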
Annotation

Need to make it usable, and fun!

What is annotation?

The act or process of furnishing critical commentary or explanatory notes.

Find sequence features in the genome:

  • find genes (focus here)
  • pseudogenes
  • repeats
  • regulatory elements (very difficult, still in its infancy)

Attempt to describe gene function
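
As a very rough picture of what "find genes" can mean computationally, the sketch below scans a DNA string for open reading frames (ATG to the next in-frame stop codon). It is a deliberately naive illustration: it looks only at the forward strand and knows nothing about introns, so it would not be adequate for eukaryotic gene finding. The function name, the `min_codons` parameter, and the test sequence are all assumptions made for this example.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=3):
    """Return (start, end) half-open coordinates of forward-strand ORFs.

    An ORF here runs from the first ATG in a frame to the next in-frame
    stop codon (stop included) and must span at least `min_codons` codons.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it and keep scanning
    return sorted(orfs)


dna = "CCATGAAATTTTAGGG"  # hypothetical sequence
for start, end in find_orfs(dna):
    print(start, end, dna[start:end])
```

Real gene finders combine signals like these with splice-site models, codon usage statistics, and similarity to known genes.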

Figure 2.6 Alternative splicing (labels in the figure: NADPH oxidase, H+ channel)

Functional Annotation – What does the protein do?
  • Found Genes
  • Basic approach: by similarity to known proteins
    • Old style: best BLAST hit
      • Can lead to funny incorrect annotations
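
The "best BLAST hit" idea can be sketched in a few lines. The example assumes BLAST results saved in the standard 12-column tabular format (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and simply keeps the highest-scoring subject for each query. The gene and protein names in the demo are invented, and `best_hits` is an illustrative helper, not part of any BLAST package.

```python
import csv
import io

BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text):
    """Map each query ID to (subject ID, bit score) of its top-scoring hit."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (hit["sseqid"], score)
    return best


# Made-up hits for two hypothetical genes.
demo_rows = [
    ["geneA", "protX", "45.0", "200", "110", "0", "1", "200", "5", "204", "1e-40", "150"],
    ["geneA", "protY", "80.0", "210", "42", "0", "1", "210", "1", "210", "1e-90", "320"],
    ["geneB", "protZ", "30.0", "100", "70", "0", "1", "100", "1", "100", "1e-5", "50.5"],
]
demo_text = "\n".join("\t".join(r) for r in demo_rows)
for query, (subject, score) in best_hits(demo_text).items():
    print(query, "->", subject, score)
```

Copying the top hit's description onto the query is exactly what produces the funny annotations: a weak hit such as geneB's is still "best", and any error in the database entry is inherited.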
Phanerochaete chrysosporium
  • Degrades lignin
  • 30 million base pairs (30 Mb)
  • This was the first basidiomycete sequenced, so gene finding was a big challenge
  • Estimated 11,777 genes