Introduction to Genomic Sequencing, Assembly, and Annotation
Diego Martinez

Why do we want to know the sequence of an entire genome?
• To know all the genes, then proteins, then pathways
• We can understand the biochemistry of the organism
• We can understand diseases
• Evolution
• Regulation: know all the upstream/downstream regions for proteins to bind and control transcription

Convinced? Good. Let's sequence!
• 2 ways
• Map (Public)
• Several long steps: tiling
• Expensive because they generated a complete map
• Whole Genome Shotgun (Private)
• Direct/cuts out some steps, missed 103 genes
• Repeats!
• Synthesis approach

Generate Data

Core principles
• BAC-end sequences allow one to know the physical association of sequence
• More coverage leads to better sequence
• Better algorithms make it easier
• Longer sequence reads are critical
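The "more coverage" principle can be made concrete with a little arithmetic. The sketch below assumes the Lander-Waterman model, which the slides do not name: if reads land uniformly at random, average coverage is c = N × L / G and the expected fraction of bases hit by no read is roughly e^(-c). The genome size, read length, and read counts are hypothetical values chosen only for illustration.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average fold coverage c = N * L / G."""
    return num_reads * read_length / genome_size


def fraction_uncovered(c: float) -> float:
    """Expected fraction of bases with no read, assuming uniform random reads."""
    return math.exp(-c)


if __name__ == "__main__":
    # Hypothetical inputs: a 30 Mb genome sequenced with 600 bp reads.
    genome_size = 30_000_000
    read_length = 600
    for num_reads in (50_000, 150_000, 300_000):
        c = coverage(num_reads, read_length, genome_size)
        missing = fraction_uncovered(c)
        print(f"{num_reads} reads: {c:.1f}x coverage, "
              f"about {missing:.2%} of bases expected uncovered")
```

Real reads are not uniformly distributed (cloning bias, repeats), so the model gives an optimistic estimate rather than a prediction.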

Figure 2.5 Relationships of chromosomes to genome sequencing markers. Sixteen overlapping clones represent 1,408 BACs needed to span the 163 Mb X chromosome (average insert 146 kb).

Assembly: put it all back together
• Assemble: BAC assemblies, Phrap (Phil Green, UW)
• Celera, WGS: Celera Assembler
• Both: find overlaps of the same sequence, build regions (contigs)
• Put contigs together using paired-end information: order and orient into large scaffolds (also called supercontigs)
• Whole Genome Shotgun: automated, without tiling
• Finishing
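The "find overlaps, build contigs" step can be illustrated with a toy greedy merger. This is only a sketch of the general idea, not the algorithm used by Phrap or the Celera Assembler: it assumes error-free reads on a single strand, ignores paired-end scaffolding, and the function names and example reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no such overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    # Hypothetical reads sampled from one short sequence.
    reads = ["ATGGCGT", "GCGTACG", "TACGTTA"]
    print(greedy_assemble(reads))
```

A repeat longer than a read makes two different merges look equally good, and the greedy choice can join the wrong neighbours. That is why paired-end information and longer reads matter for ordering and orienting contigs.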

Problems even the map-based approach couldn't fix

Which method worked best?
• WGS failed with highly repetitive regions
• WGS, however, reduced the overall workload for sequencing
• Use a hybrid approach
• WGS used for 6-fold coverage
• Reduced the number of BACs that needed sequencing by 93%

Annotation
Need to make it usable, and fun!
What is annotation? The act or process of furnishing critical commentary or explanatory notes.
Find sequence features in the genome:
• Genes (focus here)
• Pseudogenes
• Repeats
• Regulatory elements (very difficult, still in its infancy)
Attempt to describe gene function

Figure 2.6 Alternative splicing: NADPH oxidase, H+ channel

Table 2.1 How annotation can be used to infer/understand biological niche

Example of annotation - What is a gene?

Functional Annotation: What does the protein do?
• Found genes
• Basic approach: by similarity to a known protein
• Old style: best BLAST hit
• Can lead to funny incorrect annotations
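The "best BLAST hit" style of annotation can be sketched in a few lines. The example assumes BLAST's default 12-column tabular output (`-outfmt 6`). The sequence IDs, the E-value cutoff, and the `best_hits` helper are hypothetical choices made for illustration.

```python
import csv
import io

# Default column order of BLAST tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict[str, dict]:
    """Keep, for each query, the hit with the highest bit score."""
    best: dict[str, dict] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        hit = dict(zip(COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["evalue"] > max_evalue:
            continue  # too weak to support any annotation
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Made-up hits for two predicted genes.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["gene1", "protA", "85.0", "200", "30", "0", "1", "200", "1", "200", "2e-50", "180"],
    ["gene1", "protB", "40.0", "150", "90", "2", "10", "160", "5", "155", "1e-8", "60"],
    ["gene2", "protC", "30.0", "80", "56", "1", "1", "80", "1", "80", "0.5", "25"],
])

if __name__ == "__main__":
    for query, hit in best_hits(SAMPLE).items():
        print(query, "->", hit["sseqid"], hit["evalue"])
```

Copying the top hit's description straight onto the new gene is exactly how incorrect annotations propagate: the best hit may share only one domain with the query, or may itself carry a wrong label.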

Funny examples

Critical residues / multiple sequence alignment (lysozyme)

Gene Family Expansion

Signaling Pathways

Phanerochaete chrysosporium
• Degrades lignin
• 30 million base pairs (30 Mb)
• This was the first basidiomycete sequenced, so gene finding was a big challenge
• Estimated 11,777 genes

Genome Facts!

http://genome.jgi-psf.org/Phchr1/Phchr1.info.html

Phylogenies of Genes

Genome Evolution and RIP
