## Reference based assembly

**Reference based assembly**

Macrogen Inc, 김세환

**Scripture VS Cufflinks**

Similarity: both programs build directed graphs and traverse them to identify distinct transcripts, using paired-end information to link sparsely covered transcripts and filter out unlikely isoforms.

Difference:

- Cufflinks uses a rigorous mathematical model to identify the complete set of alternatively regulated transcripts at each locus.
- Scripture employs a statistical segmentation model to distinguish expressed loci and filter out experimental noise.

**Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs (Scripture)**

Nature Biotechnology, May 2010

(Figure: overview of steps 1 to 5.)

**1. Map Reads to Genome**

- Reads are mapped with TopHat, since ~30% of 76-base reads are expected on average to span an exon-exon junction.
- 'Spliced' reads provide direct information through their splicing motifs (GT/AG, GC/AG, or AT/AC).

**2. Construct Connectivity Graph**

- Use only 'spliced' reads to construct the connectivity graph.
- Splicing motifs provide direct information (GT/AG, GC/AG, or AT/AC).
- Node = base; edge = connection between bases.

(Figure: example nucleotide sequence with bases connected as graph nodes.)

**3. Identify Significantly Enriched Paths**

- Use a statistical segmentation strategy: the segmentation approach identifies regions where mapped reads are enriched compared to the genomic background.

(Figure: example nucleotide sequence with an enriched path.)

**4. Construct Transcript Graphs**

- Each node in a transcript graph is an exon, and each edge is a splice junction.
- A path through the graph represents one isoform of the gene.

**5. Weighting of Isoforms**

Each isoform (for example, Isoform 1) is scored from the insert size distribution of the paired reads that support it:

$$
\text{Normalized weighted score of Isoform 1} = \frac{\sum_{\text{paired reads}} P(\text{insert size of paired read})}{\#\ \text{of paired reads}}
$$

Filter out isoforms with a normalized weighted score < 0.1.

**Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation (Cufflinks)**

Nature Biotechnology, May 2010

**Cufflinks**

Cufflinks seeks an assembly that parsimoniously explains the fragments from the RNA-Seq experiment. That is, every fragment in the experiment should have come from a Cufflinks transcript, and Cufflinks should produce as few transcripts as possible with that property.

**Transcript Assembly**

(Figure: Isoform 1 and Isoform 2.)

**Transcript Assembly**

Relationships between fragments: compatibility, incompatibility, nested, and uncertain (x4: both compatibility and incompatibility).

**Transcript Assembly**

Nested incompatible fragments.

**Transcript Assembly**

Nested incompatible chain.

**Transcript Assembly**

Bipartite graph and directed acyclic graph.

**Transcript Assembly**

Theorem (Dilworth's theorem). Let P be a finite partially ordered set. The maximum number of elements in any antichain of P equals the minimum number of chains in any partition of P into chains.

Theorem (Kőnig's theorem). In a bipartite graph, the number of edges in a maximum matching equals the number of vertices in a minimum vertex cover.

Theorem. Dilworth's theorem is equivalent to Kőnig's theorem.

See also: Hasse diagram and reachability graph.

**Transcript Assembly**

Finally, finding the minimum number of chains in a directed acyclic graph reduces to finding a maximum matching in a bipartite graph. This can be solved with the LEMON and Boost graph libraries.

**Conditions for filtering transcript x**

- x aligns to the genome entirely within an intronic region of the alignment for a transcript y, and the abundance of x is less than 15% of y's abundance.
- x is supported by only a single fragment alignment to the genome.
- More than 75% of the fragment alignments supporting x are mappable to multiple genomic loci.
- x is an isoform of an alternatively spliced gene and has an estimated abundance less than 5% of that of the gene's major isoform.

**Keyword for Fresher**

1. Reference-based assembly == mapping-first approach

**Keyword for Intermediate**

1. Graph theory (reading recommendation: Introduction to Graph Theory)

**Keyword for Expert**

1. Scan statistics

**Transcript Assembly**

Bipartite graph and directed acyclic graph.
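
The reduction named in the Transcript Assembly slides (minimum number of chains in a directed acyclic graph via maximum bipartite matching) can be sketched in a few lines. This is a minimal pure-Python illustration, not the Cufflinks implementation and not the LEMON or Boost API: the function names, the integer node labels, and the toy graph are assumptions made for the example. An edge `u -> v` is assumed to mean that fragment `u` is compatible with and precedes fragment `v`.

```python
def transitive_closure(n, edges):
    """reach[u] = set of nodes reachable from u in a DAG with nodes 0..n-1."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
    reach = [set() for _ in range(n)]
    for s in range(n):
        stack = list(adj[s])
        while stack:
            v = stack.pop()
            if v not in reach[s]:
                reach[s].add(v)
                stack.extend(adj[v])
    return reach


def min_chain_cover(n, edges):
    """Partition the nodes of a DAG into the fewest chains.

    Each node u gets a 'left' copy and a 'right' copy; left u is joined to
    right v whenever v is reachable from u. A maximum matching in this
    bipartite graph has size n - (minimum number of chains).
    """
    reach = transitive_closure(n, edges)
    match_right = [-1] * n  # match_right[v] = left node matched to right copy of v

    def try_augment(u, seen):
        # Simple augmenting-path search (Kuhn's algorithm).
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    for u in range(n):
        try_augment(u, set())

    # A matched pair (u, v) means v directly follows u in a chain.
    successor = {u: v for v, u in enumerate(match_right) if u != -1}
    chain_starts = [v for v in range(n) if match_right[v] == -1]
    chains = []
    for start in chain_starts:
        chain = [start]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        chains.append(chain)
    return chains


# Toy example: a diamond 0 -> {1, 2} -> 3. Nodes 1 and 2 form an antichain
# of size 2, so by Dilworth's theorem at least 2 chains are needed.
diamond_chains = min_chain_cover(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

In the assembly setting each chain would correspond to one candidate transcript, which is why a smaller chain cover gives a more parsimonious assembly. A production implementation would use a faster matching algorithm from a graph library.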
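
The four conditions for filtering transcript x can also be written as a single predicate. The sketch below assumes that any one condition is enough to filter a transcript, which the slide does not state explicitly; the `Transcript` record and all of its field names are hypothetical and do not come from Cufflinks itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    # All fields are hypothetical names chosen for this illustration.
    abundance: float
    n_fragment_alignments: int
    multi_mapped_fraction: float  # share of supporting alignments that map to multiple loci
    intronic_host_abundance: Optional[float] = None  # abundance of y if x lies inside an intron of y
    major_isoform_abundance: Optional[float] = None  # abundance of the gene's major isoform


def should_filter(x: Transcript) -> bool:
    """Return True if x meets any of the four filtering conditions."""
    # 1. x lies within an intron of y and has less than 15% of y's abundance.
    if (x.intronic_host_abundance is not None
            and x.abundance < 0.15 * x.intronic_host_abundance):
        return True
    # 2. x is supported by only a single fragment alignment.
    if x.n_fragment_alignments <= 1:
        return True
    # 3. More than 75% of supporting alignments map to multiple loci.
    if x.multi_mapped_fraction > 0.75:
        return True
    # 4. x is a minor isoform with less than 5% of the major isoform's abundance.
    if (x.major_isoform_abundance is not None
            and x.abundance < 0.05 * x.major_isoform_abundance):
        return True
    return False


well_supported = Transcript(abundance=50.0, n_fragment_alignments=120,
                            multi_mapped_fraction=0.10,
                            major_isoform_abundance=80.0)
intronic_noise = Transcript(abundance=1.0, n_fragment_alignments=12,
                            multi_mapped_fraction=0.10,
                            intronic_host_abundance=40.0)
```

Here `should_filter(well_supported)` is false, while `should_filter(intronic_noise)` is true because 1.0 is below 15% of 40.0.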
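
For the Scripture connectivity graph (step 2: node = base, edge = connection between bases, built from spliced reads only), a toy construction is shown below. It is a sketch under stated assumptions, not Scripture's code: each spliced read is assumed to be given as a list of aligned blocks with inclusive genomic coordinates, and `build_connectivity_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_connectivity_graph(spliced_reads):
    """Build a base-level graph from spliced reads.

    spliced_reads: list of reads, each a list of (start, end) aligned blocks
    with inclusive coordinates, ordered along the genome.
    Returns a dict mapping each base position to the set of positions it
    connects to.
    """
    graph = defaultdict(set)
    for blocks in spliced_reads:
        # Consecutive bases inside an aligned block are connected.
        for start, end in blocks:
            for pos in range(start, end):
                graph[pos].add(pos + 1)
        # The last base of a block connects across the intron to the first
        # base of the next block (the splice junction).
        for (_, left_end), (right_start, _) in zip(blocks, blocks[1:]):
            graph[left_end].add(right_start)
    return graph


# One read covering bases 100-102, spliced to bases 200-201.
toy_graph = build_connectivity_graph([[(100, 102), (200, 201)]])
```

In this toy graph, base 102 connects directly to base 200, which is the edge that records the splice junction.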