Reference based assembly

Reference based assembly. Macrogen Inc 김세환. Reconstructing transcripts from RNA-Seq. Scripture VS Cufflinks. SIMILARITY.


Presentation Transcript


Reference based assembly

Reference based assembly

Macrogen Inc

김세환


Reference based assembly

Reconstructing transcripts from RNA-Seq


Reference based assembly

Scripture VS Cufflinks

SIMILARITY

Both programs build directed graphs and traverse them to identify distinct transcripts, using paired-end information to link sparsely covered transcripts and filter out unlikely isoforms.

DIFFERENCE

- Cufflinks uses a rigorous mathematical model to identify the complete set of alternatively regulated transcripts at each locus

- Scripture employs a statistical segmentation model to distinguish expressed loci and filter out experimental noise


Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs (Scripture)

Nature Biotechnology, May 2010


Reference based assembly

[Figure: overview of the workflow, steps 1 to 5]


Reference based assembly

1. Map Reads to Genome

  • Use TopHat, a spliced aligner,

  • since ~30% of 76-base reads are expected on average to span an exon-exon junction

  • ‘Spliced’ reads provide direct information on splice junctions through the flanking donor/acceptor motifs (GT/AG, GC/AG or AT/AC)
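The motif check mentioned above can be sketched as follows. This is a minimal illustration, not TopHat's or Scripture's actual code: the function name `junction_strand` and the 0-based, half-open intron coordinates are assumptions made for the example. It looks at the first and last two bases of a candidate intron and reports which strand, if any, shows one of the three motifs from the slide.

```python
# Donor/acceptor dinucleotide pairs listed on the slide.
CANONICAL = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_strand(genome, intron_start, intron_end):
    """Return '+', '-' or None for the intron genome[intron_start:intron_end].

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    left = genome[intron_start:intron_start + 2].upper()
    right = genome[intron_end - 2:intron_end].upper()
    if (left, right) in CANONICAL:
        return "+"
    # On the minus strand the motif appears reverse-complemented,
    # e.g. GT...AG is seen as CT...AC on the plus strand.
    if (revcomp(right), revcomp(left)) in CANONICAL:
        return "-"
    return None


plus_example = "AAAC" + "GT" + "CCCC" + "AG" + "TTT"
minus_example = "AAA" + "CT" + "GGGG" + "AC" + "TT"
print(junction_strand(plus_example, 4, 12))   # intron GTCCCCAG
print(junction_strand(minus_example, 3, 11))  # intron CTGGGGAC
```

A junction with no recognizable motif returns `None`, which a pipeline could treat as unoriented or discard.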


Reference based assembly

2. Construct Connectivity Graph

  • Use only ‘spliced’ reads for construction of connectivity graph

  • Splicing motifs provide direct information (GT/AG or GC/AG,AT/AC)

  • Node = base, edge = connection between two bases (consecutive in the genome or joined by a spliced read)
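A connectivity graph of this kind can be sketched with a plain adjacency map. The input layout is an assumption for illustration: `blocks` are the aligned segments of spliced reads as half-open `(start, end)` intervals, and `junctions` are `(last exonic base, first exonic base after the intron)` pairs. Neither name comes from Scripture itself.

```python
from collections import defaultdict


def build_connectivity_graph(blocks, junctions):
    """Nodes are base positions; edges join consecutive or spliced bases."""
    graph = defaultdict(set)
    for start, end in blocks:
        # Consecutive bases inside an aligned block are connected.
        for pos in range(start, end - 1):
            graph[pos].add(pos + 1)
    for donor, acceptor in junctions:
        # A spliced read connects the last base before the intron
        # to the first base after it.
        graph[donor].add(acceptor)
    return graph


def reachable(graph, source, target):
    """Depth-first search: is target reachable from source?"""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


g = build_connectivity_graph(blocks=[(0, 3), (10, 13)], junctions=[(2, 10)])
print(reachable(g, 0, 12))  # bases 0..2 and 10..12 are linked by the junction
```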

[Figure: connectivity graph drawn over an example nucleotide sequence]


Reference based assembly

3. Identify Significantly Enriched Paths

  • Use a statistical segmentation strategy:

  • the segmentation approach identifies regions enriched in mapped reads compared to the genomic background
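The idea of enrichment over background can be illustrated with a sliding window and a Poisson tail probability. This is a deliberately simplified stand-in: it is not Scripture's scan statistic, it applies no correction for testing many overlapping windows, and the names `enriched_windows`, `window` and `alpha` are assumptions of the sketch.

```python
import math


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu)."""
    cdf = sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cdf)


def enriched_windows(read_counts, window, alpha=0.05):
    """Return half-open windows whose read count exceeds the background.

    read_counts[i] is the number of reads assigned to base i. The
    background rate is the genome-wide mean count per base.
    """
    n = len(read_counts)
    expected = sum(read_counts) / n * window
    hits = []
    for start in range(n - window + 1):
        observed = sum(read_counts[start:start + window])
        if poisson_sf(observed, expected) < alpha:
            hits.append((start, start + window))
    return hits


counts = [0] * 50 + [5] * 10 + [0] * 50
hits = enriched_windows(counts, window=5)
print(hits[0], hits[-1])
```

Overlapping significant windows would then be merged into candidate expressed segments.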

[Figure: significantly enriched paths on the example nucleotide sequence]


Reference based assembly

4. Construct Transcript Graphs

  • Each node in a transcript graph is an exon and each edge is a splice junction

  • A path through the graph represents one isoform of the gene
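A transcript graph and its paths can be sketched in a few lines. The exon labels and the function name `enumerate_isoforms` are illustrative assumptions. Each source-to-sink path in the exon graph is reported as one candidate isoform.

```python
def enumerate_isoforms(exons, junctions):
    """List every source-to-sink path in the exon graph.

    exons: node labels; junctions: (upstream exon, downstream exon) edges.
    The graph is assumed to be acyclic.
    """
    successors = {exon: [] for exon in exons}
    has_predecessor = set()
    for upstream, downstream in junctions:
        successors[upstream].append(downstream)
        has_predecessor.add(downstream)

    paths = []

    def walk(node, path):
        path = path + [node]
        if not successors[node]:  # no outgoing junction: path is complete
            paths.append(path)
            return
        for nxt in successors[node]:
            walk(nxt, path)

    for exon in exons:
        if exon not in has_predecessor:  # start only from first exons
            walk(exon, [])
    return paths


# Exon E2 is optional: one isoform includes it, one skips it.
isoforms = enumerate_isoforms(
    ["E1", "E2", "E3"],
    [("E1", "E2"), ("E2", "E3"), ("E1", "E3")],
)
print(isoforms)
```

The number of paths can grow quickly with the number of alternative junctions, which is one reason the next step weights and filters isoforms.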


Reference based assembly

5. Weighting of Isoforms

Isoform 1

Insert size distribution

\[
\text{Normalized weighted score of Isoform 1} = \frac{\sum \text{probability of insert size of paired read}}{\text{\# of paired reads}}
\]

Filter out: normalized weighted score < 0.1
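The score above is an average of insert-size probabilities, which can be sketched as follows. The lookup table `size_probability` is made-up illustrative data standing in for an empirical insert size distribution, and the function names are assumptions of the sketch.

```python
def normalized_weighted_score(insert_sizes, probability_of):
    """Mean insert-size probability over the paired reads of one isoform."""
    if not insert_sizes:
        return 0.0
    return sum(probability_of(size) for size in insert_sizes) / len(insert_sizes)


def keep_isoform(insert_sizes, probability_of, threshold=0.1):
    """Apply the slide's filter: drop isoforms scoring below 0.1."""
    return normalized_weighted_score(insert_sizes, probability_of) >= threshold


# Hypothetical distribution: sizes near 200 are likely, 1000 is not.
size_probability = {200: 0.5, 250: 0.3, 1000: 0.01}


def lookup(size):
    return size_probability.get(size, 0.0)


print(normalized_weighted_score([200, 250], lookup))
print(keep_isoform([1000, 1000], lookup))
```

An isoform whose implied insert sizes are implausible under the library's distribution scores low and is filtered out.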


Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation (Cufflinks)

Nature Biotechnology, May 2010


Reference based assembly

Cufflinks

Cufflinks seeks an assembly that parsimoniously explains the fragments from the RNA-Seq experiment:

=> Every fragment in the experiment should have come from a Cufflinks transcript, and Cufflinks should produce as few transcripts as possible with that property.


Reference based assembly

Transcript Assembly

Isoform 1

Isoform 2




Reference based assembly

Transcript Assembly

Compatibility

Incompatibility

Nested

Uncertain : x4

- compatibility & incompatibility -
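One simplified way to express compatibility is sketched below, under stated assumptions: a fragment is a `(start, end, introns)` tuple with half-open coordinates, and two fragments are compatible when they imply the same introns wherever they overlap. The sketch ignores the unsequenced inner part of paired-end fragments (the "uncertain" case) and is not the Cufflinks implementation.

```python
def introns_within(introns, lo, hi):
    """Introns that intersect the half-open region [lo, hi)."""
    return {(s, e) for s, e in introns if s < hi and e > lo}


def compatible(x, y):
    """True if fragments x and y could come from the same transcript."""
    (x_start, x_end, x_introns) = x
    (y_start, y_end, y_introns) = y
    lo, hi = max(x_start, y_start), min(x_end, y_end)
    if lo >= hi:
        # No overlap: nothing in the alignments contradicts a shared origin.
        return True
    return introns_within(x_introns, lo, hi) == introns_within(y_introns, lo, hi)


spliced = (0, 100, [(40, 80)])
unspliced = (20, 50, [])          # covers bases 40..49 inside the intron
same_splice = (30, 120, [(40, 80)])

print(compatible(spliced, unspliced))    # they disagree about the intron
print(compatible(spliced, same_splice))  # they imply the same intron
```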


Reference based assembly

Transcript Assembly

Nested

incompatible


Reference based assembly

Transcript Assembly

Nested

incompatible

chain


Reference based assembly

Transcript Assembly

Bipartite graph

Directed Acyclic Graph


Reference based assembly

Transcript Assembly

Theorem (Dilworth's theorem) Let P be a finite partially ordered set. The maximum number of elements in any antichain of P equals the minimum number of chains in any partition of P into chains.

Theorem (Kőnig's theorem) In a bipartite graph, the number of edges in a maximum matching equals the number of vertices in a minimum vertex cover.

Theorem Dilworth's theorem is equivalent to Kőnig's theorem.

Hasse diagram & reachability graph


Reference based assembly

Transcript Assembly

Finally,

finding the minimum number of chains in a directed acyclic graph

is reduced to

finding a maximum matching in a bipartite graph.

This can be solved with the LEMON and Boost graph libraries.
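The reduction can be sketched in pure Python instead of LEMON or Boost. This is a small illustration, not the Cufflinks code: it assumes nodes are numbered `0..n-1`, takes the transitive closure of the DAG to get the reachability graph, and finds a maximum bipartite matching with simple augmenting paths. The minimum number of chains is then `n` minus the matching size.

```python
def transitive_closure(n, edges):
    """reach[u] = set of nodes reachable from u in a DAG on nodes 0..n-1."""
    reach = [set() for _ in range(n)]
    for u, v in edges:
        reach[u].add(v)
    for k in range(n):            # Warshall's algorithm on adjacency sets
        for i in range(n):
            if k in reach[i]:
                reach[i] |= reach[k]
    return reach


def min_chain_cover(n, edges):
    """Minimum number of chains covering all nodes (Dilworth via matching)."""
    reach = transitive_closure(n, edges)
    matched_to = [-1] * n         # right-side node -> matched left-side node

    def try_augment(u, seen):
        for v in reach[u]:
            if v in seen:
                continue
            seen.add(v)
            if matched_to[v] == -1 or try_augment(matched_to[v], seen):
                matched_to[v] = u
                return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    return n - matching


# Fragments 0 and 1 both precede 2, which precedes 3 and 4.
# Two chains suffice, e.g. 0-2-3 and 1-4 (1 reaches 4 through 2).
print(min_chain_cover(5, [(0, 2), (1, 2), (2, 3), (2, 4)]))
```

Recursive augmentation is fine for small examples; for large fragment sets an algorithm such as Hopcroft-Karp would be the usual choice.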


Reference based assembly

Conditions for filtering transcript x

  • x aligns to the genome entirely within an intronic region of the alignment for a transcript y, and the abundance of x is less than 15% of y's abundance.

  • x is supported by only a single fragment alignment to the genome.

  • More than 75% of the fragment alignments supporting x are mappable to multiple genomic loci.

  • x is an isoform of an alternatively spliced gene, and has an estimated abundance less than 5% of the major isoform of the gene.
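The four conditions can be written as one predicate. The record layout below (`Candidate` and its field names) is hypothetical and chosen for illustration. Only the thresholds (15%, a single fragment, 75%, 5%) come from the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    abundance: float
    supporting_fragments: int
    multi_mapped_fraction: float            # share of multi-locus alignments
    # Abundance of a transcript y whose intron fully contains x, if any.
    host_abundance: Optional[float] = None
    # Abundance of the gene's major isoform, if x is an alternative isoform.
    major_isoform_abundance: Optional[float] = None


def should_filter(x):
    """True if transcript x meets any of the four filtering conditions."""
    if x.host_abundance is not None and x.abundance < 0.15 * x.host_abundance:
        return True   # intronic and below 15% of the host transcript
    if x.supporting_fragments == 1:
        return True   # supported by a single fragment alignment
    if x.multi_mapped_fraction > 0.75:
        return True   # mostly multi-mapping support
    if (x.major_isoform_abundance is not None
            and x.abundance < 0.05 * x.major_isoform_abundance):
        return True   # minor isoform below 5% of the major isoform
    return False


print(should_filter(Candidate(10.0, 40, 0.1, major_isoform_abundance=50.0)))
print(should_filter(Candidate(1.0, 40, 0.1, major_isoform_abundance=50.0)))
```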


Reference based assembly

Keyword for Beginner

1. Reference-based assembly

== mapping-first approach


Reference based assembly

Keyword for Intermediate

  • 1. Graph theory

  • - recommended reading: Introduction to Graph Theory


Reference based assembly

Keyword for Expert

1. Scan statistics

