Annotation and Alignment of the Drosophila Genomes

One (possibly wrong) alignment is not enough: the history of parametric inference

  • 1992: Waterman, M., Eggert, M. & Lander, E.

    • Parametric sequence comparisons, Proc. Natl. Acad. Sci. USA 89, 6090-6093.

  • 1994: Gusfield, D., Balasubramanian, K. & Naor, D.

    • Parametric optimization of sequence alignment, Algorithmica 12, 312-326.

  • 2003: Wang, L. & Zhao, J.

    • Parametric alignment of ordered trees, Bioinformatics 19, 2237-2245.

  • 2004: Fernández-Baca, D., Seppäläinen, T. & Slutzki, G.

    • Parametric Multiple Sequence Alignment and Phylogeny Construction, Journal of Discrete Algorithms 2, 271-287.

XPARAL

by Kristian Stevens and Dan Gusfield


Whole Genome Parametric Alignment

Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods

  • Mathematics and Computer Science

    • Parametric alignment in higher dimensions.

    • Faster new algorithms.

    • Deeper understanding of alignment polytopes.

  • Biology

    • Whole genome parametric alignment.

    • Biological implications of alignment parameters.

    • Alignment with biology rather than for biology.


CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG

CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT

CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------

CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-

CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------

CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT

analysis



computational geometry



A Whole Genome Parametric Alignment of D. melanogaster and D. pseudoobscura

  • Divided the genomes into 1,116,792 constrained and 877,982 unconstrained segment pairs.

    • This is an orthology map of the two genomes.

  • 2d, 3d, 4d, and 5d alignment polytopes were constructed for each of the 877,982 unconstrained segment pairs.

    • For each segment pair, obtain all possible optimal summaries for all parameters in a Needleman-Wunsch scoring scheme.

  • Computed the Minkowski sum of the 877,982 2d polytopes.

    • There are only 838 optimal alignments of the two Drosophila genomes if the same match, mismatch and gap parameters are used for all the segment pair alignments.
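The Minkowski sum step can be illustrated on a toy scale. The following is a minimal sketch, assuming 2d polytopes are given as lists of integer (x, y) vertices; the function names `convex_hull` and `minkowski_sum` are illustrative and are not taken from the authors' software. It forms all pairwise vertex sums and keeps only the hull vertices, which is adequate for small inputs but is not how one would sum hundreds of thousands of polygons.

```python
from itertools import product


def convex_hull(points):
    """Return the hull vertices of 2D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        # Pop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def minkowski_sum(p, q):
    """Minkowski sum of two convex polygons via pairwise vertex sums."""
    sums = [(a[0] + b[0], a[1] + b[1]) for a, b in product(p, q)]
    return convex_hull(sums)


triangle = [(0, 0), (1, 0), (0, 1)]
segment = [(0, 0), (1, 1)]
print(minkowski_sum(triangle, segment))
```

For the triangle and segment above, the sum is a pentagon: the point (1, 1) falls in the interior and is discarded.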


How do we build the polytope for

>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGA
AAAGCGAGTTTATTCTATGGAC

>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGA
GGAGAGGCCATCATCGTGTAC

?


Alignment polytopes are small

Theorem: The number of vertices of an alignment polytope for two sequences of length n and m is $O((n+m)^{d(d-1)/(d+1)})$, where d is the number of free parameters in the scoring scheme.

Examples:

Parameters            Model                                   Vertices
M, X, S               Jukes-Cantor with linear gap penalty    $O((n+m)^{2/3})$
M, X, S, G            Jukes-Cantor with affine gap penalty    $O((n+m)^{3/2})$
M, X_TS, X_TV, S, G   K2P with affine gap penalty             $O((n+m)^{12/5})$

L. Pachter and B. Sturmfels, Parametric inference for biological sequence analysis, Proceedings of the National Academy of Sciences, Volume 101, Number 46 (2004), p. 16138-16143.

L. Pachter and B. Sturmfels, Tropical geometry of statistical models, Proceedings of the National Academy of Sciences, Volume 101, Number 46 (2004), p. 16132-16137.

L. Pachter and B. Sturmfels (eds.), Algebraic Statistics for Computational Biology, Cambridge University Press.
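The exponents in the examples can be checked directly against the theorem. A small sketch follows; the function name `vertex_exponent` is illustrative. Note that the three table rows match d = 2, 3 and 4, that is, one less than the number of parameters listed in each row.

```python
from fractions import Fraction


def vertex_exponent(d):
    """Exponent d(d-1)/(d+1) in the vertex bound O((n+m)^(d(d-1)/(d+1)))."""
    return Fraction(d * (d - 1), d + 1)


for d in (2, 3, 4):
    print(d, vertex_exponent(d))
```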


The algebraic statistical model for sequence alignment, known as the pair hidden Markov model, is the image of the map

[equation not preserved in the transcript]

The logarithms of the parameters θ give the edge lengths for the shortest path problem on the alignment graph.
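For one fixed choice of parameters, the shortest path on the alignment graph is the familiar Needleman-Wunsch dynamic program. The sketch below assumes a linear gap penalty and uses hypothetical edge weights `match`, `mismatch` and `space` in place of the (negated) log parameters; it returns only the cost of the best path, not the alignment itself.

```python
def min_alignment_cost(a, b, match=0.0, mismatch=1.0, space=1.0):
    """Cost of the cheapest path through the alignment graph of a and b."""
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    # First row and column: only space edges are available.
    for i in range(1, n + 1):
        cost[i][0] = i * space
    for j in range(1, m + 1):
        cost[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch
            )
            cost[i][j] = min(
                diagonal,
                cost[i - 1][j] + space,  # space in b
                cost[i][j - 1] + space,  # space in a
            )
    return cost[n][m]


print(min_alignment_cost("AC", "A"))
```

Changing the three weights can change which path is cheapest, which is the dependence that parametric alignment describes for all parameter values at once.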


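Newton Polytope of a Polynomial

Definition: The Newton polytope of a polynomial

$$f(\theta_1, \ldots, \theta_d) = \sum_{a \in \mathbb{N}^d} c_a \, \theta_1^{a_1} \cdots \theta_d^{a_d}$$

is defined to be the convex hull of the lattice points in $\mathbb{R}^d$ corresponding to monomials in f:

$$\mathrm{NP}(f) = \mathrm{conv}\{\, a \in \mathbb{N}^d : c_a \neq 0 \,\}$$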
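As a small worked example, take a hypothetical polynomial in two variables (chosen here for illustration, not taken from the slides). Each monomial contributes its exponent vector, and the Newton polytope is the convex hull of those vectors:

```latex
\begin{align*}
f(x, y) &= 5 + x^2 + 3xy + y^2 \\
\text{exponent vectors} &: (0,0),\ (2,0),\ (1,1),\ (0,2) \\
\mathrm{NP}(f) &= \mathrm{conv}\{(0,0),\ (2,0),\ (1,1),\ (0,2)\}
\end{align*}
```

The result is the triangle with vertices (0,0), (2,0) and (0,2). The point (1,1) lies on the edge between (2,0) and (0,2), so it is not a vertex, and the coefficients 5, 1, 3, 1 play no role beyond being nonzero.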


Newton polytope for positions [1,i] and [1,j] in each sequence

Polytope propagation:

$$NP_{i,j} = S * NP_{i-1,j} + S * NP_{i,j-1} + (X \text{ or } M) * NP_{i-1,j-1}$$

Here * is the Minkowski sum and + is the convex hull of the union.

[Figure: alignment graph; the sequence labels read A C A T T A G A A A G A T T A C C A C A]
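The recursion can be sketched in a few lines for the 2d case. This is an illustrative version under stated assumptions, not the authors' implementation: each alignment is summarized by the point (x, s) = (number of mismatches, number of spaces), a match contributes (0, 0), and the names `convex_hull`, `translate` and `alignment_polytope` are made up for this example. Multiplying by a single parameter is a Minkowski sum with a point, that is, a translation.

```python
def convex_hull(points):
    """Return the hull vertices of 2D points (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def translate(poly, step):
    """Minkowski sum of a polytope with a single point."""
    return [(p[0] + step[0], p[1] + step[1]) for p in poly]


def alignment_polytope(a, b):
    """Vertices (mismatches, spaces) of the 2d alignment polytope of a and b."""
    n, m = len(a), len(b)
    np_table = [[None] * (m + 1) for _ in range(n + 1)]
    np_table[0][0] = [(0, 0)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            candidates = []
            if i > 0:  # space edge: S * NP[i-1][j]
                candidates += translate(np_table[i - 1][j], (0, 1))
            if j > 0:  # space edge: S * NP[i][j-1]
                candidates += translate(np_table[i][j - 1], (0, 1))
            if i > 0 and j > 0:  # (X or M) * NP[i-1][j-1]
                step = (0, 0) if a[i - 1] == b[j - 1] else (1, 0)
                candidates += translate(np_table[i - 1][j - 1], step)
            # "+" in the recursion: convex hull of the union.
            np_table[i][j] = convex_hull(candidates)
    return np_table[n][m]


print(alignment_polytope("AC", "A"))
```

For the sequences AC and A there are three alignment summaries, (0, 1), (1, 1) and (0, 3), and all three are vertices of the resulting triangle.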


Back to Adf1

BP England, U Heberlein, R Tjian. Purified Drosophila transcription factor, Adh distal factor-1 (Adf-1), binds to sites in several Drosophila promoters and activates transcription, J Biol Chem 1990.


mel TGTGCGTCAGCGTCGGCCGCAACAGCG
pse TGT-----------------GACTGCG
    ***                  ** ***

BLASTZ alignment


mel TGTG----CGTCAGC--G----TCGGCC---GC-AACAG-CG
pse TGTGACTGCG-CTGCCTGGTCCTCGGCCACAGCCAAC-GTCG
    ****    ** * **  *    ******   ** *** * **


mel TGTGCGTCAGC------GTCGGCCGCAACAGCG
pse TGTGACTGCGCTGCCTGGTCCTCGGCCACAGC-
    ****  *  **      ***  * ** *****


80.4%

85.1%

86.5%

79.1%


Applications

  • Conservation of cis-regulatory elements

  • Phylogenetics: branch length estimation

Jukes-Cantor correction:

$$d = -\frac{3}{4} \ln\left(1 - \frac{4}{3} p\right)$$

where p is the fraction of aligned positions that are mismatches. This is the expected number of mutations per site in an alignment with summary (x,s).
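The standard Jukes-Cantor correction is d = -(3/4) ln(1 - (4/3) p), where p is the observed fraction of mismatched aligned positions. The sketch below computes it from an alignment summary, assuming (this is an assumption, since the slide's own formula did not survive) that x counts mismatches, s counts individual space characters, and the sequence lengths n and m are known, so that the number of aligned columns is (n + m - s) / 2. The function name is illustrative.

```python
import math


def jukes_cantor_distance(x, s, n, m):
    """Jukes-Cantor corrected distance from a summary (x, s).

    Assumes x mismatches, s space characters, sequence lengths n and m.
    """
    aligned = (n + m - s) / 2  # columns where two bases are paired
    if aligned <= 0:
        raise ValueError("no aligned positions")
    p = x / aligned  # observed mismatch fraction
    if p >= 0.75:
        raise ValueError("mismatch fraction too high for the correction")
    return -0.75 * math.log(1 - 4 * p / 3)


print(jukes_cantor_distance(x=1, s=0, n=10, m=10))
```

Because different vertices of the alignment polytope give different summaries (x, s), each one yields its own branch length estimate.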

