
Continuous Representations of Time Gene Expression Data

Ziv Bar-Joseph, Georg Gerber, David K. Gifford. MIT Laboratory for Computer Science. J. Comput. Biol., 10, 341-356, 2003.



  1. Continuous Representations of Time Gene Expression Data • Ziv Bar-Joseph, Georg Gerber, David K. Gifford • MIT Laboratory for Computer Science • J. Comput. Biol., 10, 341-356, 2003

  2. Outline • Splines • Estimating Unobserved Expression Values and Time Points • Model Based Clustering Algorithm for Temporal Data • Aligning Temporal Data • Results

  3. Splines • The word “spline” comes from the shipbuilding industry

  4. Splines • Splines are piecewise polynomials with boundary continuity and smoothness constraints. • The typical way to represent a piecewise cubic curve:

  5. Splines • We have cubic polynomial: • equations are required: • Interpolating splines

  6. Splines • B-spline • Expressed in terms of a set of normalized basis functions • Application: fitting curves to gene expression time-series data • The B-spline basis makes it convenient to obtain approximating or smoothing splines • Fewer basis coefficients than there are observed data points • Avoids overfitting

  7. Splines • The basis coefficients: • Interpreted geometrically as control points • These are the vertices of a polygon that controls the shape of the spline but are not interpolated by the curve • The curve lies entirely within the convex hull of this controlling polygon. • Each vertex exerts only a local influence on the curve.

  8. Splines

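  9. Splines • [Figure: basis functions b_{i,1}, b_{i,2}, b_{i,3} plotted against t (vertical axis y, peak value 1) over the knots x_i, x_{i+1}, x_{i+2}, x_{i+3}] • Within any knot interval, S(t) is a polynomial of degree k-1 • S(t) has continuous derivatives of orders 1, 2, …, k-2 • For a given value of k: within the valid range of t, b_{i,k} ≥ 0 and each b_{i,k} has a single maximum; except for k = 1 and k = 2, every b_{i,k} is a continuous, smooth curve.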

  10. Splines • A uniform knot vector is one in which the entries are evenly spaced • i.e. • The basis functions will be translates of each other, i.e. • For a periodic cubic B-spline (k=4), the equation specifying the curve:

  11. B-splines • The B-spline will only be defined in the shaded region 3 ≤ t ≤ 4
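The basis-function properties on the preceding slides can be checked numerically. The following is a minimal sketch, assuming the standard Cox-de Boor recursion and a uniform knot vector 0, 1, ..., 7 with four cubic basis functions (k = 4); the function name `bspline_basis`, the helper `basis_sum`, and the specific knot values are illustrative choices, not taken from the slides.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of order k at t."""
    if k == 1:
        # Order 1: indicator function of the half-open knot interval.
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    left = 0.0
    right = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, k - 1, t, knots)
    if right_den != 0:
        right = (knots[i + k] - t) / right_den * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


# Assumed setup: uniform knots 0..7 and four cubic (order 4) basis functions.
knots = [float(j) for j in range(8)]
ORDER = 4
N_BASIS = 4


def basis_sum(t):
    """Sum of all four basis functions at t."""
    return sum(bspline_basis(i, ORDER, t, knots) for i in range(N_BASIS))


# Inside 3 <= t <= 4 all four basis functions overlap and sum to 1;
# outside that region at least one is missing and the sum drops below 1.
inside = basis_sum(3.5)
outside = basis_sum(2.5)
```

With this setup each basis function is a shifted copy of its neighbor and is non-zero on only four knot intervals, which is the local-influence property mentioned earlier.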

  12. Estimating Unobserved Expression Values and Time Points • To obtain a continuous-time formulation, use cubic B-splines • Obtain the value of the spline at a set of control points in the time series. • Re-sample the curve to estimate expression values at any time point. • Spline functions are not fit to each gene individually • Noise and missing values would lead to over-fitting • Instead, constrain the spline coefficients of co-expressed genes to have the same covariance matrix • Use other genes in the same class to estimate the missing values of a specific gene.
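The re-sampling step alone can be sketched as follows. This is a simplified, single-curve illustration: it fits one cubic B-spline by ordinary least squares and reads off a value at a missing time point. It does not implement the class-level covariance constraint described on the slide. The knot placement, the noise-free test data, and all function names are assumptions made for the example.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion: i-th B-spline basis function of order k at t."""
    if k == 1:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + k - 1] - knots[i]
    right_den = knots[i + k] - knots[i + 1]
    left = 0.0 if left_den == 0 else (
        (t - knots[i]) / left_den * bspline_basis(i, k - 1, t, knots))
    right = 0.0 if right_den == 0 else (
        (knots[i + k] - t) / right_den * bspline_basis(i + 1, k - 1, t, knots))
    return left + right


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_spline(times, values, knots, order=4):
    """Least-squares spline coefficients via the normal equations."""
    n_basis = len(knots) - order
    S = [[bspline_basis(i, order, t, knots) for i in range(n_basis)] for t in times]
    StS = [[sum(row[p] * row[q] for row in S) for q in range(n_basis)]
           for p in range(n_basis)]
    Sty = [sum(row[p] * y for row, y in zip(S, values)) for p in range(n_basis)]
    return solve(StS, Sty)


def evaluate(coeffs, t, knots, order=4):
    """Value of the fitted spline at an arbitrary time t."""
    return sum(c * bspline_basis(i, order, t, knots) for i, c in enumerate(coeffs))


def true_curve(t):
    """Noise-free cubic used as stand-in data so the result can be checked."""
    return 0.05 * t ** 3 - 0.6 * t ** 2 + 1.5 * t + 1.0


# Nine observations on [0, 9]; the sample at t = 4 is treated as missing.
times = [0.0, 1.0, 2.0, 3.0, 5.0, 6.0, 7.0, 8.0, 9.0]
values = [true_curve(t) for t in times]
# Uniform knots with spacing 3 give six basis functions valid on [0, 9].
knots = [-9.0, -6.0, -3.0, 0.0, 3.0, 6.0, 9.0, 12.0, 15.0, 18.0]

coeffs = fit_spline(times, values, knots)
estimate_at_4 = evaluate(coeffs, 4.0, knots)
```

Six coefficients are fit to nine observations, so there are fewer coefficients than data points. Because the stand-in data are an exact cubic, the estimate at t = 4 should match the true curve closely; with noisy real data the fit would smooth rather than interpolate.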

  13. Estimating Unobserved Expression Values and Time Points • A probabilistic model of time-series expression data • Assume a set of genes are grouped together • Using prior biological knowledge • or a clustering algorithm
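The model equation itself did not survive in this transcript. As a hedged reconstruction, based on the paper cited on the title slide and not on the slide text, the model for gene i in class j can be written roughly as below. All symbols here are assumptions: Y_i is the vector of observed values for gene i, S_i the spline basis evaluated at that gene's observed time points, μ_j the class mean coefficients, γ_i the gene-specific variation, and ε_i the noise.

```latex
Y_i = S_i\,(\mu_j + \gamma_i) + \epsilon_i,
\qquad \gamma_i \sim \mathcal{N}(0, \Gamma_j),
\qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2 I)
```

Under this reading, genes in the same class share the covariance matrix Γ_j, which is how information from co-expressed genes helps estimate missing values for any single gene.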

  14. Estimating Unobserved Expression Values and Time Points

  15. Estimating Unobserved Expression Values and Time Points • To learn the parameters of this model (μ, Γ, and σ²) • Use the observed values, and maximize the likelihood of the input data

  16. Estimating Unobserved Expression Values and Time Points • Decompose the probability: • If the γ values were observed, decompose the probability:

  17. Estimating Unobserved Expression Values and Time Points • Use EM • E step: find the best estimate for γ using the current values of σ², μ, and Γ. • M step: maximize the expected complete-data log-likelihood with respect to σ², μ, and Γ.

  18. Model Based Clustering Algorithm for Temporal Data • A new clustering algorithm that simultaneously solves the parameter estimation and class assignment problems • EM algorithm • E step • M step
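The alternation between class assignment (E step) and parameter estimation (M step) can be illustrated with a much simpler stand-in: EM for a one-dimensional mixture of two Gaussians with a shared variance. This is not the spline-based algorithm from the slides; the scalar data, the starting means, and the function name `em_mixture` are all assumptions chosen to keep the sketch short.

```python
import math


def em_mixture(data, mu, sigma2=1.0, n_iter=50):
    """EM for a 1-D Gaussian mixture with shared variance (illustrative stand-in)."""
    k = len(mu)
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E step: posterior probability of each class for every data point.
        # The shared normalizing constant cancels, so it is omitted.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2.0 * sigma2))
                    for w, m in zip(weights, mu)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: re-estimate means, shared variance, and class weights.
        nk = [sum(r[j] for r in resp) for j in range(k)]
        mu = [sum(r[j] * x for r, x in zip(resp, data)) / nk[j] for j in range(k)]
        sigma2 = sum(r[j] * (x - mu[j]) ** 2
                     for r, x in zip(resp, data) for j in range(k)) / len(data)
        weights = [n / len(data) for n in nk]
    # Hard class assignment: most probable class under the final posteriors.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return mu, sigma2, weights, labels


# Two well-separated groups of made-up scalar values.
data = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
means, variance, weights, labels = em_mixture(data, mu=[0.0, 6.0])
```

In the setting described on the slide, each scalar mean would be replaced by a class's spline coefficients and covariance, but the loop structure would stay the same.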

  19. Model Based Clustering Algorithm for Temporal Data

  20. Aligning Temporal Data • Assume we have two sets of time-series gene expression profiles • Splines for reference • Splines in the set to be warped • A mapping • Linear transformation

  21. Aligning Temporal Data • The error of the alignment: the averaged squared distance between the two curves, which takes into account the degree of overlap between the curves • Find parameters a and b that minimize the error • The error for a set of genes S of size n
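A minimal sketch of this alignment error for a single gene follows. It assumes the linear mapping has the form T(s) = (s - b) / a with a > 0, approximates the integral with a midpoint rule, and searches a small grid of candidate (a, b) values. The mapping form, the sine test curves, the grid, and the name `alignment_error` are assumptions for illustration; the slides do not give these details.

```python
import math


def alignment_error(ref, ref_range, warped, warped_range, a, b, steps=2000):
    """Averaged squared distance between ref(s) and warped(T(s)) over their overlap.

    Assumes a linear mapping T(s) = (s - b) / a with a > 0.
    """
    # Map the warped curve's time range onto the reference axis: s = a * t + b.
    lo = max(ref_range[0], a * warped_range[0] + b)
    hi = min(ref_range[1], a * warped_range[1] + b)
    if hi <= lo:
        return math.inf  # no overlap between the two curves
    h = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        s = lo + (j + 0.5) * h  # midpoint rule
        total += (warped((s - b) / a) - ref(s)) ** 2
    # Dividing by the overlap length averages the error over the shared interval.
    return total * h / (hi - lo)


def ref_curve(s):
    """Reference curve on [0, 6]."""
    return math.sin(s)


def warped_curve(t):
    """Same shape as the reference, but on a time axis with a = 2, b = 0.5."""
    return math.sin(2.0 * t + 0.5)


candidates = [(a, b) for a in (1.0, 1.5, 2.0, 2.5) for b in (0.0, 0.25, 0.5, 0.75)]
best_a, best_b = min(
    candidates,
    key=lambda p: alignment_error(ref_curve, (0.0, 6.0), warped_curve, (0.0, 3.0), *p),
)
```

For a set of genes, one option is to average this per-gene error over the set before minimizing. A grid search is used here only for brevity; a numerical optimizer would be the more usual choice.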

  22. Aligning Temporal Data

  23. Results • 800 genes in Saccharomyces cerevisiae, in five groups • Unobserved data estimation

  24. Results • Clustering • Explore the effect of non-uniform sampling • Two synthetic curves:

  25. Results

  26. Results

  27. Results
