Bi-Clustering


Bi-Clustering. COMP 790-90 Seminar Spring 2011. Definition of OP-Cluster.


Bi-Clustering

COMP 790-90 Seminar

Spring 2011

Definition of OP-Cluster
  • Let I be a subset of genes in the database and let J be a subset of conditions. We say <I, J> forms an Order Preserving Cluster (OP-Cluster) if one of the following relationships holds for any pair of conditions.

[Figure: expression levels across conditions A1, A2, A3, A4]
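The order-preserving idea can be illustrated with a small check. This is a minimal sketch, not the definition from the slides: it assumes the pairwise relationships are "no larger in every gene of I" or "no smaller in every gene of I", and the function name `is_op_cluster` and the dictionary-of-dictionaries layout are hypothetical.

```python
from itertools import combinations


def is_op_cluster(matrix, genes, conditions):
    """Check that every pair of conditions is ordered the same way in all genes.

    Assumption (not taken from the slides): a pair (a, b) is acceptable when
    value[a] <= value[b] holds for every gene, or value[a] >= value[b] holds
    for every gene.
    """
    for a, b in combinations(conditions, 2):
        diffs = [matrix[g][a] - matrix[g][b] for g in genes]
        all_le = all(d <= 0 for d in diffs)
        all_ge = all(d >= 0 for d in diffs)
        if not (all_le or all_ge):
            return False
    return True


# Hypothetical expression values, for illustration only.
expr = {
    "g1": {"A1": 1.0, "A2": 4.0, "A3": 2.0, "A4": 3.0},
    "g2": {"A1": 10.0, "A2": 40.0, "A3": 20.0, "A4": 25.0},
    "g3": {"A1": 5.0, "A2": 1.0, "A3": 3.0, "A4": 2.0},
}
print(is_op_cluster(expr, ["g1", "g2"], ["A1", "A2", "A3", "A4"]))
print(is_op_cluster(expr, ["g1", "g2", "g3"], ["A1", "A2", "A3", "A4"]))
```

Genes g1 and g2 rise and fall together even though their magnitudes differ, which is the kind of tendency an OP-Cluster is meant to capture; g3 reverses the order of A1 and A2 and breaks the cluster.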

Problem Statement
  • Given a gene expression matrix, our goal is to find all the statistically significant OP-Clusters. Significance is ensured by two minimum-size thresholds: nc (number of conditions, i.e. columns) and nr (number of genes, i.e. rows).
Conversion to Sequence Mining Problem

[Figure: expression levels of one gene across conditions A1, A2, A3, A4, shown with its corresponding sequence]
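One plausible reading of the conversion is that each gene's row becomes the sequence of condition labels sorted by expression level. The sketch below assumes exactly that; the helper name `row_to_sequence` and the sample values are hypothetical, and ties between similar values are simply left in input order rather than grouped.

```python
def row_to_sequence(row):
    """Return condition labels ordered by increasing expression level.

    Assumption: ties keep their input order (Python's sort is stable);
    grouping of nearly equal values is not modeled here.
    """
    return [label for label, _ in sorted(row.items(), key=lambda kv: kv[1])]


# Hypothetical expression values for one gene.
row = {"A1": 1.0, "A2": 4.0, "A3": 2.0, "A4": 3.0}
print(row_to_sequence(row))  # ['A1', 'A3', 'A4', 'A2']
```

Under this reading, two genes belong to the same OP-Cluster on a set of conditions when those conditions appear in the same relative order in both sequences, which turns cluster discovery into a search for common subsequences.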

Mining OP-Clusters: A Naïve Approach
  • A naïve approach
    • Enumerate all possible subsequences in a prefix tree.
    • For each subsequence, collect all genes that contain it.
  • Challenge:
    • The total number of distinct subsequences is [formula not preserved in this transcript].

[Figure: A complete prefix tree with 4 items {a, b, c, d}]
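The naïve approach can be sketched as follows. This is an illustration under stated assumptions, not the slides' own procedure: a "subsequence" is taken to be any ordered selection of distinct items, the names `naive_op_clusters`, `is_subsequence`, and `count_patterns` are hypothetical, and so are the sample sequences.

```python
from itertools import permutations
from math import factorial


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def naive_op_clusters(sequences, items):
    """Map every ordered selection of items to the genes that contain it."""
    support = {}
    for k in range(1, len(items) + 1):
        for pattern in permutations(items, k):
            support[pattern] = [
                gene for gene, seq in sequences.items()
                if is_subsequence(pattern, seq)
            ]
    return support


def count_patterns(n):
    """Number of ordered selections of 1..n distinct items out of n."""
    return sum(factorial(n) // factorial(n - k) for k in range(1, n + 1))


# Hypothetical gene sequences over the items {a, b, c, d}.
sequences = {"g1": ["a", "b", "c", "d"], "g2": ["a", "c", "b", "d"]}
support = naive_op_clusters(sequences, "abcd")
print(len(support), count_patterns(4))  # 64 64
print(count_patterns(10))               # 9864100
```

Even with only 10 conditions the enumeration already covers millions of candidate patterns, most of which never occur in any gene, which is the motivation for building a more compact tree.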


Mining OP-Clusters: Prefix Tree
  • Goal:
    • Build a compact prefix tree that contains only the subsequences occurring in the original database.
  • Strategies:
    • Depth-first traversal
    • Suffix concatenation: visit only subsequences that exist in the input sequences.
    • Apriori property: extend only subsequences that are sufficiently supported when deriving longer subsequences.

[Figure: prefix tree rooted at Root; each node is labeled with an item and the IDs of the genes (1, 2, 3) that support it, e.g. a:1,2,3, d:1,3, c:1,2,3]
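The three strategies above can be combined in a short recursive search. This is a simplified sketch rather than the OP-Cluster algorithm itself: it grows patterns depth-first without materializing a tree, and the function name `mine_op_clusters`, the thresholds passed as `nr` and `nc`, and the sample sequences are assumptions made for illustration.

```python
def mine_op_clusters(sequences, nr, nc):
    """Grow subsequences depth-first; return {pattern: sorted gene ids}.

    nr: minimum number of supporting genes, nc: minimum pattern length
    (assumed to be at least 1).
    """
    results = {}

    def grow(pattern, positions):
        # positions maps each supporting gene to the index just past the
        # last matched item of the pattern in that gene's sequence.
        if len(pattern) >= nc:
            results[pattern] = sorted(positions)
        # Suffix concatenation: only items that really follow the pattern
        # in some supporting gene are considered as extensions.
        extensions = {}
        for gene, start in positions.items():
            seq = sequences[gene]
            for idx in range(start, len(seq)):
                # setdefault keeps the earliest occurrence of each item.
                extensions.setdefault(seq[idx], {}).setdefault(gene, idx + 1)
        for item in sorted(extensions):
            next_positions = extensions[item]
            # Apriori pruning: a pattern with too few genes cannot gain
            # support by growing longer, so its subtree is skipped.
            if len(next_positions) >= nr:
                grow(pattern + (item,), next_positions)

    grow((), {gene: 0 for gene in sequences})
    return results


# Hypothetical gene sequences, for illustration only.
sequences = {"g1": "abcd", "g2": "acbd", "g3": "dabc"}
for pattern, genes in mine_op_clusters(sequences, nr=2, nc=3).items():
    print("".join(pattern), genes)
```

With these inputs the search reports abc (g1, g3), abd (g1, g2), and acd (g1, g2), and never visits patterns such as dcba that occur in no gene.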

References
  • J. Yang, W. Wang, H. Wang, P. Yu, Delta-cluster: capturing subspace correlation in a large data set, Proceedings of the 18th IEEE International Conference on Data Engineering (ICDE), pp. 517-528, 2002.
  • H. Wang, W. Wang, J. Yang, P. Yu, Clustering by pattern similarity in large data sets, to appear in Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD), 2002.
  • S. Yoon, C. Nardini, L. Benini, G. De Micheli, Enhanced pClustering and its applications to gene expression data, Proceedings of the IEEE Symposium on Bioinformatics and Bioengineering (BIBE), 2004.
  • J. Liu, W. Wang, OP-Cluster: clustering by tendency in high dimensional space, Proceedings of the IEEE International Conference on Data Mining (ICDM), 2003.