Microarray gene expression data association rules mining based on BSC-tree and FIS-tree. Authors: Xiang-Rong Jiang and Le Gruenwald. Source: Data & Knowledge Engineering, vol. 53, 2005, pp. 3-29. Speaker: Shu-Fen Chiou (邱淑芬). Date: 2005/1/20.
Outline • Introduction • Proposed method • Experimental results • Conclusions • Comment
Introduction • Association rules are used to mine the relationships among different genes. • Three characteristics must be considered: • The large search space • Uninteresting genes • Data normalization
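Association rule • Itemset: a set of items (products) • Large itemset (frequent itemset): a set of items that are frequently bought together • Minimum support: the minimum support threshold • Minimum confidence: the minimum confidence threshold • Association rule X => Y: a customer who buys X is likely to also buy Y • Association rules are derived from large itemsets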
Example (10 transactions): minsup = 20% • sup{1} = 6/10 = 60% • sup{1,2} = 4/10 = 40% • sup{1,2,3} = 2/10 = 20% • The itemsets above are large itemsets • sup{3,5} = 1/10 = 10% • sup{1,3,5} = 1/10 = 10% • The itemsets above are not large itemsets
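A minimal Python sketch of the support calculation. The ten transactions below are hypothetical, invented so that the supports match the example above; the slide does not show its actual transactions. `support` and `is_large` are illustrative names, not from the paper.

```python
# Hypothetical transactions invented for illustration; they reproduce the
# supports in the example but are NOT the slide's actual data.
TRANSACTIONS = [
    {1, 2, 3}, {1, 2, 3}, {1, 2}, {1, 2}, {1, 3, 5},
    {1}, {2}, {4}, {4, 5}, {2, 4},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)


def is_large(itemset, transactions, minsup):
    """An itemset is large (frequent) when its support reaches minsup."""
    return support(itemset, transactions) >= minsup


for items in [{1}, {1, 2}, {1, 2, 3}, {3, 5}, {1, 3, 5}]:
    s = support(items, TRANSACTIONS)
    print(sorted(items), f"{s:.0%}", is_large(items, TRANSACTIONS, 0.20))
```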
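Generating association rules: minconf = 50% • {1} => {2}: rule holds; sup{1} = 60%, sup{1,2} = 40%, conf = 40/60 = 66.7% • {1} => {2,3}: rule does not hold; sup{1} = 60%, sup{1,2,3} = 20%, conf = 20/60 = 33.3% • {1,2} => {3}: rule holds; sup{1,2} = 40%, sup{1,2,3} = 20%, conf = 20/40 = 50%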
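The confidence values follow from conf(X => Y) = sup(X ∪ Y) / sup(X). This sketch recomputes them from the supports listed in the example, using exact fractions so that the 50% boundary case compares exactly; the function name `confidence` is illustrative.

```python
from fractions import Fraction

# Supports from the example, as exact fractions of the 10 transactions.
SUPPORT = {
    frozenset({1}): Fraction(6, 10),
    frozenset({1, 2}): Fraction(4, 10),
    frozenset({1, 2, 3}): Fraction(2, 10),
}
MINCONF = Fraction(1, 2)


def confidence(antecedent, consequent):
    """conf(X => Y) = sup(X union Y) / sup(X)."""
    x = frozenset(antecedent)
    return SUPPORT[x | frozenset(consequent)] / SUPPORT[x]


for x, y in [({1}, {2}), ({1}, {2, 3}), ({1, 2}, {3})]:
    c = confidence(x, y)
    print(sorted(x), "=>", sorted(y), f"{float(c):.1%}", c >= MINCONF)
# [1] => [2] 66.7% True
# [1] => [2, 3] 33.3% False
# [1, 2] => [3] 50.0% True
```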
Proposed method • [Figure: value encoding with m exponent bits and n fraction bits] • This example uses only one bit per value; G1's bit string is 111011 • 1: the gene's value is greater than a chosen reference point (increased expression) • 0: otherwise (no change or decreased expression)
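A sketch of the 1-bit encoding, assuming a single reference value per gene (for example, an expression ratio of 1.0 against a control). The expression values are hypothetical, chosen so that the output matches G1's bit string; `to_bit_string` is an illustrative name.

```python
def to_bit_string(values, reference):
    """Encode one gene's expression values as a bit string.

    A value above `reference` (increased expression) becomes '1';
    anything else (no change or decreased expression) becomes '0'.
    """
    return "".join("1" if v > reference else "0" for v in values)


# Hypothetical expression ratios for G1 across six samples; the reference
# point of 1.0 is also an assumption.
g1_values = [2.1, 1.6, 3.0, 0.8, 1.4, 2.5]
print(to_bit_string(g1_values, reference=1.0))  # 111011
```

A value exactly equal to the reference counts as "no change" and is encoded as 0.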
Proposed method • BSC-tree: each node stores • node-level: the node's level (root = 1) • bit-type: the node's bit type: 1 if all its bits are 1, 0 if all are 0, m if it mixes 1s and 0s • 1-bit-count: the number of 1 bits under the node • Example: G1 = 111011, so the 1-bit count at the root (the root count) is 5 • [Figure: G1's BSC-tree, each node drawn with its three fields]
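A sketch of BSC-tree construction. The slides do not say how a node's bits are divided among its children, so this assumes recursive halving (as in P-trees): a node whose bits are all 1 or all 0 becomes a leaf, and a mixed node gets two children. `BSCNode` and `build_bsc` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class BSCNode:
    level: int      # node-level; the root is level 1
    bit_type: str   # "1" all ones, "0" all zeros, "m" mixed
    count: int      # 1-bit-count: number of 1 bits under this node
    children: list = field(default_factory=list)


def build_bsc(bits, level=1):
    """Build a BSC-tree by recursively halving a bit string (assumed split rule)."""
    count = bits.count("1")
    if count == len(bits):
        return BSCNode(level, "1", count)   # pure-1 leaf
    if count == 0:
        return BSCNode(level, "0", count)   # pure-0 leaf
    mid = len(bits) // 2                    # mixed nodes have >= 2 bits
    children = [build_bsc(bits[:mid], level + 1),
                build_bsc(bits[mid:], level + 1)]
    return BSCNode(level, "m", count, children)


g1 = build_bsc("111011")
print(g1.count, g1.bit_type)                          # 5 m
print([(c.bit_type, c.count) for c in g1.children])   # [('1', 3), ('m', 2)]
```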
Proposed method • BSC-tree ANDing algorithm • Get the path codes from the BSC-tree (the figure highlights nodes with bit-type = 1)
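A sketch of collecting path codes, assuming the tree halves each bit string recursively and assuming a path code is the tuple of child indices (0 = left, 1 = right) leading from the root to a pure-1 node. Both are assumptions, not the paper's stated encoding. The function walks the halving structure directly instead of materializing the tree; `pure1_path_codes` is a hypothetical name.

```python
def pure1_path_codes(bits, path=()):
    """Path codes of the pure-1 nodes (bit-type = 1) of a BSC-tree.

    Assumes recursive halving; a path code is the sequence of child
    indices (0 = left half, 1 = right half) from the root.
    """
    ones = bits.count("1")
    if ones == len(bits):
        return [path]        # pure-1 node: record its position
    if ones == 0:
        return []            # pure-0 node: contributes nothing
    mid = len(bits) // 2
    return (pure1_path_codes(bits[:mid], path + (0,))
            + pure1_path_codes(bits[mid:], path + (1,)))


print(pure1_path_codes("111011"))  # [(0,), (1, 1)]
```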
Proposed method • BSC-tree ANDing algorithm • Summing the 1-bit counts of the sub-codes gives the 1-bit count at the root node (the root count) of the ANDed BSC-tree representing the 2-itemset: 1 + 2 = 3
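The slides compute the ANDed tree's root count from path codes. As a simpler stand-in, this sketch ANDs two BSC-trees node by node (the approach used for P-trees): a pure-0 node yields pure-0, a pure-1 node yields the other operand, and two mixed nodes are ANDed child by child, with the parent's count being the sum of its children's counts. It assumes recursive halving, and G2's bit string is hypothetical, picked so that the sub-counts come out as 1 + 2 = 3.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    bit_type: str    # "1", "0" or "m"
    count: int       # number of 1 bits under this node
    children: list = field(default_factory=list)


def build(bits):
    """BSC-tree by recursive halving (assumed split rule)."""
    ones = bits.count("1")
    if ones in (0, len(bits)):
        return Node("1" if ones else "0", ones)
    mid = len(bits) // 2
    return Node("m", ones, [build(bits[:mid]), build(bits[mid:])])


def and_trees(a, b):
    """AND two BSC-trees built over bit strings of the same length."""
    if a.bit_type == "0" or b.bit_type == "0":
        return Node("0", 0)          # 0 AND x = 0
    if a.bit_type == "1":
        return b                     # 1 AND x = x
    if b.bit_type == "1":
        return a
    children = [and_trees(x, y) for x, y in zip(a.children, b.children)]
    count = sum(c.count for c in children)   # root count = sum of sub-counts
    if count == 0:
        return Node("0", 0)
    return Node("m", count, children)


# G1 is from the slides; G2 = 100011 is hypothetical.
g1g2 = and_trees(build("111011"), build("100011"))
print([c.count for c in g1g2.children], g1g2.count)  # [1, 2] 3
```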
Proposed method • FIS-tree: G1, G2, G4, G7 and G8 are frequent 1-itemsets • Level 1: • Suppose minSup = 50% • G1: root count = 5, support = 5/6 = 83.3% > minSup • G2, G4, G7 and G8: root count = 3, support = 3/6 = 50% = minSup • G3: root count = 0, support = 0 < minSup • G5, G6: root count = 2, support = 2/6 = 33.3% < minSup
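The level-1 step is a support threshold on each gene's root count. A sketch using the root counts listed above, with exact fractions so that the 50% ties count as frequent; `frequent_1_itemsets` is an illustrative name.

```python
from fractions import Fraction

N_SAMPLES = 6                  # length of each gene's bit string
MIN_SUP = Fraction(1, 2)       # 50%

# Root counts of each gene's BSC-tree, as listed on the slide.
ROOT_COUNT = {"G1": 5, "G2": 3, "G3": 0, "G4": 3,
              "G5": 2, "G6": 2, "G7": 3, "G8": 3}


def frequent_1_itemsets(root_count, n, min_sup):
    """Genes whose support (root count / n) is at least min_sup."""
    return sorted(g for g, c in root_count.items()
                  if Fraction(c, n) >= min_sup)


print(frequent_1_itemsets(ROOT_COUNT, N_SAMPLES, MIN_SUP))
# ['G1', 'G2', 'G4', 'G7', 'G8']
```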
Proposed method • FIS-tree: G1G2, G1G4, G1G7, G2G8 and G4G7 are frequent 2-itemsets • Level 2: build the frequent 2-itemsets • Suppose minSup = 50% • Combine each pair of level-1 nodes (G1G2, G1G4,…) and AND their BSC-trees • The root counts of the ANDed BSC-trees of G1G2, G1G4 and G1G7 are 3, so support = 3/6 = 50% = minSup and they are frequent 2-itemsets • The root count of G1G8's ANDed BSC-tree is 2, so support = 2/6 = 33.3% < minSup
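A sketch of the level-2 step in Apriori style: pair up the frequent 1-itemsets, AND their bit strings, and keep the pairs whose support reaches minSup. Python integers stand in for the compressed BSC-trees. Only G1's bit string comes from the slides; the strings for G2, G4, G7 and G8 are hypothetical, chosen to agree with their stated root counts and with the stated root counts of G1G2, G1G4, G1G7 and G1G8, so results for the other pairs are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

N_SAMPLES = 6
MIN_SUP = Fraction(1, 2)

# G1 is from the slides; the other strings are hypothetical.
BITS = {"G1": "111011", "G2": "100011", "G4": "110010",
        "G7": "011001", "G8": "000111"}


def root_count_of_and(genes, bits, n):
    """Number of 1s in the AND of the genes' bit strings.

    Plain integers stand in for ANDing compressed BSC-trees.
    """
    acc = (1 << n) - 1                 # all-ones mask of length n
    for g in genes:
        acc &= int(bits[g], 2)
    return bin(acc).count("1")


def frequent_2_itemsets(frequent_1, bits, n, min_sup):
    """Combine every pair of frequent 1-itemsets; keep the frequent pairs."""
    result = []
    for pair in combinations(sorted(frequent_1), 2):
        count = root_count_of_and(pair, bits, n)
        if Fraction(count, n) >= min_sup:
            result.append(("".join(pair), count))
    return result


print(frequent_2_itemsets(list(BITS), BITS, N_SAMPLES, MIN_SUP))
# [('G1G2', 3), ('G1G4', 3), ('G1G7', 3)]
```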
Proposed method • Deriving association rules from a FIS-tree • G1G2 is a frequent 2-itemset in the FIS-tree. • Suppose minSup = 50% and a user-defined minimum confidence minConf = 50% • For G1G2, support = 3/6 = 50%; for G1, support = 5/6 = 83.3% • Confidence = sup(G1G2) / sup(G1) = (3/6) / (5/6) = 3/5 = 60% > minConf • => the rule G1 => G2 holds
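A sketch that derives every rule X => Y from a frequent itemset. Because both supports share the denominator n, the confidence is simply the ratio of root counts. Using the root counts stated on the slides, it reports G1 => G2 at 60% and also G2 => G1 at 3/3 = 100%; `rules_from` is an illustrative name.

```python
from fractions import Fraction
from itertools import combinations

MIN_CONF = Fraction(1, 2)

# Root counts stated on the slides.
ROOT_COUNT = {frozenset({"G1"}): 5, frozenset({"G2"}): 3,
              frozenset({"G1", "G2"}): 3}


def rules_from(itemset, root_count, min_conf):
    """Yield (X, Y, confidence) for every rule X => Y split from `itemset`.

    conf(X => Y) = sup(X u Y) / sup(X); the 1/n factors cancel, so the
    ratio of root counts gives the same value.
    """
    itemset = frozenset(itemset)
    for k in range(1, len(itemset)):
        for x in combinations(sorted(itemset), k):
            x = frozenset(x)
            conf = Fraction(root_count[itemset], root_count[x])
            if conf >= min_conf:
                yield sorted(x), sorted(itemset - x), conf


for x, y, conf in rules_from({"G1", "G2"}, ROOT_COUNT, MIN_CONF):
    print(x, "=>", y, f"{float(conf):.0%}")
# ['G1'] => ['G2'] 60%
# ['G2'] => ['G1'] 100%
```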
Conclusions • The authors propose a new association rule mining algorithm. • BSC-trees and FIS-trees are compression trees, so they save space. • The FIS-tree mining algorithm performs better than the other methods.
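Comments • The method builds an efficient algorithm from simple concepts. • It could consider the magnitude of each gene's expression response to find the more important genes, instead of using only 0 for no change or decreased expression and 1 for increased expression. • Suggested graded levels: 3 (high), 2, 1 (low), 0 (no change)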