Data Mining & Knowledge Discovery

S. Wesley Changchien, Tzu-Chuen Lu, Expert Systems with Applications 20 (2001) 325-335







Mining association rules procedure to support on-line recommendation by customers and products fragmentation


Group members: M964020025 郭李哲

M964020027 鄭淵太

M964020044 鐘佶修


Background and motivation

  • Most e-commerce (EC) businesses strive to survive and to become leaders at the frontier of the new wave.

  • Key success factors include learning customers' purchasing behavior, developing marketing strategies that create new consumer markets, and discovering latent loyal customers.


Data mining task


Self-Organizing Map (SOM)


Rough set theory (RST)

  • Data analysis with RST rests entirely on two basic concepts: the lower and the upper approximations of a set.
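The two approximations can be illustrated with a minimal Python sketch. The object ids, the partition, and the function names below are hypothetical and serve only as an illustration; they are not taken from the paper. The lower approximation collects the equivalence classes that lie entirely inside the target set, and the upper approximation collects those that overlap it at all.

```python
def lower_approximation(classes, target):
    """Union of the equivalence classes that are fully contained in target."""
    return set().union(*(c for c in classes if c <= target))


def upper_approximation(classes, target):
    """Union of the equivalence classes that share at least one element with target."""
    return set().union(*(c for c in classes if c & target))


# Hypothetical partition of five objects into equivalence classes.
classes = [{1, 2}, {3}, {4, 5}]
# Hypothetical target set to approximate.
target = {1, 2, 3, 4}

lower = lower_approximation(classes, target)
upper = upper_approximation(classes, target)
boundary = upper - lower  # objects that cannot be classified with certainty
```

Objects in `lower` certainly belong to the target set, while objects in `boundary` only possibly belong to it.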




Data mining procedure


Step 1- selection and sampling

  • 1. Creating a fact table

  • 2. Selecting dimensions

    • Choose the dimensions of interest

  • 3. Selecting attributes

    • Choose attributes according to their importance

  • 4. Filtering data

    • Restrict the range of attribute values


Step 2 - transformation and normalization

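  • 1. Attributes with numeric data

  • 2. Attributes with non-numeric data

    • Design a descriptive encoding for the data

    • Values of the Job attribute are treated as characters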
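As a rough illustration of this step, the sketch below scales a numeric attribute and encodes a non-numeric one. The slides do not say which normalization is used, so min-max scaling, the category ordering, and all sample values here are assumptions made for illustration only.

```python
def min_max_scale(values):
    """Scale numeric values linearly into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_categories(values, order):
    """Map each category label to its position in a chosen ordering."""
    index = {label: i for i, label in enumerate(order)}
    return [index[v] for v in values]


# Hypothetical numeric attribute (monthly salary).
salaries = [20000, 35000, 50000]
scaled = min_max_scale(salaries)

# Hypothetical non-numeric attribute (Job level), ordered Low < Normal < High.
jobs = ["H", "L", "N"]
codes = encode_categories(jobs, order=["L", "N", "H"])
```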


Step 3 – data mining of association rules

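  • A neural network performs the clustering and rough set theory extracts the rules. Together they are used to find association rules that explain the characteristics of each cluster and the relationships among the attributes of different clusters.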


Clustering module

  • Kohonen proposed the SOM in the early 1980s.

  • It reveals the natural relationships among the input attributes.

  • We can group an enterprise's customers, products, and suppliers into clusters.

  • For instance, input nodes: education and job from the member table.

  • Output nodes: nine clusters.
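The clustering idea can be sketched with a small pure-Python SOM: two input nodes (education, job) and a 3 x 3 output grid, giving nine clusters. This is a generic textbook-style SOM, not the paper's implementation; the training parameters, the decay schedule, and the sample data are all assumptions.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid node whose weight vector is closest to input x."""
    return min(
        weights,
        key=lambda node: sum((w - xi) ** 2 for w, xi in zip(weights[node], x)),
    )


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, radius0=1.5, seed=0):
    """Train a rows x cols SOM; returns a dict mapping grid node -> weight vector."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # linear decay (assumed schedule)
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in data:
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist2 / (2 * radius ** 2))
                # Pull the node's weights toward the input, scaled by closeness to the BMU.
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights


# Hypothetical members described by (education, job), already scaled to [0, 1].
data = [(0.0, 0.0), (0.0, 0.0), (0.5, 0.5), (1.0, 1.0), (1.0, 0.5), (0.0, 0.5)]
weights = train_som(data)
clusters = [best_matching_unit(weights, x) for x in data]
```

Each entry of `clusters` is the grid position of the output node that a member falls into, which plays the role of the cluster id.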


Rule extraction module

  • Rough set theory is used to find the association rules within homogeneous clusters of data records, and the relationships among attributes across different clusters.


Characterization of each cluster

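  • Use rough set theory to explain the characteristics of a cluster

    • Ex: customers of a certain class have a university education or above, a monthly salary of 35,000 or more, …

  • Generate result equivalence classes, Xk

  • Generate cause equivalence classes, Aij

  • Generate lower approximation rules

  • Generate upper approximation rules

  • Generate combinatorial rules

  • Explain the characteristics of the cluster

  • Repeat (return to Step 3)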


Step 1. Generate result equivalence classes, Xk

  • Generate a result equivalence class for each cluster


Step 2. Generate cause equivalence classes, Aij

  • Generate cause equivalence classes for each attribute
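Steps 1 and 2 amount to grouping records by attribute value. The sketch below shows one way to do this in Python. The member table is hypothetical: the original table is not in the transcript, so the rows were chosen only so that the class for GID = A matches the X1 = {Member2, Member5, Member6} shown on the slides.

```python
from collections import defaultdict

# Hypothetical member table (the slide's actual table is not in the transcript).
MEMBERS = {
    "Member1": {"Education": "N", "Job": "N", "GID": "B"},
    "Member2": {"Education": "N", "Job": "H", "GID": "A"},
    "Member3": {"Education": "N", "Job": "N", "GID": "B"},
    "Member4": {"Education": "L", "Job": "H", "GID": "C"},
    "Member5": {"Education": "H", "Job": "H", "GID": "A"},
    "Member6": {"Education": "H", "Job": "H", "GID": "A"},
    "Member7": {"Education": "L", "Job": "L", "GID": "C"},
}


def equivalence_classes(table, attribute):
    """Group record ids by their value of the given attribute."""
    classes = defaultdict(set)
    for record_id, row in table.items():
        classes[row[attribute]].add(record_id)
    return dict(classes)


# Result equivalence classes Xk: one class per cluster id (GID).
result_classes = equivalence_classes(MEMBERS, "GID")

# Cause equivalence classes Aij: one class per value of each condition attribute.
cause_classes = {
    attribute: equivalence_classes(MEMBERS, attribute)
    for attribute in ("Education", "Job")
}
```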


Step 3. Generate lower approximation rules

  • If Aij ⊆ Xk, Confidence = 1

  • X1 = { Member2, Member5, Member6}

    • = { Member2}

    • = { Member5, Member6}

    • = {Φ}

    • = {Φ}

    • = { Member2, Member5, Member6}

    • = {Φ}

  • Rule1: If Education = H then GID = A

    • Confidence = 1


Step 4. Generate upper approximation rules

  • If Aij ∩ Xk ≠ Φ and Aij ⊄ Xk, Confidence = |Aij ∩ Xk| / |Aij|

  • X1 = { Member2, Member5, Member6}

    • = { Member2}

    • = { Member5, Member6}

    • = {Φ}

    • = {Φ}

    • = { Member2, Member5, Member6}

    • = {Φ}


Step 4. Generate upper approximation rules

  • Confidence Threshold = 0.75

  • Rule2: If Education = N then GID = A

    • Confidence = 0.33

    • Reject Rule2 (0.33 < 0.75)

  • Rule3: If Job = H then GID = A

    • Confidence = 0.75

    • Accept Rule3 (0.75 ≥ 0.75)
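The accept and reject decisions above can be reproduced with a short Python sketch. It assumes that confidence is the share of a cause class that falls inside the result class, which agrees with the 1, 0.33 and 0.75 values on the slides. X1 comes from the slides; the membership of the cause classes is hypothetical, chosen only to reproduce those values.

```python
# Result equivalence class for GID = A, as given on the slides.
X1 = {"Member2", "Member5", "Member6"}

# Hypothetical cause equivalence classes, chosen to reproduce the slide's confidences.
CAUSE_CLASSES = {
    ("Education", "H"): {"Member5", "Member6"},
    ("Education", "N"): {"Member1", "Member2", "Member3"},
    ("Job", "H"): {"Member2", "Member4", "Member5", "Member6"},
}


def extract_rules(cause_classes, target, threshold):
    """Return (attribute, value, kind, confidence) for rules that meet the threshold."""
    rules = []
    for (attribute, value), members in cause_classes.items():
        overlap = members & target
        if not overlap:
            continue  # the class does not touch the target, so no rule is produced
        confidence = len(overlap) / len(members)
        # A class fully inside the target gives a lower approximation rule.
        kind = "lower" if members <= target else "upper"
        if confidence >= threshold:
            rules.append((attribute, value, kind, confidence))
    return rules


rules = extract_rules(CAUSE_CLASSES, X1, threshold=0.75)
```

With these classes, Education = H yields a lower approximation rule, Job = H yields an accepted upper approximation rule, and Education = N is rejected because 1/3 is below the threshold.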


Step 5. Generate combinatorial rules

  • Confidence =

  • Combine rules to produce association rules that consider multiple attributes

X1 = { Member2, Member5, Member6}


Step 5. Generate combinatorial rules

  • Rule4: If Education = N and Job = H then GID = A

    • Confidence = 0.25

  • Rule5: If Education = H and Job = H then GID = A

    • Confidence = 0.5


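Step 6. Explain the characteristics of the cluster

  • Summarize the rules and explain the characteristics

  • Members belonging to Cluster 1 (Cluster A):

    • 100% have Education = High

    • 75% have Job = High

    • 25% have Education = Normal and Job = High

    • 50% have Education = High and Job = High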


Step 7. Repeat

  • Return to Step 3 and compute the next equivalence class Xk; repeat until all equivalence classes have been processed.


Association of different clusters

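  • Use rough set theory to analyze the relationships between different clusters

    • Ex: members of class A prefer products of class b; members of class C prefer products of class d, …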


Association of different clusters

R1: If Product = 3 Then Receiver = 2, Confidence = 1

R2: If Product = 6 Then Receiver = 2, Confidence = 1

R3: If Buyer = 1 Then Receiver = 2, Confidence = 0.5

R4: If Buyer = 2 Then Receiver = 2, Confidence = 0.75

R5: If Product = 7 Then Receiver = 2, Confidence = 0.5


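System implementation

  • The subject is the transaction records of a particular store

    • The Product table has 1120 records

    • The Customer table has 35 records

  • 2000 transaction records are retained as the data to be mined

    • After selecting dimensions and attributes

  • Customer Clustering

    • education, job, gender

  • Product Clustering

    • sales price, import price, sales price for VIP customers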


SOM network Interface


Relationship analysis Interface


Use Rules for Recommendation

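  • A customer wants to buy a product as a gift for a friend but does not know what would be suitable.

  • If the customer's cluster = 7 and the friend's cluster = 1, the system can recommend products in cluster = 9 to the customer.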
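A recommendation of this kind could be served by a simple rule lookup, sketched below in Python. The rule format (buyer cluster and receiver cluster mapped to a product cluster), the confidence values, and the function name are hypothetical. Only the buyer 7, receiver 1, product 9 combination comes from the slide.

```python
# Hypothetical rule base; only the 7 -> 1 -> 9 combination comes from the slide.
RULES = [
    {"buyer": 7, "receiver": 1, "product": 9, "confidence": 0.8},
    {"buyer": 7, "receiver": 1, "product": 4, "confidence": 0.6},
    {"buyer": 2, "receiver": 2, "product": 3, "confidence": 0.9},
]


def recommend(rules, buyer, receiver, threshold=0.75):
    """Return product clusters for a buyer/receiver pair, strongest rule first."""
    matches = [
        rule
        for rule in rules
        if rule["buyer"] == buyer
        and rule["receiver"] == receiver
        and rule["confidence"] >= threshold
    ]
    matches.sort(key=lambda rule: rule["confidence"], reverse=True)
    return [rule["product"] for rule in matches]


# Customer in cluster 7 buying a gift for a friend in cluster 1.
suggested = recommend(RULES, buyer=7, receiver=1)
```

Lowering `threshold` widens the list of suggested product clusters at the cost of weaker rules.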


Conclusion

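  • This paper uses SOM and rough set theory for clustering and rule extraction. The rule extraction module describes the relationships among different clusters. Analysts can go further and select other attributes, such as zodiac sign, psychological test results, or blood type, to analyze the relationships among clusters.

  • This study uses rough set theory to find the association rules in the data. The rules fall into two directions: the characterization of each cluster and the relationships between different clusters. In the implementation, however, only the relationships between different clusters are presented; the characterization of clusters and how to apply it are not discussed.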

