This study explores clustering networked data with active learning techniques, focusing on Newman's Modularity and Collective Classification (ICA). It covers the ALFNET and CLAL frameworks for learning effectively from unlabeled instances. A series of experiments demonstrates improvements in classification accuracy obtained by optimizing the clustering process. The outline covers active learning principles, networked data analysis, and the experimental results that show the effectiveness of the proposed method.
Clustering networked data based on link and similarity in Active Learning
Advisor: Sing Ling Lee
Student: Yi Ming Chang
Speaker: Yi Ming Chang
Outline
• Introduction
  • Active Learning
  • Networked data
• Related Work
  • Newman's Modularity
  • Collective Classification (ICA)
  • ALFNET
• CLAL
• Experimental Results
• Conclusion
Passive Learning
[Figure: a classifier is trained on a randomly labeled training set and then classifies the testing data; in this example it gets 5 test instances wrong. Legend: unlabeled instance, labeled instance.]
Active Learning
[Figure: the learner trains a classifier on the labeled nodes, queries a batch of informative unlabeled nodes (query batch size = 3), retrains, and classifies the testing data; in this example it gets only 2 test instances wrong. Legend: unlabeled node, labeled node.]
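The query-retrain loop in the slide can be sketched in a few lines. This is a minimal illustration, not the thesis's method: the toy "classifier" (a 1-D threshold at the mean of the labeled points) and all names here are our own, and uncertainty sampling stands in for whatever query strategy the deck later develops.

```python
def train(labeled):
    """Fit the toy model: a threshold at the mean of the labeled points."""
    xs = [x for x, _ in labeled]
    return sum(xs) / len(xs)

def predict_proba(threshold, x):
    """Crude P(class '+'), rising with distance above the threshold."""
    return min(max(0.5 + (x - threshold) / 10.0, 0.0), 1.0)

def active_learn(pool, oracle, seed, budget, batch=3):
    """Query the `batch` most uncertain points each round until `budget`
    labels are spent, retraining after every round (as in the slide)."""
    labeled, pool = list(seed), list(pool)
    while len(labeled) < budget and pool:
        model = train(labeled)
        # most uncertain = predicted probability closest to 0.5
        pool.sort(key=lambda x: abs(predict_proba(model, x) - 0.5))
        for x in pool[:batch]:
            labeled.append((x, oracle(x)))  # ask the oracle for a label
        pool = pool[batch:]
    return train(labeled)
```

With an oracle that labels points above 5 as '+', the learned threshold settles near the true boundary while labeling only part of the pool.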
Network data
[Figure: a graph of labeled and unlabeled nodes; the classifier is trained on the labeled nodes and classifies the unlabeled ones, so both node features and link structure are available. Legend: unlabeled node, labeled node.]
Newman’s Modularity for clustering
[Figure: a 5-node example graph with m = 5 edges. Legend: A_ij = real edge, k_i = degree of node, c_i = group of node.]
Each within-group pair contributes A_ij − k_i·k_j / 2m; with 2m = 10:
(1 − 2·2/10)
(0 − 2·2/10)
(1 − 2·3/10)
(0 − 2·1/10)
Newman’s Modularity for clustering
Example (2m = 16), pair terms A_ij − k_i·k_j / 2m for nodes 1, 2, 3:
(1 − 5·2/16) = 0.375
(0 − 5·3/16) = −0.9375
(1 − 2·5/16) = 0.375
(1 − 2·3/16) = 0.625
(0 − 3·5/16) = −0.9375
(1 − 3·2/16) = 0.625
Since 0.625 + 0.625 > 0.375 + 0.375, grouping the pair with the larger terms gives the higher modularity.
Newman’s Modularity for clustering
Maximizing Q = (1/2m) Σ_ij (A_ij − k_i·k_j / 2m) δ(c_i, c_j)
[Figure: candidate groupings scored by their modularity terms, e.g. 1, 1, −1, 0.3, 0.1, −0.5; the grouping with the highest score is taken.]
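The per-pair terms on the slides are instances of Newman's modularity; a small self-contained implementation (our own sketch, using the standard formula) scores a whole partition at once:

```python
def modularity(edges, community):
    """Newman's modularity Q = (1/2m) * sum_ij (A_ij - k_i*k_j/2m) * delta(c_i, c_j)
    for an undirected graph given as a list of edges and a node -> group map."""
    m = len(edges)
    deg, adj = {}, {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    q = 0.0
    for i in deg:
        for j in deg:
            if community[i] != community[j]:
                continue  # delta(c_i, c_j) = 0: pair contributes nothing
            a_ij = 1.0 if j in adj.get(i, ()) else 0.0
            q += a_ij - deg[i] * deg[j] / (2.0 * m)
    return q / (2.0 * m)
```

For two triangles joined by a single bridge edge, splitting along the bridge scores Q = 5/14 ≈ 0.357, while putting everything in one group scores 0, so maximizing Q recovers the natural clusters.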
Collective Classification (ICA)
Iterative Classification Algorithm (ICA):
• Train a Content-Only (CO) learner on the local feature vectors (e.g. 1 0 0 1 0 … 1) of the labeled nodes.
• Use CO to predict the unlabeled nodes, then compute each node's neighbor features (the label proportions among its neighbors, e.g. 3/5, 2/5).
• Train the collective (CC) learner on local + neighbor features.
• Iterate: recompute the neighbor features from the current CC predictions and re-classify (iteration 1, 2, 3, …) until the labels are stable or a threshold of iterations has elapsed.
CC problem
How to set the threshold?
[Figure: a small graph with labeled and unlabeled nodes 1–3; the inferred neighbor features oscillate across iterations 1–5 (e.g. 2/5, 3/5 → 1/1, 0/1 → 0/1, 1/1 → …), so different stopping iterations give different labels.]
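The ICA loop sketched above fits in a short function. This is our simplification: the CO and CC learners are passed in as plain prediction functions rather than trained models, and neighbor features are raw label counts.

```python
def ica(graph, features, labeled, co_predict, cc_predict, max_iter=10):
    """Iterative Classification Algorithm (ICA) sketch.
    graph: {node: [neighbors]}; labeled: {node: label}, kept fixed.
    co_predict(features[n]) -> label                    (content-only bootstrap)
    cc_predict(features[n], neighbor_counts) -> label   (collective step)"""
    labels = dict(labeled)
    unlabeled = [n for n in graph if n not in labeled]
    # bootstrap: content-only predictions for the unlabeled nodes
    for n in unlabeled:
        labels[n] = co_predict(features[n])
    for _ in range(max_iter):
        changed = False
        for n in unlabeled:
            counts = {}
            for nb in graph[n]:
                counts[labels[nb]] = counts.get(labels[nb], 0) + 1
            new = cc_predict(features[n], counts)
            if new != labels[n]:
                labels[n] = new
                changed = True
        if not changed:  # stable: stop before the iteration threshold
            break
    return labels
```

On a path graph with one labeled endpoint, a simple majority-vote CC propagates the label down the chain in two iterations and then stops because nothing changes.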
ALFNET
1. Cluster the data into at least k clusters.
2. Pick k clusters based on size and initialize the Content-Only (CO) classifier (an SVM).
ALFNET
3. While (labeled nodes < budget):
3.1 Re-train the CO and CC classifiers on the training set.
3.2 Pick k clusters based on score.
3.2 Pick an item from each cluster, add it to the training set, and train CO and CC.
ALFNET score
The CO, CC, and combined (Main) classifiers each predict a category (Class A–D); a cluster is scored by the entropy of the proportions of predicted classes, where entropy(p) = −p·ln p:
• all three predict differently: entropy(1/3) · 3 = 0.3662 · 3
• two agree: entropy(2/3) + entropy(1/3) = 0.2703 + 0.3662
• all three agree: entropy(3/3) = 0
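The slide's numbers follow from entropy(p) = −p·ln p; a sketch (our own) that reproduces them from the three classifiers' predictions:

```python
import math

def disagreement_score(predictions):
    """Entropy of the class proportions among classifier predictions
    (here: the CO, CC, and combined predictions for one node). Full
    disagreement scores highest; unanimous agreement scores 0."""
    n = len(predictions)
    counts = {}
    for p in predictions:
        counts[p] = counts.get(p, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

disagreement_score(['A', 'B', 'C']) gives 0.3662·3 ≈ 1.0986, ['A', 'A', 'B'] gives 0.2703 + 0.3662 ≈ 0.6365, and ['A', 'A', 'A'] gives 0, matching the slide; clusters where the classifiers disagree most are the most informative to query.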
Modularity and Similarity
Example attribute vectors:
Node 1: 1 1 0 0
Node 2: 1 0 0 0
Node 3: 1 1 0 0
Node 4: 0 0 1 1
Maximum Q
Maximize the modularity objective Q, now taking attribute similarity into account alongside the links.
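The extracted slides do not show the exact link/similarity combination the thesis uses, so the blend below is purely our assumption: it scores a within-cluster pair by mixing the modularity term with the cosine similarity of the attribute vectors from the example above.

```python
def cosine(u, v):
    """Cosine similarity of two attribute vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def pair_score(a_ij, ki, kj, two_m, sim, alpha=0.5):
    """Hypothetical blend (our assumption, not the thesis's formula):
    alpha * modularity term + (1 - alpha) * attribute similarity."""
    return alpha * (a_ij - ki * kj / two_m) + (1 - alpha) * sim
```

On the slide's vectors, nodes 1 and 3 (identical attributes) score similarity 1, nodes 1 and 4 (disjoint attributes) score 0, so attribute-similar pairs are pulled into the same cluster even when the link evidence is equal.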
CLAL
[Figure: cluster the data; train a CO classifier per cluster; each CO queries and classifies within its cluster, repeating until the number of labeled nodes exceeds the budget. Legend: labeled node, unlabeled node.]
Tuning and greedy mechanism
[Figure: nodes whose out-links exceed their in-links are moved between clusters, with moving priority given by the out-link − in-link gap (3 → 2 → 1 → 1); after moving, the COs are retrained and the better-performing COs are kept. Clustering priority: low-accuracy clusters before high-accuracy ones. Legend: labeled node, unlabeled node.]
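A sketch of the slide's out-link/in-link move rule, as we read the figure: a node whose links crossing its cluster boundary outnumber its links inside the cluster becomes a move candidate, and candidates are ordered by the size of the gap (the slide's 3 → 2 → 1 priority).

```python
def outlink_inlink_gap(graph, cluster_of, node):
    """Out-links minus in-links of `node` with respect to its own cluster.
    graph: {node: [neighbors]}; cluster_of: {node: cluster id}."""
    inl = sum(1 for nb in graph[node] if cluster_of[nb] == cluster_of[node])
    out = len(graph[node]) - inl
    return out - inl

def move_candidates(graph, cluster_of):
    """Nodes with more out-links than in-links, highest gap first."""
    cands = [n for n in graph if outlink_inlink_gap(graph, cluster_of, n) > 0]
    cands.sort(key=lambda n: outlink_inlink_gap(graph, cluster_of, n), reverse=True)
    return cands
```

In a graph of two triangles where one node is assigned to the wrong side, only the misplaced node has a positive gap, so it is the one proposed for moving.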
Background
Networked data:
• Social network: each node is a person, links connect friends, and the node's attributes are its features (feature 1 … feature n).
• Citation network: each node is a paper (Paper No.), links are citations, and the node's attributes are its words (word 1 … word n).
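Both node types above can be held in plain adjacency and attribute maps; a minimal sketch of the citation case, with made-up paper IDs and words for illustration:

```python
# A citation network as two dicts: citation links and bag-of-words attributes.
cites = {
    "P1": ["P2", "P3"],  # P1 cites P2 and P3
    "P2": ["P3"],
    "P3": [],
}
words = {
    "P1": {"svm": 3, "margin": 1},
    "P2": {"svm": 1, "cluster": 2},
    "P3": {"cluster": 4},
}

def neighbors(paper):
    """Papers linked to `paper` by a citation in either direction,
    since collective methods use links regardless of who cited whom."""
    out = set(cites[paper])
    out |= {p for p, refs in cites.items() if paper in refs}
    return out
```

A collective classifier would combine words[n] (the local features) with the labels of neighbors(n) (the link features).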
SVM
[Figure: training data separated by a hyperplane, with the positive and negative classes on opposite sides; the margin is the distance from the hyperplane to the nearest points on each side.]
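The margin in the figure can be computed directly. This sketch (our own, not an SVM solver) scores a candidate hyperplane w·x + b = 0 by its minimum geometric margin, which is the quantity an SVM maximizes when choosing among separating hyperplanes:

```python
def min_margin(w, b, points):
    """Minimum geometric margin of the hyperplane w.x + b = 0 over
    (x, y) pairs with y in {+1, -1}. Positive iff it separates the data."""
    norm = sum(wi * wi for wi in w) ** 0.5
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in points)
```

Comparing two separating hyperplanes on the same points shows why the axis-aligned one in our example is preferred: it leaves a wider margin to the nearest points.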
Challenge
Query efficiency from discriminative features. Example word counts for a paper:
word      Class 1   Class 2   Sum of 2 classes
word 1      250       260       510
word 2      220       180       400
word 3      100       150       250
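One way to read the table: a word spread almost evenly across classes (like the 250/260 split) says little about the label, while a skewed word is worth querying. A sketch of such a scoring (our own choice of measure, not the thesis's): 1 minus the normalized entropy of each word's class distribution.

```python
import math

def word_discriminativeness(class_counts):
    """Score each word by how unevenly it is spread across classes:
    0 = perfectly uniform, 1 = appears in a single class only.
    class_counts: {class: {word: count}}."""
    words = set()
    for counts in class_counts.values():
        words |= set(counts)
    k = len(class_counts)
    scores = {}
    for w in words:
        cs = [class_counts[c].get(w, 0) for c in class_counts]
        total = sum(cs)
        ent = -sum((c / total) * math.log(c / total) for c in cs if c)
        scores[w] = 1.0 - ent / math.log(k)
    return scores
```

A word occurring only in one class scores 1.0; a 50/50 word scores 0, so querying can favor nodes rich in high-scoring words.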
CC problem: how to set the terminal condition?
Different iterations obtain diverse results.
[Figure: the CC classifier infers neighbor features (NF_A, NF_B) from local features (F1, F2, …, e.g. 0, 1, 0, …); the inferred proportions shift between iterations (e.g. 3/5, 2/5 and 2/3, 1/3 in iteration 1, then 4/5, 1/5 and 1/3, 2/3 in iteration 2), so predicted labels flip between A and B. Legend: CO-predicted label, true label, labeled node.]
ALFNET flow
[Flowchart: query and train the CO classifier; compute scores; query and train the classifiers; repeat while the iteration count is below the threshold; once the number of labeled nodes exceeds the budget, output the result.]
Representation and Challenge
In a citation network, papers are nodes connected by citation links; the challenge is how to use this link information.