Explore UTOPIAN, an NMF-based interactive system for topic modeling, offering advantages over LDA in visual analytics. Features include reliable algorithms, flexibility in user interactions, and efficient topic creation. Discover the benefits and functionalities of UTOPIAN in detail.
UTOPIAN: User-Driven Topic Modeling Based on Interactive Nonnegative Matrix Factorization. Jaegul Choo (1, *), Changhyun Lee (1), Chandan K. Reddy (2), and Haesun Park (1). (1) Georgia Institute of Technology, (2) Wayne State University. (*) e-mail: jaegul.choo@cc.gatech.edu
Intro: Topic Modeling • Topic: a distribution over keywords • Document: a distribution over topics [Figure: Document 1 to Document 4; Topic 1 to Topic 3; keywords: brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
Latent Dirichlet Allocation (LDA) in Visual Analytics • LDA has been widely used in visual analytics. • TIARA [Wei et al. KDD10], iVisClustering [Lee et al. EuroVis12], ParallelTopics [Dou et al. VAST12], TopicViz [Eisenstein et al. CHI-WIP12], … *Image courtesy of original papers.
Overview of Our Work • Proposes nonnegative matrix factorization (NMF) for topic modeling. • Highlights advantages of NMF over LDA in visual analytics. • Presents UTOPIAN, an NMF-based interactive topic modeling system. [Figure labels: keyword-induced topic creation, topic merging, doc-induced topic creation, topic splitting]
Nonnegative Matrix Factorization (NMF) • Lower-rank approximation with nonnegativity constraints: $A \approx WH$ • Why nonnegativity? Easy interpretation and semantically meaningful output • Algorithm: alternating nonnegativity-constrained least squares [Kim et al., 2008], solving $\min_{W \ge 0,\, H \ge 0} \|A - WH\|_F$
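To make the factorization A ≈ WH with W, H ≥ 0 concrete, here is a minimal pure-Python sketch. Note that it swaps in Lee-Seung multiplicative updates, not the alternating nonnegativity-constrained least squares algorithm cited on the slide, because the updates fit in a few lines without a nonnegative least-squares solver. The function names, the toy matrix, and the parameters `iters`, `seed`, and `eps` are illustrative assumptions.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frob_error(A, W, H):
    """Frobenius norm ||A - WH||_F."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Rank-k NMF by multiplicative updates (NOT the ANLS method from the slide)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    # Strictly positive random start; the updates keep every entry nonnegative.
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H

# Toy keyword-by-document matrix with two obvious blocks (made-up counts).
A = [[2.0, 3.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 1.0],
     [0.0, 0.0, 2.0, 1.0]]
W, H = nmf(A, k=2)
print("approximation error:", frob_error(A, W, H))
```

Because every update multiplies a nonnegative entry by a nonnegative ratio, the nonnegativity constraint holds without any explicit projection step.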
NMF as Topic Modeling • $A \approx WH$ • Topic: a distribution over keywords (from $W$) • Document: a distribution over topics (from $H$) [Figure: Document 1 to Document 4; Topic 1 to Topic 3; keywords: brain, evolve, dna, genetic, gene, nerve, neuron, life, organism]
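As a rough illustration of reading topics out of the factors, the sketch below assumes A is keyword-by-document, so each column of W describes one topic and each column of H describes one document. The keywords come from the slide; the numeric values and the helper names `top_keywords` and `topic_distribution` are made up for the example.

```python
def top_keywords(W, vocab, n_top=3):
    """Return the n_top highest-weighted keywords for each topic (column of W)."""
    topics = []
    for j in range(len(W[0])):
        ranked = sorted(range(len(vocab)), key=lambda i: W[i][j], reverse=True)
        topics.append([vocab[i] for i in ranked[:n_top]])
    return topics

def topic_distribution(H, doc):
    """Normalize one column of H so that it sums to 1 (a distribution over topics)."""
    col = [row[doc] for row in H]
    total = sum(col)
    return [v / total for v in col] if total > 0 else col

# Hypothetical factors: 6 keywords, 2 topics, 3 documents.
vocab = ["brain", "neuron", "nerve", "dna", "gene", "genetic"]
W = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.0], [0.0, 0.9], [0.1, 0.8], [0.0, 0.7]]
H = [[1.0, 1.0, 0.0],
     [0.0, 3.0, 2.0]]

print(top_keywords(W, vocab, n_top=2))
print(topic_distribution(H, doc=1))
```

NMF itself does not force columns to sum to one, so the normalization step is what turns a raw column into something that can be read as a distribution.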
Advantages of NMF in Visual Analytics • Reliable algorithmic behaviors • Flexible support for user interactions
NMF vs. LDA: Consistency from Multiple Runs • Documents’ topical membership changes among 10 runs • Data sets: InfoVis/VAST paper data set, 20 newsgroup data set
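The slide does not say how membership changes across runs were measured. One possible measure, sketched below as an assumption and not as the authors' method, compares two runs by counting document pairs that share a topic in one run but not in the other. This sidesteps the fact that topic indices can be permuted between runs. The function name `membership_change` is hypothetical.

```python
from itertools import combinations

def membership_change(labels_a, labels_b):
    """Fraction of document pairs grouped together in one run but apart in the other."""
    pairs = list(combinations(range(len(labels_a)), 2))
    differing = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return differing / len(pairs)

# Same grouping with swapped topic indices counts as no change.
print(membership_change([0, 0, 1, 1], [1, 1, 0, 0]))
# One document moved to the other topic.
print(membership_change([0, 0, 1, 1], [0, 1, 1, 1]))
```

Averaging this value over all pairs of runs would give a single consistency score per method and data set.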
NMF vs. LDA: Empirical Convergence • Documents’ topical membership changes between iterations • InfoVis/VAST paper data set • Running time: NMF 48 seconds, LDA 10 minutes
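A simple way to track membership changes between iterations is to assign each document to its highest-weighted topic and count how many assignments differ from the previous iteration. The sketch below assumes each column of H is one document; the helper names and the example values are illustrative, not taken from the talk.

```python
def dominant_topics(H):
    """Index of the largest entry in each column of H (one column per document)."""
    k, n = len(H), len(H[0])
    return [max(range(k), key=lambda t: H[t][d]) for d in range(n)]

def changed_fraction(H_prev, H_next):
    """Fraction of documents whose dominant topic differs between two iterations."""
    before, after = dominant_topics(H_prev), dominant_topics(H_next)
    return sum(b != a for b, a in zip(before, after)) / len(before)

# Hypothetical H at two consecutive iterations: 2 topics, 3 documents.
H_prev = [[0.9, 0.4, 0.1],
          [0.1, 0.6, 0.9]]
H_next = [[0.8, 0.7, 0.2],
          [0.2, 0.3, 0.8]]
print(changed_fraction(H_prev, H_next))  # only the middle document switches topic
```

Within a single run the topic indices stay aligned from one iteration to the next, so a direct comparison of indices is enough here.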
NMF vs. LDA: Topic Summary (Top Keywords) • InfoVis/VAST paper data set • Topics are more consistent in NMF than in LDA. • Topic quality is comparable between NMF and LDA.
Advantages of NMF in Visual Analytics • Reliable algorithmic behaviors • Flexible support for user interactions
Weakly Supervised NMF [Choo et al., DMKD, accepted with rev.] $\min_{W \ge 0,\, H \ge 0} \|A - WH\|_F^2 + \alpha \|(W - W_r) M_W\|_F^2 + \beta \|M_H (H - D_H H_r)\|_F^2$ • $W_r$, $H_r$: reference matrices for $W$ and $H$ • $M_W$, $M_H$: diagonal matrices for weighting/masking columns/rows of $W$ and $H$ • Provides flexible yet intuitive means for user interaction. • Maintains the same computational complexity as original NMF.
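To show how the masks select which parts of W and H are pulled toward the reference matrices, the sketch below evaluates the objective exactly as written on the slide. It only computes the objective value; it is not a solver. Storing each diagonal matrix as a plain list of its diagonal entries (`m_w`, `m_h`, `d_h`) and the function names are assumptions made for the example.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def sq_frob(X):
    """Squared Frobenius norm."""
    return sum(v * v for row in X for v in row)

def ws_nmf_objective(A, W, H, W_r, H_r, m_w, m_h, d_h, alpha, beta):
    """||A - WH||_F^2 + alpha ||(W - W_r) M_W||_F^2 + beta ||M_H (H - D_H H_r)||_F^2."""
    WH = matmul(W, H)
    fit = sq_frob([[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, WH)])
    # (W - W_r) M_W: column j of the difference is scaled by m_w[j].
    w_pen = sq_frob([[(w - wr) * m for w, wr, m in zip(row, row_r, m_w)]
                     for row, row_r in zip(W, W_r)])
    # M_H (H - D_H H_r): row i of H_r is scaled by d_h[i], the difference by m_h[i].
    h_pen = sq_frob([[m * (h - d * hr) for h, hr in zip(row, row_r)]
                     for row, row_r, m, d in zip(H, H_r, m_h, d_h)])
    return fit + alpha * w_pen + beta * h_pen

# Toy case: perfect fit, and a reference that constrains only the first topic of W.
I2 = [[1.0, 0.0], [0.0, 1.0]]
W_ref = [[2.0, 0.0], [0.0, 5.0]]
print(ws_nmf_objective(I2, I2, I2, W_ref, I2, m_w=[1.0, 0.0],
                       m_h=[1.0, 1.0], d_h=[1.0, 1.0], alpha=2.0, beta=1.0))
```

Setting a mask entry to zero removes that topic from the penalty entirely, which is how a user constraint can apply to one topic while leaving the others free.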
UTOPIAN: User-Driven Topic Modeling Based on Interactive NMF [Figure labels: topic merging, keyword-induced topic creation, doc-induced topic creation, topic splitting]
UTOPIAN Overview • Supervised t-distributed stochastic neighbor embedding (t-SNE) • User interactions supported: keyword refinement, topic merging/splitting, keyword-/document-induced topic creation • Real-time interaction via PIVE (Per-Iteration Visualization Environment) [Figure labels: keyword-induced topic creation, topic merging, doc-induced topic creation, topic splitting]
Supervised t-SNE • Original t-SNE: documents are often too noisy to work with. • Supervised t-SNE: $d(x_i, x_j) \leftarrow \alpha \cdot d(x_i, x_j)$ if $x_i$ and $x_j$ belong to the same topic cluster.
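The distance adjustment can be sketched as a preprocessing step on a pairwise distance matrix before it is handed to t-SNE. The sketch assumes α < 1, so that documents in the same topic cluster are drawn closer together; the slide does not state a value, and the function name `supervise_distances` and the default `alpha=0.5` are illustrative.

```python
def supervise_distances(D, labels, alpha=0.5):
    """Scale d(x_i, x_j) by alpha when documents i and j share a topic cluster."""
    n = len(D)
    return [
        [D[i][j] * alpha if i != j and labels[i] == labels[j] else D[i][j]
         for j in range(n)]
        for i in range(n)
    ]

# Three documents; the first two belong to the same topic cluster.
D = [[0.0, 4.0, 6.0],
     [4.0, 0.0, 2.0],
     [6.0, 2.0, 0.0]]
labels = [0, 0, 1]
for row in supervise_distances(D, labels):
    print(row)
```

Only the distance between the first two documents shrinks here; every cross-cluster distance is left untouched, so the result stays symmetric.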
PIVE (Per-Iteration Visualization Environment) for Real-time Interaction [Choo et al., under revision] [Figure: standard approach vs. PIVE approach]
Usage Scenario: Hyundai Genesis Review Data [Figure: initial result vs. result after interaction]
Summary • Presented UTOPIAN, a user-driven topic modeling system based on interactive NMF. • Highlighted the advantages of NMF over LDA in visual analytics: • Reliable algorithmic behaviors (consistency from multiple runs, early empirical convergence) • Flexible support for user interactions (keyword refinement, topic merging/splitting, keyword-/document-induced topic creation)
More in the Paper & On-going Work • More in the paper: a general taxonomy of user interactions with computational methods (keyword-based vs. document-based; template-based vs. from-scratch-based), algorithmic details about supported user interactions, implementation details, more usage scenarios • On-going work: scaling up the system with parallel distributed NMF
Thank you! Jaegul Choo, jaegul.choo@cc.gatech.edu, http://www.cc.gatech.edu/~joyfull/ http://tinyurl.com/UTOPIAN2013 For more details, please find me at ‘Meet the Candidate’, A601 + A602, 6 PM today.