
Large Scale Parallel Supervised Topic-Modeling (implementation plan)

Keisuke Kamataki, Jun Zhu, Eric Xing. Sep 27, 2010.


Presentation Transcript


  1. Large Scale Parallel Supervised Topic-Modeling (implementation plan) Keisuke Kamataki, Jun Zhu, Eric Xing. Sep 27, 2010

  2. Implementation plan
  Big picture: separate implementations in 3 steps (we can still run the distributed MedLDA from step 1)
  • Plan 1: E-step and M-step are separate programs; the SVM in the M-step is not parallelized
  • Plan 2: E-step and M-step are separate programs; the SVM in the M-step is parallelized
  • Plan 3: everything is integrated and parallelized within a single program
  Start from plan 1, then extend it to plan 2, and try plan 3 last.

  3. Plan 1 (E-step and M-step are separated. SVM in M-step is not parallelized)
  [Slide diagram] Given many documents, one program performs the E-step (Gibbs sampling of the topic assignments z) in parallel and gathers the sufficient statistics; a second, single program performs the M-step (estimating α, β, η, μ) on a single computer. Repeat until convergence.
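The parallel E-step on each worker's shard of documents could be sketched as collapsed Gibbs sampling over the topic assignments z. This is a minimal, illustrative sketch only: the count-array names are made up here, and supervised MedLDA would add a response (η, μ) term to the sampling conditional that is omitted below.

```python
# Hedged sketch of a collapsed-Gibbs E-step on one shard of documents.
# n_dt: document-topic counts, n_tw: topic-word counts, n_t: topic totals.
# These names and this plain-LDA conditional are illustrative assumptions.
import numpy as np

def gibbs_estep(docs, z, n_dt, n_tw, n_t, alpha, beta,
                n_topics, n_words, n_sweeps=1):
    """Run Gibbs sweeps over a shard; the updated count arrays are the
    sufficient statistics a worker would send back for the M-step."""
    for _ in range(n_sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                # remove the current assignment from the counts
                n_dt[d, t] -= 1; n_tw[t, w] -= 1; n_t[t] -= 1
                # p(z=t | rest) ∝ (n_dt + α) * (n_tw + β) / (n_t + V·β)
                p = (n_dt[d] + alpha) * (n_tw[:, w] + beta) / (n_t + n_words * beta)
                t = np.random.choice(n_topics, p=p / p.sum())
                # record the new assignment
                z[d][i] = t
                n_dt[d, t] += 1; n_tw[t, w] += 1; n_t[t] += 1
    return n_dt, n_tw, n_t
```

Because each shard touches only its own documents' n_dt rows, shards can be sampled in parallel and their topic-word counts merged between sweeps.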

  4. Plan 1 in detail
  • Prepare the E-step code and M-step code separately (probably in C++), and merge them with a Shell/Perl/Ruby script
  • Easy to implement and debug quickly
  • Extendible to plans 2 and 3
  • May fail to scale only when the number of topics K and labels L is large ... this should be solved in plan 2
  A good starting point!
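The "merging script" role can be sketched as a small driver that alternates the two separate programs until convergence. This is a hypothetical sketch: the worker binary names and the loop-control callables are placeholders, not part of the plan.

```python
# Hedged sketch of a plan-1 driver (the slides suggest Shell/Perl/Ruby;
# Python shown here). Binary names like "estep_worker" are hypothetical.
import subprocess

def em_driver(run_estep, run_mstep, read_loglik, max_iters=50, tol=1e-4):
    """Alternate E- and M-steps until the log-likelihood improvement
    falls below tol; returns the number of iterations performed."""
    prev = float("-inf")
    for it in range(1, max_iters + 1):
        run_estep()            # e.g. launch one Gibbs worker per shard, wait
        run_mstep()            # e.g. single-machine refit of α, β, η, μ
        ll = read_loglik()     # e.g. a value the M-step writes to a file
        if ll - prev < tol:
            return it
        prev = ll
    return max_iters

def shell_estep(n_shards):
    """Launch one (hypothetical) estep_worker binary per shard in parallel."""
    procs = [subprocess.Popen(["./estep_worker", f"shard{s}.dat", f"stats{s}.dat"])
             for s in range(n_shards)]
    for p in procs:
        p.wait()
```

Passing the steps in as callables keeps the loop independent of how each program is launched, so the same driver could wrap shell-, Perl-, or Ruby-spawned binaries.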

  5. Plan 2 (E-step and M-step are separated. SVM in M-step is parallelized)
  [Slide diagram] Same loop as plan 1: given many documents, one program performs the E-step (Gibbs sampling of z) in parallel and gathers the sufficient statistics; in the M-step, only the SVM that estimates η and μ is parallelized, while α and β are still updated by a single program. Repeat until convergence.

  6. Plan 2 in detail
  • Prepare the E-step code and M-step code separately and merge them with a Shell/Perl/Ruby script (same as plan 1)
  • Almost a copy of plan 1, except for the SVM part of the M-step
  • The SVM is parallelized, so estimating η and μ would be fast and scalable (of course, we still need to figure out how to parallelize SVM in FB's computing environment)
  A practical extension (we only need to figure out how to parallelize the SVM)
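One plausible way to parallelize the SVM across the L labels is to train independent one-vs-rest classifiers concurrently. This is a sketch only: MedLDA's actual M-step solves a max-margin problem over expected topic proportions, and the Pegasos-style subgradient solver below merely illustrates the parallel data flow.

```python
# Hedged sketch: per-label SVMs trained in parallel with a process pool.
# The Pegasos-style solver is an illustrative stand-in, not MedLDA's solver.
import numpy as np
from multiprocessing import Pool

def train_linear_svm(args):
    X, y, lam, n_epochs = args            # labels y in {-1, +1}
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(n_epochs):
        for i in range(X.shape[0]):
            t += 1
            eta = 1.0 / (lam * t)         # Pegasos step size
            if y[i] * X[i].dot(w) < 1:    # hinge loss active
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:
                w = (1 - eta * lam) * w
    return w

def train_one_vs_rest(X, Y, lam=0.1, n_epochs=20, n_procs=4):
    """Train one SVM per label column of Y (entries in {-1,+1}) in
    parallel; returns an (L x D) weight matrix."""
    jobs = [(X, Y[:, l], lam, n_epochs) for l in range(Y.shape[1])]
    with Pool(n_procs) as pool:
        return np.stack(pool.map(train_linear_svm, jobs))
```

Because the L per-label problems share no state, this addresses exactly the scaling concern plan 1 has with large K and L: each worker fits one label's classifier over the K-dimensional topic features.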

  7. Plan 3 (Everything is integrated and parallelized within a single program)
  [Slide diagram] Given many documents, a single program performs both the E-step (parallel Gibbs sampling of z, gathering the sufficient statistics) and the M-step (parallel estimation of α, β, η, μ), repeating until convergence.

  8. Plan 3 in detail
  • The E-step and M-step (including the SVM) are integrated in a single code base
  • In practice, the computational efficiency and algorithmic behavior would be almost the same as plan 2, but the software would be more complicated and the implementation would take much longer
  Could be elegant from a research standpoint (but should be built as an extension of plan 2, since the software will be complex)

  9. ToDo for now
  • Keisuke: prepare the core code for Gibbs-sampling-based LDA and the merging script
  • Jun: derive the Gibbs-sampling equations for MedLDA
