
Link Distribution on Multilingual Wikipedia



Presentation Transcript


  1. KwangHee Park 02/17 Link Distribution on Multilingual Wikipedia

  2. Introduction • Current problem • Analyze link distribution on multilingual Wikipedia • Goal • Find cultural intention in multilingual data to support multilingual synchronization

  3. Example • Samsung

  4. Methodology • Topic modeling • Target = 5 linked articles • 34,577 articles from each language • English, Spanish, French, Chinese, Korean • Linked terms • Easy to handle with respect to the term boundary recognition problem
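
The slide above uses linked terms as tokens because a link already marks where a term starts and ends. A minimal sketch of that idea, assuming the input is raw wikitext with `[[Target]]` and `[[Target|label]]` links (the slides do not say how the terms were actually extracted; the function name, the skipped namespace prefixes, and the sample text are illustrative assumptions):

```python
import re

# Matches [[Target]], [[Target#Section]] and [[Target|label]];
# captures only the link target, so multiword terms stay whole.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")

# Namespace prefixes to skip (an illustrative, incomplete list).
SKIP_PREFIXES = ("file:", "image:", "category:")


def linked_terms(wikitext):
    """Return link targets in order of appearance, one token per link."""
    terms = []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target and not target.lower().startswith(SKIP_PREFIXES):
            terms.append(target)
    return terms


# Made-up sample text, not taken from the slides.
sample = (
    "[[Samsung Group|Samsung]] is based in [[Seoul]], "
    "[[South Korea]]. [[Category:Companies]]"
)
print(linked_terms(sample))
```

Because each link target is kept as a single token, no word segmentation is needed, which matters most for Chinese and Korean text.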

  5. Latent Dirichlet Allocation based approach (figure: term set)

  6. LDA approach (figure: Korean Wiki page, inter-language link, English Wiki page)

  7. Experiment • LingPipe API • Supports LDA clustering • 20 topics • Linked terms • English: random sample of about 330,000 • Korean: about 220,000 • Documents • English: 1,000 articles • Korean: 3,185 articles
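
The experiment above fits LDA with LingPipe, a Java library. The sketch below is not LingPipe's API; it swaps in a small pure-Python collapsed Gibbs sampler only to illustrate how topics can be fitted over documents represented as lists of linked terms. The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions, and the toy run uses 2 topics rather than the 20 on the slide.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word, vocab): per-document topic counts,
    per-topic word counts, and the vocabulary indexing the word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assign = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, z = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1
    return doc_topic, topic_word, vocab


# Toy documents of linked terms (made up for illustration).
docs = [
    ["Seoul", "South Korea", "Samsung Group", "Seoul"],
    ["Plato", "Athens", "Philosophy", "Plato"],
    ["Samsung Group", "South Korea", "Electronics"],
    ["Philosophy", "Athens", "Logic"],
]
doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top = sorted(range(len(vocab)), key=lambda i: counts[i], reverse=True)[:3]
    print(k, [vocab[i] for i in top])
```

Running the same sampler separately on the Korean and English term sets would give one topic model per language, which is the setup the slides describe.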

  8. Aristotle

  9. Andorra

  10. Problems • Selecting the total number of topics • Number of topics per document • Needs some threshold • Evaluation

  11. Evaluation • Count overlapping terms in topic and in session • Limit to 3 topics per document • Label all topics and judge manually
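
The overlap count and the three-topic limit above can be sketched as follows. This assumes each topic is represented by a list of its top terms and each document by its linked terms and a per-topic weight. The slide does not define "session", so comparing against the document's own linked terms is an assumption here, as are the function names, the threshold value, and the toy data.

```python
def top_topics(weights, max_topics=3, threshold=0.0):
    """Indices of the highest-weight topics: at most max_topics, each >= threshold."""
    ranked = sorted(range(len(weights)), key=lambda k: weights[k], reverse=True)
    return [k for k in ranked[:max_topics] if weights[k] >= threshold]


def overlap_counts(doc_terms, topic_terms, weights, max_topics=3, threshold=0.0):
    """Map each selected topic index to the number of terms it shares with the document."""
    doc_set = set(doc_terms)
    return {
        k: len(doc_set & set(topic_terms[k]))
        for k in top_topics(weights, max_topics, threshold)
    }


# Toy data (made up for illustration).
topic_terms = [
    ["Seoul", "South Korea", "Samsung Group"],
    ["Plato", "Athens", "Philosophy"],
    ["Pyrenees", "France", "Spain"],
    ["Football", "Olympics", "Stadium"],
]
doc_terms = ["Samsung Group", "Seoul", "Electronics", "France"]
weights = [0.6, 0.05, 0.25, 0.10]

print(overlap_counts(doc_terms, topic_terms, weights, threshold=0.08))
```

A threshold above zero drops weakly associated topics even when fewer than three remain, which is one way to address the per-document topic count raised on the previous slide.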

  12. Work plan • Experiment • Apply to other languages • French, Chinese, Spanish • Compare with old documents • Analyze latent changes

  13. Thanks
