
ProBase: Common Sense Concept KB and Short Text Understanding

ProBase: Common Sense Concept KB and Short Text Understanding. Wentao Ding. Topics: term explanation (common sense KB vs. encyclopedia KB; common short texts such as search queries, document titles, ad keywords, captions, anchor text, questions, image tags, and tweets/Weibo posts) and ProBase / Microsoft Concept Graph.


Presentation Transcript


  1. ProBase: Common Sense Concept KB and Short Text Understanding • Wentao Ding

  2. Term explanation • Common sense KB vs Encyclopedia KB • Common short text • Search Query, Document Title, Ad keyword, Caption, Anchor text, Question, Image Tag, Tweet/Weibo

  3. ProBase/Microsoft Concept Graph • A probabilistic taxonomy for text understanding, harnessed from billions of web pages and years' worth of search logs. • 2016 version • 5,401,933 unique concepts • 12,551,613 unique instances • 87,603,947 IsA relations

  4. Probabilistic taxonomy

  5. Concept Distribution • X axis: the 5.4 million concepts, ordered by size • Y axis: the number of instances each concept contains (logarithmic scale)

  6. Quality Evaluation • Coverage • Analyzed Bing’s query log from a two-year period, with queries sorted in decreasing order of frequency. • Precision • Measured on a benchmark dataset containing 40 concepts in various domains. • The concept size varies from 21 instances (for aircraft model) to 85,391 (for company), with a median of 917.

  7. Coverage • Taxonomy Coverage: The query contains at least one concept or instance within the taxonomy. • Concept Coverage: The query contains at least one concept in the taxonomy.
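
The two coverage definitions above can be sketched in a few lines. This is a minimal illustration, not the evaluation code behind the slides: the function names, the n-gram matching by exact string, and the tiny concept and instance sets are all assumptions made for the example.

```python
def ngrams(tokens, max_len=3):
    """Yield every contiguous word sequence of up to max_len tokens."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def coverage(queries, concepts, instances):
    """Return (taxonomy_coverage, concept_coverage) as fractions of queries.

    Taxonomy coverage: the query contains at least one concept or instance.
    Concept coverage: the query contains at least one concept.
    """
    if not queries:
        return 0.0, 0.0
    taxonomy_hits = 0
    concept_hits = 0
    for query in queries:
        terms = set(ngrams(query.lower().split()))
        has_concept = bool(terms & concepts)
        has_instance = bool(terms & instances)
        concept_hits += has_concept
        taxonomy_hits += has_concept or has_instance
    return taxonomy_hits / len(queries), concept_hits / len(queries)


# Hypothetical toy data, not taken from ProBase.
toy_concepts = {"company", "smart phone"}
toy_instances = {"microsoft", "iphone 5s"}
toy_queries = [
    "microsoft stock price",
    "best smart phone",
    "weather tomorrow",
    "iphone 5s case",
]
taxonomy_cov, concept_cov = coverage(toy_queries, toy_concepts, toy_instances)
```

On this toy log, three of the four queries mention a concept or an instance, while only one mentions a concept, so taxonomy coverage is always at least as large as concept coverage.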

  8. Precision

  9. Constructing ProBase • Extract superordinate-subordinate pairs from sentences • Merge nodes of same sense
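
As a rough illustration of the first construction step, the sketch below pulls superordinate-subordinate pairs out of sentences using a single "such as" pattern. The choice of pattern, the regular expression, and the helper names are assumptions for this example; the slide does not say which patterns are used, and a real extractor would need noun-phrase detection and the sense-merging step, neither of which is attempted here.

```python
import re
from collections import Counter

# Assumed pattern: "<concept word> such as <item>, <item>, and <item>".
# Only the single word before "such as" is taken as the concept.
SUCH_AS = re.compile(r"(\w+) such as ([^.;]+)", re.IGNORECASE)
LIST_SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def extract_isa_pairs(sentence):
    """Return (concept, instance) pairs found in one sentence."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        concept = match.group(1).lower()
        for item in LIST_SEPARATOR.split(match.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((concept, item))
    return pairs


def count_isa_pairs(sentences):
    """Aggregate pair frequencies over a collection of sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(extract_isa_pairs(sentence))
    return counts


# Hypothetical example sentences.
example_sentences = [
    "Large companies such as Microsoft, Google, and IBM.",
    "He likes animals such as cats and dogs.",
]
pair_counts = count_isa_pairs(example_sentences)
```

The pair counts collected this way are the kind of evidence a probabilistic taxonomy can turn into scores, since a pair seen many times is more trustworthy than one seen once.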

  10. Conceptualization • The Microsoft Concept Tagging model (a.k.a. the Conceptualization model) aims to map text-format entities to semantic concept categories, each with an associated probability.
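
To make the idea of mapping an entity to concepts "with some probabilities" concrete, here is a minimal sketch that ranks concepts by relative frequency of IsA evidence. The relative-frequency estimate, the function names, and the counts are assumptions for illustration; the slide does not specify how the model scores concepts.

```python
from collections import defaultdict


def build_instance_index(isa_counts):
    """Group (concept, instance) -> count data by instance."""
    index = defaultdict(dict)
    for (concept, instance), count in isa_counts.items():
        index[instance][concept] = count
    return index


def conceptualize(instance, index, top_k=3):
    """Return up to top_k (concept, probability) pairs for an instance.

    The probability is estimated as count(concept, instance) divided by
    the total count over all concepts of that instance (an assumption).
    """
    counts = index.get(instance.lower(), {})
    total = sum(counts.values())
    if total == 0:
        return []
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(concept, count / total) for concept, count in ranked[:top_k]]


# Made-up counts, not real ProBase statistics.
toy_isa_counts = {
    ("fruit", "apple"): 60,
    ("company", "apple"): 40,
    ("company", "microsoft"): 100,
}
toy_index = build_instance_index(toy_isa_counts)
apple_concepts = conceptualize("apple", toy_index)
```

With these made-up counts, "apple" maps to both "fruit" and "company", which is the ambiguity that context in a short text has to resolve.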

  11. Concept Labeling for understanding short texts • Challenges of short text understanding

  12. Concept Labeling for understanding short texts • Concept coherence

  13. Head, Modifier, and Constraint Detection in Short Texts • [popular] (modifier) [iphone 5s] (constraint) [smart cover] (head) • To solve this, we need to know: • (Instance-level head-modifier knowledge) “smart cover” is the head, and “iphone 5s” is the constraint. • (Conceptual knowledge) “smart cover” is an accessory, and “iphone 5s” is a device. • (Concept-level head-modifier knowledge) When an accessory and a device appear together, the device is the constraint and the accessory is the head.
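
The three kinds of knowledge listed on this slide can be combined in a small rule-based sketch. It assumes the query has already been segmented into terms, and the two lookup tables are hypothetical stand-ins for what a concept KB and mined head-modifier patterns would provide; the cited work is not claimed to operate this way.

```python
# Conceptual knowledge: term -> concept (hypothetical entries).
TERM_CONCEPT = {
    "iphone 5s": "device",
    "smart cover": "accessory",
}

# Concept-level head-modifier knowledge, stored as
# (constraint concept, head concept) pairs (hypothetical entry).
CONSTRAINT_OF = {("device", "accessory")}


def label_terms(terms):
    """Label each pre-segmented term as head, constraint, or modifier."""
    concepts = {term: TERM_CONCEPT.get(term) for term in terms}
    # Terms that take part in no known concept pair stay plain modifiers.
    labels = {term: "modifier" for term in terms}
    for head in terms:
        for other in terms:
            if head == other:
                continue
            if (concepts[other], concepts[head]) in CONSTRAINT_OF:
                labels[head] = "head"
                labels[other] = "constraint"
    return labels


query_labels = label_terms(["popular", "iphone 5s", "smart cover"])
```

Because the rule is stated over concepts rather than individual terms, the same table entry would also cover other device and accessory pairs once their concepts are known.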

  14. Head, Modifier, and Constraint Detection in Short Texts

  15. References • ProBase. https://www.microsoft.com/en-us/research/project/probase/ • Microsoft Concept Graph For Short Text Understanding. https://concept.research.microsoft.com/Home/Introduction • Understanding Short Texts. Zhongyuan Wang and Haixun Wang, Association for Computational Linguistics (ACL), August 2016. http://www.wangzhongyuan.com/tutorial/ACL2016/Understanding-Short-Texts/ • Probase. Haixun Wang, APWeb 13. http://101.96.10.63/www1.se.cuhk.edu.hk/~apweb/previous/apweb2013/slides/Haixun-APWeb13-Tutorial.pdf • Probase: A Probabilistic Taxonomy for Text Understanding. Wentao Wu, Hongsong Li, Haixun Wang, Kenny Q. Zhu, ACM International Conference on Management of Data (SIGMOD), May 2012. • Short Text Understanding Through Lexical-Semantic Analysis. Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, Xiaofang Zhou, International Conference on Data Engineering (ICDE), April 2015. • Head, Modifier, and Constraint Detection in Short Texts. Zhongyuan Wang, Haixun Wang, Zhirui Hu, International Conference on Data Engineering (ICDE), January 2014.

  16. Thanks for listening • Q & A
