1 / 18

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base. Yi Z eng , Dongsheng Wang, etc. 2013.11.18. Content. Introduction Chinese Semantic Knowledge Base Entity Linking Semantic Knowledge Base Construction Linking Unambiguous Entities

iolani
Download Presentation

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base Yi Zeng, Dongsheng Wang, etc. 2013.11.18

  2. Content • Introduction • Chinese Semantic Knowledge Base Entity Linking • Semantic Knowledge Base Construction • Linking Unambiguous Entities • Stepwise Entity Disambiguation Based on a Chinese Knowledge Base • Experimental Result and Analysis • Conclusion

  3. Introduction • Motivation and requirement • Real-time update knowledge base • Extract facts automatically from massive raw texts • -> Entity identification in advance is essential

  4. Introduction • CASIA_EL (the proposed system) • Knowledge base construction • Entity linking by retrieving “relations” • Ambiguous • Similarity measurement • Stepwise ambiguity • Result • Linked 1232 entities from Sina to a Chinese Knowledge Base • An accuracy of 88.5% overall

  5. Chinese Semantic Knowledge Base Entity Linking • Semantic Knowledge Base Construction • Linking Unambiguous Entities • Stepwise Entity Disambiguation Based on a Chinese Knowledge base

  6. Semantic Knowledge Base Construction • Format • XML -> N3 • TDB triple store

  7. Synset construction from multiple sources (1/2) • Provided • Part of the BaiduEncyclopedia knowledge base • name, English name, Chinese name from the infoboxknowledge • Not provided synset • 1) nick names and redirect titles from: • Baidu Encyclopedia • Hudong Encyclopedia • Wikipedia Chinese pages • -> lead to 476,086 pair of synonyms are added

  8. Synset construction from multiple sources (2/2) • Split western people’s name by “.” into smaller keywords • Etc., “Michael·Jordan” is split into “Michael” and “Jordan” • added as possible labels for “Michael Jordan” • These synset are represented through “rdf:label” • Enable the search of entity through keyword

  9. Linking Unambiguous Entities • Preprocessing • <,>,《,》,”,”, etc. • For example, “《霸王别姬》 • Process as it is (syntax) • Remove and process again • Retrieving “rdfs:label” can result in • 1) one candidate • Link directly and out put KB_ID • 2) no candidate • Google’s “did you mean?” function • Start again or output null • 3) several candidate • We will discuss in details in the following

  10. Stepwise Entity Disambiguation Based on a Chinese Knowledge base

  11. Stepwise Entity Disambiguation Based on a Chinese Knowledge base • Stepwise Bag-of-Words(S-BOW) • Add other entities' document • bag[dst] • Bag of [Document of short text] • bag[dkb] • Bag of [document of knowledgebase] Algorithms • 1) sim(dst, dkb)=|bag[dst] ∩bag[dkb]|. • 2) bag’[dst]= bag[dst]bag[t1]bag[t2]…bag[tn]. • 3) sim'(dst, dkb)=|bag'[dst] ∩bag[dkb]| • = | (bag[dst]bag[t1]bag[t2]…bag[tn]) ∩bag[dkb] |.

  12. Experimental Result and Analysis

  13. Experimental Result and Analysis • Reasons why we select nouns and literal string • Which words should be considered and the documents should be added?

  14. Experimental Result and Analysis • The number of candidate entities • Goes larger, more correct it is

  15. Experimental Result and Analysis • Due to the adding of many synonyms • Many of then more than 10 (56 entities) • Performs very well • Example • Note: when the number is greater than 9, there are no incorrect disambiguations.

  16. Experimental Result and Analysis • Disambiguation • 161 disambuguation entites are detected (161/1232) • 123 were disambiguated -> correctness is 82.1% (101/123 entities were correctely disambiguated) • Extend the original microblog posts (Stepwise) • Another 38 entities were disambiguated • correctness is 63.2% • the overall entity disambiguation correctness based on the proposed algorithm is 77.6% (125/161 entities are correctly disambiguated).

  17. Conclusion • As for short texts (Based on Chinese Knowledge base) • Stepwise method • Adding other documents of entities those that are within the same context • Compared to many algorithms • Solve the insufficient of information • Retrieving of the relations is various • Enriching of the synset from multiple sources

  18. References • [1] F. M. Suchanek, G. Kasneci, and G. Weikum, "YAGO: A Large Ontology from Wikipedia and WordNet". Journal of Web Semantics, 6(3), 203-217, Elsevier, 2008. • [2] A. Bagga and B. Baldwin, "Entity-based Cross-document Coreferencing Using the Vector Space Model". Proceedings of the 17th International Conference on Computational linguistics (COLING '98), 79-85, ACL, Montreal, Quebec, Canada, 1998. • [3] G. S. Mann and D. Yarowsky, "Unsupervised personal name disambiguation". Proceedings of the 7th Conference on Natural Language Learning (CONLL '03), 33-40, ACL, Edmonton, Canada, 2003. • [4] R. Bekkerman and A. McCallum, "Disambiguating Web appearances of people in a social network". Proceedings of the 14th International Conference on the World Wide Web (WWW '05), 463-470, ACM Press, Chiba, Japan, 2005. • [5] L. Jiang, J. Wang, N. An, S. Wang, J. Zhan, and L. Li, "GRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search". Proceedings of the 9th IEEE International Conference on Data Mining (ICDM '09), 199-208, IEEE Press, 2009. • [6] X. Han and J. Zhao, "Named entity disambiguation by leveraging Wikipedia semantic knowledge". Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM '09), 215-224, ACM Press, Hong Kong, China, 2009. • [7] W. Shen, J. Wang, P. Luo, and M. Wang, "LINDEN: linking named entities with knowledge base via semantic knowledge". Proceedings of the 21st international conference on the World Wide Web (WWW '12), 449-458, ACM Press, Lyon, France, 2012. • [8] X. Niu, X. Sun, H. Wang, S. Rong, G. Qi, and Y. Yu, "Zhishi.me - Weaving Chinese Linking Open Data". Proceedings of the International Semantic Web Conference (ISWC '11), Lecture Notes in Computer Science 7032, 205-220, Springer, 2011. • [9] Z. Wang, J. Li, Z. Wang, and J. Tang, "Cross-lingual Knowledge Linking across Wiki Knowledge Bases". Proceedings of the 21st World Wide Web Conference (WWW '12), 459-468, ACM Press, Lyon, France, 2012.

More Related