
Named Entity Recognition in Query

Presentation Transcript


  1. Named Entity Recognition in Query Jiafeng Guo, Gu Xu, Xueqi Cheng, Hang Li (ACM SIGIR 2009) Speaker: Yi-Lin Hsu Advisor: Dr. Koh, Jia-ling Date: 2009/11/16

  2. Outline • Introduction to NERQ • NERQ Problem • Implementation • WSLDA • Experimental Results • Conclusion and Future work

  3. Introduction to NERQ • Named entity recognition (NER) is a subtask of information extraction that seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations, and locations, expressions of times, quantities, monetary values, percentages, etc.

  4. Introduction to NERQ • NERQ involves two tasks: • 1. Detection of the named entity in a given query • 2. Classification of the named entity into predefined classes • Example: mining movie titles from queries • Applications: Web search, etc. • Challenges: • Queries are usually very short • Queries are not necessarily in standard form

  5. Query Data • A new data source for NER • About 70% of search queries contain named entities. • Rich context for determining the classes of entities. • Query context: “harry potter walkthrough” → “harry potter cheats” (contexts from the same class) • Wisdom of crowds: • Very large-scale data that keeps on growing • Frequently updated with emerging named entities

  6. NERQ Problem • A query containing one named entity is represented as a triple (e, t, c): • e: the named entity • t: the context of e, written as α#β, where α and β are the left and right contexts and # is a placeholder for the entity itself • c: the class of e
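
To make the decomposition concrete, here is a minimal sketch (not the authors' code) of enumerating candidate (entity, context) pairs for a query; the function name, whitespace tokenization, and the assumption that candidate entities come from a known list are all ours:

```python
# A minimal sketch of enumerating candidate (e, t) pairs for a query.
# We assume candidate entities come from a known list and queries are
# whitespace-tokenized; all names here are ours.
def candidate_triples(query, known_entities):
    """Return (e, t) pairs: e is a span of the query found in known_entities,
    t is the remaining context written as alpha#beta ('#' marks the entity)."""
    words = query.split()
    pairs = []
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            entity = " ".join(words[i:j])
            if entity not in known_entities:
                continue  # only spans known to be named entities qualify
            alpha = " ".join(words[:i])
            beta = " ".join(words[j:])
            context = f"{alpha} # {beta}".strip()
            pairs.append((entity, context))
    return pairs

# candidate_triples("harry potter walkthrough", {"harry potter"})
# -> [("harry potter", "# walkthrough")]
```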

  7. Probabilistic Approach • The most likely triple for a query q is found among G(q), the set of triples consistent with q:

  (e,t,c)* = argmax_{(e,t,c)∈G(q)} Pr(q,e,t,c) = argmax_{(e,t,c)∈G(q)} Pr(q|e,t,c) Pr(e,t,c) = argmax_{(e,t,c)∈G(q)} Pr(e,t,c)   (1)

  since Pr(q|e,t,c) is constant (one) for any triple consistent with q. • The joint probability is factorized as

  Pr(e,t,c) = Pr(e) Pr(c|e) Pr(t|e,c) = Pr(e) Pr(c|e) Pr(t|c)   (2)

  where the last step makes the assumption that the context t depends only on the class c, not on the entity e itself.
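
As a rough illustration of Eq. (2), a single triple can be scored by the product Pr(e) · Pr(c|e) · Pr(t|c); the three lookup tables below are assumed to come from offline training, and their names and dict-of-dicts shapes are our choice, not the paper's:

```python
# A hedged sketch of scoring one triple per Eq. (2).
# pr_e: entity -> Pr(e); pr_c_given_e: entity -> {class: Pr(c|e)};
# pr_t_given_c: class -> {context: Pr(t|c)}. Missing entries score zero.
def triple_score(e, t, c, pr_e, pr_c_given_e, pr_t_given_c):
    return (pr_e.get(e, 0.0)
            * pr_c_given_e.get(e, {}).get(c, 0.0)
            * pr_t_given_c.get(c, {}).get(t, 0.0))
```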

  8. Topic Model for NERQ • Given training data T = {(e_i, t_i, c_i) | i = 1..N}, the learning problem can be formalized as maximum-likelihood estimation under the factorization in (2):

  max Σ_{i=1..N} log( Pr(e_i) Pr(c_i|e_i) Pr(t_i|c_i) )

  9. Implementation • Offline Training • Online Prediction

  10. Offline Training • Scan the query log with the seed named entities and collect the queries that contain them. [Figure: the seed “Harry Potter” matches queries in the log such as “Harry Potter trail”, “Harry Potter walk through”, and “Harry Potter cheats”.]
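
A minimal sketch of this scan, assuming the log is a list of query strings and matching is simple substring containment (both simplifications ours):

```python
from collections import defaultdict

# Sketch of the seed scan described above: match each seed entity against
# the query log and keep the surrounding context with '#' in the entity's
# place. All names are illustrative.
def collect_contexts(query_log, seeds):
    contexts = defaultdict(list)  # entity -> list of observed contexts
    for q in query_log:
        for e in seeds:
            if e in q:
                contexts[e].append(q.replace(e, "#").strip())
    return dict(contexts)

# collect_contexts(["harry potter cheats", "harry potter walk through"],
#                  ["harry potter"])
# -> {"harry potter": ["# cheats", "# walk through"]}
```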

  11. Offline Training • Pr(e): estimated from the total frequency of queries containing e in the query log • Pr(c|e): estimated by WS-LDA • Pr(t|c): fixed [Figure: a query decomposed into its named entity (e.g., “Harry Potter”, “New Moon”), context (e.g., “# trails”), and class (e.g., “movie”).]
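
For instance, Pr(e) could be estimated as a normalized query frequency; a hedged sketch, where normalizing over the set of seed entities is our assumption:

```python
# Estimate Pr(e) from query frequencies, as the bullet above suggests.
def estimate_pr_e(query_log, entities):
    counts = {e: sum(e in q for q in query_log) for e in entities}
    total = sum(counts.values()) or 1  # avoid division by zero
    return {e: n / total for e, n in counts.items()}
```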

  12. Online Prediction • Example query: “harry potter trails” • Find the most likely triple (e, t, c) in G(q), the set of all triples consistent with the query q
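
Putting the pieces together, online prediction can be read as enumerating G(q) and taking the highest-scoring triple; this sketch reuses the hypothetical candidate_triples and triple_score helpers from the earlier sketches:

```python
# A minimal, unoptimized reading of "find the most likely triple in G(q)".
def predict(query, known_entities, classes,
            pr_e, pr_c_given_e, pr_t_given_c):
    best, best_score = None, 0.0
    for e, t in candidate_triples(query, known_entities):
        for c in classes:
            s = triple_score(e, t, c, pr_e, pr_c_given_e, pr_t_given_c)
            if s > best_score:
                best, best_score = (e, t, c), s
    return best  # None if no candidate scores above zero
```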

  13. WSLDA (Weakly Supervised Latent Dirichlet Allocation)

  14. WSLDA • Introduces weak supervision into LDA: the objective is the LDA log-likelihood plus soft constraints. • The soft-constraint term pairs the document probability on the i-th class with the document's binary label on the i-th class, rewarding the model when the inferred class distribution agrees with the labels.

  15. WSLDA • Objective function (combining the two terms from the previous slide):

  L = Σ_d ( log Pr(w_d) + λ Σ_i y_{d,i} Pr(c_i|d) )

  where log Pr(w_d) is the LDA log-likelihood of document d, Pr(c_i|d) is the document probability on the i-th class, y_{d,i} is the document's binary label on the i-th class, and λ weights the soft constraints.
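
A numerical sketch of this objective under the reading above; lda_log_likelihood stands in for the standard LDA term, and all names and array shapes are our assumptions:

```python
import numpy as np

# Objective = LDA log-likelihood + lambda * soft constraints.
# doc_class_probs holds Pr(c_i|d) per document; doc_labels holds y_{d,i}.
def wslda_objective(lda_log_likelihood, doc_class_probs, doc_labels, lam=1.0):
    p = np.asarray(doc_class_probs, dtype=float)  # (n_docs, n_classes)
    y = np.asarray(doc_labels, dtype=float)       # (n_docs, n_classes), 0/1
    soft_constraints = float(np.sum(y * p))       # sum_d sum_i y_{d,i} Pr(c_i|d)
    return lda_log_likelihood + lam * soft_constraints
```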

  16. Experiments • A real data set consisting of 6 billion queries • 930 million unique queries • Four semantic classes: “Movie”, “Game”, “Book”, and “Music” • 4 human annotators • 180 named entities were selected from the web sites of Amazon, GameSpot, and Lyrics • 120 for training and 60 for testing • Finally, we obtained 432,304 contexts and about 1.5 million named entities

  17. Experiments • Randomly sampled 400 queries from the recognition results (0.14 million queries) for evaluation.

  18. Experiments • The performance of NERQ is evaluated in terms of Top-N accuracy: the fraction of queries whose correct triple is ranked among the top N predictions.
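
Under that reading, the metric can be computed as follows (a sketch; the input interface is our assumption):

```python
# ranked_predictions[k] is the ranked list of predicted triples for the
# k-th query; gold[k] is its correct triple.
def top_n_accuracy(ranked_predictions, gold, n):
    hits = sum(g in preds[:n] for preds, g in zip(ranked_predictions, gold))
    return hits / len(gold)
```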

  19. Experiments • We performed experiments comparing the WS-LDA approach with two baseline methods: Determ and LDA. • Determ learns the contexts of a class by simply aggregating all the contexts of the named entities belonging to that class (see the sketch below). • LDA and WS-LDA take a probabilistic approach.
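
A sketch of the Determ baseline as described above, pooling the contexts of all seed entities labeled with a class into one aggregate per class; the Counter-based representation and all names are ours:

```python
from collections import Counter

# entity_contexts: entity -> list of contexts (e.g., from collect_contexts);
# entity_classes: entity -> list of labeled classes for that seed entity.
def determ_class_contexts(entity_contexts, entity_classes):
    class_contexts = {}  # class -> Counter of contexts
    for e, contexts in entity_contexts.items():
        for c in entity_classes.get(e, []):
            class_contexts.setdefault(c, Counter()).update(contexts)
    return class_contexts
```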

  20. Experiments [Figure: top contexts learned for the Movie, Game, Book, and Music classes by each of Determ, LDA, and WS-LDA.]

  21. Table 5: Comparisons on Learned Named Entities of Each Class (P@N) [Table over the Movie, Game, Book, Music, and Average-Class columns.]

  22. Experiments • Comparisons between WS-LDA and LDA

  23. Conclusion • Formalized the problem of NERQ • Proposed a novel method for NERQ • Developed a new topic model called WS-LDA • Future work: • We plan to add more classes and conduct further experiments. • The proposed method focuses on queries containing a single named entity. • Some queries contain named entities outside the predefined classes (e.g., “American beauty company”). • Some contexts were not learned by our approach because they are uncommon (e.g., “lyrics for # by chris brown”).
