
Using term informativeness for named entity detection

Using term informativeness for named entity detection. Advisor : Dr. Hsu Reporter : Chun Kai Chen Author : Jason D. M. Rennie and Tommi Jaakkola. 2005.SIGIR 353-360. Outline. Motivation Objective Introduction Mixture Models Experiment Summary. Motivation.


Presentation Transcript


  1. Using term informativeness for named entity detection • Advisor: Dr. Hsu • Reporter: Chun Kai Chen • Authors: Jason D. M. Rennie and Tommi Jaakkola • SIGIR 2005, pp. 353-360

  2. Outline • Motivation • Objective • Introduction • Mixture Models • Experiment • Summary

  3. Motivation • Informal communication (e-mail, bulletin boards) poses a difficult learning environment • because traditional grammatical and lexical information is noisy • timely information can be difficult to extract • We are interested in the problem of extracting information from informal, written communication.

  4. Objective • Introduced a new informativeness score that directly utilizes mixture model likelihood to identify informative words.

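  5. Mixture Models • Informative words are identified by • looking at the difference in log-likelihood between a mixture model and a simple unigram model • The simplest model (unigram): $p(D \mid \theta) = \prod_i \theta^{h_i}(1-\theta)^{n_i-h_i}$ • $n_i$ is the number of flips in document $i$ • $h_i$ is the number of heads in document $i$ • θ = 0.5 • Mixture model: $p(D \mid \lambda,\phi_1,\phi_2) = \prod_i \left[\lambda\,\phi_1^{h_i}(1-\phi_1)^{n_i-h_i} + (1-\lambda)\,\phi_2^{h_i}(1-\phi_2)^{n_i-h_i}\right]$ • Mixture score: $\log \frac{p(D \mid \lambda,\phi_1,\phi_2)}{p(D \mid \theta)}$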

  6. Mixture Models (example 1) • Example • Keyword “fish”, D1 = {fish fish fish}, D2 = {I am student} • four short “documents”: {{HHH},{TTT},{HHH},{TTT}} • simple unigram model: $p(\{\{HHH\},\{TTT\},\{HHH\},\{TTT\}\}) = \{0.5^{3}(1-0.5)^{3-3}\} \times \{0.5^{0}(1-0.5)^{3-0}\} \times \{0.5^{3}(1-0.5)^{3-3}\} \times \{0.5^{0}(1-0.5)^{3-0}\} = 0.5^{3} \times 0.5^{3} \times 0.5^{3} \times 0.5^{3} = 0.000244140625 = 2^{-12}$ • mixture model: $p(\{HHH\}) = 0.5 \times 1^{3} \times (1-1)^{3-3} + (1-0.5) \times 0^{3} \times (1-0)^{3-3} = 0.5 + 0$ • $p(\{TTT\}) = 0.5 \times 1^{0} \times (1-1)^{3-0} + (1-0.5) \times 0^{0} \times (1-0)^{3-0} = 0 + 0.5$ • $p(\{\{HHH\},\{TTT\},\{HHH\},\{TTT\}\}) = 0.5 \times 0.5 \times 0.5 \times 0.5 = 0.0625 = 2^{-4}$
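The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch, not the authors' code: the function names `unigram_likelihood` and `mixture_likelihood` and the parameter names `lam`, `phi1`, `phi2` are illustrative assumptions, and each document is represented as a pair (number of flips, number of heads).

```python
import math

def unigram_likelihood(docs, theta):
    """Likelihood of (n_i, h_i) pairs under a single heads-probability theta."""
    return math.prod(theta ** h * (1 - theta) ** (n - h) for n, h in docs)

def mixture_likelihood(docs, lam, phi1, phi2):
    """Likelihood under a mixture of two unigrams with weight lam on phi1."""
    total = 1.0
    for n, h in docs:
        comp1 = lam * phi1 ** h * (1 - phi1) ** (n - h)
        comp2 = (1 - lam) * phi2 ** h * (1 - phi2) ** (n - h)
        total *= comp1 + comp2
    return total

# (n_i, h_i) for the four documents {{HHH},{TTT},{HHH},{TTT}}
docs = [(3, 3), (3, 0), (3, 3), (3, 0)]

uni = unigram_likelihood(docs, theta=0.5)
mix = mixture_likelihood(docs, lam=0.5, phi1=1.0, phi2=0.0)
print(uni, mix)  # the slide reports 2^-12 and 2^-4
```

Python evaluates `0.0 ** 0` as `1.0`, which matches the convention the slide relies on when a component assigns probability 0 or 1 to heads.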

  7. Mixture Models (example 2) • Example • four short “documents”: {{HTT},{TTT},{HTT},{TTT}} • simple unigram model: $p(\{\{HTT\},\{TTT\},\{HTT\},\{TTT\}\}) = \{0.5^{1}(1-0.5)^{3-1}\} \times \{0.5^{0}(1-0.5)^{3-0}\} \times \{0.5^{1}(1-0.5)^{3-1}\} \times \{0.5^{0}(1-0.5)^{3-0}\} = 0.5^{3} \times 0.5^{3} \times 0.5^{3} \times 0.5^{3} = 2^{-12}$ • mixture model: $p(\{HTT\}) = 0.5 \times 0.33^{1} \times (1-0.33)^{3-1} + (1-0.5) \times 0.66^{1} \times (1-0.66)^{3-1} = (0.5 \times 0.33 \times 0.66^{2}) + (0.5 \times 0.66 \times 0.33^{2}) = 0.071874 + 0.035937 = 0.107811$ • $p(\{\{HTT\},\{TTT\},\{HTT\},\{TTT\}\}) = 0.107811 \times 0.5 \times 0.107811 \times 0.5 = 0.0029058$

  8. Mixture Models (example 3) • Example • four short “documents”: {{HTTTT},{TTT},{HTT},{TTT}} • simple unigram model: $p(\{\{HTTTT\},\{TTT\},\{HTT\},\{TTT\}\}) = \{0.5^{1}(1-0.5)^{5-1}\} \times \{0.5^{0}(1-0.5)^{3-0}\} \times \{0.5^{1}(1-0.5)^{3-1}\} \times \{0.5^{0}(1-0.5)^{3-0}\} = 0.5^{5} \times 0.5^{3} \times 0.5^{3} \times 0.5^{3} = 2^{-14}$ • mixture model: $p(\{HTTTT\}) = 0.5 \times 0.2^{1} \times (1-0.2)^{5-1} + (1-0.5) \times 0.8^{1} \times (1-0.8)^{5-1} = (0.5 \times 0.2 \times 0.8^{4}) + (0.5 \times 0.8 \times 0.2^{4}) = 0.04096 + 0.00064 = 0.0416$ • $p(\{\{HTTTT\},\{TTT\},\{HTT\},\{TTT\}\}) = 0.0416 \times 0.5 \times 0.107811 \times 0.5 = 0.0011212344$

  9. Mixture Models (Mixture score) • {{HHH},{TTT},{HHH},{TTT}}: $0.0625 / 2^{-12}$ • {{HTT},{TTT},{HTT},{TTT}}: $0.0029058 / 2^{-12}$ • {{HTTTT},{TTT},{HTT},{TTT}}: $0.0011212344 / 2^{-14}$
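The three ratios on this slide can be evaluated numerically. The sketch below uses only the likelihood values reported on the slides; the dictionary `examples` and the variable names are illustrative assumptions. Since the mixture score is described later in the deck as the log-odds of the two likelihoods, the natural logarithm of each ratio is printed as well.

```python
import math

# (mixture likelihood, unigram likelihood) as reported on the slides
examples = {
    "HHH,TTT,HHH,TTT": (0.0625, 2 ** -12),
    "HTT,TTT,HTT,TTT": (0.0029058, 2 ** -12),
    "HTTTT,TTT,HTT,TTT": (0.0011212344, 2 ** -14),
}

scores = {}
for name, (mix, uni) in examples.items():
    ratio = mix / uni                      # likelihood ratio, mixture over unigram
    scores[name] = (ratio, math.log(ratio))
    print(f"{name}: ratio = {ratio:.4f}, log-ratio = {math.log(ratio):.4f}")
```

A larger ratio means the mixture explains the documents much better than the single unigram, which is the sense in which the word counts as informative.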

  10. Named Entity Extraction Performance

  11. Introduction(1/4) • The web is filled with information, • but even more information is available in the informal communications people send and receive on a day-to-day basis • We call this communication informal because structure is not explicit and the writing is not fully grammatical. • We are interested in the problem of extracting information from informal, written communication.

  12. Introduction(2/4) • Newspaper text can be hard to deal with, • but newspaper articles have proper grammar with correct punctuation and capitalization; • part-of-speech taggers show high accuracy on newspaper text • Informal communication is harder • even these basic cues are noisy: grammar rules are bent, capitalization may be ignored or used haphazardly, and punctuation use is creative

  13. Introduction(3/4) • Restaurant bulletin boards • contain information about new restaurants almost immediately after they open • a temporary closure, new management, better service or a drop in food quality. • This timely information can be difficult to extract. • An important sub-task of extracting information from restaurant bulletin boards is identifying restaurant names.

  14. Introduction(4/4) • If we had a good measure of how topic-oriented, or “informative,” each word is, • we would be better able to identify named entities • It is well known that informative words have “peaked” or “heavy-tailed” frequency distributions. • Many informativeness scores have been introduced • Inverse Document Frequency (IDF) • Residual IDF • xI • the z-measure • Gain
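Two of the scores listed on this slide are simple to compute. The sketch below assumes the common textbook definitions, which may differ in detail from the ones used in the paper: IDF is $-\log_2(\mathrm{df}/N)$, and Residual IDF is the observed IDF minus the IDF expected if the word's occurrences followed a Poisson distribution with rate $\mathrm{cf}/N$. The function name `idf_scores` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def idf_scores(docs):
    """Return {word: (idf, residual_idf)} for a list of tokenized documents."""
    n_docs = len(docs)
    doc_freq = Counter()   # number of documents containing the word
    coll_freq = Counter()  # total occurrences of the word in the collection
    for tokens in docs:
        coll_freq.update(tokens)
        doc_freq.update(set(tokens))
    scores = {}
    for word in doc_freq:
        idf = -math.log2(doc_freq[word] / n_docs)
        # IDF predicted by a Poisson model with rate cf / N
        expected_idf = -math.log2(1 - math.exp(-coll_freq[word] / n_docs))
        scores[word] = (idf, idf - expected_idf)
    return scores

# Toy collection (hypothetical): "fish" is bursty, "i" is spread evenly
toy_docs = [
    ["fish", "fish", "fish"],
    ["i", "am", "student"],
    ["i", "like", "fish"],
    ["i", "am", "here"],
]
toy_scores = idf_scores(toy_docs)
for word in ("fish", "i"):
    print(word, toy_scores[word])
```

In this toy collection the bursty word "fish" gets a higher Residual IDF than the evenly spread word "i", which illustrates the "peaked" frequency behavior the slide refers to.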

  15. Mixture Models • Informative words exhibit two modes of operation: • A high frequency mode • when the document is relevant to the word • A low (or zero) frequency mode • when the document is irrelevant • Informative words are identified • by looking at the difference in log-likelihood between a mixture model and a simple unigram model

  16. Mixture Models • Example • Consider the following four short “documents”: {{HHH},{TTT},{HHH},{TTT}} • The simplest model for sequential binary data is the unigram: $p(D \mid \theta) = \prod_i \theta^{h_i}(1-\theta)^{n_i-h_i}$ • $n_i$ is the number of flips in document $i$ • $h_i$ is the number of heads in document $i$ • θ = 0.5 • The unigram is a poor model for the above data. • The unigram has no capability to model the switching nature of the data. • The data likelihood is $2^{-12}$

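  17. Mixture Models • Example • Consider the following four short “documents”: {{HHH},{TTT},{HHH},{TTT}} • The likelihood for a mixture of two unigrams is: $p(D \mid \lambda,\phi_1,\phi_2) = \prod_i \left[\lambda\,\phi_1^{h_i}(1-\phi_1)^{n_i-h_i} + (1-\lambda)\,\phi_2^{h_i}(1-\phi_2)^{n_i-h_i}\right]$ • Each component takes half of the weight ($\lambda = 0.5$) • A mixture is a composite model. • The data likelihood is $2^{-4}$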

  18. Mixture Models • The two extra parameters of the mixture allow for a much better modeling of the data. • The Mixture score is then the log-odds of the two likelihoods: $\log \frac{p(D \mid \lambda,\phi_1,\phi_2)}{p(D \mid \theta)}$ • We are interested in the comparative improvement of the mixture model over the simple unigram. • EM is used to maximize the likelihood of the mixture model.
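One way the EM step could look for the coin-flip documents is sketched below. This is a standard textbook EM for a two-component mixture, not necessarily the authors' procedure: the function names, the starting values (`lam=0.5`, `phi1=0.7`, `phi2=0.3`), and the fixed iteration count are all assumptions made for illustration. Documents are (number of flips, number of heads) pairs.

```python
import math

def component_terms(n, h, lam, phi1, phi2):
    """Weighted likelihood of one document under each mixture component."""
    a = lam * phi1 ** h * (1 - phi1) ** (n - h)
    b = (1 - lam) * phi2 ** h * (1 - phi2) ** (n - h)
    return a, b

def fit_mixture_em(docs, lam=0.5, phi1=0.7, phi2=0.3, iters=100):
    """Fit (lam, phi1, phi2) by EM; starting values are arbitrary assumptions."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each document
        resp = []
        for n, h in docs:
            a, b = component_terms(n, h, lam, phi1, phi2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: re-estimate the weight and the two heads-probabilities
        lam = sum(resp) / len(docs)
        w1 = sum(r * n for r, (n, _) in zip(resp, docs))
        w2 = sum((1 - r) * n for r, (n, _) in zip(resp, docs))
        if w1 > 0:
            phi1 = sum(r * h for r, (_, h) in zip(resp, docs)) / w1
        if w2 > 0:
            phi2 = sum((1 - r) * h for r, (_, h) in zip(resp, docs)) / w2
    return lam, phi1, phi2

def mixture_score(docs):
    """Log-likelihood of the fitted mixture minus that of the fitted unigram."""
    theta = sum(h for _, h in docs) / sum(n for n, _ in docs)
    uni_ll = sum(math.log(theta ** h * (1 - theta) ** (n - h)) for n, h in docs)
    lam, phi1, phi2 = fit_mixture_em(docs)
    mix_ll = sum(math.log(sum(component_terms(n, h, lam, phi1, phi2)))
                 for n, h in docs)
    return mix_ll - uni_ll

docs = [(3, 3), (3, 0), (3, 3), (3, 0)]  # {{HHH},{TTT},{HHH},{TTT}}
lam, phi1, phi2 = fit_mixture_em(docs)
score = mixture_score(docs)
print(lam, phi1, phi2, score)
```

On the four-document example, EM started from these values moves toward the solution used on the earlier slides (equal weights, one component always heads, the other always tails). EM only finds a local maximum, so other data may need several restarts.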

  19. Experimental Evaluation • The Restaurant Data • Using the task of identifying restaurant names in posts to a restaurant discussion bulletin board. • Collected and labeled six sets of threads of approximately 100 posts each from a single board. • Used Adwait Ratnaparkhi’s MXPOST and MXTERMINATOR software to determine sentence boundaries, tokenize the text and determine part-of-speech. • Hand-labeled each token as being part of a restaurant name or not. • 56,018 tokens in total; 1,968 tokens were labeled as part of a restaurant name • 5,956 unique tokens; of those, 325 were used at least once as part of a restaurant name

  20. Experimental Results

  21. Summary • Introduced a new informativeness measure, the Mixture score, and compared it against a number of other informativeness criteria. • Found the Mixture score to be an effective restaurant word filter. • IDF*Mixture score is a more effective filter than either individually.

  22. Personal Opinion • Advantage • Disadvantage
