
COLACL 2006 Segment-based Hidden Markov Models for Information Extraction

Zhenmei Gu, David R. Cheriton School of Computer Science, University of Waterloo. Nick Cercone, Faculty of Computer Science, Dalhousie University. JSYU, 2006.09.14.



Presentation Transcript


  1. COLACL 2006 Segment-based Hidden Markov Models for Information Extraction • Zhenmei Gu, David R. Cheriton School of Computer Science, University of Waterloo • Nick Cercone, Faculty of Computer Science, Dalhousie University • JSYU, 2006.09.14

  2. Outline • Introduction • Problem description • Previous works • Main contributions • Algorithms • Document-based HMM IE • Segment-based HMM IE

  3. Introduction - Problem Description • Template-filling IE problem • MUC: NE (named entity) → CO (coreference) → TE (template element) • Template • Seminar announcement • Slots: location, speaker, stime, etime • Algorithm • New aspect in evaluating HMM models • New approach to solving the TE problem

  4. Introduction - Previous Works • Using HMMs for IE • Leek 1997: extract gene name-location facts • Bikel et al. 1997: find names in IE • Freitag and McCallum 1999: extract fillers for slots • Other Markovian sequence models for IE • MEMM • CRF

  5. Introduction - Main Contributions • HMM • Reduce noise • Document-based → Segment-based • Alleviate sparseness • Remove irrelevant words • Eliminate redundancies of slot fillers • Multiple slot fillers → single slot filler • [Diagram: Doc → HMM → Fillers, compared with Doc → Retrieval HMM → Extractor HMM → Selection → Filler]

  6. Document-based HMM IE (1/3) • HMM structure (used to extract fillers)

  7. Document-based HMM IE (2/3) • Seminar announcement (SA) domain, 485 documents, ten-fold cross-validation • Doc_HMM: the authors' IE system with Simple Good-Turing smoothing • HMM_None: another HMM IE system (Freitag and McCallum, 1999) • HMM_Global: another HMM IE system, with shrinkage

  8. Document-based HMM IE (3/3) • Redundancy (in a document): R_document = incorrectly extracted fillers / all returned fillers • R = average of R_document
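The redundancy measure on slide 8 can be illustrated with a minimal Python sketch. The function names and the (incorrect, returned) input pairs are hypothetical, and treating a document with no returned fillers as contributing 0 is an assumption the slide does not address.

```python
from typing import Sequence, Tuple


def document_redundancy(num_incorrect: int, num_returned: int) -> float:
    """R_document = incorrectly extracted fillers / all returned fillers."""
    if num_returned == 0:
        # Assumption: a document with no returned fillers has redundancy 0.
        return 0.0
    return num_incorrect / num_returned


def average_redundancy(counts: Sequence[Tuple[int, int]]) -> float:
    """R = mean of R_document; counts holds (incorrect, returned) per document."""
    if not counts:
        raise ValueError("need at least one document")
    total = sum(document_redundancy(inc, ret) for inc, ret in counts)
    return total / len(counts)
```

For example, `average_redundancy([(1, 2), (0, 4), (3, 4)])` averages the per-document values 0.5, 0.0 and 0.75.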

  9. Segment-based HMM IE (1/5) • Step 1: Retrieval HMM • Filter text segments that might contain a filler • Step 2: Extractor HMM • Label each segment (sentence) with the most probable state sequence • Sort segments according to the normalized likelihoods of their best state sequences • Return the filler(s) having the largest likelihood • [Diagram: Doc → Retrieval HMM → Extractor HMM → Filler]

  10. Segment-based HMM IE (2/5) • Step 2: Extraction • For each segment s with token length n, a normalized best state sequence likelihood l(s) is defined, where λ is the HMM and Q is any possible state sequence associated with s [equation not preserved in the transcript] • The segment with the highest l(s) is selected
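The equation for l(s) did not survive in this transcript, so the following Python sketch rests on an assumption: it takes l(s) to be the log of the best-path (Viterbi) joint probability, max over Q of P(Q, s | λ), divided by the token length n. The dictionary-based HMM representation, the `unk` floor for unseen tokens, and all function names are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

LOG_ZERO = float("-inf")


def _log(p: float) -> float:
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0.0 else LOG_ZERO


def viterbi_log(obs: Sequence[str], states: Sequence[str],
                start: Dict[str, float],
                trans: Dict[str, Dict[str, float]],
                emit: Dict[str, Dict[str, float]],
                unk: float = 1e-6) -> Tuple[float, List[str]]:
    """Return (log max_Q P(Q, s | lambda), best state sequence)."""
    if not obs:
        raise ValueError("segment must contain at least one token")
    # delta[q]: best log-probability of any path ending in state q.
    delta = {q: _log(start[q]) + _log(emit[q].get(obs[0], unk)) for q in states}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        new_delta, pointers = {}, {}
        for q in states:
            prev = max(states, key=lambda p: delta[p] + _log(trans[p][q]))
            new_delta[q] = (delta[prev] + _log(trans[prev][q])
                            + _log(emit[q].get(o, unk)))
            pointers[q] = prev
        delta = new_delta
        back.append(pointers)
    last = max(states, key=lambda q: delta[q])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return delta[last], path


def normalized_likelihood(obs, states, start, trans, emit) -> float:
    """Assumed form of l(s): best-path log-likelihood divided by n."""
    score, _ = viterbi_log(obs, states, start, trans, emit)
    return score / len(obs)


def rank_segments(segments, states, start, trans, emit):
    """Sort segments by l(s), highest first; returns (l(s), segment, path)."""
    scored = []
    for seg in segments:
        score, path = viterbi_log(seg, states, start, trans, emit)
        scored.append((score / len(seg), seg, path))
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Dividing by n keeps long segments from being penalized merely for their length. In the two-step design described on slide 9, such a ranking would be applied only to segments that the retrieval step has kept, since a pure background segment can otherwise score highest.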

  11. Segment-based HMM IE (3/5) • Step 1: Retrieval • Select a segment if [criterion equation not preserved in the transcript] • Q_filler = the set of state sequences that pass through any filler state • {all Q} = Q_bg ∪ Q_filler

  12. Segment-based HMM IE (4/5) • Step 1: Retrieval • Q_bg: the state sequences not passing through any target filler states • P(s, Q_bg | λ) = the probability of s following this particular background state path Q_bg • Let s = O_1 O_2 ··· O_T, where T is the length of s in tokens
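Because {all Q} = Q_bg ∪ Q_filler, the probability mass of the filler paths can be obtained as P(s | λ) minus P(s, Q_bg | λ). The Python sketch below illustrates this under several assumptions: Q_bg is a single path that stays in one background state `bg`, the HMM is stored in plain dictionaries with an `unk` floor for unseen tokens, and the decision rule in `is_relevant` (filler mass at least `ratio` times background mass) is hypothetical, since the selection criterion is not preserved in the transcript.

```python
from typing import Dict, Sequence


def forward_prob(obs: Sequence[str], states: Sequence[str],
                 start: Dict[str, float],
                 trans: Dict[str, Dict[str, float]],
                 emit: Dict[str, Dict[str, float]],
                 unk: float = 1e-6) -> float:
    """P(s | lambda): forward algorithm, summing over all state sequences."""
    if not obs:
        raise ValueError("segment must contain at least one token")
    alpha = {q: start[q] * emit[q].get(obs[0], unk) for q in states}
    for o in obs[1:]:
        alpha = {q: sum(alpha[p] * trans[p][q] for p in states)
                    * emit[q].get(o, unk)
                 for q in states}
    return sum(alpha.values())


def background_path_prob(obs: Sequence[str], bg: str,
                         start: Dict[str, float],
                         trans: Dict[str, Dict[str, float]],
                         emit: Dict[str, Dict[str, float]],
                         unk: float = 1e-6) -> float:
    """P(s, Q_bg | lambda), assuming Q_bg is the single path bg, bg, ..., bg."""
    if not obs:
        raise ValueError("segment must contain at least one token")
    prob = start[bg] * emit[bg].get(obs[0], unk)
    for o in obs[1:]:
        prob *= trans[bg][bg] * emit[bg].get(o, unk)
    return prob


def filler_prob(obs, states, bg, start, trans, emit) -> float:
    """P(s, Q_filler | lambda) = P(s | lambda) - P(s, Q_bg | lambda)."""
    total = forward_prob(obs, states, start, trans, emit)
    background = background_path_prob(obs, bg, start, trans, emit)
    return max(total - background, 0.0)  # clamp tiny negative rounding error


def is_relevant(obs, states, bg, start, trans, emit, ratio: float = 1.0) -> bool:
    """Hypothetical rule: keep s if filler mass >= ratio * background mass."""
    background = background_path_prob(obs, bg, start, trans, emit)
    return filler_prob(obs, states, bg, start, trans, emit) >= ratio * background
```

The sketch multiplies raw probabilities, which is adequate for sentence-length segments; longer inputs would call for scaling or log-space arithmetic to avoid underflow.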

  13. Segment-based HMM IE (5/5)

  14. Main Contributions • HMM • Reduce noise • Document-based → Segment-based • Alleviate sparseness • Remove irrelevant words • Eliminate redundancies of slot fillers • Multiple slot fillers → single slot filler • [Diagram: Doc → HMM → Fillers, compared with Doc → Retrieval HMM → Extractor HMM → Selection → Filler]

  15. Thanks!!
