
Term Burstiness in WSD and Pseudo Relevance Feedback



Presentation Transcript


  1. Term Burstiness in WSD and Pseudo Relevance Feedback Atelach Alemu Argaw March 2006

  2. Burstiness Model (Sarkar et al) • Model gaps (not term occurrences) • Mixture of exponential distributions • Model the amount of time until a specific event occurs • Between-burst (1/λ1, or λ1') • Within-burst (1/λ2, or λ2') • Reference: Sarkar, Avik; De Roeck, Anne; Garthwaite, Paul H. Term Re-occurrence Measures for Analyzing Style. In Proceedings of the SIGIR 2005 Workshop on Stylistic Analysis of Text for Information Access, 2005.
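A minimal sketch of the gap density under a two-component exponential mixture, assuming the form p1·Exp(λ1) + (1 − p1)·Exp(λ2); the function name and parameter values below are made up for illustration, and the exact parameterisation in Sarkar et al. may differ:

    import math

    def gap_density(x, lam1, lam2, p1):
        # Density of a gap x under the mixture: with probability p1 the gap
        # comes from Exp(lam1) (between-burst), otherwise from Exp(lam2)
        # (within-burst).
        between = lam1 * math.exp(-lam1 * x)
        within = lam2 * math.exp(-lam2 * x)
        return p1 * between + (1.0 - p1) * within

    # Illustrative parameters: rare between bursts, frequent within a burst.
    print(gap_density(5.0, lam1=0.01, lam2=0.5, p1=0.3))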

  3. Burstiness Model (Sarkar et al) • First occurrence • No occurrence: censoring
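When a term does not occur (or does not re-occur) in a document, the gap is only known to exceed the remaining document length, so the likelihood uses the survival function rather than the density. A hedged sketch of such a censored contribution, reusing the hypothetical mixture parameterisation from above (not the authors' actual likelihood code):

    import math

    def log_likelihood(gaps, lam1, lam2, p1, censored_gap=None):
        # Observed gaps contribute the mixture density; a censored gap
        # (the term did not re-occur before the end of the document)
        # contributes the survival function P(gap > censored_gap).
        ll = 0.0
        for x in gaps:
            dens = p1 * lam1 * math.exp(-lam1 * x) \
                 + (1 - p1) * lam2 * math.exp(-lam2 * x)
            ll += math.log(dens)
        if censored_gap is not None:
            surv = p1 * math.exp(-lam1 * censored_gap) \
                 + (1 - p1) * math.exp(-lam2 * censored_gap)
            ll += math.log(surv)
        return ll

    print(log_likelihood([12, 3, 4], lam1=0.01, lam2=0.5, p1=0.3,
                         censored_gap=200))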

  4. Burstiness Model • Bayesian parameter estimation • posterior ∝ prior × likelihood • P(θ|D) ∝ P(θ) × P(D|θ) • choose an uninformative prior • estimate the posterior using Gibbs sampling (MCMC) • Draw random samples and use the sample values to estimate the posterior • WinBUGS software
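WinBUGS estimates the posterior by Gibbs sampling; purely as an illustration of the posterior ∝ prior × likelihood idea (not the authors' WinBUGS model), here is a tiny grid approximation of the posterior over a single exponential rate with a flat prior and made-up gap data:

    import math

    gaps = [12.0, 3.0, 4.0, 20.0]                  # made-up gap data

    grid = [i / 100.0 for i in range(1, 200)]      # candidate rates lambda
    prior = [1.0] * len(grid)                      # flat (uninformative) prior

    def likelihood(lam):
        # Likelihood of the gaps under a single Exp(lambda) model.
        return math.prod(lam * math.exp(-lam * x) for x in gaps)

    # Unnormalised posterior = prior * likelihood, then normalise over the grid.
    post = [p * likelihood(lam) for p, lam in zip(prior, grid)]
    total = sum(post)
    post = [v / total for v in post]

    # Posterior mean of lambda as a point estimate.
    print(sum(lam * w for lam, w in zip(grid, post)))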

  5. WinBUGS

  6. Parameter estimates (Sarkar et al) λ1' = 1/λ1 • The mean of the exponential distribution with parameter λ1 • Rarity of a term in the corpus: the average gap at which the term occurs if it has not occurred recently λ2' = 1/λ2 • The rate at which a term re-occurs given that it has occurred recently • Within-document burstiness P1 • Probability of a term occurring with rate λ1' P2 • Probability of a term occurring with rate λ2'
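As a purely illustrative example (numbers made up): if λ1 = 0.01, then λ1' = 1/0.01 = 100, i.e. on average about 100 tokens pass before the term occurs when it has not occurred recently; if λ2 = 0.5, then λ2' = 2, i.e. within a burst the term re-occurs roughly every 2 tokens.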

  7. Burstiness Model (Sarkar et al) Word behaviours • Small λ1', small λ2': frequently occurring function word • Large λ1', small λ2': bursty content word • Small λ1', large λ2': frequent but well-spaced function word • Large λ1', large λ2': infrequent, scattered function word
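A small sketch of how these four behaviours could be read off from the estimates; the cut-off values are entirely hypothetical, since the slides give no thresholds:

    def word_behaviour(lam1_prime, lam2_prime, cutoff1=50.0, cutoff2=5.0):
        # Map (lambda1', lambda2') estimates to the four behaviour classes
        # on the slide; cutoff1 and cutoff2 are illustrative thresholds only.
        rare = lam1_prime > cutoff1      # large average between-burst gap
        spaced = lam2_prime > cutoff2    # large average within-burst gap
        if not rare and not spaced:
            return "frequently occurring function word"
        if rare and not spaced:
            return "bursty content word"
        if not rare and spaced:
            return "frequent but well-spaced function word"
        return "infrequent, scattered function word"

    print(word_behaviour(120.0, 2.0))    # -> bursty content word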

  8. Test run Data • Europarl • English 164K • Morphology, POS • Swedish 130K • Converted to numeric format Pilot run • 1000-iteration burn-in • a further 5000 iterations for the estimates
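One plausible way to produce the numeric format is to turn each term's token positions into a gap sequence; the function below is a sketch of that preprocessing step (it is not the actual Europarl pipeline, and how the first-occurrence gap is counted is an assumption):

    def term_gaps(tokens, term):
        # Gaps (in tokens) between successive occurrences of `term`,
        # including the gap from the start of the document to its first
        # occurrence; an empty list means "no occurrence" (censoring).
        positions = [i for i, tok in enumerate(tokens) if tok == term]
        if not positions:
            return []
        gaps = [positions[0] + 1]
        gaps += [b - a for a, b in zip(positions, positions[1:])]
        return gaps

    doc = "the model of term burstiness models gaps between term occurrences".split()
    print(term_gaps(doc, "term"))        # -> [4, 5]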

  9. Discussion Points • Convergence • Inclusion of POS and morphological analysis vs. more data • How could context information be included? • Does it have to be parallel? • WSD vs. topicality and pseudo-relevance feedback
