
Structured Annotations of Web Queries



Presentation Transcript


  1. Structured Annotations of Web Queries Author: N. Sarkas et al. SIGMOD 2010

  2. Agenda • Motivation • Approach • Query Processing • Probabilistic Model

  3. Motivation • Support keyword search over a structured database • Users carry over their search-engine experience to searching records in a database • Problems • Keyword matching may return no results • The user's intent may be misinterpreted • Q = "white tiger" • T = Shoes, Color = white, Shoe Line = tiger • T = Books, Title = white tiger • The animal white tiger (not in the database) • Must provide fast responses (in ms) for web users • Querying every database is inefficient

  4. Approach • Annotation • An annotation token for a table: AT = (t, T.A) • Q = 50 inch LG • (LG, TVs.Brand), (LG, Monitors.Brand), (50 inch, TVs.Diagonal) • Tokens of all types (nominal, ordinal, and numerical) are annotated against the table • Numerical tokens are accepted if they fall within the attribute's value range (see the lookup sketch below)
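
To make the (t, T.A) lookup concrete, here is a minimal Python sketch; the table schemas, attribute values, and function name are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of annotation-token lookup: map a query token to every
# (token, Table.Attribute) pair it can annotate. Schemas are made up.

TABLES = {
    "TVs":      {"Brand": {"lg", "samsung"}, "Diagonal": {"50 inch", "32 inch"}},
    "Monitors": {"Brand": {"lg", "dell"},    "Diagonal": {"24 inch"}},
}

def annotation_tokens(token):
    """Return all annotation tokens (t, T.A) for a single query token."""
    hits = []
    for table, attrs in TABLES.items():
        for attr, values in attrs.items():
            if token.lower() in values:
                hits.append((token, f"{table}.{attr}"))
    return hits

print(annotation_tokens("LG"))       # [('LG', 'TVs.Brand'), ('LG', 'Monitors.Brand')]
print(annotation_tokens("50 inch"))  # [('50 inch', 'TVs.Diagonal')]
```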

  5. Overall Query Processing • Generate structured annotations for the keywords • Keep only the maximal annotations • Score the annotations

  6. Query Processing • Generate structured annotations for a query • Keywords: k1, k2 • (T1, {(k1, T1.Att1), (…)}, {free tokens}), (T2, {(k1, T2.Att1), (…)}, {}) • Free token: a query keyword not associated with any attribute • Example: "50 inch LG lcd" (see the sketch below) • (TVs, {(50 inch, TVs.Diagonal), (LG, TVs.Brand), (lcd, TVs.Screen)}, {}) • (Monitors, {(50 inch, Monitors.Diagonal), (LG, Monitors.Brand), (lcd, Monitors.Screen)}, {}) • (Refrig, {(50 inch, Refrig.Width), (LG, Refrig.Brand)}, {lcd})
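
A sketch of deriving one structured annotation (T, AT, FT) per table for a pre-segmented query. The schemas are hypothetical, and a real system would also enumerate alternative segmentations and attribute matches:

```python
# Sketch: one structured annotation per table for a pre-segmented query.
# Tokens that match no attribute of the table become free tokens.

TABLES = {
    "TVs":    {"Brand": {"lg"}, "Diagonal": {"50 inch"}, "Screen": {"lcd"}},
    "Refrig": {"Brand": {"lg"}, "Width": {"50 inch"}},
}

def annotate(tokens, tables):
    for table, attrs in tables.items():
        annotated, free = [], []
        for tok in tokens:
            matches = [f"{table}.{a}" for a, vals in attrs.items()
                       if tok.lower() in vals]
            if matches:
                annotated.append((tok, matches[0]))  # keep first match only
            else:
                free.append(tok)                     # free token
        yield (table, annotated, free)

for s in annotate(["50 inch", "LG", "lcd"], TABLES):
    print(s)
# ('TVs', [('50 inch', 'TVs.Diagonal'), ('LG', 'TVs.Brand'), ('lcd', 'TVs.Screen')], [])
# ('Refrig', [('50 inch', 'Refrig.Width'), ('LG', 'Refrig.Brand')], ['lcd'])
```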

  7. Query Processing • Find the maximal annotations • For a given table, we prefer more annotated tokens and fewer free tokens • An annotation S = (T, AT, FT) is maximal if there is no S' = (T, AT', FT') s.t. AT' ⊃ AT and FT' ⊂ FT • AT: annotated tokens • FT: free tokens • Example (see the sketch below) • S1 = (TVs, {(LG, TVs.Brand), (lcd, TVs.Screen)}, {}) • S2 = (TVs, {(LG, TVs.Brand)}, {lcd}) • S3 = (TVs, {(50 inch, TVs.Diagonal), (LG, TVs.Brand)}, {lcd}) • S2 is not maximal
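
A small sketch of the maximality filter described above, directly formalizing the dominance condition (annotations represented as plain tuples for illustration):

```python
# Sketch: keep only maximal annotations. An annotation is dominated if
# another annotation over the same table annotates a strict superset of
# its annotated tokens (and hence leaves strictly fewer free tokens).

def is_maximal(s, candidates):
    table, at, _ = s
    at = set(at)
    return not any(t2 == table and at < set(at2)  # strict subset => dominated
                   for t2, at2, _ in candidates)

S1 = ("TVs", [("LG", "TVs.Brand"), ("lcd", "TVs.Screen")], [])
S2 = ("TVs", [("LG", "TVs.Brand")], ["lcd"])
cands = [S1, S2]
print([s for s in cands if is_maximal(s, cands)])  # S2 is dropped
```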

  8. Query Processing • Scoring annotations • Intuition • Query: LG 30 inch screen • Desired: TVs, Monitors • Undesired • DVD Players • no DVD player in the database has a screen • people do not query the size of a DVD player • Cell Phones • the screen sizes in the database are significantly smaller • A probabilistic model is chosen for the scoring

  9. Probabilistic Model • Generative probabilistic model • If the user targets a table, which words would the user use, and with what probability? • P(T.Âi): the probability that the user queries table T and the attribute subset T.Ai • Given the attributes, the user selects tokens with probability P(AT, FT | T.Âi) • T.Âi: the attributes of table T plus a free-token attribute • P(S) = P(AT, FT | T.Âi) · P(T.Âi) (1) • Example: "LG 30 inch screen" • The equation needs to be simplified

  10. Probabilistic Model • Assumption 1 • Annotated and free tokens are independent • Assumption 2 • The user chooses free tokens based on the table alone • The user chooses annotated tokens based on the table's attributes, not on the free tokens • This simplifies equation (1) to P(S) = P(T.Â) · ∏j P(ATj | T.Aj) · P(FT | T) (2) • Si: one of the candidate annotations Sq = {S1, …, Sk} of query q (a scoring sketch follows)
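
A sketch of the factored score in equation (2), computed in log space to avoid underflow. The probability tables are made-up numbers, and the prior is simplified to depend on the table alone rather than on the full attribute subset:

```python
import math

# Sketch of equation (2): P(S) = P(T.Â) * prod_j P(AT_j | T.A_j) * P(FT | T),
# with illustrative probabilities and a table-level prior.

def log_score(annotation, prior, p_attr, p_free):
    table, ats, fts = annotation
    logp = math.log(prior[table])                      # P(T.Â)
    for tok, attr in ats:                              # prod P(ATj | T.Aj)
        logp += math.log(p_attr[(tok.lower(), attr)])
    for tok in fts:                                    # P(FT | T)
        logp += math.log(p_free[table].get(tok, 1e-9))
    return logp

S = ("TVs", [("LG", "TVs.Brand"), ("50 inch", "TVs.Diagonal")], ["lcd"])
print(log_score(S,
                prior={"TVs": 0.3},
                p_attr={("lg", "TVs.Brand"): 0.2, ("50 inch", "TVs.Diagonal"): 0.1},
                p_free={"TVs": {"lcd": 0.05}}))
```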

  11. Probabilistic Model • Equation (2) assumes that every query targets some table in the data collection • Not true. Ex: Q = "green apple" • Annotation: green = Color, apple = Brand • But couldn't "green apple" mean the fruit? • Approach • An Open Language Model (OLM) table captures open-world queries (e.g., built from the Bing query log) • Sq = {S1, …, Sk, Sk+1}, where Sk+1 = SOLM • SOLM = (OLM, {}, FTq), i.e., every query token is a free token • P(SOLM) = P(OLM) · ∏t∈q P(t | OLM) (3) • Used to keep only plausible annotations (see the sketch below)
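
A sketch of equation (3) used as a plausibility filter: score the OLM interpretation and keep only annotations that outscore it. All probabilities here are made up for illustration:

```python
import math

# Sketch: the open-language-model score of eq. (3), and a filter that
# drops annotations less probable than the free-text interpretation.

def olm_log_score(tokens, olm_prior, p_olm):
    logp = math.log(olm_prior)                 # P(OLM)
    for t in tokens:                           # prod P(t | OLM)
        logp += math.log(p_olm.get(t, 1e-9))
    return logp

def plausible(scored_annotations, olm_score):
    """scored_annotations: list of (annotation, log_score) pairs."""
    return [s for s, lp in scored_annotations if lp > olm_score]

olm = olm_log_score(["green", "apple"], olm_prior=0.5,
                    p_olm={"green": 0.01, "apple": 0.02})
print(plausible([("S_products", -18.0)], olm))  # dropped: scores below the OLM
```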

  12. Probabilistic Model • We now have two probabilistic models • P(S) = P(T.Â) · ∏j P(ATj | T.Aj) · P(FT | T) (2) • P(SOLM) = P(OLM) · ∏t∈q P(t | OLM) (3) • What's next? • Maximize the probability • Simplify the equations where necessary • Build a system based on the model

  13. Thank You!

  14. The Probabilistic Model • Consider a query from the web log • It was either formulated from an annotation or is a free-text query • P(q) = Σi P(Si) + P(SOLM) • Free-token probabilities blend a table-specific unigram model with the OLM: P(t | T) = λT · P(t | UMT) + (1 − λT) · P(t | OLM) • UMT: unigram model over all possible names and values that can be associated with table T • λT: the confidence level in table T's model • Ex: FT = computer. T1 = Monitors, T2 = TVs: "computer" is a more plausible free token for Monitors than for TVs (sketch below)
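
A sketch of the free-token mixture as read off this slide; the mixture form, the λ value, and all unigram probabilities below are illustrative assumptions:

```python
# Sketch: blend a table's unigram model UM_T with the open language
# model, weighted by a per-table confidence lambda. Numbers are made up.

def p_free_token(tok, um_t, olm, lam):
    return lam * um_t.get(tok, 0.0) + (1 - lam) * olm.get(tok, 1e-9)

UM_MONITORS = {"computer": 0.02, "screen": 0.05}
UM_TVS      = {"screen": 0.04}
OLM         = {"computer": 0.001}

# "computer" is a more plausible free token for Monitors than for TVs:
print(p_free_token("computer", UM_MONITORS, OLM, lam=0.8))  # 0.0162
print(p_free_token("computer", UM_TVS,      OLM, lam=0.8))  # 0.0002
```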

  15. Probabilistic Model • Given • Observed data: the web query log • Model: P(q) = Σi P(Si) + P(SOLM) • To find • The priors P(T.Âi) and P(OLM) that maximize the likelihood of the log

  16. Expectation-Maximization (EM) • Initial step • Select initial priors P(T.Âi), P(OLM) • Repeat • Expectation step • Based on the current priors, estimate the posterior P(Si | q) for each query • Maximization step • Based on the current posteriors, re-estimate the priors to maximize the expected likelihood (sketch below)
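
A generic EM loop for learning annotation priors from a query log, where the annotation that generated each query is latent. The helpers `candidate_annotations` and `likelihood` are hypothetical, and the paper's exact update rules may differ:

```python
from collections import defaultdict

# Generic EM sketch: alternate between computing each annotation's
# responsibility for each query (E-step) and re-estimating the priors
# from the expected counts (M-step).

def em(query_log, candidate_annotations, likelihood, n_iter=20):
    templates = {s for q in query_log for s in candidate_annotations(q)}
    prior = {s: 1.0 / len(templates) for s in templates}    # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        for q in query_log:
            # E-step: posterior responsibility of each annotation for q
            w = {s: prior[s] * likelihood(q, s) for s in candidate_annotations(q)}
            z = sum(w.values()) or 1.0
            for s, v in w.items():
                counts[s] += v / z
        # M-step: re-estimate priors from expected counts
        total = sum(counts.values()) or 1.0
        prior = {s: counts.get(s, 0.0) / total for s in templates}
    return prior
```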

  17. Thank You!

  18. Probabilistic Model • P(ATi | T.A): the fraction of the entries in table T that take the values ATi.V • Q = "50 inch LG lcd" • S = (TVs, {(LG, TVs.Brand), (50 inch, TVs.Diagonal)}, {lcd}) • T.A = {Brand, Diagonal} • T(AT.V) = all records in TVs with Brand = LG and Diagonal = 50 inch • P(AT | T.A) = |T(AT.V)| / |T| • [offline] Precompute a mapping from each value combination to the number of matching records in the table (sketch below)
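
A sketch of this estimate with a toy TVs table: the value-combination counts are precomputed offline, so the fraction is a single lookup at query time.

```python
from collections import Counter

# Sketch: P(AT | T.A) as the fraction of records whose attributes take
# the annotated values, with counts precomputed offline.

RECORDS = [  # toy TVs table
    {"Brand": "LG",      "Diagonal": "50 inch"},
    {"Brand": "LG",      "Diagonal": "32 inch"},
    {"Brand": "Samsung", "Diagonal": "50 inch"},
    {"Brand": "LG",      "Diagonal": "50 inch"},
]

def build_index(records, attrs):
    """[offline] Map each value combination to its record count."""
    return Counter(tuple(r[a] for a in attrs) for r in records)

INDEX = build_index(RECORDS, ("Brand", "Diagonal"))

def p_values(values):
    return INDEX[values] / len(RECORDS)

print(p_values(("LG", "50 inch")))  # 2 of 4 records match -> 0.5
```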

  19. Probabilistic Model • Maximum likelihood estimation • Given • A set of observed data {x1, x2, …} • A proposed model • To find • The parameters that maximize the likelihood • Rephrased for our setting • Given • Observed data: the web query log • Model: P(q) = Σi P(Si) + P(SOLM) • To find • The priors P(T.Âi) and P(OLM) that maximize the likelihood • The EM algorithm can be used to solve this • Likelihood vs. probability: "If I were to flip a fair coin 100 times, what is the probability of it landing heads-up every time?" vs. "Given that I have flipped a coin 100 times and it has landed heads-up 100 times, what is the likelihood that the coin is fair?" (source: Wikipedia)
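
A tiny numeric illustration of the coin example: probability fixes the parameter and asks about the data, while likelihood fixes the data and asks about the parameter.

```python
# Probability vs. likelihood on the coin example, worked numerically.

p_100_heads = 0.5 ** 100        # P(100 heads | fair coin), vanishingly small
like_fair   = 0.5 ** 100        # L(theta = 0.5 | 100 heads), the same number
like_biased = 0.99 ** 100       # L(theta = 0.99 | 100 heads), far larger
print(p_100_heads, like_fair, like_biased)  # the MLE here is theta = 1.0
```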

  20. Expectation-Maximization (EM) • An iterative algorithm with 2 steps • Expectation step • Using the current parameter estimates, compute the expected value Q of the log-likelihood • Maximization step • Find the parameters that maximize Q
