Keyword++: A Framework to Improve Keyword Search Over Entity Databases

huyen

Presentation Transcript


  1. Keyword++: A Framework to Improve Keyword Search Over Entity Databases

  2. Motivation • Current keyword search over databases has limitations for entity databases, all related to keyword matching • It does not return all relevant results • It returns irrelevant results

  3. Motivation

  4. Related Work • Most previous search approaches for entity databases required users to enter formatted queries • Examples • (Amazon customer service #phone) • (#professor #university #research=’database’) • Each word prefixed with # refers to an entity; the other words are treated as keywords

  5. Problem • Given • A search interface S over an entity relation E • Ϙ, a set of historical keyword queries • Find • For every keyword k in Ϙ, its mapping Mσ(k) and the confidence score Ms(k) of that mapping • Using the mapping M, the best CNF (conjunctive normal form) SQL query Tσ(Q) for a keyword query Q

  6. Mapping Keywords to Predicates • DQP (Differential Query Pair) • A pair Qf and Qb where Qf = Qb ∪ {k} • Qf is the foreground query (a set of keywords) • Qb is the background query (a set of keywords) • k is the differential keyword

  7. Mapping Keywords to Predicates • DQP (Differential Query Pair) • Qb = [small laptop] • Returns 20 laptops, of which only 3 have brand “Lenovo” • Qf = [small IBM laptop] • Returns 10 laptops, of which 5 have brand “Lenovo”

  8. Mapping Keywords to Predicates • Generating DQPs for Keywords • Given a query Q and a keyword k in Q • Make a new DQP with Qf = Q and Qb = Q - {k} • When historical keyword queries are available, Ϙ can be used instead • Get all pairs Qf and Qb in Ϙ where Qf = Qb ∪ {k}
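
The log-based step above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function name `generate_dqps`, the sample log, and the choice to skip empty background queries are all assumptions. Queries are treated as sets of keywords, as on the DQP slide.

```python
from collections import defaultdict

def generate_dqps(query_log):
    """Return {keyword: [(Qf, Qb), ...]} for every pair in the log
    that satisfies Qf = Qb ∪ {k}. Hypothetical helper for illustration."""
    # Treat each query as a set of keywords and drop duplicates.
    queries = {frozenset(q) for q in query_log}
    dqps = defaultdict(list)
    for qf in queries:
        for k in qf:
            qb = qf - {k}
            # Assumption: an empty background query is not useful, so skip it.
            if qb and qb in queries:
                dqps[k].append((qf, qb))
    return dict(dqps)

# Hypothetical query log.
log = [["small", "laptop"], ["small", "ibm", "laptop"], ["ibm", "laptop"]]
dqps = generate_dqps(log)
```

Here only the three-keyword query has background queries in the log, so it yields one DQP for "small" and one for "ibm".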

  9. Mapping Keywords to Predicates • Scoring Predicates using DQPs • D(A) is the domain of values of an attribute A • For every value v in D(A), let p(v, A, Se) be the probability that attribute A has value v within a set of objects Se • P(A, Se) is the distribution formed by p(v, A, Se) over all v in D(A) • Sf and Sb are the result sets of Qf and Qb
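
As a rough sketch of these definitions, P(A, Se) can be estimated by counting attribute values in a result set. The function name `value_distribution`, the row format (one dict per entity), and the brand "Dell" are assumptions made only for this example.

```python
from collections import Counter

def value_distribution(results, attribute):
    """Estimate P(A, Se): the relative frequency of each value of
    `attribute` among the entities in `results`."""
    counts = Counter(row[attribute] for row in results)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

# Hypothetical background result set: 20 laptops, 3 of them Lenovo.
s_b = [{"BrandName": "Lenovo"}] * 3 + [{"BrandName": "Dell"}] * 17
p_b = value_distribution(s_b, "BrandName")
```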

  10. Mapping Keywords to Predicates • Correlation Metrics • KL-divergence (used for categorical predicates) • Measures the difference between two probability distributions • Given Sf and Sb, the KL-divergence is:
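
The formula itself did not survive in this transcript. For reference, the standard definition of KL-divergence, written with the symbols defined on the previous slide, is given below. Taking the foreground distribution relative to the background one is an assumption, and the slide may have shown a per-value variant.

```latex
\mathrm{KL}\bigl(P(A, S_f) \,\|\, P(A, S_b)\bigr)
  = \sum_{v \in D(A)} p(v, A, S_f)\,
    \log \frac{p(v, A, S_f)}{p(v, A, S_b)}
```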

  11. Mapping Keywords to Predicates • Correlation Metrics • A is BrandName and v is “Lenovo” • Qb = [small laptop] • Probability of 0.15 (3 out of 20) • Qf = [small IBM laptop] • Probability of 0.5 (5 out of 10)
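
The numbers above can be plugged into the standard KL-divergence definition. The sketch below is illustrative only: the function name `kl_divergence` is hypothetical, natural logarithms are assumed, and all non-Lenovo brands are collapsed into a single "other" value, which the slides do not do.

```python
import math

def kl_divergence(p_f, p_b):
    """Standard KL(p_f || p_b) over discrete distributions given as dicts."""
    total = 0.0
    for value, pf in p_f.items():
        if pf == 0.0:
            continue  # a zero-probability value contributes nothing
        pb = p_b.get(value, 0.0)
        if pb == 0.0:
            return math.inf  # foreground mass where the background has none
        total += pf * math.log(pf / pb)
    return total

# Assumption: every brand other than Lenovo is merged into "other".
p_b = {"Lenovo": 0.15, "other": 0.85}  # Qb = [small laptop]
p_f = {"Lenovo": 0.50, "other": 0.50}  # Qf = [small IBM laptop]

lenovo_term = p_f["Lenovo"] * math.log(p_f["Lenovo"] / p_b["Lenovo"])
kl = kl_divergence(p_f, p_b)
```

Under these assumptions the "Lenovo" term is about 0.60 and the total divergence is about 0.34, which reflects how strongly adding "IBM" shifts the results toward the Lenovo brand.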

  12. Mapping Keywords to Predicates • Correlation Metrics • Earth Mover’s Distance (used for numerical predicates) • Measures the difference between two probability distributions • Given Sf and Sb, and the sorted values for D(A), the EMD is:

  13. Mapping Keywords to Predicates • Correlation Metrics • A is ScreenSize, Qb = [IBM laptop], Qf = [small IBM laptop]
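
The EMD formula and the ScreenSize figure are not preserved in this transcript. As a hedged sketch, the standard one-dimensional Earth Mover's Distance between two discrete distributions can be computed from their cumulative distributions over the sorted values. The function name `emd_1d` and the screen-size probabilities are invented for illustration, and the slide may have used a different normalization or a signed variant.

```python
def emd_1d(p_f, p_b):
    """Standard 1-D EMD between two discrete distributions (dicts mapping
    numeric value -> probability), computed from the CDF differences."""
    values = sorted(set(p_f) | set(p_b))
    cdf_f = cdf_b = 0.0
    distance = 0.0
    for left, right in zip(values, values[1:]):
        cdf_f += p_f.get(left, 0.0)
        cdf_b += p_b.get(left, 0.0)
        # Mass that still has to move across the gap between left and right.
        distance += abs(cdf_f - cdf_b) * (right - left)
    return distance

# Hypothetical ScreenSize distributions (inches -> probability).
p_b = {12: 0.25, 14: 0.25, 15: 0.25, 17: 0.25}  # Qb = [IBM laptop]
p_f = {12: 0.50, 14: 0.50}                      # Qf = [small IBM laptop]
distance = emd_1d(p_f, p_b)
```

With these made-up numbers the foreground results sit entirely at the small end of the range, and the distance works out to 1.5 inches of average shift.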

  14. Mapping Keywords to Predicates • Score Aggregation • Given a keyword k and a set of DQPs each with respect to k, the aggregate score for keyword k with respect to a predicate σ is:

  15. Mapping Keywords to Predicates • Scoring Threshold • Categorical and Numerical Predicates • Keywords with few DQPs must meet a higher threshold before a mapping is created

  16. Mapping Keywords to Predicates • Scoring Thresholds • Create mapping Mσ(k) with Ms(k) = AggScore

  17. Query Translation • Q = [t1, t2, …, tq] • Qi = [t1, …, ti] is the prefix of Q with i tokens • Example • Q = [small IBM laptop] and n = 2 • Q1 = [small] and Ts(Q1) = Ms(“small”) • Q2 = [small IBM] has two candidate translations • Ts(Q2) = Ts(Q1) + Ms(“IBM”) • Ts(Q2) = Ms(“small IBM”) • Pick the candidate with the higher score when rewriting Q2
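
The prefix-by-prefix choice in this example reads like a small dynamic program, which can be sketched as follows. Several things here are assumptions rather than facts from the slides: that n is the maximum number of tokens a single mapped phrase may span, that an unmapped single token scores 0, the function name `translate`, and every score value.

```python
def translate(tokens, mapping_scores, n=2):
    """best[i] holds (Ts(Qi), segmentation) for the prefix Qi."""
    best = [(0.0, [])]  # the empty prefix Q0
    for i in range(1, len(tokens) + 1):
        candidates = []
        for j in range(1, min(n, i) + 1):
            phrase = " ".join(tokens[i - j:i])
            # Multi-token phrases only count when a mapping exists for them.
            if j > 1 and phrase not in mapping_scores:
                continue
            prev_score, prev_segments = best[i - j]
            score = prev_score + mapping_scores.get(phrase, 0.0)
            candidates.append((score, prev_segments + [phrase]))
        best.append(max(candidates, key=lambda c: c[0]))
    return best[-1]

# Hypothetical confidence scores Ms(k).
scores = {"small": 0.4, "IBM": 0.7, "small IBM": 0.5, "laptop": 0.6}
total, segments = translate(["small", "IBM", "laptop"], scores)
```

For Q2 the two candidates score 0.4 + 0.7 and 0.5, so the sketch keeps "small" and "IBM" as separate keywords, matching the "pick the higher score" rule.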

  18. Query Translation • SELECT * FROM Table WHERE cnf(σA=v) AND cnf(σContains(A,t)) ORDER BY {σ(A,SO)} • cnf(σA=v) is a conjunctive form of categorical predicates • cnf(σContains(A,t)) is a conjunctive form of textual predicates • {σ(A,SO)} is an ordered list of numerical predicates

  19. Query Translation • Example • Q = [small IBM laptop] • SELECT * FROM Table WHERE BrandName = 'Lenovo' AND ProductDescription LIKE '%laptop%' ORDER BY ScreenSize ASC
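
To illustrate how the three predicate groups fit the query template, here is a minimal sketch that assembles such a statement. The function name `build_sql` and its argument layout are hypothetical. It simplifies each CNF clause to a single predicate, uses `?` placeholders in the DB-API "qmark" style for values, and assumes table and attribute names come from a trusted schema.

```python
def build_sql(table, categorical, textual, numerical):
    """Assemble SELECT ... WHERE ... ORDER BY from mapped predicates.

    categorical: list of (attribute, value) equality predicates
    textual:     list of (attribute, token) containment predicates
    numerical:   ordered list of (attribute, sort_order) predicates
    """
    where, params = [], []
    for attribute, value in categorical:
        where.append(f"{attribute} = ?")
        params.append(value)
    for attribute, token in textual:
        where.append(f"{attribute} LIKE ?")
        params.append(f"%{token}%")
    sql = f"SELECT * FROM {table}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    if numerical:
        sql += " ORDER BY " + ", ".join(f"{a} {so}" for a, so in numerical)
    return sql, params

sql, params = build_sql(
    "Table",
    categorical=[("BrandName", "Lenovo")],
    textual=[("ProductDescription", "laptop")],
    numerical=[("ScreenSize", "ASC")],
)
```

Passing the values as parameters rather than splicing them into the string avoids quoting problems when a keyword contains an apostrophe.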

  20. Experiments • Dataset • Entity table with 8,000 laptops • 28 categorical attributes • 7 numerical attributes • 2 textual attributes (ProductName and ProductDescription)

  21. Experiments • Comparison Methods • Ground truth from 100,000 web search queries classified as web queries • Compared with the keyword-and approach and the query-portal approach • keyword-and: returns entities that contain all query tokens • query-portal: uses a web search engine • Evaluated on precision, recall, and Jaccard similarity
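
For reference, the three evaluation measures can be computed per query from the returned entities and the ground-truth entities. This is a sketch using the standard set-based definitions; the function name `evaluate` and the entity IDs are made up.

```python
def evaluate(returned, relevant):
    """Return (precision, recall, Jaccard) for one query."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    union = len(returned | relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    jaccard = hits / union if union else 0.0
    return precision, recall, jaccard

# Hypothetical entity IDs.
precision, recall, jaccard = evaluate(returned=[1, 2, 3, 4], relevant=[3, 4, 5])
```

With two of four returned entities relevant and two of three relevant entities found, this gives precision 0.5, recall 2/3, and Jaccard 0.4.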

  22. Experiments • Results

  23. Fuzzy Matching of Web Queries to Structured Data

  24. Motivation • Example • A user issues a keyword query “Indy 4 near San Fran,” instead of “Indiana Jones and the Kingdom of the Crystal Skull near the city of San Francisco”

  25. Problem • Synonyms, Hypernyms, and Hyponyms • Let ε be the set of entities over which the synonyms are to be defined • Let S be the universal set of strings, where each string is a sequence of one or more words • We assume there exists an oracle function F(s, ε) -> E where s ∈ S and E ⊆ ε

  26. Problem • Synonyms, Hypernyms, and Hyponyms • Synonym: s1 ∈ S is a synonym of another string s2 ∈ S if and only if F(s1, ε) = F(s2, ε) • Example: s1 = “Indiana Jones IV” and s2 = “Indiana Jones 4” • Hypernym: s1 ∈ S is a hypernym of another string s2 ∈ S if and only if F(s1, ε) ⊃ F(s2, ε) • Example: s1 = “Indiana Jones series” • Hyponym: s1 ∈ S is a hyponym of another string s2 ∈ S if and only if F(s1, ε) ⊂ F(s2, ε)
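
These three definitions translate directly into set comparisons on the oracle's output. The sketch below stands in for the oracle F with a hand-written lookup table; the entity identifiers, the `ORACLE` table, and the function name `relation` are assumptions made for illustration.

```python
# Hypothetical oracle F(s, ε): maps a string to the set of entities it denotes.
ORACLE = {
    "Indiana Jones IV": frozenset({"crystal_skull"}),
    "Indiana Jones 4": frozenset({"crystal_skull"}),
    "Indiana Jones series": frozenset(
        {"raiders", "temple_of_doom", "last_crusade", "crystal_skull"}
    ),
}

def relation(s1, s2, oracle=ORACLE):
    """Classify s1 relative to s2 using the set definitions above."""
    e1, e2 = oracle[s1], oracle[s2]
    if e1 == e2:
        return "synonym"
    if e1 > e2:  # proper superset
        return "hypernym"
    if e1 < e2:  # proper subset
        return "hyponym"
    return "unrelated"
```

For example, `relation("Indiana Jones series", "Indiana Jones 4")` evaluates to "hypernym" under this toy oracle, because the series covers strictly more entities than the single film.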

  27. Problem • Web Synonym Finding • Given a set of strings U, the data sets A and L, and the reference set of entities ε • Return, for each string u ∈ U, its unique set of Web synonyms Wu = { w ∈ S | GA(u, P) ≈ GL(w, P) }

  28. Candidate Generation • Finding Surrogates • Issue a search to the Bing Search API and keep the top-k results • A web page p is a surrogate for the keyword query u if p is among those results • Referencing Surrogates • A query w is a synonym candidate for u if at least one surrogate of u was clicked when w was issued as the keyword query
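
The candidate rule above can be sketched over a click log. The search API call is deliberately left out: the surrogate pages are passed in as a plain set of URLs. The function name `synonym_candidates`, the log format of (query, clicked URL) pairs, and all URLs are hypothetical.

```python
def synonym_candidates(u, surrogates, click_log):
    """Return every query w (other than u) whose clicks hit at least
    one surrogate page of u."""
    return {w for w, url in click_log if url in surrogates and w != u}

# Hypothetical top-k result pages for u and a hypothetical click log.
u = "Indiana Jones and the Kingdom of the Crystal Skull"
surrogates = {"example.org/crystal-skull", "example.com/indy4-review"}
click_log = [
    ("indy 4", "example.org/crystal-skull"),
    ("indiana jones 4", "example.com/indy4-review"),
    ("san francisco weather", "example.net/sf-weather"),
    (u, "example.org/crystal-skull"),
]
candidates = synonym_candidates(u, surrogates, click_log)
```

Only the first two queries clicked a surrogate of u, so they become synonym candidates that the later selection step can score.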

  29. Candidate Selection • Intersection Page Count • Intersecting Click Ratio

  30. Candidate Selection
