
Search Query Disambiguation from Short Sessions

Lilyana Mihalkova & Raymond Mooney, The University of Texas at Austin


Presentation Transcript


  1. Search Query Disambiguation from Short Sessions • Lilyana Mihalkova & Raymond Mooney, The University of Texas at Austin

  2. Query Disambiguation • Example query: scrubs → ?

  3. Existing Work • Well-studied problem: [e.g., Sugiyama et al. 04, Sun et al. 05, Dou et al. 07] • Common Assumption: Information about each user is available over a relatively long period of time.

  4. Privacy Concerns • NY Times: “A Face is Exposed for AOL Searcher no. 4417749” • [Conti, 06]: “Googling Considered Harmful”

  5. Pragmatic Concerns • Identifying users across search sessions • Log-in? • IP Address? • Managing and protecting user-specific information

  6. Proposed Setting • Base personalization only on short-term search histories • complete search histories cannot be reconstructed • Relate current session to previous short sessions of other users, based on the search activity in these sessions

  7. How Short is Short-Term? • [Figure not preserved. Axes: number of queries before ambiguous query; number of sessions with that many queries]

  8. Is This Enough Info? • Session A: 98.7 fm → www.star987.com; kroq → www.kroq.com; scrubs → ??? • Session B: huntsville hospital → www.huntsvillehospital.com; ebay.com → www.ebay.com; scrubs → ??? • Candidate results for scrubs: scrubs.com, scrubs-tv.com

  9. More Closely Related Work • [Almeida & Almeida 04]: Similar assumption of short sessions, but better suited for a specialized search engine (e.g. on computer science literature) • [Krause & Horvitz 08]: Explicitly models the tradeoff between better performance and more user information.

  10. Main Challenge • How to harness this small amount of potentially noisy information available for each user? • Exploit the relations among users, sessions, URLs • Use statistical relational learning (SRL) [Getoor & Taskar 07]

  11. Using Relational Information • Active session: huntsville hospital → huntsvillehospital.org; ebay → ebay.com; scrubs → ??? • [Diagram of background sessions not preserved. Labels: huntsville school, hospitallink.com, ebay.com, scrubs → scrubs.com, scrubs → scrubs-tv.com]

  12. Details • Used Markov logic networks (MLNs) [Richardson & Domingos 06] • MLN structure is provided as domain knowledge • Weights are learned from the data • Weight learning: Adapted contrastive divergence [Lowd & Domingos 07] for incremental learning
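For reference, the distribution that an MLN defines over possible worlds, as given in [Richardson & Domingos 06], can be written as below. The symbols are the standard ones from that work rather than from the slides: w_i is the weight of formula i, n_i(x) is the number of true groundings of formula i in world x, and Z is the normalizing constant.

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\!\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\!\Big( \sum_{i} w_i \, n_i(x') \Big)
```

In the setting of the talk, the formulas are supplied as domain knowledge and only the weights w_i are learned.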

  13. Predicates • Evidence predicates • provide information about clicked URLs and keywords shared between sessions, i.e. • shares-keyword-between-clicks(ActiveS, BackgroundS, keyword) • shares-keyword-between-click-and-search(ActiveS, BackgroundS, keyword) • shares-clicks(ActiveS, BackgroundS, hostname) • provide information about clicked URLs and keywords in the current session • Query predicate • states that the user will choose a particular URL • clicks-on(ActiveS, hostname)
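As a rough illustration of how an evidence predicate such as shares-clicks might be grounded from session data, here is a minimal Python sketch. The data layout (sessions as sets of clicked hostnames), the function name `ground_shares_clicks`, and the session ids are assumptions made for this example; the slides do not describe the actual grounding code.

```python
from typing import Dict, List, Set, Tuple

Atom = Tuple[str, str, str, str]

def ground_shares_clicks(
    active_id: str,
    active_clicks: Set[str],
    background: Dict[str, Set[str]],
) -> List[Atom]:
    """Return one shares-clicks atom per hostname that the active session
    and a background session have both clicked (hypothetical layout)."""
    atoms: List[Atom] = []
    for bg_id, bg_clicks in sorted(background.items()):
        # Sorting keeps the output order deterministic.
        for host in sorted(active_clicks & bg_clicks):
            atoms.append(("shares-clicks", active_id, bg_id, host))
    return atoms

# Hypothetical sessions loosely based on the running "scrubs" example.
active = {"huntsvillehospital.org", "ebay.com"}
background_sessions = {
    "B1": {"ebay.com", "scrubs-tv.com"},
    "B2": {"hospitallink.com", "scrubs.com"},
}
atoms = ground_shares_clicks("A", active, background_sessions)
```

Only session B1 shares a click (ebay.com) with the active session, so a single atom is produced.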

  14. Re-Ranking of Search Results • Search engine produces a list of search results • For each possible search result R, compute the probability that clicks-on(ActiveS, R) • Rank the search results by their likelihood of being clicked
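The re-ranking step itself is simple once click probabilities are available. The sketch below assumes the probabilities have already been estimated (here they are made-up numbers, not outputs of the authors' models); `rerank` is a hypothetical helper name.

```python
from typing import Dict, List

def rerank(click_probs: Dict[str, float]) -> List[str]:
    """Order candidate results by descending click probability.

    click_probs maps each candidate result R to an estimate of the
    probability of clicks-on(ActiveS, R). Ties are broken alphabetically
    so that the output is deterministic.
    """
    return sorted(click_probs, key=lambda r: (-click_probs[r], r))

# Made-up probabilities for illustration only.
probs = {"scrubs-tv.com": 0.31, "scrubs.com": 0.62, "scrubs.med": 0.07}
ranking = rerank(probs)
```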

  15. MLN 1 • User will click on at least one result • User will select result chosen by previous user with whom a click is shared • [Diagram not preserved. Labels: some query, ambiguous query, www.someplace1.com, www.clickedResult.com]

  16. MLN 2 • MLN 1 + • User will select result chosen by previous user with whom a keyword is shared • click-to-click, click-to-search, search-to-click, search-to-search • [Diagram not preserved. Labels: some query, ambiguous query, some other ambiguous query, www.someplace1.com, www.clickedResult.com, www.aClick.com]

  17. MLN 3 • MLN 2 + • User will choose result that shares a keyword with a previous search or click in the current session • [Diagram not preserved. Labels: some query, ambiguous query, www.someplace1.com, www.someResult.com, www.anotherPossibility.com, www.yetAnother.com]

  18. Data • Collected from the MSN engine in May 2006 • Contains time-stamped records of searches and clicked URLs, grouped by sessions • Average session length is 3.28 • No across-session identifiers • Used first 25 days for training/validation and last 6 days for test

  19. Data Limitation #1 • Data does not specify which queries are ambiguous • Consider a query ambiguous if, among all pages clicked after searching for it, at least 2 fall in different high-level categories of the DMOZ (dmoz.org) hierarchy • Limit to query strings of up to two words (43.7%) • 6,360 ambiguous queries (2.4% of all two-word query strings)
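The ambiguity heuristic above can be sketched as follows. This is a minimal illustration, not the authors' code: the click-log layout, the `top_category` lookup (standing in for a DMOZ top-level category), and the category names in the example are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def find_ambiguous_queries(
    clicks: Iterable[Tuple[str, str]],
    top_category: Dict[str, str],
    max_words: int = 2,
) -> Set[str]:
    """Flag queries whose clicked pages span at least 2 top-level categories.

    clicks: (query string, clicked hostname) pairs from the log.
    top_category: hypothetical hostname -> top-level category lookup.
    Queries longer than max_words and uncategorized hosts are skipped.
    """
    categories = defaultdict(set)
    for query, host in clicks:
        if len(query.split()) > max_words:
            continue
        category = top_category.get(host)
        if category is not None:
            categories[query].add(category)
    return {q for q, cats in categories.items() if len(cats) >= 2}

# Made-up log entries and category assignments.
log = [
    ("scrubs", "scrubs.com"),
    ("scrubs", "scrubs-tv.com"),
    ("ebay", "ebay.com"),
    ("huntsville hospital jobs", "a.example"),
    ("huntsville hospital jobs", "b.example"),
]
cats = {
    "scrubs.com": "Shopping",
    "scrubs-tv.com": "Arts",
    "ebay.com": "Shopping",
    "a.example": "Health",
    "b.example": "Business",
}
ambiguous = find_ambiguous_queries(log, cats)
```

The three-word query is excluded by the length limit even though its clicks span two categories.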

  20. Data Limitation #2 • Data does not provide the full list of search results presented to the user; only the ones actually clicked • Assume that the URLs seen by the user are those clicked by at least one person after searching for the exact query string • Consequence: result sets have differing lengths
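One way to approximate the result sets under this assumption is to collect, for each exact query string, every hostname clicked by at least one user. The sketch below assumes a flat list of (query, clicked hostname) pairs; the function name and the example data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def build_result_sets(clicks: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each exact query string to the set of hostnames clicked for it."""
    result_sets = defaultdict(set)
    for query, host in clicks:
        result_sets[query].add(host)
    return dict(result_sets)

# Made-up click log.
click_log = [
    ("scrubs", "scrubs.com"),
    ("scrubs", "scrubs-tv.com"),
    ("scrubs", "scrubs.com"),
    ("ebay", "ebay.com"),
]
result_sets = build_result_sets(click_log)
```

Because the sets are derived from observed clicks, their sizes vary from query to query, which is the consequence noted on the slide.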

  21. Result Set Sizes • [Figure not preserved. Axes: size of result set for ambiguous query; number of queries with that result set size]

  22. Evaluation Metrics: MAP • Mean average precision • identical to the area under the interpolated precision/recall curve
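A minimal sketch of MAP over ranked result lists follows. It uses the common non-interpolated form of average precision, which may differ in detail from the variant used in the talk; the function names and example rankings are made up for illustration.

```python
from typing import List, Sequence, Set, Tuple

def average_precision(ranking: Sequence[str], clicked: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a clicked result appears."""
    hits = 0
    total = 0.0
    for rank, result in enumerate(ranking, start=1):
        if result in clicked:
            hits += 1
            total += hits / rank
    return total / len(clicked) if clicked else 0.0

def mean_average_precision(cases: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, clicked set) test cases."""
    return sum(average_precision(r, c) for r, c in cases) / len(cases)

# Two made-up test cases.
cases = [
    (["a", "b", "c"], {"b"}),        # AP = 1/2
    (["a", "b", "c"], {"a", "c"}),   # AP = (1/1 + 2/3) / 2
]
map_score = mean_average_precision(cases)
```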

  23. Evaluation Metrics: AUC-ROC • Area under the ROC curve • identical to the mean average true negative rate
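For a single ranked list without ties, AUC-ROC can be computed as the fraction of (clicked, non-clicked) pairs in which the clicked result is ranked higher. Equivalently, it is the average, over clicked results, of the fraction of non-clicked results ranked below them, which matches the "true negative rate" reading on the slide. The sketch below is an illustration with a hypothetical function name and made-up data.

```python
from typing import Optional, Sequence, Set

def auc_roc(ranking: Sequence[str], clicked: Set[str]) -> Optional[float]:
    """Fraction of (clicked, non-clicked) pairs ordered correctly.

    Returns None when the ranking contains no clicked or no non-clicked
    results, since the ROC curve is undefined in that case.
    """
    positives = [i for i, r in enumerate(ranking) if r in clicked]
    negatives = [i for i, r in enumerate(ranking) if r not in clicked]
    if not positives or not negatives:
        return None
    correct = sum(1 for p in positives for n in negatives if p < n)
    return correct / (len(positives) * len(negatives))

# Made-up rankings.
auc_top = auc_roc(["a", "b", "c"], {"a"})     # clicked result ranked first
auc_middle = auc_roc(["a", "b", "c"], {"b"})  # clicked result ranked second
```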

  24. Baselines • Random: Rank randomly • Click-Sim: Rank by similarity based on shared clicks • Click-KW-Sim: Rank by similarity based on shared clicks and keywords

  25. Click-Sim • Average similarity based on shared clicks • [Diagram not preserved. Labels: huntsville hospital, huntsvillehospital.org, scrubs, ???, scrubs.med, scrubs.tv]
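The slide does not spell out the similarity measure, so the following is only one plausible reading of Click-Sim: score each candidate result by the average similarity between the active session and the background sessions that chose it. Jaccard overlap of clicked hostnames is an assumption made for this sketch, as are the function names and the example sessions (nbc.example is invented).

```python
from typing import Dict, List, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two click sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def click_sim_scores(
    active_clicks: Set[str],
    background: List[Tuple[Set[str], str]],
) -> Dict[str, float]:
    """Average similarity to the background sessions that chose each result.

    background: (clicks made before the ambiguous query, chosen result) pairs.
    """
    sums: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for clicks, chosen in background:
        sums[chosen] = sums.get(chosen, 0.0) + jaccard(active_clicks, clicks)
        counts[chosen] = counts.get(chosen, 0) + 1
    return {result: sums[result] / counts[result] for result in sums}

# Made-up sessions for the "scrubs" example.
active_session = {"huntsvillehospital.org"}
background_sessions = [
    ({"huntsvillehospital.org"}, "scrubs.med"),
    ({"hospitallink.com"}, "scrubs.med"),
    ({"nbc.example"}, "scrubs.tv"),
]
scores = click_sim_scores(active_session, background_sessions)
```

Candidates would then be ranked by descending score; here scrubs.med scores higher because one of the sessions that chose it shares a click with the active session.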

  26. Click-KW-Sim • Average similarity based on shared clicks and keywords • [Diagram not preserved. Labels: huntsville hospital, huntsvillehospital.org, scrubs, ???, scrubs.med, scrubs.tv]

  27. Results (MAP) • [Chart not preserved; four entries are marked with *]

  28. Results (AUC-ROC) • [Chart not preserved; three entries are marked with *]

  29. Current/Future Work • Incorporating more information in the models • Actual content of clicked pages • Popularity of pages • Weighing evidence based on how close it is in time to ambiguous query • Learning separate weights for each connecting keyword or domain/group of keywords or domains • Revising the provided clauses

  30. Questions?

  31. 1 • The popularity of a possible result provides a strong signal, but providing relational information on top of popularity gives further performance improvements • Rank by popularity + Click-KW-Sim baseline: • MAP (0.383), AUC-ROC (0.536) • Rank by popularity only: • MAP (0.380), AUC-ROC (0.525)

  32. 2 • [Figure not preserved. Axes: number of distinct clicks before ambiguous query; number of sessions with that many clicks]
