Presentation Transcript

  1. Navigation Aided Retrieval Shashank Pandit & Christopher Olston Carnegie Mellon & Yahoo

  2. Search & Navigation Trends • Users often search and then supplement the search by navigating extensively beyond the result page to locate relevant information. • Why? • Query formulation problems • Open-ended search tasks • Preference for orienteering

  3. Search & Navigation Trends • User behaviour in IR tasks is often not fully exploited by search engines • Content based – words • PageRank – in- and out-links for popularity • Collaborative – clicks on results • Search engines do not examine these navigation patterns (though the authors fail to mention SearchGuide – Coyle et al. – which does)

  4. NAR – Navigation Aided Retrieval • New retrieval paradigm that incorporates post-query user navigation as an explicit component – NAR • A query is seen as a means to identify starting points for further navigation by users • The starting points are presented to the user in a result-list and permit easy navigation to many documents that match the user's query

  5. NAR • Navigation retrieval with Organic structure • Structure naturally present in pre-existing web documents • Advantages • Human oversight – human generated categories etc • Familiar user Interface – list of documents (i.e. result-list) • Single view of document collection • Robust implementation – no semantic knowledge required

  6. The model • D – set of documents in corpus, T – user's search task • S_T – answer set for search task T, Q_T – the set of valid queries for task T • Query submodel – belief distribution for the answer set given a query. What is the likelihood that doc d solves the task – Relevance • Navigation submodel – likelihood that a user starting at a particular document will be able to navigate (under guidance) to a document that solves the task.

  7. Conventional probabilistic IR Model • No outward navigation considered • Probability of solving the task depends on whether there is a document in the document collection which solves the task • Probability of the document solving a task is based on its “relevance” to the query

  8. Navigation-Conscious Model • Considers browsing as part of the search task • Query submodel – any probabilistic IR relevance ranking model • Navigation submodel – Stochastic model of user navigation WUFIS (Chi et al)

  9. WUFIS W(N, d1, d2) - probability that a user with need N will navigate from d1 to d2. • Scent provided by anchor and surrounding text. • The probability of a link being followed is related to how well a user’s need matches the scent – similarity between weighted vector of need terms and scent terms.
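The scent idea above can be sketched in a few lines. This is illustrative only – the function names are mine, and the actual WUFIS model of Chi et al. does considerably more (e.g. spreading activation over the whole site graph). Here a link's follow probability is simply proportional to the cosine similarity between the user-need terms and the link's scent terms (anchor plus surrounding text):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def link_follow_probs(need_terms, links):
    """links: target doc -> scent terms (anchor + surrounding text).
    Returns follow probabilities, normalised over the outgoing links."""
    need = Counter(need_terms)
    raw = {d: cosine(need, Counter(scent)) for d, scent in links.items()}
    total = sum(raw.values())
    return {d: (s / total if total else 0.0) for d, s in raw.items()}
```

A link whose scent shares no terms with the need gets probability zero, matching the intuition that users ignore links that carry no scent for their task.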

  10. Final Model • A document's starting-point score = Query submodel × Navigation submodel
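Written out in the notation of slides 6 and 9 (my reading of the slide, not a formula quoted from it), the combination is a relevance-weighted sum over reachable answers:

```
score(d) = Σ_{d' ∈ D}  R(d', q) · W(N(d'), d, d')
```

where R(d', q) comes from the query submodel and W(N(d'), d, d') from the navigation submodel.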

  11. Volant - Prototype

  12. Volant - Preprocessing • Content Engine • R(d,q) – estimated by the Okapi BM25 scoring function • Connectivity Engine • Estimates the probability of a user with need N(d2) navigating from d1 to d2 • Dijkstra's algorithm used to generate the ⟨d1, d2, w⟩ tuples
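For reference, a bare-bones version of the standard Okapi BM25 formula the content engine relies on. The parameter values k1=1.2 and b=0.75 are common defaults, not values taken from the slides:

```python
from math import log

def bm25(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Okapi BM25 score of one document (a term list) for a query,
    against a corpus given as a list of term lists."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N   # average document length
    dl = len(doc_terms)
    score = 0.0
    for t in set(query_terms):
        df = sum(1 for d in corpus if t in d)  # document frequency
        if df == 0:
            continue
        idf = log((N - df + 0.5) / (df + 0.5) + 1)
        tf = doc_terms.count(t)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl / avgdl))
    return score
```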

  13. Volant – Starting points • Query entered -> ranked list of starting points • 1. Retrieve from the content engine all documents, d', that are relevant to the query • 2. For each d' retrieved in step 1, retrieve from the connectivity engine all documents d for which W(N(d'),d,d') > 0 • 3. For each unique d, compute the starting-point score • 4. Sort in decreasing order of starting-point score
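The retrieval-and-scoring loop on this slide can be sketched as follows, with both engines mocked as plain dictionaries. The function and variable names are mine, not Volant's:

```python
def rank_starting_points(query, content_engine, connectivity_engine):
    """content_engine: query -> {d': relevance R(d', q)}
    connectivity_engine: d' -> {d: reachability W(N(d'), d, d')}
    Returns (doc, score) pairs sorted by decreasing starting-point score."""
    relevant = content_engine[query]                    # step 1: relevant docs d'
    scores = {}
    for dp, rel in relevant.items():                    # step 2: docs d reaching d'
        for d, w in connectivity_engine.get(dp, {}).items():
            if w > 0:                                   # step 3: accumulate score
                scores[d] = scores.get(d, 0.0) + rel * w
    return sorted(scores.items(), key=lambda kv: -kv[1])  # step 4: sort
```

Note that a relevant document can be its own best starting point if the connectivity engine records W(N(d), d, d) = 1.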

  14. Volant – Navigation Guidance • When a user is navigating, Volant intercepts the document and highlights links that lead to documents relevant to their query, q • 1. Retrieve from the content engine all documents d' that are relevant to q • 2. For each d' retrieved, get from the connectivity engine the documents d that can lead to d', i.e. W(N(d'),d,d') > 0 • 3. For each tuple retrieved in step 2, highlight the links that point to d
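A hedged sketch of this guidance step, again with the engines mocked as dictionaries (`links_to_highlight` and its signature are my invention, assuming an `outlinks` map from each document to its outgoing links):

```python
def links_to_highlight(doc, query, content_engine, connectivity_engine, outlinks):
    """Return the outlinks of `doc` worth highlighting: links pointing to a
    document d from which some query-relevant d' is reachable (W > 0)."""
    relevant = content_engine[query]
    reachable_from = {d
                      for dp in relevant
                      for d, w in connectivity_engine.get(dp, {}).items()
                      if w > 0}
    return [link for link in outlinks.get(doc, []) if link in reachable_from]
```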

  15. Evaluation • Hypotheses • In query-only scenarios Volant does not perform significantly worse than conventional approaches • In combined query/navigation scenarios Volant selects high-quality starting points • In a significant fraction of query/navigation scenarios the best organic starting point is of higher quality than one that can be synthesized using existing techniques

  16. Search Task Test Sets • Navigation-prone scenarios are difficult to predict. The Simplified Clarity Score was used to determine a set of ambiguous and unambiguous queries • Unambiguous – 20 search tasks with highest clarity from TREC 2000 • Ambiguous – 48 randomly selected tasks from TREC 2003
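The Simplified Clarity Score referred to here is He & Ounis's measure: the KL divergence between the query's language model and the collection's. A small unsmoothed sketch (unseen terms are simply skipped, which a real implementation would handle with smoothing):

```python
from math import log2

def simplified_clarity_score(query_terms, collection):
    """SCS = sum_w P(w|q) * log2(P(w|q) / P(w|C)),
    with P(w|q) = tf(w,q)/|q| and P(w|C) = cf(w)/|C|.
    collection is a list of documents, each a list of terms."""
    qlen = len(query_terms)
    total = sum(len(d) for d in collection)   # collection size in tokens
    score = 0.0
    for w in set(query_terms):
        p_q = query_terms.count(w) / qlen
        cf = sum(d.count(w) for d in collection)
        if cf == 0:
            continue  # unsmoothed sketch: skip terms unseen in the collection
        score += p_q * log2(p_q / (cf / total))
    return score
```

A rare, specific term yields a high score (unambiguous query); a query made of common terms scores low (ambiguous).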

  17. Performance on Unambiguous Queries • Mean Average Precision • No significant difference • Why? Relevant documents tended not to be siblings or close cousins so Volant deemed that the best starting points were the documents themselves.
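Mean Average Precision, the metric used here, averages the per-query average precision; a compact reference implementation of the standard definition (not code from the paper):

```python
def average_precision(ranked, relevant):
    """Mean of precision@k over the ranks k where a relevant doc appears."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, 1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """runs: list of (ranked result list, set of relevant docs) per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```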

  18. Performance on Ambiguous Queries • User study – 48 judges rated the suitability of documents as starting points • 30 starting points generated • 10 from the TREC 2003 winner (CSIRO) • 10 from Volant with user guidance • 10 (the same documents as the first Volant 10) from Volant without user guidance

  19. Performance on Ambiguous Queries • Rating criteria • Breadth – spectrum of people, different interests • Accessibility – how easy to navigate and find info • Appeal – presentation of material • Usefulness – would people be able to complete their task from this point. • Each judge spent 5 hours on their task

  20. Results

  21. Summary & Future Work • Effectiveness – responds to users and positions them at suitable starting point for their task, guides them to further information in a query driven fashion. • Relationship to conventional IR – generalizes conventional probabilistic IR model and is successful in scenarios where IR techniques fail – ambiguous queries etc

  22. Discussion • Cold Start Problem • Scalability • Bias in Evaluation