Web Information retrieval (Web IR)
This handout explores the integration of user behavior into ranking functions for web information retrieval systems. It outlines a ranking function R = f(Query, User behavior, Web graph & content features) and discusses both explicit and implicit user interactions. Notably, about 80% of user clicks are related to the query, making click-through data a valuable signal. By leveraging this data, systems can better bridge the gap between user needs and search results, yielding improved relevance and precision in search outcomes.
Presentation Transcript
Web Information retrieval (Web IR) Handout #13: Ranking based on User Behavior Ali Mohammad Zareh Bidoki ECE Department, Yazd University alizareh@yaduni.ac.ir
Finding the Ranking Function • R = f(Query, User behavior, Web graph & content features) • How can we use user behavior? • Explicit • Implicit • About 80% of user clicks are related to the query • Click-through data • From search engine logs
Click-through data • A triple (q, r, c): q = the query, r = the ranked list, c = the set of clicked docs (click-through data as defined by Joachims)
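A minimal sketch of the (q, r, c) triple as a data structure, assuming string document IDs; the field names and the example record are illustrative, not from the slides:

```python
from dataclasses import dataclass

@dataclass
class ClickRecord:
    """One click-through record (q, r, c) mined from a search-engine log."""
    query: str             # q: the submitted query
    ranking: list[str]     # r: the ranked list of document IDs shown
    clicked: set[str]      # c: the subset of the ranking the user clicked

# Example record: the user clicked the 1st and 3rd results.
rec = ClickRecord(
    query="web information retrieval",
    ranking=["d1", "d2", "d3", "d4"],
    clicked={"d1", "d3"},
)
```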
Benefits of Using Click-through Data • Democracy on the Web • Filling the gap between user needs and results • User clicks are more valuable than page content (search-engine precision is judged by users, not page creators) • The degree of relevance between queries and documents increases (by adding click metadata to documents)
[Figure: Web entities — bipartite graphs linking queries, users, and words to documents over the web graph]
Document Expansion Using Click-through Data • Google was the first to use anchor text as document content • Anchor text is a view of a document from another document
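The idea above can be sketched as follows: the terms of queries that led to clicks on a document are appended to it as metadata, analogous to treating anchor text as page content. The function name and the whitespace tokenization are assumptions for illustration:

```python
def expand_document(doc_terms, clicked_queries):
    """Document expansion with click-through metadata.

    doc_terms       : the document's own terms (a set of strings)
    clicked_queries : queries for which users clicked this document
    Returns the expanded term set: query terms are added as metadata,
    much as anchor text is treated as part of a page's content.
    """
    expanded = set(doc_terms)
    for q in clicked_queries:
        expanded |= set(q.split())  # naive whitespace tokenization
    return expanded

terms = expand_document({"yazd", "university"}, ["ece department yazd"])
# "ece" and "department" join the document's term set
```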
Long-term Incremental Learning • Di is the vector of the document at the ith iteration • Q is the vector of a query for which this document was clicked • Alpha is the learning rate
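The update formula itself did not survive the transcript; a plausible sketch, assuming the standard exponential-moving-average form D_{i+1} = (1 − α)·D_i + α·Q, is:

```python
def incremental_update(d, q, alpha=0.1):
    """One long-term incremental learning step (assumed update rule).

    d     : current document term-weight vector (dict: term -> weight)
    q     : term vector of a query whose clicked result was this document
    alpha : learning rate

    Applies D_{i+1} = (1 - alpha) * D_i + alpha * Q term by term.
    """
    terms = set(d) | set(q)
    return {t: (1 - alpha) * d.get(t, 0.0) + alpha * q.get(t, 0.0)
            for t in terms}

d = {"web": 1.0, "ir": 0.5}
q = {"ir": 1.0, "ranking": 1.0}
d_next = incremental_update(d, q, alpha=0.2)
# "ranking" enters the document vector with weight 0.2
```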
Naïve Method (NM) • A bipartite graph of docs and queries • Mij is the number of clicks on document j for query i
Naïve Method (cont.) • The weight between query qj and document di: • The metadata for document i is:
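The weight formulas were lost in the transcript; a minimal sketch, assuming a simple row normalization of the click matrix (w_ij = M_ij / Σ_k M_ik), is:

```python
def query_doc_weights(M):
    """Row-normalized click weights from a query-document click matrix.

    M[i][j] = number of clicks on document j for query i.
    Returns w[i][j] = M[i][j] / sum_k M[i][k].
    The normalization is an assumption; the slide's exact formula
    did not survive the transcript.
    """
    weights = []
    for row in M:
        total = sum(row)
        weights.append([c / total if total else 0.0 for c in row])
    return weights

M = [[4, 1, 0],   # query 0: 4 clicks on d0, 1 on d1
     [0, 2, 2]]   # query 1: 2 clicks each on d1 and d2
W = query_doc_weights(M)
# W[0] == [0.8, 0.2, 0.0]
```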
Co-visited Method • If two pages are clicked for the same query, they are called co-visited • The similarity between two docs i and j (where visited(di) is the number of clicks on di and visited(di, dj) is the number of queries in which both are clicked):
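The combining formula was also lost; a sketch assuming a Jaccard-style combination, sim(di, dj) = visited(di, dj) / (visited(di) + visited(dj) − visited(di, dj)), follows:

```python
from collections import Counter
from itertools import combinations

def covisited_similarity(clicks_by_query):
    """Co-visited similarity between document pairs.

    clicks_by_query: dict mapping each query to the set of docs clicked.
    visited[d]  counts queries in which d was clicked;
    co[(i, j)]  counts queries in which both i and j were clicked.
    The Jaccard-style combination below is an assumption; the slide's
    exact formula did not survive the transcript.
    """
    visited = Counter()
    co = Counter()
    for docs in clicks_by_query.values():
        for d in docs:
            visited[d] += 1
        for i, j in combinations(sorted(docs), 2):
            co[(i, j)] += 1
    return {pair: c / (visited[pair[0]] + visited[pair[1]] - c)
            for pair, c in co.items()}

sims = covisited_similarity({
    "q1": {"d1", "d2"},
    "q2": {"d1", "d2"},
    "q3": {"d1", "d3"},
})
# sim(d1, d2) = 2 / (3 + 2 - 2) = 2/3
```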
Co-visited Disadvantages • It considers only document similarity (not query similarity) • Since users click mostly on the top-10 pages, click data is sparse (on average 1.5 queries per page) • So the similarity estimate is not precise
Iterative Method (IM) • O(q): the set of clicked pages for query q • Oi(q): the ith clicked page for q • I(d): the set of queries for which d is clicked • Ii(d): the ith query for which d is clicked
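The slides give only the notation. Using O(q) and I(d) as defined above, the mutually reinforcing query/document similarities can be sketched as a SimRank-style iteration on the bipartite click graph; the decay factor C and the exact update rule here are assumptions, not the slide's own formulas:

```python
def iterative_similarity(clicks_by_query, iters=5, C=0.8):
    """SimRank-style iterative similarity on the query-document click graph.

    clicks_by_query[q] plays the role of O(q); I[d] is derived from it.
    Query similarities are averaged over document similarities and vice
    versa, decayed by C each round (a plausible instantiation only).
    """
    queries = list(clicks_by_query)
    docs = sorted({d for ds in clicks_by_query.values() for d in ds})
    I = {d: {q for q in queries if d in clicks_by_query[q]} for d in docs}

    # Initialize: every entity is maximally similar to itself.
    sq = {(a, b): float(a == b) for a in queries for b in queries}
    sd = {(a, b): float(a == b) for a in docs for b in docs}

    for _ in range(iters):
        new_sq, new_sd = {}, {}
        for a in queries:
            for b in queries:
                if a == b:
                    new_sq[(a, b)] = 1.0
                    continue
                Oa, Ob = clicks_by_query[a], clicks_by_query[b]
                s = sum(sd[(i, j)] for i in Oa for j in Ob)
                new_sq[(a, b)] = C * s / (len(Oa) * len(Ob)) if Oa and Ob else 0.0
        for a in docs:
            for b in docs:
                if a == b:
                    new_sd[(a, b)] = 1.0
                    continue
                Ia, Ib = I[a], I[b]
                s = sum(sq[(i, j)] for i in Ia for j in Ib)
                new_sd[(a, b)] = C * s / (len(Ia) * len(Ib)) if Ia and Ib else 0.0
        sq, sd = new_sq, new_sd
    return sq, sd

sq, sd = iterative_similarity({"q1": {"d1"}, "q2": {"d1"}}, iters=3)
# Two queries that click the same single document become similar (0.8)
```

Unlike the co-visited method, this propagates similarity in both directions, so query similarity also informs document similarity.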
Experimental Results • On a real, large click-through log (MSN query log data), the proposed algorithm outperforms: • the baseline search system by 157%, • naïve query-log mining by 17%, and • the co-visited algorithm by 17% • in top-20 precision.