Optimizing search engines using clickthrough data

NATIONAL TECHNICAL UNIVERSITY OF ATHENS

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING

DIVISION OF COMPUTER SCIENCE AND TECHNOLOGY

Optimizing search engines using clickthrough data


Problem

  • Optimization of web search results ranking

    • Ranking algorithms mainly based on similarity

      • Similarity between query keywords and page text keywords

      • Similarity between pages (Page Rank)

    • No consideration for user personal preferences

(Slide figure: example search results, with a "Digital libraries" result ranked high and a "Vacation" result ranked low)


Problem

  • E.g.: query “delos”

    • Digital libraries

    • Archaeology

    • Vacation

  • Room for improvement by incorporating user behaviour data: user feedback

    • Use of previous implicit search information to enhance result ranking

(Slide figure: for the query "delos", "Digital libraries" results ranked high, "Vacation" results ranked low)


Types of user feedback

  • Explicit feedback

    • User explicitly judges the relevance of results to the query

      • Costly in time and resources

      • Costly for user → limited effectiveness

      • Direct evaluation of results

  • Implicit feedback

    • Extracted from log files

      • Large amount of information

      • Real user behaviour (not expert judgement)

      • Indirect evaluation of results through click behaviour


Implicit feedback (Categories)

  • Clicked results

    • Absolute relevance:

      • Clicked result -> Relevant

        • Risky: user click behaviour can be noisy and unreliable

      • Percentage of result clicks for a query

      • Frequently clicked groups of results

      • Links followed from result page

    • Relative relevance:

      • Clicked result -> More relevant than non-clicked

        • More reliable

  • Time

    • Between clicks

      • E.g., fast clicks → maybe bad results

    • Spent on a result page

      • E.g., a lot of time → relevant page

    • First click, scroll

      • E.g., a long delay before the first click or scroll → maybe confusing results


Implicit feedback (Categories)

  • Query chains: sequences of reformulated queries to improve results of initial search

    • Detection:

      • Query similarity

      • Result sets similarity

      • Time

    • Connection between results of different queries:

      • Enhancement of a bad query with another from the query chain

      • Comparison of result relevance between different query results

  • Scrolling

    • Time (quality of results)

    • Scrolling behaviour (quality of results)

    • Percentage of page (results viewed)

  • Other features

    • Save, print, copy etc. of result page → maybe a relevant page

    • Exit type (e.g. close window → poor results)


Joachims' approach

  • Clickthrough data

    • Relative relevance

    • Indicated by user behaviour study

  • Method

    • Training of an SVM ranking function

    • Training input: inequalities over query result rankings

    • Trained function: weight vector for features examined

    • Use of trained vector to give weights to examined features

  • Experiments

    • Comparison of method with existing search engines


Clickthrough data

  • Form

    • Triplets (q, r, c) = (query, ranked results, links clicked)

    • …in the log file of a proxy server

  • Relative relevance (regarding a specific query)

    • dk more relevant than dl if

      • dk clicked and

      • dl not clicked and

      • dk lower initial ranking than dl

Example (clicks on d1, d3 and d5 in the ranking d1…d5): d5 more relevant than d4, d5 more relevant than d2, d3 more relevant than d2 (see the sketch below)
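To make the criterion concrete, here is a minimal sketch (in Python, not the authors' code) of extracting such preference pairs from one log record (q, r, c); the function name and data layout are assumptions made for illustration.

def skip_above_pairs(ranked_results, clicked):
    """Return (more_relevant, less_relevant) pairs for one query.

    ranked_results: list of document ids in the order they were shown
    clicked: set of document ids the user clicked
    """
    pairs = []
    for pos, doc in enumerate(ranked_results):
        if doc not in clicked:
            continue
        # every non-clicked document shown above the clicked one is judged worse
        for shown_above in ranked_results[:pos]:
            if shown_above not in clicked:
                pairs.append((doc, shown_above))
    return pairs

# Example from the slide: clicks on d1, d3 and d5 in the ranking d1..d5
print(skip_above_pairs(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"}))
# [('d3', 'd2'), ('d5', 'd2'), ('d5', 'd4')]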


Clickthrough data (Studies)

  • Why not absolute relevance?

    • User behaviour influenced by initial ranking

    • Study: Rank and viewership

      • Percentage of queries where a user viewed the search result presented at a particular rank

    • Conclusion: Most of the time users view only the first few results

    • Study: Rank and clicks

      • Percentage of queries where a user clicked the result presented at a particular rank, both in normal and swapped conditions

    • Conclusion: Users tend to click on higher ranked results irrespective of content


Method (System train input)

Example preferences from above: d5 more relevant than d4, d5 more relevant than d2, d3 more relevant than d2

  • Data extracted from log

    • Relevance inequalities:

      • dk <rq dl

      • dk more relevant than dl for query q

    • Construct relevance inequalities for every query q and every result pair (dk, dl) of q

  • For each link di (result of query q):

    • construct a feature vector Φ(q,di)


Method (System train input)

  • Feature vector:

    • Describes the quality of the match between a document di and a query q

    • Φ(q, di) = [rank_X, top1_X, top10_X, …]


Method (System train input)

  • Weight vector:

    • w = [w1, w2,…,wn]

    • Assigns weights to each of the features in Φ(q,di)

    • S(di) = w · Φ(q, di) assigns a score to document di for query q

    • w: unknown → must be trained by solving the system of relative relevance inequalities derived from clickthrough data
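The scoring step itself is simple once w is known. A small sketch with NumPy (the feature values and the weight vector are made up for illustration):

import numpy as np

w = np.array([0.4, 1.2, -0.3])             # assumed to be already trained
phi = {                                    # Φ(q, di) for each candidate document
    "d1": np.array([1.0, 0.0, 0.7]),
    "d2": np.array([0.2, 1.0, 0.1]),
    "d3": np.array([0.6, 0.5, 0.9]),
}

scores = {doc: float(w @ features) for doc, features in phi.items()}
new_ranking = sorted(scores, key=scores.get, reverse=True)
print(new_ranking)                         # documents ordered by descending score S(di)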


Method (System training)

Example preferences from above: d5 more relevant than d4, d5 more relevant than d2, d3 more relevant than d2

  • Initial input transformation:

    • dk <rm dl => S(dk) > S(dl) =>

    • w · Φ(qm, dk) > w · Φ(qm, dl) =>

    • [w1, w2, …, wn] · Φ(qm, dk) > [w1, w2, …, wn] · Φ(qm, dl)

    • System of relevance inequalities for every query q and every result pair (dk, dl) of q

  • Object of training:

    • Knowing the feature vector of every link di that appears in the inequalities,

    • Find an optimal weight vector w which satisfies most of the inequalities above (optimal solution)

    • With vector w known, we can calculate a score for every link, whether clicked or not, hence a new ranking
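A common way to approximate this training step is the pairwise trick: each preference becomes the constraint w · (Φ(q, dk) − Φ(q, dl)) > 0, which can be learned with a linear SVM on difference vectors. The sketch below uses scikit-learn's LinearSVC as a stand-in for the SVM software used in the original work; it illustrates the idea, it is not the authors' implementation.

import numpy as np
from sklearn.svm import LinearSVC

def train_ranking_weights(preference_pairs):
    """preference_pairs: list of (phi_better, phi_worse) feature vectors."""
    X, y = [], []
    for phi_better, phi_worse in preference_pairs:
        diff = phi_better - phi_worse
        X.append(diff)         # w . diff should come out positive
        y.append(+1)
        X.append(-diff)        # mirrored pair keeps the two classes balanced
        y.append(-1)
    clf = LinearSVC(fit_intercept=False, C=1.0)
    clf.fit(np.array(X), np.array(y))
    return clf.coef_.ravel()   # the learned weight vector w

# Toy pairs with two features (text similarity, time on page)
pairs = [(np.array([0.9, 0.2]), np.array([0.3, 0.1])),
         (np.array([0.7, 0.8]), np.array([0.6, 0.1]))]
print(train_ranking_weights(pairs))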


Method (System training)

  • Intuitively

    • Example: 2 dimension feature vector

      • X-axis: similarity between query terms and link text

      • Y-axis: time spent on link page

    • Looking for optimal weight vector w

    • If training input: r1 < r2 < r3 < r4

      • Optimal solution w1: text similarity more important for ranking

    • If training input: r2 < r3 < r1 < r4

      • Optimal solution w2: time more important for ranking

    • w is optimized by maximizing the margin δ

(Slide figure: Link 1–Link 4 plotted in the two-dimensional feature space)
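A toy numeric illustration of the same point (the feature values are invented): the same four links are ordered differently depending on which feature the weight vector emphasises.

links = {                      # (text similarity, time on page)
    "Link 1": (0.9, 0.1),
    "Link 2": (0.2, 0.9),
    "Link 3": (0.6, 0.6),
    "Link 4": (0.4, 0.3),
}

def ranking(w):
    # score each link with w . phi and sort by descending score
    return sorted(links, key=lambda name: -(w[0] * links[name][0] + w[1] * links[name][1]))

print(ranking((1.0, 0.2)))     # w1: text similarity dominates -> Link 1 ranked first
print(ranking((0.2, 1.0)))     # w2: time on page dominates   -> Link 2 ranked first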


Experiments

  • Based on a meta-search engine

    • Combination of results from different search engines

    • “Fair” presentation of results:

      • For a meta-search engine combining 2 engines:

      • the top z results of the combined ranking contain

      • x results from engine A and

      • y results from engine B, with

      • x + y = z and

      • |x – y| <= 1
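A small sketch of such a "fair" merge, under the assumption that the two engines simply alternate and already-shown duplicates are skipped; this is a simplification of the unbiased presentation described in the paper, not its exact algorithm.

def fair_interleave(results_a, results_b):
    """Merge two ranked lists so that any top-z prefix contains roughly
    equally many results from each engine (|x - y| <= 1), skipping duplicates."""
    combined, seen = [], set()
    current, other = iter(results_a), iter(results_b)
    while True:
        doc = next((d for d in current if d not in seen), None)
        if doc is None:
            # the current engine has nothing new left; append the rest of the other
            combined.extend(d for d in other if d not in seen)
            return combined
        combined.append(doc)
        seen.add(doc)
        current, other = other, current

print(fair_interleave(["a", "b", "c"], ["b", "d", "e"]))
# ['a', 'b', 'c', 'd', 'e']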


Experiments

  • Meta-search engine

  • Assumptions

    • Users click more often on more relevant links

    • Preference for one of the search engines not influenced by ranking


Experiments (Results)

  • Conditions

    • 20 users (university students)

    • 260 training queries collected

    • 20 days of training

    • 10 days of evaluation

  • Comparison with

    • Google

    • MSNSearch

    • Toprank (meta-search for Google, MSNSearch, Altavista, Excite, Hotbot)

  • Results


Experiments (Results)

  • Weight vector


Open issues

  • Trade-off between amount of training data and homogeneity

  • Clustering algorithms to find homogenous groups of users

  • Adaptation to the properties of a particular document collection

  • Incremental online learning/feedback algorithm

  • Protection from spamming


Joachims' approach II

  • Clickthrough data

    • Addition of new relative relevance criteria

    • Addition of query chains

  • Method

    • Modified feature vector

      • Rank features

      • Term/document features

    • Constraints on the SVM optimization problem

      • To avoid trivial/incorrect solutions


Query chains

  • Sequences of reformulated queries

    • Poor results for first query

      • Too many/few terms

      • Not representative terms

        • q1: “special collections” → q2: “rare books”

      • Incorrect spelling

        • q1: “Lexis Nexus” → q2: “Lexis Nexis”

    • Execution of new query

      • Addition/deletion of query terms to get better results

    • In every query of the sequence, the user is searching for the same thing

  • Ways of detection

    • Term similarity between queries

    • Percentage of common results

    • Time between queries


Query chains detection method

  • 1st strategy

    • Data recorded from log for 1285 queries:

      • query, date, IP address, results returned, number of clicks on the results, uniquely assigned session id

    • Manual grouping of queries in query chains

    • Division of data set into training and testing set

    • Training

      • Feature vector for every pair of queries

      • Training of a weight vector indicating which features are more important for a query pair to be a query chain

    • Testing

      • Recognition of query chains in testing set

      • Comparison with manual grouping

      • 94.3% accuracy

  • 2nd strategy

    • Assumption: two queries belong to the same query chain if:

      • they come from the same IP and

      • the time between the 2 queries is < 30 min

    • 91.6% accuracy

    • Adoption of the (simpler) 2nd strategy (sketched below)
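A minimal sketch of that second strategy; the log-record layout is an assumption made for illustration, not taken from the paper.

from datetime import timedelta

CHAIN_GAP = timedelta(minutes=30)

def assign_chains(log_records):
    """log_records: list of dicts with 'ip' and 'timestamp', sorted by time.
    Returns one chain id per record; consecutive queries from the same IP
    issued less than 30 minutes apart share a chain id."""
    last_seen = {}                 # ip -> (timestamp of last query, chain id)
    chain_ids, next_id = [], 0
    for rec in log_records:
        prev = last_seen.get(rec["ip"])
        if prev is not None and rec["timestamp"] - prev[0] < CHAIN_GAP:
            chain_id = prev[1]     # continue the existing chain
        else:
            chain_id = next_id     # start a new chain
            next_id += 1
        last_seen[rec["ip"]] = (rec["timestamp"], chain_id)
        chain_ids.append(chain_id)
    return chain_ids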


Clickthrough data (studies)

  • Conclusion: Most of the time users view only the first 2 results

  • Conclusion: Most of the time users view the next result below the last one they clicked


Relevance criteria (query)

Example (Click > Skip Above, clicks on d1, d3 and d5): d5 more relevant than d4, d5 more relevant than d2, d3 more relevant than d2

  • Click >q Skip Above

    • dk more relevant than dl if

      • dk clicked and

      • dl not clicked and

      • dk lower initial ranking than dl

  • Click First >q No-Click Second

    • d1 more relevant than d2 if

      • d1 clicked

      • d2 not clicked

Example (Click First > No-Click Second): d1 more relevant than d2


Relevance criteria (query chain)

Example preferences for query 1 (q'): d8 more relevant than d7, d8 more relevant than d6

  • Click >q1 Skip Above

    • dk more relevant than dl if

      • dk clicked and

      • dl not clicked and

      • dk lower initial ranking than dl

  • Click First >q1 No-Click Second

    • d1 more relevant than d2 if

      • d1 clicked

      • d2 not clicked

Also for query 1 (q'): d5 more relevant than d6 (slide figure: results of QUERY 1 and QUERY 2 shown side by side)


Relevance criteria (query chain II)

Example preferences for query 1 (q'): d6 more relevant than d1, d6 more relevant than d2, d6 more relevant than d4

  • Click >q1 Skip Earlier Query

  • Click First >q1 Top 2 Earlier Query

Also for query 1 (q'): d6 more relevant than d1, d6 more relevant than d2 (slide figure: results of QUERY 1 and QUERY 2 shown side by side)
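A rough sketch of how such "Click > Skip Earlier Query" pairs could be generated; the exact definition of which earlier results count as "skipped" is simplified here and is an assumption, not the paper's precise rule.

def skip_earlier_query_pairs(earlier_results, earlier_clicks, later_clicked):
    """Preferences, attributed to the earlier query q', of a document clicked
    in the later query over results shown but not clicked for q'."""
    skipped = [d for d in earlier_results if d not in earlier_clicks]
    return [(later_clicked, d) for d in skipped if d != later_clicked]

# Example in the spirit of the slide: d1, d2, d4 skipped in query 1, d6 clicked in query 2
print(skip_earlier_query_pairs(["d1", "d2", "d3", "d4"], {"d3"}, "d6"))
# [('d6', 'd1'), ('d6', 'd2'), ('d6', 'd4')]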


Relevance criteria (Experiment)

  • Sequences of reformulated queries

    • Search on 16 subjects

    • Relevance inequalities produced by previous criteria

    • Comparison with explicit relevance judgements on query chains


Method (SVM training)

  • Feature vector consists of

    • Rank features φrank_fi(d, q)

    • Term/document features φterms(d, q)

  • Rank features

    • One φrank_fi(d, q) for every retrieval function fi examined

    • Every φrank_fi(d, q) consists of 28 binary features

      • One feature per rank threshold 1, 2, 3, …, 10, 15, 20, …, 100

      • The feature for threshold t is 1 if the rank of d in the initial ranking is ≤ t, else 0

    • Allow us to learn weights for and combine different retrieval functions
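A sketch of those 28 binary rank features for one retrieval function fi; the threshold list is reconstructed from the slide.

# Thresholds 1..10 and 15, 20, ..., 100: 10 + 18 = 28 features in total
THRESHOLDS = list(range(1, 11)) + list(range(15, 101, 5))

def rank_features(rank_in_fi):
    """rank_in_fi: 1-based position of document d in fi's initial ranking."""
    return [1 if rank_in_fi <= t else 0 for t in THRESHOLDS]

print(len(THRESHOLDS))      # 28
print(rank_features(12))    # 0 for thresholds 1..10, 1 from threshold 15 onwards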


Method (SVM training)

  • Term/document features

    • φterms(d, q) consists of N × M features

      • N: number of terms

      • M: number of documents

    • The feature for the pair (tj, di) is set to 1 if di = d and tj ∈ q, else 0

    • Trains the weight vector to learn associations between terms and documents
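A sketch of the term/document part of Φ(q, d), stored sparsely instead of as a dense N × M vector; the dictionary representation is an implementation choice for illustration, not the paper's.

def term_doc_features(query_terms, doc_id):
    """Non-zero (term, document) features of Φ(q, d): feature (tj, di) is 1
    exactly when di is the document being scored and tj occurs in the query."""
    return {(term, doc_id): 1 for term in query_terms}

print(term_doc_features(["rare", "books"], "d6"))
# {('rare', 'd6'): 1, ('books', 'd6'): 1}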


Method (Constraints)

  • Problem

    • Most relevance criteria suggest that a lower ranked link (in the initial ranking) is better than a higher ranked link

    • Trivial solution: reverse the initial ranking by assigning negative values to weight vector

  • Solution

    • Each w · φrank_fi(d, q) ≥ a minimum positive value


Experiments

  • Based on a meta-search engine

    • Initial retrieval function from Nutch

  • Training

    • 9949 queries

    • 7429 clicks

    • Pqc: a total of 120134 relative relevance inequalities

    • Pnc: a comparable set of inequalities generated without the use of query chains

  • Results (rel0 = the initial ranking)


Experiments

  • Results

    • Trained weights for term/document features


Open issues

  • Tolerance of noise and malicious clicks

  • Different weights in each query of a query chain

  • Personalized ranking functions

  • Improvement of performance


Related work (Fox et al. 2005)

  • Evaluation of implicit measures

  • Use of explicit ratings of user satisfaction

  • Method

    • Bayesian modeling and decision trees

    • Gene analysis (sequence analysis of user action patterns)

  • Results: Importance of combination of

    • Clickthrough

    • Time

    • Exit type

  • Features used:


Related work (Agichtein, Brill, Dumais 2006)

  • Incorporation of implicit feedback features directly into the initial ranking algorithm

    • BM25F

    • RankNet

  • Features used:


Related work

  • Query/links clustering based on features extracted from implicit feedback

  • Score based on interrelations between queries and web pages


Possible extensions

  • Utilization of time feedback

    • As relative relevance criterion

    • As a feature

  • Other types of feedback

    • Scrolling

    • Exit type

  • Combination with absolute relevance clickthrough feedback

    • Percentage of result clicks for a query

    • Links followed from result page

  • Query chains

    • Improvement of detection method

  • Link association rules

    • For frequently clicked groups of results

  • Query/links clustering

  • Constant training of ranking functions


References

  • Search Engines that Learn from Implicit Feedback

    • Joachims, Radlinski

  • Optimizing Search Engines Using Clickthrough Data

    • Joachims

  • Query Chains: Learning to Rank from Implicit Feedback

    • Radlinski, Joachims

  • Evaluating Implicit Measures to Improve Web Search

    • Fox et al.

  • Implicit Feedback for Inferring User Preference: A Bibliography

    • Kelly, Teevan

  • Improving Web Search Ranking by Incorporating User Behavior

    • Agichtein, Brill, Dumais

  • Optimizing Web Search Using Web Click-through Data

    • Xue, Zeng, Chen, Yu, Ma, Xi, Fan

