Query Recommendation
Xiaofei Zhu ([email protected]), L3S Research Center, Leibniz Universität Hannover




Introduction

Problems with user queries:

  • Short (1-2 words)

  • Ambiguous (e.g., Java)

  • Lack of domain knowledge


Query Recommendation

  • Query recommendation aims to provide users with alternative queries that represent their information needs more clearly, so that the search engine returns better results.

[Figure: an original query and its recommended queries]


Query Recommendation

  • How to do query recommendation?

    • Find alternative queries with similar search intent.

    • How does this differ from recommending documents or images?


Query log

  • Query log

    • A query log records information about the search actions of the users of a search engine.

  • A typical query log is a set of records ⟨q_i, u_i, t_i, V_i, C_i⟩:

    • q_i: the submitted query

    • u_i: an anonymized identifier for the user who submitted the query

    • t_i: the timestamp at which the query was submitted

    • V_i: the set of results returned for the query

    • C_i: the set of documents clicked by the user


Example of query log (AOL, 2006)

AnonID | Query | QueryTime | ItemRank | ClickURL
7051923 | motorola text messages | 2006-03-24 19:35:31 | 1 | http://www.telusmobility.com
7051923 | motorola text messages | 2006-03-24 19:35:31 | 4 | http://support.t-mobile.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 2 | http://www.phonescoop.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 3 | http://www.1800mobiles.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 5 | http://cgi.ebay.com
7051923 | motorola t730 text messages | 2006-03-24 19:38:40 | 7 | http://phonearena.com
7051923 | spike muscle car | 2006-03-25 12:57:43 | 2 | http://www.classicauto-sales.com
7051923 | spike muscle car | 2006-03-25 12:57:43 | 5 | http://sev.prnewswire.com
7051923 | spike muscle car | 2006-03-25 13:00:22 | |
7051923 | usps | 2006-03-25 14:23:21 | 1 | http://www.usps.com
7051923 | vc2 auctions | 2006-03-25 14:31:41 | |
7051923 | auctions for 1 | 2006-03-25 14:33:47 | |
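The AOL log stores one row per click, so a query issued once with two clicks appears twice. A minimal Python sketch, assuming tab-separated rows with the five columns shown above (as in the public AOL release), that groups rows back into ⟨q_i, u_i, t_i, C_i⟩ records; the AOL log does not contain the returned result list V_i, and the class and function names are hypothetical:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class QueryRecord:
    """One query submission: q_i, u_i, t_i and the clicks C_i (V_i is not in the AOL log)."""
    query: str
    user: str
    time: datetime
    clicks: list = field(default_factory=list)  # (rank, url) pairs


def parse_aol(lines):
    """Group tab-separated AOL rows (AnonID, Query, QueryTime, ItemRank, ClickURL)
    into one record per (user, query, time); rows without a click add no click."""
    records = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        fields += [""] * (5 - len(fields))  # rows without a click have only 3 fields
        user, query, qtime, rank, url = fields[:5]
        key = (user, query, qtime)
        if key not in records:
            records[key] = QueryRecord(query, user,
                                       datetime.strptime(qtime, "%Y-%m-%d %H:%M:%S"))
        if url:
            records[key].clicks.append((int(rank), url))
    return list(records.values())
```

Because the grouping key includes the timestamp, the two "spike muscle car" submissions in the table become two separate records.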


Microsoft 2006 RFP dataset

Time | Query | QueryID | SessionID | ResultCount
2006-05-01 00:00:01 | defination Gravitational | 46c13f0705f6436b | 19ab975e898d46d1 | 11
2006-05-01 00:00:01 | kimclement | a3d2cae45e2b4c5b | 1b748d1afa9b4828 | 10
2006-05-01 00:00:01 | scientology crazy beliefs | 418324ef33d14ed2 | 10f477402db84c9a | 10
2006-05-01 00:00:01 | www.joj.sk | 489238bdf8834d68 | 16271eb6bf174c5c | 9
2006-05-01 00:00:04 | www.selectcareers.com | f92efd8044904ac4 | 193f9f8442d44c48 | 0
2006-05-01 00:00:08 | What is May Day? | 37afe7af832649d2 | 21f6a0dfea4348ac | 14
2006-05-01 00:00:10 | vikings draft choices suck | b0519e4528d84b44 | 196b0bb2f1d643f2 | 10
2006-05-01 00:00:10 | wwwcrownawards.com | 9eda4716dfb045e2 | 04e3a26067a84748 | 0
2006-05-01 00:00:15 | Australian miners | ba6d190cc4cd4fd3 | 136fd5e571d24886 | 10

QueryID | Query | Time | URL | Position
0000003a718649f2 | schwab | 2006-05-11 08:07:35 | http://www.schwab.com/ | 1
0000006d43b549c1 | us geography | 2006-05-04 14:23:00 | http://www.enchantedlearning.com/usa/ | 3
0000006d43b549c1 | us geography | 2006-05-04 14:23:03 | http://www.sheppardsoftware.comState15s_500.html | 4
0000016aa52e4fbc | wwf | 2006-05-21 09:25:34 | http://www.panda.org/ | 2
000002aa6e27443f | biggercity | 2006-05-07 13:30:45 | http://www.biggercity.com/chat/ | 1
000005aac1f6423f | studios | 2006-05-09 14:21:29 | http://www.shawneestudios.com/contact_us.php | 1
000008d8afaa459a | www.nfl.com | 2006-05-28 18:22:39 | http://www.nfl.com/teams/NYJ.html | 7
000009c2848e4a68 | north hills school district | 2006-05-04 12:29:12 | http://www.nhsd.net/ | 1


How to use query log for query recommendation?

Click-through data

  • Click-through data records the documents a user clicks after submitting a query to the search engine.

Basic assumption: if a user clicks a document after issuing a query, the clicked document is more or less relevant to that query, so the query can be represented by its clicked documents.

  • Query feature representation: if two queries share many co-clicked documents, they have similar search intent.

  • Query-URL graph

[Beeferman, KDD’00], [Mei, CIKM’08]
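To make the co-click assumption concrete, here is a small sketch that builds the query-URL graph from (query, URL) click events and scores two queries by the cosine similarity of their click vectors. Cosine similarity is an illustrative choice and not necessarily the measure used in the cited papers. The function names are also hypothetical.

```python
import math
from collections import defaultdict


def build_click_graph(click_events):
    """Query-URL graph as nested dicts: graph[query][url] = number of clicks."""
    graph = defaultdict(lambda: defaultdict(int))
    for query, url in click_events:
        graph[query][url] += 1
    return graph


def coclick_similarity(graph, q1, q2):
    """Cosine similarity of two queries' click vectors over URLs."""
    a, b = graph.get(q1, {}), graph.get(q2, {})
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Queries that send all their clicks to the same URLs score 1.0, and queries with no co-clicked URL score 0.0.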


How to use query log for query recommendation?

Query session

  • Query session: a sequence of related queries submitted by a single user within a time interval for a specific search task.

Basic assumption: if two queries frequently co-occur in the same sessions, then they are relevant to each other; queries submitted consecutively within a short time interval by the same user share similar search intent.

  • Association rules [Fonseca, LA-WEB’03]

  • Query graph [Zhang, WWW’06], [Boldi, CIKM’08, WSCD’09]
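A sketch of session-based co-occurrence counting. The 30-minute inactivity gap used to cut sessions is a common heuristic and an assumption here, since the slides only say "a time interval". The function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def split_sessions(events, gap=timedelta(minutes=30)):
    """events: (user, time, query) tuples. A new session starts when the same user
    has been inactive for longer than `gap` (assumed threshold)."""
    sessions, last_seen = [], {}
    for user, time, query in sorted(events, key=lambda e: (e[0], e[1])):
        previous = last_seen.get(user)
        if previous is None or time - previous[0] > gap:
            sessions.append([])
            index = len(sessions) - 1
        else:
            index = previous[1]
        sessions[index].append(query)
        last_seen[user] = (time, index)
    return sessions


def cooccurrence_counts(sessions):
    """Number of sessions in which each unordered query pair co-occurs."""
    counts = Counter()
    for session in sessions:
        for pair in combinations(sorted(set(session)), 2):
            counts[pair] += 1
    return counts


# Events taken from the AOL example table
events = [
    ("7051923", datetime(2006, 3, 24, 19, 35, 31), "motorola text messages"),
    ("7051923", datetime(2006, 3, 24, 19, 38, 40), "motorola t730 text messages"),
    ("7051923", datetime(2006, 3, 25, 14, 23, 21), "usps"),
    ("7051923", datetime(2006, 3, 25, 14, 31, 41), "vc2 auctions"),
]
sessions = split_sessions(events)
print(sessions)
print(cooccurrence_counts(sessions))
```

The two Motorola queries fall into one session and the two queries issued the next afternoon into another, so each pair is counted once.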


Highly Relevant Query Recommendation

  • Query Suggestion Using Hitting Time (CIKM’08)

    • Click-through Data

    • Query-URL Bipartite Graph

  • Query Suggestions Using Query-Flow Graphs (WSCD’09)

    • Session Data

    • Query-Flow Graph



Query Suggestion Using Hitting Time (CIKM’08)

  • Query-URL bipartite graph

    • Edges connect nodes in V1 to nodes in V2

    • There are no edges inside V1 or inside V2

    • Edges are weighted

    • e.g., V1 = queries; V2 = URLs

  • Transition probabilities

[Figure: example weighted bipartite graph between V1 and V2, with nodes i and j]
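For a weighted bipartite graph, transition probabilities are commonly obtained by normalizing each node's edge weights by its total weight. This formula is assumed here because the slide's own formula is not in the transcript:

```latex
p(i \to j) = \frac{w(i, j)}{\sum_{k} w(i, k)}
```

Here w(i, j) is the weight (e.g., the click count) of the edge between query i and URL j. The same rule applies in the URL-to-query direction.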


Query Suggestion Using Hitting Time (CIKM’08)

  • Random walk and hitting time

    • Hitting time: how long does it take a random walk starting at node b to hit node a?

  • Start at node 1

  • Pick a neighbor i according to the transition probabilities (uniformly at random in an unweighted graph)

  • Move to i

  • Continue

If the random walk hits a node quickly, then that node is close to the start node. This is what hitting time measures.

[Figure: a random walk on an example graph with nodes 1 to 5]


Hitting Time from i to A

[Figure: graph G with a start node i, a target set A, and intermediate nodes j and k]

Generate Query Suggestion

  • Construct a (kNN) subgraph from the query log data (of a predefined number of queries/URLs)

  • Compute transition probabilities p(i → j)

  • Compute hitting time h_iA

  • Rank candidate queries using h_iA

[Figure: example subgraph linking queries such as “aa”, “american airline” and “mexiana” to URLs such as www.aa.com, www.theaa.com/travelwatch/planner_main.jsp and en.wikipedia.org/wiki/Mexicana, with edge weights such as 300 and 15]
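The four steps can be sketched in Python with NumPy. This sketch is not the CIKM’08 implementation and rests on several assumptions:

- It skips the kNN subgraph extraction.
- It treats click counts as symmetric edge weights.
- It computes a truncated hitting time by iterating the recursion a fixed number of times (`iterations`, a hypothetical parameter), so nodes that cannot reach the input query end up with the maximum value.

All function names are hypothetical, and the click counts in the usage lines are made up.

```python
import numpy as np


def transition_matrix(W):
    """Row-normalize a nonnegative weight matrix; all-zero rows stay zero."""
    W = np.asarray(W, dtype=float)
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)


def truncated_hitting_time(P, targets, iterations=100):
    """Iterate h_i = 1 + sum_j P[i, j] h_j (h_i = 0 for targets) a fixed number of times."""
    P = np.array(P, dtype=float)                 # copy so the caller's matrix is untouched
    stuck = np.flatnonzero(P.sum(axis=1) == 0)
    P[stuck, stuck] = 1.0                        # isolated nodes never reach the targets
    in_target = np.zeros(P.shape[0], dtype=bool)
    in_target[list(targets)] = True
    h = np.zeros(P.shape[0])
    for _ in range(iterations):
        h = 1.0 + P @ h
        h[in_target] = 0.0
    return h


def suggest(click_counts, queries, urls, input_query, k=3, iterations=100):
    """click_counts: {(query, url): clicks}. Returns the k queries closest to input_query."""
    nodes = [("q", q) for q in queries] + [("u", u) for u in urls]
    index = {node: i for i, node in enumerate(nodes)}
    W = np.zeros((len(nodes), len(nodes)))
    for (q, u), clicks in click_counts.items():
        i, j = index[("q", q)], index[("u", u)]
        W[i, j] += clicks                        # query -> URL
        W[j, i] += clicks                        # URL -> query
    P = transition_matrix(W)
    h = truncated_hitting_time(P, [index[("q", input_query)]], iterations)
    candidates = [q for q in queries if q != input_query]
    return sorted(candidates, key=lambda q: h[index[("q", q)]])[:k]


# Hypothetical click counts
counts = {("aa", "www.aa.com"): 300, ("american airline", "www.aa.com"): 15,
          ("aa", "en.wikipedia.org/wiki/Mexicana"): 5,
          ("mexiana", "en.wikipedia.org/wiki/Mexicana"): 10,
          ("usps", "www.usps.com"): 20}
queries = ["aa", "american airline", "mexiana", "usps"]
urls = ["www.aa.com", "en.wikipedia.org/wiki/Mexicana", "www.usps.com"]
print(suggest(counts, queries, urls, "aa"))
```

A smaller hitting time means the walk reaches the input query sooner, so candidates are returned in ascending order of h.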


Result: Query Suggestion

Query = ‘aa’



Query Suggestions Using Query-Flow Graphs (WSCD’09)

  • Session data

    • Definition: the sequence of queries issued by one particular user within a specific time limit.


Query Graph

[Figure: query graph whose edges connect two consecutive queries, as well as queries that are not neighbors in the same session]

  • The model accumulates many query sessions and, for each query pair, sums the similarity values contributed by every session in which the pair appears.

Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation. In WWW, pages 1039–1040, 2006.
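The sketch below shows one way to "add up the similarity values" over sessions. The weighting is an assumption for illustration, and the exact weighting in Zhang and Nasraoui may differ. Adjacent queries contribute 1, and queries k positions apart contribute `damping ** (k - 1)`, where `damping` is a hypothetical parameter.

```python
from collections import defaultdict


def session_similarity_graph(sessions, damping=0.5):
    """Sum pairwise query similarities over sessions.

    Assumed weighting (illustrative): queries k positions apart in a session
    contribute damping ** (k - 1), so adjacent queries contribute 1.
    """
    weights = defaultdict(float)
    for session in sessions:
        for i in range(len(session)):
            for j in range(i + 1, len(session)):
                a, b = session[i], session[j]
                if a != b:
                    weights[tuple(sorted((a, b)))] += damping ** (j - i - 1)
    return dict(weights)


# Example sessions (q -> q1 -> q3, q -> q3 -> q4, q -> q4)
sessions = [["q", "q1", "q3"], ["q", "q3", "q4"], ["q", "q4"]]
print(session_similarity_graph(sessions))
```

With damping 0.5, the pair (q, q3) accumulates 0.5 from the first session, where the two queries are not neighbors, and 1 from the second session, where they are.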


Query-Flow Graph

P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: “The query-flow graph: model and applications”. CIKM 2008.


Build Query-Flow Graph

  • The key aspect of constructing the query-flow graph is defining the weighting function w.

w(q, q’) represents the number of times the transition from q to q’ was observed within the same search session.
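The sketch below implements this counting weight. It counts q → q’ transitions between consecutive queries of each session, then row-normalizes the counts to obtain the matrix P used by the random walk. Ignoring immediately repeated identical queries is an assumption, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def query_flow_graph(sessions):
    """Return transition counts w(q, q') and the row-normalized probabilities P(q, q')."""
    counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:                       # assumption: ignore immediate repeats
                counts[(a, b)] += 1
    out_total = defaultdict(int)
    for (a, _), c in counts.items():
        out_total[a] += c
    probs = {(a, b): c / out_total[a] for (a, b), c in counts.items()}
    return counts, probs


sessions = [["q", "q1", "q3"], ["q", "q3", "q4"], ["q", "q4"]]
counts, probs = query_flow_graph(sessions)
print(probs)
```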


Query Recommendation

  • The query recommendation methods are based on the probability of being at a certain node after performing a random walk over a query graph.

  • Random walk with restart

    • A random surfer starts at the initial query q

    • At each step:

      • with probability α, it follows one of the outlinks of the current node

      • with probability 1 − α, it jumps back to q
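Below is a power-iteration sketch of random walk with restart. It assumes P is a row-stochastic matrix over the query-flow graph (every row sums to 1). α = 0.85 is only an illustrative default, and the function name is hypothetical.

```python
import numpy as np


def random_walk_with_restart(P, start, alpha=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- alpha * r P + (1 - alpha) * e_start until it stops changing."""
    P = np.asarray(P, dtype=float)
    restart = np.zeros(P.shape[0])
    restart[start] = 1.0
    r = restart.copy()
    for _ in range(max_iter):
        r_next = alpha * (r @ P) + (1 - alpha) * restart
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r
```

Ranking the other queries by the returned scores gives the recommendations for q.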


M: the transition matrix of a Markov chain

P: the row-normalized weight matrix of the query-flow graph

e_j: the vector whose j-th entry is 1 and whose other entries are 0
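With these definitions, the random walk with restart can be written in matrix form. This is a reconstruction from the definitions above, not the slide's own formula. Here e_q denotes e_j for the index j of the initial query q, and **1** is the all-ones column vector:

```latex
M = \alpha P + (1 - \alpha)\, \mathbf{1} e_q^{\top},
\qquad
r = r M
\;\Longleftrightarrow\;
r = \alpha\, r P + (1 - \alpha)\, e_q^{\top}.
```

The equivalence uses r **1** = 1, which holds because r is a probability distribution.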


Random walks

  • Random walks on graphs correspond to Markov chains

    • The set of states S is the set of nodes of the graph G

    • Entry P(i, j) of the transition probability matrix is the probability of following the edge from node i to node j


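Definitions

  • Adjacency matrix A

  • Transition matrix P

[Figure: example directed graph with its adjacency matrix A and transition matrix P; edges carry transition probabilities 1 or 1/2]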
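For an unweighted graph, the transition matrix is the adjacency matrix with each row divided by the node's out-degree, P = D⁻¹A. A small sketch follows; the example graph and the function name are hypothetical.

```python
import numpy as np


def transition_from_adjacency(A):
    """Return P = D^{-1} A, where D is the diagonal out-degree matrix."""
    A = np.asarray(A, dtype=float)
    out_degree = A.sum(axis=1)
    if np.any(out_degree == 0):
        raise ValueError("every node needs at least one outgoing edge")
    return A / out_degree[:, None]


# Hypothetical 3-node graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0
A = [[0, 1, 1],
     [0, 0, 1],
     [1, 0, 0]]
P = transition_from_adjacency(A)
print(P)
```

Node 0 has two out-edges, so each gets probability 1/2. The other nodes have one out-edge each, with probability 1.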


Random Walk

[Figure: a random walk on the example graph at steps t = 0, 1, 2 and 3, with edges labeled by transition probabilities 1 and 1/2]


Probability Distributions

$x_t(i)$ = probability that the surfer is on node i at time t

$x_{t+1}(i) = \sum_j \Pr(\text{at node } j) \cdot \Pr(j \to i) = \sum_j x_t(j)\, P(j, i)$

$x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$

What happens when the surfer keeps walking for a long time?
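The recurrence x_{t+1} = x_t P can be run directly to see what happens over a long walk. The sketch below uses a hypothetical 3-node chain and a hypothetical function name.

```python
import numpy as np


def distribution_after(P, x0, steps):
    """Return [x_0, x_1, ..., x_steps] with x_{t+1} = x_t P."""
    P = np.asarray(P, dtype=float)
    x = np.asarray(x0, dtype=float)
    history = [x]
    for _ in range(steps):
        x = x @ P
        history.append(x)
    return history


# Hypothetical example: 0 -> {1, 2}, 1 -> 2, 2 -> 0
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
for t, x in enumerate(distribution_after(P, [1.0, 0.0, 0.0], 50)):
    if t in (0, 1, 2, 50):
        print(t, np.round(x, 3))
```

Starting from node 0, the distribution changes at every step but settles near (0.4, 0.2, 0.4). That limit does not depend on the starting node.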


What happens when the surfer keeps walking for a long time?

  • Stationary distribution

    • Intuitively

      • for a well-behaved (ergodic) chain, the stationary probability of a node is the long-run fraction of time a random walker spends visiting that node.

    • Mathematically

      • Recall that the probability distribution evolves as

        $x_{t+1} = x_t P$.

      • The stationary distribution $v_0$ satisfies

        $v_0 = v_0 P$

$v_0$ is a left eigenvector of the transition matrix P with eigenvalue 1!
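Because v0 = v0 P, the stationary distribution is an eigenvector of Pᵀ for eigenvalue 1, rescaled to sum to 1. A sketch using NumPy's `numpy.linalg.eig`; the example chain and the function name are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Left eigenvector of P for the eigenvalue closest to 1, normalized to sum to 1."""
    P = np.asarray(P, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eig(P.T)  # right eigenvectors of P^T = left of P
    k = np.argmin(np.abs(eigenvalues - 1.0))
    v = np.real(eigenvectors[:, k])
    return v / v.sum()


# Hypothetical example: 0 -> {1, 2}, 1 -> 2, 2 -> 0
P = np.array([[0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
v0 = stationary_distribution(P)
print(v0)
```

For this chain, the result is (0.4, 0.2, 0.4), which matches the limit reached by repeatedly multiplying by P.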


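Interesting questions

  • Does a stationary distribution always exist? Is it unique?

    • Every finite Markov chain has at least one stationary distribution. It is unique, and the walk converges to it from any starting distribution, if the graph is “well-behaved”, i.e., P is ergodic

  • P is ergodic if it is:

    • irreducible

    • aperiodic

Irreducible: there is a path from every node to every other node.

Aperiodic: the period of state i is the greatest common divisor of the lengths of all paths from i back to i. State i is aperiodic if its period is 1, and the chain is aperiodic if all its states are.

[Figure: examples of an irreducible graph, a graph that is not irreducible, an aperiodic graph, and a graph with period 3]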
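For a finite chain, "irreducible and aperiodic" is equivalent to P being primitive: some power of P has all entries positive. By Wielandt's bound, it suffices to check the power (n − 1)² + 1. The sketch below tracks only the zero/nonzero pattern; the function name is hypothetical.

```python
import numpy as np


def is_ergodic(P):
    """True iff the chain is irreducible and aperiodic, i.e. P is primitive:
    P^k > 0 entrywise for k = (n - 1)^2 + 1 (Wielandt's bound)."""
    A = (np.asarray(P) > 0).astype(int)
    n = A.shape[0]
    M = np.eye(n, dtype=int)
    for _ in range((n - 1) ** 2 + 1):
        M = np.minimum(M @ A, 1)   # keep a 0/1 reachability pattern to avoid overflow
    return bool(np.all(M > 0))
```

A 3-cycle fails the test because its period is 3. Adding the shortcut 0 → 2 creates cycles of lengths 2 and 3, so the period drops to 1 and the test passes.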


  • If a Markov chain P is irreducible and aperiodic, then the largest eigenvalue of the transition matrix is equal to 1 and all other eigenvalues are strictly less than 1 in absolute value.

    • Let the eigenvalues of P be $\{\sigma_i \mid i = 0, \dots, n-1\}$, ordered by non-increasing absolute value.

    • $\sigma_0 = 1 > |\sigma_1| \ge |\sigma_2| \ge \dots \ge |\sigma_{n-1}|$


Result: Query Suggestion (q = “apple” and q = “jeep”)


Why Diversity Query Recommendation

Relevance

  • In query recommendation, providing only “relevant” recommendations is far from sufficient to satisfy users’ information needs.

Original query: Apple

  • apple ipad 3

  • apple tree

  • apple iphone 4s

  • apple seed

  • apple computer

The queries we recommend should cover multiple potential search intents of users and minimize the risk that users will not be satisfied.


High Diversity Query Recommendation

  • Diversifying Query Suggestion Results [Hao Ma, AAAI’10]

    • Query-URL graph

    • Hitting time

  • A Unified Framework for Recommending Diverse and Relevant Queries [Xiaofei Zhu, WWW’11]

    • Manifold

    • Manifold Ranking with Stop Points



Graph Construction

Figure 1: Example of a bipartite graph (extracted from the click-through data)


Determining the First Suggested Query

  • Initial transition probability

$p_0(i, j) = \dfrac{w(i, j)}{d_i}$

-- w(i, j): the click frequency between node i and node j

-- d_i: a normalization term, the total number of times that query node i has been issued in the dataset

-- p_0(i, j): the initial transition probability from node i to node j


Determining the First Suggested Query

  • Random jump

    • In addition to the click-based transition probabilities, there are random relations among different queries.

    • The method adds a uniform random relation among different queries.

-- ε: the probability of taking a “random jump”, i.e., a transition among different queries

-- Without any prior knowledge, the jump distribution is set to d, a uniform stochastic distribution vector


Determining the First Suggested Query

  • Random walk on the query-URL graph

    • With the transition probability matrix P defined, a random walk can be performed on the query-URL graph.

    • The probability of moving from node i to node j in a t-step random walk is $P^t(i, j)$, the (i, j) entry of the t-th power of P.

Explanation:

1) The t-step probability sums the probabilities of all paths of length t between the two nodes. If there are many such paths, the transition probability will be high.

2) The larger the transition probability $P^t(i, j)$, the more similar node j is to node i.


Determining the First Suggested Query

  • The query with the largest transition probability from node q is recommended as the first suggested query

    • after performing a t-step random walk

  • Parameter t

    • determines the resolution of the Markov random walk

      • Large t: the random walk depends more on the global graph structure

      • Small t: the random walk preserves information about the starting node
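The sketch below covers the first-suggestion step under two assumptions. For brevity, it works on a query-to-query transition matrix `P0`, whereas the slides walk on the query-URL graph. The random jump is mixed in as (1 − ε)P0 + ε·1dᵀ, with `epsilon` standing for ε; this form is assumed because the slide's formula is missing. The function names are hypothetical.

```python
import numpy as np


def with_random_jump(P0, epsilon=0.1):
    """Mix a uniform random jump into P0 (assumed form):
    P = (1 - epsilon) * P0 + epsilon * 1 d^T, with d uniform."""
    P0 = np.asarray(P0, dtype=float)
    n = P0.shape[0]
    d = np.full(n, 1.0 / n)                      # uniform stochastic vector
    return (1 - epsilon) * P0 + epsilon * np.outer(np.ones(n), d)


def first_suggestion(P, q, candidates, t=3):
    """Return the candidate with the largest t-step transition probability P^t(q, c)."""
    Pt = np.linalg.matrix_power(np.asarray(P, dtype=float), t)
    return max((c for c in candidates if c != q), key=lambda c: Pt[q, c])
```

A larger `t` smooths the scores toward the global graph structure, and a smaller `t` keeps them local to q, as described above.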


Ranking the Remaining Queries

  • Employ hitting time to rank and diversify the remaining queries.

    • Hitting time

      • Let S be a subset of the vertex set V. The expected hitting time h(i|S) is the expected number of steps a random walk starting at node i takes to first visit a node in S:

$h(i|S) = 0$ if $i \in S$; otherwise $h(i|S) = 1 + \sum_{j \in N(i)} p(i, j)\, h(j|S)$,

where N(i) denotes the neighbors of node i.


Ranking the Remaining Queries

  • Property

    • Nodes strongly connected to s1 receive far fewer visits from the random walk.

    • Nodes far away from s1 still allow the random walk to move among them and thus receive more visits.

  • The second suggestion node

    • Select the second suggestion node s2 ∈ Q with the largest expected hitting time to the subset S = {q, s1}.
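The sketch below implements the greedy diversification loop. Starting from S = {q, s1}, it repeatedly adds the candidate with the largest truncated hitting time to S. It uses a query-to-query transition matrix `P` for brevity, and `iterations` is a hypothetical truncation parameter. The function names are also hypothetical.

```python
import numpy as np


def hitting_times(P, S, iterations=200):
    """Truncated h(i|S): iterate h_i = 1 + sum_j P[i, j] h_j, with h_i = 0 for i in S."""
    P = np.asarray(P, dtype=float)
    in_S = np.zeros(P.shape[0], dtype=bool)
    in_S[list(S)] = True
    h = np.zeros(P.shape[0])
    for _ in range(iterations):
        h = 1.0 + P @ h
        h[in_S] = 0.0
    return h


def diversified_suggestions(P, q, first, candidates, k=3, iterations=200):
    """Greedily add the candidate farthest (in hitting time) from the already chosen set."""
    S, chosen = [q, first], [first]
    remaining = [c for c in candidates if c not in S]
    while remaining and len(chosen) < k:
        h = hitting_times(P, S, iterations)
        best = max(remaining, key=lambda c: h[c])
        chosen.append(best)
        S.append(best)
        remaining.remove(best)
    return chosen
```

In the test chain, node 2 is strongly tied to the first suggestion (node 1), so node 3 is picked second even though node 2 is close to q.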


Result: Query Suggestion



Query Recommendation

Manifold ranking + stop points → a novel unified framework, manifold ranking with stop points, which addresses both relevance and diversity


[Figure: affinity matrix W over the queries query1, query2, …, queryn]


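Traditional manifold ranking process

Step 1: Construct the affinity matrix W between queries, with $W_{ii} = 0$.

Step 2: Normalize W symmetrically: $S = D^{-1/2} W D^{-1/2}$, where D is the diagonal matrix with $D_{ii} = \sum_j W_{ij}$.

Step 3: Iterate $f(t+1) = \alpha S f(t) + (1 - \alpha) y$ until convergence, where y is 1 for the original query and 0 elsewhere, and rank the queries by the converged scores $f^*$.

W: affinity matrix, D: diagonal matrix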
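Below is a NumPy sketch of the traditional manifold ranking iteration, in the standard formulation of Zhou et al. α = 0.9 is an illustrative value, and the function name is hypothetical.

```python
import numpy as np


def manifold_ranking(W, query, alpha=0.9, tol=1e-9, max_iter=10_000):
    """Rank all nodes with respect to `query` by manifold ranking."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)                          # Step 1: no self-affinity
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]     # Step 2: S = D^{-1/2} W D^{-1/2}
    y = np.zeros(len(W))
    y[query] = 1.0                                    # the original query is the only labeled point
    f = y.copy()
    for _ in range(max_iter):                         # Step 3: f <- alpha S f + (1 - alpha) y
        f_next = alpha * (S @ f) + (1 - alpha) * y
        if np.max(np.abs(f_next - f)) < tol:
            return f_next
        f = f_next
    return f
```

For 0 < α < 1, the iteration converges to the closed form f* = (1 − α)(I − αS)⁻¹y, so for small graphs one can also solve that linear system directly.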


Manifold ranking with stop points
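The slide's equations are not preserved in this transcript, so what follows only sketches the idea, under an explicit assumption. A selected query becomes a stop point whose score is held at 0, so it no longer spreads ranking score to its neighbors, and the ranking is recomputed before each selection. The function names and parameters are hypothetical.

```python
import numpy as np


def normalized_affinity(W):
    """S = D^{-1/2} W D^{-1/2} with a zeroed diagonal."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


def rank_with_stop_points(W, query, k=3, alpha=0.9, iterations=500):
    """Greedy selection: rank, pick the best non-stop query, turn it into a stop point, repeat."""
    S = normalized_affinity(W)
    n = S.shape[0]
    y = np.zeros(n)
    y[query] = 1.0
    stop = np.zeros(n, dtype=bool)
    selected = []
    for _ in range(min(k, n - 1)):
        f = y.copy()
        for _ in range(iterations):
            f = alpha * (S @ f) + (1 - alpha) * y
            f[stop] = 0.0          # assumed: stop points hold no score, so they spread none
        scores = f.copy()
        scores[stop] = -np.inf
        scores[query] = -np.inf
        best = int(np.argmax(scores))
        selected.append(best)
        stop[best] = True
    return selected
```

The first pick is the plain manifold-ranking winner. Later picks are pushed away from earlier ones, because neighbors of a stop point can no longer collect score through it.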


Results: Query recommendation (‘abc’, ‘yamaha’)


Evaluation Metrics

  • Automatic evaluation

    • Open Directory Project (ODP) <-> relevance

      • Given two queries q and q’

c(q): ‘Arts/Television/News’

c(q’): ‘Arts/Television/Stations/North America/United States’

l(c, c’): their longest common prefix, e.g., ‘Arts/Television’

max(|c|, |c’|): the length of the longer of the two categories, e.g., 5

$\mathrm{Rel}(q, q') = \dfrac{|l(c, c')|}{\max(|c|, |c'|)}$, e.g., 2/5 = 0.4
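The ODP relevance score can be computed directly from the two category paths. The sketch assumes categories are "/"-separated strings and that lengths are counted in path levels, which matches the example above (2 shared levels out of 5). The function name is hypothetical.

```python
def odp_similarity(c1: str, c2: str) -> float:
    """|longest common prefix| / length of the longer category, counted in path levels."""
    a = [part.strip() for part in c1.split("/")]
    b = [part.strip() for part in c2.split("/")]
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


print(odp_similarity("Arts/Television/News",
                     "Arts/Television/Stations/North America/United States"))  # 0.4
```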



Evaluation Metrics

  • Automatic evaluation

    • Commercial search engine (i.e., Google) <-> diversity

      • Given two queries q and q’

o(q, q’) is the number of overlapping URLs among the top k search results of queries q and q’.
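Counting o(q, q’) is straightforward. Turning it into a diversity score as 1 − o(q, q’)/k is an assumption, since the slide's diversity formula is missing. The function names are hypothetical.

```python
def overlap(results_q, results_q2, k=10):
    """o(q, q'): number of URLs shared by the top-k results of q and q'."""
    return len(set(results_q[:k]) & set(results_q2[:k]))


def diversity(results_q, results_q2, k=10):
    """Assumed normalization: 1 - o(q, q') / k (1 when the top-k lists are disjoint)."""
    return 1.0 - overlap(results_q, results_q2, k) / k
```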



Evaluation Metrics

  • Automatic Evaluation

    • Open Directory Project(ODP) <-> Relevance

    • Commercial search engine (i.e., Google) <-> Diversity

  • Evaluation metrics

    • Q-measure

β - parameter to control the tradeoff between relevance and diversity


Experiments

  • Average Q-measure of query recommendation over different recommendation sizes under five approaches.

[Figure: Q-measure curves for the five approaches, one of which is labeled “Proposed Method”]


Experiments

  • Manual evaluation

    • Recommendation pool

    • 3 human judges

    • Labeling tool

[Figure: the recommendation pool and the search results shown in the labeling tool]


Experiments

  • Evaluation metrics

    • α-nDCG (α-normalized Discounted Cumulative Gain)

    • Intent coverage
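The sketch below implements α-nDCG and intent coverage. Intent coverage is taken as the fraction of intents covered by the top results, a common definition assumed here. The judgments map each recommendation to the set of intents it covers; the function names are hypothetical. Computing the exact ideal ranking for α-nDCG is NP-hard, so, as is usual, the ideal ranking is built greedily.

```python
import math


def alpha_dcg(ranking, judgments, alpha=0.5, depth=10):
    """judgments maps each recommendation to the set of intents it covers."""
    seen = {}                                          # intent -> times already covered
    score = 0.0
    for k, item in enumerate(ranking[:depth], start=1):
        gain = 0.0
        for intent in judgments.get(item, set()):
            gain += (1 - alpha) ** seen.get(intent, 0)  # redundancy is discounted
            seen[intent] = seen.get(intent, 0) + 1
        score += gain / math.log2(k + 1)
    return score


def greedy_ideal(judgments, alpha=0.5, depth=10):
    """Greedy approximation of the ideal ranking."""
    pool, seen, ideal = list(judgments), {}, []
    while pool and len(ideal) < depth:
        best = max(pool, key=lambda it: sum((1 - alpha) ** seen.get(i, 0)
                                            for i in judgments[it]))
        ideal.append(best)
        pool.remove(best)
        for intent in judgments[best]:
            seen[intent] = seen.get(intent, 0) + 1
    return ideal


def alpha_ndcg(ranking, judgments, alpha=0.5, depth=10):
    ideal = alpha_dcg(greedy_ideal(judgments, alpha, depth), judgments, alpha, depth)
    return alpha_dcg(ranking, judgments, alpha, depth) / ideal if ideal > 0 else 0.0


def intent_coverage(ranking, judgments, n_intents, depth=10):
    """Fraction of the n_intents intents covered by the top-`depth` recommendations."""
    covered = set()
    for item in ranking[:depth]:
        covered |= judgments.get(item, set())
    return len(covered) / n_intents
```

With the “apple” example, a list that places “apple tree” second scores higher than one that puts two product queries first, because the second product query adds only redundant gain.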


Experiments

Table 2: Performance of recommendation results over a sample of queries under five different approaches.


Why High Utility Query Recommendation

  • Existing work focuses on recommending queries that are relevant to the user's initial query (at the query level):

    • Common query terms (Wen J. et al., WWW 2001)

    • Same clicked documents (Mei Q. et al., CIKM 2008)

    • Co-occurring in the same search sessions (Zhang Z. et al., WWW 2006)

Is recommending relevant queries enough for users to find useful search results?

[Figure: an initial query and recommended queries query 1, query 2 and query 3]


Why High Utility Query Recommendation

‘iphone sell time’

‘iphone start sell’

Recommend high utility query: ‘iphone initial release’


High Utility Query Recommendation

  • More Than Relevance: High Utility Query Recommendation By Mining Users’ Search Behaviors [X.F. Zhu, CIKM’12]

    • Probabilistic Graphical Model (Query Utility Model)

  • Recommending High Utility Query via Session-Flow Graph [X.F. Zhu, ECIR’13]

    • Session-Flow Graph

    • Two-phase model based on absorbing random walk



A Typical Search Session

[Figure: a search session illustrating bad perceived utility and bad posterior utility; red marks relevant results and √ marks attractive results]


Probabilistic Graphical Model

R_i: whether there is a reformulation at position i;

C_i: whether the user clicks on some of the search results of the reformulation at position i;

A_i: whether the user is attracted by the search results of the reformulation at position i;

S_i: whether the user’s information needs have been satisfied at position i.


Parameter Estimation

  • Maximum Likelihood Estimation

Where


Parameter Estimation

  • Log Likelihood Function


Parameter Estimation

  • Maximize Log Likelihood Function

Lagrange multiplier

Regularization term


Parameter Estimation

  • Optimization Condition:


Parameter Estimation

  • Newton-Raphson
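The slides do not preserve the QUM likelihood, so this sketch only illustrates the Newton-Raphson update θ ← θ − L′(θ)/L″(θ). It is applied to a hypothetical one-parameter objective: a Bernoulli log-likelihood a·log θ + b·log(1 − θ) with an optional quadratic regularizer weighted by `lam`. The function names are hypothetical.

```python
def newton_raphson(grad, hess, theta0, tol=1e-10, max_iter=100):
    """Find a stationary point of a 1-D objective from its first and second derivatives."""
    theta = theta0
    for _ in range(max_iter):
        step = grad(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            return theta
    raise RuntimeError("Newton-Raphson did not converge")


def make_objective(a, b, lam=0.0, prior=0.5):
    """Derivatives of L(θ) = a log θ + b log(1 - θ) - (lam / 2)(θ - prior)^2."""
    def grad(theta):
        return a / theta - b / (1 - theta) - lam * (theta - prior)

    def hess(theta):
        return -a / theta ** 2 - b / (1 - theta) ** 2 - lam

    return grad, hess


grad, hess = make_objective(a=3.0, b=1.0)
theta_hat = newton_raphson(grad, hess, theta0=0.5)
print(theta_hat)  # 0.75, the closed-form maximizer a / (a + b) when lam = 0
```

Plain Newton steps can leave the (0, 1) range on harder objectives, so a real implementation would add step control or a reparametrization.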


Experimental Results

  • Dataset

    • Our experiments are based on the publicly available UFindIt query log data, which contains 40 search tasks represented by 40 test queries.


Experimental Results

  • Metrics

    • QRR (Query Relevant Ratio): measures the probability that a user finds relevant results when she uses query q for her search task.

    • MRD (Mean Relevant Document): measures the average number of relevant results a user finds when she uses query q for her search task.
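Both metrics can be computed from a simple per-use record. The sketch assumes that for each time query q was used in a search task, the log gives the number of relevant results the user found. Both the input format and the function names are hypothetical.

```python
def query_relevant_ratio(relevant_counts):
    """QRR: fraction of uses of q in which the user found at least one relevant result."""
    return sum(1 for c in relevant_counts if c > 0) / len(relevant_counts)


def mean_relevant_documents(relevant_counts):
    """MRD: average number of relevant results found per use of q."""
    return sum(relevant_counts) / len(relevant_counts)
```

For example, four uses of q yielding 0, 2, 1 and 0 relevant results give QRR = 0.5 and MRD = 0.75.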


Experimental Results

[Figure: comparison of PTU, CT, QUM, CO, PCU, QF and ADJ]

Query-Flow Graph (QF): builds a query-flow graph from collective search sessions and performs a random walk on it for query recommendation [CIKM’08].

Click-through Graph (CT): builds a query-URL bipartite graph and uses hitting time as a measure to select queries for recommendation [CIKM’08].

Adjacency (ADJ): given a test query q, the most frequent queries adjacent to q in the same session are recommended to users [WWW’06].

Co-occurrence (CO): given a test query q, the most frequent queries co-occurring with q in the same session are selected as recommendations [WSDM’10].

Query Utility Model (QUM): ranks queries by the expected information gain users obtain from a query's search results with respect to their original information needs, computed as the product of two component utilities.

Perceived Utility (PCU) and Posterior Utility (PTU): use only one of the two component utilities of QUM, the perceived utility or the posterior utility, respectively.


Experiments

Impact of parameter μ on the performance of QUM


Limitation of QUM method

  • It cannot make full use of the click-through information.

    • It only considers whether the search results of a reformulated query contain some clicked documents, and does not take the individual clicked documents into consideration.

  • A new method is therefore needed that captures these specific clicked documents when modeling query utility.


Framework of Our Approach

Two-phase model based on Absorbing Random Walk (TARW)

  • Session-flow graph = query-flow graph (reformulation behaviors) + document nodes (click behaviors)

  • Absorbing random walk = random walk + absorbing states


Session Flow Graph

query session

q → q1→ q3

q → q3→ q4

q → q4

Query-Flow Graph: Boldi et al. (CIKM 2008)


Session Flow Graph

query sessions (q1:u1:u2 denotes that query q1 was followed by clicks on documents u1 and u2)

q → q1:u1:u2 → q3:u3

q → q3 → q4:u4:u5

q → q4:u6

Session-flow graph: extends the query-flow graph with document nodes and failure nodes
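Below is a sketch of counting the edges of a session-flow graph from sessions written in the notation above. Two assumptions apply. First, a query that ends a session without any click gets an edge to the failure node. Second, query and document identifiers are assumed not to collide. `FAILURE` and the function name are hypothetical.

```python
from collections import Counter

FAILURE = "<failure>"  # hypothetical label for the failure node


def session_flow_counts(sessions):
    """Count session-flow graph edges.

    Each session is a list of (query, clicked_documents) steps, e.g.
    q -> q1:u1:u2 -> q3:u3 is [("q", []), ("q1", ["u1", "u2"]), ("q3", ["u3"])].
    """
    counts = Counter()
    for session in sessions:
        for pos, (query, clicks) in enumerate(session):
            for doc in clicks:
                counts[(query, doc)] += 1                    # click edge: query -> document
            if pos + 1 < len(session):
                counts[(query, session[pos + 1][0])] += 1    # reformulation edge
            elif not clicks:
                counts[(query, FAILURE)] += 1                # assumed: user gave up
    return counts


sessions = [
    [("q", []), ("q1", ["u1", "u2"]), ("q3", ["u3"])],
    [("q", []), ("q3", []), ("q4", ["u4", "u5"])],
    [("q", []), ("q4", ["u6"])],
]
print(session_flow_counts(sessions))
```

All three example sessions end with a click, so none of them produces a failure edge.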


Session Flow Graph

  • Definition:

Adjacency Matrix

Nodes

Edges


Two-Phase Model Based on Absorbing Random Walk (TARW)

  • Forward utility propagation: utility is transferred from the original query node to the reformulation nodes and is finally absorbed by the document nodes and the failure node.

  • Backward utility propagation: utility is transferred back from the document nodes to the reformulation nodes.

Recommendation: the queries with the highest utilities.


Forward Utility Propagation

  • Assign transition probabilities to the different types of nodes (reformulation, document, failure):

    • Reformulation node: α1

    • Document node: α2

    • Failure node: α3

  • α1 + α2 + α3 = 1


Parameter Setting

  • Previous work (Sadikov, WWW 2010): all queries share the same transition probability setting (α1, α2, α3) for the different types of nodes:

    • reformulation node: α1

    • document node: α2

    • failure node: α3

  • Our work: assign transition probabilities based on the characteristics of each candidate query, combining a prior transition probability with the observed transition probability into a posterior transition probability.
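The slides do not give the exact update, so the sketch below assumes one natural way to combine the prior with the observations: Dirichlet-style smoothing. The shared prior (α1, α2, α3) acts as `strength` pseudo-counts, which are added to the reformulation, click and failure counts observed for one query. `strength` and the function name are hypothetical.

```python
def posterior_transition(prior, observed, strength=10.0):
    """Posterior mean of a Dirichlet(strength * prior) prior after observing counts.

    prior:    (alpha1, alpha2, alpha3) shared across queries, summing to 1
    observed: (#reformulations, #clicks, #failures) recorded for one candidate query
    strength: hypothetical pseudo-count controlling how much the prior is trusted
    """
    total = sum(observed)
    return tuple((strength * p + c) / (strength + total)
                 for p, c in zip(prior, observed))
```

A query with no observations keeps the shared prior. A frequently reformulated query shifts weight toward α1.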


Transition Probability

Reformulation nodes:

Document nodes:

Failure node:


Computing the Distribution

  • In the forward utility propagation, the corresponding transition matrix is:

$P = \begin{pmatrix} P_Q & P_D & P_S \\ 0 & I_D & 0 \\ 0 & 0 & I_S \end{pmatrix}$

P_Q: n × n transition matrix on query nodes

P_D: n × m matrix of transitions from query nodes to document nodes

P_S: n × 1 matrix of transitions from query nodes to the failure node

I_D, I_S: identity matrices, meaning that the document nodes and the failure node are absorbing states

P is reducible, so it has no unique stationary distribution.


Computing the Distribution

  • The absorbing distribution is computed iteratively:

$P^t[i, j]$ represents the probability of reaching node j from node i after a t-step walk.

We only need the probabilities from query nodes to document nodes: $O(tn^3 + n^2 m)$.

In the recommendation scenario, only the probabilities from the original query to the documents are needed, i.e., only the matrix row of the original query is computed: $O(tn^2 + nm)$.
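The row-only computation can be sketched as follows; the function name is hypothetical. Starting from the original query q, the loop accumulates the visits x_0 + x_1 + … + x_{t−1} over query nodes with t vector-matrix products, which costs O(tn²). It then multiplies once by P_D, which costs O(nm). It also multiplies once by P_S, which is cheap since P_S has a single column. The result is the probability of being absorbed in each document node and in the failure node within t steps.

```python
import numpy as np


def absorption_from_query(PQ, PD, PS, q, t=100):
    """Probability that a walk starting at query q is absorbed in each document
    node (and in the failure node) within t steps, using only q's row."""
    PQ, PD, PS = (np.asarray(M, dtype=float) for M in (PQ, PD, PS))
    x = np.zeros(PQ.shape[0])
    x[q] = 1.0                       # x_0: all mass on the original query
    visits = np.zeros_like(x)
    for _ in range(t):               # t vector-matrix products: O(t n^2)
        visits += x                  # accumulate x_0 + x_1 + ... + x_{t-1}
        x = x @ PQ
    return visits @ PD, visits @ PS  # one product with PD: O(n m)
```

As t grows, the result approaches the exact absorption probabilities, which form the q-th row of (I − P_Q)⁻¹[P_D P_S].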


Backward Utility Propagation



Experimental Results

  • Overall evaluation results

[Figure: overall results of TARW compared with the baseline methods]

The TARW method performs significantly better than all the baseline recommendation methods (p-value ≤ 0.05).


Evaluation of Document Utility

  • Baseline methods:

    • Document Frequency Based Method (DF)

      • the click frequency of a document reflects users’ preference for that document when they search with the original query

    • Session Document Frequency Based Method (SDF)

      • clicked documents within the same search session convey similar search intent

    • Markov-model Based Method (MM)

      • ranks documents by the document distribution learned for the original query with a Markov-model based method


Evaluation of Document Utility

  • Metrics:

    • Precision at position k (P@k)

    • Normalized Discounted Cumulative Gain (NDCG)

    • Mean Average Precision (MAP)
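The three document-utility metrics can be sketched as follows. The sketch uses binary relevance for P@k and MAP, and linear graded gains with a log2(i + 1) discount for NDCG. That NDCG variant is one common choice, and the slides do not say which variant was used. The function names are hypothetical.

```python
import math


def precision_at_k(ranked, relevant, k):
    """P@k: fraction of the top-k documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the ranks of the relevant documents."""
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked documents, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked, grades, k):
    """grades: document -> relevance grade (missing documents count as 0)."""
    dcg = sum(grades.get(d, 0) / math.log2(i + 1) for i, d in enumerate(ranked[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 1) for i, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```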


Evaluation of Document Utility

TARW improves over MM by:

  • using an adaptive transition probability setting for the different types of nodes

  • modeling users’ behavior of giving up their search tasks by introducing failure nodes.


Summary

  • Query recommendation techniques

    • Highly Relevant Query Recommendation

    • High Diversity Query Recommendation

    • High Utility Query Recommendation

