Query Log Analysis



Naama Kraus

Slides are based on the papers:
  • Andrei Broder, A taxonomy of web search
  • Ricardo Baeza-Yates, Graphs from Search Engine Queries
  • Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search

A Taxonomy of Web Searches
  • [Andrei Broder] classifies web queries according to their intent:
    • Navigational - reach a particular site
      • Example: cnn, Oracle
    • Informational - acquire some information
      • Example: the history of haifa, information retrieval
    • Transactional - perform some web-mediated activity; further interaction is expected
      • E.g. shopping, downloading files, accessing databases
      • Example: new balance shoes, Israel flights
Query Log
  • A search engine query log records users’ searches
  • A typical record contains
    • Anonymous User id u
    • Search query q
    • Returned documents V
    • Clicked documents C
    • Timestamp t
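As a minimal sketch, the record fields listed above could be held in a small data class. The class name `LogRecord`, the helper `parse_line`, and the `user, query, HH:MM` line format are assumptions made for illustration; real query logs use their own schemas.

```python
from dataclasses import dataclass
from datetime import time

@dataclass(frozen=True)
class LogRecord:
    user_id: str                   # anonymous user id u
    query: str                     # search query q
    returned: tuple = ()           # returned documents V
    clicked: tuple = ()            # clicked documents C
    timestamp: time = time(0, 0)   # timestamp t

def parse_line(line):
    """Parse a hypothetical 'user, query, HH:MM' line into a LogRecord."""
    user_id, query, stamp = (part.strip() for part in line.split(","))
    hours, minutes = map(int, stamp.split(":"))
    return LogRecord(user_id, query, timestamp=time(hours, minutes))

record = parse_line("1234, apple ipod, 12:05")
```

The returned and clicked documents default to empty tuples here because the simplified line format carries only the user id, the query, and the time.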
Query Log Example

  • 1234, apple, 12:04
  • 1234, apple ipod, 12:05
  • 1234, ynet, 12:13
  • 145, google, 12:20
  • 145, eBay, 12:56
  • 32, ynet news, 12:59
  • 145, Solaris systen, 13:01
  • 145, Solaris system, 13:05

Session
  • A sequence of searches by one particular user u within a specific time limit
  • S = < <u, q_1, t_1>, …, <u, q_k, t_k> >
  • t_1 < … < t_k (=> ordered sequence)
  • t_{i+1} – t_i < t_0 (=> t_0 is a timeout threshold)
  • Note 1: a session may contain unrelated queries
  • Note 2: identifying sessions is easy
Session Example
  • Timeout threshold = 30 minutes
  • Session 1
    • 1234, apple, 12:04
    • 1234, apple ipod, 12:05
    • 1234, ynet, 12:13
    • 1234, apple store, 12:20
  • Session 2 (starts after the 36-minute gap, which exceeds the threshold)
    • 1234, cnn news, 12:56
    • 1234, cnn webcast, 12:59
    • 1234, apple apps, 13:01
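The timeout rule can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the function names `split_sessions` and `at`, and the representation of a record as a `(user, query, timestamp)` tuple, are assumptions.

```python
from datetime import datetime, timedelta

def at(hhmm):
    """Helper for the example: parse an HH:MM timestamp."""
    return datetime.strptime(hhmm, "%H:%M")

def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group (user, query, timestamp) records into per-user sessions."""
    sessions = []
    open_session = {}  # user -> (time of the user's last query, index into sessions)
    for user, query, stamp in sorted(records, key=lambda r: r[2]):
        last = open_session.get(user)
        if last is not None and stamp - last[0] < timeout:
            index = last[1]            # gap below the threshold: same session
        else:
            sessions.append([])        # first query or gap too long: new session
            index = len(sessions) - 1
        sessions[index].append(query)
        open_session[user] = (stamp, index)
    return sessions

log = [
    ("1234", "apple", at("12:04")),
    ("1234", "apple ipod", at("12:05")),
    ("1234", "ynet", at("12:13")),
    ("1234", "apple store", at("12:20")),
    ("1234", "cnn news", at("12:56")),
    ("1234", "cnn webcast", at("12:59")),
    ("1234", "apple apps", at("13:01")),
]
sessions = split_sessions(log)
```

With the 30-minute threshold, the 36-minute gap between 12:20 and 12:56 is the only one that opens a new session, so the log splits into a four-query session followed by a three-query session.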
Query Chain
  • A sequence of queries by a particular user that share a similar information need
    • Also known as mission or logical session
  • Example:
      • haifa maps
      • haifa travel
      • attractions in haifa
  • Note 1: a chain contains related queries only
  • Note 2: identifying chains is difficult
Query Chain Example
  • 1234, apple, 12:04
  • 1234, apple ipod, 12:05
  • 1234, ynet, 12:13
  • 1234, apple store, 12:20
  • 1234, cnn news, 12:56
  • 1234, cnn webcast, 12:59
  • 1234, apple apps, 13:01
  • chain1
  • chain2
Click Graph

  • Bipartite graph
  • Nodes on the left side are unique queries
  • Nodes on the right side are unique URLs
  • An edge exists between q and u if the log contains a click on u for query q
  • Edges may be weighted according to the number of clicks
  • This graph is used by numerous algorithms for various purposes
    • E.g., query and URL clustering, query recommendations …
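A minimal sketch of building such a click graph from (query, URL) click pairs, with edges weighted by click counts. The function name `build_click_graph` and the sample clicks are hypothetical and not taken from the slides.

```python
from collections import Counter, defaultdict

def build_click_graph(clicks):
    """Return edge weights plus adjacency sets for both sides of the graph."""
    edge_weights = Counter(clicks)     # (query, url) -> number of clicks
    urls_of = defaultdict(set)         # query node -> adjacent URL nodes
    queries_of = defaultdict(set)      # URL node -> adjacent query nodes
    for query, url in edge_weights:
        urls_of[query].add(url)
        queries_of[url].add(query)
    return edge_weights, urls_of, queries_of

# Hypothetical click data: one (query, clicked URL) pair per click.
clicks = [
    ("apple ipod", "apple.com"),
    ("apple ipod", "apple.com"),
    ("apple store", "apple.com"),
    ("cnn news", "cnn.com"),
]
edge_weights, urls_of, queries_of = build_click_graph(clicks)
```

Queries that are adjacent to the same URL node (here `apple ipod` and `apple store`) are natural candidates for the clustering and recommendation uses mentioned above.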

Query Graphs

  • Each unique query is a node in the graph
  • Next slides: connection types between queries (edges)
  • Proposed by [Ricardo Baeza-Yates]

Query Graphs – Word Graph

  • An edge between two nodes exists if the queries share common terms
  • Possible node weight: number of occurrences in the log
  • Possible edge weight: Jaccard distance
  • Example nodes in the figure: paris hotels, paris attractions, london attractions, cheap paris hotels
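The word graph can be sketched as follows, using the four example queries. Splitting queries on whitespace and the function names `jaccard_distance` and `word_graph` are assumptions made for illustration; the Jaccard distance of two term sets is taken as 1 − |A ∩ B| / |A ∪ B|.

```python
from itertools import combinations

def jaccard_distance(q1, q2):
    """1 - |A ∩ B| / |A ∪ B| over the whitespace-separated terms of two queries."""
    a, b = set(q1.split()), set(q2.split())
    return 1 - len(a & b) / len(a | b)

def word_graph(queries):
    """Connect every pair of queries sharing at least one term."""
    edges = {}
    for q1, q2 in combinations(queries, 2):
        if set(q1.split()) & set(q2.split()):
            edges[(q1, q2)] = jaccard_distance(q1, q2)
    return edges

queries = ["paris hotels", "paris attractions",
           "london attractions", "cheap paris hotels"]
edges = word_graph(queries)
```

Under these assumptions `paris hotels` and `london attractions` stay unconnected because they share no term, while `paris hotels` and `cheap paris hotels` get the smallest distance since they share two of three terms.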

Query Graphs – Session Graph

  • The weight of node q is the number of sessions that contain the query q (usually equal to the number of occurrences of q)
  • A directed edge goes from q1 to q2 if q1 occurred before q2 in the same session
  • The edge weight is the number of such occurrences
  • Example nodes in the figure: paris hotels, paris attractions, london attractions, cheap paris hotels
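A minimal sketch of the session graph, assuming sessions are already available as ordered lists of query strings. The function name `session_graph`, the sample sessions, and the choice to count each ordered pair at most once per session are assumptions, not details given in the slides.

```python
from collections import Counter
from itertools import combinations

def session_graph(sessions):
    """Return node weights and directed edge weights for a list of sessions."""
    node_weights = Counter()   # query -> number of sessions containing it
    edge_weights = Counter()   # (q1, q2) -> sessions where q1 precedes q2
    for session in sessions:
        node_weights.update(set(session))
        # combinations() keeps input order, so q1 always occurs before q2.
        pairs = {(q1, q2) for q1, q2 in combinations(session, 2) if q1 != q2}
        edge_weights.update(pairs)
    return node_weights, edge_weights

# Hypothetical sessions built from the example queries.
sessions = [
    ["paris hotels", "cheap paris hotels", "paris attractions"],
    ["paris hotels", "paris attractions"],
    ["london attractions", "paris attractions"],
]
node_weights, edge_weights = session_graph(sessions)
```

Because the edges are directed, `paris hotels → paris attractions` receives weight 2 here while the reverse edge does not exist.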

Query Graphs – URL Cover Graph

  • An edge exists between q1 and q2 if they share clicked URLs
  • Node weight = number of occurrences
  • The edge weight is the number of common clicks
  • Example nodes in the figure: paris hotels, paris attractions, london attractions, cheap paris hotels
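The URL cover graph can be sketched in the same style. This sketch interprets "common clicks" as the number of distinct clicked URLs two queries share, which is an assumption; the function name `url_cover_graph` and the `.example` URLs are made up for illustration.

```python
from itertools import combinations

def url_cover_graph(clicked_urls):
    """Map each query pair with shared clicked URLs to the number of shared URLs."""
    edges = {}
    for q1, q2 in combinations(sorted(clicked_urls), 2):
        shared = clicked_urls[q1] & clicked_urls[q2]
        if shared:
            edges[(q1, q2)] = len(shared)
    return edges

# Hypothetical clicked-URL sets per query.
clicked_urls = {
    "paris hotels": {"hotels.example/paris", "booking.example/paris"},
    "cheap paris hotels": {"hotels.example/paris", "booking.example/paris",
                           "deals.example"},
    "paris attractions": {"visit.example/paris"},
    "london attractions": {"visit.example/london"},
}
edges = url_cover_graph(clicked_urls)
```

Unlike the word graph, two queries with no terms in common could still be linked here if users clicked the same URLs for both.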

Query Graph – URL Link Graph

  • An edge exists between q1 and q2 if there is at least one link between a clicked URL of q1 and a clicked URL of q2
  • Node weight = number of occurrences
  • The edge weight is the number of such links
  • Example nodes in the figure: paris hotels, paris attractions, london attractions, cheap paris hotels

Query Graph – URL Terms Graph
  • Represent a clicked URL by a set of terms (whole page, snippet, anchors, title, a combination …)
  • Weight terms by their frequencies
  • Node weight = number of occurrences
  • There is an edge between q1 and q2 if at least one clicked URL of q1 and one clicked URL of q2 have at least m terms in common
  • The edge weight is the sum of the frequencies of the common terms
  • Example nodes in the figure: paris hotels, paris attractions, london attractions, cheap paris hotels

User Behavior as a Predictor of a Successful Search
  • Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully
    • Success – the user is satisfied with the results
    • Failure – the user is unsatisfied
  • Method:
    • Analyze the query log and learn success/failure patterns
    • Use learned models for prediction
  • Proposed by [Hassan, Jones and Klinkner]
Data
  • A rich query log of queries and user actions:
    • Query (Q)
    • Search Click (SR)
    • Sponsored Search Click (AD)
    • Related Search Click (RL)
      • Query recommendations
    • Spelling Suggestion Click (SP)
    • Shortcut Click (SC)
      • E.g. image, video, news …
    • Any Other Click (OTH)
      • E.g. browser tab
Data Labeling
  • Random sample of user sessions
  • Human editors labeled data:
    • Detected logical sessions
    • Success/Failure
      • definitely successful, probably successful, unsure, probably unsuccessful, and definitely unsuccessful
Markov Models
  • Partition training data into two splits
    • successful goals
    • unsuccessful goals
  • For each group construct a Markov Model derived from seen action sequences
    • A Model describes the user behavior in case of a successful/unsuccessful search goal
    • Action type is a state
    • Weight a transition from one state to another according to its probability as observed in the data (MLE)
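The maximum-likelihood estimate of the transition weights can be sketched as follows: count each observed transition and divide by the total number of transitions leaving the same state. The function name `fit_markov_model`, the explicit `START` and `END` padding, and the toy action sequences are assumptions for illustration, not the authors' code or data.

```python
from collections import Counter, defaultdict

def fit_markov_model(sequences):
    """Estimate first-order transition probabilities from action sequences."""
    counts = defaultdict(Counter)   # state -> Counter of following states
    for sequence in sequences:
        path = ["START", *sequence, "END"]
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }

# Toy action sequences labeled as successful goals (hypothetical data).
successful = [
    ["Q", "SR"],
    ["Q", "SR", "Q", "SR"],
    ["Q", "AD"],
]
model = fit_markov_model(successful)
```

In this toy data `Q` is followed by `SR` three times and by `AD` once, so the estimated probability of the transition from `Q` to `SR` is 0.75. Fitting the same function on the unsuccessful split would give the second model.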
Illustration

[Figure: example Markov model with the states START, Q, SR, AD, RL and END; edges are labeled with transition probabilities]

Prediction (1)
  • Given a user’s action sequence, we need to predict whether it is successful or not
  • We have learned two models, Ms and Mf, of successful and unsuccessful patterns
  • Compute the probability that a given sequence S = {S1, …, Sn} was generated by Ms, and likewise by Mf
  • Predict success or failure by comparing the log-likelihoods of the sequence under the two models
    • Formulas in next slide
Prediction (2)

Formulas taken from the paper
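A minimal sketch of likelihood-based prediction, assuming first-order models stored as nested dictionaries of transition probabilities. The decision rule used here (pick the model with the higher log-likelihood), the `floor` probability for unseen transitions, and all numbers in `M_s` and `M_f` are assumptions for illustration; the paper's exact formulas and thresholding may differ.

```python
import math

def log_likelihood(sequence, model, floor=1e-6):
    """Sum of log transition probabilities along START -> actions -> END."""
    path = ["START", *sequence, "END"]
    return sum(
        math.log(model.get(current, {}).get(following, floor))
        for current, following in zip(path, path[1:])
    )

def predict_success(sequence, success_model, failure_model):
    """Predict success when the success model explains the sequence better."""
    return (log_likelihood(sequence, success_model)
            > log_likelihood(sequence, failure_model))

# Hypothetical transition probabilities for the two learned models.
M_s = {"START": {"Q": 1.0},
       "Q": {"SR": 0.8, "Q": 0.2},
       "SR": {"END": 0.7, "Q": 0.3}}
M_f = {"START": {"Q": 1.0},
       "Q": {"SR": 0.3, "Q": 0.6, "END": 0.1},
       "SR": {"END": 0.2, "Q": 0.8}}

clicked_and_left = predict_success(["Q", "SR"], M_s, M_f)
kept_reformulating = predict_success(["Q", "Q", "Q"], M_s, M_f)
```

Working in log space avoids numerical underflow on long action sequences, and the small floor keeps a single unseen transition from producing a log of zero.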
