Query log analysis

Query Log Analysis. Naama Kraus. Slides are based on the papers: Andrei Broder, A taxonomy of web search; Ricardo Baeza-Yates, Graphs from Search Engine Queries; Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search.

Query Log Analysis

Naama Kraus

Slides are based on the papers:

Andrei Broder, A taxonomy of web search

Ricardo Baeza-Yates, Graphs from Search Engine Queries

Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search


A Taxonomy of Web Searches

  • [Andrei Broder] classifies web queries according to their intent:

    • Navigational - reach a particular site

      • Example: cnn, Oracle

    • Informational - acquire some information

      • Example: the history of haifa, information retrieval

    • Transactional - perform some web-mediated activity. Further interaction is expected.

      • E.g. shopping, downloading files, accessing databases

      • Example: new balance shoes, Israel flights


Query Log

  • A search engine query log records users’ searches

  • A typical record contains

    • Anonymous User id u

    • Search query q

    • Returned documents V

    • Clicked documents C

    • Timestamp t
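
The record fields listed above can be sketched as a small data type. This is only an illustration: the class name, the field names, the URLs, and the date are assumptions, not part of any real log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass(frozen=True)
class QueryLogRecord:
    """One query log entry; field names are illustrative assumptions."""
    user_id: str                 # anonymous user id u
    query: str                   # search query q
    returned: Tuple[str, ...]    # returned documents V
    clicked: Tuple[str, ...]     # clicked documents C
    timestamp: datetime          # timestamp t


# Hypothetical record: the URLs and the date are made up for the example.
record = QueryLogRecord(
    user_id="1234",
    query="apple ipod",
    returned=("https://example.com/a", "https://example.com/b"),
    clicked=("https://example.com/b",),
    timestamp=datetime(2000, 1, 1, 12, 5),
)
```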


Query Log Example

1234, apple, 12:04

1234, apple ipod, 12:05

1234, ynet, 12:13

145, google, 12:20

145, eBay, 12:56

32, ynet news, 12:59

145, Solaris systen, 13:01

145, Solaris system, 13:05


Session

  • A sequence of searches of one particular user u within a specific time limit

  • S = < <u, q1, t1>, …, <u, qk, tk> >

  • t1 < … < tk (so the sequence is ordered)

  • t(i+1) – t(i) < t0, where t0 is a timeout threshold

  • Note 1: a session may contain unrelated queries

  • Note 2: identifying sessions is easy


Session Example

  • Session 1

    • 1234, apple, 12:04

    • 1234, apple ipod, 12:05

    • 1234, ynet, 12:13

    • 1234, apple store, 12:20

  • Session 2

    • 1234, cnn news, 12:56

    • 1234, cnn webcast, 12:59

    • 1234, apple apps, 13:01

  • Timeout threshold = 30 minutes

  • The 36-minute gap between 12:20 and 12:56 exceeds the threshold, so a new session starts at 12:56
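
The session definition and the 30-minute example above can be sketched as follows. The function name `split_sessions` and the tuple layout `(user_id, query, timestamp)` are assumptions made for this illustration, and the date is arbitrary because the slide gives only times of day.

```python
from datetime import datetime, timedelta


def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group (user_id, query, timestamp) tuples into per-user sessions.

    Two consecutive queries of the same user stay in one session
    only if the gap between them is strictly below the timeout.
    """
    sessions = []
    state = {}  # user_id -> (timestamp of last query, current session)
    for user, query, ts in sorted(records, key=lambda r: (r[0], r[2])):
        previous = state.get(user)
        if previous is None or ts - previous[0] >= timeout:
            current = []              # start a new session
            sessions.append(current)
        else:
            current = previous[1]     # continue the open session
        current.append((user, query, ts))
        state[user] = (ts, current)
    return sessions


def at(hour, minute):
    # Arbitrary date: only the time of day comes from the slide.
    return datetime(2000, 1, 1, hour, minute)


records = [
    ("1234", "apple", at(12, 4)),
    ("1234", "apple ipod", at(12, 5)),
    ("1234", "ynet", at(12, 13)),
    ("1234", "apple store", at(12, 20)),
    ("1234", "cnn news", at(12, 56)),
    ("1234", "cnn webcast", at(12, 59)),
    ("1234", "apple apps", at(13, 1)),
]
sessions = split_sessions(records)
```

With the example records, the 36-minute gap before "cnn news" starts the second session.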


Query Chain

  • A sequence of queries with a similar information need of a particular user

    • Also known as mission or logical session

  • Example:

    • haifa maps

    • haifa travel

    • attractions in haifa

  • Note 1: a chain contains related queries only

  • Note 2: identifying chains is difficult


Query Chain Example

  • 1234, apple, 12:04

  • 1234, apple ipod, 12:05

  • 1234, ynet, 12:13

  • 1234, apple store, 12:20

  • 1234, cnn news, 12:56

  • 1234, cnn webcast, 12:59

  • 1234, apple apps, 13:01

  • chain1

  • chain2


Click Graph

  • Bipartite graph

  • Nodes on the left side are unique queries

  • Nodes on the right side are unique URLs

  • An edge between q and u exists if the log contains a click on u for query q

  • Edges may be weighted according to the number of clicks

  • This graph is used by numerous algorithms for various purposes

    • E.g., query and URL clustering, query recommendations …
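
A minimal sketch of building a weighted click graph from logged clicks. It assumes the log has already been reduced to `(query, url)` pairs, one per click; the function name and the URLs are made up for the example.

```python
from collections import Counter


def build_click_graph(clicks):
    """Return edge weights: (query, url) -> number of clicks."""
    return Counter((query, url) for query, url in clicks)


# Hypothetical click data.
clicks = [
    ("apple", "https://example.com/apple"),
    ("apple", "https://example.com/apple"),
    ("apple ipod", "https://example.com/apple"),
    ("apple ipod", "https://example.com/ipod"),
]
graph = build_click_graph(clicks)

# The two sides of the bipartite graph.
query_nodes = {query for query, _ in graph}
url_nodes = {url for _, url in graph}
```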


Query Graphs

  • Each unique query is a node in the graph

  • Next slides: connection types between queries (edges)

  • Proposed by [Ricardo Baeza-Yates]


Query Graphs – Word Graph

  • An edge between two nodes exists if the queries share common terms

  • Possible node weight: number of occurrences in the log

  • Possible edge weight: Jaccard distance

  • Nodes shown in the example graph: paris hotels, london attractions, paris attractions, cheap paris hotels
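
A small sketch of the word graph over the four example queries. It assumes queries are split into terms on whitespace and takes the Jaccard distance as 1 minus the Jaccard similarity of the two term sets; the function names are made up for the example.

```python
from itertools import combinations


def jaccard_distance(q1, q2):
    """1 - |A ∩ B| / |A ∪ B| over the whitespace-separated terms."""
    a, b = set(q1.split()), set(q2.split())
    if not a | b:
        return 0.0  # two empty queries are treated as identical
    return 1 - len(a & b) / len(a | b)


def word_graph(queries):
    """Return edges (q1, q2) -> Jaccard distance for queries sharing a term."""
    edges = {}
    for q1, q2 in combinations(sorted(set(queries)), 2):
        if set(q1.split()) & set(q2.split()):
            edges[(q1, q2)] = jaccard_distance(q1, q2)
    return edges


queries = [
    "paris hotels",
    "london attractions",
    "paris attractions",
    "cheap paris hotels",
]
edges = word_graph(queries)
```

Here "london attractions" and "paris hotels" share no term, so no edge connects them.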


Query Graphs – Session Graph

  • The weight of node q is the number of sessions that contain the query q (usually equal to the number of query occurrences)

  • A directed edge goes from q1 to q2 if q1 occurred before q2 in the same session

  • An edge’s weight is the number of such occurrences

  • Nodes shown in the example graph: paris hotels, paris attractions, london attractions, cheap paris hotels
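
A minimal sketch of the session graph. It assumes each session is a chronologically ordered list of query strings and counts every ordered pair within a session, not only consecutive queries; the sessions shown are made up.

```python
from collections import Counter
from itertools import combinations


def session_graph(sessions):
    """Return (node_weight, edge_weight) for a list of query lists."""
    node_weight = Counter()  # query -> number of sessions containing it
    edge_weight = Counter()  # (q1, q2) -> times q1 preceded q2 in a session
    for queries in sessions:
        node_weight.update(set(queries))
        # combinations() keeps list order, so q1 always precedes q2.
        for q1, q2 in combinations(queries, 2):
            if q1 != q2:
                edge_weight[(q1, q2)] += 1
    return node_weight, edge_weight


# Hypothetical sessions.
sessions = [
    ["paris hotels", "cheap paris hotels", "paris attractions"],
    ["paris hotels", "paris attractions"],
]
node_weight, edge_weight = session_graph(sessions)
```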


Query Graphs – URL Cover Graph

  • An edge exists between q1 and q2 if they share clicked URLs

  • Node weight = number of occurrences

  • An edge’s weight is the number of common clicks

  • Nodes shown in the example graph: paris hotels, paris attractions, london attractions, cheap paris hotels
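
A minimal sketch of the URL cover graph. It assumes the clicks are available as a mapping from each query to its set of clicked URLs, and it interprets "number of common clicks" as the number of shared clicked URLs; both the interpretation and the URLs are assumptions.

```python
from itertools import combinations


def url_cover_graph(clicked):
    """clicked: query -> set of clicked URLs. Returns (q1, q2) -> shared URL count."""
    edges = {}
    for q1, q2 in combinations(sorted(clicked), 2):
        common = clicked[q1] & clicked[q2]
        if common:
            edges[(q1, q2)] = len(common)
    return edges


# Hypothetical click sets.
clicked = {
    "paris hotels": {"https://example.com/hotels", "https://example.com/paris"},
    "cheap paris hotels": {"https://example.com/hotels"},
    "london attractions": {"https://example.com/london"},
}
edges = url_cover_graph(clicked)
```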


Query Graphs – URL Link Graph

  • An edge exists between q1 and q2 if there is at least one link between a clicked URL of q1 and a clicked URL of q2

  • Node weight = number of occurrences

  • An edge’s weight is the number of such links

  • Nodes shown in the example graph: paris hotels, paris attractions, london attractions, cheap paris hotels
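
A minimal sketch of the URL link graph. It assumes a separately obtained set of hyperlinks given as `(source_url, target_url)` pairs and ignores link direction; the data layout, the function name, and the URLs are assumptions made for the example.

```python
from itertools import combinations


def url_link_graph(clicked, links):
    """clicked: query -> set of clicked URLs; links: set of (source, target)."""
    edges = {}
    for q1, q2 in combinations(sorted(clicked), 2):
        # Count URL pairs (one per query) connected by a link in either direction.
        linked_pairs = sum(
            1
            for a in clicked[q1]
            for b in clicked[q2]
            if (a, b) in links or (b, a) in links
        )
        if linked_pairs:
            edges[(q1, q2)] = linked_pairs
    return edges


# Hypothetical data.
clicked = {
    "paris hotels": {"https://example.com/hotels"},
    "paris attractions": {"https://example.com/sights"},
    "london attractions": {"https://example.com/london"},
}
links = {("https://example.com/hotels", "https://example.com/sights")}
edges = url_link_graph(clicked, links)
```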


Query Graphs – URL Terms Graph

  • Represent a clicked URL by a set of terms (whole page, snippet, anchors, title, a combination …)

  • Weight terms by their frequencies

  • Node weight = number of occurrences

  • There is an edge between q1 and q2 if at least one clicked URL of q1 and one clicked URL of q2 have at least m terms in common

  • Edge weight is the sum of the frequencies of the common terms

  • Nodes shown in the example graph: paris hotels, paris attractions, london attractions, cheap paris hotels


User Behavior as a Predictor of a Successful Search

  • Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully or not

    • Success – user is satisfied with the results

    • Failure – user is unsatisfied

  • Method:

    • Analyze the query log and learn success/failure patterns

    • Use learned models for prediction

  • Proposed by [Hassan, Jones and Klinkner]


Data

  • A rich query log of queries and user actions:

    • Query (Q)

    • Search Click (SR)

    • Sponsored Search Click (AD)

    • Related Search Click (RL)

      • Query recommendations

    • Spelling Suggestion Click (SP)

    • Shortcut Click (SC)

      • E.g. image, video, news …

    • Any Other Click (OTH)

      • E.g. browser tab


Data Labeling

  • Random sample of user sessions

  • Human editors labeled data:

    • Detected logical sessions

    • Success/Failure

      • definitely successful, probably successful, unsure, probably unsuccessful, and definitely unsuccessful


Markov Models

  • Partition training data into two splits

    • successful goals

    • unsuccessful goals

  • For each group, construct a Markov model derived from the observed action sequences

    • A model describes the user behavior in case of a successful/unsuccessful search goal

    • Each action type is a state

    • Weight a transition from one state to another according to its probability as observed in the data (MLE)
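
The maximum-likelihood estimate of the transition weights can be sketched as counting observed transitions and normalizing per source state. The function name and the toy sequences are assumptions; the `START` and `END` states follow the illustration slide.

```python
from collections import Counter, defaultdict


def fit_markov_model(sequences):
    """Estimate P(next | prev) by relative frequency (MLE).

    sequences: list of action lists such as ["Q", "SR"].
    Returns a nested dict: prev state -> {next state: probability}.
    """
    counts = defaultdict(Counter)
    for actions in sequences:
        path = ["START", *actions, "END"]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
        for prev, nxts in counts.items()
    }


# Toy action sequences, made up for the example.
sequences = [["Q", "SR"], ["Q", "Q", "SR"], ["Q", "AD"]]
model = fit_markov_model(sequences)
```

One such model would be fitted on the successful goals and another on the unsuccessful ones.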



Illustration

[Figure: a Markov model over the states START, Q, SR, AD, RL and END; edges are labeled with transition probabilities such as 0.1, 0.3, 0.4, 0.5, 0.6 and 1.]


Prediction (1)

  • Given a user’s action sequence, we need to predict whether it is successful or not

  • We have learned two models, Ms and Mf, of successful and unsuccessful patterns

  • Compute the probability that a given sequence S = {S1, …, Sn} was generated by Ms, and likewise for Mf

  • Predict success or failure by computing the log likelihood under each model

    • Formulas in next slide
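
A minimal sketch of the prediction step, assuming each model is a nested dict of transition probabilities (`prev -> {next: probability}`). The toy models, the `floor` probability for unseen transitions, and the simple "higher log likelihood wins" rule are assumptions for illustration, not necessarily the paper's exact decision rule.

```python
import math


def log_likelihood(model, actions, floor=1e-6):
    """Sum of log transition probabilities along START -> actions -> END.

    Unseen transitions get the small probability `floor` (an assumption
    made here so that the logarithm is always defined).
    """
    path = ["START", *actions, "END"]
    return sum(
        math.log(model.get(prev, {}).get(nxt, floor))
        for prev, nxt in zip(path, path[1:])
    )


def predict_success(m_s, m_f, actions):
    """True if the sequence is more likely under Ms than under Mf."""
    return log_likelihood(m_s, actions) > log_likelihood(m_f, actions)


# Toy models with made-up probabilities.
m_s = {"START": {"Q": 1.0}, "Q": {"SR": 0.8, "Q": 0.2}, "SR": {"END": 1.0}}
m_f = {"START": {"Q": 1.0}, "Q": {"SR": 0.3, "Q": 0.7}, "SR": {"END": 1.0}}

quick_click = predict_success(m_s, m_f, ["Q", "SR"])
many_reformulations = predict_success(m_s, m_f, ["Q", "Q", "Q", "SR"])
```

In these toy models a single query followed by a click looks successful, while repeated reformulation looks unsuccessful.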


Prediction (2)

Formulas taken from the paper
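
A common way to write the log likelihood of a sequence S = {S1, …, Sn} under a first-order Markov model M is shown here as an assumption, not as a quotation from the paper. $P_M(S_i \mid S_{i-1})$ denotes the transition probability learned for model $M$, and the comparison rule in the second line is one simple choice.

```latex
LL_M(S) = \sum_{i=2}^{n} \log P_M(S_i \mid S_{i-1})
\qquad
\text{predict success if } LL_{M_s}(S) > LL_{M_f}(S)
```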

