
Query Log Analysis

Query Log Analysis. Naama Kraus. Slides are based on the papers: Andrei Broder, A taxonomy of web search; Ricardo Baeza-Yates, Graphs from Search Engine Queries; Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search.


Presentation Transcript


  1. Query Log Analysis
     Naama Kraus
     Slides are based on the papers:
     • Andrei Broder, A taxonomy of web search
     • Ricardo Baeza-Yates, Graphs from Search Engine Queries
     • Hassan, Jones, Klinkner, Beyond DCG: User Behavior as a Predictor of a Successful Search

  2. A Taxonomy of Web Searches
     • [Andrei Broder] classifies web queries according to their intent:
     • Navigational - reach a particular site
       • Example: cnn, Oracle
     • Informational - acquire some information
       • Example: the history of haifa, information retrieval
     • Transactional - perform some web-mediated activity; further interaction is expected
       • E.g. shopping, downloading files, accessing databases
       • Example: new balance shoes, Israel flights

  3. Query Log
     • A search engine query log records users’ searches
     • A typical record contains
       • Anonymous user id u
       • Search query q
       • Returned documents V
       • Clicked documents C
       • Timestamp t
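As a minimal sketch, the record fields listed above could be held in a small Python data class. The class name, field names, example URLs, and the date are all hypothetical, and the check that clicked documents are among the returned ones is an assumption rather than something the slides state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class QueryLogRecord:
    """One query log record (hypothetical layout, not the slides' schema)."""
    user_id: str                          # anonymous user id u
    query: str                            # search query q
    returned: Tuple[str, ...] = ()        # returned documents V
    clicked: Tuple[str, ...] = ()         # clicked documents C
    timestamp: Optional[datetime] = None  # timestamp t

    def __post_init__(self):
        # Assumption: a click can only land on a returned document.
        if not set(self.clicked) <= set(self.returned):
            raise ValueError("clicked documents must be among the returned documents")


# Hypothetical record; the URLs and the date are placeholders.
record = QueryLogRecord(
    user_id="1234",
    query="apple ipod",
    returned=("apple.example/ipod", "shop.example/ipod"),
    clicked=("apple.example/ipod",),
    timestamp=datetime(2024, 1, 1, 12, 5),
)
print(record)
```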

  4. Query Log Example
     • 1234, apple, 12:04
     • 1234, apple ipod, 12:05
     • 1234, ynet, 12:13
     • 145, google, 12:20
     • 145, eBay, 12:56
     • 32, ynet news, 12:59
     • 145, Solaris systen, 13:01
     • 145, Solaris system, 13:05
     • …

  5. Session
     • A sequence of searches by one particular user u within a specific time limit
     • S = < <u, q_1, t_1>, …, <u, q_k, t_k> >
     • t_1 < … < t_k (=> ordered sequence)
     • t_{i+1} – t_i < t_0 (=> t_0 is a timeout threshold)
     • Note 1: may contain unrelated queries
     • Note 2: identifying sessions is easy

  6. Session Example
     • Timeout threshold = 30 minutes
     • Session 1
       • 1234, apple, 12:04
       • 1234, apple ipod, 12:05
       • 1234, ynet, 12:13
       • 1234, apple store, 12:20
     • Session 2
       • 1234, cnn news, 12:56
       • 1234, cnn webcast, 12:59
       • 1234, apple apps, 13:01
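The timeout rule from the session definition can be sketched in Python as follows. This is a minimal illustration on the seven example records with a 30-minute threshold; the function name `split_sessions` and the `(user, query, "HH:MM")` tuple layout are assumptions made for the example, not part of the slides.

```python
from datetime import datetime, timedelta

# The seven records from the session example: (user id, query, time of day).
LOG = [
    ("1234", "apple", "12:04"),
    ("1234", "apple ipod", "12:05"),
    ("1234", "ynet", "12:13"),
    ("1234", "apple store", "12:20"),
    ("1234", "cnn news", "12:56"),
    ("1234", "cnn webcast", "12:59"),
    ("1234", "apple apps", "13:01"),
]


def split_sessions(records, timeout=timedelta(minutes=30)):
    """Group records into per-user sessions using the rule t_{i+1} - t_i < t_0."""
    by_user = {}
    for user, query, hhmm in records:
        by_user.setdefault(user, []).append((datetime.strptime(hhmm, "%H:%M"), query))

    sessions = []
    for user, events in by_user.items():
        events.sort()  # order each user's searches by time
        current = [events[0]]
        for prev, cur in zip(events, events[1:]):
            if cur[0] - prev[0] < timeout:
                current.append(cur)  # gap below the threshold: same session
            else:
                sessions.append((user, [q for _, q in current]))
                current = [cur]      # gap too long: start a new session
        sessions.append((user, [q for _, q in current]))
    return sessions


sessions = split_sessions(LOG)
for user, queries in sessions:
    print(user, queries)
```

The only gap that reaches the threshold is the 36 minutes between 12:20 and 12:56, so the sketch cuts the log there.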

  7. Query Chain
     • A sequence of queries by a particular user that share a similar information need
     • Also known as a mission or logical session
     • Example:
       • haifa maps
       • haifa travel
       • attractions in haifa
     • Note 1: contains related queries only
     • Note 2: identifying chains is difficult

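  8. Query Chain Example
     • 1234, apple, 12:04
     • 1234, apple ipod, 12:05
     • 1234, ynet, 12:13
     • 1234, apple store, 12:20
     • 1234, cnn news, 12:56
     • 1234, cnn webcast, 12:59
     • 1234, apple apps, 13:01
     • The slide marks two chains over these records (chain1, chain2); the grouping is not recoverable from the transcript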

  9. Click Graph
     • Bipartite graph
     • Nodes on the left side are unique queries
     • Nodes on the right side are unique URLs
     • An edge connects q and u if the log contains a click on u for query q
     • Edges may be weighted by the number of clicks
     • This graph is used by numerous algorithms for various purposes
       • E.g., query and URL clustering, query recommendations …
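A click graph of this kind might be built as in the sketch below. The click pairs and URLs are invented for illustration, and `build_click_graph` is a hypothetical helper name; edge weights are click counts, as the slide suggests.

```python
from collections import Counter, defaultdict

# Hypothetical (query, clicked URL) pairs extracted from a log.
CLICKS = [
    ("paris hotels", "hotels.example/paris"),
    ("cheap paris hotels", "hotels.example/paris"),
    ("paris hotels", "hotels.example/paris"),
    ("paris attractions", "travel.example/paris"),
]


def build_click_graph(clicks):
    """Return edge weights plus adjacency sets for both sides of the bipartite graph."""
    edge_weights = Counter(clicks)       # (query, url) -> number of clicks
    query_to_urls = defaultdict(set)     # left side: unique queries
    url_to_queries = defaultdict(set)    # right side: unique URLs
    for query, url in edge_weights:
        query_to_urls[query].add(url)
        url_to_queries[url].add(query)
    return edge_weights, query_to_urls, url_to_queries


edge_weights, query_to_urls, url_to_queries = build_click_graph(CLICKS)
print(edge_weights[("paris hotels", "hotels.example/paris")])
print(sorted(url_to_queries["hotels.example/paris"]))
```

Queries that share a neighbor on the URL side (here the two hotel queries) are natural candidates for the clustering and recommendation uses the slide mentions.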

  10. Query Graphs
      • Each unique query is a node in the graph
      • Next slides: connection types between queries (edges)
      • Proposed by [Ricardo Baeza-Yates]

  11. Query Graphs – Word Graph
      • An edge between two nodes exists if the queries share common terms
      • Possible node weight: number of occurrences in the log
      • Possible edge weight: Jaccard distance
      • Example nodes (from the slide figure): paris hotels, london attractions, paris attractions, cheap paris hotels
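A word graph over the four example queries could be computed as sketched below. Whitespace tokenization and the helper names are assumptions; the Jaccard distance is taken as 1 - |A ∩ B| / |A ∪ B| over the two queries' term sets.

```python
from itertools import combinations

QUERIES = [
    "paris hotels",
    "london attractions",
    "paris attractions",
    "cheap paris hotels",
]


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two non-empty term sets."""
    return 1 - len(a & b) / len(a | b)


def build_word_graph(queries):
    """Connect two queries when they share at least one term."""
    terms = {q: set(q.split()) for q in queries}  # assumption: whitespace tokens
    edges = {}
    for q1, q2 in combinations(queries, 2):
        if terms[q1] & terms[q2]:
            edges[(q1, q2)] = jaccard_distance(terms[q1], terms[q2])
    return edges


edges = build_word_graph(QUERIES)
for pair, weight in edges.items():
    print(pair, round(weight, 3))
```

With a distance as the weight, closely related queries get small values; "paris hotels" and "cheap paris hotels" share two of three terms, so their distance is 1/3.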

  12. Query Graphs – Session Graph
      • The weight of node q is the number of sessions that contain q (usually equal to the number of occurrences of q)
      • A directed edge goes from q1 to q2 if q1 occurred before q2 in the same session
      • The edge weight is the number of such occurrences
      • Example nodes (from the slide figure): paris hotels, paris attractions, london attractions, cheap paris hotels
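The session graph can be sketched as below. The three sessions are invented for illustration. Two interpretation choices are assumptions: "before" is read as anywhere earlier in the session (not only immediately before), and each ordered pair is counted once per session.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions, each an ordered list of queries.
SESSIONS = [
    ["paris hotels", "cheap paris hotels", "paris attractions"],
    ["paris hotels", "paris attractions"],
    ["london attractions", "paris attractions"],
]


def build_session_graph(sessions):
    """Return (node weights, directed edge weights) for the session graph."""
    node_weights = Counter()
    edge_weights = Counter()
    for session in sessions:
        node_weights.update(set(session))  # number of sessions containing q
        # combinations() keeps input order, so q1 always precedes q2.
        ordered_pairs = {
            (q1, q2) for q1, q2 in combinations(session, 2) if q1 != q2
        }
        edge_weights.update(ordered_pairs)  # one count per session
    return node_weights, edge_weights


node_weights, edge_weights = build_session_graph(SESSIONS)
print(node_weights["paris attractions"])
print(edge_weights[("paris hotels", "paris attractions")])
```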

  13. Query Graphs – URL Cover Graph
      • An edge exists between q1 and q2 if they share clicked URLs
      • Node weight = number of occurrences
      • The edge weight is the number of common clicks
      • Example nodes (from the slide figure): paris hotels, paris attractions, london attractions, cheap paris hotels
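A URL cover graph might be derived from per-query click sets as in this sketch. The clicked URLs are hypothetical, and "number of common clicks" is interpreted here as the number of distinct URLs clicked for both queries, which is an assumption.

```python
from itertools import combinations

# Hypothetical clicked-URL sets per query.
CLICKED = {
    "paris hotels": {"hotels.example/paris", "travel.example/paris"},
    "cheap paris hotels": {"hotels.example/paris", "deals.example/paris"},
    "paris attractions": {"travel.example/paris"},
    "london attractions": {"travel.example/london"},
}


def build_url_cover_graph(clicked):
    """Connect two queries when they share at least one clicked URL."""
    edges = {}
    for q1, q2 in combinations(sorted(clicked), 2):
        common = clicked[q1] & clicked[q2]
        if common:
            edges[(q1, q2)] = len(common)  # assumed weight: shared URL count
    return edges


cover_edges = build_url_cover_graph(CLICKED)
print(cover_edges)
```

Unlike the word graph, this connects queries through user behavior, so two queries with no terms in common can still be linked.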

  14. Query Graphs – URL Link Graph
      • An edge exists between q1 and q2 if at least one hyperlink connects a URL clicked for q1 and a URL clicked for q2
      • Node weight = number of occurrences
      • The edge weight is the number of such links
      • Example nodes (from the slide figure): paris hotels, paris attractions, london attractions, cheap paris hotels

  15. Query Graphs – URL Terms Graph
      • Represent a clicked URL by a set of terms (whole page, snippet, anchors, title, a combination …)
      • Weight terms by their frequencies
      • Node weight = number of occurrences
      • There is an edge between q1 and q2 if at least one clicked URL of q1 and one clicked URL of q2 have at least m terms in common
      • The edge weight is the sum of the frequencies of the common terms
      • Example nodes (from the slide figure): paris hotels, paris attractions, london attractions, cheap paris hotels

  16. User Behavior as a Predictor of a Successful Search
      • Goal: given a sequence of user actions within a specific logical session, predict whether the search goal ended successfully
        • Success – user is satisfied with the results
        • Failure – user is unsatisfied
      • Method:
        • Analyze the query log and learn success/failure patterns
        • Use the learned models for prediction
      • Proposed by [Hassan, Jones and Klinkner]

  17. Data
      • A rich query log of queries and user actions:
        • Query (Q)
        • Search Click (SR)
        • Sponsored Search Click (AD)
        • Related Search Click (RL)
          • Query recommendations
        • Spelling Suggestion Click (SP)
        • Shortcut Click (SC)
          • E.g. image, video, news …
        • Any Other Click (OTH)
          • E.g. browser tab

  18. Data Labeling
      • Random sample of user sessions
      • Human editors labeled the data:
        • Detected logical sessions
        • Success/Failure
          • definitely successful, probably successful, unsure, probably unsuccessful, and definitely unsuccessful

  19. Markov Models
      • Partition the training data into two splits
        • successful goals
        • unsuccessful goals
      • For each split, construct a Markov model from the observed action sequences
      • Each model describes user behavior for a successful or an unsuccessful search goal
      • Each action type is a state
      • Weight each transition from one state to another by its probability as observed in the data (MLE)
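The model construction can be sketched in Python as follows: transition probabilities are estimated by relative frequency over the action sequences of one split. The training sequences are invented, `fit_markov_model` is a hypothetical helper, and wrapping every sequence in START and END states follows the illustration slide.

```python
from collections import Counter, defaultdict


def fit_markov_model(sequences):
    """Estimate first-order transition probabilities by relative frequency (MLE)."""
    pair_counts = Counter()   # (source, destination) -> observed transitions
    state_counts = Counter()  # source -> transitions leaving that state
    for seq in sequences:
        path = ["START", *seq, "END"]
        for src, dst in zip(path, path[1:]):
            pair_counts[(src, dst)] += 1
            state_counts[src] += 1

    model = defaultdict(dict)
    for (src, dst), n in pair_counts.items():
        model[src][dst] = n / state_counts[src]
    return dict(model)


# Hypothetical action sequences labeled as successful goals.
SUCCESSFUL = [["Q", "SR"], ["Q", "SR", "SR"], ["Q", "Q", "SR"]]

model_s = fit_markov_model(SUCCESSFUL)
print(model_s)
```

The same function would be run a second time on the unsuccessful split to obtain the second model.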

  20. Transition Weighting - MLE
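The usual maximum-likelihood estimate for first-order Markov transition weights is the relative transition frequency. It is stated here as an assumption, in notation chosen for this note, since the slide's own formula is not part of the transcript:

```latex
\[
\Pr(S_i \to S_j) = \frac{N(S_i, S_j)}{N(S_i)}
\]
```

Here N(S_i, S_j) is the number of observed transitions from state S_i to state S_j, and N(S_i) is the total number of transitions leaving S_i in the training sequences of the split.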

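  21. Illustration
      • Figure: a Markov model over the states START, Q, SR, AD, RL and END, with transition probabilities on the edges (values shown: 0.3, 0.1, 0.6, 1, 0.4, 0.1, 0.5, 1, 1; their assignment to edges is not recoverable from the transcript)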

  22. Prediction (1)
      • Given a user’s action sequence, predict whether it is successful or not
      • We have learned two models, Ms and Mf, of successful and unsuccessful patterns
      • Compute the probability that a given sequence S = {S1, …, Sn} was generated from Ms, and likewise for Mf
      • Predict success or failure from the log-likelihoods under the two models
      • Formulas in next slide

  23. Prediction (2) Formulas taken from the paper
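The prediction step might look like the sketch below. It is not the paper's exact formula: it assumes the log-likelihood of a sequence is the sum of log transition probabilities and that success is predicted when Ms gives the higher value. Both transition tables are invented, and the small `floor` probability for unseen transitions is an added assumption.

```python
import math

# Hypothetical transition tables for Ms and Mf (values are illustrative only).
M_SUCCESS = {
    "START": {"Q": 1.0},
    "Q": {"SR": 0.7, "Q": 0.2, "END": 0.1},
    "SR": {"END": 0.8, "Q": 0.2},
}
M_FAILURE = {
    "START": {"Q": 1.0},
    "Q": {"SR": 0.3, "Q": 0.5, "END": 0.2},
    "SR": {"END": 0.3, "Q": 0.7},
}


def log_likelihood(model, actions, floor=1e-6):
    """Sum of log transition probabilities along START -> actions -> END."""
    path = ["START", *actions, "END"]
    return sum(
        # Unseen transitions get the floor probability instead of zero.
        math.log(model.get(src, {}).get(dst, floor))
        for src, dst in zip(path, path[1:])
    )


def predict_success(actions):
    """Predict success when Ms explains the sequence better than Mf."""
    return log_likelihood(M_SUCCESS, actions) > log_likelihood(M_FAILURE, actions)


print(predict_success(["Q", "SR"]))      # a query followed by a result click
print(predict_success(["Q", "Q", "Q"]))  # repeated reformulation, no click
```

Working in log space avoids numeric underflow on long action sequences, since the product of many small probabilities becomes a sum.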
