
eXtract: A Snippet Generation System for XML Search

eXtract: A Snippet Generation System for XML Search. Yu Huang, Ziyang Liu, Yi Chen, Arizona State University. http://eXtract.asu.edu. Motivation: Good snippets help users to easily judge the relevance and find desired results. Problem: How to generate good snippets for XML search?


Presentation Transcript


  1. eXtract: A Snippet Generation System for XML Search
Yu Huang, Ziyang Liu, Yi Chen
Arizona State University
http://eXtract.asu.edu

Motivation: Good snippets help users to easily judge the relevance and find desired results.

Problem: How to generate good snippets for XML search? No existing work on XML snippet generation yet.

Contributions: eXtract, the first system on snippet generation for XML search [Huang et al., SIGMOD ’08]

Challenge: What are good snippets?
Solution: Identified desirable properties
• Self-contained
• Distinguishable
• Representative
• Small

Challenge: What information in result is significant to achieve the properties?
Solution: Designed an algorithm to generate IList
• The entities involved in the query result
• Keys of the query result
• Dominant features

Challenge: How to select instances in the result when generating a snippet to maximally cover IList within a size bound?
Solution: Designed an efficient and effective algorithm that generates good snippets from IList
• Defined Instance Selection Problem: how to select node instances in a query result to cover as many items in IList as possible in the ranked order to generate a snippet within a bound?
• Theorem: The Instance Selection Problem is NP-hard.
• Designed a greedy algorithm that generates good snippets efficiently.

Sample Query: Find the apparel retailers in Texas. (keywords: retailer, apparel, Texas)
[Figure: Sample Snippet (of size 11)]
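The Instance Selection Problem stated above can be illustrated with a small greedy sketch. This is not the authors' algorithm, only a simplified heuristic under stated assumptions: the query result is a tree given as hypothetical `parent` and `labels` dictionaries, snippet size is counted in edges, an instance is added together with its path to the root, and an item that does not fit within the bound is skipped. The toy tree and the shortened IList are invented for illustration.

```python
def path_to_root(node, parent):
    """Return the node followed by all of its ancestors, ending at the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def greedy_snippet(ilist, labels, parent, bound):
    """Cover IList items in ranked order; the snippet may have at most `bound` edges."""
    root = next(n for n, p in parent.items() if p is None)
    selected = {root}   # nodes of the snippet tree built so far
    covered = []        # IList items the snippet covers
    for item in ilist:
        if any(labels[n] == item for n in selected):
            covered.append(item)  # already covered by an earlier choice
            continue
        best = None
        for node, label in labels.items():
            if label != item:
                continue
            # Nodes that must be added to connect this instance to the snippet.
            new_nodes = [n for n in path_to_root(node, parent) if n not in selected]
            if best is None or len(new_nodes) < len(best):
                best = new_nodes
        # Each newly added node contributes exactly one edge.
        if best is not None and len(selected) - 1 + len(best) <= bound:
            selected.update(best)
            covered.append(item)
    return selected, covered


# Hypothetical toy query result (node id -> parent id, node id -> label).
parent = {0: None, 1: 0, 2: 1, 3: 0, 4: 3, 5: 4, 6: 3, 7: 6, 8: 0, 9: 8, 10: 9, 11: 6}
labels = {0: "retailer", 1: "name", 2: "Brook Brothers", 3: "store", 4: "state",
          5: "Texas", 6: "clothes", 7: "casual", 8: "store", 9: "clothes",
          10: "men", 11: "men"}
ilist = ["Texas", "retailer", "store", "Brook Brothers", "casual", "men"]

selected, covered = greedy_snippet(ilist, labels, parent, bound=8)
print(sorted(selected), covered)
```

In this toy run the heuristic prefers the `men` instance under the store that is already in the snippet (node 11) over the one under the second store (node 10), because it costs one edge instead of three.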
A Query Result
[Figure: XML tree of a query result for the sample query, with snippets labeled Good and Bad. Node labels: retailer, name, product, store, state, city, merchandises, clothes, fitting, situation, category. Values shown include Brook Brothers, apparel, Texas, Houston, Galleria, men, casual, formal, outwear, suit.]

Features and their occurrences

entity    attribute    value: occurrences
store     city         Houston: 2, Dallas: 1
clothes   fitting      men: 146, women: 101, children: 53
clothes   situation    casual: 223, formal: 77
clothes   category     outwear: 116, suit: 92, pants: 43, shirts: 39, shorts: 10, …

Dominance score (DS): DS(Houston) = 2/(3/2) = 1.33, DS(children) = 53/(300/3) = 0.53

IList: Texas, apparel, retailer, store, Brook Brothers, outwear, suit, casual, men
(item types, in order: keywords, related entities, key, dominant features)

Experiments:
• Comparison of Google Desktop, Greedy (eXtract), Optimal algorithm for instance selection.
• User study scores are 2.3, 3.9 and 4.2 out of 5, respectively.
[Figure: charts comparing Quality (Precision, Recall) and Speed (Time (s)).]

34th International Conference on Very Large Data Bases, August 23rd-28th, 2008, Auckland, New Zealand
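The two dominance score figures quoted above are consistent with dividing a value's occurrence count by the average count over all distinct values of the same feature. The sketch below reproduces them under that reading. The function name `dominance_scores` is hypothetical, and the formula is inferred from the two worked examples rather than quoted from the poster.

```python
def dominance_scores(occurrences):
    """Score each value as its count divided by the average count per distinct value."""
    total = sum(occurrences.values())
    average = total / len(occurrences)
    return {value: count / average for value, count in occurrences.items()}


# Occurrence counts taken from the "Features and their occurrences" table.
city = {"Houston": 2, "Dallas": 1}
fitting = {"men": 146, "women": 101, "children": 53}

print(round(dominance_scores(city)["Houston"], 2))      # 1.33
print(round(dominance_scores(fitting)["children"], 2))  # 0.53
```

Under this reading, a score above 1 means the value occurs more often than an average value of its feature: `men` scores 1.46 and `casual` scores 1.49 on the counts above.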
