1 / 14

Diversifying Query Results on Semi-Structured Data

Diversifying Query Results on Semi-Structured Data. Md. Mahbub Hasan University of California, Riverside. XML Document. Bib. PhDThesis. PhDThesis. Paper. School. School. Title. Author. Author. Author. UToronto. First Name. First Name. First Name. Last Name. Last Name.

Download Presentation

Diversifying Query Results on Semi-Structured Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Diversifying Query Results on Semi-Structured Data Md. MahbubHasan University of California, Riverside

  2. XML Document Bib PhDThesis PhDThesis Paper School School Title Author Author Author UToronto First Name First Name First Name Last Name Last Name Last Name UToronto Networking Michalis Michalis Christos Faloutsos Faloutsos Faloutsos

  3. Query • Find all Bibliography records related to Faloutsos Bib //Bib//Faloutsos Faloutsos XPath Expression Twig Pattern

  4. Results Bib PhDThesis PhDThesis Paper School School Title Author Author Author UToronto First Name First Name First Name Last Name Last Name Last Name UToronto Networking Michalis Michalis Christos Faloutsos Faloutsos Faloutsos

  5. Problem • Suppose we can return the user only two results( k = 2) • Which two results we should return?

  6. Which Two Results We Should Return? Bib PhDThesis PhDThesis Paper School School Title Author Author Author UToronto First Name First Name First Name Last Name Last Name Last Name UToronto Networking Michalis Michalis Christos Faloutsos Faloutsos Faloutsos

  7. Solution • Suppose we can return the user only two results( k = 2) • Which two results we should return? • Return the results that are most diverse to each other • The idea is to help the user to better understand/explore the result set

  8. Diversity Problem Can be divided into two subproblems • How to compute the distance between two results? • How to find k most diverse results efficiently from the set of candidate answers?

  9. How to Compute the Distance between Two Results? • Two types of differences between results • Structural difference • Content difference

  10. Structural Differences Bib Bib PhDThesis PhDThesis Paper School School Title Author Author Author Bib UToronto First Name First Name First Name Last Name Last Name Last Name UToronto Networking Christos Michalis Michalis Faloutsos Faloutsos Faloutsos

  11. Content Differences Bib Bib PhDThesis PhDThesis Paper School School Title Author Author Author Bib UToronto First Name First Name First Name Last Name Last Name Last Name UToronto Networking Christos Michalis Michalis Faloutsos Faloutsos Faloutsos

  12. Finding Diverse Results • Naïve Approach • Compute all pair-wise distances of the results • Find the k-result subset with maximum diversity • Challenges to improve the naïve approach • Reduce the number of distance computations • Prune large fraction of k-result subsets

  13. Conclusion • Distance Measure for Structural Query results • Novel and Efficient • Considers both Structural and Content Information • Diversification Algorithm • Heuristic approach to improve the naïve algorithm • Future Work • Consider approximate matches • Approximation in structure • Approximation in value

  14. Thank You!

More Related