EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data

Guoliang Li et al.


Presentation Transcript


  1. EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data • Guoliang Li et al.

  2. The Problem • Keyword search introduces false positives, e.g. “Conference 2008 Canada Data Integration”

  3. The Problem • Websites are organized through content “Dr Pain, Math 343, Linear Algebra”

  4. The Solution Combine linked pages for search, ordered by ranking

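  5. The Solution • r-Radius Steiner Graph Problem • r-Radius Graph • Centric distance of a node: the largest shortest-path distance from that node to any other node in the graph • Radius: the minimal centric distance over all nodes (Figure: example graph with nodes r, s, t, u, v)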
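
The two definitions on this slide can be sketched in a few lines. This is a minimal illustration, assuming an unweighted, undirected graph stored as a dictionary of neighbor sets, and taking the centric distance of a node to be its largest shortest-path distance to any other node. The edge layout of `example` is hypothetical, since the slide's figure is not available; only the node names come from the slide.

```python
from collections import deque


def shortest_distances(graph, source):
    """Hop distances from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def centric_distance(graph, node):
    """Largest shortest-path distance from node to any other node."""
    dist = shortest_distances(graph, node)
    if len(dist) < len(graph):
        return float("inf")  # some node is unreachable
    return max(dist.values())


def radius(graph):
    """Smallest centric distance over all nodes."""
    return min(centric_distance(graph, node) for node in graph)


# Hypothetical layout: a simple path t - s - r - u - v.
example = {
    "t": {"s"},
    "s": {"t", "r"},
    "r": {"s", "u"},
    "u": {"r", "v"},
    "v": {"u"},
}
```

With this layout, `r` sits in the middle of the path, so it is the node that determines the radius.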

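  6. The Solution • r-Radius Steiner Graph Problem • Content node: a node that contains a query keyword • Steiner node: a node on a path between two content nodes (Figure: graph with nodes r, s, t, u, v, where t is labeled “Math 343” and s is labeled “Dr Pain”)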
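
A rough sketch of how content nodes and Steiner nodes could be identified. It assumes a simplified reading in which a Steiner node is any node on a shortest path between two content nodes (the endpoints included); the paper's exact definition may bound the path length differently. The edges of `graph` are hypothetical, and the `text` mapping reuses the two labels from the slide.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def content_nodes(text, keywords):
    """Nodes whose text contains at least one query keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return {n for n, t in text.items() if any(k in t.lower() for k in lowered)}


def steiner_nodes(graph, content):
    """Nodes lying on a shortest path between some pair of content nodes."""
    dist = {c: bfs(graph, c) for c in content}
    found = set()
    for u, v in combinations(sorted(content), 2):
        if v not in dist[u]:
            continue  # u and v are not connected
        for x in graph:
            if x in dist[u] and x in dist[v]:
                if dist[u][x] + dist[v][x] == dist[u][v]:
                    found.add(x)
    return found


# Hypothetical edges: t - r, s - r, r - u, u - v.
graph = {
    "t": {"r"},
    "s": {"r"},
    "r": {"t", "s", "u"},
    "u": {"r", "v"},
    "v": {"u"},
}
text = {"t": "Math 343", "s": "Dr Pain", "r": "", "u": "", "v": ""}

content = content_nodes(text, ["Math 343", "Dr Pain"])
steiner = steiner_nodes(graph, content)
```

Here `r` connects the two content nodes, while `u` and `v` contribute nothing to the answer and are left out.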

  7. r-Radius Steiner Graph on search • Example:

  8. r-Radius Steiner Graph on search

  9. r-Radius Steiner Graph on search The graph model for the publication database

  10. Adjacency Matrix

  11. Finding r-Radius Graphs • Query: “Shanmugasundaram, Guo, XRANK”
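
One way an adjacency matrix can be used to enumerate candidate r-radius graphs is sketched below. It assumes the standard trick of adding self-loops and raising the Boolean matrix to the r-th power, so that entry (i, j) is true exactly when node j is within r hops of node i. The function names and the 5-node `path` matrix are illustrative, not taken from the paper.

```python
def bool_matmul(a, b):
    """Boolean product of two square matrices given as lists of lists."""
    n = len(a)
    return [
        [any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def within_r_hops(adjacency, r):
    """reach[i][j] is True iff node j is at most r hops away from node i."""
    n = len(adjacency)
    # Self-loops let a walk "wait" at a node, so walks of length r cover
    # every distance from 0 up to r.
    step = [[bool(adjacency[i][j]) or i == j for j in range(n)] for i in range(n)]
    reach = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(r):
        reach = bool_matmul(reach, step)
    return reach


def candidate_graphs(adjacency, r):
    """For each center node i, the set of nodes within r hops of i."""
    reach = within_r_hops(adjacency, r)
    return [{j for j, ok in enumerate(row) if ok} for row in reach]


# Illustrative path graph 0 - 1 - 2 - 3 - 4.
path = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
```

Each row of the result is the node set of one candidate subgraph centered at that row's node; many of these sets overlap or contain one another.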

  12. Avoiding Overlapping • Maximal r-Radius Graph • It is not contained in another r-Radius subgraph • But wait! There is still overlap • No problem: • Graph Clustering • Graph Partitioning
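
The maximality filter can be illustrated with plain set containment. This sketch assumes each candidate r-radius graph is represented only by its node set, and it does not attempt the clustering or partitioning step; `maximal_sets` and the sample candidates are hypothetical.

```python
def maximal_sets(candidates):
    """Keep only node sets that are not strictly contained in another candidate."""
    unique = {frozenset(c) for c in candidates}  # also drops exact duplicates
    kept = [set(c) for c in unique if not any(c < other for other in unique)]
    return sorted(kept, key=sorted)  # deterministic order for display


# Hypothetical candidates for r = 1 on the path 0 - 1 - 2 - 3 - 4.
candidates = [{0, 1}, {0, 1, 2}, {1, 2, 3}, {2, 3, 4}, {3, 4}]
maximal = maximal_sets(candidates)
```

The surviving sets still share nodes with their neighbors, which is the remaining overlap the slide refers to.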

  13. Graph Clustering

  14. Ranking • TF-IDF-based IR ranking (tf, idf, ndl) is adequate • Better yet: structural compactness-based DB ranking (SIM) • The more compact the graph, the more relevant the answer • Ranking score is inversely related to path length
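
The idea that shorter paths mean higher relevance can be illustrated with a toy score. The decay `1 / (d + 1) ** 2` and the averaging over pairs are assumptions made for this sketch; the slide does not give the paper's actual SIM formula.

```python
def compactness(pair_distances):
    """Average of an assumed distance decay over pairs of content nodes.

    pair_distances maps a pair of content nodes to the hop count between them.
    """
    if not pair_distances:
        return 0.0
    decays = [1.0 / (d + 1) ** 2 for d in pair_distances.values()]
    return sum(decays) / len(decays)


# Two hypothetical answers containing the same two content nodes.
compact_answer = {("a", "b"): 1}
loose_answer = {("a", "b"): 3}
```

Under this assumed decay, the answer whose content nodes are one hop apart scores higher than the one whose content nodes are three hops apart.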

  15. Indexing • IR score and Sim score are combined • An inverted index (EI-Index) is created • The inverted index stores keyword pairs and scores
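
An inverted index keyed by keyword pairs might look roughly like the following. This is not the EI-Index layout from the paper: the class name `PairIndex`, its methods, and the sample scores are hypothetical, and the score is assumed to be computed elsewhere and passed in.

```python
from collections import defaultdict


class PairIndex:
    """Toy inverted index: (keyword, keyword) -> list of (score, graph_id)."""

    def __init__(self):
        self._postings = defaultdict(list)

    @staticmethod
    def _key(k1, k2):
        # Order-insensitive, case-insensitive key for a keyword pair.
        return tuple(sorted((k1.lower(), k2.lower())))

    def add(self, k1, k2, graph_id, score):
        self._postings[self._key(k1, k2)].append((score, graph_id))

    def lookup(self, k1, k2, top=10):
        """Best-scoring graphs for the pair, highest score first."""
        postings = self._postings.get(self._key(k1, k2), [])
        return sorted(postings, reverse=True)[:top]


index = PairIndex()
index.add("XRANK", "Guo", "g1", 0.4)
index.add("guo", "xrank", "g2", 0.9)
hits = index.lookup("Guo", "XRANK")
```

Storing precomputed scores per keyword pair moves the expensive graph work to indexing time, so a query only needs lookups and a merge.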

  16. Experiments

  17. Results

  18. Results

  19. Results

  20. Results

  21. Strengths of the Paper • Very well written • In-depth research on the topic • Mathematically grounded, with proofs • Compared against existing methods as baselines • Good results

  22. Weaknesses and Future Work • It might be too complex • Could explore faster ways to find Steiner graphs • It doesn’t consider farming sites or bogus sites

  23. Questions?
