Grouper: A Dynamic Clustering Interface to Web Search Results

Erdem Sarıgil - 21000089, Oğuz Yılmaz - 21000082. Grouper is an interface to the results of HuskySearch. It dynamically groups the search results into clusters using the Suffix Tree Clustering (STC) algorithm.

Presentation Transcript


  1. Grouper: A Dynamic Clustering Interface to Web Search Results • Erdem Sarıgil - 21000089 • Oğuz Yılmaz - 21000082

  2. Grouper • Interface to the results of HuskySearch • Dynamically groups the search results into clusters using the Suffix Tree Clustering (STC) algorithm • The goal: make search engine results easy to browse by clustering them • Grouper receives hits from different search engines and looks only at the top hits from each one

  3. Post-retrieval Clustering • Based on the returned document set • Produces better results than pre-retrieval clustering • Some key requirements: • Coherent Clusters • Efficiently Browsable • Speed • Algorithmic Speed • Snippet-Tolerance

  4. Suffix Tree Clustering (STC) • Linear-time clustering algorithm • STC has three logical steps: • Document cleaning • Identifying base clusters using a suffix tree • Merging these base clusters into clusters • STC has several novel characteristics: • Overlapping clusters (a document may appear in more than one cluster) • Uses phrases (ordered word sequences) rather than a bag-of-words model • Well suited for Web document clustering • Robust to “noisy” input such as search-engine snippets
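
The three STC steps on slide 4 can be sketched roughly as follows. This is a minimal illustration, not Grouper's implementation: it enumerates word n-grams instead of building a real suffix tree (so it is not linear time), it skips stemming and cluster scoring, and the names `clean`, `base_clusters`, `merge`, `max_len`, `min_docs`, and the 0.5 overlap `threshold` are assumptions chosen for the example.

```python
import re
from collections import defaultdict
from itertools import combinations


def clean(doc):
    """Step 1: lowercase and split into word tokens (stemming omitted)."""
    return re.findall(r"[a-z0-9]+", doc.lower())


def base_clusters(docs, max_len=3, min_docs=2):
    """Step 2: map each shared phrase to the set of documents containing it.

    A real STC implementation reads these sets off a suffix tree; plain
    n-gram enumeration is a simpler stand-in used here for illustration.
    """
    phrase_docs = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        words = clean(doc)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrase_docs[tuple(words[i:i + n])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def merge(bases, threshold=0.5):
    """Step 3: merge base clusters whose document sets overlap heavily."""
    phrases = list(bases)
    parent = {p: p for p in phrases}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in combinations(phrases, 2):
        common = len(bases[a] & bases[b])
        # Assumed rule: both clusters must share more than `threshold`
        # of their documents with each other.
        if common / len(bases[a]) > threshold and common / len(bases[b]) > threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(lambda: {"phrases": [], "docs": set()})
    for p in phrases:
        root = find(p)
        clusters[root]["phrases"].append(" ".join(p))
        clusters[root]["docs"] |= bases[p]
    return list(clusters.values())


if __name__ == "__main__":
    snippets = [
        "suffix tree clustering",
        "suffix tree search",
        "web search engine",
        "web search results",
    ]
    for cluster in merge(base_clusters(snippets)):
        print(sorted(cluster["docs"]), cluster["phrases"])
```

In this toy run the second snippet ends up in both resulting clusters, which illustrates the "overlapping clusters" property listed on the slide.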

  5. User Interface

  6. User Interface (cont’d)

  7. Making the Clusters Easy to Browse • Three heuristics to identify redundant phrases: • Word Overlap • Sub- and Super-Strings • Most General Phrase with Low Coverage
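
One way the three heuristics on slide 7 might fit together is sketched below. The slide only names the heuristics, so everything concrete here is an assumption made for illustration: the function names `is_subphrase` and `filter_phrases`, the representation of a phrase as a tuple of words with a coverage value between 0 and 1, and the `overlap` and `min_gain` thresholds.

```python
def is_subphrase(short, long):
    """True if word sequence `short` occurs contiguously inside a longer `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def filter_phrases(phrases, overlap=0.6, min_gain=0.2):
    """Return the phrases worth displaying for one cluster.

    phrases: dict mapping a phrase (tuple of words) to its coverage, i.e.
    the fraction of the cluster's documents that contain it.
    """
    # Sub-/super-strings + most general phrase with low coverage:
    # drop a general sub-phrase if it covers barely more documents than
    # a more specific phrase that contains it.
    survivors = {
        p: c
        for p, c in phrases.items()
        if not any(is_subphrase(p, q) and c - phrases[q] < min_gain for q in phrases)
    }

    # Word overlap: drop a phrase when most of its words already appear
    # in a kept phrase with higher coverage.
    kept = []
    for p in sorted(survivors, key=survivors.get, reverse=True):
        words = set(p)
        if all(len(words & set(q)) / len(words) <= overlap for q in kept):
            kept.append(p)
    return kept


if __name__ == "__main__":
    example = {
        ("house",): 0.60,
        ("president",): 0.50,
        ("vice", "president"): 0.45,
        ("white", "house"): 0.30,
        ("vice", "president", "gore"): 0.20,
    }
    print(filter_phrases(example))
```

With these made-up coverage values, "president" is dropped because it adds little over "vice president", and "vice president gore" is dropped because most of its words already appear in a kept phrase with higher coverage.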

  8. Speeeeed • [Figure: trade-off between search quality and time] • “the vice president of” → “vice president”

  9. Coherent Clusters

  10. Comparison • Number of documents followed • Time Spent • Click Distance

  11. Comparison (cont’d)

  12. Thanks for your patience