
A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs

Hongbo Deng, Michael R. Lyu and Irwin King. Department of Computer Science and Engineering, The Chinese University of Hong Kong. July 1st, 2009.


Presentation Transcript


  1. A Generalized Co-HITS Algorithm and Its Application to Bipartite Graphs • Hongbo Deng, Michael R. Lyu and Irwin King • Department of Computer Science and Engineering, The Chinese University of Hong Kong • July 1st, 2009

  2. Introduction • Many data can be modeled as bipartite graphs • Link analysis for the graph (semantic relations): HITS, PageRank, etc. • IR models for the content (relevance): VSM, language model, etc. • Incorporating content with graph: personalized PageRank (PPR), linear combination, etc.

  3. An Illustration • Query suggestion for the query “map” • Suggestions shown (figure comparing HITS, PPR and a more reasonable list): google, mapquest, map quest, google.com, united states map, map of florida, weather, mapquest.com, us map, world map • Problems: noisy link data; lack of relevance constraints

  4. Outline • Introduction • Generalized Co-HITS • Preliminaries • Iterative Framework • Regularization Framework • Experiments • Conclusion

  5. Preliminaries Content Graph X Y Explicit links: Hidden links:
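One common way to read the two kinds of links: explicit links are the transition weights between the two sides of the bipartite graph, and hidden links connect vertices on the same side through a shared neighbor on the other side. The sketch below assumes that reading; the function name `hidden_links` and the matrix layout are illustrative choices, not taken from the slide.

```python
def hidden_links(w_uv, w_vu):
    """Compose explicit links into same-side (hidden) links.

    Assumed layout (hypothetical, for illustration):
      w_uv[i][k] = transition weight from u_i to v_k
      w_vu[k][j] = transition weight from v_k to u_j
    Returns w_uu with w_uu[i][j] = sum_k w_uv[i][k] * w_vu[k][j].
    """
    n_u, n_v = len(w_uv), len(w_vu)
    return [
        [sum(w_uv[i][k] * w_vu[k][j] for k in range(n_v)) for j in range(n_u)]
        for i in range(n_u)
    ]


# Toy 2x2 example with row-normalized explicit links.
w_uv = [[0.5, 0.5], [0.0, 1.0]]
w_vu = [[1.0, 0.0], [0.5, 0.5]]
w_uu = hidden_links(w_uv, w_vu)
```

If both explicit matrices are row-normalized, each row of the composed matrix also sums to 1, so it can be used as a transition matrix on one side.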

  6. Generalized Co-HITS • Basic idea • Incorporate the bipartite graph with the content information from both sides • Initialize the vertices with the relevance scores x0, y0 (initial scores) • Propagate the scores (mutual reinforcement, score propagation)

  7. Generalized Co-HITS • Iterative framework
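A minimal sketch of what such an iterative framework could look like, assuming each score is a mix of its initial relevance score and the scores propagated from the other side. The update rule, the parameter names `lam_u` and `lam_v`, and the matrix layout are assumptions made for illustration; the slide's own equations are not reproduced here.

```python
def co_hits(w_uv, w_vu, x0, y0, lam_u=0.5, lam_v=0.5, iters=100):
    """Iterative score propagation on a bipartite graph (illustrative sketch).

    Assumed layout: w_uv[j][k] is the weight from u_j to v_k,
    w_vu[k][i] is the weight from v_k to u_i.
    Assumed updates:
      x_i = (1 - lam_u) * x0_i + lam_u * sum_k w_vu[k][i] * y_k
      y_k = (1 - lam_v) * y0_k + lam_v * sum_j w_uv[j][k] * x_j
    """
    n_u, n_v = len(x0), len(y0)
    x, y = list(x0), list(y0)
    for _ in range(iters):
        new_x = [
            (1 - lam_u) * x0[i] + lam_u * sum(w_vu[k][i] * y[k] for k in range(n_v))
            for i in range(n_u)
        ]
        new_y = [
            (1 - lam_v) * y0[k] + lam_v * sum(w_uv[j][k] * x[j] for j in range(n_u))
            for k in range(n_v)
        ]
        x, y = new_x, new_y
    return x, y


w_uv = [[0.5, 0.5], [0.0, 1.0]]
w_vu = [[1.0, 0.0], [0.5, 0.5]]
x, y = co_hits(w_uv, w_vu, x0=[1.0, 0.0], y0=[0.0, 1.0])
```

Under this assumed form, `lam_v = 0` reduces to a single propagation step from `y0`, and `lam_u = lam_v = 1` ignores the initial scores after the first update, which resembles HITS-style mutual reinforcement.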

  8. Iterative → Regularization Framework • Consider the vertices on one side • Cost function terms: smoothness, fit to the initial scores

  9. Generalized Co-HITS • Regularization Framework • Figure labels: cost terms R1, R2, R3; hidden-link graphs Wuu, Wvv • Intuition: the highly connected vertices are most likely to have similar relevance scores.

  10. Generalized Co-HITS • Regularization Framework The cost function: Optimization problem: Solution:
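As a rough illustration of a cost function with a smoothness term and a fit-to-initial-scores term, the sketch below uses the standard single-graph regularization form, minimizing (1/2) Σ_ij w_ij (x_i/√d_i − x_j/√d_j)² + μ Σ_i (x_i − x0_i)². This is a swapped-in, generic formulation on one side of the graph, not the slide's cost function; the names `regularize` and `mu` are hypothetical. Its minimizer satisfies x = α S x + (1 − α) x0 with S = D^(-1/2) W D^(-1/2) and α = 1/(1 + μ), which the code reaches by fixed-point iteration.

```python
import math


def regularize(w, x0, mu=1.0, iters=200):
    """Generic graph regularization on one side (illustrative sketch).

    w  : symmetric non-negative weight matrix (list of lists)
    x0 : initial relevance scores
    mu : trade-off between smoothness and fit to x0 (assumed parameter)
    """
    n = len(x0)
    d = [sum(row) for row in w]  # vertex degrees
    # Symmetrically normalized weights; zero entries are left as zero.
    s = [
        [w[i][j] / math.sqrt(d[i] * d[j]) if w[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    alpha = 1.0 / (1.0 + mu)
    x = list(x0)
    for _ in range(iters):
        x = [
            alpha * sum(s[i][j] * x[j] for j in range(n)) + (1 - alpha) * x0[i]
            for i in range(n)
        ]
    return x


# Path graph 0 - 1 - 2 with all relevance initially on vertex 0.
w = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
scores = regularize(w, [1.0, 0.0, 0.0], mu=1.0)
```

In this toy run the score spreads from vertex 0 to its neighbors, decreasing with distance, which matches the intuition that connected vertices should end up with similar scores.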

  11. Application to Query-URL Bipartite Graphs • Bipartite graph construction • Edge weighted by the click frequency • Normalize to obtain the transition matrix • Overall Algorithm
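The graph construction step can be sketched as follows, assuming that "normalize" means dividing each click count by the total clicks of its source vertex (row normalization). The function name `build_transitions` and the toy queries and URLs are hypothetical.

```python
from collections import defaultdict


def build_transitions(clicks):
    """Turn click frequencies into query->URL and URL->query transition weights.

    clicks: dict mapping (query, url) -> click count.
    Assumption: each weight is the click count divided by the total
    clicks of the source vertex.
    """
    query_total = defaultdict(float)
    url_total = defaultdict(float)
    for (query, url), count in clicks.items():
        query_total[query] += count
        url_total[url] += count
    w_qu = {(q, u): c / query_total[q] for (q, u), c in clicks.items()}
    w_uq = {(u, q): c / url_total[u] for (q, u), c in clicks.items()}
    return w_qu, w_uq


# Toy click data, invented for illustration only.
clicks = {
    ("map", "mapquest.com"): 3,
    ("map", "maps.google.com"): 1,
    ("mapquest", "mapquest.com"): 2,
}
w_qu, w_uq = build_transitions(clicks)
```

A sparse dictionary keyed by edge keeps the sketch close to how a query log is stored; a real graph of this size would more likely use a sparse matrix library.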

  12. Outline • Introduction • Preliminaries • Generalized Co-HITS • Iterative Framework • Regularization Framework • Experiments • Conclusion

  13. Experimental Evaluation • Data collection • AOL query log data • Cleaning the data • Removing the queries that appear fewer than 2 times • Combining the near-duplicate queries • 883,913 queries and 967,174 URLs • 4,900,387 edges • 250,127 unique terms
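The cleaning steps could look roughly like the sketch below. The slide does not say how near-duplicates are detected, so the normalization used here (lowercasing and collapsing whitespace) is purely an assumption, as is merging before filtering; `clean_queries` and `min_count` are hypothetical names.

```python
from collections import Counter


def clean_queries(raw_queries, min_count=2):
    """Merge near-duplicate queries, then drop rare ones (illustrative sketch).

    Assumption: two queries are near-duplicates if they are equal after
    lowercasing and collapsing runs of whitespace.
    """
    normalized = [" ".join(q.lower().split()) for q in raw_queries]
    counts = Counter(normalized)
    # Keep only queries that appear at least min_count times.
    return {q: c for q, c in counts.items() if c >= min_count}


kept = clean_queries(["Map Quest", "map  quest", "us map"])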

  14. Evaluation: ODP Similarity • A simple measure of similarity among queries using ODP categories (query → category) • Definition: • Example: • Q1: “United States” → “Regional > North America > United States” • Q2: “National Parks” → “Regional > North America > United States > Travel and Tourism > National Parks and Monuments” • Similarity: 3/5 • Precision at rank n (P@n): • 300 distinct queries
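The 3/5 in the example is consistent with measuring similarity as the length of the longest common category prefix divided by the length of the longer category path (3 shared levels out of 5). The sketch below assumes that definition, and assumes P@n is the average similarity of the top n suggestions; both are readings of the slide, not confirmed definitions, and the function names are hypothetical.

```python
def odp_similarity(cat_a, cat_b, sep=" > "):
    """Shared-prefix length over the longer path length (assumed definition)."""
    path_a, path_b = cat_a.split(sep), cat_b.split(sep)
    common = 0
    for level_a, level_b in zip(path_a, path_b):
        if level_a != level_b:
            break
        common += 1
    return common / max(len(path_a), len(path_b))


def precision_at_n(similarities, n):
    """Average similarity of the top-n suggestions (assumed definition)."""
    return sum(similarities[:n]) / n


q1 = "Regional > North America > United States"
q2 = ("Regional > North America > United States > "
      "Travel and Tourism > National Parks and Monuments")
sim = odp_similarity(q1, q2)  # 3 shared levels out of 5
```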

  15. Experimental Results • Comparison of Iterative Framework • Methods: personalized PageRank (PPR), one-step propagation (OSP), general Co-HITS (CoIter) • Result 1: The improvements of OSP and CoIter over the baseline (the dashed line) are promising when compared to the PPR. The initial relevance scores from both sides provide valuable information.

  16. Experimental Results • Comparison of Regularization Framework • Methods: single-sided regularization (SiRegu), double-sided regularization (CoRegu) • Result 2: SiRegu can improve the performance over the baseline. CoRegu performs better than SiRegu, owing to the newly developed cost function R3. Moreover, CoRegu is relatively robust.

  17. Experimental Results • Detailed Results • Result 3: CoRegu-0.5 achieves the best performance. Considering the double-sided regularization framework for the bipartite graph is essential and promising.

  18. Conclusions • We propose the Co-HITS algorithm to incorporate the bipartite graph with the content information from both sides. • The Co-HITS algorithm is general and includes HITS and personalized PageRank as special cases. • CoRegu is more robust thanks to the newly developed cost function, and it achieves the best performance with consistent and promising improvements.

  19. Q&A Thanks!
