
Research on Enterprise Track of TREC 2007

Huizhong Duan, Qi Zhou, Zhen Lu, Ou Jin, Shenghua Bao, Yunbo Cao and Yong Yu. Apex Knowledge & Data Management Lab. Presenter: Yangbo Zhu.


Presentation Transcript


  1. Research on Enterprise Track of TREC 2007. Huizhong Duan, Qi Zhou, Zhen Lu, Ou Jin, Shenghua Bao, Yunbo Cao and Yong Yu. Apex Knowledge & Data Management Lab. Presenter: Yangbo Zhu.

  2. Document Search

  3. Outline

  4. Static Ranking Approaches • Link Sparse; Similar, Small Rank.

  5. HostRank Algorithm

  6. Calculate the Host’s Importance • http://www.csiro.au/science • http://www.atnf.csiro.au • www.atnf.csiro.au/~rgooch • http://www.ento.csiro.au
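The slide's figure is not preserved, so the following is only a minimal sketch of one plausible reading of "host importance": collapse page URLs to their hosts and run a plain PageRank over the resulting host graph. The function names, the damping factor, and the unweighted host edges are assumptions for illustration, not the authors' confirmed method.

```python
from collections import defaultdict
from urllib.parse import urlparse


def host_of(url):
    """Return the host part of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).netloc.lower()


def host_graph(page_links):
    """Collapse page-level links (src_url, dst_url) into host-level edges."""
    graph = defaultdict(set)
    for src, dst in page_links:
        s, d = host_of(src), host_of(dst)
        graph[s]  # touch both keys so every host appears as a node
        graph[d]
        if s != d:  # links inside one host carry no cross-host vote
            graph[s].add(d)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> set of successors."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # rank held by hosts without out-links is spread evenly
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        rank = new
    return rank
```

The page-level links fed to `host_graph` would come from the crawl; a weighted variant that counts the number of links between two hosts would be an equally reasonable choice.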

  7. Propagation of the Host’s Importance • Hierarchical Weight Structure: www.atnf.csiro.au/computing, www.atnf.csiro.au/computing/software, www.atnf.csiro.au/computing/software/smongo

  8. Propagation of the Host’s Importance • The factor ω is defined as: • Index(p) is a Boolean value denoting whether page p is an index page. • Link(p) is defined as the percentage of the inlinks of page p. Reference: G. Xue, Q. Yang, H. Zeng, Y. Yu, Z. Chen: Exploiting the Hierarchical Structure for Link Analysis. In: Proceedings of SIGIR 2005.

  9. Data Preprocessing

  10. Title Extraction (figure labels: Title, H1, H2, H1)

  11. Body Detection • Dividing the page based on DOM tree structure.

  12. Body Detection • Filtering divided parts
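The two body-detection slides name the steps (divide the page, then filter the parts) without giving the criteria. As a rough sketch only, the code below splits a page at block-level tag boundaries, which is a flat approximation of dividing by DOM subtree, and drops parts that are very short or mostly link text. The tag set, the thresholds, and the link-density heuristic are all assumptions, not taken from the slides.

```python
from html.parser import HTMLParser

# Hypothetical choice of tags that start a new part of the page.
BLOCK_TAGS = {"div", "td", "p", "li", "section"}


class BlockSplitter(HTMLParser):
    """Split a page into text blocks at block-level tags, tracking link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # each block: {"text": [str, ...], "link_chars": int}
        self._in_link = 0     # nesting depth of <a> elements
        self._new_block()

    def _new_block(self):
        self.blocks.append({"text": [], "link_chars": 0})

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._new_block()
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._new_block()
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        block = self.blocks[-1]
        block["text"].append(data)
        if self._in_link:
            block["link_chars"] += len(data)


def detect_body(html, min_chars=40, max_link_ratio=0.5):
    """Return the text of blocks that look like body content."""
    parser = BlockSplitter()
    parser.feed(html)
    parser.close()
    kept = []
    for block in parser.blocks:
        text = " ".join(block["text"])
        if len(text) < min_chars:
            continue  # too short: likely a footer or label
        if block["link_chars"] / len(text) > max_link_ratio:
            continue  # mostly anchors: likely navigation
        kept.append(text)
    return kept
```

On a typical page this tends to discard navigation bars and footers while keeping paragraphs of running text; real templates would need tuned thresholds.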

  13. Similarity Matching

  14. Position Weighting

  15. Query Combination

  16. BM25 Equations
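The equations on this slide did not survive in the transcript. For orientation, here is a small sketch of the commonly used Okapi BM25 scoring function; it may differ from the variant the authors used. The parameter defaults (`k1=1.2`, `b=0.75`) and the "+1" inside the IDF logarithm, which keeps the weight non-negative, are assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in docs against the tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        # length normalization: longer-than-average documents are penalized
        norm = k1 * (1 - b + b * len(d) / avg_len)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

A field-weighted variant, for example one that boosts matches in extracted titles, would apply the same formula per field and combine the results.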

  17. Expert Search

  18. Outline. Reference: S. Bao, H. Duan, Q. Zhou, M. Xiong, Y. Cao and Y. Yu: Research on Expert Search at Enterprise Track of TREC 2006. In: Proceedings of the 15th Text REtrieval Conference (TREC 2006), 2006.

  19. Static Expert Ranking

  20. ExpertRank

  21. ExpertRank

  22. Topic Sensitive ExpertRank

  23. Topic Sensitive ExpertRank

  24. Data Preprocessing

  25. Parsing Corpus for Expert Name List • Some anti-spam format

  26. Parsing Corpus for Expert Name List • zywang@noble.org • Emails with a single letter in the person-name part • No.10@csiro.au • publishing.emu@csiro.au • a.scott@dem.csiro.au
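The slides mention anti-spam address formats and addresses whose name part contains a single letter, but not how either case is handled. The sketch below assumes bracketed "[at]" / "(dot)" style obfuscation, which is a guess since the slides do not list the formats, and simply flags single-letter name parts; all names in it are hypothetical.

```python
import re

# Hypothetical obfuscation patterns; the slides do not list the formats handled.
_AT = re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.IGNORECASE)
_DOT = re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.IGNORECASE)
_EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def extract_emails(text):
    """Undo simple bracketed obfuscation, then pull out email addresses."""
    text = _AT.sub("@", text)
    text = _DOT.sub(".", text)
    return [m.lower() for m in _EMAIL.findall(text)]


def has_single_letter_part(email):
    """True if the part before '@' contains a one-letter token, e.g. an initial."""
    local = email.split("@", 1)[0]
    return any(len(p) == 1 and p.isalpha() for p in re.split(r"[._-]", local))
```

Addresses such as `No.10@csiro.au` or `publishing.emu@csiro.au` pass the single-letter check but still do not name a person, so a real pipeline would need a further filter for non-personal mailboxes.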

  27. VisualPageRank and Expert Homepage Detection • VisualPageRank • Too simple: Too complicated:

  28. VisualPageRank and Expert Homepage Detection • Example of Expert Homepage

  29. Query Expansion

  30. Conclusion
