
Efficient Grouping of Search Engine Returned Citations for Person Name Queries

This presentation proposes a solution to the problem of search engines returning too many citations for person name queries. The solution groups the citations by person using three facets: attributes, links, and page similarity. A confidence matrix is built for each facet, and the matrices are combined into a final confidence matrix using the Stanford Certainty Measure; the grouping is then determined from the final matrix. Precision and recall, together with weighted merge and split measurements, are used to evaluate the effectiveness of the solution.


Presentation Transcript


  1. Grouping Search-Engine Returned Citations for Person Name Queries • Reema Al-Kamha • Research Supported by NSF

  2. The Problem • Search engines return too many citations • Example: “Christopher Young” • Google returns around 26,500 citations • Many people named “Christopher Young” • It would help to group the citations by person. • How do we group them?

  3. “Christopher Young” Query to Google

  4. “Christopher Young” Query Results for Our System

  5. Our Solution • Three facets • Attributes • Links • Page Similarity • Confidence matrix for each facet • Final confidence matrix

  6. Attributes • Email Address, Phone, City, State, Zip Code

  7. Confidence Matrix for Attributes Facet • D1 and D5 have the same State. • D1 and D9 have the same State. • D4 and D9 have the same City.

  8. Links • Returned citations that share the same host, e.g. www.cs.byu.edu/info/dwembley.html and www.cs.byu.edu/info/directory.php • One returned citation links to another returned citation.
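
The same-host test on this slide can be sketched as follows. This is only an illustration, not the presenter's implementation: the helper names `host_of` and `same_host` are hypothetical, and it assumes that "same host" means an exact match of the host part of the URL.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lower-cased host of a URL, tolerating a missing scheme."""
    # urlsplit only recognises a host after "//", so add it when absent.
    parts = urlsplit(url if "//" in url else "//" + url)
    return parts.hostname or ""


def same_host(url_a: str, url_b: str) -> bool:
    """True when both URLs have a non-empty, identical host (assumed rule)."""
    host_a, host_b = host_of(url_a), host_of(url_b)
    return host_a != "" and host_a == host_b


# The two example citations from the slide share the host www.cs.byu.edu.
print(same_host("www.cs.byu.edu/info/dwembley.html",
                "www.cs.byu.edu/info/directory.php"))
```

The slides do not say what confidence value a shared host or a direct link contributes, so the sketch stops at the yes/no test.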

  9. Confidence Matrix for Links Facet [figure; labels: D1, D0, D5, D0]

  10. Page Similarity • Similarity between two documents to which the two returned citations link • The number of shared pairs of adjacent capitalized words
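
One way to count shared pairs of adjacent capitalized words is sketched below. The tokenization (runs of ASCII letters), the choice to count distinct pairs, and the function names are all assumptions made for illustration; the slides do not specify them, nor how the count is turned into a confidence value.

```python
import re


def capitalized_pairs(text: str) -> set:
    """Collect distinct pairs of adjacent words that both start with a capital."""
    # Assumption: a "word" is a run of ASCII letters; punctuation is ignored.
    words = re.findall(r"[A-Za-z]+", text)
    return {
        (first, second)
        for first, second in zip(words, words[1:])
        if first[0].isupper() and second[0].isupper()
    }


def shared_pair_count(text_a: str, text_b: str) -> int:
    """Number of distinct capitalized pairs that occur in both texts."""
    return len(capitalized_pairs(text_a) & capitalized_pairs(text_b))


page_a = "Christopher Young teaches at Brigham Young University."
page_b = "Christopher Young plays guitar in Los Angeles."
print(shared_pair_count(page_a, page_b))  # 1: only ("Christopher", "Young") is shared
```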

  11. Confidence Matrix for Page Similarity Facet

  12. Final Matrix • Combine the confidence matrices using the Stanford Certainty Measure. • For example, D1 and D5: • Confidence value for the attribute facet is 0.49 • Confidence value for the link facet is 0 • Confidence value for the page similarity facet is 0.95 • Confidence value between D1 and D5 is 0.49 + 0.95 - 0.49*0.95 = 0.97
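
The arithmetic on this slide follows the combination rule a + b - a*b. A minimal sketch, assuming the three facet values are folded together pairwise with that rule (the function names are hypothetical):

```python
from functools import reduce


def combine(a: float, b: float) -> float:
    """Combine two confidence values with the rule used on the slide."""
    return a + b - a * b


def combine_all(values) -> float:
    """Fold any number of facet confidences; 0.0 is the neutral element."""
    return reduce(combine, values, 0.0)


# D1 and D5: attribute facet 0.49, link facet 0, page similarity facet 0.95.
final = combine_all([0.49, 0.0, 0.95])
print(round(final, 2))  # 0.97
```

A facet with confidence 0 leaves the result unchanged, which is why the link facet does not appear in the slide's sum.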

  13. Final Matrix and Grouping Method • Pairs: {D0,D1}, {D0,D5}, {D1,D4}, {D1,D5}, {D1,D8}, {D1,D9}, {D4,D5}, {D4,D8}, {D4,D9}, {D5,D8}, {D5,D9}, {D8,D9} • Groups: {D0,D1,D4,D5,D8,D9}, {D2}, {D3}, {D6}, {D7}
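
The groups on this slide are what one obtains by treating the listed pairs as edges and taking connected components. The sketch below assumes that reading and ten documents D0 to D9; how the pairs are selected from the final matrix (for instance, by a threshold) is not stated in the slides and is not modeled here. The function name is hypothetical.

```python
def group_documents(n_docs: int, pairs) -> list:
    """Group documents 0..n_docs-1 into connected components (union-find)."""
    parent = list(range(n_docs))

    def find(x: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller index as the root of the merged group.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    groups = {}
    for doc in range(n_docs):
        groups.setdefault(find(doc), []).append(doc)
    return sorted(groups.values())


pairs = [(0, 1), (0, 5), (1, 4), (1, 5), (1, 8), (1, 9),
         (4, 5), (4, 8), (4, 9), (5, 8), (5, 9), (8, 9)]
groups = group_documents(10, pairs)
print(groups)  # [[0, 1, 4, 5, 8, 9], [2], [3], [6], [7]]
```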

  14. Recall and Precision • Assume we get: {0,1,3} {2,4} {5} • The correct grouping is: {0,1,2,3} {4,5} • Our grouping gives the pairs: (0,1) (0,3) (1,3) (2,4) • The correct grouping gives the pairs: (0,1) (0,2) (0,3) (1,2) (1,3) (2,3) (4,5) • Three pairs appear in both, so R = 3/7 and P = 3/(3+1) = 3/4
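
The pair-based calculation on this slide can be reproduced with a short sketch. The function names are hypothetical, and the handling of empty pair sets (returning 1.0) is an assumption the slides do not address.

```python
from itertools import combinations


def pairs_of(grouping) -> set:
    """All unordered document pairs that share a group."""
    return {pair for group in grouping for pair in combinations(sorted(group), 2)}


def pairwise_recall_precision(obtained, correct):
    """Recall and precision over same-group document pairs."""
    got, want = pairs_of(obtained), pairs_of(correct)
    hits = len(got & want)
    recall = hits / len(want) if want else 1.0     # assumption for the empty case
    precision = hits / len(got) if got else 1.0    # assumption for the empty case
    return recall, precision


r, p = pairwise_recall_precision([{0, 1, 3}, {2, 4}, {5}],
                                 [{0, 1, 2, 3}, {4, 5}])
print(r, p)  # recall is 3/7, precision is 3/4
```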

  15. Split and Merge • Assume we get: {0,1,3} {2,7,4} {5} {6} • The correct grouping is: {0,1,3,5,6} {2,7} {4} • Merge ({5} and {6} must be merged into {0,1,3}): 1/8 + 1/8 = 2/8 • Split (4 must be split from {2,7,4}): 1/8
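
The slide does not define the merge and split measures formally. The sketch below is one interpretation that reproduces the slide's numbers: each document that lies outside the largest matching fragment of its group costs 1/N, where N is the total number of documents. Both this definition and the function names are assumptions.

```python
def extra_fragments(groups_to_cover, other_groups) -> int:
    """Count documents outside the largest overlap of each group."""
    total = 0
    for group in groups_to_cover:
        overlaps = [len(group & other) for other in other_groups if group & other]
        total += sum(overlaps) - max(overlaps, default=0)
    return total


def merge_and_split(obtained, correct):
    """Assumed weighted merge and split costs, each as a fraction of N."""
    n_docs = sum(len(group) for group in correct)
    # Merge: pieces of a correct group scattered over several obtained groups.
    merge = extra_fragments(correct, obtained) / n_docs
    # Split: pieces of an obtained group that belong to different correct groups.
    split = extra_fragments(obtained, correct) / n_docs
    return merge, split


obtained = [{0, 1, 3}, {2, 7, 4}, {5}, {6}]
correct = [{0, 1, 3, 5, 6}, {2, 7}, {4}]
merge, split = merge_and_split(obtained, correct)
print(merge, split)  # 0.25 0.125, i.e. 2/8 and 1/8
```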

  16. Measurements • Precision and Recall • R = 89%, P = 96.6% • Weighted Merge and Split • M = 0.036, S = 0.008

  17. Contributions • Grouped person-name queries by person • Provided an additional tool for search engine queries
