
Using ODP Metadata to Personalize Search

Presented by Lan Nie, 09/21/2005, Lehigh University.


Presentation Transcript


  1. Using ODP Metadata to Personalize Search Presented by Lan Nie 09/21/2005, Lehigh University

  2. Introduction
     • ODP metadata
       • 4 million sites, 590,000 categories
       • Tree structure
         • Categories: inner nodes
         • Pages: leaf nodes, high quality, representative
     • Using ODP metadata to personalize search
       • 4 billion vs. 4 million
       • Using ODP metadata for personalized search
       • Is biasing possible in the ODP context? Extend ODP classifications automatically, by biasing, from the current 4 million to the 4 billion of the Web

  3. Using ODP Metadata For Personalized Search
     • User profile: several ODP topics selected by the user
     • Personalized search
       • Send query Q to a search engine S (e.g., Google, ODP Search)
       • Res = URLs returned by S
       • For i = 1 to size(Res): Dist[i] = Distance(Res[i], Prof)
       • Re-sort Res based on Dist
     • Representation
       • Both the user profile and a URL (50% in the Google Directory) can be represented as a set of nodes in the directory tree
     • Distance(Profile, URL)
       • Minimum distance between the two sets of nodes
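The re-ranking loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rerank`, `set_distance`, `toy_distance`, and the `annotations` mapping (URL to ODP topics) are hypothetical names, and pushing unannotated results to the end is one possible choice that the slides do not specify.

```python
import math


def toy_distance(topic_a, topic_b):
    """Placeholder node distance (assumption): 0 for identical topics, 1 otherwise."""
    return 0 if topic_a == topic_b else 1


def set_distance(profile_topics, url_topics, node_distance):
    """Minimum distance between any profile node and any URL node."""
    return min(node_distance(p, u) for p in profile_topics for u in url_topics)


def rerank(results, profile_topics, annotations, node_distance=toy_distance):
    """Re-sort search results by their distance to the user profile.

    results        -- list of URLs in the order returned by the search engine
    profile_topics -- ODP topics chosen by the user
    annotations    -- hypothetical dict mapping a URL to its list of ODP topics
    """
    def key(url):
        topics = annotations.get(url)
        if not topics:
            # Assumption: results without ODP annotation sort last.
            return math.inf
        return set_distance(profile_topics, topics, node_distance)

    # sorted() is stable, so ties keep the search engine's original order.
    return sorted(results, key=key)
```

For example, `rerank(["u1", "u2"], ["Arts/Music"], {"u2": ["Arts/Music"]})` moves `u2` ahead of the unannotated `u1`.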

  4. Naïve Distances
     • Minimum tree distance
       • Intra-topic links
       • Subsumer
     • Graph shortest path
       • Inter-topic links
     • Complex distance
       • The greater the subsumer's depth, the more related the nodes are
     • Combining with Google PageRank
       • Some Google results are not annotated
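As a rough illustration of the minimum tree distance, the sketch below treats each ODP topic as a slash-separated path and takes the subsumer to be the deepest common ancestor of two topics. The path encoding, the implicit root above all top-level categories, and the function names are assumptions made for this example; the complex distance is not sketched because the slides give no formula for it.

```python
def subsumer(topic_a, topic_b):
    """Deepest common ancestor of two topic paths, as a list of path components."""
    common = []
    for part_a, part_b in zip(topic_a.split("/"), topic_b.split("/")):
        if part_a != part_b:
            break
        common.append(part_a)
    return common


def tree_distance(topic_a, topic_b):
    """Number of tree edges from topic_a up to the subsumer and down to topic_b.

    Assumption: top-level categories share an implicit root, so two unrelated
    topics are still connected through it.
    """
    depth = len(subsumer(topic_a, topic_b))
    return (len(topic_a.split("/")) - depth) + (len(topic_b.split("/")) - depth)
```

With this encoding, `tree_distance("Arts/Music/Jazz", "Arts/Music/Rock")` is 2, because each topic is one edge away from the subsumer `Arts/Music`.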

  5. Experimental Results

  6. Extending ODP Annotations To The Web
     • Manual annotation of the whole Web is impossible
     • Biasing is an implicit way of extending annotations to the Web
     • Is biasing possible in the ODP context?
       • Are ODP entries good biasing sets for obtaining relevant results, i.e., do they generate rankings that are different enough from the non-biased ranking?
     • When does biasing make a difference?
       • Find the characteristics the biasing set has to exhibit in order to obtain relevant results
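The slides do not define biasing. Assuming it refers to personalized (topic-biased) PageRank, where the random jump lands only on pages in the biasing set, a minimal power-iteration sketch could look like the following. The function name, the damping factor of 0.85, the fixed iteration count, and the handling of dangling pages are all assumptions for illustration.

```python
def biased_pagerank(links, bias_set, damping=0.85, iterations=50):
    """Power iteration for PageRank with the random jump restricted to bias_set.

    links    -- dict mapping a page to the list of pages it links to
    bias_set -- pages that receive the random-jump probability
    """
    nodes = sorted(set(links) | {v for targets in links.values() for v in targets})
    biased = [n for n in nodes if n in bias_set]
    if not biased:
        raise ValueError("bias_set must contain at least one page of the graph")

    # Jump vector: uniform over the biasing set, zero elsewhere.
    jump = {n: (1.0 / len(biased) if n in bias_set else 0.0) for n in nodes}
    rank = dict(jump)

    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) * jump[n] for n in nodes}
        for page in nodes:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a dangling page redistributes its score via the jump vector.
                for n in nodes:
                    new_rank[n] += damping * rank[page] * jump[n]
        rank = new_rank
    return rank
```

Passing every page as `bias_set` gives the ordinary, non-biased ranking, which is the baseline the biased rankings would be compared against.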

  7. Experimental Setup
     • Compare the similarity between the top 100 non-biased PageRank results and the biased results
     • Similarity measures
       • OSim: degree of overlap between the top n elements of two rank lists
       • KSim: degree of agreement on ordering between the two rank lists
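The two similarity measures are only described informally on this slide. One way to compute them is sketched below. It assumes that OSim is the size of the top-n overlap divided by n, and that KSim is the fraction of pairs from the union of both lists whose relative order agrees, with items missing from a list treated as tied at its end. Both readings, and the function names, are assumptions rather than definitions taken from the slides.

```python
from itertools import combinations


def osim(rank_a, rank_b, n):
    """Overlap of the top-n elements of two rank lists, as a fraction of n."""
    return len(set(rank_a[:n]) & set(rank_b[:n])) / n


def ksim(rank_a, rank_b):
    """Fraction of item pairs on whose relative order the two lists agree.

    Assumption: an item missing from a list is placed after all of that
    list's items, tied with the other missing items.
    """
    union = set(rank_a) | set(rank_b)
    if len(union) < 2:
        return 1.0

    def positions(rank):
        index = {item: i for i, item in enumerate(rank)}
        return {item: index.get(item, len(rank)) for item in union}

    def sign(x):
        return (x > 0) - (x < 0)

    pos_a, pos_b = positions(rank_a), positions(rank_b)
    pairs = list(combinations(union, 2))
    agreeing = sum(
        1 for u, v in pairs
        if sign(pos_a[u] - pos_a[v]) == sign(pos_b[u] - pos_b[v])
    )
    return agreeing / len(pairs)
```

Identical lists give a KSim of 1.0, and a list compared with its exact reverse gives 0.0.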

  8. Choice of Biasing Sets
     • Top [0-10]% PageRank pages
     • Top [0-2]% PageRank pages
     • Randomly selected pages
     • Low PageRank pages
     • Varied the sum of scores within the set between 0.000005% and 10% of the total sum over all pages (TOT)
     • Experiments were done on a crawl of 3 million pages, and then applied to the Stanford WebBase crawl

  9. Biasing set consists of good pages

  10. Biasing set consists of randomly selected pages

  11. According to the random model of biasing, every set with TOT below 0.015% is good for biasing.
      • Results are not influenced by the crawl size (3 million crawl vs. 120 million WebBase crawl)
      • Entries in ODP have TOT below 0.015%, thus biasing is possible in the ODP context
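To make the TOT criterion concrete, the sketch below computes the share of the total score held by a biasing set and compares it with the 0.015% threshold quoted above. The function names and the `scores` mapping (page to non-biased PageRank score) are hypothetical, and the scores in the usage note are invented purely for illustration.

```python
TOT_THRESHOLD = 0.015 / 100  # 0.015% expressed as a fraction


def tot_fraction(scores, biasing_set):
    """Sum of scores inside the biasing set divided by the sum over all pages."""
    total = sum(scores.values())
    return sum(scores[page] for page in biasing_set if page in scores) / total


def good_for_biasing(scores, biasing_set, threshold=TOT_THRESHOLD):
    """True if the set's TOT is below the threshold from the random model."""
    return tot_fraction(scores, biasing_set) < threshold
```

With 10,000 pages of equal score, a single page holds 0.01% of the total and passes the check, while a set of two pages holds 0.02% and does not.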

  12. Conclusions
      • A personalized search algorithm that ranks URLs based on the distance between the user profile and the URL in the ODP taxonomy
      • Biasing on ODP entries takes effect, thus it is feasible to extend the manual ODP classification to the Web
