Personalized Ontologies for Web Search and Caching

Susan Gauch

Information and Telecommunications Technology Center

Electrical Engineering and Computer Science

The University of Kansas


Outline

  • Motivation

  • User profiles

    • creation and maintenance

    • evaluation

  • Applications

    • re-ranking (and filtering) search results

    • Web caching

  • Conclusions


Motivation

  • Decrease access time for Web pages

    • Server approaches

      • use access logs to decrease access times for popular pages

      • not tailored to individuals

      • doesn’t decrease network traffic

    • Network approaches

      • cache popular pages multiple places in the network

      • not tailored to individuals


Personalization

  • Different information needs for different users

    • can we learn user’s interest?

      • Explicitly?

      • Implicitly?

    • can we use this information?

      • improved search

      • improved browsing

      • faster Web page access


Intelligent Web Caching

  • Improved (and faster) search results

    • pre-caching all search results is expensive

      • 50% of the pages returned by Internet search engines are irrelevant

    • improved knowledge of user’s likely behavior

      • intelligent pre-caching

      • use past behaviors to predict future behaviors

      • pre-cache “best” pages close to individuals


Context

  • ProFusion: www.profusion.com

  • OBIWAN: distributed content based IR

    • Web clustered into regions

    • clustering criteria: content, location, company

    • search: query brokered to “best” regions; within region brokered to most promising sites

    • browsing a region means browsing its sites simultaneously

    • www.ittc.ukases.edu/obiwan


User Profiles

  • Applications

    • Usenet news filtering

    • recommendation services: web browsing, books

    • intelligent pre-caching

  • Should

    • accurately reflect actual interests

    • require as little feedback as possible

    • be dynamic


User profiles: Creation

  • Obvious and often used: keywords

    • not structured (ambiguous)

    • static

    • have to be explicitly mentioned

  • Our approach

    • watch over a user's shoulder while surfing

    • automatically determine documents’ content

    • central: large ontology (concept hierarchy)


Document Classification

  • Documents as weighted keyword vectors:

    • n different words -> n dimensions

    • weights based on word frequency and rarity

  • Browsing hierarchy: 10 web pages per node

  • Concatenate them -> keyword vector

  • Content of a page: most similar vector
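
A minimal sketch of the classification step above, not the authors' implementation. It assumes whitespace tokenization, a smoothed TF-IDF weighting as one possible reading of "word frequency and rarity", and cosine similarity as the similarity measure. The function names and the toy training pages are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_concept_vectors(concept_pages):
    """concept_pages maps a concept name to its list of training page texts."""
    # Concatenate each concept's pages into one document and count its terms.
    counts = {c: Counter(tokenize(" ".join(pages)))
              for c, pages in concept_pages.items()}
    n = len(counts)
    # Document frequency: in how many concepts does each term occur?
    df = Counter()
    for term_counts in counts.values():
        df.update(term_counts.keys())
    # Smoothed inverse document frequency: rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {c: {t: tf * idf[t] for t, tf in term_counts.items()}
               for c, term_counts in counts.items()}
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(page_text, vectors, idf, top_k=5):
    """Return up to top_k (concept, similarity) pairs, most similar first."""
    page_counts = Counter(tokenize(page_text))
    # Terms never seen in training carry no weight.
    page_vec = {t: tf * idf[t] for t, tf in page_counts.items() if t in idf}
    scored = [(c, cosine(page_vec, v)) for c, v in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


vectors, idf = build_concept_vectors({
    "Sports": ["football goal match", "tennis match racket"],
    "Cooking": ["recipe oven bake", "flour sugar bake recipe"],
})
top = classify("a football match with a late goal", vectors, idf)
```

Here `top[0]` is the concept whose concatenated training pages are most similar to the surfed page; the remaining entries give the next-best concepts with their similarity weights.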


Updating profiles

  • Static: document related

    • content: weights of top nodes for surfed document

    • length of page

  • Dynamic: time spent

  • Combine them

    • for instance: weight * (time/length)

    • changes in interest in the five categories

  • User profile: weighted ontology
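
As a rough illustration of the combined adjustment weight * (time/length), the sketch below accumulates one surfed page's evidence into a profile. The dictionary representation, the units (seconds and characters), and the names `update_profile` and `top_categories` are assumptions made for the example, not details given in the slides.

```python
from collections import defaultdict


def update_profile(profile, page_categories, seconds_spent, page_length):
    """Add one surfed page to the profile (category -> accumulated weight).

    page_categories: (category, weight) pairs from the classifier,
    for example the page's top five categories.
    """
    if page_length <= 0:
        return profile  # nothing to normalize by; skip the page
    attention = seconds_spent / page_length  # time spent relative to length
    for category, weight in page_categories:
        profile[category] += weight * attention
    return profile


def top_categories(profile, k=5):
    """Return the k categories with the largest accumulated weight."""
    return sorted(profile.items(), key=lambda kv: kv[1], reverse=True)[:k]


profile = defaultdict(float)
update_profile(profile, [("Sports/Soccer", 0.8), ("News", 0.2)],
               seconds_spent=60, page_length=3000)
update_profile(profile, [("News", 0.6)], seconds_spent=30, page_length=1500)
```

After these two pages the profile holds a weight for each visited category, and `top_categories(profile)` lists them in order of interest.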


Profile evaluation

  • Accordance with actual user interests

    • 10/20 interest categories describe actual interests

    • describe interests “pretty well”: 3.5/5

  • Convergence

    • stabilization of # of categories over time?

    • do converge after 320 surfed pages!
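
The slides do not say how stabilization was measured. One possible check, sketched under that caveat, is to record the number of non-zero profile categories after each surfed page and report the first page at which that count has stayed within a relative tolerance over a sliding window. The function name, `window`, and `tolerance` are hypothetical.

```python
def pages_until_stable(category_counts, window=50, tolerance=0.05):
    """Return the 1-based page number at which the profile looks stable.

    category_counts[i] is the number of non-zero categories after page i + 1.
    The profile counts as stable once the last `window` values differ by at
    most `tolerance` relative to their maximum. Returns None if that never
    happens.
    """
    for end in range(window, len(category_counts) + 1):
        recent = category_counts[end - window:end]
        low, high = min(recent), max(recent)
        if high > 0 and (high - low) / high <= tolerance:
            return end
    return None


# Toy trace: the category count grows for five pages, then stops changing.
stable_at = pages_until_stable([1, 2, 3, 4, 5, 5, 5, 5], window=3, tolerance=0.0)
```

A larger window makes the check stricter, at the cost of reporting convergence later than it actually occurred.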


Profiles: Summary

  • Stored as weighted ontologies

  • Profiles represent actual interests quite well

  • Up to 150 top categories

  • Two adjustment functions make profiles converge

    • after 320 pages

    • length of page doesn't really matter, but time spent does


Personalizing Search Results

  • 50% of top 20 results irrelevant

  • Same search mechanism for 200 million people?

  • Goal:

    • identify relevant documents and put them on top of the result list

    • (pre-fetch relevant results)

  • Difficult problem: 10% increase is very good


Re-Ranking

  • Ranking a function of:

    • search engine's original ranking

    • extent to which top 5 categories describe document's content

    • personal interest in each of these top categories

  • “More relevant items on top of result list”:

    • system’s ability to present all relevant items (recall)

    • system’s ability to present only relevant items (precision)
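
The slides name the three inputs to the ranking function but not how they are combined. The sketch below assumes a simple linear mix: a score derived from the engine's original position, plus the sum over a document's top five categories of (category match * personal interest). The mixing parameter `alpha` and the function name `rerank` are hypothetical.

```python
def rerank(results, profile, alpha=0.5):
    """Re-order search results using a user profile.

    results: list of (doc_id, categories) in the engine's original order,
             where categories is a list of (category, match_weight) pairs.
    profile: dict mapping category -> personal interest weight.
    alpha:   share of the final score taken from the original ranking.
    """
    n = len(results)
    scored = []
    for position, (doc_id, categories) in enumerate(results):
        # 1.0 for the engine's top hit, decreasing to 1/n for the last one.
        rank_score = (n - position) / n
        # How well the top five categories fit, weighted by personal interest.
        personal = sum(match * profile.get(category, 0.0)
                       for category, match in categories[:5])
        scored.append((alpha * rank_score + (1 - alpha) * personal, doc_id))
    # Python's sort is stable, so ties keep the engine's original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


profile = {"Cooking": 1.0}
results = [
    ("a", [("Sports", 0.9)]),
    ("b", [("Cooking", 0.9)]),
    ("c", [("Sports", 0.5)]),
]
reranked = rerank(results, profile)
```

With this profile the cooking page "b" moves ahead of "a". With an empty profile the personal term is zero everywhere and the engine's order is returned unchanged.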


Recall and Precision

  • Combination: Recall/Precision graphs

  • Example: ranked documents 1,…,20

    • relevant 2,5,10,14,19

    • recall points 1/5, 2/5, 3/5, 4/5, 5/5

    • precisions 1/2, 2/5, 3/10, 4/14, 5/19
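
The worked example above can be reproduced in a few lines. This sketch assumes every relevant document appears somewhere in the ranked list, as it does in the example; the function name is hypothetical. Exact fractions are used so the output matches the slide's notation.

```python
from fractions import Fraction


def recall_precision_points(relevant_ranks):
    """Return (recall, precision) at the rank of each relevant document.

    relevant_ranks: 1-based positions of the relevant documents in the
    ranked result list.
    """
    ranks = sorted(relevant_ranks)
    total = len(ranks)
    points = []
    for found, rank in enumerate(ranks, start=1):
        recall = Fraction(found, total)    # share of relevant items seen so far
        precision = Fraction(found, rank)  # share of seen items that are relevant
        points.append((recall, precision))
    return points


points = recall_precision_points([2, 5, 10, 14, 19])
```

Note that `Fraction` reduces its values, so 4/14 is stored as 2/7 and 5/5 as 1.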


Re-Ranking: Evaluation

  • Overall performance increase of up to 8%

    • at each recall cutoff, up to 10% more relevant documents have been retrieved


Browsing Assistance

  • Analyze current page

    • locate links

  • Identify which links are most likely to be followed by the user

    • popularity of the link overall

    • relevance of linked page to user’s interests

  • Problem

    • if you have to download the whole page to analyze it, you’ve increased the network utilization


Privacy

  • Is the user aware that their behavior is being monitored?

  • Can users turn it off?

  • Where are profiles stored?

  • With whom are profiles shared?

  • How are profiles protected?

  • How are profiles used?


Conclusions

  • Automatic creation of structured user profiles is possible

  • Profiles are reasonably accurate

  • Applications in improving the search quality and Web page access efficiency

  • Evaluation of re-ranking search results: performance increase of up to 8%


Future Work

  • Incorporate profile generator into browser

  • Connect system to ProFusion, OBIWAN

  • Personalize structure of ontology

  • Re-train classifier

  • More applications: recommendation service, web caching, browsing, ...

  • Explicit user feedback?

