
Dynamic Reference Sifting

A Case Study in the Homepage Domain

Jonathan Shakes, Marc Langheinrich, and Oren Etzioni

University of Washington

Department of Computer Science and Engineering

Outline
  • Introduction
    • Softbots and Dynamic Reference Sifters
    • Searching the Web
  • Case Study: Personal Homepages
    • Ahoy! The Homepage Finder
    • Experimental Results
  • Future and Related Work
    • Other Domains for DRS

Softbots and Dynamic Reference Sifters
  • Dynamic Reference Sifters
    • Part of “Internet Softbots Project” [Etzioni and Weld, 1994]
  • Softbots
    • the person states what is wanted
    • the softbot determines how and where to get it

Information Retrieval Definitions
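  • Precision
    • Measure of search service accuracy: the fraction of returned results that are relevant
  • Recall
    • Measure of search service comprehensiveness: the fraction of all relevant documents that are returned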


Precision
  • Precision = $\dfrac{|\text{Relevant Search Results}|}{|\text{All Search Results}|}$

[Figure: the search space, divided into relevant and irrelevant documents, together with the set of all search results]
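Here is a worked example with invented numbers. These are hypothetical and are not data from the study. Suppose a service returns 20 results and 5 of them are relevant. Its precision is:

```latex
\text{Precision} = \frac{5}{20} = 0.25
```

Each additional irrelevant reference enlarges the denominator and lowers this ratio, even if the relevant results are still present.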

Recall
  • Recall = $\dfrac{|\text{Relevant Search Results}|}{|\text{All Relevant Documents}|}$

[Figure: the search space, divided into relevant and irrelevant documents, together with the set of all search results]
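Here is a worked example with invented numbers. These are hypothetical and are not data from the study. Suppose the search space contains 10 relevant documents and a service returns 5 of them. Its recall is:

```latex
\text{Recall} = \frac{5}{10} = 0.5
```

The denominator ignores irrelevant results. A service can therefore raise its recall simply by returning more references, at the cost of precision.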

Searching the Web
  • Web Indices (AltaVista, Hotbot)
    • Automated: high recall
    • Keyword-based: low precision
  • Web Directories (Yahoo, A2Z)
    • Classified manually: high precision, low recall
  • Manual Search
    • slow
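The index/directory trade-off can be illustrated with a toy computation. This is a hypothetical sketch. The document names (d0, d1, ...) and both result sets are invented. One set mimics an index that returns many keyword matches, and the other a directory that returns a few hand-classified pages.

```python
def precision(results: set[str], relevant: set[str]) -> float:
    """Fraction of returned results that are relevant (0.0 when nothing is returned)."""
    return len(results & relevant) / len(results) if results else 0.0


def recall(results: set[str], relevant: set[str]) -> float:
    """Fraction of all relevant documents that were returned (0.0 when none exist)."""
    return len(results & relevant) / len(relevant) if relevant else 0.0


# Invented toy data: documents d0..d4 are the relevant ones.
relevant = {f"d{i}" for i in range(5)}
index_results = {f"d{i}" for i in range(50)} - {"d4"}  # many keyword matches, misses d4
directory_results = {"d0", "d1"}                         # few, hand-classified, all relevant

for name, res in [("index", index_results), ("directory", directory_results)]:
    print(f"{name}: precision={precision(res, relevant):.2f} recall={recall(res, relevant):.2f}")
# index: precision=0.08 recall=0.80
# directory: precision=1.00 recall=0.40
```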

Searching the Web
  • Dynamic Reference Sifter: an information retrieval tool that uses
    • multiple, complementary data sources for high recall,
    • domain-specific filtering techniques for high precision, and
    • machine learning to improve performance over time.
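The first two ingredients can be sketched minimally, assuming each data source is modelled as a function from a query to candidate URLs. Merging several sources widens coverage (recall), and a domain-specific filter then discards off-target candidates (precision). The names `sift`, `index_source`, `directory_source` and `looks_like_homepage` are hypothetical stand-ins, not parts of Ahoy!. The learning component is omitted.

```python
from typing import Callable, Iterable

# Hypothetical interfaces: a source maps a query to candidate URLs,
# and a filter decides whether a candidate fits the domain.
Source = Callable[[str], Iterable[str]]
Filter = Callable[[str, str], bool]


def sift(query: str, sources: list[Source], keep: Filter) -> list[str]:
    """Union the candidates of all sources (for recall), then apply the domain filter (for precision)."""
    candidates: dict[str, None] = {}  # dict keys act as an insertion-ordered set
    for source in sources:
        for url in source(query):
            candidates.setdefault(url, None)
    return [url for url in candidates if keep(query, url)]


# Toy sources and filter, invented for illustration only.
def index_source(query: str) -> list[str]:
    return ["http://a.edu/~smith/", "http://news.example.com/smith-story", "http://b.example.org/"]


def directory_source(query: str) -> list[str]:
    return ["http://a.edu/~smith/", "http://c.edu/people/smith/"]


def looks_like_homepage(query: str, url: str) -> bool:
    surname = query.split()[-1].lower()
    return surname in url.lower() and ".edu/" in url


print(sift("John Smith", [index_source, directory_source], looks_like_homepage))
# ['http://a.edu/~smith/', 'http://c.edu/people/smith/']
```

The duplicate `http://a.edu/~smith/` is reported once. The news story passes the surname test but fails the assumed `.edu` test.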

Case Study: The Personal Homepage Domain
  • “Conventional” Search Services
    • Indices find too much
    • Directories find too little
    • Manual Search takes too long
    • Failures are expensive
  • Ahoy! The Homepage Finder attempts to provide
    • High Recall
    • High Precision
    • Speed

Ahoy! Architecture

[Figure: Ahoy! architecture diagram with the components User Input, Web Page Reference Source, Institutional Information Source, E-mail Address Sources, Filters, and Output]
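In the architecture, references pass through Filters before reaching the Output. The specific tests Ahoy! applies are not described here, so the sketch below is hypothetical. It scores a reference on three things:

  • whether the person's name appears in the page title
  • whether the URL's host lies in the person's institutional domain
  • whether the path looks like a personal page

It then keeps references above an assumed threshold. The weights, the threshold and the example URLs are all invented.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Reference:
    url: str
    title: str


def score(ref: Reference, first: str, last: str, inst_domain: Optional[str]) -> int:
    """Hypothetical heuristic: a higher score means 'more likely the person's homepage'."""
    points = 0
    title = ref.title.lower()
    if last.lower() in title:  # naive substring test on the surname
        points += 2
        if first.lower() in title:
            points += 1
    parsed = urlparse(ref.url)
    host = parsed.hostname or ""
    if inst_domain and (host == inst_domain or host.endswith("." + inst_domain)):
        points += 2
    path = parsed.path.lower()
    if "~" in path or "home" in path or "people" in path:
        points += 1
    return points


def filter_references(refs: list[Reference], first: str, last: str,
                      inst_domain: Optional[str] = None, min_score: int = 3) -> list[str]:
    """Drop low-scoring references and return the rest best-first."""
    scored = [(score(r, first, last, inst_domain), r) for r in refs]
    kept = [(s, r) for s, r in scored if s >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [r.url for _, r in kept]


refs = [
    Reference("http://www.cs.example.edu/homes/jdoe/", "Jane Doe's Home Page"),
    Reference("http://news.example.com/doe-award", "Doe wins award"),
    Reference("http://www.example.edu/admissions/", "Admissions"),
]
print(filter_references(refs, "Jane", "Doe", "example.edu"))
# ['http://www.cs.example.edu/homes/jdoe/']
```

Neither a name match alone nor an institution match alone clears the threshold in this toy setup. The substring tests are deliberately naive; a short surname would also match unrelated words.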

Performance Analysis
  • Test using lists of known homepages
    • Researchers sample: 582 homepages
    • Transportation sample: 53 homepages
  • Compare against
    • MetaCrawler, Hotbot, AltaVista, Yahoo!
  • Maximize competitors’ performance by
    • using “expert” options
    • allowing up to 200 references
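The test method checks whether each known homepage appears among a service's references. It can be sketched as a "found within the first k references" rate. The function name `found_rate`, the people and the URLs are hypothetical. Setting k = 10 corresponds to a top-10 comparison, and k = 200 to the cap of 200 references.

```python
def found_rate(results: dict[str, list[str]], targets: dict[str, str], k: int) -> float:
    """Fraction of people whose known homepage URL is among the first k references returned."""
    if not targets:
        return 0.0
    hits = sum(1 for person, url in targets.items() if url in results.get(person, [])[:k])
    return hits / len(targets)


# Invented example: four people with known homepages and one service's ranked references.
targets = {"a": "u1", "b": "u2", "c": "u3", "d": "u4"}
results = {
    "a": ["u1", "x"],
    "b": ["x"] * 10 + ["u2"],  # target is the 11th reference
    "c": ["y", "u3"],
    "d": [],                  # service returned nothing
}
print(found_rate(results, targets, k=10))   # 0.5
print(found_rate(results, targets, k=200))  # 0.75
```

This assumes exact string matches. A real comparison would need to normalize URLs (trailing slashes, host case, server aliases) before matching.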

Performance Analysis
  • “Precision” - Researcher Sample

Performance Analysis
  • Top 10 References - Researcher Sample

Performance Analysis
  • Recall (all References) - Researcher Sample

Performance Analysis
  • Recall (all References) - Transportation Sample

Learning in Ahoy!
  • Learns URL ‘patterns’
    • http://sdcc3.ucsd.edu/home-pages/<Login>/
    • 50,000+ patterns in 3 months
  • Indexes patterns by institution
    • 11,000+ institutions indexed in 3 months
  • Performance Impact
    • Up to 8% gain in recall
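Pattern learning of this kind can be sketched in three steps. First, replace the login name in a confirmed homepage URL with a `<Login>` placeholder. Next, store the pattern under the person's institution. Later, instantiate it for other people at the same institution. This is an illustrative reconstruction, not the authors' code. The following are assumptions:

  • the helper names
  • replacing only the last occurrence of the login
  • keying institutions by domain name
  • the logins `jdoe` and `rroe`

Only the URL pattern itself comes from the slide.

```python
from collections import defaultdict
from typing import Optional

PLACEHOLDER = "<Login>"


def learn_pattern(homepage_url: str, login: str) -> Optional[str]:
    """Turn a confirmed homepage URL into a pattern; None if the login does not appear in it."""
    if not login:
        return None
    head, found, tail = homepage_url.rpartition(login)  # last occurrence, usually in the path
    if not found:
        return None
    return head + PLACEHOLDER + tail


def candidate_urls(patterns: dict[str, set[str]], institution: str, login: str) -> list[str]:
    """Instantiate every pattern learned for an institution with a new person's login."""
    return sorted(p.replace(PLACEHOLDER, login) for p in patterns.get(institution, set()))


# Hypothetical index: institution -> set of learned URL patterns.
patterns: dict[str, set[str]] = defaultdict(set)

pattern = learn_pattern("http://sdcc3.ucsd.edu/home-pages/jdoe/", "jdoe")
if pattern is not None:
    patterns["ucsd.edu"].add(pattern)

print(pattern)                                       # http://sdcc3.ucsd.edu/home-pages/<Login>/
print(candidate_urls(patterns, "ucsd.edu", "rroe"))  # ['http://sdcc3.ucsd.edu/home-pages/rroe/']
```

A generated candidate is only a guess. It would still have to be fetched and passed through the filters before being reported.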

Domain Characteristics
  • Many elements
  • Easily identifiable target
  • Some targets found in web indices
  • User can form specific query

Domain Examples
  • Personal Homepages
  • Articles or Papers
  • Product Reviews
  • Price Lists
  • Transportation Schedules
  • Recipes
  • Jokes
    • and more

(un)Related Work
  • Automated Index Generation
    • WebCrawler, Lycos, AltaVista, ...
  • Automated Directory Generation
    • IAF, OKRA, WhoWhere?
  • Dynamic Internet Search
    • Netfind
  • Learning User Preferences on the Web
    • WebWatcher, Syskill & Webert, Firefly
  • Learning about the Web
    • ShopBot, auto-generated wrappers

Summary and Conclusions
  • Dynamic Reference Sifting
    • domain-specific, high precision, high recall, fast
  • Ahoy! the Homepage Finder
    • 2000 searches per day
    • 1-2 references returned per search
    • 50-75% targets found
      • 25% not found, often correctly so (the person has no homepage)
    • 10-15 seconds per search
  • Future domains
    • Academic Papers, Jokes

Ahoy! the Homepage Finder

http://www.cs.washington.edu/research/ahoy/
