Dynamic Reference Sifting

A Case Study in the Homepage Domain

Jonathan Shakes, Marc Langheinrich, and Oren Etzioni

University of Washington

Department of Computer Science and Engineering


Outline

  • Introduction

    • Softbots and Dynamic Reference Sifters

    • Searching the Web

  • Case Study: Personal Homepages

    • Ahoy! The Homepage Finder

    • Experimental Results

  • Future and Related Work

    • Other Domains for DRS


Softbots and Dynamic Reference Sifters

  • Dynamic Reference Sifters

    • Part of “Internet Softbots Project” [Etzioni and Weld, 1994]

  • Softbots

    • The person states what is wanted

    • The softbot determines how and where to get it


Information Retrieval Definitions

  • Precision

    • Measure of Search Service Accuracy

  • Recall

    • Measure of Search Service Comprehensiveness


Precision

  • Precision:

    $$\text{Precision} = \frac{\text{Relevant Search Results}}{\text{All Search Results}}$$

[Figure: diagram of the Search Space showing Irrelevant Documents, Relevant Documents, and All Search Results]


Recall

  • Recall:

    $$\text{Recall} = \frac{\text{Relevant Search Results}}{\text{All Relevant Documents}}$$

[Figure: diagram of the Search Space showing Irrelevant Documents, Relevant Documents, and All Search Results]
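
The two ratios defined on the Precision and Recall slides can be computed directly from sets of document identifiers. The following is a minimal sketch, assuming documents are represented as strings; the function names and the toy data are illustrative and do not come from the presentation.

```python
def precision(results: set, relevant: set) -> float:
    """Fraction of the returned results that are relevant."""
    if not results:
        return 0.0  # convention chosen here for an empty result list
    return len(results & relevant) / len(results)


def recall(results: set, relevant: set) -> float:
    """Fraction of all relevant documents that were returned."""
    if not relevant:
        return 0.0  # convention chosen here when nothing is relevant
    return len(results & relevant) / len(relevant)


# Hypothetical toy data: four relevant documents, three returned results.
relevant = {"doc1", "doc2", "doc3", "doc4"}
results = {"doc1", "doc2", "doc9"}

print(precision(results, relevant))  # 2 of the 3 results are relevant
print(recall(results, relevant))     # 2 of the 4 relevant documents were found
```

In this toy case the search is fairly accurate (two of three results are relevant) but only half comprehensive, which is the trade-off the slides go on to discuss.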


Searching the Web

  • Web Indices (AltaVista, Hotbot)

    • Automated: high recall

    • Keyword-based: low precision

  • Web Directories (Yahoo, A2Z)

    • Classified manually: high precision, low recall

  • Manual Search

    • Slow


Searching the Web

  • Dynamic Reference Sifter: an information retrieval tool that uses:

    • multiple, complementary data sources for high recall,

    • domain-specific filtering techniques for high precision, and

    • machine learning to improve performance over time.
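
The first two ingredients in the list above can be sketched as a small pipeline: merge candidate references from several sources (aiming at recall), then keep only those that pass every domain-specific filter (aiming at precision). This is an illustrative sketch, not the authors' implementation; the source functions, the filters, and the example.* URLs are all hypothetical stand-ins, and the machine learning component is not shown.

```python
from typing import Callable, Iterable, List

Source = Callable[[str], Iterable[str]]   # query -> candidate references
Filter = Callable[[str, str], bool]       # (query, reference) -> keep?


def sift(query: str, sources: List[Source], filters: List[Filter]) -> List[str]:
    """Union the candidates from all sources, then apply every filter."""
    seen = set()
    candidates = []
    for source in sources:
        for ref in source(query):
            if ref not in seen:          # drop duplicates across sources
                seen.add(ref)
                candidates.append(ref)
    return [ref for ref in candidates if all(f(query, ref) for f in filters)]


# Hypothetical stub sources returning canned references.
def index_source(query: str) -> List[str]:
    return ["http://example.edu/~smith/", "http://example.com/smith-hardware/"]


def directory_source(query: str) -> List[str]:
    return ["http://example.edu/~smith/", "http://example.org/people/smith.html"]


# Hypothetical domain-specific filters for the homepage domain.
def mentions_name(query: str, ref: str) -> bool:
    return query.split()[-1].lower() in ref.lower()


def looks_personal(query: str, ref: str) -> bool:
    return "~" in ref or "/people/" in ref


print(sift("Smith", [index_source, directory_source],
           [mentions_name, looks_personal]))
```

Querying several sources widens the candidate pool, and the filters then discard candidates that merely match the keyword, such as the hardware store page in this toy data.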


Case Study: The Personal Homepage Domain

  • “Conventional” Search Services

    • Indices find too much

    • Directories find too little

    • Manual Search takes too long

    • Failures are expensive

  • Ahoy! The Homepage Finder attempts to provide

    • High Recall

    • High Precision

    • Speed


Ahoy! Architecture

  • User Input

  • Web Page Reference Source

  • Institutional Information Source

  • E-mail Address Sources

  • Filters

  • Output


Performance Analysis

  • Test using lists of known homepages

    • Researchers sample: 582 homepages

    • Transportation sample: 53 homepages

  • Compare against

    • MetaCrawler, Hotbot, AltaVista, Yahoo!

  • Maximize competitors’ performance by

    • using “expert” options

    • allowing up to 200 references
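
The test procedure above can be sketched as a small evaluation loop: for each person with a known homepage, run a search, keep at most 200 references, and check whether the known homepage is among them (optionally only among the top 10). This is a sketch under stated assumptions; the `fraction_found` function, the canned search results, and the example.* URLs are hypothetical and are not data from the study.

```python
from typing import Callable, Dict, List, Optional


def fraction_found(known: Dict[str, str],
                   search: Callable[[str], List[str]],
                   max_refs: int = 200,
                   top_k: Optional[int] = None) -> float:
    """Share of known homepages that appear in the (truncated) result list."""
    if not known:
        return 0.0
    limit = max_refs if top_k is None else min(top_k, max_refs)
    hits = sum(1 for query, target in known.items()
               if target in search(query)[:limit])
    return hits / len(known)


# Hypothetical sample: three people with known homepages.
known = {
    "Ann Example": "http://example.edu/~ann/",
    "Bob Example": "http://example.edu/~bob/",
    "Carol Example": "http://example.edu/~carol/",
}

# Hypothetical canned results standing in for a real search service.
canned = {
    # Ann's page is ranked 12th: found overall, but not within the top 10.
    "Ann Example": [f"http://example.com/page{i}" for i in range(11)]
                   + ["http://example.edu/~ann/"],
    "Bob Example": ["http://example.edu/~bob/"],
    "Carol Example": ["http://example.com/unrelated"],
}


def search(query: str) -> List[str]:
    return canned.get(query, [])


print(fraction_found(known, search))            # all references (up to 200)
print(fraction_found(known, search, top_k=10))  # top 10 references only
```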


Performance Analysis

  • [Chart: “Precision” - Researcher Sample]

  • [Chart: Top 10 References - Researcher Sample]

  • [Chart: Recall (all References) - Researcher Sample]

  • [Chart: Recall (all References) - Transportation Sample]


Learning in Ahoy!

  • Learns URL ‘patterns’

    • http://sdcc3.ucsd.edu/home-pages/<Login>/

    • 50,000+ patterns in 3 months

  • Indexes patterns by institution

    • 11,000+ institutions indexed in 3 months

  • Performance Impact

    • Up to 8% gain in recall
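
One way to picture a URL pattern such as the one above is as a homepage URL with the login segment replaced by a placeholder, stored under the institution it was seen at. The sketch below assumes that reading; the slides do not describe how Ahoy! actually learns or applies patterns, so the `generalize` function, the `PatternIndex` class, the institution key, and the logins are all hypothetical.

```python
from collections import defaultdict
from typing import List, Optional

PLACEHOLDER = "<Login>"


def generalize(url: str, login: str) -> Optional[str]:
    """Replace a path segment equal to the login with the placeholder.

    Returns None when the login is not a whole path segment of the URL.
    """
    parts = url.split("/")  # ['http:', '', host, path segments...]
    if login not in parts[3:]:
        return None
    return "/".join(PLACEHOLDER if i >= 3 and part == login else part
                    for i, part in enumerate(parts))


class PatternIndex:
    """Stores learned URL patterns keyed by institution."""

    def __init__(self) -> None:
        self._by_institution = defaultdict(set)

    def learn(self, institution: str, url: str, login: str) -> None:
        pattern = generalize(url, login)
        if pattern is not None:
            self._by_institution[institution].add(pattern)

    def guess(self, institution: str, login: str) -> List[str]:
        """Candidate homepage URLs for a login at a known institution."""
        patterns = self._by_institution.get(institution, ())
        return sorted(p.replace(PLACEHOLDER, login) for p in patterns)


index = PatternIndex()
# Hypothetical observation: user 'jdoe' has a homepage at this URL.
index.learn("ucsd.edu", "http://sdcc3.ucsd.edu/home-pages/jdoe/", "jdoe")
print(index.guess("ucsd.edu", "asmith"))
```

Under this reading, a pattern learned from one person's page yields candidate URLs for other people at the same institution, which is one plausible route to the recall gain the slide reports.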


Domain Characteristics

  • Many elements

  • Easily identifiable target

  • Some targets found in web indices

  • User can form specific query


Domain Examples

  • Personal Homepages

  • Articles or Papers

  • Product Reviews

  • Price Lists

  • Transportation Schedules

  • Recipes

  • Jokes

    • and more


(un)Related Work

  • Automated Index Generation

    • WebCrawler, Lycos, AltaVista, ...

  • Automated Directory Generation

    • IAF, OKRA, WhoWhere?

  • Dynamic Internet Search

    • Netfind

  • Learning User Preferences on the Web

    • WebWatcher, Syskill & Webert, Firefly

  • Learning about the Web

    • ShopBot, auto-generated wrappers


Summary and Conclusions

  • Dynamic Reference Sifting

    • domain-specific, high precision, high recall, fast

  • Ahoy! The Homepage Finder

    • 2000 searches per day

    • 1-2 references returned per search

    • 50-75% of targets found

      • 25% not found, often correctly so

    • 10-15 seconds per search

  • Future domains

    • Academic Papers, Jokes


Ahoy! the Homepage Finder

http://www.cs.washington.edu/research/ahoy/
