Organizing search results
Organizing Search Results. Susan Dumais, Microsoft Research. Algorithms and interfaces that improve the effectiveness of search, going beyond ranked lists. The main goal is to support search, and also information analysis and discovery.


Organizing Search Results

Susan Dumais

Microsoft Research


Organizing Search Results

  • Algorithms and interfaces that improve the effectiveness of search

    • Beyond ranked lists

    • Main goal: to support search

    • Also information analysis and discovery

  • Example applications

    • SWISH, results classification

    • GridViz, results summarization

    • SIS, personal landmarks for context


Searching with Information Structured Hierarchically (SWISH)

  • Collaborators

    • Edward Cutrell, Hao Chen (Berkeley)

  • Key Themes

    • Going beyond long lists of results

    • Classification algorithms

    • UI techniques

  • More about it

    • http://research.microsoft.com/~sdumais


Organizing Search Results

  • Query: “jaguar”

  • Figure: List Organization vs. SWISH Category Organization

  • Category labels shown in the figure: => Shopping, => Automotive, => Computers, => Automotive


Web Directory

  • LookSmart Directory Structure

    • ~400k pages; 17k categories; 7 levels

    • 13 top-level categories; 150 second-level categories

  • Top-level Categories

    • Automotive

    • Business & Finance

    • Computers & Internet

    • Entertainment & Media

    • Health & Fitness

    • Hobbies & Interests

    • Home & Family

    • People & Chat

    • Reference & Education

    • Shopping & Services

    • Society & Politics

    • Sports & Recreation

    • Travel & Vacations

  • Second-level categories pictured on the slide (apparently those under Automotive)

    • Buy or Sell a Car

    • Chat

    • Finance & Insurance

    • Magazines & Books

    • Maintenance & Repair

    • Makes, Models & Clubs

    • Motorcycles

    • New Car Showrooms

    • Off-Road, 4X4 & RVs

    • Other Auto Interests

    • Shows & Museums

    • Trucks & Tractors

    • Vintage & Classic


SWISH System

  • Combines the advantages of

    • Directories - manually crafted structure, but small (~3 million pages)

    • Search engines - broad coverage, but limited metadata (~3 billion pages)

    • Approach: project search engine results onto the category structure

  • Two main components

    • Text classification models

    • UI for integrating search results and structure

      • Context (category structure) plus focus (search results)
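
The idea of projecting a ranked result list onto a category structure can be sketched as follows. This is only an illustration: the keyword sets, the `classify` helper, and the sample results are hypothetical stand-ins, not the SWISH models or data. Each result keeps its original rank (the focus) while being shown under a category (the context).

```python
from collections import OrderedDict

# Hypothetical keyword sets standing in for trained category models.
CATEGORY_KEYWORDS = {
    "Automotive": {"car", "dealer", "xj"},
    "Computers & Internet": {"software", "os", "download"},
    "Shopping & Services": {"buy", "price", "store"},
}

def classify(snippet):
    """Return the category whose keywords overlap the snippet most, else 'Other'."""
    words = set(snippet.lower().split())
    best = max(CATEGORY_KEYWORDS, key=lambda c: len(words & CATEGORY_KEYWORDS[c]))
    return best if words & CATEGORY_KEYWORDS[best] else "Other"

def group_by_category(ranked_results):
    """Group a ranked list by category, keeping rank order inside each group."""
    groups = OrderedDict()
    for rank, snippet in enumerate(ranked_results, start=1):
        groups.setdefault(classify(snippet), []).append((rank, snippet))
    return groups

# Hypothetical results for the query "jaguar".
results = [
    "Jaguar car dealer locator",
    "Jaguar OS software download",
    "Buy Jaguar merchandise at our store",
    "Jaguar XJ car review",
    "Jaguar the big cat",
]
grouped = group_by_category(results)

for category, hits in grouped.items():
    print(category)
    for rank, snippet in hits:
        print(f"  {rank}. {snippet}")
```

Categories appear in the order of their best-ranked result, so the grouping does not discard the ranking of the underlying search engine.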


SWISH Architecture

  • Train (offline)

    • manually classified web pages -> SVM model

  • Classify (online)

    • web search results, local search results, ... classified with the SVM model
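
The train (offline) / classify (online) split in the architecture can be illustrated with a small sketch. It fits a linear model with hinge loss by plain subgradient descent, which is a simple stand-in for a real SVM solver, not the solver used in SWISH. The training texts, the hyperparameters, and the function names are assumptions made for this example.

```python
from collections import defaultdict

def featurize(text):
    """Bag-of-words features: word -> count."""
    feats = defaultdict(float)
    for word in text.lower().split():
        feats[word] += 1.0
    return feats

def train_linear_model(examples, epochs=50, lr=0.1, reg=0.01):
    """Offline step: hinge-loss subgradient descent over (text, label) pairs.

    Labels are +1 (in the category) or -1 (not in the category).
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            margin = label * sum(weights[w] * v for w, v in feats.items())
            # L2 regularization: shrink every weight a little.
            for w in list(weights):
                weights[w] *= 1.0 - lr * reg
            # Hinge loss: update only when the margin is violated.
            if margin < 1.0:
                for w, v in feats.items():
                    weights[w] += lr * label * v
    return dict(weights)

def classify(weights, text):
    """Online step: score a new snippet with the stored weight vector."""
    score = sum(weights.get(w, 0.0) * v for w, v in featurize(text).items())
    return "Automotive" if score > 0 else "Not Automotive"

# Hypothetical manually classified pages (offline training data).
training = [
    ("car engine repair", +1),
    ("used car dealer", +1),
    ("software download windows", -1),
    ("pc hosting provider", -1),
]
model = train_linear_model(training)

print(classify(model, "car repair shop"))
print(classify(model, "windows software update"))
```

Only the weight vector is needed at query time, which is why classification can run on-the-fly over search results while training stays offline.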


Learning & Classification

  • Support Vector Machine (SVM)

    • Accurate and efficient for text classification (Dumais et al., Joachims)

    • Model = weighted vector of words

      • “Automobile” = motorcycle, vehicle, parts, automobile, harley, car, auto, honda, porsche …

      • “Computers & Internet” = rfc, software, provider, windows, user, users, pc, hosting, os, downloads ...

  • Hierarchical models for LS directory

    • 1 model for top level; N models for second

    • Very useful in conjunction w/ user interaction
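
A minimal sketch of hierarchical classification with weighted word vectors follows: one set of top-level models, then a separate set of second-level models for whichever top-level category wins. All weights here are made up for illustration, and the second-level names under "Computers & Internet" are hypothetical (the slides list second-level names only for the automotive category).

```python
# Hypothetical top-level models: category -> {word: weight}.
TOP_LEVEL = {
    "Automotive": {"car": 1.0, "auto": 0.9, "motorcycle": 0.8, "harley": 0.7},
    "Computers & Internet": {"software": 1.0, "windows": 0.9, "pc": 0.8, "hosting": 0.7},
}

# One group of second-level models per top-level category.
SECOND_LEVEL = {
    "Automotive": {
        "Motorcycles": {"motorcycle": 1.0, "harley": 0.9},
        "Maintenance & Repair": {"repair": 1.0, "parts": 0.8},
    },
    "Computers & Internet": {  # hypothetical second-level names
        "Software": {"software": 1.0, "downloads": 0.8},
        "Hosting": {"hosting": 1.0, "provider": 0.8},
    },
}

def score(model, words):
    """Dot product of a weight vector with a binary bag of words."""
    return sum(model.get(w, 0.0) for w in words)

def best_category(models, words):
    """Pick the category whose model scores the words highest."""
    return max(models, key=lambda name: score(models[name], words))

def classify_hierarchical(text):
    """Choose a top-level category first, then a second-level one beneath it."""
    words = set(text.lower().split())
    top = best_category(TOP_LEVEL, words)
    second = best_category(SECOND_LEVEL[top], words)
    return top, second

print(classify_hierarchical("harley motorcycle parts"))
print(classify_hierarchical("windows pc hosting"))
```

Deciding the top level first means only the second-level models under the winning category are evaluated, which keeps the work per result small.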


User Interface Experiments

  • Figure: List Organization vs. Category Organization


  • Figure: interface conditions compared

    • Group Interface and List Interface

    • Presentation variants: Hover, Inline, Browse

    • With and without category names (“+ Cat Names”, “No Cat Names”)


Effect of Query Difficulty

  • Chart: Group vs. List interfaces, for EASY and HARD queries


SWISH: Summary and Design Implications

  • Text Classification

    • Learn accurate category models

    • Classify new web pages on-the-fly

    • Organize search results

  • User Interface

    • Tightly couple search results with category structure

    • User manipulation of presentation of category structure


GridViz

  • Collaborators

    • George Robertson, Edward Cutrell, Jeremy Goecks (Georgia Tech)

  • Key Themes

    • Abstract beyond individual results

    • Highly interactive interface to support understanding of trends and relationships

  • More about it

    • http://research.microsoft.com/~sdumais


GridViz

  • Summarize the results of a search

  • Grid-based design

    • Axes represent topic, time, people

    • Cells encode frequency, recency

  • Supports activities like:

    • What newsgroups are active (on topic x)?

    • What people are active, authoritative (on topic x)?

    • When did I last interact w/ people?
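
The grid summary can be sketched as a small aggregation: pick two attributes as axes and let each cell record frequency (a count) and recency (the latest date). The sample records, field order, and `build_grid` helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical search results: (author, newsgroup, posting date).
results = [
    ("alice", "comp.ir", date(2003, 1, 5)),
    ("alice", "comp.ir", date(2003, 1, 20)),
    ("bob", "comp.ir", date(2003, 2, 2)),
    ("alice", "comp.ai", date(2003, 2, 14)),
    ("bob", "comp.ai", date(2003, 2, 28)),
]

def build_grid(items, row_of, col_of, date_of):
    """Summarize items into cells keyed by (row, column).

    Each cell holds a frequency ('count') and a recency ('latest' date).
    """
    grid = defaultdict(lambda: {"count": 0, "latest": None})
    for item in items:
        cell = grid[(row_of(item), col_of(item))]
        cell["count"] += 1
        when = date_of(item)
        if cell["latest"] is None or when > cell["latest"]:
            cell["latest"] = when
    return dict(grid)

# Axes: people x topic.
people_by_topic = build_grid(
    results, lambda r: r[0], lambda r: r[1], lambda r: r[2]
)

# Axes: topic x time (month).
topic_by_month = build_grid(
    results, lambda r: r[1], lambda r: r[2].strftime("%Y-%m"), lambda r: r[2]
)

for (person, topic), cell in sorted(people_by_topic.items()):
    print(person, topic, cell["count"], cell["latest"])
```

Swapping the axis functions answers different questions from the same result set, for example which people are active on a topic, or when a newsgroup was last active.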



User Interface Experiments

  • Figure: List View vs. GridViz


GridViz Summary

  • Abstracting beyond individual results

  • Highly interactive interface

  • Grid-based design

    • Axes represent people, topic, time

    • Cells encode frequency, recency

  • Preliminary but promising


Stuff I’ve Seen (SIS)

  • Collaborators

    • Edward Cutrell, Raman Sarin, JJ Cadiz, Gavin Jancke, Daniel Robbins, Merrie Ringel (Stanford)

  • Key Themes

    • Your content

    • Information re-use

    • Integration across sources

  • More about it

    • … internal for now


Search Today …

  • Many locations, interfaces for finding things (e.g., web, mail, local files, help, history, intranet)

  • Often slow


Search with SIS

  • Unified index of stuff you’ve seen

    • Unify access to information regardless of source – mail, archives, calendar, files, web pages, etc.

    • Full-text index of content plus metadata attributes (e.g., creation time, author, title, size)

    • Automatic and immediate update of index

    • Rich UI possibilities, since it’s your content

  • Architecture

    • Client side indexing and storage

    • Built using MS Search components
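
A unified index of this kind can be sketched as one inverted index over items from several sources, with metadata stored next to the full text. The `UnifiedIndex` class, its method names, and the sample items are hypothetical. The actual system is described as built on MS Search components, which this sketch does not use or imitate.

```python
from collections import defaultdict
from datetime import datetime

class UnifiedIndex:
    """Toy full-text index over mail, files, web pages, and so on."""

    def __init__(self):
        self.items = {}                    # item id -> metadata
        self.postings = defaultdict(set)   # word -> ids of items containing it

    def add(self, item_id, source, title, author, created, text):
        """Index an item as soon as it is seen (immediate update)."""
        self.items[item_id] = {
            "source": source, "title": title,
            "author": author, "created": created,
        }
        for word in f"{title} {author} {text}".lower().split():
            self.postings[word].add(item_id)

    def search(self, query, source=None):
        """Return items containing every query word, newest first."""
        words = query.lower().split()
        if not words:
            return []
        ids = set.intersection(*(self.postings.get(w, set()) for w in words))
        hits = [
            dict(self.items[i], id=i) for i in ids
            if source is None or self.items[i]["source"] == source
        ]
        return sorted(hits, key=lambda h: h["created"], reverse=True)

idx = UnifiedIndex()
idx.add("m1", "mail", "Budget review", "Ana", datetime(2003, 3, 1),
        "please review the budget draft")
idx.add("f1", "file", "budget.xls", "Ben", datetime(2003, 2, 10),
        "budget spreadsheet numbers")
idx.add("w1", "web", "Jaguar cars", "unknown", datetime(2003, 3, 5),
        "jaguar review page")

print([h["id"] for h in idx.search("budget")])
print([h["id"] for h in idx.search("review")])
```

Because author and title are indexed along with the body text, a query on a person's name finds mail, files, and web pages through the same call, and the optional `source` argument narrows the results when needed.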



SIS Alpha Observations

  • 800+ internal users

    • Usage logs (incl different interfaces), survey data

  • File types opened

    • 76% Email

    • 14% Web pages

    • 10% Files

  • Age of items accessed

    • 7% today

    • 22% within the last week

    • 46% within the last month


SIS Alpha Observations

  • Use of other search tools

    • Non-SIS search for web, email, and files decreases

  • Importance of people

    • 25% of the queries involve people’s names

  • Importance of time

    • Date by far the most popular sort field, followed by rank, author, title

      • Even when rank is the default


SIS UI Innovations: Timeline w/ Landmarks

  • Importance of time

    • Timeline interface

  • Contextualize results using important landmarks as pointers into human memory

    • General: holidays, world events

    • Personal: important photos, appointments
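
One way to picture a timeline with landmarks is to merge search results and landmark events into a single date-ordered list, and to look up the nearest preceding landmark for any result. The dates, labels, and function names below are invented for illustration and are not taken from SIS.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical landmarks: general (holidays) and personal (appointments).
general = [(date(2003, 1, 1), "New Year's Day"),
           (date(2003, 2, 14), "Valentine's Day")]
personal = [(date(2003, 1, 20), "Appointment: project review")]
landmarks = sorted(general + personal)

# Hypothetical search results with their dates.
results = [
    (date(2003, 1, 10), "mail: trip plans"),
    (date(2003, 2, 1), "file: report.doc"),
    (date(2003, 2, 20), "web: conference page"),
]

def build_timeline(results, landmarks):
    """Interleave results and landmarks, newest first."""
    entries = [(d, "result", label) for d, label in results]
    entries += [(d, "landmark", label) for d, label in landmarks]
    return sorted(entries, key=lambda e: e[0], reverse=True)

def landmark_before(when, landmarks):
    """Label of the latest landmark on or before `when`, or None."""
    dates = [d for d, _ in landmarks]
    i = bisect_right(dates, when)
    return landmarks[i - 1][1] if i else None

timeline = build_timeline(results, landmarks)
for when, kind, label in timeline:
    print(when, kind, label)

print(landmark_before(date(2003, 2, 1), landmarks))
```

The lookup lets a result be described relative to a memorable event ("after the project review") rather than by its date alone.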




SIS Summary

  • Unified index of stuff you’ve seen

    • Fast access to full-text and metadata, from heterogeneous sources

    • Automatic and immediate update of index

    • Rich UI possibilities

  • Next steps

    • Better support for tagging -> “flatland”

    • Implicit queries for finding related info, and identifying “Stuff I Should See”

    • Integration with richer activity-based info, Eve


Organizing Search Results

  • Algorithms and interfaces to improve search

    • Use structure and context

  • Examples and key themes

    • SWISH … grouping

    • GridViz … abstraction

    • SIS … personal content and landmarks

  • Also

    • Important attributes: People, topics, time

    • Interaction

    • Evaluation

  • More information

    • http://research.microsoft.com/~sdumais


  • Christopher Lee of (SIG)IR …

    • http://www.cdvp.dcu.ie/SIGIR/index.html

