Semi-Automated Creation of Facet Hierarchies

Semi-Automated Creation of Facet Hierarchies. Marti Hearst School of Information, UC Berkeley Joint work with Dr. Emilia Stoica. Outline. Faceted Metadata Definition Advantages Flamenco: Search Interface Design using Faceted Metadata Castanet:


Semi-Automated Creation of Facet Hierarchies

Marti Hearst

School of Information, UC Berkeley

Joint work with Dr. Emilia Stoica

Outline
  • Faceted Metadata
    • Definition
    • Advantages
  • Flamenco:
    • Search Interface Design using Faceted Metadata
  • Castanet:
    • (Semi) Automated Tool for Creation of Category Systems
    • Comparison to State-of-the-Art Alternatives
  • Conclusions
Focus: Search and Navigation of Large Collections
  • Shopping Sites
  • Digital Libraries
  • E-Government Sites
  • Image Collections

Problems with Site Search
  • Study by Vividence in 2001 on 69 Sites
    • 70% eCommerce
    • 31% Service
    • 21% Content
    • 2% Community
  • Poorly organized search results
    • Frustration and wasted time
  • Poor information architecture
    • Confusion
    • Dead ends
    • "back and forthing"
    • Forced to search
What We Want to Achieve
  • Integrate browsing and searching seamlessly
  • Support exploration and learning
  • Avoid dead-ends, “pogo’ing”, and “lostness”
Main Idea
  • Use hierarchical faceted metadata
  • Design the interface to:
    • Allow flexible navigation
    • Provide previews of next steps
    • Organize results in a meaningful way
    • Support both expanding and refining the search
The Problem With Hierarchy
  • Most things can be classified in more than one way.
  • Most organizational systems do not handle this well.
  • Example: Animal Classification

[Figure: the same animals (robin, penguin, salmon, cobra, bat, otter, seal, wolf) sorted into groups in several different ways, with the classification criteria Skin Covering, Locomotion, and Diet.]

The Problem with Hierarchy
  • Inflexible
    • Force the user to start with a particular category
    • What if I don’t know the animal’s diet, but the interface makes me start with that category?
  • Wasteful
    • Have to repeat combinations of categories
    • Makes for extra clicking and extra coding
  • Difficult to modify
    • To add a new category type, must duplicate it everywhere or change things everywhere
The Problem With Hierarchy

[Figure: a single tree that begins at "start", branches into swim, fly, run, and slither, repeats fur, scales, and feathers under those branches, and repeats fish, rodents, and insects under each of those; salmon, bat, robin, and wolf appear at the bottom.]
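The "wasteful" point can be made concrete by counting. The sketch below is only an illustration: the label lists are borrowed from the animal example, and the two counting functions are hypothetical helpers, not part of Flamenco. It compares the number of nodes needed when every facet becomes one level of a single tree against the number of labels needed when the facets stay independent.

```python
from math import prod

# Label sets borrowed from the animal example; any independent facets would do.
facets = {
    "locomotion": ["swim", "fly", "run", "slither"],
    "covering": ["fur", "scales", "feathers"],
    "diet": ["fish", "rodents", "insects"],
}


def single_hierarchy_nodes(facets):
    """Count nodes when each facet becomes one level of a single tree.

    Level k repeats the k-th facet's labels under every node of level k-1,
    so its size is the product of the first k facet sizes.
    """
    sizes = [len(labels) for labels in facets.values()]
    return sum(prod(sizes[: k + 1]) for k in range(len(sizes)))


def faceted_labels(facets):
    """Count labels when each facet is kept as its own independent list."""
    return sum(len(labels) for labels in facets.values())


tree_nodes = single_hierarchy_nodes(facets)
flat_labels = faceted_labels(facets)
print(tree_nodes, flat_labels)
```

Adding a fourth facet multiplies the size of the deepest tree level but only adds a handful of labels to the faceted version, which is the "difficult to modify" point in another form.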

The Idea of Facets
  • Facets are a way of labeling data
    • A kind of Metadata (data about data)
    • Can be thought of as properties of items
  • Facets vs. Categories
    • Items are placed INTO a category system
    • Multiple facet labels are ASSIGNED TO items
The Idea of Facets
  • Create INDEPENDENT categories (facets)
    • Each facet has labels (sometimes arranged in a hierarchy)
  • Assign labels from the facets to every item
    • Example: recipe collection

[Figure: one recipe labeled from four facets (Ingredient, Cooking Method, Course, Cuisine) with the labels Chicken, Bell Pepper, Stir-fry, Curry, Main Course, and Thai.]
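One way to picture "labels assigned to items" is as a small data structure. This is a minimal sketch, not how Flamenco stores its data: the recipe names, the second recipe, and the helper `items_with` are invented for illustration, and only the facet and label names come from the slide.

```python
# Each item carries labels from several independent facets.
# Recipe names and the second recipe are made up for illustration.
recipes = {
    "thai chicken stir-fry": {
        "Ingredient": {"Chicken", "Bell Pepper"},
        "Cooking Method": {"Stir-fry"},
        "Course": {"Main Course"},
        "Cuisine": {"Thai"},
    },
    "roast chicken": {
        "Ingredient": {"Chicken"},
        "Cooking Method": {"Bake"},
        "Course": {"Main Course"},
    },
}


def items_with(recipes, facet, label):
    """Return the names of all items that carry `label` in `facet`."""
    return sorted(
        name
        for name, labels in recipes.items()
        if label in labels.get(facet, set())
    )


print(items_with(recipes, "Ingredient", "Chicken"))
print(items_with(recipes, "Cuisine", "Thai"))
```

Nothing forces an item to have a label in every facet; the second recipe simply has no Cuisine entry.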

The Idea of Facets
  • Break out all the important concepts into their own facets
  • Sometimes the facets are hierarchical
    • Assign labels to items from any level of the hierarchy
  • Preparation Method
    • Fry
    • Saute
    • Boil
    • Bake
    • Broil
    • Freeze
  • Desserts
    • Cakes
    • Cookies
    • Dairy
      • Ice Cream
      • Sorbet
      • Flan
  • Fruits
    • Cherries
    • Berries
      • Blueberries
      • Strawberries
    • Bananas
    • Pineapple

Using Facets
  • Now there are multiple ways to get to each item
    • Example item 1: Fruit > Pineapple; Dessert > Cake; Preparation > Bake
    • Example item 2: Dessert > Dairy > Sherbet; Fruit > Berries > Strawberries; Preparation > Freeze

[Figure: the Preparation Method, Desserts, and Fruits hierarchies, with the label paths of the two example items highlighted.]
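The label paths above suggest a simple lookup rule: an item is reachable from any node that is a prefix of one of its paths. The following is a rough sketch under that assumption; the item names and the functions `matches` and `find` are hypothetical, and only the paths are taken from the slide.

```python
# Facet labels stored as paths from the facet root, as written on the slide.
# The item names are invented for illustration.
items = {
    "pineapple cake": [
        ("Fruit", "Pineapple"),
        ("Dessert", "Cake"),
        ("Preparation", "Bake"),
    ],
    "strawberry sherbet": [
        ("Dessert", "Dairy", "Sherbet"),
        ("Fruit", "Berries", "Strawberries"),
        ("Preparation", "Freeze"),
    ],
}


def matches(path, query):
    """True if `query` is a prefix of `path`, i.e. the item sits at or below it."""
    return path[: len(query)] == query


def find(items, query):
    """Return the names of all items reachable from the node `query`."""
    return sorted(
        name
        for name, paths in items.items()
        if any(matches(path, query) for path in paths)
    )


print(find(items, ("Fruit",)))             # both items carry a Fruit label
print(find(items, ("Dessert", "Dairy")))   # only the sherbet
print(find(items, ("Preparation", "Bake")))
```

Because every path is checked, the same item turns up whether the user starts from Fruit, Dessert, or Preparation.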

Next, view the list!
  • The user must first choose an award type (literature), then browse through the laureates in chronological order.
  • No choice is given to, say, organize by year and then award, or by country, then decade, then award, etc.

Using Facets
  • The system only shows the labels that correspond to the current set of items
    • Start with all items and all facets
    • The user then selects a label within a facet
    • This reduces the set of items (only those that have been assigned to the subcategory label are displayed)
    • This also eliminates some subcategories from the view.
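The interaction described above can be sketched in a few lines. This is an illustrative toy, assuming one label per facet per item and invented data; `refine` and `available_labels` are hypothetical names, not Flamenco functions.

```python
from collections import Counter

# Toy collection: one label per facet per item (an assumption for brevity).
items = [
    {"Dessert": "Cake", "Fruit": "Pineapple", "Preparation": "Bake"},
    {"Dessert": "Sherbet", "Fruit": "Strawberries", "Preparation": "Freeze"},
    {"Dessert": "Cake", "Fruit": "Strawberries", "Preparation": "Bake"},
]


def refine(items, facet, label):
    """Keep only the items that carry `label` in `facet`."""
    return [item for item in items if item.get(facet) == label]


def available_labels(items):
    """Count, per facet, the labels that occur in the current item set.

    Labels with no matching items never appear, so every label that is
    shown leads to at least one item.
    """
    counts = {}
    for item in items:
        for facet, label in item.items():
            counts.setdefault(facet, Counter())[label] += 1
    return counts


current = refine(items, "Fruit", "Strawberries")
preview = available_labels(current)
print(len(current), preview)
```

After selecting Fruit > Strawberries, Pineapple drops out of the Fruit facet while Cake and Sherbet both remain under Dessert, each with a count that can be shown as a preview.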
Advantages of Facets
  • Can’t end up with empty result sets
    • (except with keyword search)
  • Helps avoid feelings of being lost.
  • Easier to explore the collection.
    • Helps users infer what kinds of things are in the collection.
    • Evokes a feeling of “browsing the shelves”
  • Is preferred over standard search for collection browsing in usability studies.
    • (Interface must be designed properly)
Advantages of Facets
  • Seamless to add new facets and subcategories
  • Seamless to add new items.
  • Helps with “categorization wars”
    • Don’t have to agree exactly where to place something
  • Interaction can be implemented using a standard relational database.
  • May be easier for automatic categorization
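As a rough illustration of the relational-database point, the sketch below uses Python's built-in `sqlite3` module with an invented two-table schema (`item`, `item_label`). The schema, the data, and the function `label_counts` are assumptions made for this example; the slides do not describe Flamenco's actual tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_label (item_id INTEGER, facet TEXT, label TEXT);
    INSERT INTO item VALUES
        (1, 'pineapple cake'), (2, 'strawberry sherbet'), (3, 'strawberry cake');
    INSERT INTO item_label VALUES
        (1, 'Fruit', 'Pineapple'),    (1, 'Dessert', 'Cake'),
        (2, 'Fruit', 'Strawberries'), (2, 'Dessert', 'Sherbet'),
        (3, 'Fruit', 'Strawberries'), (3, 'Dessert', 'Cake');
    """
)


def label_counts(conn, facet, selected_facet, selected_label):
    """Count the labels of `facet` among items matching the current selection."""
    return conn.execute(
        """
        SELECT l.label, COUNT(*)
        FROM item_label AS l
        JOIN item_label AS s ON s.item_id = l.item_id
        WHERE l.facet = ? AND s.facet = ? AND s.label = ?
        GROUP BY l.label
        ORDER BY l.label
        """,
        (facet, selected_facet, selected_label),
    ).fetchall()


counts = label_counts(conn, "Dessert", "Fruit", "Strawberries")
print(counts)
```

A self-join on the label table is enough for one selected label; each further selection would add another join or an `IN` subquery.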
Information Previews
  • Use the metadata to show where to go next
    • More flexible than canned hyperlinks
    • Less complex than full search
  • Help users see and return to previous steps
  • Reduces mental work
    • Recognition over recall
    • Suggests alternatives
  • More clicks are OK only if (J. Spool):
    • The “scent” of the target does not weaken
    • Users feel they are going towards, rather than away from, their target
Facets vs. Hierarchy
  • Early Flamenco studies compared allowing multiple hierarchical facets vs. just one facet.
  • Multiple facets were preferred and more successful.
Limitation of Facets
  • Do not naturally capture MAIN THEMES
  • Facets do not show RELATIONS explicitly
    • Example photo labels: colors (Aquamarine, Red, Orange) and objects (Door, Doorway, Wall)
    • Which color is associated with which object?

Photo by J. Hearst, jhearst.typepad.com

Terminology Clarification
  • Facets vs. Attributes
    • Facets are shown independently in the interface
    • Attributes are just associated with individual items
      • E.g., ID number, Source, Affiliation
      • However, can always convert an attribute to a facet
  • Facets vs. Labels
    • Labels are the names used within facets
    • These are organized into subhierarchies
  • Synonyms
    • There should be alternate names for the category labels
    • Currently (in Flamenco) this is done with subcategories
      • E.g., Deer has subcategories “stag”, “fawn”, “doe”
Flamenco Usability Studies
  • Usability studies done on 3 collections:
    • Recipes (epicurious): 13,000 items
    • Architecture Images: 40,000 items
    • Fine Arts Images: 35,000 items
  • Conclusions:
    • Users like and are successful with the dynamic faceted hierarchical metadata, especially for browsing tasks
    • Very positive results, in contrast with studies on earlier iterations.
Most Recent Usability Study
  • Participants & Collection
    • 32 Art History Students
    • ~35,000 images from SF Fine Arts Museum
  • Study Design
    • Within-subjects
      • Each participant sees both interfaces
      • Balanced in terms of order and tasks
    • Participants assess each interface after use
    • Afterwards they compare them directly
    • Data recorded in behavior logs, server logs, and paper surveys; one or two experienced testers at each trial
    • Used 9-point Likert scales
    • Session took about 1.5 hours; pay was $15/hour
Post-Interface Assessments

All significant at p<.05 except “simple” and “overwhelming”

Post-Test Comparison

Which Interface Preferable For:              Baseline  Faceted
  Find images of roses                           15       16
  Find all works from a given period              2       30
  Find pictures by 2 artists in same media        1       29

Overall Assessment:                          Baseline  Faceted
  More useful for your tasks                      4       28
  Easiest to use                                  8       23
  Most flexible                                   6       24
  More likely to result in dead ends             28        3
  Helped you learn more                           1       31
  Overall preference                              2       29


How to Create Facet Hierarchies?

Our Approach: Castanet

Our Approach
  • Leverage the structure of WordNet

[Figure: pipeline. Documents → Select terms → Get hypernym paths (from WordNet) → Build tree → Compress tree → Divide into facets]

1. Select Terms
  • Select well-distributed terms from the collection
  • Example terms: red, blue
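The slide does not say how "well distributed" is measured. One plausible reading, sketched here purely as an assumption, is to keep terms whose document frequency lies between a lower and an upper bound; the function `select_terms`, its thresholds, and the toy documents are all invented for illustration.

```python
from collections import Counter


def select_terms(documents, min_docs=2, max_fraction=0.8):
    """Keep terms that occur in at least `min_docs` documents but in no more
    than `max_fraction` of the collection (both thresholds are assumptions)."""
    doc_freq = Counter()
    for text in documents:
        # Count each term once per document, not once per occurrence.
        doc_freq.update(set(text.lower().split()))
    limit = max_fraction * len(documents)
    return sorted(t for t, n in doc_freq.items() if min_docs <= n <= limit)


docs = ["red door", "blue door", "red wall", "blue wall", "green doorway"]
terms = select_terms(docs)
print(terms)
```

Terms that occur in a single document are dropped as too rare, and a term present in nearly every document would be dropped as too common to discriminate.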

2. Get Hypernym Path
  • red: abstraction => property => visual property => color => chromatic color => red, redness
  • blue: abstraction => property => visual property => color => chromatic color => blue, blueness
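Walking up a hypernym chain can be sketched without WordNet itself. The dictionary below is a toy stand-in that contains only the nodes shown on the slide, with a single parent per node; real WordNet entries can have several senses and several parents, which this sketch ignores. `HYPERNYM` and `hypernym_path` are hypothetical names.

```python
# Toy stand-in for WordNet: each node maps to its single hypernym (parent).
HYPERNYM = {
    "red": "red, redness",
    "blue": "blue, blueness",
    "red, redness": "chromatic color",
    "blue, blueness": "chromatic color",
    "chromatic color": "color",
    "color": "visual property",
    "visual property": "property",
    "property": "abstraction",
}


def hypernym_path(term, hypernym=HYPERNYM):
    """Return the path from the most general node down to `term`."""
    path = [term]
    while path[-1] in hypernym:
        path.append(hypernym[path[-1]])
    return path[::-1]


print(" => ".join(hypernym_path("red")))
print(" => ".join(hypernym_path("blue")))
```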

3. Build Tree

[Figure: the two hypernym paths for red and blue are merged into one tree. The shared part (abstraction => property => visual property => color => chromatic color) appears once, with "red, redness" (term: red) and "blue, blueness" (term: blue) as separate branches below it.]
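Merging paths that share a prefix is the same operation as inserting keys into a trie. The sketch below assumes each path is a list of labels from the most general node down to the term and uses a nested dictionary as the tree; `build_tree` is a hypothetical helper, not Castanet code.

```python
def build_tree(paths):
    """Merge root-to-term paths into a nested dict; shared prefixes are stored once."""
    tree = {}
    for path in paths:
        node = tree
        for label in path:
            # Reuse the existing child if this label was already inserted.
            node = node.setdefault(label, {})
    return tree


shared = ["abstraction", "property", "visual property", "color", "chromatic color"]
paths = [
    shared + ["red, redness", "red"],
    shared + ["blue, blueness", "blue"],
]
tree = build_tree(paths)

# Walk down the shared prefix to reach the node where the paths diverge.
node = tree
for label in shared:
    node = node[label]
print(sorted(node))
```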

4. Compress Tree

[Figure: before compression, color => chromatic color has the children "red, redness" (term: red), "blue, blueness" (term: blue), and "green, greenness" (term: green). After compression, red, blue, and green hang directly under chromatic color.]

4. Compress Tree (cont.)

[Figure: before, color => chromatic color => red, blue, green. After, chromatic color is removed and red, blue, and green hang directly under color.]
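The slides show the effect of compression but not its rules. The sketch below uses two rules inferred from the before/after pictures, and both should be read as assumptions: a non-root node with exactly one child is replaced by that child, and a node whose label contains its parent's label (such as "chromatic color" under "color") is removed and its children promoted. The function `compress` is hypothetical.

```python
def compress(tree, parent=None):
    """Return a compressed copy of a nested-dict tree (rules are assumptions)."""
    result = {}
    for label, children in tree.items():
        children = compress(children, parent=label)
        if parent is not None and len(children) == 1:
            # Rule 1: a non-root node with a single child is replaced by it.
            result.update(children)
        elif parent is not None and children and parent in label:
            # Rule 2: an inner node that merely repeats its parent's label is
            # dropped and its children move up one level.
            result.update(children)
        else:
            result[label] = children
    return result


tree = {
    "color": {
        "chromatic color": {
            "red, redness": {"red": {}},
            "blue, blueness": {"blue": {}},
            "green, greenness": {"green": {}},
        }
    }
}
compressed = compress(tree)
print(compressed)
```

On this input the first rule folds each synset node into its term, and the second removes "chromatic color", which matches the end state shown on the slides. Promoting children can overwrite a sibling with the same label; a fuller version would have to merge those subtrees.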

5. Divide into Facets

Disambiguation
  • Ambiguity in:
    • Word senses
    • Paths up the hypernym tree
  • 2 paths for same word (“tuna”):
    • Sense 1: organism, being => plant, flora => vascular plant => succulent => cactus => tuna
    • Sense 2: organism, being => fish => ... => tuna
  • 2 paths for same sense (sense 2 for word “tuna”):
    • organism, being => fish => food fish => tuna
    • organism, being => fish => bony fish => spiny-finned fish => percoid fish => tuna
How to Select the Right Senses and Paths?
  • First: build core tree
    • (1) Create paths for words with only one sense
    • (2) Use Domains
      • WordNet has 212 Domains
        • medicine, mathematics, biology, chemistry, linguistics, soccer, etc.
      • Automatically scan the collection to see which domains apply
      • The user selects which of the suggested domains to use or may add own
      • Paths for terms that match the selected domains are added to the core tree
  • Then: add remaining terms to the core tree.
Using Domains
  • “dip” glosses:
    • Sense 1: A depression in an otherwise level surface
    • Sense 2: The angle that a magnetic needle makes with the horizon
    • Sense 3: Tasty mixture into which bite-size foods are dipped
  • “dip” hypernyms:
    • Sense 1: solid => concave shape => depression
    • Sense 2: shape, form => space => angle
    • Sense 3: food => ingredient, fixings => flavorer
  • Given domain “food”, choose sense 3
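The sense-selection steps can be sketched with the “dip” example. This is a simplification under stated assumptions: the hypernym paths are copied from the slide, a sense is taken to "match" a domain when the domain name appears as a node on its path, and `choose_sense` is a hypothetical helper. How Castanet actually matches WordNet domains is not spelled out in the slides.

```python
# Hypernym paths per sense of "dip", copied from the slide (general => specific).
DIP_SENSES = {
    1: ["solid", "concave shape", "depression"],
    2: ["shape, form", "space", "angle"],
    3: ["food", "ingredient, fixings", "flavorer"],
}


def choose_sense(paths_by_sense, domains):
    """Pick a sense for the core tree, or None to leave the term for later.

    A word with only one sense is accepted as is. Otherwise the first sense
    whose path mentions a selected domain is chosen (matching by node name
    is an assumption made for this sketch).
    """
    if len(paths_by_sense) == 1:
        return next(iter(paths_by_sense))
    for sense, path in sorted(paths_by_sense.items()):
        names = {name for node in path for name in node.split(", ")}
        if names & set(domains):
            return sense
    return None


print(choose_sense(DIP_SENSES, {"food"}))
print(choose_sense(DIP_SENSES, {"medicine"}))
```

Terms for which this returns `None` correspond to the "remaining terms" that are added to the core tree in the final step.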

Castanet Evaluation
  • This is a tool for information architects, so people of this type did the evaluation
  • We compared output on
    • Recipes
    • Biomedical journal titles
  • We compared to two state-of-the-art algorithms
    • LDA (Blei et al. 04)
    • Subsumption (Sanderson & Croft ’99)
Evaluation Method
  • Information architects assessed the category systems
  • For each of 2 systems’ output:
    • Examined and commented on top-level
    • Examined and commented on two sub-levels
  • Then commented on overall properties
    • Meaningful?
    • Systematic?
    • Likely to use in your work?
Evaluation Results
  • Results on recipes collection for “Would you use this system in your work?”
    • Yes in some cases or yes definitely:
      • Pine (Castanet): 29/34
      • Oak (LDA): 0/18
      • Birch (Subsumption): 6/16
  • Results on quality of categories:
Opportunities for Tagging
  • New opportunity: Tagging, folksonomies
    • (Flickr, del.icio.us)
    • People are creating facets in a decentralized manner
    • They are assigning multiple facets to items
    • This is done on a massive scale
    • This leads naturally to meaningful associations
Conclusions
  • Flexible application of hierarchical faceted metadata is a proven approach for navigating large information collections.
    • Midway in complexity between simple hierarchies and deep knowledge representation.
    • Currently in use on e-commerce sites; spreading to other domains
  • Systems are needed to help create faceted metadata structures
    • Our WordNet-based algorithm, while not perfect, seems like it will be a useful tool for Information Architects.
Acknowledgements
  • Flamenco Team
    • Brycen Chun, Ame Elliott, Jennifer English, Kevin Li, Rashmi Sinha, Emilia Stoica, Kirsten Swearingen, Ka-Ping Yee
  • Castanet
    • Emilia Stoica
  • Funding
    • This work was supported in part by NSF (IIS-9984741)
For more information: flamenco.berkeley.edu

Thank you!

Marti Hearst & Emilia Stoica
