### An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies

Yannis Tzitzikas 1

Anastasia Analyti 2

Nicolas Spyratos 3

Panos Constantopoulos 2,4

1 Istituto di Scienza e Tecnologie dell’Informazione, CNR-ISTI, Italy

2 Institute of Computer Science, ICS-FORTH, Greece

3 Laboratoire de Recherche en Informatique, Université de Paris-Sud, France

4 Department of Computer Science, University of Crete, Greece

Outline of the presentation

- Introduction - Motivation
- Faceted Classification and Faceted Taxonomies
- Advantages and Problems
- Compound Terms and Compound Taxonomies
- The Algebra
- Operations
- Examples
- Algorithms
- Deriving Navigational Trees
- Prototype implementation
- Concluding Remarks

Yannis Tzitzikas et al., EJC'2003

Introduction

- Existing ways to locate information in the Web
- searching (using search engines like Google)
- browsing (using catalogues like Yahoo!, ODP)

- Web Catalogues (or indices using controlled structured vocabularies):

[-]: index only a subset of the pages that are indexed by search engines

[+]: ensure indexing consistency

[+]: enable intelligent reasoning

[+]: enable browsing

- Currently, search engines also exploit the catalogues:
- to improve the measurement of relevance
- to offer the user a set of pages related to each page of the answer
- to limit the scope of the search

Drawbacks of the taxonomies that are used by Web Catalogues

(1) Big size (e.g. currently Open Directory has 460,000 terms)

(2) Inconsistent and incomplete terminology and structuring

- Consequences for the DESIGNER:
  - Laborious object indexing
  - Hard to update/revise
  - Large storage requirements
- Consequences for the USER:
  - Hard to understand
  - Laborious browsing

Faceted Classification and Faceted Taxonomies

Faceted classification was developed, prior to the existence of computers, by S. R. Ranganathan (1892-1972), a Hindu mathematician working as a librarian.

* A faceted taxonomy consists of a set of facets

* Each facet is a group of elemental concepts

* Each object is indexed by synthesizing elemental concepts

Key point: Faceted taxonomies do not require an a priori division of concepts into subconcepts (only relationships between elemental concepts are stored)

- Advantages of faceted taxonomies:
- they are easier to build and understand
- they require less storage space
- they are more scalable

Faceted Taxonomies

[Figure: a faceted taxonomy with two facets]

- Location
  - Mainland: Pilio, Olympus
  - Islands: Crete
- Sports
  - SeaSports
  - WinterSports

Example of using one taxonomy

A complete and balanced decimal tree with 100 million leaves. The leaves are the 100 million indexing terms; each indexes a block of 10 pages, for 1 billion pages in all.

Total: 111,111,111 terms

Example of using a faceted taxonomy consisting of 4 facets

Four facets with 100 leaf terms each (400 leaf terms in all). Combining one leaf term per facet gives 100 x 100 x 100 x 100 = 100 million indexing terms; each indexes a block of 10 pages, for 1 billion pages in all.

Total: 444 terms

Example of using a faceted taxonomy consisting of 8 facets

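Eight facets with 10 leaf terms each (80 leaf terms in all). Combining one leaf term per facet gives 10 x 10 x … x 10 = 100 million indexing terms; each indexes a block of 10 pages, for 1 billion pages in all.

Total: 88 terms!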
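The term counts on the last three slides can be checked with a few lines of Python. This is only a sketch of the arithmetic, assuming each facet is a complete decimal tree (1 root, 10 children per node) and that an indexing term picks one leaf per facet; the function names are illustrative, not from the presentation.

```python
def tree_terms(branching: int, depth: int) -> int:
    """Nodes in a complete tree: 1 + b + b^2 + ... + b^depth."""
    return sum(branching ** level for level in range(depth + 1))


def design_size(facets: int, depth: int) -> tuple[int, int]:
    """Return (terms to store, indexing terms) for `facets` decimal trees of equal depth."""
    stored = facets * tree_terms(10, depth)
    # One leaf is chosen per facet, so the leaf counts multiply.
    indexing = (10 ** depth) ** facets
    return stored, indexing


for label, (facets, depth) in {
    "single taxonomy": (1, 8),
    "4 facets": (4, 2),
    "8 facets": (8, 1),
}.items():
    stored, indexing = design_size(facets, depth)
    print(f"{label}: {stored:,} stored terms, {indexing:,} indexing terms")
```

All three designs yield 100 million indexing terms, while the stored vocabulary shrinks from 111,111,111 to 444 to 88 terms.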

The Problem of Faceted Taxonomies

[Figure: the Location and Sports facets of the earlier example]

Invalid compound terms may appear during object indexing or browsing/retrieval.

A compound term is invalid if it cannot be applied to any object of the domain.

- Consequences:
- laborious/erroneous object indexing
- difficulties in browsing

Valid and Invalid Compound Terms

Example: compound terms over the Location and Sports facets.

Valid Compound Terms

- Sports.Location, Sports.Islands, Sports.Crete, Sports.Mainland, Sports.Pilio, Sports.Olympus
- SeaSports.Location, SeaSports.Islands, SeaSports.Crete, SeaSports.Mainland, SeaSports.Pilio
- WinterSports.Location, WinterSports.Mainland, WinterSports.Pilio, WinterSports.Olympus

Invalid Compound Terms

- SeaSports.Olympus
- WinterSports.Islands
- WinterSports.Crete

The Idea

Define an algebra with operators that allow specifying the set of valid compound terms without having to enumerate all of the valid compound terms.

Initial operands: the facet terminologies.

Operations:

- product (n-ary): combines terms from different facets
- plus-product (n-ary): combines terms from different facets, plus positive modifiers
- minus-product (n-ary): combines terms from different facets, plus negative modifiers
- self-product (unary): combines terms from one facet
- self-plus-product (unary): combines terms from one facet, plus positive modifiers
- self-minus-product (unary): combines terms from one facet, plus negative modifiers

Compound Terms and Compound Taxonomies

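- Compound term: any subset s of T, the set of all terms of the faceted taxonomy
- Compound terminology S: a set of compound terms
- Compound taxonomy: a pair (S, ⪯) where
  - S is a compound terminology and
  - ⪯ is the compound ordering: s ⪯ s′ if every term of s′ is equal to or broader than some term of s

Example (with Crete ≤ Greece):

{Sports, Crete} ⪯ {Sports},

{Sports, Crete} ⪯ {Sports, Greece}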
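The ordering between compound terms used in the example can be sketched in Python. The rule coded here (s precedes s′ when every term of s′ is equal to or broader than some term of s) is an assumption chosen to agree with the two example pairs; the `PARENT` map and the function names are illustrative.

```python
# Direct "narrower -> broader" links; only Crete <= Greece is needed here.
PARENT = {"Crete": "Greece"}


def leq(t, u):
    """True if term t equals u or lies below u in the taxonomy."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def precedes(s, s2):
    """Compound ordering: every term of s2 subsumes some term of s."""
    return all(any(leq(t, u) for t in s) for u in s2)


print(precedes({"Sports", "Crete"}, {"Sports"}))
print(precedes({"Sports", "Crete"}, {"Sports", "Greece"}))
print(precedes({"Sports"}, {"Sports", "Crete"}))
```

The first two calls reproduce the example pairs; the third shows that the relation does not hold in the opposite direction.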

The Product Operation

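Example:

- S = { {Greece}, {Islands} }, with Islands narrower than Greece
- S′ = { {Sports}, {SeaSports} }, with SeaSports narrower than Sports
- Product of S and S′: {Greece}, {Islands}, {Sports}, {SeaSports}, {Greece, Sports}, {Greece, SeaSports}, {Islands, Sports}, {Islands, SeaSports}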
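A minimal Python sketch of the product on this example follows. It assumes the product takes the union of one compound term from each operand, and that each operand also contains the empty compound term so that the single-term compound terms survive in the result; both points are assumptions, and the helper names are illustrative.

```python
from itertools import product as cartesian


def basic(terms):
    """Compound terminology of a facet: the empty term plus one singleton per term."""
    return {frozenset()} | {frozenset({t}) for t in terms}


def compound_product(*terminologies):
    """All unions formed by picking one compound term from each operand."""
    return {frozenset().union(*combo) for combo in cartesian(*terminologies)}


S = basic({"Greece", "Islands"})
S2 = basic({"Sports", "SeaSports"})
result = compound_product(S, S2)

for s in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Apart from the empty compound term, the result holds the four singletons and the four two-term combinations shown on the slide.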

The Plus-Product Operation

P = { {Islands, SeaSports}, {Greece, SnowSki} }

- S = { {Greece}, {Islands} }
- S′ = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki} }
- Plus-product of S and S′ with parameter P: the compound terms of S and S′ together with {Greece, Sports}, {Islands, Sports}, {Greece, SeaSports}, {Greece, WinterSports}, {Islands, SeaSports}, {Greece, SnowSki}

The Minus-Product Operation

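N = { {Islands, WinterSports} }

- S = { {Greece}, {Islands} }
- S′ = { {Sports}, {SeaSports}, {WinterSports}, {SnowSki} }
- Minus-product of S and S′ with parameter N: the compound terms of S and S′ together with {Greece, Sports}, {Islands, Sports}, {Greece, SeaSports}, {Greece, WinterSports}, {Islands, SeaSports}, {Greece, SnowSki}; the combinations {Islands, WinterSports} and {Islands, SnowSki} do not appear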
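The plus-product and minus-product examples can be reproduced with a short Python sketch. It rests on several assumptions not spelled out on the slides: the taxonomy links in `PARENT`, the presence of the empty compound term in each operand, a plus-product that keeps the operands plus every combination broader than or equal to some element of P, and a minus-product that drops from the full product every combination narrower than or equal to some element of N. All names are illustrative.

```python
from itertools import product as cartesian

# Assumed "narrower -> broader" links for the example.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def leq(t, u):
    """True if term t equals u or lies below u."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def precedes(s, s2):
    """s is narrower than or equal to s2 in the compound ordering."""
    return all(any(leq(t, u) for t in s) for u in s2)


def basic(terms):
    return {frozenset()} | {frozenset({t}) for t in terms}


def compound_product(*ts):
    return {frozenset().union(*combo) for combo in cartesian(*ts)}


def plus_product(P, *ts):
    """Operands plus the combinations implied valid by P (broader than some p)."""
    implied = {s for s in compound_product(*ts) if any(precedes(p, s) for p in P)}
    return set().union(*ts) | implied


def minus_product(N, *ts):
    """Full product minus the combinations implied invalid by N (narrower than some n)."""
    return {s for s in compound_product(*ts) if not any(precedes(s, n) for n in N)}


S = basic({"Greece", "Islands"})
S2 = basic({"Sports", "SeaSports", "WinterSports", "SnowSki"})
P = [frozenset({"Islands", "SeaSports"}), frozenset({"Greece", "SnowSki"})]
N = [frozenset({"Islands", "WinterSports"})]

plus = plus_product(P, S, S2)
minus = minus_product(N, S, S2)
print(plus == minus, len(plus))
```

Under these assumptions the two expressions define the same compound terminology, which matches the identical result shown on both slides: two positive statements or one negative statement describe the same set.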

The Self-[Plus/Minus]-Product Operations

Self-Product

Self-Plus-Product

Self-Minus-Product

The Self-Plus-Product: Example

P = { {SeaSki, WindSurfing}, {SnowSki, SnowBoard} }

- S = { {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {WindSurfing}, {SnowSki}, {SnowBoard} }
- Self-plus-product of S with parameter P: the compound terms of S together with {SeaSki, WindSurfing} and {SnowSki, SnowBoard}

Expressions and Well-formed Expressions

The set of expressions over a facet set {F1,…, Fk} is defined according to the grammar:

An expression e is well-formed if:

(a) each basic compound terminology appears at most once in e,

(b) the parameters P/N are subsets of the corresponding genuine compound terms

In this way:

- no conflicts arise
- monotonic behavior

Example: Building the catalog of a tourist portal

Facets (3 facets, 13 terms; of the 890 compound terms only 96 are valid):

- Location: Iraklio, Ammoudara, Hersonissos
- Accommodation: Furn. Apartments, Rooms, Bungalows
- Facilities: Jacuzzi, SwimmingPool
  - SwimmingPool: Indoor, Outdoor

Specification with positive parameters only, |P| = 8:

P = { {Iraklio, Furn. Apartments}, {Iraklio, Rooms}, {Ammoudara, Furn. Apartments}, {Ammoudara, Rooms}, {Hersonissos, Furn. Apartments}, {Ammoudara, Bungalows, Jacuzzi}, {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor} }

Specification with a negative and a positive parameter, |P| + |N| = 4:

N = { {Iraklio, Bungalows} }

P = { {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}, {Ammoudara, Bungalows, Jacuzzi} }

Checking the Validity of a Compound Term

Let Se be the compound terminology defined by an algebraic expression e.

We provide an algorithm for checking whether s ∈ Se without having to compute (and store) the entire Se.

The time complexity for this algorithm is:

=> Only F and e have to be stored

Generating Navigation Trees

Objective: given an expression e, dynamically generate a navigation tree whose nodes correspond to valid compound terms only, for use during object indexing and browsing.

The navigation tree also contains nodes for facet crossing.

[Figure: navigation tree over the Sports and Location facets. For example, under Sports, SeaSports > byLocation leads to Islands > Crete and Mainland > Pilio, while WinterSports > byLocation leads to Mainland > Pilio and Mainland > Olympus. Under Location, the terms carry symmetric bySports nodes.]
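The idea of offering only valid refinements can be sketched as follows. The taxonomy is the Location/Sports example of the earlier slides; the parameter `N` is an assumption chosen so that exactly the three invalid compound terms listed earlier are excluded, and `narrow` is a hypothetical helper, not part of the prototype.

```python
CHILDREN = {
    "Location": ["Mainland", "Islands"],
    "Mainland": ["Pilio", "Olympus"],
    "Islands": ["Crete"],
    "Sports": ["SeaSports", "WinterSports"],
}
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

# Assumed negative parameter: winter sports on islands, sea sports on Olympus.
N = [frozenset({"Islands", "WinterSports"}), frozenset({"Olympus", "SeaSports"})]


def leq(t, u):
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def precedes(s, s2):
    return all(any(leq(t, u) for t in s) for u in s2)


def is_valid(s):
    """A compound term is invalid if it is narrower than some element of N."""
    return not any(precedes(s, n) for n in N)


def narrow(node, term):
    """Valid nodes reached by replacing `term` in `node` with one of its children."""
    candidates = [(node - {term}) | {child} for child in CHILDREN.get(term, [])]
    return [c for c in candidates if is_valid(c)]


# Facet crossing "byLocation" from SeaSports: add the Location root, then narrow it.
crossing = frozenset({"SeaSports", "Location"})
for child in narrow(crossing, "Location"):
    print(sorted(child))
```

Starting from WinterSports instead, the same call offers only Mainland, because {WinterSports, Islands} is invalid; narrowing Mainland under SeaSports offers Pilio but not Olympus.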

Application in Web Catalogues

- Taxonomies of existing catalogs: big, incomplete, scalability problems
- Faceted taxonomies + algebra (with parameters P|N): small, clear, scalable
- Navigation trees are derived dynamically

Prototype Implementation using a RDBMS

Three tables are used for storing the faceted taxonomy and the expression e.

- TERMS (columns: name, id)
- SUBSUMPTION (columns: term1, term2)
- PARAMETERS (columns: F1, F2, ..., Fk)
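A three-table layout of this kind could look as follows in SQLite, driven from Python. Only the table and column names come from the slide; the column types, the reading of SUBSUMPTION as "term1 is directly narrower than term2", the use of one PARAMETERS row per parameter compound term with one column per facet, and the helper `broader_terms` are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TERMS (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Assumed meaning: term1 is directly narrower than term2.
    CREATE TABLE SUBSUMPTION (term1 INTEGER NOT NULL REFERENCES TERMS(id),
                              term2 INTEGER NOT NULL REFERENCES TERMS(id));
    -- Assumed meaning: one row per parameter compound term, one column per facet.
    CREATE TABLE PARAMETERS (F1 INTEGER REFERENCES TERMS(id),
                             F2 INTEGER REFERENCES TERMS(id));
""")

names = ["Location", "Mainland", "Islands", "Crete",
         "Sports", "SeaSports", "WinterSports"]
conn.executemany("INSERT INTO TERMS (name) VALUES (?)", [(n,) for n in names])
ids = dict(conn.execute("SELECT name, id FROM TERMS"))

links = [("Mainland", "Location"), ("Islands", "Location"), ("Crete", "Islands"),
         ("SeaSports", "Sports"), ("WinterSports", "Sports")]
conn.executemany("INSERT INTO SUBSUMPTION VALUES (?, ?)",
                 [(ids[a], ids[b]) for a, b in links])

# The compound term {Islands, WinterSports} stored as a parameter row.
conn.execute("INSERT INTO PARAMETERS VALUES (?, ?)",
             (ids["Islands"], ids["WinterSports"]))


def broader_terms(name):
    """Names of the term itself and all terms above it, via a recursive query."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM TERMS WHERE name = ?
            UNION
            SELECT s.term2 FROM SUBSUMPTION s JOIN up ON s.term1 = up.id
        )
        SELECT name FROM TERMS WHERE id IN (SELECT id FROM up) ORDER BY name
    """, (name,))
    return [row[0] for row in rows]


print(broader_terms("Crete"))
```

Storing only direct links and computing the transitive closure in the query keeps SUBSUMPTION small; a real deployment might instead materialise the closure for faster validity checks.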

Architecture

- Actors: Designer, Indexer/User
- Components: Navigation Tree Generator, Expression Builder, Validity Checker, Storage Manager, RDBMS

Concluding Remarks

Faceted Taxonomies:

[+] conceptual clarity (they are easier to understand)

[+] compactness (they take less space)

[+] scalability (update operations can be formulated more easily and performed more efficiently)

[-] invalid compound terms may appear

The Proposed Algebra:

[+] provides a solution to the problem of invalid compound terms

[+] aids indexing and browsing (and prevents errors)
