
An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies



Presentation Transcript


  1. An Algebraic Approach for Specifying Compound Terms in Faceted Taxonomies
     Yannis Tzitzikas (1), Anastasia Analyti (2), Nicolas Spyratos (3), Panos Constantopoulos (2,4)
     (1) Istituto di Scienza e Tecnologie dell’Informazione, CNR-ISTI, Italy
     (2) Institute of Computer Science, ICS-FORTH, Greece
     (3) Laboratoire de Recherche en Informatique, Université de Paris-Sud, France
     (4) Department of Computer Science, University of Crete, Greece

  2. Outline of the presentation • Introduction - Motivation • Faceted Classification and Faceted Taxonomies • Advantages and Problems • Compound Terms and Compound Taxonomies • The Algebra • Operations • Examples • Algorithms • Deriving Navigational Trees • Prototype implementation • Concluding Remarks Yannis Tzitzikas et al., EJC'2003

  3. Introduction
     • Existing ways to locate information on the Web:
       - searching (using search engines like Google)
       - browsing (using catalogues like Yahoo!, ODP)
     • Web Catalogues (or indices using controlled structured vocabularies):
       [-] index only a subset of the pages that are indexed by search engines
       [+] ensure indexing consistency
       [+] enable intelligent reasoning
       [+] enable browsing
     • Currently, the catalogues are also exploited by the search engines:
       - to improve the measurement of relevance
       - to give the user a set of pages related to each page of the answer
       - to limit the scope of the search

  4. Drawbacks of the taxonomies that are used by Web Catalogues
     (1) Big size (e.g. Open Directory currently has 460,000 terms)
     (2) Inconsistent and incomplete terminology and structuring
     Consequences (for the designer and the user):
     • Laborious object indexing
     • Hard to update/revise
     • Large storage requirements
     • Hard to understand
     • Laborious browsing

  5. Faceted Classification and Faceted Taxonomies
     Faceted classification was developed, prior to the existence of computers, by S. R. Ranganathan (1892-1972), a Hindu mathematician working as a librarian.
     • A faceted taxonomy consists of a set of facets
     • Each facet is a group of elemental concepts
     • Each object is indexed by synthesizing elemental concepts
     Key point: faceted taxonomies do not require an a priori division of concepts into subconcepts (only relationships between elemental concepts are stored).
     Advantages of faceted taxonomies:
     • they are easier to build and understand
     • they require less storage space
     • they are more scalable

  6. Faceted Taxonomies
     [Figure: two facets. Location: Mainland (Pilio, Olympus), Islands (Crete). Sports: SeaSports, WinterSports.]

  7. Example of using one taxonomy
     • Complete and balanced decimal tree
     • 1 billion pages, in blocks of 10 pages
     • 100 million indexing terms
     • Total: 111,111,111 terms

  8. Example of using a faceted taxonomy consisting of 4 facets
     • 1 billion pages, in blocks of 10 pages
     • 100 terms x 100 terms x 100 terms x 100 terms = 100 million indexing terms (4 x 100 = 400 terms)
     • Total: 444 terms

  9. Example of using a faceted taxonomy consisting of 8 facets
     • 1 billion pages, in blocks of 10 pages
     • 10 terms x ... x 10 terms (8 facets) = 100 million indexing terms (8 x 10 = 80 terms)
     • Total: 88 terms!
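
The term counts on slides 7 to 9 can be reproduced with a little arithmetic. The sketch below assumes that every facet is a complete decimal tree and that an indexing term picks one leaf from every facet; the function names are illustrative and do not come from the presentation.

```python
def complete_tree_size(branching: int, depth: int) -> int:
    """Count all nodes of a complete tree, with the root at depth 0."""
    return sum(branching ** level for level in range(depth + 1))


def faceted_design(num_facets: int, depth: int, branching: int = 10):
    """Return (stored_terms, indexing_terms) for identical, complete facets.

    Assumption: an indexing term combines one leaf from every facet.
    """
    leaves_per_facet = branching ** depth
    stored_terms = num_facets * complete_tree_size(branching, depth)
    indexing_terms = leaves_per_facet ** num_facets
    return stored_terms, indexing_terms


# One taxonomy of depth 8, four facets of depth 2, eight facets of depth 1.
for facets, depth in [(1, 8), (4, 2), (8, 1)]:
    stored, combos = faceted_design(facets, depth)
    print(f"{facets} facet(s): {stored:,} stored terms, {combos:,} indexing terms")
```

All three designs reach the same 100 million indexing terms, while the number of terms that must be stored drops from 111,111,111 to 444 and then to 88.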

  10. The Problem of Faceted Taxonomies
      [Figure: facets Location (Mainland: Pilio, Olympus; Islands: Crete) and Sports (SeaSports, WinterSports)]
      Invalid compound terms may appear during object indexing or browsing/retrieval.
      A compound term is invalid if it cannot be applied to any object of the domain.
      Consequences:
      • laborious/erroneous object indexing
      • difficulties in browsing

  11. Valid and Invalid Compound Terms
      Example: [Figure: facets Location (Mainland: Pilio, Olympus; Islands: Crete) and Sports (SeaSports, WinterSports)]
      Valid compound terms:
      • Sports.Location, Sports.Islands, Sports.Crete, Sports.Mainland, Sports.Pilio, Sports.Olympus
      • SeaSports.Location, SeaSports.Islands, SeaSports.Crete, SeaSports.Mainland, SeaSports.Pilio
      • WinterSports.Location, WinterSports.Mainland, WinterSports.Pilio, WinterSports.Olympus
      Invalid compound terms:
      • SeaSports.Olympus, WinterSports.Islands, WinterSports.Crete

  12. The Idea
      Define an algebra with operators that allow specifying the set of valid compound terms without having to enumerate all of the valid compound terms.
      Initial operands: facet terminologies
      Operations:
      Operation             Arity   Description
      product               n-ary   Combines terms from different facets
      plus-product          n-ary   Combines terms from different facets, plus positive modifiers
      minus-product         n-ary   Combines terms from different facets, plus negative modifiers
      self-product          unary   Combines terms from one facet
      self-plus-product     unary   Combines terms from one facet, plus positive modifiers
      self-minus-product    unary   Combines terms from one facet, plus negative modifiers

  13. Compound Terms and Compound Taxonomies
      • Compound term: any subset s of T
      • Compound terminology S: a set of compound terms
      • Compound taxonomy: a pair (S, ⪯), where S is a compound terminology and ⪯ is an ordering of the compound terms in S
      Example: {Sports,Crete} ⪯ {Sports}, {Sports,Crete} ⪯ {Sports,Greece}
      [Figure: the terms Greece, Sports and Crete, with Crete narrower than Greece]
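
The slide's example can be checked with a small sketch of an ordering between compound terms. The rule used here is an assumption reconstructed from that example, since the slide's own formula did not survive: s is narrower than or equal to s' when every term of s' has some narrower-or-equal term in s. The `PARENT` map and the function names are hypothetical.

```python
# Hypothetical toy taxonomy: each term maps to its direct broader term.
PARENT = {"Crete": "Greece"}


def term_leq(t, u, parent=PARENT):
    """True if term t equals u or lies below u in the taxonomy."""
    while t is not None:
        if t == u:
            return True
        t = parent.get(t)
    return False


def compound_leq(s, other):
    """Assumed rule: every term of `other` is refined by some term of `s`."""
    return all(any(term_leq(t, u) for t in s) for u in other)


print(compound_leq({"Sports", "Crete"}, {"Sports"}))            # True
print(compound_leq({"Sports", "Crete"}, {"Sports", "Greece"}))  # True
print(compound_leq({"Sports"}, {"Sports", "Crete"}))            # False
```

Under this rule, adding terms to a compound term or replacing a term by a narrower one can only make the compound term narrower, which matches both relations shown on the slide.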

  14. The Product Operation
      • S: {Greece}, {Islands}
      • S’: {Sports}, {SeaSports}
      • Product of S and S’: {Greece}, {Islands}, {Sports}, {SeaSports}, {Greece,Sports}, {Greece,SeaSports}, {Islands,Sports}, {Islands,SeaSports}
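
A minimal sketch of the product on the slide's operands follows. It assumes that each operand also contains the empty compound term, which is one way to explain why the single terms {Greece}, {Sports} and so on appear in the result; the helper names are illustrative.

```python
from itertools import product as cartesian


def basic_terminology(terms):
    """Singleton compound terms plus the empty compound term (an assumption)."""
    return {frozenset()} | {frozenset([t]) for t in terms}


def compound_product(*terminologies):
    """Union one compound term from each operand, in every possible way."""
    return {frozenset().union(*combo) for combo in cartesian(*terminologies)}


S = basic_terminology(["Greece", "Islands"])
S2 = basic_terminology(["Sports", "SeaSports"])
result = compound_product(S, S2)

for s in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(s))
```

Apart from the empty compound term, the output lists the four single terms and the four pairs shown on the slide.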

  15. The Plus-Product Operation
      • P = {{Islands,SeaSports}, {Greece,SnowSki}}
      • S: {Greece}, {Islands}
      • S’: {Sports}, {SeaSports}, {WinterSports}, {SnowSki}
      • Result: {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Greece,WinterSports}, {Islands,SeaSports}, {Greece,SnowSki}

  16. The Minus-Product Operation
      • N = {{Islands,WinterSports}}
      • S: {Greece}, {Islands}
      • S’: {Sports}, {SeaSports}, {WinterSports}, {SnowSki}
      • Result: {Greece}, {Islands}, {Sports}, {SeaSports}, {WinterSports}, {SnowSki}, {Greece,Sports}, {Islands,Sports}, {Greece,SeaSports}, {Greece,WinterSports}, {Islands,SeaSports}, {Greece,SnowSki}
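
The plus-product and minus-product examples can be replayed with the sketch below. It rests on assumptions reconstructed from the slides rather than on the authors' definitions: the plus-product keeps the operands plus every combination that is broader than some element of P, and the minus-product keeps every combination except those narrower than some element of N. As a simplification, only combinations with at most one term per facet are considered. The taxonomy (Islands under Greece, SnowSki under WinterSports under Sports) is inferred from the example.

```python
from itertools import product as cartesian

# Assumed taxonomy for the example: term -> direct broader term.
PARENT = {"Islands": "Greece", "SeaSports": "Sports",
          "WinterSports": "Sports", "SnowSki": "WinterSports"}


def term_leq(t, u):
    """True if term t equals u or lies below u."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, other):
    """Assumed ordering: every term of `other` is refined by a term of `s`."""
    return all(any(term_leq(t, u) for t in s) for u in other)


def basic(terms):
    return {frozenset()} | {frozenset([t]) for t in terms}


def plain_product(*terminologies):
    return {frozenset().union(*c) for c in cartesian(*terminologies)}


def plus_product(P, *terminologies):
    """Operands plus the combinations broader than some element of P."""
    singles = set().union(*terminologies)
    broader = {s for s in plain_product(*terminologies)
               if any(compound_leq(p, s) for p in P)}
    return singles | broader


def minus_product(N, *terminologies):
    """All combinations except those narrower than some element of N."""
    return {s for s in plain_product(*terminologies)
            if not any(compound_leq(s, n) for n in N)}


S = basic(["Greece", "Islands"])
S2 = basic(["Sports", "SeaSports", "WinterSports", "SnowSki"])
plus = plus_product([{"Islands", "SeaSports"}, {"Greece", "SnowSki"}], S, S2)
minus = minus_product([{"Islands", "WinterSports"}], S, S2)
print(plus == minus)
```

In this example the two specifications describe the same compound terminology: two positive parameters or one negative parameter lead to the same six valid pairs.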

  17. The Self-[Plus/Minus]-Product Operations: Self-Product, Self-Plus-Product, Self-Minus-Product

  18. The Self-Plus-Product: Example
      • P = {{SeaSki,WindSurfing}, {SnowSki,SnowBoard}}
      • S: {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {WindSurfing}, {SnowSki}, {SnowBoard}
      • Result: {Sports}, {SeaSports}, {WinterSports}, {SeaSki}, {WindSurfing}, {SnowSki}, {SnowBoard}, {SeaSki,WindSurfing}, {SnowSki,SnowBoard}

  19. Expressions and Well-formed Expressions
      The set of expressions over a facet set {F1,…, Fk} is defined according to the grammar: [grammar not preserved]
      An expression e is well-formed if:
      (a) each basic compound terminology appears at most once in e,
      (b) the parameters P/N are subsets of the corresponding genuine compound terms.
      In this way:
      • no conflicts arise
      • the behavior is monotonic

  20. Example: Building the catalog of a tourist portal
      Facets:
      • Location: Iraklio, Ammoudara, Hersonissos
      • Accommodation: Furn. Apartments, Rooms, Bungalows
      • Facilities: Jacuzzi, SwimmingPool, Indoor, Outdoor
      3 facets, 13 terms, 890 compound terms, of which only 96 are valid.
      Specification with positive parameters only (|P| = 8):
      P = { {Iraklio, Furn. Apartments}, {Iraklio, Rooms}, {Ammoudara, Furn. Apartments}, {Ammoudara, Rooms}, {Hersonissos, Furn. Apartments}, {Ammoudara, Bungalows, Jacuzzi}, {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor} }
      Specification with negative and positive parameters (|P| + |N| = 4):
      N = { {Iraklio, Bungalows} }
      P = { {Hersonissos, Rooms, Indoor}, {Hersonissos, Bungalows, Outdoor}, {Ammoudara, Bungalows, Jacuzzi} }

  21. Checking the Validity of a Compound Term
      Let S_e be the compound terminology defined by an algebraic expression e.
      We provide an algorithm for checking whether s ∈ S_e without having to compute (and store) the entire S_e.
      The time complexity for this algorithm is: [formula not preserved]
      => Only F and e have to be stored
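
To illustrate the idea of checking membership without materializing the terminology, here is a minimal sketch for the simplest case: a single minus-product over the Location and Sports facets. It is not the authors' algorithm. It assumes that a combination of one Sports term and one Location term is valid unless it is narrower than some negative parameter, and the set N below is a hypothetical choice that reproduces the valid and invalid terms of slide 11.

```python
# Facets from the running example: term -> direct broader term.
PARENT = {
    "Mainland": "Location", "Islands": "Location",
    "Pilio": "Mainland", "Olympus": "Mainland", "Crete": "Islands",
    "SeaSports": "Sports", "WinterSports": "Sports",
}
# Hypothetical negative parameters of the expression.
N = [{"SeaSports", "Olympus"}, {"WinterSports", "Islands"}]


def term_leq(t, u):
    """True if term t equals u or lies below u."""
    while t is not None:
        if t == u:
            return True
        t = PARENT.get(t)
    return False


def compound_leq(s, other):
    """Assumed ordering: every term of `other` is refined by a term of `s`."""
    return all(any(term_leq(t, u) for t in s) for u in other)


def is_valid(s, negative=N):
    """Check one compound term against N only; nothing else is enumerated."""
    return not any(compound_leq(s, n) for n in negative)


print(is_valid({"SeaSports", "Crete"}))     # True
print(is_valid({"WinterSports", "Crete"}))  # False
```

Only the taxonomy and the two parameters are stored, and each check walks a few parent chains instead of scanning the 18 possible combinations.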

  22. Generating Navigation Trees
      Objective: given an expression e, dynamically generate a navigation tree whose nodes correspond to valid compound terms only, for use during object indexing and browsing.
      The navigation tree also contains nodes for facet crossing.
      [Figure: navigation tree over the Sports and Location facets, with facet-crossing nodes labelled byLocation and bySports]

  23. Application in Web Catalogues
      • Taxonomies of existing catalogs: big, incomplete, scalability problems
      • Faceted Taxonomies + Algebra (P|N): small, clear, scalable
      • Navigation Trees: generated dynamically

  24. Prototype Implementation using an RDBMS
      Three tables are used for storing the faceted taxonomy and the expression e: TERMS, SUBSUMPTION and PARAMETERS (column labels on the slide: name, id, term1, term2, F1, F2, ..., Fk).
      Architecture: Designer, Indexer/User, Expression Builder, Validity Checker, Nav. Tree Generator, Storage Manager, RDBMS
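
A rough sketch of how such a three-table layout might look, using SQLite from Python. The assignment of columns to tables, the column types, the direction of the SUBSUMPTION relation (term1 narrower than term2) and the sample rows are all assumptions made for illustration; the slide only lists the table and column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE terms (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- Assumption: term1 is directly narrower than term2.
    CREATE TABLE subsumption (term1 INTEGER REFERENCES terms(id),
                              term2 INTEGER REFERENCES terms(id));
    -- Assumption: one row per parameter, one column per facet (F1, F2).
    CREATE TABLE parameters (f1 INTEGER REFERENCES terms(id),
                             f2 INTEGER REFERENCES terms(id));
""")

names = ["Location", "Islands", "Crete", "Sports", "WinterSports"]
conn.executemany("INSERT INTO terms (name) VALUES (?)", [(n,) for n in names])
ids = {name: tid for tid, name in conn.execute("SELECT id, name FROM terms")}
edges = [("Islands", "Location"), ("Crete", "Islands"), ("WinterSports", "Sports")]
conn.executemany("INSERT INTO subsumption VALUES (?, ?)",
                 [(ids[a], ids[b]) for a, b in edges])
conn.execute("INSERT INTO parameters VALUES (?, ?)",
             (ids["Islands"], ids["WinterSports"]))


def broader_terms(name):
    """Return the term and all of its ancestors via a recursive query."""
    rows = conn.execute("""
        WITH RECURSIVE up(id) AS (
            SELECT id FROM terms WHERE name = ?
            UNION
            SELECT s.term2 FROM subsumption s JOIN up ON s.term1 = up.id
        )
        SELECT name FROM terms WHERE id IN (SELECT id FROM up) ORDER BY name
    """, (name,))
    return [r[0] for r in rows]


print(broader_terms("Crete"))
```

Storing only direct subsumption links keeps the tables small; the transitive relation needed by a validity checker is computed on demand by the recursive query.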

  25. Concluding Remarks
      Faceted Taxonomies:
      [+] conceptual clarity (they are easier to understand)
      [+] compactness (they take less space)
      [+] scalability (update operations are easier to formulate and can be performed more efficiently)
      [-] invalid compound terms may appear
      The Proposed Algebra:
      [+] provides a solution to the problem of invalid compound terms
      [+] aids indexing and browsing (and prevents errors)
