Quality taxonomies

Quality Taxonomies. Dr. Claude Vogel Founder & CTO KM World 2000. Ontology / Taxonomy. Static Discovery. Root Ontology. Taxonomy Generation. Dynamic Discovery. What is Quality ?. “Best value for the money”


Quality Taxonomies

Dr. Claude Vogel

Founder & CTO

KM World 2000


Ontology / Taxonomy

  • Static Discovery

  • Root Ontology

  • Taxonomy Generation

  • Dynamic Discovery


What is Quality?

  • “Best value for the money”

  • Under this definition, a costly product entitles you to high performance; likewise, a low-cost product or service can be expected to deliver less. For example, a rough demo delivery is both predictable and acceptable, because its quality profile is low conformance at low cost.


What is Quality?

  • “Good Quality is Nominal Conformance”

  • Taxonomy Quality is defined as Taxonomy Conformance to:

    • Valid requirements;

    • Explicitly documented development standards; and,

    • Implicit characteristics that are expected of all professionally developed taxonomies, such as the desire for good maintainability.



Standards

  • ISO 2788-1986

    • International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Monolingual Thesauri. 2nd ed. n.p.: ISO, 1986. (ISO 2788-1986(E)). (Available in the U.S. from American National Standards Institute)

  • ISO 5964-1985 

    • International Organization for Standardization. Documentation—Guidelines for the Establishment and Development of Multilingual Thesauri. n.p.: ISO, 1985. (ISO 5964-1985(E)). (Available in the U.S. from American National Standards Institute)

  • ANSI/NISO Z39.19-1993

    • National Information Standards Organization. Guidelines for the Construction, Format, and Management of Monolingual Thesauri. Bethesda, MD: NISO Press, 1994. 69p. (ANSI/NISO Z39.19-1993)

  • SEMIO Quality Plan v1 2000

  • ISO/IEC 13250 Topic Maps

  • RDF

    • Please refer to RDF at http://www.w3.org/RDF and XML at http://www.w3.org/XML



Project Plan

  • Kick-off

  • Requirements Review

  • Lexicon Review

  • Taxonomy Review

  • Tags Review

  • Final Review



1. Kick-off

  • Objectives

    • Purpose

    • Scope

    • Scale

    • Users

    • Conditions of receipt

  • Roles

    • Supplier

    • Customer

      • Admin

      • KE

      • Experts

      • Users

  • Planning

  • Training and Transfer



2. Requirements Review

  • Sources

  • Lexicon

  • Ontology

  • Install



Sources

  • Dispersion (Multiplicity, Size, Homogeneity)

  • Refresh

  • Access



Typical Patterns

  • Disparity

    • Adjust sources

    • Adjust crawl strategy

    • Isolate communities / taxonomies



Lexicon

  • Vocabularies, etc.

  • Substitutions: Acronyms, Synonyms, etc.

  • Preferred Keywords: Brand Names, etc.

  • Banned Keywords
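The lexicon requirements above (substitutions, preferred keywords, banned keywords) can be pictured as a small normalization pass over extracted terms. The following is a minimal sketch under stated assumptions, not the tool's actual behavior: the rule tables and the function names `normalize_term` and `build_lexicon` are hypothetical and invented for illustration.

```python
from typing import Iterable, List, Optional

# Hypothetical lexicon rules, invented for illustration only.
SUBSTITUTIONS = {"km": "knowledge management", "kb": "knowledge base"}
PREFERRED = {"knowledge management"}
BANNED = {"click here", "home page"}


def normalize_term(term: str) -> Optional[str]:
    """Return the canonical form of a term, or None if the term is banned."""
    # Lower-case and collapse runs of whitespace before looking anything up.
    key = " ".join(term.lower().split())
    # Expand acronyms and map synonyms onto a single form.
    key = SUBSTITUTIONS.get(key, key)
    if key in BANNED:
        return None
    return key


def build_lexicon(raw_terms: Iterable[str]) -> List[str]:
    """Normalize terms, drop banned ones, and list preferred keywords first."""
    kept = {t for t in map(normalize_term, raw_terms) if t is not None}
    # Sort key: preferred keywords (False) come before the rest (True).
    return sorted(kept, key=lambda t: (t not in PREFERRED, t))
```

Applying substitutions before the banned-keyword check is a design choice in this sketch: it lets one banned entry cover every spelling that maps to it.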



Typical Patterns

  • Lack of requirements

    • Use Librarian Resources


Ontology

  • Thesaurus?

  • Is the information domain analysis complete, consistent, and accurate?

  • Is the partitioning of the problem complete?


Typical Patterns

  • Directory versus Taxonomy

    • Isolate “directory” branches

  • Thesaurus versus Taxonomy

    • Put an ontology on top of the thesaurus

    • Check as early as possible that the thesaurus generics match the extracted lexicon

  • Very high-level design requirements for top categories

    • Plan to work bottom-up

  • See also Taxonomy (functions, combinations, etc.)



Install

  • Implementation / Integration:

    • Are external and internal interfaces properly defined?

    • Are all requirements traceable to the system level?

    • Has prototyping been conducted for the user/customer?

    • Is performance achievable within the constraints imposed by other system elements?

    • Are requirements consistent with schedule, resources, and budget?



Typical Patterns

  • Scale

  • Security

  • Missing Documents



3. Lexicon Review

  • Coverage

    • Extracted words / Words

    • (Extracted Index / Index)

  • Source benchmarking

    • Coverage

    • Extraction quality

    • Topic distribution

  • Structure

    • Most Frequent Phrases

    • Most Productive Generics

  • Substitutions

  • Exceptions
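The coverage ratio "Extracted words / Words" and the "Most Frequent Phrases" check from this review can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and treating coverage as a ratio over distinct, case-folded terms is an assumption rather than something the slides specify.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def lexical_coverage(extracted_terms: Iterable[str], corpus_terms: Iterable[str]) -> float:
    """Share of distinct corpus terms that also appear in the extracted lexicon."""
    corpus = {t.lower() for t in corpus_terms}
    if not corpus:
        return 0.0
    extracted = {t.lower() for t in extracted_terms}
    return len(extracted & corpus) / len(corpus)


def most_frequent_phrases(phrases: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent phrases with their occurrence counts."""
    return Counter(p.lower() for p in phrases).most_common(n)
```

Running `lexical_coverage` once per source gives a simple basis for benchmarking sources against each other.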



Typical Patterns

  • Low level of frequency / quality for the most meaningful content

    • Increase size of value corpus

    • Filter and re-import lexicon



4. Taxonomy Review

  • Taxonomy Operation

    • Correctness

    • Reliability

    • Usability

    • Integrity

    • Efficiency

  • Taxonomy Revision

    • Maintainability

    • Flexibility

    • Testability

  • Taxonomy Transition

    • Portability

    • Reusability

    • Interoperability


Folk Taxonomies Design

The Berlin and Kay model: Taxonomy = Nomenclature + Terminology

  • Levels, from most general to most specific: Unique Beginner, Life Form, Generic, Specific, Varietal

  • Example category labels: Tax, Liability, Loan, Term loan, Short-term loan



Correctness

  • Accuracy

  • Completeness

  • Consistency


Accuracy

  • Precision

  • Recall
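Precision and recall can be computed for a single category by comparing the documents the taxonomy assigned to it against the documents a reviewer judged relevant. A minimal sketch, assuming both sets are available as document identifiers; the function name and the identifiers in the usage note are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(assigned: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one category.

    assigned: documents the taxonomy placed in the category.
    relevant: documents a reviewer judged to belong in the category.
    """
    assigned_set, relevant_set = set(assigned), set(relevant)
    hits = len(assigned_set & relevant_set)
    # Precision: how much of what was assigned is actually relevant.
    precision = hits / len(assigned_set) if assigned_set else 0.0
    # Recall: how much of what is relevant was actually assigned.
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

For example, with four assigned documents of which two are among three relevant ones, precision is 2/4 and recall is 2/3.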


Completeness

  • Taxonomy

  • Maps

  • Lexicon

  • Collection


Concentration Works Against Quality

Diagram layers: Tagging, Taxonomy, Maps, Lexicon, Document Collection

  • Tagging Coverage

  • Ontology Coverage

  • Hook Coverage

  • Map Coverage

  • Lexical Coverage

  • Collection Coverage


Consistency: Typical Patterns

  • Objectivization

  • Hyperonymy

  • Speciation

  • Necessity


Objectivization

Example: Employment > Firing, Hiring, Salaries

  • Avoid functional categories

  • Don’t mix functions / objects

  • Exhaust scripts

  • Match idiomatic phrases


Genericity

Example: Parts > Air Conditioning, Belts and Hoses, Body, Brake System, Chassis, Engine, Exhaust System, Fuel System, Glass, Ignition

  • Avoid meronymy

  • Don’t mix meronymy / hyperonymy

  • Exhaust prototypes


Speciation

Example (WordNet): Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher

  • Avoid “strings” of categories

  • Avoid (non-idioms) properties for categories



Necessity

  • Avoid non-productive categories

  • Avoid combinations of categories


Nomenclature (Design Structure) Quality Index

[Figure: taxonomy tree drawn across Level 0 to Level 4, from UB down through lf, g, s, and v nodes, annotated with Balance, Depth, and Width.]

UB = unique beginner

lf = life-form

g = generic

s = specific

v = varietal
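The three annotations on this slide (depth, width, balance) can be measured on any taxonomy tree. The sketch below is illustrative only. It assumes a non-empty taxonomy stored as nested dictionaries, and it assumes one possible definition of balance (shallowest leaf depth divided by deepest leaf depth), since the slide does not give a formula. All function names are hypothetical.

```python
from typing import Dict, Iterator, List

Tree = Dict[str, "Tree"]  # category name -> subtree of child categories


def leaf_depths(tree: Tree, depth: int = 0) -> Iterator[int]:
    """Yield the level number of every leaf category (the root is level 0)."""
    for children in tree.values():
        if children:
            yield from leaf_depths(children, depth + 1)
        else:
            yield depth


def level_widths(tree: Tree) -> List[int]:
    """Return the number of categories at each level, top level first."""
    widths: List[int] = []
    level = [tree]
    while True:
        count = sum(len(node) for node in level)
        if count == 0:
            break
        widths.append(count)
        # Move down one level: collect the child dictionaries of every node.
        level = [children for node in level for children in node.values()]
    return widths


def structure_index(tree: Tree) -> Dict[str, float]:
    """Depth, width, and an assumed balance ratio for a non-empty taxonomy."""
    depths = list(leaf_depths(tree))
    deepest = max(depths)
    return {
        "depth": deepest,
        "width": max(level_widths(tree)),
        # Assumption: 1.0 means all leaves sit at the same level.
        "balance": min(depths) / deepest if deepest else 1.0,
    }
```

A tree such as `{"UB": {"lf1": {"g1": {}, "g2": {}}, "lf2": {"g3": {"s1": {}}}}}` has leaves at levels 2, 2, and 3, so this sketch reports depth 3, width 3, and balance 2/3.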


Complexity Index

  • Cyclomatic complexity increases with the number of cross-references within the taxonomy, giving an indication of its complexity and of how difficult it is to test.

  • Taxonomy Complexity Index combines:

    • autonomy

    • closure

    • similarity

    • typicality

    • commonality

    • redundancy

    • stability
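The effect of cross-references on cyclomatic complexity can be illustrated with McCabe's formula V(G) = E - N + 2P, where E is the number of edges, N the number of nodes, and P the number of connected components. A minimal sketch, assuming the taxonomy is treated as a graph whose edges are parent-child links plus cross-references and that it forms a single connected component by default. The function name and the cross-reference in the usage note are invented for illustration.

```python
from typing import Iterable, Tuple

Edge = Tuple[str, str]


def cyclomatic_complexity(parent_links: Iterable[Edge],
                          cross_references: Iterable[Edge],
                          components: int = 1) -> int:
    """McCabe's V(G) = E - N + 2P for a taxonomy treated as a graph."""
    edges = set(parent_links) | set(cross_references)
    nodes = {name for edge in edges for name in edge}
    return len(edges) - len(nodes) + 2 * components
```

A pure tree with N categories has N - 1 parent-child links, so its value is 1. Each added cross-reference, for example a hypothetical link from "Liability" to "Short-term loan", raises the value by one.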


Maturity Index

  • IEEE Std 982.1-1988 suggests a maturity index that, applied to a taxonomy, provides an indication of its stability.

  • Maturity Index combines:

    • number of modules in current ontology / taxonomy.

    • number of modules in current ontology / taxonomy that have been changed.

    • number of modules added to current ontology / taxonomy.

    • number of modules deleted from the previous version of the ontology / taxonomy.
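The four counts above combine in the way IEEE Std 982.1-1988 combines them for software modules: the index is the share of modules left untouched, (M_T - (F_a + F_c + F_d)) / M_T. The sketch below applies that formula to taxonomy modules. The function name and the numbers in the usage note are hypothetical.

```python
def maturity_index(total: int, changed: int, added: int, deleted: int) -> float:
    """Maturity index in the style of IEEE Std 982.1-1988.

    total:   modules in the current ontology / taxonomy (M_T)
    changed: current modules that have been changed (F_c)
    added:   modules added to the current version (F_a)
    deleted: modules deleted from the previous version (F_d)
    """
    if total <= 0:
        raise ValueError("total must be a positive number of modules")
    return (total - (added + changed + deleted)) / total
```

With 200 modules, of which 10 changed, 6 were added, and 4 were deleted, the index is 180 / 200 = 0.9. The value approaches 1.0 as the taxonomy stabilizes.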



5. Tags Review

  • Document coverage

  • Concepts coverage

<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag>
      <tagname>Liability</tagname>
      <weight>1.289</weight>
    </tag>
    <tag>
      <tagname>Federal Funds</tagname>
      <weight>0.746</weight>
    </tag>
  </document>
</tagset>
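A tag set in this format can be read with Python's standard `xml.etree.ElementTree` module, and the two coverage figures can then be computed from it. This is a sketch under stated assumptions: document coverage is taken as the share of documents with at least one tag, and concept coverage as the share of taxonomy categories used by at least one tag. Neither definition comes from the slides, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterable

TAGSET_XML = """
<tagset>
  <document>
    <docurl>http://www.TaxSource.com</docurl>
    <tag><tagname>Liability</tagname><weight>1.289</weight></tag>
    <tag><tagname>Federal Funds</tagname><weight>0.746</weight></tag>
  </document>
</tagset>
"""


def read_tagset(xml_text: str) -> Dict[str, Dict[str, float]]:
    """Map each document URL to its {tag name: weight} dictionary."""
    root = ET.fromstring(xml_text.strip())
    result: Dict[str, Dict[str, float]] = {}
    for doc in root.iter("document"):
        url = doc.findtext("docurl")
        result[url] = {
            tag.findtext("tagname"): float(tag.findtext("weight"))
            for tag in doc.iter("tag")
        }
    return result


def document_coverage(tagged: Dict[str, Dict[str, float]], all_urls: Iterable[str]) -> float:
    """Assumed definition: share of documents carrying at least one tag."""
    urls = set(all_urls)
    covered = sum(1 for url in urls if tagged.get(url))
    return covered / len(urls) if urls else 0.0


def concept_coverage(tagged: Dict[str, Dict[str, float]], categories: Iterable[str]) -> float:
    """Assumed definition: share of categories used by at least one tag."""
    cats = set(categories)
    used = {name for tags in tagged.values() for name in tags}
    return len(used & cats) / len(cats) if cats else 0.0
```

The list of all document URLs and the list of taxonomy categories have to come from the crawl and from the taxonomy itself; the tag set alone only records what was tagged.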



6. Final Review

  • Receipt

  • Maintenance



Quality Taxonomies

Claude Vogel

[email protected]

KM World 2000

