  1. Dynamic Classification Workshop: Roadmap & Quality Metrics • Claude Vogel

  2. Outline = Roadmap • Definitions • Step by step • Phase 1 • Taxonomy design [QA] • Implementation & Tests • Lexicon extraction [QA] • Meta data generation [QA] • Phase 2 • Classification design [QA] • Implementation & Tests • Portal generation [QA] • Conclusion

  3. Your Problem • Hit lists are inefficient • Information is unstructured • Information structure is irrelevant

  4. Define “Find” • I’m looking for an “APARTMENT in CARLSBAD” • [Diagram labels: Apartment, Studio; Carlsbad, Oceanside] • I end up with a STUDIO in OCEANSIDE • “Find” is a result, not a starting point • Find is not a Search + Retrieval system • Find is a dynamic process

  5. Relate available information to OUR decision-making processes

  6. Dynamic Classification • Rationale: Associate a semantic signature with structured and unstructured sources, then use this semantic representation to slice and dice the sources. • Example 1: Endeca • Meta-data index • Parametric classification • Example 2: Convera • Taxonomic index • Topical classification
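The rationale above can be sketched in a few lines. This is a minimal illustration, not the way Endeca or Convera work internally: the `Source` class, the `slice_by` and `dice_by` helpers, and the sample records are all hypothetical. The signature here is simply a set of metadata facets (parametric) plus a set of taxonomy nodes (topical).

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A source with a hypothetical 'semantic signature'."""
    name: str
    metadata: dict = field(default_factory=dict)  # parametric facets
    topics: set = field(default_factory=set)      # taxonomy nodes


def slice_by(sources, **facets):
    """Parametric cut: keep sources whose metadata matches every facet."""
    return [s for s in sources
            if all(s.metadata.get(k) == v for k, v in facets.items())]


def dice_by(sources, topic):
    """Topical cut: keep sources tagged with the given taxonomy node."""
    return [s for s in sources if topic in s.topics]


# Illustrative records only (assumed, not from the workshop).
sources = [
    Source("listing-1", {"type": "apartment", "city": "Carlsbad"}, {"Housing"}),
    Source("listing-2", {"type": "studio", "city": "Oceanside"}, {"Housing"}),
    Source("report-7", {"type": "report", "city": "Carlsbad"}, {"Marketing"}),
]

carlsbad = slice_by(sources, city="Carlsbad")
housing_in_carlsbad = dice_by(carlsbad, "Housing")
print([s.name for s in housing_in_carlsbad])
```

The same signature serves both cuts, which is the point of attaching it to the source once and reusing it.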

  7. Reduce Complexity • Domestic Sales and Marketing? • Jobs and Marketing?

  8. Bonus • Domestic • Sales • Marketing • Jobs • Categorize…

  9. Bonus • Domestic • Sales • Marketing • Jobs • …And Classify! • Domestic Sales and Marketing?

  10. Bonus • Domestic • Sales • Marketing • Jobs • …And Classify Again! • Jobs and Marketing?

  11. TAGS Leverage K-Assets

  12. Categories = Essential Knowledge • “A reasonably stable definition of the basic components of the world” • Genus to species: Africa > Somalia; Munitions > Bombs

  13. Classification = Accidental Knowledge • “A relevant answer to a practical problem” • [Diagram labels: Africa, Missiles, Whatever; Missiles, Africa]

  14. A Twofold Process • Taxonomy driven categorization • Steady • Accurate • Scalable • Classification driven user interface • Flexible • Relevant • Focused

  15. Glossary • Paradigmatic models • Ontology, Taxonomy • Practical models • Inventory, Catalog, Classification

  16. The Semiotic Triangle • Concept, Word, Reference • Mammals > Carnivora > Canidae > Canids > Boxer • “Boxer”: “… It stands about 56 to 61 cm (about 22 to 24 in) high and weighs about 30 kg (about 66 lb)” (Source: Microsoft Encarta)

  17. Lexicon, Taxonomy, Catalog

  18. Ontology • An ontology is a foundation of categories representing a view of the world. An ontology reflects the commonly used and trusted breakdown of categories. For example, the breakdown of news items into categories of ‘World’, ‘Sports’, ‘Politics’, etc. is ontological.

  19. Taxonomy • A taxonomy is a hierarchical system describing genera and species. Species derive from a common genus and are hierarchically represented according to their essential characteristics and differences. For example, animals are categorized with the "Taxonomy of Life" which separates mammals from birds and spiders from insects, based on proper features and relative differences. This genus to species nomenclature is highlighted by terminology which moves from generic terms to binomial terms through lexical derivation and compounding. • A taxonomy doesn’t deal with things, but with the essence of things: a taxonomy is based on an ontology.
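A genus-to-species hierarchy can be sketched as a child-to-parent map. This is only an illustration of the definition above: the `PARENT` table, the `lineage` and `is_a` helpers, and the "Animals" root are assumptions added for the example.

```python
# Each term points to its genus (parent). "Animals" as the root is assumed.
PARENT = {
    "Mammals": "Animals",
    "Birds": "Animals",
    "Carnivora": "Mammals",
    "Canidae": "Carnivora",
    "Boxer": "Canidae",
}


def lineage(term):
    """Return the path from the root down to the term."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def is_a(species, genus):
    """True if genus appears on the species' path (a term is its own genus here)."""
    return genus in lineage(species)


print(" > ".join(lineage("Boxer")))
print(is_a("Boxer", "Mammals"), is_a("Boxer", "Birds"))
```

Because every term has exactly one parent, each term has exactly one location, which is what distinguishes this structure from the classifications discussed later.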

  20. Inventory, Catalog • Inventory • List of things which stand for themselves, as they are, where they are. • Catalog • Consolidated inventory, introducing for that purpose some kind of elementary classification. • In both cases, the things listed have a unique and non-ambiguous name: e.g. URL, serial number, etc.

  21. Classification • Arrangement of things according to some of their properties • Arrangement of types of things according to some of their properties • Multiple classification systems might combine multiple ontologies in multiple ways. • Things might have multiple locations in any given classification.
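As a rough sketch of "arrangement according to some properties", the snippet below groups the same items under different property choices. The `classify` helper and the sample documents are hypothetical; the Africa and Missiles labels are borrowed from the earlier slides for continuity.

```python
from collections import defaultdict

# Illustrative documents, each already categorized along two taxonomies.
docs = [
    {"id": "d1", "region": "Africa", "weapon": "Missiles"},
    {"id": "d2", "region": "Africa", "weapon": "Bombs"},
    {"id": "d3", "region": "Asia", "weapon": "Missiles"},
]


def classify(items, *properties):
    """Group item ids by the chosen properties, in the chosen order."""
    groups = defaultdict(list)
    for item in items:
        key = tuple(item[p] for p in properties)
        groups[key].append(item["id"])
    return dict(groups)


by_region_then_weapon = classify(docs, "region", "weapon")
by_weapon = classify(docs, "weapon")
print(by_region_then_weapon)
print(by_weapon)
```

The categories on each document stay fixed; only the choice and order of properties changes, so the same document shows up in a different place in each arrangement.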

  22. Thesaurus Nomenclature

  23. Glossary • ANSI/NISO Z39.19-1993 • A thesaurus is a controlled vocabulary arranged in a known order and structured so that equivalence, homographic, hierarchical, and associative relationships among terms are displayed clearly and identified by standardized relationship indicators that are employed reciprocally. • The primary purposes of a thesaurus are (a) to facilitate retrieval of documents and (b) to achieve consistency in the indexing of written or otherwise recorded documents and other items, mainly for postcoordinate information storage and retrieval systems.
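The phrase "relationship indicators that are employed reciprocally" can be illustrated with a small sketch. The indicators BT/NT (broader/narrower term), RT (related term), and USE/UF (use/used for) are the conventional thesaurus indicators; the `Thesaurus` class and the term pairs below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Each indicator and the reciprocal entry it implies on the other term.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


class Thesaurus:
    def __init__(self):
        # term -> indicator -> set of related terms
        self.relations = defaultdict(lambda: defaultdict(set))

    def relate(self, term, indicator, other):
        """Record a relationship and its reciprocal in one step."""
        self.relations[term][indicator].add(other)
        self.relations[other][RECIPROCAL[indicator]].add(term)

    def preferred(self, term):
        """Resolve a non-preferred term through its USE reference, if any."""
        use = self.relations[term]["USE"]
        return next(iter(use)) if use else term


t = Thesaurus()
t.relate("Bombs", "BT", "Munitions")          # hierarchical
t.relate("Bombs", "RT", "Missiles")           # associative
t.relate("Explosive devices", "USE", "Bombs")  # equivalence (assumed pair)

print(sorted(t.relations["Munitions"]["NT"]))
print(t.preferred("Explosive devices"))
```

Writing both directions in `relate` is what keeps the vocabulary consistent for indexers and searchers alike.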

  24. Outline = Roadmap • Definitions • Step by step • Phase 1 • Taxonomy design [QA] • Implementation & Tests • Lexicon extraction [QA] • Meta data generation [QA] • Phase 2 • Classification design [QA] • Implementation & Tests • Portal generation [QA] • Conclusion

  25. Vertical Cartridges: Plug and Play • Terrorism • Geography • Weapons

  26. Example 1: Geography • Africa: Algeria, Angola • Asia: Afghanistan, Armenia • Europe: Albania, Andorra • Middle East: Bahrain, Iran • North and Central America: Antigua and Barbuda, Bahamas • Pacific: Australia, Fiji • South America: Argentina, Bolivia • U.S.: Alabama, Alaska

  27. Example 2: Defense • Defense Communications • Satellite Communications • Tactical Communications • Defense Systems • Air Defense • Antiaircraft Defense Systems • Gun Air Defense Systems • Antimissile Defense Systems • Forward Area Air Defense Systems • Terminal Defense • Aircraft Defense Systems • Antisubmarine Defense Systems • Antiswimmer Defense Systems • Countermeasures • Acoustic Countermeasures

  28. Taxonomy Design Canon • Unique Beginner: Ordnance • Life Form: Fire Control Systems • Generic: Sights • Specific: Gun Sights • Varietal: Radar Gun Sights
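One way to apply this canon as a quality check is to label each level of a taxonomy path with its rank and flag paths that run deeper than the canon allows. The sketch below assumes the five ranks are strictly ordered, one per level; the `label_path` helper is hypothetical.

```python
# Ranks of the design canon, from most general to most specific.
RANKS = ["Unique Beginner", "Life Form", "Generic", "Specific", "Varietal"]


def label_path(path):
    """Pair each level of a root-to-leaf path with its rank."""
    if len(path) > len(RANKS):
        raise ValueError(f"path has {len(path)} levels, canon allows {len(RANKS)}")
    return list(zip(RANKS, path))


path = ["Ordnance", "Fire Control Systems", "Sights", "Gun Sights", "Radar Gun Sights"]
for rank, term in label_path(path):
    print(f"{rank}: {term}")
```

A path that raises here is a candidate for review during taxonomy design, not necessarily an error.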

  29. Example: Breads

  30. Ontology Proliferation

  31. Mass Nouns • Linnaeus: Higher taxa are artefacts: “An order is a subdivision of classes needed to avoid placing together more genera than the mind can follow.” Philosophia Botanica • Some life-form categories are created to group objects together. Terms associated with these are often mass nouns (versus count nouns) like “furniture”: “a kind of things of different kinds made by people to etc.”

  32. Synonyms (WordNet) • Person > Unwelcome person > Unpleasant person > Selfish person > Opportunist > Backscratcher

  33. Cycles • Life-form → Genus → Species • Life-form (mass noun) • Genus (having derived forms) • Species (derived from the genus)

  34. Ontology Vacuum • Acceptance • Product Acceptance • Accountability • Social Responsibility • Social Investing • Accountants • Public Accountants • CPAs • Attorney CPAs • Accounting Firms • Big Five Accounting Firms • Big Six Accounting Firms

  35. Unbalanced derivation Acceptance Product Acceptance Accidents Accident Prevention Aircraft Accidents and Safety Air Traffic Control Hijacking Boating Accidents and Safety Construction Accidents and Safety Electrocutions Falls Firearm Accidents and Safety Household Accidents and Safety Nuclear Accidents and Safety Occupational Accidents Industrial Accidents Occupational Safety Indoor Air Quality Railroad Accidents and Safety Ship Accidents and Safety Lighthouses Swimming Accidents and Safety Drownings Traffic Accidents and Safety Hit and Run Accidents
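Unbalanced derivation can be made measurable by comparing depth and size across top-level branches. The sketch below is an assumption-heavy illustration: the nesting of the terms is guessed from the flat list above (the transcript does not preserve indentation), only part of the list is used, and the `depth` and `size` helpers are hypothetical.

```python
# Partial tree; the nesting is an assumption, not recovered from the slide.
tree = {
    "Acceptance": {"Product Acceptance": {}},
    "Accidents": {
        "Accident Prevention": {},
        "Aircraft Accidents and Safety": {
            "Air Traffic Control": {},
            "Hijacking": {},
        },
        "Occupational Accidents": {
            "Industrial Accidents": {},
            "Occupational Safety": {"Indoor Air Quality": {}},
        },
    },
}


def depth(children):
    """Number of levels in a branch, counting the branch's own node."""
    return 1 + max((depth(c) for c in children.values()), default=0)


def size(children):
    """Number of descendants below a node."""
    return sum(1 + size(c) for c in children.values())


profile = {name: (depth(sub), size(sub)) for name, sub in tree.items()}
for name, (d, n) in profile.items():
    print(f"{name}: depth={d}, descendants={n}")
```

A large gap between sibling branches, as between these two, is the kind of signal a taxonomy QA step could report for review.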

  36. Duplicated Paths = Classification schema

  37. Split Paradigms in Multiple Taxonomies • Taxpayers: Individuals, Organizations (Assoc., Corporations) • Tax items: Assets, Liabilities (Debts, Loans)