Ontology Building across Heterogeneous Databases. Michael N. Huhns Center for Information Technology University of South Carolina. The Fundamental Problem.
Ontology Building across Heterogeneous Databases Michael N. Huhns Center for Information Technology University of South Carolina
The Fundamental Problem
• We would like to arrange for effective and efficient interactions among large numbers of heterogeneous information components: databases, applications, and interfaces
• Difficulties are
  • Components are incomprehensible, inconsistent, and often unknown in advance
  • We need to enable updates as well as retrievals
  • The information environment is open
  • We need to consider process and policy, as well as structure
Needs and Applications
• Heterogeneous database access and management
• Information search, retrieval, and fusion
• Workflow automation
• Agent communication
• Information management: consistency
• Distributed collaboration
• Distance education
Emerging Solution: A Cooperative Information System
[Diagram: Agents attached to Applications, an E-Mail System, a Workflow System, a Database System, and a Web System]
Another View of CIS
[Diagram: User Agents and Resource Agents connected through Middleware: Mediators, Brokers, Facilitators, Ontologies, and Registries]
(de facto) Standard Agent Types and Architectures
• Example systems: MCC InfoSleuth, CMU RETSINA, SRI OAA, USC-ISI SIMS & TeamCore, Global InfoTek Grid
[Diagram components: Application Program, User Interface Agent, Ontology Agent, Broker Agent, Mediator Agent, Registry Agent, 11179 Registry, Database Resource Agents]
[Diagram message labels: Reg/Unreg (KQML), Query or Update (SQL), Mediated Query (SQL), Ontology (OKBC), Schemas (CLIPS), SQL (JDBC), Reply]
Implementing the Agent Architecture
• How to build an agent
• How to construct an ontology
Models for Database #1
[ER diagram labels: Person, coAuthors, Document; attributes SSN, Name, Phone, Title, Per_cent; cardinalities (1,N) and (1,N)]
Relational Model
Person (SSN, Name, Phone)
CoAuthors (SSN, Title, Per_cent)
Document (Title)
Models for Database #2
[ER diagram labels: Employee, fillsOut, ComplianceForm; attributes EID, Name, SSN, Phone, Title; cardinalities (1,1) and (1,N)]
Relational Model
Employee (EID, Name, SSN)
ComplianceForm (Title, EID)
Domain Ontology
[Diagram; labels as extracted: Thing Class of All Class of All Entity Relations Attributes Person Person Name Attributes Person SSN Employee ID Full-Time Part-Time Full-Time Employee Employee Employee Attributes Attributes Document Relations Person Document Document Attributes Coauthors ComplianceForm Employee Employee Document Title FillsOut Attributes Part-Time Employee]
Semantic Mappings
• Mappings are sentences in some logical language, e.g., KIF, Loom, CLIPS
[Diagram: Application 1 and Interface 1 are linked to a Common Ontology by Articulation Axioms 1 to 4; ontology concepts shown: Entity, Document, Person, Boat, Homemaker, Employee, Minor; DB1 holds Person (SSN, Name); DB2 holds Employee (EID, Name)]
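A minimal sketch of how the mappings for DB1 and DB2 might be held as data. The slide states that mappings are sentences in a logical language; the dictionary form, the attribute names such as `Person.ssn`, and both helper functions are assumptions made for illustration only.

```python
# Hypothetical articulation axioms, written as plain data rather than as
# KIF, Loom, or CLIPS sentences. Each entry maps (database, table, column)
# to an attribute of a concept in the common ontology.
ARTICULATION_AXIOMS = {
    ("DB1", "Person", "SSN"): "Person.ssn",
    ("DB1", "Person", "Name"): "Person.name",
    ("DB2", "Employee", "EID"): "Employee.id",
    ("DB2", "Employee", "Name"): "Person.name",
}


def columns_for(ontology_attribute):
    """Return every database column mapped to the given ontology attribute."""
    return sorted(
        column
        for column, attribute in ARTICULATION_AXIOMS.items()
        if attribute == ontology_attribute
    )


def translate_row(database, table, row):
    """Rename a row's columns into common-ontology terms, dropping unmapped ones."""
    return {
        ARTICULATION_AXIOMS[(database, table, column)]: value
        for column, value in row.items()
        if (database, table, column) in ARTICULATION_AXIOMS
    }
```

Under these assumptions, a query posed against `Person.name` in the common ontology can be routed to the Name column of both databases.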
Ontologies and DBs
• An ontology specifies the intended meaning of concepts in a database:
DB Schema:
  Table: PartsPrice
    *stockNo: integer
    cost: float
Ontology:
  price(x,y) => ∃(x’,y’)[automobile_part(x’) & stock_no(x’) = x & retail_price(x’,y’) & magnitude(y’,US_dollars) = y]
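One way to read the ontology sentence: a PartsPrice row (stockNo, cost) is meaningful only if some automobile part has that stock number and a retail price whose magnitude in US dollars equals the cost. The sketch below checks that condition against a small in-memory collection; the `AutomobilePart` class and the `price_holds` function are hypothetical names, not part of the original material.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AutomobilePart:
    """Hypothetical ontology instance for automobile_part(x')."""
    stock_no: int            # stock_no(x')
    retail_price_usd: float  # magnitude(y', US_dollars) for retail_price(x', y')


def price_holds(stock_no, cost, parts):
    """True if some part has this stock number and this retail price in USD."""
    return any(
        part.stock_no == stock_no and part.retail_price_usd == cost
        for part in parts
    )
```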
Semantic Translation
[Diagram: a User and Applications 1 through n, each with an Agent for Application, reach a Common Enterprise-Wide View; Agents for Resources connect that view to the databases; each link is labeled Semantic Translation by Mappings]
Workflow Automation of Telecommunication Service Provisioning
[Diagram; labels as extracted: User Interface Agent Transaction Scheduling Agent User + Application Schedule Repairing Agent Schedule Processing Agent ESS ESS . . . Switch DB LFACS DB TIRKS DB]
Example Workflow in Telecommunications
[Diagram; labels as extracted: Service Request Span in Place? Service Order Create Bill LFACS TIRKS FEPS Switch TIRKS TIRKS NSDB WFA]
Semantic Model for Interface Agent
[Diagram; labels as extracted: id* date name* phone Service Order Ordered by Customer Orders quantity Circuit type aLocation zLocation]
Dimensions of Heterogeneity: Structure
• Schemas and views, e.g., securities are stocks
• Specializations and generalizations of domain concepts, e.g., stocks are a kind of liquid asset
• Value maps, e.g., S&P A+ rating corresponds to Moody’s A rating
• Semantic data properties, sufficient to characterize the value maps, e.g., prices on the Madrid Exchange are daily averages rather than closing prices
• Cardinality constraints
• Integrity constraints, e.g., each stock must have a unique SEC identifier
• Data value ranges, e.g., Price > 0
• Allow or disallow “maybe values” for data
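A few of the structural dimensions above can be sketched in code. The value map holds only the single rating pair given on the slide; the record layout (`sec_id`, `price`) and the function names are assumptions made for illustration.

```python
# Value map taken from the slide's example; a real map would be larger.
SP_TO_MOODYS = {"A+": "A"}


def to_moodys(sp_rating):
    """Translate an S&P rating, or return None when no mapping is known."""
    return SP_TO_MOODYS.get(sp_rating)


def violations(stocks):
    """Check the value range (Price > 0) and the unique-identifier constraint."""
    problems = []
    seen = set()
    for stock in stocks:
        if stock["price"] <= 0:
            problems.append((stock["sec_id"], "price must be > 0"))
        if stock["sec_id"] in seen:
            problems.append((stock["sec_id"], "duplicate SEC identifier"))
        seen.add(stock["sec_id"])
    return problems
```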
Dimensions of Heterogeneity: Process
• Procedures, i.e., how to process information (e.g., how to decide what stock to recommend)
• Preferences for accesses and updates in case of data replication (based on recency or accuracy of data)
• Preferences to capture view update semantics
• Contingency strategies, e.g., whether to ignore, redo, or compensate
• Contingency procedures, i.e., how to compensate transactions
• Flow, e.g., where to forward requests or results
• Temporal constraints, e.g., report tax each quarter
Dimensions of Heterogeneity: Policy
• Security, i.e., who has rights to access or update what information? (e.g., customers can access all of their accounts, except blind trusts)
• Authentication, i.e., a sufficient test to establish identity (e.g., passwords, retinal scans, or smart cards)
• Bookkeeping (e.g., logging all accesses)
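The security and bookkeeping examples above might be sketched as follows. The account fields (`id`, `owner`, `kind`, `balance`) and the function names are hypothetical, and authentication is assumed to have happened before these functions are called.

```python
access_log = []  # bookkeeping: one entry per access attempt


def can_access(customer, account):
    """Customers may access their own accounts, except blind trusts."""
    return account["owner"] == customer and account["kind"] != "blind_trust"


def read_balance(customer, account):
    """Return the balance if policy allows it, logging the attempt either way."""
    allowed = can_access(customer, account)
    access_log.append((customer, account["id"], allowed))
    return account["balance"] if allowed else None
```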
Definition
• Ontology: a representation of knowledge specific to some universe(s) of discourse
• Ontology: an agreement about a shared conceptualization, which includes conceptual frameworks for modeling domain knowledge and agreements about the representation of particular domain theories
Key Words
• Each document is characterized by a set of key words
• The union of the sets is the domain of discourse for the documents
• Advantages:
  • simple
  • domain-independent methods exist (can be automated)
  • good for organizing heterogeneous text
• Disadvantages:
  • not appropriate for data
  • “this is about X” vs. “this is not about X”
  • key words are not organized
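The key-word approach reduces to set operations, as this small sketch suggests. The document names and key words are invented for illustration.

```python
# Hypothetical documents, each characterized by a set of key words.
DOCUMENTS = {
    "doc1": {"ontology", "database"},
    "doc2": {"agent", "database"},
}


def domain_of_discourse(documents):
    """Union of all key-word sets."""
    return set().union(*documents.values())


def documents_about(documents, key_word):
    """Names of the documents whose key-word set contains the given word."""
    return sorted(name for name, words in documents.items() if key_word in words)
```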
Alta-Vista “Way-Cool Topic Graph”
[Screenshot]
Thesaurus
• Organizes key words based on synonyms and antonyms
• WordNet (http://www.cogsci.princeton.edu/~wn/) groups words into synonym sets, and relates the sets via hypernymy/hyponymy, antonymy, entailment, and meronymy/holonymy
Taxonomies
• A hierarchical organization of concepts, based on set-subset relationships. Biologists organize the plant and animal kingdoms using taxonomies
Ontologies
• A semantic net (a generalization of a taxonomy, allowing other relationships than subset) consisting of types of entities, attributes and properties, relations and functions, and constraints
[Diagram: Car hasPart Wheel, with the constraint (= #wheels 4); Convertible is a subclass of Car]
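A minimal sketch of the Car example as a semantic net, assuming single inheritance and that a subclass inherits its superclass's relations and constraints. The `Concept` class and its methods are illustrative, not taken from any particular ontology tool.

```python
class Concept:
    """A node in a small semantic net with one optional superclass."""

    def __init__(self, name, superclass=None):
        self.name = name
        self.superclass = superclass
        self.relations = {}    # relation name -> set of target concept names
        self.constraints = {}  # attribute name -> required value

    def relate(self, relation, target):
        self.relations.setdefault(relation, set()).add(target.name)

    def related(self, relation):
        """Targets of a relation, including those inherited from superclasses."""
        targets = set(self.relations.get(relation, set()))
        if self.superclass is not None:
            targets |= self.superclass.related(relation)
        return targets

    def constraint(self, attribute):
        """Nearest constraint on an attribute, searching up the subclass chain."""
        if attribute in self.constraints:
            return self.constraints[attribute]
        if self.superclass is not None:
            return self.superclass.constraint(attribute)
        return None

    def is_a(self, other):
        """True if this concept is the other concept or one of its subclasses."""
        concept = self
        while concept is not None:
            if concept is other:
                return True
            concept = concept.superclass
        return False


wheel = Concept("Wheel")
car = Concept("Car")
car.relate("hasPart", wheel)
car.constraints["#wheels"] = 4
convertible = Concept("Convertible", superclass=car)
```

Here `convertible.related("hasPart")` and `convertible.constraint("#wheels")` are answered from Car, which is the kind of inference from class membership that a bare taxonomy of names does not record.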
Ontology Development
• Bottom-Up from Schemas and Key Words
  • identify databases
  • identify names for all tables, fields, and enumerated values (e.g., if value is limited to a primary color “red”, “green”, or “blue”)
  • form groups of common concepts and assign name to covering concept for each group
  • iterate
• or Extensional View: form classes from instances
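The naming and grouping steps above can be sketched as follows, using the Person and Employee tables from the two example databases. Grouping fields by case-insensitive name is only a crude stand-in for the judgment a person applies when forming groups of common concepts, and `group_fields` is a hypothetical helper.

```python
from collections import defaultdict

# Table and field names from the two example relational models.
SCHEMAS = {
    "Person": ["SSN", "Name", "Phone"],
    "Employee": ["EID", "Name", "SSN"],
}


def group_fields(schemas):
    """Group fields that share a name (ignoring case) across tables."""
    groups = defaultdict(list)
    for table, fields in schemas.items():
        for field in fields:
            groups[field.lower()].append((table, field))
    # Keep only names that occur in more than one place: candidate
    # covering concepts to be named and reviewed on the next iteration.
    return {name: members for name, members in groups.items() if len(members) > 1}
```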
Ontology Development
• Top-Down from First Principles (intensional view): a class is defined by a set of membership conditions or properties
• Restrictions on Class Formation:
  • a class must have instances
  • a class must contain all properties common to the instances in its extension
  • classification should obey cognitive economy: instances of a class must share some, but not all properties
  • classification should enable inference of properties based on class membership
Ontology Development (cont.)
• Restrictions on Class Structures:
  • Completeness: every property must be used in the definition of at least one class
  • Nonredundancy: a subclass must be defined by at least one property not in any of its superclasses (the result is that a subclass is always a specialization of any of its superclasses, i.e., it has more properties or restrictions, and has fewer instances)
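Both restrictions can be checked mechanically. The sketch below assumes single inheritance and that each class lists only the properties it adds; the class names, property names, and function names are illustrative.

```python
# Hypothetical class structure: each class lists only the properties it adds.
CLASSES = {
    "Person": {"superclass": None, "properties": {"name", "ssn"}},
    "Employee": {"superclass": "Person", "properties": {"employee_id"}},
}


def inherited_properties(classes, name):
    """Union of the properties declared by all superclasses of a class."""
    inherited = set()
    parent = classes[name]["superclass"]
    while parent is not None:
        inherited |= classes[parent]["properties"]
        parent = classes[parent]["superclass"]
    return inherited


def is_complete(classes, all_properties):
    """Completeness: every property is used by at least one class."""
    used = set()
    for definition in classes.values():
        used |= definition["properties"]
    return set(all_properties) <= used


def redundant_subclasses(classes):
    """Nonredundancy: list subclasses that add no property of their own."""
    return sorted(
        name
        for name, definition in classes.items()
        if definition["superclass"] is not None
        and not (definition["properties"] - inherited_properties(classes, name))
    )
```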
Tools for Developing Ontologies
• Ontolingua and Chimaera (Stanford)
• SHOE: Simple HTML Ontology Extension language (U. of Maryland)
• JOE: Java Ontology Editor (U. of South Carolina)
• IMTS (MCC)
• Cyc Unit Editor
• UML, ER, and Conceptual Modeling Tools
Classification Is Difficult!
From the ancient Chinese encyclopedia Celestial Emporium of Benevolent Knowledge, “It is written that animals are divided into
• belonging to the emperor
• embalmed
• tame
• sucking pigs
• sirens
• fabulous
• stray dogs
• included in the present classification
• frenzied
• innumerable
• drawn with a very fine camel-hair brush
• et cetera
• having just broken the water pitcher, and
• that from a long way off look like flies.”
JOE (Editor Mode) University of South Carolina
JOE (Query Mode) Partial Query University of South Carolina
Future Applications
• Information gathering, presentation, and management in large, heterogeneous, open environments: Internet and intranets
• Energy distribution and management
• Electronic commerce
• Smart vehicles and smart highways
• Inventory management and logistics
• Smart houses and buildings
• Active, distributed, and intelligent data dictionaries containing
  • constraints, and constraint enforcement
  • business rules, and rule processing
  • business processes, and process enactment
  • business semantics, and semantics resolution
• Cooperative mobile sensing
• Software engineering: Interaction-Oriented Programming
• Distance learning
Logistics Domain Ontology
[Diagram; labels as extracted: name Army Brigade is-part-of is-part-of supports name isa maintains quantity Forward-Support-Battalion Military-Unit ress-code isa consist-of War-Reserves isa is-authorized-to class isa maintains Stock type Main-Support-Battalion Direct-Support-Unit name type quantity has-as-part stored-in Storage isa Stock-Item fsc-code name Mobile-Storage located-in name isa niin Geographic-Area]
X3L8 Taxonomy University of South Carolina
Topic Trees, Ontologies, and Database Schemas
[Diagram; labels as extracted: MiG29 Weapon price designer Number Person People Terms Air Sea expertIn Mikoyan r73 mig29 sirena Fighter Bomber speed weight ivan artem mikoyan Person DOB Specialty Fighter Speed Weight Price]
Cyc
[Diagram of upper-level Cyc concepts; labels as extracted: THING COLLECTION INDIVIDUAL OBJECT SITUATION STUFF TYPE TANGIBLE INTANGIBLE OBJECT TYPE TEMPORAL OBJECT STATICSITUATION GROUP TEMPORALSTUFFTYPE TIME INTERVAL TEMPORALOBJECTYPE GROUP TIMEINTERVAL EXISTINGSTUFFTYPE EXISTINGOBJECTYPE EVENT SOMETHINGEXISTING CONFIGURATION FOODGROUPTYPE BIRTHEVENT HOLIDAY TEXTUALMATERIAL STATICSITUATION]