Foundations I: Methodologies, Knowledge Representation PowerPoint Presentation
Presentation Transcript

  1. Foundations I: Methodologies, Knowledge Representation • Professor Deborah McGuinness • TA: Weijing Chen • Other lectures from Professor Peter Fox, Professor Joanne Luciano, grad student Jim McCusker, and possibly others • CSCI 6962-01, 86933; CSCI 4969-01, 87927; ITWS 6960-01, 87198; ITWS 4969-01, 87928 • Week 2, September 12, 2011

  2. Review of reading Assignment 1 • Ontologies 101, Semantic Web, e-Science, RDFS, OWL guide • Any comments, questions? • One pass around room on highlights

  3. Contents • Review of methodologies • Elements of KR in semantic web context • And in e-Science • Choices of representation, models • Examples of KR • Encoding and understanding representations • Assignment 1

  4. Semantic Web Methodology and Technology Development Process • Establish and improve a well-defined methodology vision for Semantic Technology based application development • Leverage controlled vocabularies, etc. • Diagram labels: Adopt Technology Approach; Leverage Technology Infrastructure; Science/Expert Review & Iteration; Rapid Prototype; Open World: Evolve, Iterate, Redesign, Redeploy; Use Tools; Evaluation; Analysis; Use Case; Develop model/ontology; Small Team, mixed skills

  5. KR and methodologies • Procedural Knowledge: Knowledge is encoded in functions/procedures. This can be viewed as hard coded and less flexible. E.g.: function Person(X) return boolean is if (X = "Socrates") or (X = "Hillary") then return true else return false; or: function Mortal(X) return boolean is return Person(X); • Networks: A compromise between declarative and procedural schemes. Knowledge is represented in a labeled, directed graph whose nodes represent concepts and entities, while its arcs represent relationships between these entities and concepts.

  6. KR and methodologies • Frames: Much like a semantic network except each node represents prototypical concepts and/or situations. Each node has several property slots whose values may be specified or inherited. • Logic: A way of declaratively representing knowledge. For example: • person(Socrates). • person(Hillary). • forall X [person(X) ---> mortal(X)] • DL, FOL, HOL
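
The declarative example above can be sketched in Python as data plus a generic inference loop. This is only an illustration under simplifying assumptions: rules are restricted to the single-premise form `premise(X) ---> conclusion(X)`, and the names `facts`, `rules`, and `forward_chain` are hypothetical.

```python
# Facts are (predicate, argument) pairs: person(Socrates), person(Hillary).
facts = {("person", "Socrates"), ("person", "Hillary")}

# Each rule (premise, conclusion) stands for: forall X [premise(X) ---> conclusion(X)].
rules = [("person", "mortal")]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, argument in list(derived):
                if predicate == premise and (conclusion, argument) not in derived:
                    derived.add((conclusion, argument))
                    changed = True
    return derived
```

Unlike the procedural encoding, the rule is data here, so new rules can be added without touching the inference code.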

  7. KR and methodologies • Decision Trees: Concepts are organized in the form of a tree. • Statistical Knowledge: The use of certainty factors, Bayesian Networks, Dempster-Shafer Theory, Fuzzy Logics, ..., etc. • Rules: The use of Production Systems to encode condition-action rules (as in expert systems).

  8. KR and methodologies • Parallel Distributed Processing: The use of connectionist models. • Subsumption Architectures: Behaviors are encoded (represented) using layers of simple (numeric) finite-state machine elements. • Hybrid Schemes: Any representation formalism employing a combination of KR schemes.

  9. Remember, in any knowledge encoding • Some of the knowledge is lost when it is placed into any particular representation structure, or may not be reusable (e.g. Frames) • So, you may ask something that cannot be answered or inferred • Knowledge evolves, i.e. changes • Knowledge and understanding are very often context dependent (and discipline, language, and skill-level dependent, and …)

  10. And, if you are used to logic • You are working mostly within the world of logic, whereas we are trying to represent knowledge with logic and we are usually dealing with tangible objects, such as trees, clouds, rock, storms, etc. • Because of this, we have to be very careful when translating real things into logical symbols - this can, surprisingly, be a difficult challenge. • Consider your method of representation (yes, we do want to compute with it)

  11. Thus • A person who wants to encode knowledge needs to decouple the ambiguities of interpretation from the mathematical certainty of (any form of) logic. • The nature of interpretation is critical in formal knowledge representation and is carefully formalized by KR scientists in order to guarantee that no ambiguity exists in the logical structure of the represented knowledge.

  12. Representing Knowledge With Objects • Take all individuals that we need to keep track of and place them into different buckets based on how similar they are to each other. Each bucket is given a description based on what objects it contains. • Since the individuals in a given bucket are at least somewhat similar, we can avoid needing to describe every inconsequential detail about each individual. Instead, properties that are common to all individuals in a bucket can just be assigned to the entire bucket at once. Properties are typically either primitive values (such as numbers or text strings) or may be references to other buckets.

  13. Representing Knowledge With Objects • Some buckets will be more similar to each other than others and we can arrange the buckets into a hierarchy based on the similarity. • If all buckets in a branch in the tree of buckets share a property, the information can be further simplified by assigning the property only to the parent bucket. Other buckets (and individuals) are said to inherit that property. • Buckets may have different names: e.g. Classes, Frames, or Nodes • BUT, once we move to (e.g.) DL, not all object rules apply, e.g. cannot override properties • Multiple inheritance is not always obvious to people
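
The bucket hierarchy described on these two slides can be sketched as a parent-linked lookup. The class name `Bucket` and the property names are assumptions made for illustration. Note that this object-style lookup would let a child bucket override a parent's property, which is exactly the behaviour the slide says does not carry over to DL.

```python
class Bucket:
    """A class/frame/node: a named bucket with properties and an optional parent."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        self.parent = parent
        self.properties = dict(properties)

    def get(self, prop):
        # Walk up the hierarchy until some bucket defines the property.
        bucket = self
        while bucket is not None:
            if prop in bucket.properties:
                return bucket.properties[prop]
            bucket = bucket.parent
        raise KeyError(prop)


# Properties are assigned once, on the most general bucket that shares them.
instrument = Bucket("Instrument", has_operating_mode=True)
optical = Bucket("OpticalInstrument", parent=instrument, has_focal_length=True)
interferometer = Bucket("Interferometer", parent=optical)
```

Here `interferometer` defines nothing itself and inherits both properties from its ancestors.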

  14. Re-enter Semantic Web At its core, the Semantic Web can be thought of as a methodology for linking pieces of structured and unstructured information into commonly-shared description logics ontologies.

  15. Semantic Web Layers (figure)

  16. Elements of KR in Semantic Web • Declarative Knowledge • Statements as triples: {subject-predicate-object} • interferometer is-a optical instrument • Fabry-Perot is-a interferometer • Optical instrument has focal length • Optical instrument is-a instrument • Instrument has instrument operating mode • Instrument has measured parameter • Instrument operating mode has measured parameter • NeutralTemperature is-a temperature • Temperature is-a parameter • A query: select all optical instruments which have operating mode vertical • An inference: infer operating modes for a Fabry-Perot Interferometer which measures neutral temperature
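
As a rough illustration of how the triples on this slide support a query and an inference, the following Python sketch stores them as tuples and follows "is-a" links transitively. The CamelCase identifiers and the function names are assumptions; a real system would use an RDF store and a reasoner rather than this hand-rolled closure.

```python
# The slide's statements as (subject, predicate, object) triples.
triples = {
    ("Interferometer", "is-a", "OpticalInstrument"),
    ("FabryPerot", "is-a", "Interferometer"),
    ("OpticalInstrument", "has", "FocalLength"),
    ("OpticalInstrument", "is-a", "Instrument"),
    ("Instrument", "has", "InstrumentOperatingMode"),
    ("Instrument", "has", "MeasuredParameter"),
    ("InstrumentOperatingMode", "has", "MeasuredParameter"),
    ("NeutralTemperature", "is-a", "Temperature"),
    ("Temperature", "is-a", "Parameter"),
}


def ancestors(term):
    """All terms reachable from `term` through one or more is-a links."""
    found = set()
    frontier = [term]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "is-a" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def descendants(term):
    """Query: every subject that is (transitively) a kind of `term`."""
    subjects = {s for s, _, _ in triples}
    return {s for s in subjects if term in ancestors(s)}


def inherited_properties(term):
    """Inference: 'has' statements on `term` itself or on any of its ancestors."""
    classes = {term} | ancestors(term)
    return {o for s, p, o in triples if p == "has" and s in classes}
```

Nothing states directly that a Fabry-Perot has an operating mode; `inherited_properties("FabryPerot")` derives it from the is-a chain up to Instrument.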

  17. Ontology Spectrum (figure), from least to most expressive: Catalog/ID • Terms/glossary • Thesauri “narrower term” relation • Informal is-a • Formal is-a • Formal instance • Frames (properties) • Value Restrs. • Selected Logical Constraints (disjointness, inverse, …) • General Logical constraints. Originally from AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in:

  18. OWL or RDF or OWL 2 RL? • In representing knowledge you will need to balance expressivity with implementability • OWL (Lite, DL, Full) 1 or 2 and if OWL 2, then which profile? • RDF and RDFS • Rules, e.g. SWRL or OWL 2 RL • You will need to consider the sources of your knowledge • You will need to consider what you want to do with the represented knowledge

  19. The knowledge base • Using, Re-using, Re-purposing, Extending, Subsetting • Approach: • Bottom-up (instance level or vocabularies) • Top-down (upper-level or foundational) • Mid-level (use case) • Coding and testing (understanding) • Using tools (some this class, more over the next two classes) • Iterating (later) • Maintaining and evolving (curation, preservation) (later)

  20. ‘Collecting’ the ‘data’ • Part of the (meta)data information is present in tools ... but thrown away at output • e.g., a business chart can be generated by a tool: it ‘knows’ the structure, the classification, etc. of the chart, but, usually, this information is lost • storing it in web data would be easy! • Semantic Web-aware tools are around (even if you do not know it...), though more would be good: • Photoshop CS stores metadata in RDF in, say, jpg files (using XMP) • RSS 1.0 feeds are generated by (almost) all blogging systems (a huge amount of RDF data!) • Scraping - different tools, services, etc., come around every day: • get RDF data associated with images, for example: • service to get RDF from flickr images • service to get RDF from XMP • XSLT scripts to retrieve microformat data from XHTML files • RSS scraping in use in Virtual Observatory projects in Japan • scripts to convert spreadsheets to RDF • SQL - A huge amount of data in Relational Databases • Although tools exist, it is not feasible to convert that data into RDF • Instead: SQL ⇋ RDF ‘bridges’ are being developed: a query to RDF data is transformed into SQL on-the-fly
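
One of the items above, scripts that convert spreadsheets to RDF, can be sketched with the Python standard library alone. This is an assumption-laden illustration: the `http://example.org/` namespace, the function name `rows_to_ntriples`, and the sample sheet are all made up, every value is emitted as a plain string literal, and the code assumes the key values and column names are already safe to use inside an IRI.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the lecture


def rows_to_ntriples(csv_text, key_column):
    """Turn each spreadsheet row into N-Triples-style lines, one per non-empty cell."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}{row[key_column]}>"
        for column, value in row.items():
            if column == key_column or value == "":
                continue
            # Escape backslashes and double quotes inside the literal.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'{subject} <{BASE}{column}> "{escaped}" .')
    return lines


# Made-up sample sheet: one instrument with a name and an operating mode.
sheet = "id,name,mode\nfpi1,Fabry-Perot,vertical\n"
```

Calling `rows_to_ntriples(sheet, "id")` yields one statement per filled cell, with the `id` column supplying the subject.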

  21. More Collecting • RDFa (formerly known as RDF/A) extends XHTML by: • extending the link and meta elements to include child elements • adding metadata to any element (a bit like the class in microformats, but via dedicated properties) • It is very similar to microformats, but with more rigor: • it is a general framework (instead of an “agreement” on the meaning of, say, a class attribute value) • terminologies can be mixed more easily • GRDDL - Gleaning Resource Descriptions from Dialects of Languages • ATOM - XML-based Web content and metadata syndication format (used with RSS)

  22. Foundational Ontologies • Domain independent concepts and relations: physical object, process, event, …, participates, … • (Usually) Rigorously defined: formal logic, philosophical principles, highly structured • Examples • DOLCE – Descriptive Ontology for Linguistic and Cognitive Engineering • SUMO – Suggested Upper Merged Ontology • CYC Upper Level Ontology • BFO – Basic Formal Ontology • GFO – General Formal Ontology (developed by Onto Med)

  23. “…and then there was one…” • Foundational Ontologies • PURPOSE: help integrate domain ontologies • Diagram labels: Foundational ontology; Geophysics ontology; Marine ontology; Water ontology; Planetary ontology; Geology ontology; Struc ontology; Rock ontology • Courtesy: Boyan Brodaric

  24. “…a place for everything, and everything in its place…” Foundational ontology shale rock formation lithification Foundational Ontologies PURPOSE: help organize domain ontologies Courtesy: Boyan Brodaric

  25. Problem scenario • Little work done on linking foundational ontologies with geoscience ontologies • Such linkage might benefit various scenarios requiring cross-disciplinary knowledge, e.g.: • water budgets: groundwater (geology) and surface water (hydro) • hazards risk: hazard potential (geology, geophysics) and items at threat (infrastructure, people, environment, economic) • health: toxic substances (geochemistry) and people, wildlife • many others… • Courtesy: Boyan Brodaric

  26. DOLCE - Descriptive Ontology for Linguistic and Cognitive Engineering

  27. SUMO - Suggested Upper Merged Ontology • Physical • Object • SelfConnectedObject • ContinuousObject • CorpuscularObject • Collection • Process • Abstract • SetClass • Relation • Proposition • Quantity • Number • PhysicalQuantity • Attribute

  28. BFO – Basic Formal Ontology Snap comes from a snapshot at any given time

  29. Span comes from spanning time; sometimes considered a 4D description

  30. Using SNAP/ SPAN

  31. SWEET 2.0 Modular Design • Supports easy extension by domain specialists • Organized by subject (theoretical to applied) • Reorganization of classes, but no significant changes to content • Importation is unidirectional • Diagram layers (linked by importation): Math, Time, Space; Basic Science; Geoscience Processes; Geophysical Phenomena; Applications

  32. SWEET 2.0 Ontologies

  33. Using SWEET • Plug-in (import) domain detailed modules • Lots of classes, few relations (properties) • Version 2.0 is re-usable and extensible

  34. Mix-n-Match • The hybrid example: • Collect a lot of different ontologies representing different terms, levels of concepts, etc. into a base form: RDF

  35. Mid-Level: Developing ontologies • Use cases and small team (7-8; 2-3 domain experts, 2 knowledge experts, 1 software engineer, 1 facilitator, 1 scribe) • Identify classes and properties (leverage controlled vocab.) • Start with narrower terms, generalize when needed or possible • Adopt a suitable conceptual decomposition (e.g. SWEET) • Import modules when concepts are orthogonal • Review, vet, publish • Only code them (in RDF or OWL) when needed (CMAP, …) • Ontologies: small and modular

  36. Use Case example • Plot the neutral temperature from the Millstone-Hill Fabry Perot, operating in the non-vertical mode during January 2000 as a time series. • Objects: • Neutral temperature is a (temperature is a) parameter • Millstone Hill is a (ground-based observatory is an) observatory • Fabry-Perot is an interferometer is an optical instrument is an instrument • Non-vertical mode is an instrument operating mode • January 2000 is a date-time range • Time is an independent variable/coordinate • Time series is a data plot is a data product

  37. Class and property example • Parameter • Has coordinates (independent variables) • Observatory • Operates instruments • Instrument • Has operating mode • Instrument operating mode • Has measured parameters • Date-time interval • Data product

  38. Higher level use case • Find data which represents the state of the neutral atmosphere above 100km, toward the arctic circle at any time of high geomagnetic activity

  39. Extending the KR for a purpose • Input • Physical properties: State of neutral atmosphere • Spatial: • Above 100km • Toward arctic circle (above 45N) • Conditions: • High geomagnetic activity • Action: Return Data • Specification needed for query to CEDARWEB: Instrument; Parameter(s); Operating Mode; Observatory; Date/time; Return-type: data • GeoMagneticActivity has ProxyRepresentation • GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere) • Kp is a GeophysicalIndex • hasTemporalDomain: “daily” • hasHighThreshold: xsd_number = 8 • Date/time when Kp >= 8

  40. Translating the Use-Case - ctd. • Input • Physical properties: State of neutral atmosphere • Spatial: Above 100km; Toward arctic circle (above 45N) • Conditions: High geomagnetic activity • Action: Return Data • Specification needed for query to CEDARWEB: Instrument; Parameter(s); Operating Mode; Observatory; Date/time; Return-type: data • NeutralAtmosphere is a subRealm of TerrestrialAtmosphere • hasPhysicalProperties: NeutralTemperature, Neutral Wind, etc. • hasSpatialDomain: [0,360],[0,180],[100,150] • hasTemporalDomain: • NeutralTemperature is a Temperature (which) is a Parameter • FabryPerotInterferometer is an Interferometer, (which) is an Optical Instrument (which) is an Instrument • hasFilterCentralWavelength: Wavelength • hasLowerBoundFormationHeight: Height • ArcticCircle is a GeographicRegion • hasLatitudeBoundary: • hasLatitudeUpperBoundary: • GeoMagneticActivity has ProxyRepresentation • GeophysicalIndex is a ProxyRepresentation (in Realm of Neutral Atmosphere) • Kp is a GeophysicalIndex • hasTemporalDomain: “daily” • hasHighThreshold: xsd_number = 8 • Date/time when Kp >= 8
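
The translation above ends in concrete selection criteria: days on which Kp reaches the high threshold of 8, altitude above 100 km, and latitude above 45N. A minimal Python sketch of applying those criteria follows. It is not the CEDARWEB query interface; the `Record` fields and the function names are hypothetical, and any records or Kp values passed in are the caller's own data.

```python
from dataclasses import dataclass
from datetime import date

KP_HIGH_THRESHOLD = 8    # hasHighThreshold on the slide
MIN_ALTITUDE_KM = 100    # "above 100km"
MIN_LATITUDE_DEG = 45    # "toward arctic circle (above 45N)"


@dataclass
class Record:
    """One hypothetical measurement of the neutral atmosphere."""
    day: date
    latitude_deg: float
    altitude_km: float
    neutral_temperature_k: float


def high_activity_days(daily_kp):
    """Days whose daily Kp index is at or above the high threshold."""
    return {day for day, kp in daily_kp.items() if kp >= KP_HIGH_THRESHOLD}


def select(records, daily_kp):
    """Keep records on high-activity days that satisfy the spatial constraints."""
    days = high_activity_days(daily_kp)
    return [
        r for r in records
        if r.day in days
        and r.altitude_km > MIN_ALTITUDE_KM
        and r.latitude_deg > MIN_LATITUDE_DEG
    ]
```

The point of the sketch is that the vague phrase "high geomagnetic activity" only becomes computable once the ontology supplies a proxy (Kp) and a threshold.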

  41. Knowledge representation - visual • UML – Unified Modeling Language • Ontology Definition Metamodel/Meta Object Facility (OMG) for UML • Provides standardized notation • CMAP Ontology Editor (concept mapping tool from IHMC) • Drag/drop visual development of classes, subclass (is-a) and property relationships • Reads and writes OWL • Formal convention (OWL/RDF tags, etc.) • White board, text file

  42. Representing processes

  43. Is OWL/RDF the only option? No… • SKOS - Simple Knowledge Organization System for Taxonomies • Annotations (RDFa) – for un- or semi-structured information sources • Atom (and RSS) – for representing syndication feeds – structured • More expressive languages: IKL, CL, … • Languages aimed at different paradigms – e.g., rule languages

  44. Query • Querying knowledge representations in OWL and/or RDF • SPARQL (for RDF) • OWL-QL (for OWL) • XQuery (for XML) • SeRQL (for Sesame) • RDFQuery (for RDF) • Few as yet for natural language representations

  45. Best practices (some) • Ontologies/ vocabularies must be shared and reused -, bioportal, OOR • Examine ‘core vocabularies’ to start with • SKOS Core: about knowledge systems • Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management • FOAF: about people and their organizations • SIOC: about communities • DOAP: on the descriptions of software projects • DOLCE seems the most promising to match science ontologies • Go “Lite” as much as possible, then increase the logic, balancing expressivity vs. implementability • Minimal properties to start, add only when needed