The XML-based Enterprise Information Portal Solutions Company

Extracting Knowledge from XML Documents Using Topic Maps. Eric Freese, Director of Professional Services - Midwest Region, ISOGEN International/DataChannel. Knowledge Technologies 2001 – Austin, TX, 7 March 2001.


Presentation Transcript


  1. The XML-based Enterprise Information Portal Solutions Company

  2. Extracting Knowledge from XML Documents Using Topic Maps
     Eric Freese, Director of Professional Services - Midwest Region
     ISOGEN International/DataChannel
     Knowledge Technologies 2001 – Austin, TX
     7 March 2001

  3. Premise
     • Rules and procedures can be established that allow automated harvesting of information from structured documents (XML) into a knowledge base by using the structure and the relationships between the structural components
     • Topic maps can be used as the interchange and management model for knowledge bases
     • New knowledge can be inferred within a knowledge base using defined inference rules

  4. Overview
     • Late Breaking News
     • Topic Maps
     • Knowledge Representation/Semantic Networks
     • Topic Map Constructs for Semantic Networks
     • SemanText - Example Application
     • Conclusions

  5. Late Breaking News
     • RDF and Topic Maps
     • XML Topic Maps (XTM)

  6. Topic Maps and Semantic Nets
     • A Topic Map is a mechanism for describing and representing data about the structure and content of an information set, using topics, associations, and occurrences.
     • A Semantic Network is a knowledge representation technique consisting of nodes and links.

  7. Topic Maps
     • ISO/IEC 13250:2000 Document description and processing languages – Topic Maps
     • TopicMaps.Org – XML Topic Maps (XTM)
     • Topic maps are optimized for navigation of large amounts of data
     • They are similar to indexes in the paper publishing world
     • A topic map can also be compared to a glossary, cross-reference, thesaurus, or catalog

  8. Topics
     • Topics are the basic building blocks of topic maps
     • A topic is anything a user wants to describe
     • A topic can have zero or many links to occurrences within an information set
     • A topic can be used to aggregate all the information about a subject within the information set
     • Topics are categorized using topic types
     • Topics can have multiple types
     • Types are defined using topics

  9. Family Tree Example (figure; label: Topics)

  10. Family Tree Example (figure; labels: husband, child)

  11. Associations
      • Associations relate topics together
      • They express a semantic relationship between topics
      • An association can be defined as an instance of a specific topic
      • Topics are members of and have roles within associations
      • Association role types are topics

  12. Family Tree Example (figure; association labels: is parent of / is child of, is spouse of, is sibling of)

  13. Occurrences
      • Occurrences provide links from the topic map into the information set
      • Occurrences also provide an internal means for describing topics in the topic map
      • An occurrence can have only one type
      • Occurrence roles are topics
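
The three constructs introduced so far (topics, associations, occurrences) can be pictured as a small data model. The following is only a minimal sketch, not SemanText's or tmproc's actual classes; every class name, field name, the topic "Pat", and the path `photos/eric.jpg` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Occurrence:
    type_id: str   # an occurrence has exactly one type
    href: str      # link from the topic map into the information set


@dataclass
class Topic:
    id: str
    name: str
    type_ids: List[str] = field(default_factory=list)          # a topic may have several types
    occurrences: List[Occurrence] = field(default_factory=list)


@dataclass
class Association:
    type_id: str                     # the association type is itself a topic
    members: List[Tuple[str, str]]   # (role topic id, player topic id)


# Types and roles are ordinary topics, so they live in the same table.
# All ids, names, and the photo path are hypothetical sample data.
topics = {t.id: t for t in [
    Topic("person", "Person"),
    Topic("spouse", "Spouse"),
    Topic("marriage", "Marriage"),
    Topic("photo", "Photo"),
    Topic("eric", "Eric", ["person"], [Occurrence("photo", "photos/eric.jpg")]),
    Topic("pat", "Pat", ["person"]),
]}
marriage = Association("marriage", [("spouse", "eric"), ("spouse", "pat")])

# Collect every type and role id that the structures above refer to.
referenced = {marriage.type_id} | {role for role, _ in marriage.members}
referenced |= {tid for t in topics.values() for tid in t.type_ids}
referenced |= {o.type_id for t in topics.values() for o in t.occurrences}
unresolved = referenced - set(topics)
```

Because types and roles are themselves topics, `unresolved` is empty here: everything the association and occurrences refer to can be looked up in the same topic table.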

  14. Topic Scopes and Themes
      • Themes can be defined to group topics on a broader scale than types
      • Themes can also be viewed as filters for topic information
      • Scopes can be assigned to topic characteristics, associations and occurrences, which call the themes into effect
      • Themes and scopes are used to disambiguate topics
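
One way to read "themes as filters" is sketched below. It assumes that an unscoped characteristic is always visible and that a scoped one is visible when it shares at least one theme with the active set; the function name `in_scope`, the theme names, and the sample names are all hypothetical.

```python
def in_scope(characteristics, active_themes):
    """Return values that are unscoped or share a theme with active_themes."""
    return [value for value, scope in characteristics
            if not scope or scope & active_themes]


# Each characteristic is (value, scope); the scope is a set of theme ids.
# The names and themes are made-up sample data.
names = [
    ("Eric", frozenset()),                 # unscoped: always visible
    ("Dad", frozenset({"family"})),
    ("Mr. Freese", frozenset({"formal"})),
]

family_view = in_scope(names, {"family"})
default_view = in_scope(names, set())
```

With the `family` theme active, the filter keeps the unscoped name and the family-scoped one and hides the formal one, which is the disambiguation role the slide describes.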

  15. Semantic Network Architecture
      • A semantic network is drawn as a series of nodes connected by links
      • Nodes represent objects, concepts, or situations within a specific domain
      • Links represent relationships between nodes
      • Specialized computer languages (such as Prolog) have been developed which can model and process the logic within a semantic network
      • A semantic network can be used as the basis for the development of facts and rules within an expert system

  16. Associative Properties
      • The links within a semantic network may have the following properties:
        • Reflexive - topic can have the association applied to itself
        • Symmetric - association is true no matter the position of the topics – topics are often of the same or related types
        • Transitive - association can be derived based on other associations

  17. Examples
      • Reflexive: Spouse is married to spouse
      • Symmetric: Husband is married to wife AND Wife is married to husband
      • Transitive: Fathers are parents AND Eric is a father SO Eric is a parent
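
The symmetric and transitive examples can be worked through mechanically. The sketch below treats each link as a pair of node names and, as a simplifying assumption, models both "Eric is a father" and "fathers are parents" as the same "is a" link type; the function names are illustrative only.

```python
def symmetric_closure(pairs):
    """Add the reversed pair for every (a, b)."""
    return set(pairs) | {(b, a) for a, b in pairs}


def transitive_closure(pairs):
    """Repeatedly add (a, d) whenever (a, b) and (b, d) are both present."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new


married_to = symmetric_closure({("husband", "wife")})
is_a = transitive_closure({("eric", "father"), ("father", "parent")})
```

After the closures are computed, `married_to` holds the link in both directions and `is_a` contains the derived pair `("eric", "parent")` alongside the two stated ones.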

  18. Semantic Network Relationships
      • Typically binary – one node at the end of each link
      • N-ary relationships can be broken down into binary relationships
        • "Austin, Texas is a city in the United States." =
          • Austin, Texas is a city
          • Geographic regions (cities) are located in geographic regions (countries)
          • United States is a country

  19. Topic Maps vs. Semantic Networks
      • Commonalities between topic maps and semantic networks:
        • Both are organized into a network of information nodes or modules.
        • Both allow the user to model links between the nodes.
        • Both allow the user to attach semantic information to the nodes and the links.
      • One basic difference:
        • Topic maps focus on navigation between topics.
        • Semantic networks focus on the links/associations between the nodes and the knowledge represented by the linked nodes.

  20. Harvesting Knowledge from Structured Information
      • XML provides a way of attaching semantics to pieces of information through markup
      • Markup can be used to define or identify topic types
        • Element names
        • Attribute values
      • Associations between different pieces of information can be determined by structural relationships
      • XPath can be used to denote the structural components
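
A rough illustration of this harvesting idea, using only Python's standard `xml.etree.ElementTree` (which supports a limited XPath subset): element names become topic types, and the structural fact that a `parent` and a `child` element share a `family` container becomes an association. The sample document, its element names, and the `harvest` function are assumptions for the sketch, not SemanText's actual rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and attribute names are made up.
SAMPLE = """<family name="Freese">
  <parent id="p1"><name>Eric</name></parent>
  <child id="c1"><name>Alex</name></child>
</family>"""


def harvest(xml_text):
    """Derive topics from elements and associations from shared structure."""
    root = ET.fromstring(xml_text)
    topics = {}
    associations = []
    for family in root.iter("family"):       # iter() also visits the root itself
        for elem in family:
            # The element name identifies the topic type.
            topics[elem.get("id")] = {"type": elem.tag,
                                      "name": elem.findtext("name")}
        # Structural rule: parents and children inside the same family are related.
        for parent in family.findall("./parent"):
            for child in family.findall("./child"):
                associations.append(
                    ("parent-child", parent.get("id"), child.get("id")))
    return topics, associations


topics, associations = harvest(SAMPLE)
```

The path expressions `./parent` and `./child` play the role the slide assigns to XPath: they name the structural components from which each association is derived.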

  21. Topic Map Constructs for Semantic Nets
      • Published Subject Identifiers
      • Topic Map Templates/Association Templates
      • Type Hierarchies/Ontologies
      • Association Types
      • Association Properties
      • Association Occurrences
      • Inference Rules

  22. Published Subject Identifiers (PSIs)
      • Allow an identifier to be attached to a subject so that it can unambiguously be named and referenced
      • XTM identifies a core set of PSIs for the main building blocks for topic maps as well as selected association types
      • Two topics which are related to the same subject are merged automatically
      • Examples:
        • http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass-subclass
        • http://www.topicmaps.org/xtm/1.0/psi1.xtm#superclass
        • http://www.topicmaps.org/xtm/1.0/psi1.xtm#subclass
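
The merging behaviour can be sketched as follows, under the simplifying assumption that each topic carries exactly one subject identifier (real topic maps allow several). The dictionary layout, the function name `merge_by_subject`, and the names and occurrence paths are hypothetical; only the PSI URLs come from the slide.

```python
def merge_by_subject(topics):
    """Merge topics that share a subject identifier, pooling their characteristics."""
    merged = {}   # subject identifier -> merged topic
    for t in topics:
        target = merged.setdefault(
            t["subject"],
            {"subject": t["subject"], "names": set(), "occurrences": set()})
        target["names"] |= set(t["names"])
        target["occurrences"] |= set(t["occurrences"])
    return list(merged.values())


PSI = "http://www.topicmaps.org/xtm/1.0/psi1.xtm#"
topics = [
    {"subject": PSI + "superclass", "names": ["superclass"], "occurrences": ["a.xml"]},
    {"subject": PSI + "superclass", "names": ["broader class"], "occurrences": ["b.xml"]},
    {"subject": PSI + "subclass", "names": ["subclass"], "occurrences": []},
]
result = merge_by_subject(topics)
```

The two topics that point at the `superclass` identifier collapse into one topic that keeps both names and both occurrences, while the `subclass` topic is left alone.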

  23. Templates/Schemas
      • Define semantics contained within an association
      • Define constraints on the creation of semantically valid topic map structures
      • Provide roadmaps for creation of topic map structures
      • Defined using regular topic maps syntax
      • Future work may include definition of extents
        • Cardinality
        • Time/Date

  24. Templates/Schemas – cont.

      <topic id="marriage.schema">
        <instanceOf><topicRef xlink:href="#association.class"/></instanceOf>
        <instanceOf><topicRef xlink:href="#schema"/></instanceOf>
        <baseName><baseNameString>Marriage</baseNameString></baseName>
        <occurrence>
          <instanceOf><topicRef xlink:href="#association.property"/></instanceOf>
          <resourceRef xlink:href="#reflexive"/>
        </occurrence>
        <occurrence id="minimum.spouses">
          <instanceOf><topicRef xlink:href="#minimum.occurrences"/></instanceOf>
          <resourceData>2</resourceData>
        </occurrence>
        <occurrence id="maximum.spouses">
          <instanceOf><topicRef xlink:href="#maximum.occurrences"/></instanceOf>
          <resourceData>2</resourceData>
        </occurrence>
      </topic>

  25. Templates/Schemas – cont.

      <association>
        <instanceOf><topicRef xlink:href="#marriage.schema"/></instanceOf>
        <scope><topicRef xlink:href="#schema"/></scope>
        <member>
          <roleSpec><topicRef xlink:href="#spouse"/></roleSpec>
          <resourceRef xlink:href="#minimum.spouses"/>
          <resourceRef xlink:href="#maximum.spouses"/>
        </member>
      </association>
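
How such a template might constrain associations can be sketched as a cardinality check. The marriage schema above sets both the minimum and maximum number of spouses to 2; the dictionary encoding of that schema, the `validate` function, and the player names are assumptions for illustration and are not taken from SemanText.

```python
def validate(association, schema):
    """Return a list of cardinality violations; an empty list means valid."""
    errors = []
    for role, (minimum, maximum) in schema["roles"].items():
        count = sum(1 for r, _ in association["members"] if r == role)
        if not minimum <= count <= maximum:
            errors.append(
                f"role '{role}': {count} player(s), expected {minimum}..{maximum}")
    return errors


# Assumed in-memory form of the marriage schema: exactly two spouses.
marriage_schema = {"type": "marriage", "roles": {"spouse": (2, 2)}}

ok = {"type": "marriage", "members": [("spouse", "eric"), ("spouse", "pat")]}
bad = {"type": "marriage", "members": [("spouse", "eric")]}

ok_errors = validate(ok, marriage_schema)
bad_errors = validate(bad, marriage_schema)
```

An association with two spouse players passes, while one with a single spouse produces one violation message for the `spouse` role.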

  26. Type Hierarchies/Ontologies
      • Hierarchies allow ontologies to be developed by which additional knowledge can be inferred simply through hierarchical inheritance
      • Can use templates to control or enhance the ontology

  27. Type Hierarchies/Ontologies – cont.

      <topic id="person">
        <instanceOf><topicRef xlink:href="#topic.class"/></instanceOf>
        <baseName><baseNameString>Person</baseNameString></baseName>
      </topic>
      <topic id="male">
        <instanceOf><topicRef xlink:href="#topic.class"/></instanceOf>
        <baseName><baseNameString>Male</baseNameString></baseName>
      </topic>
      <topic id="eric">
        <instanceOf><topicRef xlink:href="#male"/></instanceOf>
        <instanceOf><topicRef xlink:href="#person"/></instanceOf>
        <baseName><baseNameString>Eric</baseNameString></baseName>
      </topic>
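
In the fragment above, `eric` is declared an instance of both `male` and `person`. If a superclass/subclass link from `male` to `person` were added (an assumption; the slide does not state one), the second declaration could be inferred instead of written out. A minimal sketch of that inheritance walk, with illustrative names:

```python
def all_types(topic_id, instance_of, superclass_of):
    """Return the direct types of a topic plus everything reachable via superclasses."""
    seen = set()
    stack = list(instance_of.get(topic_id, []))
    while stack:
        type_id = stack.pop()
        if type_id not in seen:
            seen.add(type_id)
            stack.extend(superclass_of.get(type_id, []))
    return seen


instance_of = {"eric": ["male"]}        # only the most specific type is stated
superclass_of = {"male": ["person"]}    # assumed hierarchy: male is a subclass of person

eric_types = all_types("eric", instance_of, superclass_of)
```

The `seen` set also guards against cycles in the hierarchy, so the walk terminates even if the ontology is malformed.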

  28. Association Types
      • ISO 13250 implicitly specifies class/instance associations
      • XTM specifies, through PSIs, class/instance and superclass/subclass
      • Other examples
        • Component/object
        • Member/collection
        • Portion/mass
        • Feature/activity
        • Place/area
        • Phase/process

  29. Association Properties
      • Transitivity, reflexivity, symmetry properties can be attached to associations
      • Allows special processing and understanding to occur when using associations

  30. Association Occurrences
      • Topic maps center more on topics where other knowledge management schemes concentrate more on associations or relationships between topics
      • In topic maps, associations can have topics defined which reify them
      • Reification of associations allows them to have occurrences

  31. Inference Rules
      • Inference rules allow new topics and associations to be created based on the existence of others
      • Rules can be stored and managed using topic map syntax

  32. Inference Rules – cont.

      <association>
        <instanceOf><topicRef xlink:href="#inference.rule"/></instanceOf>
        <scope><topicRef xlink:href="#inference.rule.schema"/></scope>
        <member>
          <roleSpec><topicRef xlink:href="#inference.rule.condition"/></roleSpec>
          <topicRef xlink:href="#ir.parent.in.family.N345"/>
          <topicRef xlink:href="#ir.parent.in.family.N456"/>
          <topicRef xlink:href="#ir.sibling.in.family.N567"/>
        </member>
        <member>
          <roleSpec><topicRef xlink:href="#inference.rule.statement"/></roleSpec>
          <topicRef xlink:href="#ir.cousin.N678"/>
        </member>
      </association>
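
The rule above lists two parent conditions and one sibling condition and concludes a cousin relationship. One plausible reading, sketched below, is: if two parents are siblings, their children are cousins. This interpretation, the function name `infer_cousins`, and the sample people are assumptions; the slide does not show how SemanText evaluates the rule.

```python
def infer_cousins(parent_of, siblings):
    """Derive cousin pairs: children whose parents are siblings of each other."""
    # Sibling links are treated as symmetric.
    sib = set(siblings) | {(b, a) for a, b in siblings}
    cousins = set()
    for p1, c1 in parent_of:
        for p2, c2 in parent_of:
            if (p1, p2) in sib:
                cousins.add(frozenset((c1, c2)))   # unordered pair
    return cousins


# Hypothetical family: ann and bob are siblings; carl and dana are their children.
parent_of = {("ann", "carl"), ("bob", "dana")}
siblings = {("ann", "bob")}

cousins = infer_cousins(parent_of, siblings)
```

Each derived pair is stored as a `frozenset` so that the cousin relationship, which is symmetric, appears once rather than in both orders.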

  33. SemanText: Using Topic Maps for Knowledge Representation • 100% pure Python system developed to demonstrate the joining of topic maps and semantic networks • Uses tmproc, wxPython, PyXML • Enables creation, modification, querying of topic map structures • Semantic networks structures with entities and relationships • Inference engine built in where user can add rules which create new topic map structures • Development is continuing

  34. Demo

  35. Future SemanText Plans • Implement XTM • Implement scopes, themes • Implement merge – hard vs. soft • Integration with grove-based system to allow point-and-click input from multiple data formats • Hooks to natural language tools • Voice input/output using VoiceML • Graphical output such as VRML or SVG • Textual output such as Open E-book, PalmOS, WML

  36. Conclusions
      • SemanText demonstrates that information can be harvested using the markup from XML documents in order to build a knowledge base
      • It demonstrates that the topic map architecture can be used to interchange semantic network information
      • It also demonstrates that topic maps can be used to feed a semantic network
      • It demonstrates that topic map syntax can be used to extend the topic map paradigm
        • Schemas, templates, inference rules

  37. Q & A
      SemanText available from www.semantext.com
      Questions or comments welcome at:
      ISOGEN International/DataChannel
      1611 W. County Road B, Suite 204
      St. Paul, MN 55113 USA
      Voice: 1.651.636.9100 - Fax: 1.651.636.9191
      eric@isogen.com
      www.isogen.com - www.datachannel.com
