
Deploying Semantic Technologies for Digital Publishing

A Case Study from Logos Bible Software. Sean Boisen (sean@logos.com). Slides at: http://semanticbible.org/other/presentations/2007-SemTech/



Presentation Transcript


  1. Deploying Semantic Technologies for Digital Publishing
     A Case Study from Logos Bible Software
     Sean Boisen (sean@logos.com)
     Slides at: http://semanticbible.org/other/presentations/2007-SemTech/

  2. Outline
     • Background: application and motivation
     • Scope and overview
     • Technical challenges:
         • Reification for provenance data
         • Converting legacy data
         • Tools for knowledge extension
     • Future directions

  3. Who Am I?
     • 19 years with BBN Technologies
         • Information extraction, human language technology
         • Scientist, technology manager
     • Semantic Web hobbyist
     • Senior Information Architect at Logos
         • One-man semantic band

  4. The Importance of the Bible as a Semantic Domain
     • The most widely distributed book
         • 35M Bibles and Testaments in 2005
     • The most widely translated work
         • > 2000 languages
         • 41 languages at www.biblegateway.com
     • Spans 1000s of years of ancient history

  5. Logos Bible Software
     • High-end desktop digital library
         • > 7000 titles
         • Resources in a dozen languages
         • Users in 180 countries
         • Extensive cross-indexing and hyperlinking
     • Leading publisher and developer of digital resources for Bible study

  6. Logos Value
     • Digital library with hyperlinked references and citations
     • Information integration for navigation, search
     • Support for original languages
     • Search
     • New content to enrich Bible study

  7. The Bible Knowledgebase (BK)
     • A machine-readable knowledgebase of semantically-organized Bible data
         • In OWL
         • Linked to Biblical texts
     • Search, navigation, visualization
     • Relationships support discovery and exploration
     • Reusable content (unlike prose)
     • Integration framework for library resources (future)
     • Today: named people and places, and their relationships
     • Tomorrow: chronology, events, concepts, non-named things, key terms, topics, …

  8. Approach
     • Build on Semantic Web standards
     • Model the domain rather than annotate texts
     • Layer knowledge: first entities, then relationships
     • Be conservative in what we assert and provide references as evidence
     • Try to avoid philosophy and focus on end-user value

  9. The Semantic Value Proposition
     • Identify and disambiguate entities (beyond names)
         • 30 people named Zechariah
         • Jesus’ disciple: Peter, Simon, Simeon, Cephas …
         • Judah: person, tribe, territory
     • Link reference information to passages for background
     • Provide a rich set of relationships to encourage exploration and discovery
     • Provide consistent cross-resource indexing
     • Leverage third-party tools
         • Provide scalability
         • Avoid reinventing the wheel
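The disambiguation point on slide 9 (one entity with many names, one name shared by many entities) can be illustrated with a small in-memory index. This is a minimal sketch, not the BK's actual data model: the `ENTITIES` table, the `kind` labels, and URIs such as `#Judah.tribe` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical entity records: URI -> (kind, names used for that entity).
ENTITIES = {
    "#Peter": ("person", ["Peter", "Simon", "Simeon", "Cephas"]),
    "#Judah.person": ("person", ["Judah"]),
    "#Judah.tribe": ("tribe", ["Judah"]),
    "#Judah.territory": ("territory", ["Judah"]),
}


def build_name_index(entities):
    """Map each lower-cased name to the set of entity URIs that bear it."""
    index = defaultdict(set)
    for uri, (_kind, names) in entities.items():
        for name in names:
            index[name.lower()].add(uri)
    return index


def candidates(entities, index, name, kind=None):
    """Return sorted URIs for a name, optionally restricted to one kind."""
    uris = index.get(name.lower(), set())
    if kind is not None:
        uris = {uri for uri in uris if entities[uri][0] == kind}
    return sorted(uris)


NAME_INDEX = build_name_index(ENTITIES)
print(candidates(ENTITIES, NAME_INDEX, "Cephas"))
print(candidates(ENTITIES, NAME_INDEX, "Judah"))
print(candidates(ENTITIES, NAME_INDEX, "Judah", kind="tribe"))
```

A search for "Cephas" resolves to a single entity, while "Judah" yields several candidates that extra context (here, the assumed `kind`) can narrow down.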

  10. User Benefits
     • Disambiguation makes search work better
     • Passage guide displays relevant entities to provide background information
     • Relationships encourage browsing and exploration
     • Visualization makes complex information easier to grasp

  11. Development Tools
     • Ontology development and instance creation with Protégé
     • Legacy data conversion and data merging through XSLT
     • Storage in Sesame
     • Some integration code in Python for loading and querying RDF
     • TBD

  12. Most Important BK Classes
     • > 60 classes in all (not counting reified relationships)
     • Many upper classes are not instantiated
     • General coordination of class names with SUMO
         • But not true re-use

  13. BK Classes for Places

  14. BK Abstract Classes

  15. BK Instances
     • ~100k triples
     • ~3000 people instances
         • Aaron to Zurishaddai
     • Names (various languages)
     • ~20k passage references for assertions
     • 90 cities, other places
     • Ethnicities, belief systems, languages, social roles, organizations

  16. Major BK Relationships
     [Table with columns Domain, Property, Range; the row layout was lost in extraction. Entries in their original order:]
     • Member of; Group
     • Family relationships; Human; Human
     • Knows, collaborates, antagonist, enemy
     • Social role, ethnicity, belief (attributes)
     • Region
     • Native, resident, visited place
     • Region
     • Subregion
     • Latitude, longitude, etc.; Geolocation data
     • And inverse relationships …

  17. Challenge: Assertions about Properties
     • Provenance is important to the domain and application
     • Problem: how to make assertions about properties
     • <#John.3, isFatherOf, #Peter>: says who?
     [Diagram: #John.3 linked to #Peter and to #Andrew.1 by the inverse properties isFatherOf and hasFather]

  18. Reification
     • Merriam-Webster: “to regard (something abstract) as a material or concrete thing”
     • Model the relationship between instances as an instance itself

  19. Reified Relationships
     • Solution: make the relationship an object about which we can make assertions
     • All “simple” properties get more complex
     [Diagram: relationship instances #John.3_parent_Peter and #John.3_parent_Andrew.1 connect #John.3 to #Peter and #Andrew.1 through the properties isFatherOf, hasFather, isSonOf, and hasSon; reference properties point to “bible.64.1.42”, “bible.64.21.15”, “bible.64.21.16”, and “bible.64.21.17”]
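A minimal sketch of the reification step on slide 19, with triples modeled as plain Python tuples rather than a real RDF library. The property names (isFatherOf, hasFather, isSonOf, hasSon, reference) come from the slide; the class name `ParentRelation`, the exact direction in which each property attaches to the relationship node, and the pairing of the reference with this particular relationship are assumptions, since the original diagram did not survive extraction.

```python
def reify_parent(father, son, references=()):
    """Turn a simple father/son link into a relationship node with evidence.

    Assumed wiring: each person points at the relationship node, and the
    node points back at each person. "ParentRelation" is a hypothetical
    class name.
    """
    relation = f"{father}_parent_{son.lstrip('#')}"
    triples = {
        (relation, "rdf:type", "ParentRelation"),
        (father, "isFatherOf", relation),
        (relation, "hasFather", father),
        (son, "isSonOf", relation),
        (relation, "hasSon", son),
    }
    # Provenance attaches to the relationship node, not to either person.
    triples.update((relation, "reference", ref) for ref in references)
    return relation, triples


def evidence_for(triples, relation):
    """List the passage references asserted for one relationship node."""
    return sorted(o for s, p, o in triples if s == relation and p == "reference")


relation, triples = reify_parent("#John.3", "#Peter", ["bible.64.1.42"])
print(relation)
print(evidence_for(triples, relation))
```

The question "says who?" now has somewhere to live: the answer is a property of the relationship node rather than of #John.3 or #Peter.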

  20. Some Consequences of Reification
     • Class and property instance overhead
         • 2 simple inverse properties become 4 properties and 1 class
         • Abstract hierarchy of classes of reified relationships
     • Adds overhead as well to ontology development, query construction, etc.
     • Symmetric and transitive properties
         • Challenges for reasoning
     • Restrictions come from a combination of properties and reified classes

  21. Reified Classes (Family)
     • All binary relationships with appropriate restrictions on their arguments (max 2, range restrictions, etc.)

  22. Other Reified Classes

  23. Properties Between Reified Properties
     • Beyond OWL
     • Defined with respect to particular reified classes
     • Automatically derivable from the ontology
     [Diagram: reif:isFatherOf, reif:hasFather, reif:isSonOf, and reif:hasSon related to one another by owl:inverseOf, reif:inverseOf, reif:pairedProperty, reif:onsetOf, and reif:codaOf]

  24. Reified Relationships: Names
     • Appellations (names) are class instances
     • An Appellation instance has string representations (in various languages)
     • Keeps all the facts about a name (different language versions, pronunciation, literal meaning, etc.) in one place
     • An individual has a (reified) NamingRelation to an Appellation instance
     • Mentions of the individual are properties of the NamingRelation

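  25. Reified Names Example
     [Diagram, flattened in extraction. Recoverable elements:]
     • Classes: bk:Man, bk:NameRel, bk:Appellation
     • Properties: rdf:type, isNamedBy, hasAppellation, hasName, hasPhoneticRepresentation, hasPronunciation, reference
     • Individual #Barnabas, with naming relations “#Barnabas namedBy Barnabas” and “#Barnabas namedBy Joseph”
     • Individual #Joseph2.1, with naming relation “#Joseph2.1 namedBy Joseph” and references “bible.61.1.16”, “bible.61.1.18”
     • Appellation “#NameOf Barnabas”: “Barnabas”@en, "Βαρναβᾶς"@el, "Bernabé"@es; phonetic representation bär'nə-bəs; pronunciation www.libronix.com/bkaudio/barnabas.wav
     • Appellation “#NameOf Joseph”
     • Etc. And all the right-to-left equivalents …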
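The naming model on slides 24 and 25 can be sketched with two small record types. This is an illustrative approximation, not the BK schema: the Python class and field names are assumptions, and the appellation URIs are spelled `#NameOfBarnabas` and `#NameOfJoseph` only as a guess at how the extracted labels were joined. The strings, the pronunciation file, and the references are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Appellation:
    """One name, with everything known about it kept in a single place."""
    uri: str
    forms: Dict[str, str]            # language tag -> written form
    phonetic: Optional[str] = None
    audio: Optional[str] = None


@dataclass
class NamingRelation:
    """Reified link between an individual and an appellation."""
    individual: str
    appellation: Appellation
    mentions: List[str] = field(default_factory=list)  # passage references


def names_for(individual, relations, lang="en"):
    """Return the written forms of an individual's names in one language."""
    return [
        rel.appellation.forms[lang]
        for rel in relations
        if rel.individual == individual and lang in rel.appellation.forms
    ]


barnabas = Appellation(
    uri="#NameOfBarnabas",  # assumed spelling
    forms={"en": "Barnabas", "el": "Βαρναβᾶς", "es": "Bernabé"},
    phonetic="bär'nə-bəs",
    audio="www.libronix.com/bkaudio/barnabas.wav",
)
joseph = Appellation(uri="#NameOfJoseph", forms={"en": "Joseph"})  # assumed spelling

RELATIONS = [
    NamingRelation("#Barnabas", barnabas),
    NamingRelation("#Barnabas", joseph),
    NamingRelation("#Joseph2.1", joseph, mentions=["bible.61.1.16", "bible.61.1.18"]),
]

print(names_for("#Barnabas", RELATIONS))
print(names_for("#Barnabas", RELATIONS, lang="el"))
```

Because the two individuals called Joseph share one `Appellation` object, facts about the name are stored once, while mentions stay on the individual's own `NamingRelation`.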

  26. Challenge: Converting Legacy Data
     • Strategy: use XSL to generate RDF matching the ontology
     • Legacy XML data organized by name and by person
     • Generate reified relations from simple ones
         • Lookup table for reified inverse properties (but kb query would be cleaner)
         • Both sides of family relationships are defined independently
     • URI naming
         • Map different XML names to a single URI
         • Generate shared URIs for reified relations like #<personURI>_<relation>_<personURI>
         • RDF merging connects them in the kb
     • Why not owl:sameAs?
         • Additional complexity but no practical benefit for internal-only data
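The URI-naming strategy on slide 26 can be sketched as follows. The talk does this in XSL; the Python below only illustrates the idea that both sides of a family relationship, converted independently, produce the same relation URI, so a plain set union (standing in for an RDF merge) connects them. The `NAME_TO_URI` table, its keys, and the direction of each property are hypothetical.

```python
# Hypothetical lookup from legacy XML names to canonical person URIs.
NAME_TO_URI = {
    "Peter": "#Peter",
    "Simon Peter": "#Peter",
    "Cephas": "#Peter",
    "John (father of Peter)": "#John.3",
}


def relation_uri(parent_uri, relation, child_uri):
    """Build a shared URI of the form #<personURI>_<relation>_<personURI>."""
    return f"{parent_uri}_{relation}_{child_uri.lstrip('#')}"


def from_child_record(child_name, father_name):
    """Triples generated while converting the child's legacy record."""
    child, father = NAME_TO_URI[child_name], NAME_TO_URI[father_name]
    rel = relation_uri(father, "parent", child)
    return {(child, "isSonOf", rel), (rel, "hasFather", father)}


def from_parent_record(father_name, child_name):
    """Triples generated while converting the father's legacy record."""
    father, child = NAME_TO_URI[father_name], NAME_TO_URI[child_name]
    rel = relation_uri(father, "parent", child)
    return {(father, "isFatherOf", rel), (rel, "hasSon", child)}


# The two records use different legacy names for Peter, yet merge cleanly.
merged = from_child_record("Cephas", "John (father of Peter)") | from_parent_record(
    "John (father of Peter)", "Simon Peter"
)
for triple in sorted(merged):
    print(triple)
```

Since every name maps to one URI before any triple is written, no owl:sameAs statements are needed to reconcile the two sides.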

  27. Converting Legacy Data (2)
     • Other OWL data with different URIs and non-reified relations
     • Map entities to common URIs (shared across both legacy datasets)
     • Adopt same URI construction principles
     • Expand out reified relations
     • RDF merge in the kb

  28. Legacy Data Pipeline
     [Diagram, flattened in extraction. Recoverable elements: Biblical People XML files (Aaron.xml, …) converted by XSL to RDF (Aaron.rdf, …); BK ontology; NTNames (OWL) and other data (OWL) converted by XSL, with a merge map, to BK-NTNames; a loader feeding Sesame (data store)]
     • Query (SeRQL, SPARQL)
     • Extract
     • API
     • Web service

  29. Challenge: Maintenance and Extension
     • How to lower the skill threshold for extending the data?
     • Approach:
         • Distinguish different operations
             1. Adding new instances of relationships (easy)
             2. Adding non-relational attributes (easy)
             3. Adding new instances of basic entities (a little harder)
             4. Fixing bad data, extending the ontology (hard)
         • Get the core entities right first (enables #1)
         • Develop specialized tools that
             • Are constrained in scope
             • Provide simple choices
             • Hide complications (like reification)

  30. How Do We Deliver Semantics?
     • Part of a consumer software application: not on the open web
     • Not practical to ship an RDF store
     • Likely: combination of
         • Some static results shipped with product
         • Some web service support for dynamic information
         • A web portal with richer search capabilities

  31. Open Architecture Issues
     • Visualization
         • Likely: custom MFC
     • End-user query
         • Likely: at most, templated queries
     • Reasoning
         • Necessary, but …
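One way to read "templated queries" on slide 31 is a fixed set of query shapes in which the end user supplies only an entity. The sketch below fills a SPARQL template with Python's `string.Template`; the namespace `http://example.org/bk#`, the template name `father_of`, and the way the properties chain through the reified relationship are all assumptions made for illustration, not the product's actual queries.

```python
import re
from string import Template

# Hypothetical query templates; only the $person slot is user-controlled.
TEMPLATES = {
    "father_of": Template(
        "PREFIX bk: <http://example.org/bk#>\n"
        "SELECT ?father ?ref WHERE {\n"
        "  bk:$person bk:isSonOf ?rel .\n"
        "  ?rel bk:hasFather ?father .\n"
        "  OPTIONAL { ?rel bk:reference ?ref }\n"
        "}"
    ),
}

# Accept only simple local names (letters, digits, "_", inner ".") so that
# user input cannot change the structure of the query.
_SAFE_NAME = re.compile(r"^[A-Za-z](?:[A-Za-z0-9_.]*[A-Za-z0-9_])?$")


def render(template_name, **params):
    """Fill a named template after validating every parameter value."""
    for key, value in params.items():
        if not _SAFE_NAME.match(value):
            raise ValueError(f"unsafe value for {key!r}: {value!r}")
    return TEMPLATES[template_name].substitute(**params)


print(render("father_of", person="Peter"))
```

The template hides the two-hop pattern that reification introduces, which is the kind of complication the talk suggests keeping away from end users.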

  32. Future Extensions to BK
     • Place names and related properties
     • Brief descriptions for entities
     • Place people in Biblical eras
     • Narrative role (greetings in epistles, scene participants, background)
     • Key events from narratives
     • Concepts
     • Unnamed things (descriptions, pronouns)
     • Headwords and lexical relationships

  33. References
     • Weaving the New Testament into the Semantic Web, http://semanticbible.org/other/presentations/2006-sbl/Weaving.xhtml
     • Suggested Upper Merged Ontology (SUMO), http://www.ontologyportal.org/
     • Defining N-ary Relations on the Semantic Web, http://www.w3.org/TR/swbp-n-aryRelations

  34. Other

  35. Future Activities
     • Controlled vocabulary/thesaurus development
     • Automated text classification
     • Citation processing

  36. Grounding Knowledge In Text
     • Enables discovery in two directions
         • Move from text to knowledgebase
         • Move from knowledgebase back to other texts

  37. The Publishers
     • Still thinking about print, not data and representation
     • Not at web scale
         • So manual or semi-automatic markup is practical

  38. Ontology Reuse
     • A great idea in principle, but
         • Often not quite right in practice
     • Our approach:
         • Reuse when it makes sense
         • Don’t require it when it doesn’t

  39. Modeling Challenges
     • Dates are important, but we hardly ever know them
         • But we do know sequencing
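The observation on slide 39, that sequence is known even when dates are not, suggests modeling time as a set of "precedes" facts rather than timestamps. A minimal sketch using the standard-library `graphlib` module (Python 3.9 or later); the event labels and the `PRECEDES` list are illustrative assumptions, not BK data.

```python
from graphlib import CycleError, TopologicalSorter


def consistent_order(precedes):
    """Return one ordering compatible with all (earlier, later) pairs.

    Raises graphlib.CycleError if the pairs contradict one another.
    """
    sorter = TopologicalSorter()
    for earlier, later in precedes:
        # TopologicalSorter.add(node, *predecessors)
        sorter.add(later, earlier)
    return list(sorter.static_order())


# Illustrative sequencing facts with no dates attached.
PRECEDES = [
    ("exodus", "conquest"),
    ("conquest", "monarchy"),
    ("monarchy", "exile"),
]

order = consistent_order(PRECEDES)
print(order)

try:
    consistent_order([("a", "b"), ("b", "a")])
except CycleError:
    print("contradictory sequencing detected")
```

A topological sort yields a timeline consistent with every known ordering, and a cycle error flags assertions that cannot all be true.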
