An RDF and XML Database

John Snelson, Lead Engineer, MarkLogic. 23rd October 2013.

Presentation Transcript


  1. An RDF and XML Database. John Snelson, Lead Engineer. 23rd October 2013

  2. MarkLogic: Database, Search, Application Services

  3. Data ≠ Information

  4. Data + Context = Information

  5. Dynamic Semantic Publishing: BBC Sports. The Challenge and Goals
     • Size and complexity: # of athletes, # of teams, # of assets (match reports, statistics, etc.), # of relations (facts)
     • Rich user experience: see information in context, personalize content, easy navigation, intelligently serve ads (outside of UK)
     • Manageable: static pages? Too many, changing too fast. Limited number of journalists. Automate as much as possible.

  6. Dynamic Semantic Publishing: A Solution
     XML Database
     • Store, manage documents: stories, blogs, feeds, profiles
     • Store, manage values: statistics
     • Full-text search
     • Performance, scalability
     • Robustness
     Triple Store
     • Metadata about documents: tagged by journalists, added (semi-)automatically, inferred
     • Facts reported by journalists
     • Linked Open Data for real-world facts

  7. Dynamic Semantic Publishing: Understanding Data [diagram; edge labels: played in, plays for, plays in]

  8. Dynamic Semantic Publishing: Scaling Up

  9. What is RDF? [graph diagram: nodes :person4, :person5, :person20, :place5 and the literal "John", connected by the edges :first-name, :birth-place, :spouse, :has-child, :has-parent]

  10. What is RDF?
      • Schema-less
      • Triple granularity
      • Open world assumption
      • Joins: the cost of granularity

  11. What is Semantics?
      Data stored in triples
      • Expressed as Subject : Predicate : Object
      Example:
      "John Smith" : livesIn : "London"
      "London" : isIn : "England"

  12. What is Semantics?
      Data stored in triples
      • Expressed as Subject : Predicate : Object
      Example:
      "John Smith" : livesIn : "London"
      "London" : isIn : "England"
      Rules tell us something about the triples
      Example: If (A livesIn X) AND (X isIn Y) then (A livesIn Y)
      Inference: "John Smith" : livesIn : "England"

  13. What is Semantics?
      Data stored in triples
      • Expressed as Subject : Predicate : Object
      Example:
      "John Smith" : livesIn : "London"
      "London" : isIn : "England"
      Rules tell us something about the triples
      [graph diagram: "John Smith" livesIn "London"; "London" isIn "England"; "John Smith" livesIn "England"]
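
The rule on slides 12 and 13 can be read as a small forward-chaining step. The following is a minimal Python sketch of that idea only, not of how MarkLogic evaluates rules; the function name `infer_lives_in` and the tuple representation are assumptions made for illustration.

```python
# Triples are (subject, predicate, object) tuples, as in the slide example.
facts = {
    ("John Smith", "livesIn", "London"),
    ("London", "isIn", "England"),
}

def infer_lives_in(triples):
    """Apply 'if (A livesIn X) and (X isIn Y) then (A livesIn Y)'
    repeatedly until no new triple is produced."""
    known = set(triples)
    while True:
        new = {
            (a, "livesIn", y)
            for (a, p1, x) in known if p1 == "livesIn"
            for (x2, p2, y) in known if p2 == "isIn" and x2 == x
        } - known
        if not new:
            return known
        known |= new

# Only the triples that were not stated explicitly.
inferred = infer_lives_in(facts) - facts
print(inferred)
```

The loop runs to a fixed point, so a chain such as London isIn England, England isIn UK would also be followed if those facts were present.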

  14. Semantics Architecture [architecture diagram; labels: GRAPH, SPARQL, TRIPLE, XQY, XSLT, SQL]

  15. Triple Index
      • 3 triple orders
      • Cached for performance
      • Works seamlessly with other indexes
      • Security
      • 150 bytes per triple on disk
      • Billions of triples per host
      • Scaling out horizontally

  16. RDF Loading
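
To illustrate why an index might keep each triple in several orders, here is a toy in-memory sketch in Python. The slide only says "3 triple orders"; the specific orders used here (SPO, POS, OSP), the class name `TripleIndex`, and the sample data are assumptions, and nothing here reflects MarkLogic's on-disk format.

```python
from collections import defaultdict

def _nested():
    # Two-level map: first key -> second key -> set of third values.
    return defaultdict(lambda: defaultdict(set))

class TripleIndex:
    """Toy index storing every triple in three orders (assumed: SPO, POS, OSP)."""

    def __init__(self):
        self.spo = _nested()
        self.pos = _nested()
        self.osp = _nested()

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Return sorted triples matching the pattern; None is a wildcard.
        The order whose leading key is bound is scanned, then the rest is filtered."""
        if s is not None:
            found = ((s, p2, o2) for p2, objs in self.spo.get(s, {}).items() for o2 in objs)
        elif p is not None:
            found = ((s2, p, o2) for o2, subs in self.pos.get(p, {}).items() for s2 in subs)
        elif o is not None:
            found = ((s2, p2, o) for s2, preds in self.osp.get(o, {}).items() for p2 in preds)
        else:
            found = ((s2, p2, o2) for s2, po in self.spo.items()
                     for p2, objs in po.items() for o2 in objs)
        return sorted(t for t in found if p in (None, t[1]) and o in (None, t[2]))

idx = TripleIndex()
idx.add(":person4", ":first-name", "John")
idx.add(":person4", ":birth-place", ":place5")
idx.add(":person5", ":birth-place", ":place5")
print(idx.match(p=":birth-place", o=":place5"))
```

Whatever term of a pattern is bound, one of the three orders has it as a leading key, so a lookup never needs a full scan unless the pattern is entirely unbound.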

  17. Triples Embedded in Documents
      …
      <sem:triple>
        <sem:subject>http://example.org/kennedy/person12</sem:subject>
        <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
        <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
      </sem:triple>
      …
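
As a rough illustration of what "triples embedded in documents" means for a consumer, the following Python sketch pulls `sem:triple` elements out of an XML document with the standard library. The wrapper elements `<person>` and `<name>` are hypothetical, and the namespace URI bound to `sem:` is an assumption, since the slide does not show the declaration.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the sem: prefix; the slide omits the declaration.
SEM_NS = "http://marklogic.com/semantics"

# Hypothetical document: ordinary XML content with one embedded triple.
doc = f"""<person xmlns:sem="{SEM_NS}">
  <name>Lawford</name>
  <sem:triple>
    <sem:subject>http://example.org/kennedy/person12</sem:subject>
    <sem:predicate>http://example.org/kennedy/last-name</sem:predicate>
    <sem:object datatype="http://www.w3.org/2001/XMLSchema#string">Lawford</sem:object>
  </sem:triple>
</person>"""

def extract_triples(xml_text):
    """Return (subject, predicate, object, datatype) for each embedded triple."""
    root = ET.fromstring(xml_text)
    ns = {"sem": SEM_NS}
    triples = []
    for t in root.iter(f"{{{SEM_NS}}}triple"):
        obj = t.find("sem:object", ns)
        triples.append((
            t.findtext("sem:subject", namespaces=ns).strip(),
            t.findtext("sem:predicate", namespaces=ns).strip(),
            obj.text.strip(),
            obj.get("datatype"),
        ))
    return triples

print(extract_triples(doc))
```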

  18. Content, Data, and Semantics
      <SAR>
        <title>Suspicious vehicle near airport</title>
        <date>2012-11-12Z</date>
        <type>observation/surveillance</type>
        <threat>
          <type>suspicious activity</type>
          <category>suspicious vehicle</category>
        </threat>
        <location>
          <lat>37.497075</lat>
          <long>-122.363319</long>
        </location>
        <description>
          A blue van with license plate ABC 123 was observed parked behind the airport sign…
          <triple>
            <subject>IRIID</subject>
            <predicate>isa</predicate>
            <object>license-plate</object>
          </triple>
          <triple>
            <subject>IRIID</subject>
            <predicate>value</predicate>
            <object>ABC 123</object>
          </triple>
        </description>
      </SAR>

  19. Content, Data, and Semantics [diagram: the <SAR> document from the previous slide drawn as a tree, with regions labeled Unstructured full-text, Geospatial Data, and Semantic (RDF) Triples]

  20. RDF Values
      <http://example.org/kennedy/person4>
      _:blank1
      "string value"^^xs:string
      "bonjour"@fr
      "2013-04-09"^^xs:date
      "simple"
      "987"^^xs:double

  21. Datatype Mapping
      • IRI: <http://example.com> maps to sem:iri("http://example.com")
      • Blank Node: _:blank1 maps to sem:blank("…")
      • Simple Literal: "simple" maps to xs:string("simple")
      • Language Tagged Literal: "bonjour"@fr maps to rdf:langString("bonjour", "fr")

  22. SPARQL
      select * where { ?person :birth-place ?place; :first-name "John" }
      • Executed using the triple index
      • SPARQL 1.0 + much of SPARQL 1.1
      • Cost-based optimization
      • Join ordering and algorithms
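
The query on slide 22 has two triple patterns that share the variable ?person, so answering it means joining the matches of one pattern with the matches of the other. The Python sketch below shows that join with a naive nested loop over a hypothetical data set. It is only an illustration of the concept: the function names are invented here, and a real engine would use its triple index and a cost-based join order rather than scanning every triple.

```python
def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`.
    Terms starting with '?' are variables. Returns a new dict, or None on mismatch."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def solve(patterns, triples):
    """Join the patterns left to right; each solution maps variables to values."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions

# Hypothetical data in the style of the slides.
triples = [
    (":person4", ":first-name", "John"),
    (":person4", ":birth-place", ":place5"),
    (":person5", ":first-name", "Jean"),
    (":person5", ":birth-place", ":place5"),
]
patterns = [
    ("?person", ":birth-place", "?place"),
    ("?person", ":first-name", "John"),
]
solutions = solve(patterns, triples)
print(solutions)
```

Each solution is a plain dictionary from variable name to value, which is loosely analogous to the map:map solutions mentioned on slide 25.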

  23. Executing SPARQL
      sem:sparql("
        prefix : <http://example.org/kennedy/>
        select * {
          ?person :first-name ?first;
                  :last-name ?last;
                  :alma-mater [:ivy-league :true]
        }",
        map:entry("first", "John"),
        (),
        cts:collection-query("mycollection")
      )

  24. Returning Binding Solutions
      select * where { ?person :birth-place :place5 }
      select * where { ?person :birth-place ?place; :first-name "John" }

  25. Solution Results map:map

  26. SPARQL Query Results XML Format
      sem:query-result-serialize(
        sem:sparql("select * { … }"),
        "xml"
      )

  27. Returning Triples
      describe :person4
      construct { ?bp :uses-name ?fn } where { ?person :birth-place ?bp; :first-name ?fn }

  28. Triple Results
      :place0 :uses-name "Ethel", "Jeffrey", "Kara" .
      :place1 :uses-name "Edward", "James" .
      :place10 :uses-name "Robert", "Sheila", "Stephen" .
      (slide labels: sem:triple, sem:iri)
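
The construct query on slide 27 builds new triples from each solution of its where clause. A minimal Python sketch of that transformation follows. The input data is hypothetical (only the output shape mirrors slide 28), and the sketch assumes each person has at most one :birth-place, which the slides do not state.

```python
# Hypothetical input triples; only the predicates come from the slides.
triples = [
    (":person1", ":first-name", "Edward"),
    (":person1", ":birth-place", ":place1"),
    (":person2", ":first-name", "James"),
    (":person2", ":birth-place", ":place1"),
    (":person3", ":first-name", "Ethel"),
    (":person3", ":birth-place", ":place0"),
]

# Assumption: at most one birth place per person, so a plain dict is enough.
birth_place = {s: o for s, p, o in triples if p == ":birth-place"}

# For every (?person, ?bp, ?fn) solution, emit the template triple ?bp :uses-name ?fn.
constructed = sorted(
    (birth_place[s], ":uses-name", o)
    for s, p, o in triples
    if p == ":first-name" and s in birth_place
)
print(constructed)
```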

  29. Querying Named Graphs
      select * from <http://my_graph> where { ?s ?p ?o }
      select * where { graph <http://my_graph> { ?s ?p ?o } }
      (slide label: collection)

  30. Restricting The Datasets
      let $options := "properties"
      let $query := cts:and-query((
        cts:directory-query("/triples/"),
        cts:element-range-query(xs:QName("date"), ">", $date)
      ))
      return sem:sparql("…", (), (), $options, $query)

  31. Creating Triples
      Returning sem:triple values
      • sem:triple()
      • sem:rdf-parse()
      • sem:rdf-get()
      • sem:rdf-builder()
      Inserting to a database
      • sem:rdf-load()
      • sem:rdf-insert()

  32. Graph Store API
      declare function graph-insert(
        $graphname as sem:iri,
        $triples as sem:triple*,
        [$permissions as element(sec:permission)*,
        $collections as xs:string*,
        $quality as xs:int?,
        $forest-ids as xs:unsignedLong*]
      ) as xs:string*;
      declare function graph-delete(
        $graphname as sem:iri
      ) as empty-sequence();

  33. Conclusion
      • Semantics can enhance your data-oriented and search applications.
      • XQuery and SPARQL work well together.
      • A combined RDF and XML database simplifies working with the two technologies together.
      • Try MarkLogic 7: http://www.marklogic.com/early-access/

  34. Any Questions?
