RDF languages and storages part 2 - indexing semi-structured data

Maciej Janik, Conrad Ibanez. CSCI 8350, Fall 2004.

Presentation Transcript


  1. RDF languages and storages, part 2 - indexing semi-structured data. Maciej Janik, Conrad Ibanez. CSCI 8350, Fall 2004

  2. Outline • Jena storage • Indexing techniques

  3. Jena • Implemented in Java • One of the most widely used RDF stores and query engines • Supports RDF, RDFS and OWL • In-memory and persistent storage (Oracle, MySQL, PostgreSQL) • RDQL query language • Reasoning/inference engine

  4. Jena - storage schema • The previous version (Jena1) used normalized relational DB tables • statements • literals • resources • Jena2 stores triples as (Subject, Predicate, Object) in denormalized tables • Optimization for common statement patterns - grouping of properties
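
The difference between the two layouts can be sketched with SQLite. This is only an illustration under assumed names: the table and column names (`resources`, `literals`, `statements`, `triples`) and the sample triple are hypothetical and are not Jena's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized layout (hypothetical names): statements hold only ids.
CREATE TABLE resources  (id INTEGER PRIMARY KEY, uri TEXT UNIQUE);
CREATE TABLE literals   (id INTEGER PRIMARY KEY, value TEXT UNIQUE);
CREATE TABLE statements (subj INTEGER, pred INTEGER, obj_lit INTEGER);
-- Denormalized layout: values are stored inline in a single table.
CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT);
""")

def intern(table, column, value):
    """Return the id of value in a lookup table, inserting it if needed."""
    conn.execute(f"INSERT OR IGNORE INTO {table}({column}) VALUES (?)", (value,))
    row = conn.execute(f"SELECT id FROM {table} WHERE {column} = ?", (value,))
    return row.fetchone()[0]

def add_normalized(s, p, literal):
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        (intern("resources", "uri", s),
         intern("resources", "uri", p),
         intern("literals", "value", literal)),
    )

def add_denormalized(s, p, literal):
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (s, p, literal))

add_normalized("ex:doc1", "dc:title", "Jena2")
add_denormalized("ex:doc1", "dc:title", "Jena2")

# The normalized lookup needs three joins to get back to the text values.
normalized = conn.execute("""
    SELECT l.value FROM statements st
    JOIN resources s ON s.id = st.subj
    JOIN resources p ON p.id = st.pred
    JOIN literals  l ON l.id = st.obj_lit
    WHERE s.uri = ? AND p.uri = ?""", ("ex:doc1", "dc:title")).fetchall()

# The denormalized lookup reads a single table.
denormalized = conn.execute(
    "SELECT obj FROM triples WHERE subj = ? AND pred = ?",
    ("ex:doc1", "dc:title")).fetchall()
```

Both queries return the same value; the denormalized form repeats strings in every row but avoids the joins.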

  5. Jena - storage [Figure: normalized tables vs. denormalized tables] „Efficient RDF Storage and Retrieval in Jena2” - Wilkinson et al.

  6. Jena - storage • Makes a space-for-time trade-off: uses more space to speed up search • Clusters properties that are likely to be accessed together - optimized for common patterns • Special treatment of reified statements
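
Property clustering can be sketched as follows. The table `dc_props`, its columns and the sample data are assumptions made for this example; the sketch only shows why a clustered table answers a common pattern with one row read instead of a self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Generic triple table: one row per statement.
CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT);
-- Hypothetical clustered table: properties often read together share a row.
CREATE TABLE dc_props (subj TEXT PRIMARY KEY, title TEXT, creator TEXT);
""")

conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:title", "Jena2"),
    ("ex:doc1", "dc:creator", "Wilkinson"),
])
conn.execute("INSERT INTO dc_props VALUES (?, ?, ?)",
             ("ex:doc1", "Jena2", "Wilkinson"))

# Generic layout: title and creator need a self-join on the subject.
generic = conn.execute("""
    SELECT t.obj, c.obj
    FROM triples t JOIN triples c ON c.subj = t.subj
    WHERE t.subj = ? AND t.pred = 'dc:title' AND c.pred = 'dc:creator'
    """, ("ex:doc1",)).fetchone()

# Clustered layout: the same answer is a single-row lookup.
clustered = conn.execute(
    "SELECT title, creator FROM dc_props WHERE subj = ?",
    ("ex:doc1",)).fetchone()
```

The cost is space: a subject that lacks one of the clustered properties still occupies a column (as NULL) in the clustered table.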

  7. Jena - graph abstraction • The Graph interface is separated from the (persistent) triple storage layer • Special support for different types of graphs - optimized for performance • Supports operations such as add, delete and find
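
A minimal sketch of this separation, written in Python rather than Java and not using Jena's real interfaces: the names `Graph` and `MemoryGraph` and the use of `None` as a wildcard are assumptions for illustration.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

class Graph(ABC):
    """Abstract graph: callers see add/delete/find, not the storage layer."""

    @abstractmethod
    def add(self, triple: Triple) -> None: ...

    @abstractmethod
    def delete(self, triple: Triple) -> None: ...

    @abstractmethod
    def find(self, s: Optional[str] = None, p: Optional[str] = None,
             o: Optional[str] = None) -> Iterator[Triple]: ...

class MemoryGraph(Graph):
    """One possible backend: an in-memory set of triples."""

    def __init__(self) -> None:
        self._triples = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def delete(self, triple: Triple) -> None:
        self._triples.discard(triple)

    def find(self, s=None, p=None, o=None) -> Iterator[Triple]:
        # None acts as a wildcard for that position of the pattern.
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip((s, p, o), triple)):
                yield triple
```

A persistent backend would implement the same three methods on top of database tables, so code written against `Graph` would not need to change.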

  8. Jena - query processing • Converts multiple patterns in a query into a single DB query • Uses the DB query optimizer instead of executing multiple queries from the Jena level (as was done in Jena1) • Associates a table with a pattern (best) or spans a pattern across tables (requires a join operation) • A query may span different graphs, but it can be optimized only if they are stored in the same database
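
The idea of folding several triple patterns into one database query can be sketched like this. It is not Jena's query compiler: the function `compile_patterns`, the `triples` table and the sample data are hypothetical, and variable names are assumed to be valid SQL identifiers.

```python
import sqlite3

COLUMNS = ("subj", "pred", "obj")

def compile_patterns(patterns):
    """Turn triple patterns into one SQL self-join; '?x' terms are variables."""
    bindings = {}            # variable -> first column that binds it
    where, params = [], []
    for i, pattern in enumerate(patterns):
        for column, term in zip(COLUMNS, pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in bindings:
                    # Repeated variable: becomes a join condition.
                    where.append(f"{ref} = {bindings[term]}")
                else:
                    bindings[term] = ref
            else:
                # Constant term: becomes a bound parameter.
                where.append(f"{ref} = ?")
                params.append(term)
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in bindings.items())
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {select} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:doc1", "dc:creator", "ex:kw"),
    ("ex:kw", "foaf:name", "Kevin Wilkinson"),
])

sql, params = compile_patterns([
    ("?doc", "dc:creator", "?who"),
    ("?who", "foaf:name", "?name"),
])
rows = conn.execute(sql, params).fetchall()
```

Because both patterns end up in a single statement, the database optimizer can choose the join order, which is the point made on the slide.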

  9. What to index? How to index?

  10. Indexing semistructured data • XML cannot be indexed directly the way a relational DB can • Indexing may take advantage of the tree structure • depth of a node • common path from the root • convert each path to a string expression • precalculate the path tree
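
As a rough sketch of these ideas, the snippet below walks an XML tree, records each node's depth, turns each root-to-node path into a string and precomputes a path-to-values index. The sample document and the names `label_paths` and `index` are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def label_paths(xml_text):
    """Return (depth, path string, text) for every element that carries text."""
    root = ET.fromstring(xml_text)

    def walk(node, prefix, depth):
        path = prefix + "/" + node.tag
        text = (node.text or "").strip()
        if text:
            yield depth, path, text
        for child in node:
            yield from walk(child, path, depth + 1)

    return list(walk(root, "", 0))

doc = ("<invoice><buyer><name>ABC Corp</name></buyer>"
       "<seller><name>Acme</name></seller></invoice>")

entries = label_paths(doc)

# Precomputed index: path string -> values found at the end of that path.
index = defaultdict(list)
for depth, path, text in entries:
    index[path].append(text)
```

A path query such as `/invoice/buyer/name` then becomes a single dictionary lookup instead of a tree traversal.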

  11. Indexing semistructured data • The idea is based on the Patricia trie • The index should scale with the growth of the data • A path together with its leaf is encoded into a string -> the Index Fabric „A Fast Index for Semistructured Data” - Brian F. Cooper et al.
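
For readers unfamiliar with the structure, here is a simplified, character-level sketch of a Patricia-style (path-compressed) trie. It is an assumption-laden teaching version, not the layered, disk-oriented structure from the cited paper; the names `Node`, `insert` and `lookup` are hypothetical.

```python
class Node:
    def __init__(self):
        self.edges = {}        # first character -> (edge label, child Node)
        self.terminal = False  # True if a key ends exactly at this node
        self.value = None

def common_prefix_len(a, b):
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def insert(root, key, value):
    node = root
    while key:
        edge = node.edges.get(key[0])
        if edge is None:
            # No edge shares a first character: hang the rest as one edge.
            leaf = Node()
            node.edges[key[0]] = (key, leaf)
            node, key = leaf, ""
            break
        label, child = edge
        n = common_prefix_len(label, key)
        if n < len(label):
            # Key diverges inside the edge: split the edge at position n.
            mid = Node()
            mid.edges[label[n]] = (label[n:], child)
            node.edges[key[0]] = (label[:n], mid)
            child = mid
        node, key = child, key[n:]
    node.terminal = True
    node.value = value

def lookup(root, key):
    node = root
    while key:
        edge = node.edges.get(key[0])
        if edge is None or not key.startswith(edge[0]):
            return None
        key = key[len(edge[0]):]
        node = edge[1]
    return node.value if node.terminal else None
```

Keys that share a prefix share the edges for that prefix, so the shared part is stored once and compared once during a search.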

  12. A Layered Index „A Fast Index for Semistructured Data” - Brian F. Cooper et al.

  13. Index Fabric • The index is used to accelerate path expressions - mainly queries that ask for a root-to-leaf path • Idea of prefix encoding • xml: <A>alpha<B>beta<C>gamma</C></B></A> • paths: <A>alpha ; <A><B>beta ; <A><B><C>gamma • encoded: A alpha ; A B beta ; A B C gamma • infix (not common): A alpha B beta C gamma • Convert each path to a string for fast searches • Replace tags with ‘non-terminal’ characters (as in automata)
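
The prefix encoding from this slide can be reproduced with a short sketch. A sorted list with binary search stands in for the real index here, which is an assumption of this example and not how the Index Fabric stores keys; `encode_paths` and `prefix_search` are hypothetical names, and the search assumes no key contains the character U+FFFF.

```python
import bisect
import xml.etree.ElementTree as ET

def encode_paths(xml_text):
    """Encode each (path, text) pair as one string, e.g. 'A B beta'."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(node, tags):
        tags = tags + [node.tag]
        text = (node.text or "").strip()
        if text:
            keys.append(" ".join(tags + [text]))
        for child in node:
            walk(child, tags)

    walk(root, [])
    return sorted(keys)

def prefix_search(keys, prefix):
    """Return all keys that start with prefix, using binary search."""
    lo = bisect.bisect_left(keys, prefix)
    hi = bisect.bisect_left(keys, prefix + "\uffff")
    return keys[lo:hi]

keys = encode_paths("<A>alpha<B>beta<C>gamma</C></B></A>")
under_b = prefix_search(keys, "A B ")
exact = prefix_search(keys, "A B C gamma")
```

Once paths are strings, a root-to-leaf query is an exact string match and a query for everything under a path is a prefix match.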

  14. Index Fabric - raw paths „A Fast Index for Semistructured Data” - Brian F. Cooper et al.

  15. Graphs - how to index? Backbone [Figure: graph image from http://www.aisee.com/]

  16. Graphs - how to index? Tree-type - prefixes - tries [Figure: graph image from http://www.aisee.com/]

  17. Graphs - how to index? Path templates: 1-index, 2-index, T-index „Index Structures for Path Expressions” - Tova Milo, Dan Suciu

  18. Graphs - how to index? Landmarks [Figure: graph image from http://www.aisee.com/]

  19. Indexing - summary • Indexing semistructured data • index fabric - encoding, multilayered • common prefixes - trie structure • backbone - highways between points • landmarks - county division • path templates - precalculated expressions • clustering - grouping by theme access • Indexing such data is NOT easy; the solution depends on how you want to search the graph

  20. References • „Efficient RDF Storage and Retrieval in Jena2” - Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds • „A Fast Index for Semistructured Data” - Brian F. Cooper, Neal Sample, Michael J. Franklin, Gisli Hjaltason, Moshe Shadmon • „Index Structures for Path Expressions” - Tova Milo, Dan Suciu
