
Daniel J. Abadi · Adam Marcus · Samuel R. Madden · Kate Hollenbach Presenter: Vishnu Prathish

SW-Store: a vertically partitioned DBMS for Semantic Web data management. Daniel J. Abadi · Adam Marcus · Samuel R. Madden · Kate Hollenbach. Presenter: Vishnu Prathish. Date: Oct 1st 2013. CS 848 – Information Integration on the WEB with RDF, OQL and SPARQL.


Presentation Transcript


  1. SW-Store: a vertically partitioned DBMS for Semantic Web data management • Daniel J. Abadi · Adam Marcus · Samuel R. Madden · Kate Hollenbach • Presenter: Vishnu Prathish • Date: Oct 1st 2013 • CS 848 – Information Integration on the WEB with RDF, OQL and SPARQL

  2. Overview • The Problem and the Solution • Motivation • Current State of the Art: RDF in RDBMS and Property Tables • Vertically Partitioned Approach • Column-Oriented DBMS for Vertical Partitioning • Benchmarks, Comparisons and Results • SW-Store Design • System Architecture • Storage System • Query Engine and Query Translation • The rest of it • Conclusion

  3. Motivation • Efficient storage mechanism for RDF triples • The easy way: a three-column schema with subject, property and object as the columns • Query: find the authors of books whose title contains the word “Transaction”

  4. Motivation • Efficient storage mechanism for RDF triples • The easy way: a three-column schema with subject, property and object as the columns • Query: find the authors of books whose title contains the word “Transaction” • On the triples table this becomes a “5-way self-join”
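To make the self-join cost concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table name `triples`, the property names `type`, `title` and `author`, and all sample rows are assumptions made up for illustration. This toy query needs only three aliases of the same table; the slide reports five for its version of the query, whose exact schema is not shown in the transcript.

```python
import sqlite3

# Hypothetical three-column triple store held in memory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, property TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("book1", "type", "Book"),
        ("book1", "title", "Transaction Processing"),
        ("book1", "author", "Gray"),
        ("book1", "author", "Reuter"),
        ("book2", "type", "Book"),
        ("book2", "title", "Compilers"),
        ("book2", "author", "Aho"),
        ("cd1", "type", "CD"),
        ("cd1", "title", "Transaction Blues"),
        ("cd1", "author", "Nobody"),
    ],
)

# Every property we constrain or return costs one more alias of the same
# table, joined back on subject.
query = """
SELECT a.object
FROM triples AS t
JOIN triples AS ti ON ti.subject = t.subject
JOIN triples AS a  ON a.subject  = t.subject
WHERE t.property  = 'type'   AND t.object = 'Book'
  AND ti.property = 'title'  AND ti.object LIKE '%Transaction%'
  AND a.property  = 'author'
ORDER BY a.object
"""
authors = [row[0] for row in conn.execute(query)]
print(authors)  # ['Gray', 'Reuter']
```

Each extra condition on a subject adds another pass over the single large table, which is the scaling problem the rest of the deck addresses.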

  5. Property table approach • Basic idea: create tables whose columns are properties • Two approaches: • Clustered property table: cluster properties that tend to be defined together • Property-class table: cluster based on the type property of subjects

  6. Two sides of the coin • Advantages: • Significantly reduces subject-subject self-joins on the triples table • Opens up the possibility of attribute typing • Disadvantages: • Many queries still need joins or unions because they access data from multiple tables • Unstructured data: subjects won't have all properties defined, so the tables contain many NULLs • Multivalued attributes are awkward to represent

  7. A simpler alternative: vertical partitioning • Basic idea: one two-column (subject, object) table for each property • Advantages: • Effective handling of multivalued attributes • Elimination of null values for heterogeneous records • Only the property tables required by a query need to be read • No clustering algorithms • Fewer unions • But of course: • The number of joins required just exploded! • Slower inserts
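The partitioning step itself is small enough to sketch in plain Python. The function name `vertically_partition` and the sample triples are illustrative assumptions, not part of SW-Store; the sketch only shows how one (subject, object) table per property handles multivalued and missing properties.

```python
from collections import defaultdict

# Hypothetical sample triples: (subject, property, object).
triples = [
    ("book1", "title", "Transaction Processing"),
    ("book1", "author", "Gray"),
    ("book1", "author", "Reuter"),   # multivalued: just a second row
    ("book2", "title", "Compilers"),  # book2 has no author: no row, no NULL
]

def vertically_partition(triples):
    """Split triples into one (subject, object) table per property."""
    partitions = defaultdict(list)
    for subject, prop, obj in triples:
        partitions[prop].append((subject, obj))
    # Keep each table sorted by subject so tables can be merge-joined later.
    for rows in partitions.values():
        rows.sort()
    return dict(partitions)

partitions = vertically_partition(triples)
print(sorted(partitions))      # ['author', 'title']
print(partitions["author"])    # [('book1', 'Gray'), ('book1', 'Reuter')]
```

A query that touches only `author` reads only that table, but a query over several properties must join one table per property, which is the join explosion the slide warns about.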

  8. Extending a column-oriented DBMS • Basic idea: store data as a collection of columns rather than a collection of rows • No wasted bandwidth, because projection happens before data is pulled into main memory • Record headers are stored in separate columns, which reduces tuple width and lets us choose a different compression technique for each column • Source: smithal – spatial databases CSCI 8715
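As a rough illustration of why per-column storage helps compression, the sketch below transposes a few rows into columns and run-length encodes one of them. Run-length encoding is chosen here only as a familiar example of a column compression scheme; the slide does not say which schemes are used, and the helper names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby

# Hypothetical rows, sorted by property so equal values sit next to each other.
rows = [
    ("book1", "author", "Gray"),
    ("book1", "author", "Reuter"),
    ("book2", "author", "Aho"),
    ("book1", "title", "Transaction Processing"),
    ("book2", "title", "Compilers"),
]

# Column layout: each attribute is stored as its own list.
subjects, properties, objects = (list(col) for col in zip(*rows))

def rle_encode(column):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]

encoded = rle_encode(properties)
print(encoded)  # [('author', 3), ('title', 2)]
```

A query that needs only `subjects` never has to read `objects`, and each column can use whichever encoding suits its value distribution.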

  9. Benchmark and Evaluation • Barton Libraries dataset, provided by the Simile Project at MIT • A benchmark set of 7 queries of varying types • Storage schemes compared: • Triple store • Property tables • Vertically partitioned, row-oriented • Vertically partitioned, column-oriented

  10. Results • Property tables and vertical partitioning outperform the triple store by a factor of 2-3 • C-Store adds another factor of 10 in performance • For property tables, careful selection of column names is required • Vertical partitioning represents both the best-case and the worst-case scenario • Linear scaling for all tested queries

  11. SW-Store – A standalone vertically partitioned database/storage layer • Hybrid storage representation • Single columned • Column oriented sparse compression schemes

  12. Data representation

  13. Query engine and query translation • Each column is scanned to produce tuples that satisfy all three predicates • The Tupleize operator becomes a merge join over two-column vertical partitions • Query translator converts
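The merge join over two-column partitions can be sketched as follows. This is a textbook sort-merge join written for illustration, assuming both inputs are lists of (subject, object) pairs already sorted by subject; it is not SW-Store's operator, and the name `merge_join` and the sample data are made up.

```python
def merge_join(left, right):
    """Join two (subject, object) lists that are each sorted by subject."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        ls, rs = left[i][0], right[j][0]
        if ls < rs:
            i += 1
        elif ls > rs:
            j += 1
        else:
            # Find the full run of this subject on both sides, so that
            # multivalued properties produce every combination.
            i_end = i
            while i_end < len(left) and left[i_end][0] == ls:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == ls:
                j_end += 1
            for _, left_obj in left[i:i_end]:
                for _, right_obj in right[j:j_end]:
                    out.append((ls, left_obj, right_obj))
            i, j = i_end, j_end
    return out

title = [("book1", "Transaction Processing"), ("book2", "Compilers")]
author = [("book1", "Gray"), ("book1", "Reuter"), ("book3", "Knuth")]
joined = merge_join(title, author)
print(joined)
```

Because both partitions are kept in subject order, the join is a single linear pass over each input with no hashing or sorting at query time.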

  14. Overflow table to perform updates • A mechanism to support batched inserts • An additional table in the standard triples schema • Not indexed or read-optimized • Properties that appear only a very small number of times in the overflow table are not merged, because of the cost of merging • Horizontal “chunks” improve the efficiency of merging • Disadvantages: • Queries must go to both the overflow table and the vertical partitions • The merge must still be performed, and it is still expensive
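A toy model of this scheme, under loud assumptions: the class `ToyStore`, its `merge_threshold` parameter and the in-memory lists are all hypothetical, and horizontal chunking is left out. It only illustrates the three behaviours listed on the slide: inserts land in an unindexed triples table, reads consult both places, and a batch merge skips properties that are too rare.

```python
from collections import Counter, defaultdict

class ToyStore:
    """Illustrative only: vertical partitions plus an overflow triples table."""

    def __init__(self, merge_threshold=2):
        self.partitions = defaultdict(list)  # property -> sorted (subject, object)
        self.overflow = []                    # unindexed (subject, property, object)
        self.merge_threshold = merge_threshold  # assumed cut-off for "rare"

    def insert(self, subject, prop, obj):
        # Inserts are cheap: append to the overflow table only.
        self.overflow.append((subject, prop, obj))

    def lookup(self, prop):
        # Reads must consult both the partition and the overflow table.
        merged = list(self.partitions.get(prop, []))
        pending = [(s, o) for s, p, o in self.overflow if p == prop]
        return sorted(merged + pending)

    def merge_batch(self):
        # Move frequent properties into their partitions; leave rare ones behind.
        counts = Counter(p for _, p, _ in self.overflow)
        keep = []
        for s, p, o in self.overflow:
            if counts[p] >= self.merge_threshold:
                self.partitions[p].append((s, o))
            else:
                keep.append((s, p, o))
        for p, n in counts.items():
            if n >= self.merge_threshold:
                self.partitions[p].sort()
        self.overflow = keep

store = ToyStore()
store.insert("book2", "author", "Aho")
store.insert("book1", "author", "Gray")
store.insert("book9", "rareProp", "x")
store.merge_batch()
print(store.overflow)            # [('book9', 'rareProp', 'x')]
print(store.lookup("author"))    # [('book1', 'Gray'), ('book2', 'Aho')]
print(store.lookup("rareProp"))  # [('book9', 'x')]
```

The rare property stays readable through `lookup` even though it was never merged, which is also why every query pays for the extra overflow scan.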

  15. Discussion: • Multivalued attributes cannot be implemented. • Overflow table: significant overhead? • “Overflow tables might turn out to be useful while adding very rare predicates”. How? • “Queries that do not restrict on property values are very rare for RDF applications.” Is that so? • Potential scalability issues when the number of properties is high? • Queries with the unrestricted-property problem were removed from the validation dataset. What would the impact be? What if queries are not restricted to a limited number of properties? Are real-world queries like this?

  16. Thank you!
