
SW-Store: a vertically partitioned DBMS for Semantic Web data Management

Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. The VLDB Journal. SW-Store: a vertically partitioned DBMS for Semantic Web data Management. Group 4 Surabhi Mithal 4282643 Nipun Garg 4282567 http://www-users.cs.umn.edu/~smithal/. Surabhi Mithal





Presentation Transcript


  1. Daniel J. Abadi, Adam Marcus, Samuel R. Madden, and Kate Hollenbach. 2009. The VLDB Journal. SW-Store: a vertically partitioned DBMS for Semantic Web data Management Group 4 Surabhi Mithal 4282643 Nipun Garg 4282567 http://www-users.cs.umn.edu/~smithal/ Surabhi Mithal Nipun Garg

  2. Outline • Introduction to Semantic Web • Motivation • Problem Statement • Challenges • Major Contributions • Related Work • Key Concepts • Assumptions • Validation Methodology • Results • Improvements

  3. Introduction to semantic web : An example A simplified bookstore data (dataset “A”) Source : http://www.w3.org/People/Ivan/CorePresentations/SWTutorial/

  4. EXAMPLE CONT : GRAPH REPRESENTATION [Figure: RDF graph of dataset “A”. Node http://…isbn/000651409X with edges a:title (The Glass Palace), a:year (2000), a:publisher (node with a:p_name Harper Collins and a:city London) and a:author (node with a:name Ghosh, Amitav and a:homepage http://www.amitavghosh.com)]

  5. Another bookstore data (dataset “F”)

  6. EXAMPLE CONT : GRAPH REPRESENTATION [Figure: RDF graph of dataset “F”. Node http://…isbn/2020386682 with edges f:titre (Le palais des miroirs), f:original (http://…isbn/000651409X), f:auteur (node with f:nom Ghosh, Amitav) and f:traducteur (node with f:nom Besse, Christianne)]

  7. DATA INTEGRATION ACROSS THE TWO DATASETS : SEMANTIC WEB [Figure: the graphs of datasets “A” and “F” shown together; the node http://…isbn/000651409X appears in both]

  8. DATA INTEGRATION ACROSS THE TWO DATASETS : SEMANTIC WEB [Figure: the graphs of datasets “A” and “F” shown together, with the two occurrences of http://…isbn/000651409X marked SAME URI]

  9. DATA INTEGRATION ACROSS THE TWO DATASETS : SEMANTIC WEB [Figure: the graphs of datasets “A” and “F” merged at the shared node http://…isbn/000651409X, which f:original now points to] User of data “F” can now ask queries like: “give me the title of the original”

  10. Motivation • Integration and sharing of data across different applications and organizations. • The Semantic Web logical data model is called the “Resource Description Framework” (RDF). • The Semantic Web concept has scalability and performance issues due to the nature of the data. Current data management solutions for RDF scale poorly.

  11. Problem Statement • Input : RDF data in the form of triples <subject,property,object> e.g. The Glass Palace hasAuthor Amitav Ghosh • Output : Efficient storage system for RDF data. • Objective : Improve the query performance for complex real world queries.

  12. Challenges • Example query: find all authors of books whose title contains the word “Transaction”. • Over a single table of triples, this query needs a 5-way self-join!
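
To make the self-join on slide 12 concrete, here is a minimal sketch using Python's built-in sqlite3 module. The toy data, the table name `triples`, and the property names (`type`, `title`, `hasAuthor`, `name`) are assumptions made for illustration; the SQL shown on the original slide may differ.

```python
import sqlite3

# Hypothetical toy triples; property names (type, title, hasAuthor, name)
# are illustrative assumptions, not taken from the paper or the slides.
TRIPLES = [
    ("book1", "type", "Book"),
    ("book1", "title", "Transaction Processing"),
    ("book1", "hasAuthor", "person1"),
    ("person1", "type", "Author"),
    ("person1", "name", "Doe, Jane"),
    ("book2", "type", "Book"),
    ("book2", "title", "The Glass Palace"),
    ("book2", "hasAuthor", "person2"),
    ("person2", "type", "Author"),
    ("person2", "name", "Ghosh, Amitav"),
]

# The single triples table is referenced five times: once per triple
# pattern needed to answer the question.
FIVE_WAY_SELF_JOIN = """
SELECT nm.obj
FROM triples AS ti, triples AS bk, triples AS ha, triples AS au, triples AS nm
WHERE ti.prop = 'title' AND ti.obj LIKE '%' || ? || '%'
  AND bk.subj = ti.subj AND bk.prop = 'type' AND bk.obj = 'Book'
  AND ha.subj = ti.subj AND ha.prop = 'hasAuthor'
  AND au.subj = ha.obj AND au.prop = 'type' AND au.obj = 'Author'
  AND nm.subj = ha.obj AND nm.prop = 'name'
ORDER BY nm.obj
"""


def load_triples(triples):
    """Create an in-memory triple store with one (subj, prop, obj) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subj TEXT, prop TEXT, obj TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def authors_for_title_word(conn, word):
    """Return names of authors of books whose title contains `word`."""
    return [row[0] for row in conn.execute(FIVE_WAY_SELF_JOIN, (word,))]


conn = load_triples(TRIPLES)
authors = authors_for_title_word(conn, "Transaction")
print(authors)
```

Every extra triple pattern in the question adds another reference to the same large table, which is why such queries get expensive on a plain triple store.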

  13. Major Contributions and Novelty • Introduces vertical partitioning of RDF data, combined with a column-oriented database, to improve performance and increase simplicity. • Evaluates the performance of the new and existing techniques on a real-world example. • Proposes SW-Store, a new column-oriented database based on the above approach.

  14. Related Work: Property tables (HP Laboratories, Jena) • Property Clustered Tables and Property Class Tables • Approach 1: A data clustering approach. • Approach 2: Creates clusters based on subject’s type. • Limitations: • Accuracy of clustering algorithms. • NULLs in data. • Multivalued attributes.

  15. Sample database [Figure: sample property table; annotation: too many NULLs] Source: SW-Store: a vertically partitioned DBMS for Semantic Web data management
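
The NULL problem noted on slides 14 and 15 can be illustrated with a small sketch. It builds one wide property table (one row per subject, one column per property) from a few triples taken from the bookstore example and counts the empty cells. The subject ids `book1` and `person1` and the helper names are hypothetical, and the table on the original slide is not reproduced here.

```python
# Triples loosely based on the bookstore example; the ids are made up.
TRIPLES = [
    ("book1", "title", "The Glass Palace"),
    ("book1", "year", "2000"),
    ("person1", "name", "Ghosh, Amitav"),
    ("person1", "homepage", "http://www.amitavghosh.com"),
]


def build_property_table(triples):
    """Return (columns, rows) for a single wide property table.

    Each subject gets one row with a cell for every property; properties
    the subject lacks stay None (NULL). A second value for the same
    property would overwrite the first, which is the multivalued-attribute
    limitation listed on slide 14.
    """
    columns = sorted({prop for _, prop, _ in triples})
    rows = {}
    for subj, prop, obj in triples:
        row = rows.setdefault(subj, dict.fromkeys(columns))
        row[prop] = obj
    return columns, rows


def count_nulls(rows):
    """Count the cells that hold no value."""
    return sum(value is None for row in rows.values() for value in row.values())


columns, rows = build_property_table(TRIPLES)
null_cells = count_nulls(rows)
total_cells = len(columns) * len(rows)
print(f"{null_cells} of {total_cells} cells are NULL")
```

With only two kinds of subjects, half of the cells are already empty; the share grows as more unrelated properties land in the same table.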

  16. Key Concepts: Vertical partitioning and Column Oriented Store • The data is vertically partitioned, and the partitions are then stored in a column-oriented database. • Vertical partitioning: one two-column (subject, object) table for each property. Advantages: • Effective handling of multivalued attributes. • Elimination of NULLs. • Fewer unions are needed. • Column-oriented storage. Advantages: • No bandwidth is wasted, as projections on the data happen before it is pulled into main memory. • Record headers are stored in separate columns, which reduces the tuple width and allows a different compression technique to be chosen for each column.
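
As a rough sketch of the vertical partitioning described on slide 16, the code below splits a list of triples into one (subject, object) table per property. The function names and the second author `person2` are hypothetical and added only to show a multivalued attribute; sorting each table by subject is an assumption made here to keep the output deterministic.

```python
from collections import defaultdict

# person2 is a made-up second author used to show a multivalued attribute.
TRIPLES = [
    ("book1", "title", "The Glass Palace"),
    ("book1", "year", "2000"),
    ("book1", "author", "person1"),
    ("book1", "author", "person2"),
    ("person1", "name", "Ghosh, Amitav"),
]


def vertically_partition(triples):
    """Return {property: sorted list of (subject, object) pairs}."""
    tables = defaultdict(list)
    for subj, prop, obj in triples:
        tables[prop].append((subj, obj))
    for rows in tables.values():
        rows.sort()
    return dict(tables)


def join_on_subject(left, right):
    """Join two (subject, object) tables on subject with a simple hash join."""
    by_subject = defaultdict(list)
    for subj, obj in right:
        by_subject[subj].append(obj)
    return [(s, lo, ro) for s, lo in left for ro in by_subject.get(s, [])]


tables = vertically_partition(TRIPLES)
titles_with_authors = join_on_subject(tables["title"], tables["author"])
print(tables["author"])
print(titles_with_authors)
```

A multivalued attribute becomes two rows in the `author` table, and a subject without a given property has no row in that table, so no NULL is ever stored.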

  17. Key Concepts: SW-Store • SW-Store is a column-oriented DBMS optimized for storing RDF. • Single-column table for subjects. • Representing sparse data. • Overflow tables.

  18. Assumptions • Postgres is assumed to be the best available choice for a row-oriented RDBMS because of its effective handling of NULLs. • Queries that do not restrict on property values are very rare for RDF applications. • Moderate amount of inserts/updates on the RDF store. • Critique of the assumption of limited inserts/updates: • If the overflow tables fill rapidly, the batch operation that updates the column-oriented store will run more often, degrading overall performance.
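
The critique on slide 18 can be illustrated with a toy model of an overflow table. This is a sketch under stated assumptions, not SW-Store's actual mechanism: the class `PropertyColumn`, the fixed `merge_threshold`, and the merge-by-resorting step are all hypothetical stand-ins for a write buffer that is periodically merged into a read-optimized store.

```python
import bisect


class PropertyColumn:
    """Toy (subject, object) table with a write buffer (overflow table)."""

    def __init__(self, merge_threshold=2):
        self.main = []        # sorted pairs, read-optimized
        self.overflow = []    # recent inserts, unsorted
        self.merge_threshold = merge_threshold  # hypothetical tuning knob
        self.merges = 0       # how many batch merges have run

    def insert(self, subject, obj):
        """Append to the overflow table; merge once it is full."""
        self.overflow.append((subject, obj))
        if len(self.overflow) >= self.merge_threshold:
            self._merge()

    def _merge(self):
        """Batch operation: rewrite the sorted store with the new rows."""
        self.main = sorted(self.main + self.overflow)
        self.overflow = []
        self.merges += 1

    def objects_for(self, subject):
        """Reads must consult both the main store and the overflow table."""
        found = []
        i = bisect.bisect_left(self.main, (subject,))
        while i < len(self.main) and self.main[i][0] == subject:
            found.append(self.main[i][1])
            i += 1
        found.extend(o for s, o in self.overflow if s == subject)
        return sorted(found)


col = PropertyColumn(merge_threshold=2)
for n in range(5):
    col.insert("book1", f"tag{n}")
print(col.merges, len(col.overflow), col.objects_for("book1"))
```

In this model each merge rewrites the whole sorted store, so a heavier insert rate triggers that cost more often, which is the degradation the critique points to.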

  19. Validation methodology • Barton Libraries dataset provided by the Simile Project at MIT (http://simile.mit.edu/rdf-test-data/barton). • The benchmark is a set of 7 queries based on a browsing session of Longwell, a UI built by the Simile group for querying the library dataset. These queries are executed on: • Triple data store (a subject, property, object table with no improvements, on Postgres). • Property tables (on Postgres). • Vertically partitioned data in a row-oriented store (Postgres). • Vertically partitioned data in a column-oriented store (C-Store).

  20. Validation methodology • Strengths: • Real-world data and query scenarios. • Comparison of all the existing techniques with the proposed technique. • Weaknesses: • Avoids queries with unrestricted properties, a problem that particularly affects vertically partitioned schemes. • Accuracy of clustering for property tables. • Performance may differ when using different underlying databases.
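
The unrestricted-property weakness from slide 20 can be sketched as follows: when a query does not name a property, a vertically partitioned layout has to look in every property table and union the results. The table contents and the helper name `describe_subject` are illustrative assumptions.

```python
# Hypothetical vertically partitioned layout: one table per property.
TABLES = {
    "title": [("book1", "The Glass Palace")],
    "year": [("book1", "2000")],
    "name": [("person1", "Ghosh, Amitav")],
}


def describe_subject(tables, subject):
    """Return every (subject, property, object) fact about `subject`.

    No property is given, so each table must be scanned and the
    partial results unioned; the second return value counts the tables
    that were touched.
    """
    facts = []
    tables_scanned = 0
    for prop, rows in tables.items():
        tables_scanned += 1
        facts.extend((s, prop, o) for s, o in rows if s == subject)
    return sorted(facts), tables_scanned


facts, tables_scanned = describe_subject(TABLES, "book1")
print(facts, tables_scanned)
```

The number of tables touched equals the number of distinct properties in the dataset, so a benchmark that leaves out such queries sidesteps the case where vertical partitioning is at its weakest.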

  21. Results • From the results, it is clear that the proposed storage scheme outperforms the existing methods in terms of query time.

  22. Improvements – Spatial Perspective • Schema design: Queries are fired on vertically partitioned tables as well as overflow tables. Because spatial data is heavy, there should be some spatial indexing, such as an R*-tree or a grid, to make these queries faster. • Restrictive nature: Spatial queries are not restricted to specific “properties”, which is an important assumption on the authors' part. • E.g. Landmarks • Tables should be partitioned in a better way than simply one property per table, e.g. by grouping similar properties together based on domain knowledge.
