c store rdf data management using column stores n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
C-Store: RDF Data Management Using Column Stores PowerPoint Presentation
Download Presentation
C-Store: RDF Data Management Using Column Stores

Loading in 2 Seconds...

play fullscreen
1 / 23
travis-travis

C-Store: RDF Data Management Using Column Stores - PowerPoint PPT Presentation

95 Views
Download Presentation
C-Store: RDF Data Management Using Column Stores
An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. C-Store: RDF Data Management Using Column Stores Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 24, 2009

  2. What is RDF data? • RDF (Resource Description Framework) • The data model behind the Semantic Web. • The Semantic Web’s vision is to make Web machine readable. • Represents data as statements of the form <subject, property, object> • To represent the notion "The sky has the color blue" • use the triple < The sky, has the color, blue>.

  3. DBFacebook RDF Graph:Triples make the graph

  4. RDF Data Is Proliferating • Swoogle: Semantic Web Search Engine • Indexes about 2,889,974 Semantic Web documents. • Number of triples could be parsed from all the documents is 699,043,992. • http://swoogle.umbc.edu/ • Simile: MIT Digital Library Data in RDF • More than 50 million triples. • http://simile.mit.edu/

  5. RDF Data Management • Early projects built their own RDF stores. • Trend now towards storing in RDBMSs. • Examines 3 approaches for storing RDF data in a RDBMS

  6. Approach 1: Triple Stores

  7. Approach 2: Property Tables

  8. Approach 3: One-table-per-property Favors Column Store

  9. Comparison Results Synopsis • Triple-store really slow on benchmark with 50M triples. • Property-tables and one-table-per-property approaches are factor of 3 faster. • One-table-per-property with column-store yields another factor of 10.

  10. Querying RDF Data • SPARQL is the dominant language. • Examples: SELECT ?name WHERE { ?x type Person . ?x name ?name } SELECT ?likes ?dislikes WHERE { ?x title “Implementation Techniques for Main Memory Databases”. ?y authorOf ?x . ?y likes ?likes . ?y dislikes ?dislikes }

  11. Translation to SQL over triples is easy

  12. SPARQL  SQL (over triple store) • Query 1 SPARQL: SELECT ?name WHERE { ?x type Person . ?x name ?name } • Query 1 SQL: SELECT B.object FROM triples AS A, triples as B WHERE A.subject = B.subject AND A.property = “type” AND A.object = “Person” AND B.predicate = “name”

  13. Characteristics of Triple Stores • Accessing multiple properties for a resource require subject-subject joins. • Path expressions require subject-object joins. • Can improve performance by: • Indexing each column • Dictionary encoding string data • Ultimately: Do not scale

  14. Property Tables Can Reduce Joins

  15. Characteristics of Property Tables • Complex to design • If narrow: reduces nulls, increases unions/joins • If wide: reduces unions/joins, increases nulls • Implemented in Jena and Oracle • But main representation of data is still triples

  16. Table-Per-Property Approach • Nulls not stored • Easy to handle multi-valued attributes • Only need to read relevant properties • Still need joins (but they are linear merge joins)

  17. Materialized Paths

  18. Accelerating Path Expressions • Materialize Common Paths • Improved property table performance by 18-38% • Improved one-table-per-property performance by 75-84% • Use automatic database designer (e.g., C-Store /Vertica) to decide what to materialize

  19. One-table-per-property  Column-Store • Can think of one-table-per-property as vertical partitioning super-wide property table. • Column-store is a natural storage layer to use for vertical partitioning. • Advantages: • Tuple Headers Stored Separately. • Column-oriented data compression. • Do not necessarily have to store the subject column • Carefully optimized merge-join code

  20. Library Benchmark • Data • Real Library Data (50 million RDF triples) • Data acquired from a variety of diverse sources (some quite unstructured). • Queries • Automatically generated from the Longwell RDF browser. • Details in Abadi’s paper .

  21. Results

  22. Future Work • build a fully-functional RDF database • Extracts and loads RDF data from structured, semi-structured, and unstructured data sources. • Translates SPARQL to queries over vertical schema. • Performs reasoning inside the DB. • Use with biology research.

  23. References • Abadi, Daniel J., Marcus, Adam, Madden, Samuel R., and Hollenbach, Kate. Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB, 2007. • Abadi, Daniel J., Marcus, Adam, Madden, Samuel R., and Hollenbach, Kate. SW-Store: A Vertically Partitioned DBMS for Semantic Web Data Management. In VLDB Journal, 2009.