Berlin SPARQL Benchmark (BSBM)


Christian Bizer and Andreas Schultz

Presented by: Nikhil Rajguru

Agenda
  • Need for a benchmark for RDF stores
  • Existing benchmarks
  • Design of BSBM, Dataset generator and query mixes
  • Evaluation results
  • Contributions
  • My work
  • Q&A
Motivation
  • A large number of Semantic Web applications represent their data as RDF
  • Many RDF stores support the SPARQL query language and the SPARQL protocol
  • Need to compare the performance of RDF stores with each other and with traditional relational database solutions exposed through SPARQL wrappers
Existing benchmarks
  • SP2Bench
    • Uses a synthetic, scalable version of the DBLP bibliography dataset
    • Queries are designed to compare different RDF store layouts
    • Drawbacks: not designed around realistic workloads; no parameterized queries and no warm-up
  • DBpedia Benchmark
    • Uses DBpedia as the benchmark dataset
    • Drawbacks: very specific queries; dataset is not scalable
  • Lehigh University Benchmark (LUBM)
    • Compares OWL reasoning engines
    • Drawbacks: does not cover SPARQL-specific features such as OPTIONAL, filters, UNION and DESCRIBE; does not employ parameterized queries, concurrent clients or warm-up

Main Goals of BSBM
  • Compare different stores that expose SPARQL endpoints
  • Use realistic, use-case-motivated datasets and query mixes
  • Test query performance (integration and visualization) against large RDF datasets rather than complex reasoning
BSBM Dataset
  • Built around an e-commerce use case
  • Dataset generator
    • Scales to arbitrary sizes (scale factor = number of products)
    • Data generation is deterministic
  • Dataset objects: Product, ProductType, ProductFeature, Producer, Vendor, Offer, Review, Reviewer and ReviewingSite.
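The two generator properties listed above, scaling by product count and deterministic output, can be illustrated with a small sketch. This is not the BSBM generator itself: the function name `generate_products`, the seed value, the field names and the value ranges are all assumptions chosen for illustration.

```python
import random

def generate_products(scale_factor, seed=42):
    """Return toy product records (hypothetical; not the BSBM generator).

    scale_factor is the number of products, as on the slide.
    A fixed seed makes every run produce identical data.
    """
    rng = random.Random(seed)  # private RNG, global random state is untouched
    num_producers = max(1, scale_factor // 50)  # assumed ratio, illustration only
    products = []
    for product_id in range(1, scale_factor + 1):
        products.append({
            "id": product_id,
            "producer": rng.randint(1, num_producers),
            "num_features": rng.randint(5, 20),  # assumed range
            "num_offers": rng.randint(0, 40),    # assumed range
        })
    return products

first = generate_products(100)
second = generate_products(100)
print(len(first), first == second)
```

Because the random source is seeded, two runs with the same scale factor yield the same records, which is what lets different stores be loaded with identical data.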
BSBM Query Mix
  • Simulates how customers browse, review and select items online
  • Operations include
    • Look for products with some generic features
    • Look for products without some specific features
    • Look for similar products
    • Look for reviews and offers
    • Pull up all information about a specific product
    • Find the best deal for a product
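As a rough sketch of how the first operation ("products with some generic features") might be turned into a parameterized query, the example below fills a SPARQL template with randomly chosen feature ids. The `ex:` namespace, the property names and the helper `instantiate_query` are hypothetical and are not taken from the BSBM vocabulary or test driver.

```python
import random
from string import Template

# Hypothetical vocabulary prefix and property names, for illustration only.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/shop/>
SELECT ?product ?label
WHERE {
  ?product a ex:Product ;
           ex:label ?label ;
           ex:feature ex:Feature$feature1 ;
           ex:feature ex:Feature$feature2 .
}
ORDER BY ?label
LIMIT 10
""")

def instantiate_query(rng, num_features):
    """Fill the template with two distinct, randomly chosen feature ids."""
    feature1, feature2 = rng.sample(range(1, num_features + 1), 2)
    return QUERY_TEMPLATE.substitute(feature1=feature1, feature2=feature2)

rng = random.Random(7)  # seeded so the same parameter sequence can be replayed
print(instantiate_query(rng, num_features=500))
```

Drawing the parameters from a seeded generator means each store under test can be sent the same sequence of concrete queries.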
Experimental Setup
  • RDF Stores tested
    • Jena SDB
    • Virtuoso
    • Sesame
    • D2R Server (with MySQL as underlying RDBMS)
  • Dell workstation
    • Processor: Intel Core 2 Quad Q9450 2.66GHz
    • Memory: 8GB DDR2 667
    • Hard disks: 160GB (10,000 rpm) SATA2, 750GB (7,200 rpm) SATA2
    • OS: Ubuntu 8.04 64-bit
Load times (sec)
  • Data loaded as:
    • D2R Server: relational representation of the BSBM dataset (MySQL dumps)
    • Triple stores: N-Triples representation of the BSBM dataset
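To make the difference between the two load formats concrete, the sketch below turns one relational row into N-Triples lines, one triple per column value. The namespace, the one-triple-per-column mapping and the function names are assumptions for illustration; the slides do not describe how the generator maps rows to triples.

```python
BASE = "http://example.org/shop/"  # hypothetical namespace

def escape_literal(text):
    """Escape the characters that N-Triples requires escaping in string literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def row_to_ntriples(table, key, row):
    """Emit one N-Triples line per column of a relational row (assumed mapping)."""
    subject = f"<{BASE}{table}/{key}>"
    lines = []
    for column, value in row.items():
        predicate = f"<{BASE}{column}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines

for line in row_to_ntriples("product", 1, {"label": 'Widget "Pro"', "producer": 7}):
    print(line)
```

Each output line is a complete statement ending in " .", which is why N-Triples files can be streamed into a store line by line.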

[Load-time table not recoverable from the slide; surviving values: 3.6 hr, 7.7 hr, 13.6 hr, 3.3 min]

Overall Run Time
  • 50 query mixes, 1,250 queries in all (25 queries per mix)
  • Test driver and store under test run on the same machine
  • 10 query mixes executed for warm-up
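The measurement procedure above (untimed warm-up mixes followed by timed mixes) might be sketched as follows. This is not the BSBM test driver: `run_benchmark`, the `execute_query` callback and the made-up mix of 25 queries over nine query types are assumptions for illustration, with 25 being 1250 / 50.

```python
import time

def run_benchmark(execute_query, query_mix, warmup_mixes=10, timed_mixes=50):
    """Run warm-up mixes untimed, then time the measured mixes.

    execute_query is a caller-supplied function (hypothetical here) that sends
    one query to the store under test and blocks until the result arrives.
    query_mix is a list of (query_type, query_text) pairs.
    """
    for _ in range(warmup_mixes):
        for _query_type, text in query_mix:
            execute_query(text)

    timings = {}  # query type -> list of durations in seconds
    start = time.perf_counter()
    for _ in range(timed_mixes):
        for query_type, text in query_mix:
            t0 = time.perf_counter()
            execute_query(text)
            timings.setdefault(query_type, []).append(time.perf_counter() - t0)
    overall = time.perf_counter() - start

    averages = {qt: sum(ts) / len(ts) for qt, ts in timings.items()}
    return overall, averages

# Stand-in for a real SPARQL endpoint call; it only records the calls.
calls = []
mix = [(n % 9 + 1, f"query text {n}") for n in range(25)]  # made-up mix
overall, averages = run_benchmark(calls.append, mix)
print(len(calls), sorted(averages))
```

The same loop yields both views reported in the slides: the overall run time for all timed mixes and the average run time per query type.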
Average Run Time Per Query
  • Gives a different perspective on query performance for the stores
  • No store performs best for all query types at all dataset sizes (50K to 25M triples)
  • Sesame performs best for queries 1 to 4 but poorly for queries 5 to 9
  • D2R Server is fastest for queries 6 to 9 but performs poorly on the lower-numbered queries
  • Similar results for Jena SDB and Virtuoso
Contributions
  • First benchmark to compare stores that implement the SPARQL query language and protocol for data access
  • Dataset generator (RDF, XML and relational representations)
  • First benchmark to test RDF stores with realistic workloads of use-case-motivated queries
My Work
  • Build a scalable RDF store for storing the Smart Grid data
    • Sensor readings, building information, weather data, Time schedule for each customer
  • Scale to 50,000 sensors (20M triples to be loaded every 15 min)
  • Load both fast-changing and slow-changing data
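The loading target above implies a sustained ingest rate. The back-of-the-envelope sketch below works it out, assuming each 15-minute batch must finish loading before the next one arrives and that batches arrive around the clock; both are assumptions, as the slide gives only the three input figures.

```python
TRIPLES_PER_BATCH = 20_000_000  # from the slide
BATCH_INTERVAL_MIN = 15         # from the slide
SENSORS = 50_000                # from the slide

interval_seconds = BATCH_INTERVAL_MIN * 60
required_rate = TRIPLES_PER_BATCH / interval_seconds  # triples per second
triples_per_sensor = TRIPLES_PER_BATCH // SENSORS     # per 15-minute batch
batches_per_day = 24 * 60 // BATCH_INTERVAL_MIN       # assumes continuous arrival
triples_per_day = TRIPLES_PER_BATCH * batches_per_day

print(f"{required_rate:,.0f} triples/s sustained")
print(f"{triples_per_sensor} triples per sensor per batch")
print(f"{triples_per_day:,} triples per day")
```

Under these assumptions the store needs to absorb roughly 22,000 triples per second, or 400 triples per sensor in every batch.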
My work
  • Support a range of SPARQL queries on the store
  • Web portal (latency: seconds)
    • 100 customers x 100 columns = 10,000 triples
  • Schedule trigger (latency: minutes)
    • ~50,000 customers x 5 schedule events per day x 4 triples = 1,000,000 triples
  • Forecast training (latency: hours)
    • 3 years x 365 days x 100 readings x 200 buildings x 2 sensors x 25 columns = 1,095,000,000 triples
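The triple counts for the three query classes are plain products of the factors listed above. A short check, with the factor lists copied from the slide and the dictionary keys chosen here for readability:

```python
from math import prod

# Factors as given on the slide, in the order listed there.
workloads = {
    "web portal": [100, 100],                        # customers x columns
    "schedule trigger": [50_000, 5, 4],              # customers x events/day x triples
    "forecast training": [3, 365, 100, 200, 2, 25],  # years x days x readings x buildings x sensors x columns
}

totals = {name: prod(factors) for name, factors in workloads.items()}
for name, total in totals.items():
    print(f"{name}: {total:,} triples")
```

The three workloads span five orders of magnitude, from 10,000 triples for the portal to about 1.1 billion for forecast training, which matches the widening latency budgets of seconds, minutes and hours.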
Thank you

Questions?