Berlin SPARQL Benchmark (BSBM)

Christian Bizer and Andreas Schultz

Presented by: Nikhil Rajguru


Agenda

  • Need for a benchmark for RDF stores

  • Existing benchmarks

  • Design of BSBM, Dataset generator and query mixes

  • Evaluation results

  • Contributions

  • My work

  • Q&A


Motivation

  • A large number of Semantic Web applications represent their data as RDF

  • Many RDF stores support the SPARQL query language and the SPARQL protocol

  • Need to compare the performance of RDF stores with each other and with SPARQL wrappers over traditional relational databases

Existing benchmarks

  • SP2Bench

    • Uses a synthetic, scalable version of the DBLP bibliography dataset

    • Queries designed for comparison of different RDF Store layouts

      - Not designed for realistic workloads; no parameterized queries and no warm-up

  • DBpedia Benchmark

    • Uses DBpedia as the benchmark dataset

      - Queries are very specific and the dataset is not scalable

  • Lehigh University Benchmark (LUBM)

    • Compares OWL reasoning engines

      - Does not cover SPARQL specific features like OPTIONAL filters, UNION, DESCRIBE, etc.

      - Does not employ parameterized queries, concurrent clients and warm-up

Main Goals of BSBM

  • Compare different stores that expose SPARQL endpoints

  • Use realistic, use-case-motivated datasets and query mixes

  • Test query performance (integration and visualization) against large RDF datasets rather than complex reasoning

BSBM Dataset

  • Built around an e-commerce use case

  • Dataset generator

    • Scales to arbitrary sizes (scale factor = # of products)

    • Data generation is deterministic

  • Dataset objects: Product, ProductType, ProductFeature, Producer, Vendor, Offer, Review, Reviewer and ReviewingSite.
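
The two generator properties above (scaling by number of products, deterministic output) can be illustrated with a minimal sketch. This is not the BSBM generator: the namespace, the property names, and the `generate_products` function are hypothetical, and only two triples per product are emitted. The point is that a privately seeded random number generator makes the output reproducible for a given scale factor.

```python
import random

# Hypothetical namespace for illustration; not the real BSBM vocabulary.
NS = "http://example.org/bsbm-sketch/"


def generate_products(scale_factor, seed=42):
    """Return N-Triples lines for `scale_factor` products, deterministically."""
    rng = random.Random(seed)  # private, seeded RNG: same seed -> same data
    n_producers = max(1, scale_factor // 10)  # assumed ratio, for illustration
    lines = []
    for i in range(1, scale_factor + 1):
        product = f"<{NS}Product{i}>"
        producer = f"<{NS}Producer{rng.randint(1, n_producers)}>"
        lines.append(f'{product} <{NS}label> "Product {i}" .')
        lines.append(f"{product} <{NS}producer> {producer} .")
    return lines


if __name__ == "__main__":
    for line in generate_products(3):
        print(line)
```

Calling `generate_products(n)` twice with the same seed yields identical lines, which is what lets different stores be loaded with exactly the same data.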

BSBM Query Mix

  • Simulates how customers browse, review and select items online

  • Operations include

    • Look for products with some generic features

    • Look for products without some specific features

    • Look for similar products

    • Look for reviews and offers

    • Pull up all information about a specific product

    • Find the best deal for a product
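
One way to picture an operation such as "look for products with some generic features" is as a parameterized SPARQL template whose placeholders are filled with different values on each run. The sketch below assumes that approach; the `ex:` vocabulary, the template text, and the `instantiate` helper are hypothetical and do not reproduce the actual BSBM queries.

```python
import random
from string import Template

# Illustrative SPARQL template; the ex: vocabulary is hypothetical.
FIND_PRODUCTS = Template("""\
PREFIX ex: <http://example.org/bsbm-sketch/>
SELECT ?product ?label WHERE {
  ?product ex:label ?label ;
           ex:type ex:ProductType$type_id ;
           ex:feature ex:ProductFeature$feature_id .
}
ORDER BY ?label
LIMIT 10
""")


def instantiate(template, rng, n_types, n_features):
    """Fill a query template with randomly chosen parameter values."""
    return template.substitute(
        type_id=rng.randint(1, n_types),
        feature_id=rng.randint(1, n_features),
    )


if __name__ == "__main__":
    print(instantiate(FIND_PRODUCTS, random.Random(1), n_types=5, n_features=20))
```

Passing a seeded `random.Random` keeps the sequence of generated queries reproducible across runs and stores.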

Experimental Setup

  • RDF Stores tested

    • Jena SDB

    • Virtuoso

    • Sesame

    • D2R Server (with MySQL as underlying RDBMS)

  • DELL workstation

    • Processor: Intel Core 2 Quad Q9450 2.66GHz

    • Memory: 8GB DDR2 667

    • Hard disks: 160GB (10,000 rpm) SATA2, 750GB (7,200 rpm) SATA2

    • OS: Ubuntu 8.04 64-bit

Load times (sec)

  • Data loaded as:

    • D2R server: Relational representation of BSBM dataset (MySQL dumps)

    • Triple Stores: N-triples representation of BSBM Dataset

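[Table: load times per store and dataset size. Only these values survive from the slide, without their row and column labels: 3.6 hr, 7.7 hr, 13.6 hr, 3.3 min]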

Overall Run Time

  • 50 query mixes, 1250 queries in all

  • Test driver and store under test running on the same machine

  • 10 query mixes executed for warm up
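
The figures above imply 1250 / 50 = 25 queries per mix. A minimal sketch of such a run follows, assuming the 10 warm-up mixes are executed first and excluded from the 50 timed mixes (the slide does not say this explicitly). `run_benchmark` and `execute_query` are hypothetical names; `execute_query` stands in for whatever sends one query to the store under test.

```python
import time

QUERIES_PER_MIX = 1250 // 50  # 25 queries per mix, from the slide's totals


def run_benchmark(execute_query, query_mix, warmup_mixes=10, measured_mixes=50):
    """Run warm-up mixes untimed, then time the measured mixes.

    `execute_query` is a hypothetical callable that sends one query to the
    store under test; `query_mix` is the ordered list of queries in one mix.
    Returns (number of timed queries, elapsed seconds).
    """
    # Warm-up: fill caches and buffers, results and timings are discarded.
    for _ in range(warmup_mixes):
        for query in query_mix:
            execute_query(query)

    # Measured phase: time the whole run of query mixes.
    start = time.perf_counter()
    executed = 0
    for _ in range(measured_mixes):
        for query in query_mix:
            execute_query(query)
            executed += 1
    elapsed = time.perf_counter() - start
    return executed, elapsed
```

With a 25-query mix and the default arguments, the timed phase covers 1250 queries, matching the slide.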

Average Run Time Per Query

  • Gives a different perspective on query performance for the stores

  • No store performs best for all query types at all dataset sizes (50K – 25M triples)

  • Sesame is fastest for queries 1 – 4 but performs poorly on queries 5 – 9

  • D2R Server is fastest for queries 6 – 9 but performs poorly on the lower-numbered queries

  • Similar results for Jena SDB and Virtuoso
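
The per-query view can be computed from raw timings by grouping them by query number before averaging, rather than summing over the whole mix. The sketch below assumes timings are recorded as `(query_number, seconds)` pairs; that shape and the `average_per_query` name are illustrative, not taken from the BSBM test driver.

```python
from collections import defaultdict


def average_per_query(timings):
    """Average run time per query number.

    `timings` is an iterable of (query_number, seconds) pairs, one per
    executed query; the shape is an assumption made for this sketch.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for query_number, seconds in timings:
        totals[query_number] += seconds
        counts[query_number] += 1
    return {q: totals[q] / counts[q] for q in sorted(totals)}
```

Breaking results down this way is what exposes cases where a store is fast on some query types and slow on others, which a single overall run time hides.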


Contributions

  • First benchmark to compare stores that implement the SPARQL query language and protocol for data access

  • Dataset generator (RDF, XML and relational representations)

  • First benchmark to test RDF stores with realistic workloads of use-case-motivated queries

My Work

  • Build a scalable RDF store for storing the Smart Grid data

    • Sensor readings, building information, weather data, time schedule for each customer

  • Scale to 50,000 sensors (20M triples to be loaded every 15 min)

  • Load both fast-changing and slow-changing data
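
The scale target translates into a sustained load rate. The arithmetic below assumes each 20M-triple batch must finish loading within its 15-minute window, before the next batch arrives; the slide does not state that requirement. The function names are illustrative.

```python
TRIPLES_PER_BATCH = 20_000_000   # from the slide
BATCH_INTERVAL_S = 15 * 60       # 15 minutes in seconds
SENSORS = 50_000                 # from the slide


def required_ingest_rate(triples=TRIPLES_PER_BATCH, interval_s=BATCH_INTERVAL_S):
    """Minimum sustained load rate (triples/second) to keep up with arrivals."""
    return triples / interval_s


def triples_per_sensor(triples=TRIPLES_PER_BATCH, sensors=SENSORS):
    """Triples contributed by each sensor per batch."""
    return triples // sensors


if __name__ == "__main__":
    print(f"{required_ingest_rate():.0f} triples/s")   # about 22,222 triples/s
    print(f"{triples_per_sensor()} triples per sensor per batch")  # 400
```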

My Work

  • Support a range of SPARQL queries on the store

  • Web Portal: (latency ~sec)

    • 100 customers x 100 columns = 10000 triples

  • Schedule trigger: (latency ~min)

    • ~50,000 customers x 5 schedule events per day x 4 triples = 1,000,000 triples

  • Forecast training: (latency ~hrs)

    • 3 years x 365 days x 100 readings x 200 buildings x 2 sensor x 25 columns = 1,095,000,000 triples
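
The three estimates above are plain products of their factors and can be checked mechanically. The dictionary keys and the `triple_counts` function are names chosen for this sketch; the factors are copied from the slide.

```python
from math import prod

# Factors copied from the slide; keys are illustrative labels.
WORKLOADS = {
    "web_portal": [100, 100],                        # customers x columns
    "schedule_trigger": [50_000, 5, 4],              # customers x events/day x triples
    "forecast_training": [3, 365, 100, 200, 2, 25],  # years x days x readings x buildings x sensors x columns
}


def triple_counts(workloads=WORKLOADS):
    """Multiply out each workload's factors to get its triple count."""
    return {name: prod(factors) for name, factors in workloads.items()}


if __name__ == "__main__":
    for name, count in triple_counts().items():
        print(f"{name}: {count:,} triples")
```

The results span five orders of magnitude (10,000 to 1,095,000,000 triples), which lines up with the latency budgets growing from seconds to hours.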

Thank you

Questions?