Benchmarking traversal operations over graph databases

Marek Ciglan¹, Alex Averbuch² and Ladislav Hluchý¹

¹ Institute of Informatics, Slovak Academy of Sciences, Bratislava

² Swedish Institute of Computer Science, Stockholm, Sweden

Overview
  • Graph data management
  • Graph databases
    • Characteristics
    • Unique features
    • Challenges
  • GDB Benchmarking
    • Motivation
    • Related work
  • Graph traversal benchmark
    • Goals
    • Design
  • Preliminary results

21 November 2011

Graph data management
  • Booming area of R&D in recent years
  • Reasons:
    • Increased availability and importance of graph data
    • Natural way for modelling various real world phenomena
      • (networks: social, information, communication)
  • Two dominant data management directions:
    • Distributed graph processing frameworks
      • Mining/processing of large graphs
        • Pregel and clones (GoldenOrb, Giraph)
    • Graph databases
      • Persistent management of graph data
        • Neo4J, OrientDB, DEX

Graph databases
  • Property graph data model
    • Graph structure
    • Elements have properties

[Figure: example property graph with four nodes (K1, K2, K3, K4), each carrying attributes I1, I2 and I3, connected by edges labelled L1, L2 and L3]

  • Unique feature
    • Graph topology capturing the relations of objects
    • A graph database should
      • Be efficient in exploiting topology
      • Allow for fast traversal
  • Challenges
    • Traditionally, graph processing/traversal is done in memory
    • Reasons:
      • Data driven computation
      • Random access pattern for data access
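The property graph model on this slide (nodes and edges that both carry properties) could be sketched in memory roughly as follows. This is only an illustration, not how any of the benchmarked databases store data: the class names `Node`, `Edge` and `PropertyGraph` are hypothetical, and the edge wiring between K1 to K4 is an assumption, since the layout of the original figure is not recoverable from the transcript.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    key: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """In-memory property graph: labelled, directed edges between nodes."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.out_edges: Dict[str, List[Edge]] = {}
        self.in_edges: Dict[str, List[Edge]] = {}

    def add_node(self, key: str, **properties: Any) -> Node:
        node = Node(key, dict(properties))
        self.nodes[key] = node
        self.out_edges.setdefault(key, [])
        self.in_edges.setdefault(key, [])
        return node

    def add_edge(self, source: str, target: str, label: str, **properties: Any) -> Edge:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(source, target, label, dict(properties))
        self.out_edges[source].append(edge)
        self.in_edges[target].append(edge)
        return edge

    def neighbours(self, key: str, label: str = None) -> List[str]:
        """Keys reachable over one outgoing edge, optionally filtered by label."""
        return [e.target for e in self.out_edges[key]
                if label is None or e.label == label]


graph = PropertyGraph()
for key in ("K1", "K2", "K3", "K4"):
    graph.add_node(key, I1="val", I2="val", I3="val")
# Assumed wiring; only the node names and edge labels come from the slide.
graph.add_edge("K1", "K2", "L1")
graph.add_edge("K2", "K4", "L3")
graph.add_edge("K1", "K3", "L2")
graph.add_edge("K3", "K4", "L1")
```

Following an edge here is a dictionary lookup plus a list scan, which is cheap in memory; the challenge named on the slide is that the same data-driven, random access pattern becomes expensive once the adjacency data lives on disk.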

Graph database benchmarking
  • Motivation
    • A growing number of graph data management solutions.
    • Which is the right one for a specific problem?
    • Fair measurement of performance for distinct use cases.
    • Identify limits: which use cases perform well.

  • Related work
    • Only a few works directly address graph databases
      • D. Dominguez-Sal et al.:
        • Adoption of an HPC benchmark for graph data processing
        • Design of a benchmark suitable for graph database systems
      • GraphBench: a basic benchmarking framework implementation

  • Traversal operation benchmarking
    • Graph topology: the unique feature of graph databases
    • Test the ability to do:
      • Local traversals (exploring the k-hop neighbourhood)
      • Global traversals (traversals of the whole graph)
    • Perform traversals in a memory-constrained environment
      • (can we deal efficiently with data sets exceeding the physical memory?)
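A local traversal of the kind listed above can be illustrated with a breadth-first search that stops after k hops. The sketch below assumes the graph is available as a plain adjacency dictionary; the function name `k_hop_neighbourhood` and the toy graph are hypothetical and are not taken from the benchmark code.

```python
from collections import deque


def k_hop_neighbourhood(adjacency, start, k):
    """Return {node: hop distance} for every node within k hops of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond the k-th hop
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy directed graph: a -> b, c; b -> d; c -> d; d -> e; e -> f
adjacency = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": ["f"],
    "f": [],
}
within_two = k_hop_neighbourhood(adjacency, "a", 2)
```

A local traversal touches only the nodes near the start node, whereas a global traversal has to visit every node, which is why the two are benchmarked separately.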

Benchmark design
  • Fairness
    • Blueprints API: an effort to provide a common API
      • https://github.com/tinkerpop/blueprints/wiki/
    • Using Blueprints: one implementation of the benchmark for all the benchmarked systems
      • Avoids the bias of different benchmark implementations for different systems
    • Execution of the same sequence of operations on the same data
      • Log operations and their parameters in the first run over the defined data
      • Logs are persistent, so benchmarks can be rerun on different versions of a product to measure the change in performance
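The record-and-replay idea from the last bullets might look like the following sketch. The JSON-lines log format, the function names and the `bfs` operation are all assumptions made for illustration; the slides do not say how the benchmark actually stores its logs.

```python
import json
import os
import tempfile
import time


def record_operations(operations, log_path):
    """Write one JSON object per line: the operation name and its parameters."""
    with open(log_path, "w", encoding="utf-8") as log:
        for name, params in operations:
            log.write(json.dumps({"op": name, "params": params}) + "\n")


def replay_operations(log_path, handlers):
    """Re-run every logged operation against one system and time each call."""
    timings = []
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            entry = json.loads(line)
            handler = handlers[entry["op"]]
            started = time.perf_counter()
            handler(**entry["params"])
            timings.append((entry["op"], time.perf_counter() - started))
    return timings


# First run: fix the sequence of operations and their parameters.
log_path = os.path.join(tempfile.mkdtemp(), "operations.jsonl")
record_operations(
    [("bfs", {"start": "n1", "hops": 3}), ("bfs", {"start": "n7", "hops": 2})],
    log_path,
)

# Later runs: replay the same log against each system under test.
calls = []
timings = replay_operations(
    log_path, {"bfs": lambda start, hops: calls.append((start, hops))}
)
```

Because the log, not the benchmark driver, decides which start nodes and parameters are used, every system (and every later version of it) sees an identical workload.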

Benchmark design
  • Data
    • Different data properties / distributions affect benchmark results
      • E.g. dense vs. sparse graphs
    • Ideally, data set properties are similar to those of real-world data sets
    • Use: scale-free networks with small-world properties
      • social networks, the Internet, traffic networks, biological networks, and term co-occurrence networks
      • LFR-Benchmark generator: networks with power-law degree distribution and implanted communities within the network
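To give a feel for what a skewed, scale-free-like degree distribution means in practice, here is a small preferential-attachment generator. Note that this is a deliberately simpler stand-in and not the LFR-Benchmark generator used by the authors: it produces a heavy-tailed degree distribution but no implanted communities. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment_edges(node_count, edges_per_node, seed=0):
    """Generate undirected edges; new nodes prefer well-connected targets."""
    rng = random.Random(seed)
    edges = []
    # Start from a small fully connected core of edges_per_node + 1 nodes.
    core = range(edges_per_node + 1)
    for u in core:
        for v in core:
            if u < v:
                edges.append((u, v))
    # Each node appears in `endpoints` once per incident edge, so sampling
    # uniformly from it picks targets proportionally to their degree.
    endpoints = [n for edge in edges for n in edge]
    for new_node in range(edges_per_node + 1, node_count):
        targets = set()
        while len(targets) < edges_per_node:
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new_node, target))
            endpoints.extend((new_node, target))
    return edges


edges = preferential_attachment_edges(node_count=1000, edges_per_node=3, seed=42)
degree = Counter(n for edge in edges for n in edge)
```

Comparing `max(degree.values())` with the minimum degree shows the skew: a few hub nodes accumulate many edges while most nodes keep only the three they were created with. Hubs like these are what make traversal cost vary so much between start nodes.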

Benchmark design
  • Traversal operations
    • Local traversals
      • Compute local clustering coefficient (2-hop breadth-first traversal)
      • 3-hop breadth-first traversal
    • Global traversals
      • Compute connected components
        • Incoming / outgoing edges
      • k iterations of the HITS algorithm
  • Memory-constrained environment
  • Intermediate results for global traversal operations:
    • Kept in memory
    • Kept as properties on nodes
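The traversal operations named on this slide can be sketched over plain Python dictionaries as below. This is a minimal illustration under stated assumptions, not the benchmark's Blueprints-based implementation: the function names and toy graphs are hypothetical, the undirected graph is given as symmetric neighbour sets, and all intermediate results are kept in in-memory dictionaries (the first of the two options listed above).

```python
import math


def local_clustering_coefficient(neighbours, node):
    """Fraction of pairs of a node's neighbours that are themselves connected."""
    adjacent = neighbours[node] - {node}
    k = len(adjacent)
    if k < 2:
        return 0.0
    # Count each connected neighbour pair once (u < v).
    links = sum(1 for u in adjacent for v in neighbours[u]
                if v in adjacent and u < v)
    return 2.0 * links / (k * (k - 1))


def connected_components(neighbours):
    """Label every node with a component id held in an in-memory dict."""
    component = {}
    for root in neighbours:
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other not in component:
                    component[other] = root
                    stack.append(other)
    return component


def hits(out_edges, iterations):
    """Run a fixed number of HITS iterations; returns (hub, authority) dicts."""
    nodes = set(out_edges) | {t for ts in out_edges.values() for t in ts}
    hub = {n: 1.0 for n in nodes}
    authority = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the nodes pointing at it.
        authority = {n: 0.0 for n in nodes}
        for source, targets in out_edges.items():
            for target in targets:
                authority[target] += hub[source]
        # Hub: sum of the fresh authority scores of the nodes it points at.
        hub = {n: sum(authority[t] for t in out_edges.get(n, ())) for n in nodes}
        # Normalise both score vectors to unit Euclidean length.
        for scores in (authority, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, authority


undirected = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"},
    "x": {"y"}, "y": {"x"},
}
directed = {"h1": ["t1", "t2"], "h2": ["t1"], "t1": [], "t2": []}

coefficient_a = local_clustering_coefficient(undirected, "a")
components = connected_components(undirected)
hub, authority = hits(directed, iterations=5)
```

Keeping the `component`, `hub` and `authority` dictionaries in memory is the faster option but grows with the graph; writing the same values back as node properties would move that state into the database, at the cost of a write per update.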

Benchmark implementation
  • Implemented on top of Blueprints API
  • Tests performed on:
    • Neo4J,
    • DEX,
    • OrientDB,
    • Native RDF repository (NativeSail)
    • SGDB (research prototype)
  • Challenge: dealing with differences in the underlying systems, e.g.:
    • Triple stores: naming constraints
    • Some implementations do not support properties on some elements
    • Some implementations do not support iteration over nodes/edges
    • Node ID generation: user-provided vs. auto-generated
    • Transaction support vs. no transactions

Benchmark Runs
  • Performed on older hardware:
    • 2 GB of memory
  • Data set sizes:
    • 1K, 10K, 40K, 50K, 100K, 200K, 400K, 800K, 1M
    • Most systems were not able to load networks with 400K+ edges
      • (constraint: load 10K edges in less than 60 s)
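The loading constraint above (each batch of 10K edges must load in under 60 seconds) could be enforced along the following lines. How the authors actually measured this is not stated in the slides, so the function `load_edges`, its parameters and the injectable `clock` are assumptions made for this sketch.

```python
import time


def load_edges(edges, add_edge, batch_size=10_000, batch_timeout=60.0,
               clock=time.perf_counter):
    """Load edges batch by batch; stop once a batch exceeds the time limit.

    Returns (edges_loaded, completed); completed is False if any batch
    took longer than batch_timeout seconds.
    """
    loaded = 0
    batch_started = clock()
    for edge in edges:
        add_edge(*edge)
        loaded += 1
        if loaded % batch_size == 0:
            if clock() - batch_started > batch_timeout:
                return loaded, False  # give up: this batch was too slow
            batch_started = clock()
    # Also check the final, possibly partial, batch.
    return loaded, clock() - batch_started <= batch_timeout


# Usage with an in-memory list standing in for a graph database.
store = []
loaded, completed = load_edges(
    [(i, i + 1) for i in range(25)],
    lambda source, target: store.append((source, target)),
    batch_size=10,
)
```

Passing the clock in as a parameter keeps the timing logic testable without waiting a real minute; in a benchmark run the default `time.perf_counter` would be used.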

Conclusion
  • Extending work on benchmarking graph databases
  • Focusing on graph traversal operations
  • Local/Global traversals
  • Preliminary results:
    • Merely loading larger data sets into GDBs is already a problem
    • Stable performance for local traversals with 2-3 hops
      • Suitable for most ego-centric node property analyses
    • Poor performance for global traversal operations on larger networks

Thank you for your attention.

http://ups.savba.sk/~marek/gbench.html