Benchmarking traversal operations over graph databases

Marek Ciglan¹, Alex Averbuch² and Ladislav Hluchý¹

¹ Institute of Informatics, Slovak Academy of Sciences, Bratislava

² Swedish Institute of Computer Science, Stockholm, Sweden


Overview

  • Graph data management

  • Graph databases

    • Characteristics

    • Unique features

    • Challenges

  • GDB Benchmarking

    • Motivation

    • Related work

  • Graph traversal benchmark

    • Goals

    • Design

  • Preliminary results

21 November 2011


Graph data management

  • Booming area of R&D in recent years

  • Reasons:

    • Increased availability and importance of graph data

    • A natural way of modelling various real-world phenomena

      • (networks: social, information, communication)

  • Two dominant data management directions:

    • Distributed graph processing frameworks

      • Mining/processing of large graphs

        • Pregel and its clones (GoldenOrb, Giraph)

    • Graph databases

      • Persistent management of graph data

        • Neo4J, OrientDB, Dex

Graph databases

  • Property graph data model

    • Graph structure

    • Elements have properties

[Figure: example property graph with four nodes (K1 to K4), each carrying attributes I1, I2 and I3, connected by labelled edges (L1, L2, L3)]
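A minimal in-memory sketch of the property graph data model described above, assuming nothing about any particular GDB: nodes and edges both carry key/value properties, and per-node adjacency lists store the topology explicitly so it can be traversed directly. The names (`PropertyGraph`, `add_node`, `add_edge`, `out_neighbours`) are hypothetical and are not the Blueprints API; the edge endpoints in the usage lines are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_id: str
    props: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    edge_id: str
    src: str    # tail node id (edge is outgoing from here)
    dst: str    # head node id (edge is incoming here)
    label: str  # edge label, e.g. "L1"
    props: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph: nodes and edges both carry properties."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.edges: Dict[str, Edge] = {}
        self.out_adj: Dict[str, List[str]] = {}  # node id -> outgoing edge ids
        self.in_adj: Dict[str, List[str]] = {}   # node id -> incoming edge ids

    def add_node(self, node_id: str, **props: Any) -> Node:
        node = Node(node_id, dict(props))
        self.nodes[node_id] = node
        self.out_adj.setdefault(node_id, [])
        self.in_adj.setdefault(node_id, [])
        return node

    def add_edge(self, edge_id: str, src: str, dst: str, label: str, **props: Any) -> Edge:
        out_list, in_list = self.out_adj[src], self.in_adj[dst]  # KeyError if an endpoint is missing
        edge = Edge(edge_id, src, dst, label, dict(props))
        self.edges[edge_id] = edge
        out_list.append(edge_id)
        in_list.append(edge_id)
        return edge

    def out_neighbours(self, node_id: str) -> List[str]:
        # Follow the stored adjacency list instead of scanning all edges.
        return [self.edges[e].dst for e in self.out_adj[node_id]]


# Usage: four nodes with attributes I1..I3; edge endpoints are illustrative only.
g = PropertyGraph()
for k in ("K1", "K2", "K3", "K4"):
    g.add_node(k, I1="val", I2="val", I3="val")
g.add_edge("e1", "K1", "K2", "L1")
g.add_edge("e2", "K1", "K3", "L2")
g.add_edge("e3", "K2", "K4", "L3")
print(g.out_neighbours("K1"))  # ['K2', 'K3']
```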

  • Unique feature

    • Graph topology capturing the relations between objects

    • A graph database should

      • Be efficient in exploiting the topology

      • Allow for fast traversals

  • Challenges

    • Traditionally, graph processing/traversal is done in memory

    • Reasons:

      • Data-driven computation

      • Random data access patterns

Graph database benchmarking

  • Motivation

    • Many emerging graph data management solutions

    • Which is the right one for a specific problem?

    • Fair measurement of performance for distinct use cases

    • Identify limits: which use cases perform well

  • Related work

    • Only a few works address graph databases directly

      • D. Dominguez-Sal et al.:

        • Adoption of an HPC benchmark for graph data processing

        • Design of a benchmark suitable for graph database systems

      • GraphBench: a basic benchmarking framework implementation

  • Traversal operation benchmarking

    • Graph topology: the unique feature of graph databases

    • Test the ability to perform:

      • Local traversals (exploring the k-hop neighbourhood)

      • Global traversals (traversing the whole graph)

    • Perform traversals in a memory-constrained environment

      • (can we deal efficiently with data sets exceeding the physical memory?)
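Exploring a k-hop neighbourhood amounts to a depth-bounded breadth-first search. A minimal sketch, assuming the graph is available as an in-memory adjacency dict (`adj` and `k_hop_neighbourhood` are hypothetical names); in a graph database each neighbour lookup would instead be a call into the store, which is where random data access costs arise.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set


def k_hop_neighbourhood(adj: Dict[Hashable, Iterable[Hashable]],
                        start: Hashable, k: int) -> Set[Hashable]:
    """Nodes reachable from `start` in at most k hops (start itself excluded)."""
    depth = {start: 0}          # BFS depth of every node discovered so far
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == k:    # do not expand past the k-th hop
            continue
        for nb in adj.get(node, ()):  # one neighbour lookup per expanded node
            if nb not in depth:
                depth[nb] = depth[node] + 1
                queue.append(nb)
    return {n for n in depth if n != start}


# Usage: a directed chain 1 -> 2 -> 3 -> 4 -> 5
chain = {1: [2], 2: [3], 3: [4], 4: [5]}
print(sorted(k_hop_neighbourhood(chain, 1, 3)))  # [2, 3, 4]
print(sorted(k_hop_neighbourhood(chain, 1, 2)))  # [2, 3]
```

A 2-hop call gives the neighbourhood needed for a local clustering coefficient and a 3-hop call matches the 3-hop BFS; a global traversal is the same loop without the depth cut-off.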

Benchmark design

  • Fairness

    • Blueprints API: an effort to provide a common API for graph databases

      • https://github.com/tinkerpop/blueprints/wiki/

    • Using Blueprints: a single benchmark implementation for all benchmarked systems

      • Avoids the bias of having different benchmark implementations for different systems

    • Execution of the same sequence of operations on the same data

      • Operations and their parameters are logged during the first run over the defined data

      • Logs are persistent, so benchmarks can be rerun on different versions of a product and the change in performance measured
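The record-then-replay idea can be sketched as follows. This is a hypothetical helper, not the actual benchmark code: `OperationLog`, the JSON Lines file format and the handler dict are all assumptions. The first run records each operation with its parameters; later runs replay exactly the same sequence against another system or product version.

```python
import json
import os
import tempfile
from typing import Any, Callable, Dict, List, Tuple


class OperationLog:
    """Hypothetical record/replay log of benchmark operations (JSON Lines on disk)."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, Dict[str, Any]]] = []

    def record(self, op: str, params: Dict[str, Any]) -> None:
        self.entries.append((op, dict(params)))

    def save(self, path: str) -> None:
        # One JSON object per line, so logs persist between benchmark runs.
        with open(path, "w", encoding="utf-8") as f:
            for op, params in self.entries:
                f.write(json.dumps({"op": op, "params": params}) + "\n")

    @classmethod
    def load(cls, path: str) -> "OperationLog":
        log = cls()
        with open(path, encoding="utf-8") as f:
            for line in f:
                entry = json.loads(line)
                log.record(entry["op"], entry["params"])
        return log

    def replay(self, handlers: Dict[str, Callable[..., Any]]) -> List[Any]:
        # Same operations, same parameters, same order, against any backend.
        return [handlers[op](**params) for op, params in self.entries]


# Usage: record during the first run, then replay against another system.
log = OperationLog()
log.record("bfs", {"start": "n42", "hops": 3})
log.record("bfs", {"start": "n7", "hops": 2})
path = os.path.join(tempfile.mkdtemp(), "ops.jsonl")
log.save(path)

replayed = OperationLog.load(path).replay({"bfs": lambda start, hops: f"{start}:{hops}"})
print(replayed)  # ['n42:3', 'n7:2']
```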

Benchmark design

  • Data

    • Different data properties/distributions affect benchmark results

      • E.g. dense vs. sparse graphs

    • Ideally, data set properties should be similar to those of real-world data sets

    • Use: scale-free networks with small-world properties

      • e.g. social networks, the Internet, traffic networks, biological networks and term co-occurrence networks

      • LFR benchmark generator: networks with a power-law degree distribution and implanted communities
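The LFR generator itself is more involved: it also plants communities, controlled by a mixing parameter. As a simpler stand-in, the sketch below uses Barabási–Albert preferential attachment, a different technique that yields a heavy-tailed degree distribution but no implanted community structure, so it only approximates the kind of data described above. The function name and parameters are hypothetical.

```python
import random
from collections import Counter
from typing import List, Tuple


def preferential_attachment_edges(n: int, m: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Barabási–Albert-style generator (a stand-in, not LFR): each new node links
    to m distinct existing nodes, chosen with probability proportional to degree."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    # Every node id appears here once per incident edge, so a uniform pick
    # from this list is a degree-proportional pick.
    endpoints: List[int] = []
    targets = list(range(m))  # the first new node links to the m seed nodes
    for source in range(m, n):
        edges.extend((source, t) for t in targets)
        endpoints.extend(targets)
        endpoints.extend([source] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


# Usage: a few hubs emerge while most nodes keep a small degree.
edges = preferential_attachment_edges(n=1000, m=3, seed=1)
degree = Counter(v for e in edges for v in e)
print(len(edges), max(degree.values()), min(degree.values()))
```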

Benchmark design

  • Traversal operations

    • Local traversals

      • Compute the local clustering coefficient (2-hop breadth-first traversal)

      • 3-hop breadth-first traversal

    • Global traversals

      • Compute connected components

        • Incoming / outgoing edges

      • k iterations of the HITS algorithm

  • Memory-constrained environment

  • Intermediate results of global traversal operations:

    • Kept in memory

    • Kept as properties on nodes
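A sketch of connected components with pluggable storage for intermediate results, assuming components are computed over edges followed in both directions (weakly connected components). A plain dict keeps the component labels in memory; the hypothetical `NodePropertyLabels` mapping writes them into per-node property dicts, standing in for node properties in a GDB. Both names are illustrative, not part of any real API.

```python
from collections import deque
from collections.abc import MutableMapping
from typing import Any, Dict, Hashable, Iterable, Iterator, List


def connected_components(nodes: Iterable[Hashable],
                         out_adj: Dict[Hashable, List[Hashable]],
                         in_adj: Dict[Hashable, List[Hashable]],
                         labels: MutableMapping) -> int:
    """Label weakly connected components (edges followed in both directions).

    Component ids are written into `labels`; returns the number of components."""
    count = 0
    for seed in nodes:
        if seed in labels:
            continue
        labels[seed] = count
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            for nb in [*out_adj.get(node, []), *in_adj.get(node, [])]:
                if nb not in labels:
                    labels[nb] = count
                    queue.append(nb)
        count += 1
    return count


class NodePropertyLabels(MutableMapping):
    """Hypothetical write-through store: keeps each component id as a property
    on the node record (here a dict per node) instead of in a separate map."""

    def __init__(self, node_props: Dict[Hashable, Dict[str, Any]], key: str = "component") -> None:
        self.node_props = node_props
        self.key = key

    def __getitem__(self, node: Hashable) -> Any:
        props = self.node_props[node]
        if self.key not in props:
            raise KeyError(node)
        return props[self.key]

    def __setitem__(self, node: Hashable, value: Any) -> None:
        self.node_props[node][self.key] = value

    def __delitem__(self, node: Hashable) -> None:
        del self.node_props[node][self.key]

    def __iter__(self) -> Iterator[Hashable]:
        return (n for n, p in self.node_props.items() if self.key in p)

    def __len__(self) -> int:
        return sum(1 for _ in self)


# Usage: the same traversal with both strategies for intermediate results.
out_adj = {"a": ["b"], "b": [], "c": ["d"], "d": [], "e": []}
in_adj = {"a": [], "b": ["a"], "c": [], "d": ["c"], "e": []}

in_memory: Dict[Hashable, int] = {}
print(connected_components(out_adj, out_adj, in_adj, in_memory))  # 3

node_props = {v: {} for v in out_adj}
connected_components(out_adj, out_adj, in_adj, NodePropertyLabels(node_props))
print(node_props["d"])  # {'component': 1}
```

The traversal code is identical in both cases; only the `labels` argument decides whether intermediate state lives in process memory or in the store.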

Benchmark implementation

  • Implemented on top of Blueprints API

  • Tests performed on:

    • Neo4J

    • DEX

    • OrientDB

    • Native RDF repository (NativeSail)

    • SGDB (research prototype)

  • Challenge: dealing with differences between the underlying systems, e.g.:

    • Triple stores: naming constraints

    • Some implementations do not support properties on some elements

    • Some implementations do not support iteration over nodes/edges

    • Node ID generation: user-provided vs. autogenerated

    • Transaction support / no transactions
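One way to absorb the ID-generation difference is an adapter that lets the benchmark keep its own (user-provided) node IDs and maps them to whatever IDs the store generates. A sketch under that assumption; `AutoIdStore` is a hypothetical stand-in for a backend that autogenerates IDs, and `IdMappingAdapter` is likewise illustrative.

```python
import itertools
from typing import Any, Dict, Hashable


class AutoIdStore:
    """Hypothetical backend that ignores caller ids and assigns its own."""

    def __init__(self) -> None:
        self._ids = itertools.count(1000)
        self.vertices: Dict[int, Dict[str, Any]] = {}

    def add_vertex(self) -> int:
        vid = next(self._ids)
        self.vertices[vid] = {}
        return vid


class IdMappingAdapter:
    """Lets benchmark code keep its own (user-provided) node ids on top of a
    store that autogenerates ids, via an external -> internal id map."""

    def __init__(self, store: AutoIdStore) -> None:
        self.store = store
        self.ext_to_int: Dict[Hashable, int] = {}

    def add_node(self, ext_id: Hashable) -> int:
        # Idempotent: re-adding a known external id returns the same internal id.
        if ext_id not in self.ext_to_int:
            self.ext_to_int[ext_id] = self.store.add_vertex()
        return self.ext_to_int[ext_id]

    def internal_id(self, ext_id: Hashable) -> int:
        return self.ext_to_int[ext_id]


# Usage: logged operations can keep referring to "n1", whatever id the store chose.
adapter = IdMappingAdapter(AutoIdStore())
print(adapter.add_node("n1"), adapter.add_node("n2"), adapter.add_node("n1"))  # 1000 1001 1000
```

The cost of this design is an extra in-memory map, which itself competes for memory in a memory-constrained setting.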

Benchmark runs

  • Performed on older hardware:

    • 2 GB of memory

  • Data set sizes:

    • 1K, 10K, 40K, 50K, 100K, 200K, 400K, 800K, 1M

    • Most systems were not able to load networks with 400K+ edges

      • (constraint: load 10K edges in less than 60 s)
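The loading constraint can be sketched as a batched loader that gives up once a batch of 10K edges takes longer than 60 s. This assumes the limit applies per 10K-edge batch, since the slide does not say exactly how it was enforced. `insert_edge` is a hypothetical callback standing in for the store's insert call, and the clock is injectable so the check can be exercised without waiting.

```python
import time
from typing import Callable, Iterable, Iterator, List, Tuple

Edge = Tuple[int, int]


def batched(edges: Iterable[Edge], size: int) -> Iterator[List[Edge]]:
    """Yield consecutive lists of at most `size` edges."""
    batch: List[Edge] = []
    for e in edges:
        batch.append(e)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def load_with_limit(edges: Iterable[Edge],
                    insert_edge: Callable[[int, int], None],
                    batch_size: int = 10_000,
                    limit_s: float = 60.0,
                    clock: Callable[[], float] = time.perf_counter) -> Tuple[int, bool]:
    """Insert edges batch by batch and stop once a single batch exceeds limit_s.

    Returns (edges inserted, including a too-slow batch; whether all edges loaded)."""
    loaded = 0
    for batch in batched(edges, batch_size):
        start = clock()
        for src, dst in batch:
            insert_edge(src, dst)
        loaded += len(batch)
        if clock() - start > limit_s:
            return loaded, False
    return loaded, True


# Usage with a simulated clock so the example runs instantly.
fake_now = [0.0]

def fake_clock() -> float:
    return fake_now[0]

def insert_edge(src: int, dst: int) -> None:
    # Hypothetical cost model: each insert costs 2 ms, rising to 10 ms past 20K edges.
    fake_now[0] += 0.002 if src < 20_000 else 0.010

print(load_with_limit(((i, i + 1) for i in range(30_000)), insert_edge, clock=fake_clock))
# (30000, False): the third batch "takes" about 100 s
```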

Graph loading – element insertion

Local traversal – BFS 3 hops

Global traversals – connected components

Conclusion

  • Extending work on benchmarking graph databases

  • Focusing on graph traversal operations

  • Local/Global traversals

  • Preliminary results:

    • Even loading larger data sets into GDBs is a problem

    • Stable performance for local traversals of 2-3 hops

      • Suitable for most analyses of ego-centric node properties

    • Poor performance for global traversal operations on larger networks

Thank you for your attention.

http://ups.savba.sk/~marek/gbench.html

SemSets – activation spreading over network
