### Benchmarking traversal operations over graph databases

Marek Ciglan¹, Alex Averbuch² and Ladislav Hluchý¹

¹ Institute of Informatics, Slovak Academy of Sciences, Bratislava

² Swedish Institute of Computer Science, Stockholm, Sweden

Overview

- Graph data management
- Graph databases
- Characteristics
- Unique features
- Challenges

- GDB Benchmarking
- Motivation
- Related work

- Graph traversal benchmark
- Goals
- Design

- Preliminary results

21 November 2011

Graph data management

- Booming area of R&D in recent years
- Reasons:
- Increased availability and importance of graph data
- Natural way for modelling various real world phenomena
- (networks: social, information, communication)

- Two dominant data management directions:
- Distributed graph processing frameworks
- Mining/processing of large graphs
- Pregel and clones (GoldenOrb, Giraph)

- Graph databases
- Persistent management of graph data
- Neo4J, OrientDB, DEX

Graph databases

- Property graph data model
- Graph structure
- Elements have properties

[Figure: example property graph with four nodes (K1 to K4), each carrying attributes I1, I2 and I3, connected by edges labelled L1, L2 and L3]
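The property graph model above can be sketched as a small in-memory structure. This is only an illustration in Python: the class and method names (`PropertyGraph`, `add_node`, `add_edge`, `neighbours`) are hypothetical, and the edge wiring between K1 to K4 is an assumption, since the slide's diagram does not survive in the transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    key: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    """Directed graph whose nodes and edges both carry key/value properties."""

    def __init__(self):
        self.nodes = {}
        self.out_edges = {}  # node key -> edges leaving the node
        self.in_edges = {}   # node key -> edges entering the node

    def add_node(self, key, **props):
        self.nodes[key] = Node(key, dict(props))
        self.out_edges.setdefault(key, [])
        self.in_edges.setdefault(key, [])

    def add_edge(self, source, target, label, **props):
        edge = Edge(source, target, label, dict(props))
        self.out_edges[source].append(edge)
        self.in_edges[target].append(edge)
        return edge

    def neighbours(self, key):
        """Adjacent node keys, ignoring edge direction."""
        outgoing = {e.target for e in self.out_edges[key]}
        incoming = {e.source for e in self.in_edges[key]}
        return outgoing | incoming


g = PropertyGraph()
for name in ("K1", "K2", "K3", "K4"):
    g.add_node(name, I1="val", I2="val", I3="val")

# Assumed wiring; only the labels L1, L2, L3 come from the slide.
g.add_edge("K1", "K2", "L1")
g.add_edge("K2", "K4", "L3")
g.add_edge("K1", "K3", "L2")
g.add_edge("K3", "K4", "L1")
```

Keeping both outgoing and incoming edge lists per node is what makes a traversal step a direct lookup rather than a search over all edges.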

- Unique feature
- Graph topology capturing the relations of objects
- A graph database should
- Exploit topology efficiently
- Allow fast traversal

- Challenges
- Traditionally, graph processing and traversal are done in memory
- Reasons:
- Data-driven computation
- Random data access pattern

Graph database benchmarking

- Motivation
- A growing number of graph data management solutions are emerging.
- Which is the right one for a specific problem?
- Fair measurement of performance for distinct use cases.
- Identify limits: which use cases perform well.

- Related work
- Only a few works directly address graph databases
- D. Dominguez-Sal et al.:
- Adoption of an HPC benchmark for graph data processing
- Design of a benchmark suitable for graph database systems

- GraphBench: basic benchmarking framework implementation

- Traversal operation benchmarking
- Graph topology is the unique feature of graph databases
- Test the ability to do:
- Local traversals (exploring the k-hop neighbourhood)
- Global traversals (traversals of the whole graph)

- Perform traversals in a memory-constrained environment
- (can we deal efficiently with data sets exceeding the physical memory?)
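A local traversal of the kind listed above can be sketched as a breadth-first search that stops expanding once it reaches depth k. The function name `k_hop_neighbourhood` and the plain adjacency-dict input are assumptions for illustration; the benchmark itself runs against graph database APIs rather than an in-memory dict.

```python
from collections import deque


def k_hop_neighbourhood(adj, start, k):
    """Return {node: hop distance} for every node within k hops of start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # depth limit reached, do not expand further
        for nb in adj.get(node, ()):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Tiny example graph: a -> b, a -> c, b -> d, c -> d, d -> e
adj = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
reach = k_hop_neighbourhood(adj, "a", 2)
```

With k = 2 the node `e` is not reached, because it lies three hops from `a`. The cost of the operation depends on the size of the neighbourhood rather than on the size of the whole graph, which is what distinguishes it from a global traversal.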

Benchmark design

- Fairness
- Blueprints API: an effort to provide a common API
- https://github.com/tinkerpop/blueprints/wiki/

- Using Blueprints: one benchmark implementation for all benchmarked systems
- Avoids the bias of different benchmark implementations for different systems

- Execution of the same sequence of operations on the same data
- Log operations and their parameters in the first run over the defined data
- Logs are persistent, so benchmarks can be rerun on different versions of a product and the change in performance measured
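The record-then-replay idea can be sketched as follows. This is a minimal Python illustration, not the benchmark's actual code and not the Blueprints API: the function names `record_run` and `replay`, the JSON-lines log format, and the `execute` callback are all assumptions. In the real setup, `execute` would dispatch each logged operation to the system under test.

```python
import json
import os
import random
import tempfile
import time


def record_run(node_ids, n_ops, log_path, seed=42):
    """First run: choose operation parameters and persist them to a log."""
    rng = random.Random(seed)
    ops = [{"op": "bfs", "start": rng.choice(node_ids), "hops": 3}
           for _ in range(n_ops)]
    with open(log_path, "w", encoding="utf-8") as fh:
        for op in ops:
            fh.write(json.dumps(op) + "\n")
    return ops


def replay(log_path, execute):
    """Later runs: execute exactly the logged operations and time each one."""
    timings = []
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            op = json.loads(line)
            t0 = time.perf_counter()
            execute(op)
            timings.append((op, time.perf_counter() - t0))
    return timings


# Usage: a stand-in 'execute' that only collects the operations it is given.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ops.jsonl")
    recorded = record_run(["n1", "n2", "n3"], 5, path)
    seen = []
    timings = replay(path, seen.append)
```

Because the parameters are read from the log rather than drawn again, every system, and every later version of the same system, sees an identical workload.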

Benchmark design

- Data
- Different data properties / distributions affect benchmark results
- E.g. dense vs. sparse graphs

- Ideally, data set properties are similar to those of real-world data sets
- Use scale-free networks with small-world properties
- Social networks, the Internet, traffic networks, biological networks, and term co-occurrence networks
- LFR benchmark generator: networks with a power-law degree distribution and implanted communities within the network
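To give a feel for synthetic data with a heavy-tailed degree distribution, the sketch below uses plain preferential attachment (Barabási-Albert style). It is explicitly not the LFR generator named above: it produces no implanted communities and is only a simpler stand-in. The function name and parameters are hypothetical.

```python
import random


def preferential_attachment_edges(n_nodes, m, seed=0):
    """Undirected edge list where each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree."""
    if not 1 <= m < n_nodes:
        raise ValueError("need 1 <= m < n_nodes")
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))  # the first new node links to nodes 0..m-1
    endpoints = []            # every node appears once per incident edge
    for new in range(m, n_nodes):
        for t in targets:
            edges.append((new, t))
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        # Sampling from 'endpoints' is sampling proportional to degree.
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


edges = preferential_attachment_edges(100, 2)
degree = {}
for u, v in edges:
    degree[u] = degree.get(u, 0) + 1
    degree[v] = degree.get(v, 0) + 1
```

A few early nodes end up with a much higher degree than the rest, which is the property that stresses traversal operations differently from a uniformly sparse graph.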

Benchmark design

- Traversal operations
- Local traversals
- Compute local clustering coefficient (2-hop breadth-first traversal)
- 3-hop breadth-first traversal

- Global traversals
- Compute connected components
- Incoming / outgoing edges

- k iterations of the HITS algorithm

- Memory-constrained environment
- Intermediate results for global traversal operations:
- Kept in memory
- Kept as properties on nodes
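The operations listed above can be sketched over plain adjacency sets. This is an illustrative Python version, not the benchmark's implementation; all function names are assumptions. The `labels` argument of the component search shows where intermediate results live: here it is an in-memory dict, and the "properties on nodes" variant would pass a mapping that writes through to the database instead.

```python
from collections import deque


def local_clustering(und_adj, v):
    """Fraction of pairs of v's neighbours that are themselves linked."""
    nbrs = und_adj[v] - {v}
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each neighbour-to-neighbour link is seen twice, once from each end.
    seen_twice = sum(1 for a in nbrs for b in und_adj[a] if b in nbrs and b != a)
    return seen_twice / (k * (k - 1))


def weakly_connected_components(out_adj, in_adj, labels=None):
    """Label components, following both outgoing and incoming edges."""
    labels = {} if labels is None else labels
    next_label = 0
    for start in out_adj:
        if start in labels:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in list(out_adj[node]) + list(in_adj[node]):
                if nb not in labels:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


def hits(out_adj, in_adj, iterations):
    """Run a fixed number of HITS iterations; returns (hub, authority)."""
    hub = {v: 1.0 for v in out_adj}
    auth = dict(hub)
    for _ in range(iterations):
        auth = {v: sum(hub[u] for u in in_adj[v]) for v in out_adj}
        hub = {v: sum(auth[w] for w in out_adj[v]) for v in out_adj}
        for scores in (auth, hub):
            norm = sum(x * x for x in scores.values()) ** 0.5 or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth


# Example graph; every node must appear as a key in out_adj.
out_adj = {"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": {"e"}, "e": set()}
in_adj = {v: set() for v in out_adj}
for u, targets in out_adj.items():
    for w in targets:
        in_adj[w].add(u)
und_adj = {v: out_adj[v] | in_adj[v] for v in out_adj}

components = weakly_connected_components(out_adj, in_adj)
hub, auth = hits(out_adj, in_adj, iterations=5)
```

The clustering coefficient touches only a node's 2-hop neighbourhood, while the component search and HITS visit every node and edge, so their per-node state is what grows with the graph and has to fit in memory or be written back as properties.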

Benchmark implementation

- Implemented on top of Blueprints API
- Tests performed on:
- Neo4J,
- DEX,
- OrientDB,
- Native RDF repository (NativeSail)
- SGDB (research prototype)

- Challenge: deal with differences in underlying systems, e.g.:
- Triple stores: naming constraints
- Some implementations do not support properties on some elements
- Some implementations do not support iteration over nodes/edges
- Node ID generation: user-provided vs. autogenerated
- Transaction support / no transactions

Benchmark Runs

- Performed on older hardware:
- 2 GB of memory

- Data set sizes:
- 1K, 10K, 40K, 50K, 100K, 200K, 400K, 800K, 1M
- Most systems were not able to load networks with 400K+ edges
- (constraint: load 10K edges in less than 60 sec.)
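One way to apply the loading constraint is to insert edges in batches and stop as soon as a batch exceeds its time budget. The sketch below assumes the constraint is meant per batch of 10K edges, which the slide does not spell out; `load_in_batches` and the `add_edge` callback are hypothetical names.

```python
import time


def load_in_batches(edges, add_edge, batch_size=10_000, budget_s=60.0):
    """Insert edges batch by batch; give up when one batch is too slow.

    Returns (edges loaded, per-batch seconds, whether the load completed).
    """
    loaded = 0
    batch_times = []
    for start in range(0, len(edges), batch_size):
        batch = edges[start:start + batch_size]
        t0 = time.perf_counter()
        for source, target in batch:
            add_edge(source, target)
        elapsed = time.perf_counter() - t0
        batch_times.append(elapsed)
        loaded += len(batch)
        if elapsed >= budget_s:
            return loaded, batch_times, False  # budget exceeded, abort load
    return loaded, batch_times, True


# Usage with a list standing in for the database under test.
store = []
demo_edges = [(i, i + 1) for i in range(25)]
loaded, batch_times, completed = load_in_batches(
    demo_edges, lambda s, t: store.append((s, t)), batch_size=10)
```

Recording the per-batch times also shows whether insertion slows down as the stored graph grows, rather than only reporting pass or fail.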

Graph loading: element insertion

Local traversal: BFS, 3 hops

Global traversals: connected components

Conclusion

- Extending work on benchmarking graph databases
- Focusing on graph traversal operations
- Local/Global traversals
- Preliminary results:
- Merely loading larger data sets into GDBs is a problem
- Stable performance for local traversals with 2-3 hops
- Suitable for most analyses of ego-centric node properties

- Poor performance for global traversal operations on larger networks

SemSets – activation spreading over network
