Rcq ga rdf chain query optimization using genetic algorithms
This presentation is the property of its rightful owner.
Sponsored Links
1 / 15

RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms PowerPoint PPT Presentation


  • 64 Views
  • Uploaded on
  • Presentation posted in: General

RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms. Introduction. The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools

Download Presentation

RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Rcq ga rdf chain query optimization using genetic algorithms

RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms

EC-Web 2009


Introduction

Introduction

  • The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools

  • Fast query engines are needed for efficient querying of large amounts of data, usually represented using the Resource Description Framework (RDF)

  • Problem: optimizing query paths (the order in which different parts of a query are evaluated)

  • Two-phase optimization (2PO) has already been proposed (Stuckenschmidt et al. 2005) in a Semantic Web context, but a genetic algorithm (GA) appears to be a feasible alternative

EC-Web 2009


Rdf and query paths 1

RDF and Query Paths (1)

  • RDF model is a collection of facts declared using RDF

  • Facts are triples in the form of a node-arc-node link consisting of a subject, a predicate, and an object

  • RDF sources can be queried using SPARQL

  • We consider a subset of SPARQL queries: chain queries, where a query path is followed by performing joins between its subpaths of length 1

    1. PREFIX c: <http://www.daml.org/2001/09/countries/fips#>2. PREFIX o: <http://www.daml.org/2003/09/factbook/factbook-ont#>3. SELECT ?partner4. WHERE { c:SouthAfrica o:importPartner ?impPartner .5. ?impPartner o:country ?partner .6. ?partner o:border ?border .7. ?border o:country ?neighbour .8. ?neighbour o:internationalDispute ?dispute .9. }

EC-Web 2009


Rdf and query paths 2

RDF and Query Paths (2)

Bushy query tree

Right-deep query tree

EC-Web 2009


Rdf query path optimization 1

RDF Query Path Optimization (1)

  • Challenge: determine the right order in which the joins should be computed, hereby optimizing the overall response time

  • Consider a solution space with query paths

  • Solutions are associated with data transmission and processing costs

  • Data processing costs are the sum of all join costs, which are influenced by the cardinalities of each operand and the join method used

  • Neighbouring solutions in solution space can be identified using transformation rules introduced by Ioannidis and Kang (1990)

EC-Web 2009


Rdf query path optimization 2

RDF Query Path Optimization (2)

  • Stuckenschmidt et al. (2005) propose to use 2PO for RDF chain query optimization:

    • Using Iterative Improvement (II), local optima are found by walking through solution space (from random starting points), while only taking steps yielding improvement in solution quality

    • The best local optimum thus found is used as starting point for Simulated Annealing (SA); a walk through solution space is performed, where moves not yielding improvement are accepted with a declining probability

  • We propose to optimize RDF chain queries using a GA, RCQ-GA

EC-Web 2009


Rdf query path optimization 3

RDF Query Path Optimization (3)

  • In a GA, a population of chromosomes (solutions) is exposed to evolution: selection, crossovers, and mutations

  • A GA generally is aware of good solutions faster than 2PO, but tends to spend a lot of time optimizing these already good results before it terminates

  • We adopt the BushyGenetic (BG) algorithm proposed by Steinbrunn et al. (1997) for traditional query path optimization, but stimulate quicker convergence through elitist selection, fitness-based selection, a decreased population size, and tighter stopping conditions

EC-Web 2009


Rdf query path optimization 4

RDF Query Path Optimization (4)

  • Solutions are encoded using an efficient ordinal number encoding scheme, facilitating easy crossover and mutation operations

  • The algorithm iteratively joins two concepts in an ordered list of concepts

  • Result is saved on position of first appearing concept

  • Example:

    • (c1, c2, c3, c4): join 3 and 4

    • (c1, c2, c3c4): join 1 and 2

    • (c1c2, c3c4): join 1 and 2

    • (c1c2c3c4)

  • Encoding: ((3,4),(1,2),(1,2))

EC-Web 2009


Performance 1

Performance (1)

  • We benchmark execution times and solution quality of BG and our adaptation to RDF query environments, RCQ-GA, against those of 2PO

  • The effects of a time limit (1 second) on 2PO and RCQ-GA are also assessed

  • The entire solution space is considered (i.e., bushy query trees are valid options)

  • Each algorithm is tested on chain queries varying in length from 2 to 20 predicates

  • Each experiment is iterated 100 times

  • For now, we focus on a single source: RDF version of CIA World Factbook

EC-Web 2009


Performance 2

Performance (2)

Relative deviation of average execution times from 2PO average

EC-Web 2009


Performance 3

Performance (3)

Relative deviation of average solution costs from 2PO average

EC-Web 2009


Performance 4

Performance (4)

Relative deviation of coefficients of variation of solution costs from 2PO average

EC-Web 2009


Conclusions

Conclusions

  • In optimizing the query path for chain queries in a single-source RDF query execution environment, the performance of a GA compared to 2PO is positively correlated with the complexity of the solution space and the restrictiveness of the environment

  • An appropriately configured GA can outperform 2PO in solution quality, execution time needed, and consistency of solution quality

EC-Web 2009


Future work

Future Work

  • Optimize parameters (e.g., using meta-algorithms)

  • Evaluate performance in a distributed setting

  • Experiment with other algorithms, such as ant colony optimization or particle swarm optimization

EC-Web 2009


Questions

Questions?

  • Feel free to contact:

    Alexander HogenboomErasmus School of EconomicsErasmus University RotterdamP.O. Box 1738, 3000 DR, The [email protected]

EC-Web 2009


  • Login