RCQ-GA: RDF Chain Query Optimization using Genetic Algorithms

Introduction
  • The application of Semantic Web technologies in an Electronic Commerce environment implies a need for good support tools
  • Fast query engines are needed for efficient querying of large amounts of data, usually represented using the Resource Description Framework (RDF)
  • Problem: optimizing query paths (the order in which different parts of a query are evaluated)
  • Two-phase optimization (2PO) has already been proposed (Stuckenschmidt et al. 2005) in a Semantic Web context, but a genetic algorithm (GA) appears to be a feasible alternative

EC-Web 2009

RDF and Query Paths (1)
  • An RDF model is a collection of facts declared using RDF
  • Facts are triples in the form of a node-arc-node link consisting of a subject, a predicate, and an object
  • RDF sources can be queried using SPARQL
  • We consider a subset of SPARQL queries: chain queries, where a query path is followed by performing joins between its subpaths of length 1

    PREFIX c:
    PREFIX o:
    SELECT ?partner
    WHERE { c:SouthAfrica o:importPartner ?impPartner .
            ?impPartner o:country ?partner .
            ?partner o:border ?border .
            ?border o:country ?neighbour .
            ?neighbour o:internationalDispute ?dispute .
    }
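To make the idea of joining subpaths of length 1 concrete, here is a minimal Python sketch that evaluates a chain over a tiny in-memory triple list. The triples, the helper names (`subpath`, `join`, `evaluate_chain`), and the fixed left-to-right join order are all invented for illustration; they are not taken from the presentation.

```python
# Toy data: (subject, predicate, object) triples; all values are invented.
TRIPLES = [
    ("SouthAfrica", "importPartner", "ip1"),
    ("ip1", "country", "Germany"),
    ("Germany", "border", "b1"),
    ("b1", "country", "France"),
]

def subpath(predicate):
    """All (subject, object) pairs linked by one predicate: a subpath of length 1."""
    return [(s, o) for s, p, o in TRIPLES if p == predicate]

def join(left, right):
    """Join two paths where the end node of `left` equals the start node of `right`."""
    return [(a, d) for a, b in left for c, d in right if b == c]

def evaluate_chain(start, predicates):
    """Evaluate a chain query left to right, starting from a fixed subject."""
    path = [(s, o) for s, o in subpath(predicates[0]) if s == start]
    for predicate in predicates[1:]:
        path = join(path, subpath(predicate))
    return [end for _, end in path]

result = evaluate_chain("SouthAfrica", ["importPartner", "country", "border", "country"])
```

The left-to-right order used here is only one of many possible join orders; choosing among them is the optimization problem the remaining slides address.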

RDF and Query Paths (2)

Bushy query tree

Right-deep query tree
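One possible way to represent the two tree shapes in code is as nested tuples, as in the Python sketch below. The tuple representation, the names `p1` to `p4`, and the `is_right_deep` helper are assumptions made for illustration only. A tree counts as right-deep here when the left operand of every join is a base subpath.

```python
def is_right_deep(tree):
    """True if the left operand of every join is a base subpath (a plain string)."""
    while isinstance(tree, tuple):
        left, right = tree
        if isinstance(left, tuple):
            return False
        tree = right
    return True

# Four hypothetical subpaths of length 1, combined in two different shapes.
right_deep = ("p1", ("p2", ("p3", "p4")))
bushy = (("p1", "p2"), ("p3", "p4"))
```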

RDF Query Path Optimization (1)
  • Challenge: determine the order in which the joins should be computed, thereby optimizing the overall response time
  • Consider a solution space in which each solution is a query path
  • Solutions are associated with data transmission and processing costs
  • Data processing costs are the sum of all join costs, which are influenced by the cardinalities of each operand and the join method used
  • Neighbouring solutions in solution space can be identified using transformation rules introduced by Ioannidis and Kang (1990)
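The processing-cost part of such a cost model could be sketched as follows in Python. This is only an illustration under stated assumptions: the nested-loop cost `|L| * |R|`, the fixed `SELECTIVITY`, and the cardinalities are all hypothetical, and transmission costs are ignored.

```python
SELECTIVITY = 0.01  # assumed fraction of the cross product that survives a join

def plan_cost(tree, cardinality):
    """Return (estimated result cardinality, summed join cost) for a join tree.

    `tree` is a base subpath name or a (left, right) tuple; `cardinality`
    maps base subpath names to row counts.
    """
    if not isinstance(tree, tuple):
        return cardinality[tree], 0.0
    left_card, left_cost = plan_cost(tree[0], cardinality)
    right_card, right_cost = plan_cost(tree[1], cardinality)
    join_cost = left_card * right_card              # assumed nested-loop join cost
    result_card = left_card * right_card * SELECTIVITY
    return result_card, left_cost + right_cost + join_cost

cards = {"p1": 100, "p2": 10, "p3": 1000}           # invented cardinalities
_, cost_a = plan_cost((("p1", "p2"), "p3"), cards)  # join the small operands first
_, cost_b = plan_cost(("p1", ("p2", "p3")), cards)  # join the large operand early
```

Under these assumptions the first plan is cheaper, because the large operand is joined only after the intermediate result has shrunk.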

RDF Query Path Optimization (2)
  • Stuckenschmidt et al. (2005) propose to use 2PO for RDF chain query optimization:
    • Using Iterative Improvement (II), local optima are found by walking through solution space (from random starting points), while only taking steps yielding improvement in solution quality
    • The best local optimum thus found is used as starting point for Simulated Annealing (SA); a walk through solution space is performed, where moves not yielding improvement are accepted with a declining probability
  • We propose to optimize RDF chain queries using a GA, RCQ-GA
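The two phases of 2PO described above can be sketched generically in Python. The temperature schedule, the number of random starts, and the toy integer problem used to exercise the code are placeholders chosen for illustration, not the settings of Stuckenschmidt et al.; `neighbours` is assumed to return a non-empty list.

```python
import math
import random

def iterative_improvement(start, cost, neighbours, rng):
    """Walk downhill until no neighbour is cheaper (a local optimum)."""
    current = start
    while True:
        better = [n for n in neighbours(current) if cost(n) < cost(current)]
        if not better:
            return current
        current = rng.choice(better)

def simulated_annealing(start, cost, neighbours, rng, temperature=1.0,
                        cooling=0.9, steps_per_level=20, min_temperature=1e-3):
    """Random walk that accepts worsening moves with a declining probability."""
    current = best = start
    while temperature > min_temperature:
        for _ in range(steps_per_level):
            candidate = rng.choice(neighbours(current))
            delta = cost(candidate) - cost(current)
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = candidate
                if cost(current) < cost(best):
                    best = current
        temperature *= cooling
    return best

def two_phase_optimization(random_solution, cost, neighbours, rng, starts=10):
    """Phase 1: II from several random starts. Phase 2: SA from the best local optimum."""
    local_optima = [iterative_improvement(random_solution(rng), cost, neighbours, rng)
                    for _ in range(starts)]
    return simulated_annealing(min(local_optima, key=cost), cost, neighbours, rng)

# Toy problem: minimise (x - 7)^2 over the integers 0..20.
rng = random.Random(0)
toy_cost = lambda x: (x - 7) ** 2
toy_neighbours = lambda x: [n for n in (x - 1, x + 1) if 0 <= n <= 20]
best = two_phase_optimization(lambda r: r.randint(0, 20), toy_cost, toy_neighbours, rng)
```

For query optimization, `cost` would be a plan cost estimate and `neighbours` would apply transformation rules to a join tree; both are left abstract here.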

RDF Query Path Optimization (3)
  • In a GA, a population of chromosomes (solutions) is exposed to evolution: selection, crossovers, and mutations
  • A GA generally finds good solutions faster than 2PO, but tends to spend a lot of time optimizing these already good results before it terminates
  • We adopt the BushyGenetic (BG) algorithm proposed by Steinbrunn et al. (1997) for traditional query path optimization, but stimulate quicker convergence through elitist selection, fitness-based selection, a decreased population size, and tighter stopping conditions
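A GA loop with elitist selection, fitness-based selection, and a tight stopping condition might look roughly like the Python sketch below. It is not the RCQ-GA or BG implementation: the weighting `1 / (1 + cost)`, the parameter defaults, and the bit-string toy problem are assumptions, and `cost` is assumed to be non-negative.

```python
import random

def evolve(population, cost, crossover, mutate, rng,
           elite=1, mutation_rate=0.1, max_generations=50, patience=10):
    """Minimise `cost` with elitist, fitness-proportional selection.

    Stops after `max_generations`, or earlier when the best cost has not
    improved for `patience` consecutive generations.
    """
    population = sorted(population, key=cost)
    best, stale = population[0], 0
    for _ in range(max_generations):
        # Fitness-based selection: cheaper chromosomes get larger weights.
        weights = [1.0 / (1.0 + cost(c)) for c in population]
        children = list(population[:elite])       # elitism: keep the best unchanged
        while len(children) < len(population):
            mother, father = rng.choices(population, weights=weights, k=2)
            child = crossover(mother, father, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = sorted(children, key=cost)
        if cost(population[0]) < cost(best):
            best, stale = population[0], 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best

def one_point(a, b, rng):
    """One-point crossover on two equal-length lists."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def flip_one(c, rng):
    """Mutation: flip a single randomly chosen bit."""
    i = rng.randrange(len(c))
    return c[:i] + [1 - c[i]] + c[i + 1:]

# Toy problem: minimise the number of ones in a 12-bit string.
rng = random.Random(1)
start = [[rng.randint(0, 1) for _ in range(12)] for _ in range(20)]
best = evolve(start, sum, one_point, flip_one, rng)
```

Because the elite chromosome is copied unchanged into each generation, the best cost never gets worse from one generation to the next.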

RDF Query Path Optimization (4)
  • Solutions are encoded using an efficient ordinal number encoding scheme, facilitating easy crossover and mutation operations
  • The algorithm iteratively joins two concepts in an ordered list of concepts
  • The result is stored at the position of whichever of the two joined concepts appears first in the list
  • Example:
    • (c1, c2, c3, c4): join 3 and 4
    • (c1, c2, c3c4): join 1 and 2
    • (c1c2, c3c4): join 1 and 2
    • (c1c2c3c4)
  • Encoding: ((3,4),(1,2),(1,2))
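Decoding such an encoding back into a join tree can be sketched in a few lines of Python. The function name `decode` and the nested-tuple tree representation are assumptions for illustration, and the sketch does not check that each step is a valid join for a chain query.

```python
def decode(concepts, encoding):
    """Turn a list of 1-based (i, j) join steps into a nested join tree."""
    working = list(concepts)
    for i, j in encoding:
        joined = (working[i - 1], working[j - 1])
        working[min(i, j) - 1] = joined   # result takes the earlier position
        del working[max(i, j) - 1]        # the later position disappears
    if len(working) != 1:
        raise ValueError("encoding does not join all concepts")
    return working[0]

tree = decode(["c1", "c2", "c3", "c4"], [(3, 4), (1, 2), (1, 2)])
```

For the example above this yields the bushy tree `(("c1", "c2"), ("c3", "c4"))`. Since every step refers only to positions in the current list, crossover and mutation can work on the index pairs directly.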

Performance (1)
  • We benchmark execution times and solution quality of BG and our adaptation to RDF query environments, RCQ-GA, against those of 2PO
  • The effects of a time limit (1 second) on 2PO and RCQ-GA are also assessed
  • The entire solution space is considered (i.e., bushy query trees are valid options)
  • Each algorithm is tested on chain queries varying in length from 2 to 20 predicates
  • Each experiment is iterated 100 times
  • For now, we focus on a single source: RDF version of CIA World Factbook

Performance (2)

Relative deviation of average execution times from 2PO average

Performance (3)

Relative deviation of average solution costs from 2PO average

Performance (4)

Relative deviation of coefficients of variation of solution costs from 2PO average
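The metrics named in these figure captions can be computed as in the Python sketch below. The slides do not say whether a population or sample standard deviation was used; the population version (`pstdev`) is an assumption here, and the sample numbers are invented.

```python
from statistics import mean, pstdev

def relative_deviation(value, baseline):
    """Deviation of `value` from `baseline`, as a fraction of the baseline."""
    return (value - baseline) / baseline

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population version assumed)."""
    return pstdev(values) / mean(values)

# Invented solution costs over repeated runs of two algorithms.
ga_costs = [2, 4, 4, 4, 5, 5, 7, 9]
baseline_costs = [10, 10, 10, 10]
cost_deviation = relative_deviation(mean(ga_costs), mean(baseline_costs))
```

A negative relative deviation means the algorithm scored lower than the 2PO baseline on that metric, which for costs and execution times is an improvement.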

Conclusions
  • In optimizing the query path for chain queries in a single-source RDF query execution environment, the performance of a GA relative to 2PO is positively correlated with the complexity of the solution space and the restrictiveness of the environment (e.g., a time limit on optimization)
  • An appropriately configured GA can outperform 2PO in solution quality, execution time needed, and consistency of solution quality

Future Work
  • Optimize parameters (e.g., using meta-algorithms)
  • Evaluate performance in a distributed setting
  • Experiment with other algorithms, such as ant colony optimization or particle swarm optimization

Questions?
  • Feel free to contact:

Alexander Hogenboom
Erasmus School of Economics
Erasmus University Rotterdam
P.O. Box 1738, 3000 DR, The [email protected]
