
Cost Framework for a Heterogeneous Distributed Semi-structured Environment

Tianxiao Liu (1)(2), Tuyet-Tram Dang-Ngoc (1), Dominique Laurent (1). (1) ETIS Laboratory, University of Cergy-Pontoise, Cergy-Pontoise, France; (2) Xcalia S.A., Paris, France. June 18th, 2007.



Presentation Transcript


  1. Cost Framework for a Heterogeneous Distributed Semi-structured Environment. Tianxiao Liu (1)(2), Tuyet-Tram Dang-Ngoc (1), Dominique Laurent (1). (1) ETIS Laboratory, University of Cergy-Pontoise, Cergy-Pontoise, France; (2) Xcalia S.A., Paris, France. June 18th, 2007. DBMAN 2007.

  2. Outline
     • Motivation
     • Cost models for heterogeneous data sources
     • Contributions
     • Generic language for cost communication
     • Dynamic cost estimation framework
     • Conclusion

  3. Motivation
     • Cost-based query optimization
     • Various execution plans for the same query
     • Different costs for each plan (execution time, price, communication, etc.)
     • Cost model used to estimate the cost of candidate plans
     • Cost formulas: source oriented or operation oriented
     • Statistics of data sources
     • Problems in the case of mediation context
     • Data source autonomy: cost models not available
     • Integration of various cost models at mediator level
     • Cost communication between components of the system

  4. Cost models for heterogeneous data sources
     [Figure: map of existing cost models. Category labels: cost models based on operation implementation; generic cost models; specific methods. Source labels: known sources; heterogeneous autonomous sources. Data model labels: relational data sources; object-oriented data sources; semi-structured data sources. Arrow labels: Adapted; Refined; Extended; Applied. Entries, in the order they appear: Calibration [DKS92]; Sampling [ZL98]; Cost model by history [ACP96]; Operation [GP89] [ML86] [SA82]; Adaptive [Zhu95]; Calibration [GST96]; Flora [Flo96] [Gru96]; Operation [CD92] [BMG93] [DOA+94]; Wrapper [HKWY97] [ROH99]; Hybrid cost model [NGT98]; Access Path [GGT96]; Operation [AAN01] [MW99]; XQuery Self-Learning [ZHJGML05].]

  5. Background: XLive mediation system and its XQuery evaluation process
     [Figure: architecture diagram. Labels, in the order they appear: Response; Query; Result (XML); Query; XQuery; Mediator; Evaluation; Equivalent rules; Search Strategy; Mediator Information Repository; Canonization; Canonized XQuery; XAlgebra; Cost-based Optimization; Cost information; Transformation; Mediator operators; Modeling; Wrapper Information Repository; Tree Graph View (TGV); Annotated TGV; Annotation; Wrapper operators; Cost information; Wrapper (three instances); Relational data source; XML data source; Web services.]

  6. Background: Tree Graph View (TGV)
     [Figure: an example of XQuery TGV presentation]

  7. Generic cost model in a mediation context
     • Design a generic cost model…
     • Source type: relational, semi-structured, web-service…
     • Specific methods
     • Calibration, History…
     • APIs implemented by the system
     • Principle: as accurate as possible
     • …Using cost formulas
     • Equation systems
     • Statistics expressed also in the form of equation
     • Constant values
     • Existing generic cost model (Disco)
     • Object Oriented environment
     • Predefined variables in the language

  8. Our proposal: Generic Language for Cost Communication (GLCC)
     • A language based on XML
     • Cost formulas and equation systems in the form of MathML
     • A generic language
     • No predefined variables
     • Express different costs for various optimization objectives (time, price…)
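The idea of carrying a cost formula as MathML with no predefined variables can be illustrated with a minimal Python sketch. The slides do not show the GLCC schema, so everything here is an assumption made for illustration: the formula, the variable names `latency` and `card`, the constant, and the helper names `evaluate` and `estimate_cost` are hypothetical. Only the MathML content-markup elements (`apply`, `plus`, `times`, `ci`, `cn`) are standard.

```python
import math
import xml.etree.ElementTree as ET

MATHML_NS = "{http://www.w3.org/1998/Math/MathML}"

# Hypothetical cost formula (not taken from GLCC):
#   time = latency + card * 0.002
FORMULA = """
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply><plus/>
    <ci>latency</ci>
    <apply><times/>
      <ci>card</ci>
      <cn>0.002</cn>
    </apply>
  </apply>
</math>
"""

# Only the two operators needed by the example are supported.
OPERATORS = {
    "plus": lambda args: sum(args),
    "times": lambda args: math.prod(args),
}


def evaluate(node, env):
    """Evaluate a content-MathML element against variable bindings in env."""
    tag = node.tag.replace(MATHML_NS, "")
    if tag == "math":
        return evaluate(list(node)[0], env)
    if tag == "cn":  # numeric constant
        return float(node.text.strip())
    if tag == "ci":  # variable, resolved by name at evaluation time
        return env[node.text.strip()]
    if tag == "apply":  # first child is the operator, the rest are operands
        children = list(node)
        operator = children[0].tag.replace(MATHML_NS, "")
        operands = [evaluate(child, env) for child in children[1:]]
        return OPERATORS[operator](operands)
    raise ValueError(f"unsupported MathML element: {tag}")


def estimate_cost(formula_xml, env):
    """Parse a MathML formula and evaluate it with the given statistics."""
    return evaluate(ET.fromstring(formula_xml), env)


if __name__ == "__main__":
    # With latency = 0.05 s and 1000 items, the estimate is about 2.05 s.
    print(estimate_cost(FORMULA, {"latency": 0.05, "card": 1000}))
```

Because variables are looked up by name when the formula is evaluated, the evaluator needs no built-in list of variables: a wrapper could use any names it likes, provided the statistics supplied alongside the formula bind them.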

  9. Dynamic cost estimation framework
     • Cooperation and communication between different components of XLive
     • Use execution results (response time) to improve the accuracy of cost models
     • Cost communication performed in GLCC
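One way to picture how measured response times could feed back into a cost model is the sketch below. The slides do not describe the update rule used in XLive, so this is an assumption: a linear model with one per-item coefficient, corrected by exponential smoothing. The class name `LinearCostModel` and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearCostModel:
    """Hypothetical model: estimated time = per_item_seconds * cardinality."""

    per_item_seconds: float
    smoothing: float = 0.5  # weight given to each new observation

    def estimate(self, cardinality: int) -> float:
        return self.per_item_seconds * cardinality

    def observe(self, cardinality: int, response_time: float) -> None:
        """Move the coefficient toward the rate seen in one execution."""
        if cardinality <= 0:
            return  # no per-item rate can be derived from an empty result
        observed_rate = response_time / cardinality
        self.per_item_seconds += self.smoothing * (
            observed_rate - self.per_item_seconds
        )


if __name__ == "__main__":
    model = LinearCostModel(per_item_seconds=0.01)
    print(model.estimate(100))   # estimate before any feedback
    model.observe(100, 2.0)      # the query actually took 2.0 s for 100 items
    print(model.estimate(100))   # estimate after one correction
```

The smoothing factor trades responsiveness against stability: a value near 1 trusts the latest execution almost entirely, while a value near 0 changes the model slowly.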

  10. Overall cost estimation on the mediator: TGV cost annotation
     • For one or a group of operations in a TGV, annotate with cost information
     [Figure: annotated TGV]

  11. Overall cost estimation on the mediator: Cost Annotation Tree (CAT)
     • Breadth-first traversal of CAT to associate the execution cost for each node
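The breadth-first traversal of a CAT can be sketched as follows. The slide does not specify the node structure or how costs are combined, so both are assumptions here: each hypothetical `CatNode` carries a local cost, and a node's total cost is taken to be its local cost plus the totals of its children. Processing the breadth-first order in reverse guarantees that children are handled before their parents.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CatNode:
    """Hypothetical CAT node: one operation (or group) and its own cost."""

    name: str
    local_cost: float
    children: list = field(default_factory=list)
    total_cost: float = 0.0


def breadth_first(root):
    """Return the nodes of the tree in breadth-first (level) order."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(node.children)
    return order


def annotate_costs(root):
    """Associate a total cost with every node and return the visit order."""
    order = breadth_first(root)
    # Children always come after their parent in breadth-first order,
    # so walking the order backwards visits children first.
    for node in reversed(order):
        node.total_cost = node.local_cost + sum(
            child.total_cost for child in node.children
        )
    return order


if __name__ == "__main__":
    plan = CatNode("join", 1.0, [CatNode("scan_a", 2.0), CatNode("scan_b", 3.0)])
    for node in annotate_costs(plan):
        print(node.name, node.total_cost)
```

Summation is only one possible combination rule. Operations that run in parallel on different wrappers might instead take the maximum of their children's costs.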

  12. Conclusion and future work
     • Contributions
     • First cost-based query optimization framework for XML-based mediation system
     • Generic language
     • Suitable for various search strategies
     • Future work
     • Cost model validation: Accuracy and performance
     • Calibrating cost of native XML Data sources
     • Search Strategy

  13. Thanks for your attention! Questions?
