
ObjectGlobe Open, Secure, and QoS-enhanced Distributed Query Processing

ObjectGlobe Open, Secure, and QoS-enhanced Distributed Query Processing. Donald Kossmann Technical University of Munich http://www3.in.tum.de Joint work with Alfons Kemper (Passau) and others. Outline. Background The ObjectGlobe Lookup Service (Security Aspects) QoS Management Summary.





Presentation Transcript


  1. ObjectGlobe Open, Secure, and QoS-enhanced Distributed Query Processing Donald Kossmann Technical University of Munich http://www3.in.tum.de Joint work with Alfons Kemper (Passau) and others

  2. Outline • Background • The ObjectGlobe Lookup Service • (Security Aspects) • QoS Management • Summary

  3. Query Processing on the Internet • Web servers, relational databases on the Web: centralized or limited query capabilities • Middleware Systems: a great deal of data shipping • Goals of ObjectGlobe: • integrate any kind of data • integrate any kind of query processing capabilities • bring query processing capabilities to the data

  4. Middleware for Query Processing [Figure: query plan with the user-defined operators thumbnail and wrap_S over S; Data-Provider A and Data-Provider B; annotation: heavy data shipping]

  5. Open Query Processing (Step 1) [Figure: operators thumbnail and wrap_S over S; Data-Provider A, Data-Provider B, and Fct-Provider; annotation: load functions]

  6. Open Query Processing (Step 2) [Figure: operators thumbnail and wrap_S over S; Data-Provider A, Data-Provider B, and Fct-Provider; annotation: load functions]

  7. Traveling from M to UCB [Figure: example query plan with the operators Top N, Route (route planner, "Routenplaner"), and Selection; roles shown: Cycle Provider, Function Provider, and Data Providers for flights and rental cars]

  8. Open QP with ObjectGlobe • Create an open marketplace for • data providers • cycle providers • function providers • Requirements • wrappers exist for all data of data providers • JVM runs on all cycle providers • fixed interface for operators of function providers

  9. Scenarios • Free Internet: everything is free and available for everybody • Restricted Internet: charge according to usage, quality, and timeliness; restrictions (e.g., age) • Intranet: everything is free and available for "insiders" • Outsourcing: charge for certain services (e.g., backup, business analyses)

  10. Challenges • Lookup Service • Find the relevant services • Security • Protect data and cycle providers from bad code • Quality of Service • What you pay is what you get

  11. ObjectGlobe Lookup-Service [Figure: providers register with the Lookup-Service; applications/users browse and search it; the Lookup-Service passes authorisation data, statistics, cost information, ... to the parser, optimizer, and execution engine]

  12. Description of Services • Providers register RDF or XML documents • There is a pre-defined schema to describe services • Data Providers: • Theme (e.g., Hotel) • Attributes (e.g., rate, location, category) • Access paths and wrappers • Characteristics of the server (e.g., availability) • Information for authorization • Statistics • ...

  13. Function Provider: • Signature (e.g., foo(int, int) -> int) • Information for authorization • Hardware requirements (e.g., 30 MB main memory) • Size of Java byte code • ... • Cycle Provider: • Hardware (e.g., 1 GB main memory) • Location and network connections / bandwidth • Information for authorization • ...

  14. XML Description of a Data Provider <DataProvider> <id> 4711 </id> <theme> <name> Hotel </name> <desc> All hotels you ever want </desc> </theme> <Attribute> <topic> city </topic> <type> string </type> </Attribute> ...
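A minimal sketch of how such a description could be read, using Python's standard XML parser. The slide's fragment is truncated, so the second attribute (rate) and the closing tag below are assumptions added only to make the example complete; the function name parse_data_provider is hypothetical and not part of ObjectGlobe.

```python
import xml.etree.ElementTree as ET

# Completed version of the slide's fragment. The "rate" attribute and the
# closing </DataProvider> tag are assumptions added for illustration.
DESCRIPTION = """
<DataProvider>
  <id> 4711 </id>
  <theme>
    <name> Hotel </name>
    <desc> All hotels you ever want </desc>
  </theme>
  <Attribute> <topic> city </topic> <type> string </type> </Attribute>
  <Attribute> <topic> rate </topic> <type> integer </type> </Attribute>
</DataProvider>
"""


def parse_data_provider(xml_text):
    """Turn a data provider description into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return {
        "id": int(root.findtext("id").strip()),
        "theme": root.findtext("theme/name").strip(),
        # Map each attribute topic to its declared type.
        "attributes": {
            a.findtext("topic").strip(): a.findtext("type").strip()
            for a in root.findall("Attribute")
        },
    }


provider = parse_data_provider(DESCRIPTION)
print(provider)
```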

  15. Lookup Query • Data Providers for Hotels that return the City and Rate of each hotel: search DataProvider d select d.uniqueId, d.attr.* where d.theme.name = "hotel" and d.attr.?.topic = "city" and d.attr.?.topic = "rate"
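The semantics of this lookup query can be sketched in a few lines of Python, reading d.attr.?.topic as "some attribute of d has this topic". The in-memory provider list, the lookup function, and the case-insensitive theme comparison are assumptions for illustration; the real Lookup Service evaluates such requests in an RDBMS.

```python
def lookup(providers, theme, required_topics):
    """Return (uniqueId, attributes) for every provider with the given
    theme that offers all required attribute topics."""
    wanted = set(required_topics)
    results = []
    for p in providers:
        topics = {a["topic"] for a in p["attr"]}
        # Assumption: theme names are compared case-insensitively.
        if p["theme"]["name"].lower() == theme.lower() and wanted <= topics:
            results.append((p["uniqueId"], p["attr"]))
    return results


# Hypothetical registered meta-data.
PROVIDERS = [
    {"uniqueId": 4711, "theme": {"name": "Hotel"},
     "attr": [{"topic": "city", "type": "string"},
              {"topic": "rate", "type": "integer"}]},
    {"uniqueId": 4712, "theme": {"name": "Hotel"},
     "attr": [{"topic": "city", "type": "string"}]},
    {"uniqueId": 4713, "theme": {"name": "Flight"},
     "attr": [{"topic": "city", "type": "string"},
              {"topic": "rate", "type": "integer"}]},
]

hits = lookup(PROVIDERS, "hotel", ["city", "rate"])
print([unique_id for unique_id, _ in hits])
```

Only the first provider qualifies: the second lacks a rate attribute and the third has a different theme.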

  16. Three-tier Architecture • Local Lookup-Servers • Keep copies of meta-data of services that are relevant for a particular organization or subsidiary • Evaluate Lookup requests for that organization • Relevance is determined by subscription rules (queries) • Public Lookup-Servers (Backbone) • Store all (public) meta-data • Store subscription rules of local Lookup-Servers • Notify local Lookup-Servers of changes • Users can browse in the public info of the backbone

  17. Three-tier Architecture [Figure: clients exchange queries and answers with local Lookup-Servers (Local LS); local Lookup-Servers send new rules to the public Lookup-Servers and receive answers; updates and inserts arrive at the public Lookup-Servers]

  18. Processing Lookup Requests • Local Lookup-Servers store meta-data in RDBMS • Translate Lookup request into SQL • Registering new services • Public Lookup-Servers store meta-data in RDBMS • Public Lookup-Servers store rules in RDBMS • Apply filter algorithm using RDBMS in order to find relevant local Lookup-Servers • Deletes and updates of services • Apply filter algorithm to find affected local Lookup-Servers (more complicated, however) • Principle: Map everything to RDBMS

  19. Storing XML Data in an RDBMS <person id="4711"> <name> Lilly Potter </name> <child> <person id="314"> <name> Harry Potter </name> </person> </child> </person> <person id="666"> <name> James Potter </name> <child> 314 </child> </person> [Figure: the corresponding document graph; root node 0 has two person edges to nodes 4711 and 666; each person node has a name edge to its name value and a child edge leading to person 314, whose name is Harry Potter]

  20. Edge Approach • Edge Table • Value Table (String) • Value Table (Integer)
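The table contents did not survive in the transcript, so the following is only a sketch of the general idea: every element becomes a row (source, label, target) in an edge table, and leaf text goes into a string or an integer value table keyed by the target node. The column names follow the SQL used later in the talk; the shred function, the node numbering, and the sample document are assumptions.

```python
import itertools
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Decompose an XML document into edge rows and value rows.

    edges:      (source, label, target), one row per element
    str_values: (id, value) for leaf elements with non-numeric text
    int_values: (id, value) for leaf elements with integer text
    """
    counter = itertools.count(1)
    edges, str_values, int_values = [], [], []

    def visit(element, source):
        target = next(counter)
        edges.append((source, element.tag, target))
        text = (element.text or "").strip()
        if len(element) == 0 and text:
            try:
                int_values.append((target, int(text)))
            except ValueError:
                str_values.append((target, text))
        for child in element:
            visit(child, target)

    visit(ET.fromstring(xml_text), 0)  # node 0 is the artificial root
    return edges, str_values, int_values


DOC = ("<person><name>Harry Potter</name><age>11</age>"
       "<hobby>Quidditch</hobby></person>")
edges, str_values, int_values = shred(DOC)
print(edges)
print(str_values, int_values)
```

Because the decomposition never looks at element names, it works for any schema, which is the "generic" advantage claimed in the summary slide.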

  21. XML Queries • Find the names of all persons who like to play Quidditch and are younger than 18 years: select $n where <person> <name> $n </name> <age> $a </age> <hobby> Quidditch </hobby> </person>, $a < 18 • Carry out pattern matching with the "document graph"

  22. Translation to SQL SELECT nv.value FROM Edge p, Edge n, Edge h, Value nv, Value hv WHERE p.label = 'person' AND p.target = n.source AND n.label = 'name' AND n.target = nv.id AND p.target = h.source AND h.label = 'hobby' AND h.target = hv.id AND hv.value = 'Quidditch'; Works essentially in the same way for the query language of our Lookup service.
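To see the translated query run, here is a small sketch on an in-memory SQLite database. The table names StringValue and IntValue, the sample rows, and the extra join for the age predicate (which the query on the slide leaves out) are assumptions added for illustration; the talk itself used Oracle 8i.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Edge (source INTEGER, label TEXT, target INTEGER);
CREATE TABLE StringValue (id INTEGER, value TEXT);
CREATE TABLE IntValue (id INTEGER, value INTEGER);
""")

# Two hypothetical persons, shredded into edges and values.
conn.executemany("INSERT INTO Edge VALUES (?, ?, ?)", [
    (0, "person", 1), (1, "name", 2), (1, "age", 3), (1, "hobby", 4),
    (0, "person", 5), (5, "name", 6), (5, "age", 7), (5, "hobby", 8),
])
conn.executemany("INSERT INTO StringValue VALUES (?, ?)", [
    (2, "Harry Potter"), (4, "Quidditch"),
    (6, "James Potter"), (8, "Quidditch"),
])
conn.executemany("INSERT INTO IntValue VALUES (?, ?)", [(3, 11), (7, 38)])

# The slide's join pattern, extended by one more edge for the age test.
QUERY = """
SELECT nv.value
FROM Edge p, Edge n, Edge h, Edge a,
     StringValue nv, StringValue hv, IntValue av
WHERE p.label = 'person'
  AND p.target = n.source AND n.label = 'name'  AND n.target = nv.id
  AND p.target = h.source AND h.label = 'hobby' AND h.target = hv.id
  AND hv.value = 'Quidditch'
  AND p.target = a.source AND a.label = 'age'   AND a.target = av.id
  AND av.value < 18
"""
names = [row[0] for row in conn.execute(QUERY)]
print(names)
```

Each element of the query pattern costs one self-join on the edge table, which is where the "many Joins" disadvantage in the summary slide comes from.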

  23. Publish & Subscribe Algorithm • Decompose subscription rules and store them in RDBMS of Public Lookup-Servers • SQL Join-Queries in order to match sub-rules with meta-data objects (Recall: meta-data is decomposed, too) • SQL Join-Queries in order to re-construct matching subscription rules from sub-rules

  24. Decomposition of Subscription Rules • Data Providers for Stock Market Information that cost less than 500 Dollars: search DataProvider d where d.theme.name = "Stock Market" and d.cost < 500 • Decomposition into three atomic rules: R1: search Theme t where t.name = "Stock Market"; R2: search DataProvider d where d.cost < 500; R3: search R1 a, R2 b where b.theme = a • Store these rules in RDBMS

  25. Matching Result of Join: (R1, O1); (R2, O2)
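The matching step can be sketched in plain Python, assuming the rules R1 to R3 of the previous slide and a small set of hypothetical meta-data objects (O1 to O3). In the talk this work is done with SQL join queries inside the RDBMS; the in-memory loops below only illustrate the logic, and all function names are made up for this sketch.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt}

# Atomic rules: (rule id, object type, attribute, operator, constant).
ATOMIC_RULES = [
    ("R1", "Theme", "name", "=", "Stock Market"),
    ("R2", "DataProvider", "cost", "<", 500),
]
# Join rule R3: an object matching R2 whose "theme" reference points at
# an object matching R1. Format: (rule id, left rule, right rule, reference).
JOIN_RULES = [("R3", "R1", "R2", "theme")]

# Hypothetical decomposed meta-data objects: id -> (type, attributes).
OBJECTS = {
    "O1": ("Theme", {"name": "Stock Market"}),
    "O2": ("DataProvider", {"cost": 300, "theme": "O1"}),
    "O3": ("DataProvider", {"cost": 900, "theme": "O1"}),
}


def match_atomic(rules, objects):
    """Return all (rule, object) pairs where the object satisfies the rule."""
    hits = set()
    for rule_id, obj_type, attr, op, const in rules:
        for obj_id, (typ, attrs) in objects.items():
            if typ == obj_type and attr in attrs and OPS[op](attrs[attr], const):
                hits.add((rule_id, obj_id))
    return hits


def match_joins(join_rules, atomic_hits, objects):
    """Combine atomic matches along object references."""
    hits = set()
    for rule_id, left, right, ref in join_rules:
        for hit_rule, obj_id in atomic_hits:
            if hit_rule != right:
                continue
            referenced = objects[obj_id][1].get(ref)
            if (left, referenced) in atomic_hits:
                hits.add((rule_id, obj_id))
    return hits


atomic_hits = match_atomic(ATOMIC_RULES, OBJECTS)
join_hits = match_joins(JOIN_RULES, atomic_hits, OBJECTS)
print(sorted(atomic_hits), sorted(join_hits))
```

With this data the atomic matches are (R1, O1) and (R2, O2), the same pairs shown as the result of the join on the slide, and R3 fires for O2 only.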

  26. Re-constructing Subscription Rules from matching atomic sub-rules • Store decomposition graph in RDBMS • higher-level and atomic rules are vertices • Top-level rules are so-called triggering rules; if they are affected, notify the local Lookup-Server (LLS) • Walk "bottom up" through decomposition graph • SQL-Join Query: for each pair of matching rules, find out whether they have a common parent • N.B. the decomposition graph is a binary directed acyclic graph
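A rough sketch of the bottom-up walk, assuming a tiny hypothetical decomposition graph in which every non-atomic rule has exactly two children. It tracks only which rules matched, not which objects they matched, and it replaces the SQL join on "common parent" with a simple fixpoint loop; rule and server names are invented.

```python
# Decomposition graph: parent rule -> (left child, right child).
PARENTS = {
    "R3": ("R1", "R2"),
    "R6": ("R3", "R4"),
    "R7": ("R1", "R5"),
}
# Top-level (triggering) rules and the local Lookup-Server to notify.
TRIGGERING = {"R6": "LLS-A", "R7": "LLS-B"}


def triggered_servers(parents, triggering, matched_atomic):
    """Propagate matches upwards and return the servers to notify."""
    matched = set(matched_atomic)
    changed = True
    while changed:
        changed = False
        for parent, (left, right) in parents.items():
            # A parent matches once both of its children match.
            if parent not in matched and left in matched and right in matched:
                matched.add(parent)
                changed = True
    return sorted(triggering[rule] for rule in matched if rule in triggering)


notify = triggered_servers(PARENTS, TRIGGERING, {"R1", "R2", "R4"})
print(notify)
```

Here R3 matches first, then R6, so LLS-A is notified; R7 never matches because its child R5 did not.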

  27. Preliminary Experiments • Synthetic benchmark database with 100,000 (different) subscription rules • Oracle 8i used in the Public Lookup Server • Batch updates are crucial

  28. Summary • Basic Principle: decompose rules and data • Advantages: • Generic, independent of schema • Very easy to implement, no administration needed • Exploit query capabilities of RDBMS • Need not worry about document boundaries • Finding common sub-rules is trivial • Disadvantage: • Sub-optimal query performance (many Joins), but probably sufficient if updates are batched

  29. Related Work • Lookup Services: Jini, UDDI, Plug & Play • Publish & Subscribe: • IR world • SIFT (Stanford) • XFilter (Berkeley) • LeSelect (INRIA) • Continuous Queries (Niagara, ...) • Storing and Indexing XML Data: ...

  30. Outline • Background • The ObjectGlobe Lookup Service • (Security Aspects) • QoS Management • Summary

  31. Security Requirements in ObjectGlobe • Protection of Data and Cycle Providers • Secure Communication • use SSL connections (authenticated and encrypted) • Authentication of Clients • passwords / certificates • digitally signed requests (query subplans) • Authorization control • data/cycle providers are autonomous • but register user privileges in lookup service

  32. Security of Data/Cycle Providers Secure sandbox Class loader Internal class loader Query 1 ObjectGlobe runtime system Class loader Query 2 Internet Class loader Query 3

  33. Privileged Built-in Operators for Disk or Network Access [Figure: an internal operator with access to a tmpfile, and an external operator running inside the sandbox]

  34. QoS Management • State of the Art: best-effort • Goal: users should be able to constrain • Cost of execution • Running time • Quality of the results • Initial approach (to get a feeling) • extended query optimization • Admission control • Monitoring and plan adaptions at execution time • Real solution: ???

  35. Quality Parameters • Cost of execution • $ • Running time • First tuple, last tuple, Nth tuple • Quality of the results • Number of results • Coverage: Number (or %) of data sources queried • Staleness of data • Cost as a function of coverage (-> Mariposa) • Cost as a function of #wheels (Mercedes)

  36. Quality of Service-Parameters [Figure: the desired space for query plans, bounded by a maximum response time, a maximum cost (€), and a minimum completeness]

  37. Extended Query Optimization Bottom-up dynamic programming query optimizer, standard costing etc., and the following extensions • Generate alternatives for each operator • Consider classes of equivalent providers • Extended Pruning, Heuristics for choosing a Winner • Enumerate "incomplete" UNIONs • Initialize QoS-Accounts
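The slides do not spell out the pruning rules or the winner heuristic (the latter is listed as an open question later on), so the following is only one plausible sketch: drop plans that violate the user's QoS constraints, prune plans dominated in all three dimensions, and pick the cheapest survivor. All names, numbers, and the "cheapest wins" heuristic are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    cost: float          # estimated monetary cost
    time: float          # estimated response time in seconds
    completeness: float  # fraction of data sources covered, 0..1


@dataclass(frozen=True)
class QoS:
    max_cost: float
    max_time: float
    min_completeness: float


def legal(plan, qos):
    """A plan is legal if it stays inside all user constraints."""
    return (plan.cost <= qos.max_cost
            and plan.time <= qos.max_time
            and plan.completeness >= qos.min_completeness)


def dominated(p, q):
    """True if q is at least as good as p everywhere and better somewhere."""
    no_worse = (q.cost <= p.cost and q.time <= p.time
                and q.completeness >= p.completeness)
    better = (q.cost < p.cost or q.time < p.time
              or q.completeness > p.completeness)
    return no_worse and better


def choose_winner(plans, qos):
    candidates = [p for p in plans if legal(p, qos)]
    frontier = [p for p in candidates
                if not any(dominated(p, q) for q in candidates)]
    # Assumed heuristic: the cheapest non-dominated legal plan wins.
    return min(frontier, key=lambda p: p.cost, default=None)


plans = [
    Plan("P", cost=2.0, time=30.0, completeness=0.3),  # too incomplete
    Plan("Q", cost=4.0, time=20.0, completeness=0.6),
    Plan("R", cost=9.0, time=10.0, completeness=1.0),  # too expensive
    Plan("T", cost=5.0, time=25.0, completeness=0.6),  # dominated by Q
]
qos = QoS(max_cost=8.0, max_time=60.0, min_completeness=0.4)
winner = choose_winner(plans, qos)
print(winner)
```

Unlike classical optimization, a single cost number is not enough here: two plans are only comparable when one is no worse in cost, time, and completeness at once.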

  38. Query Optimization: Quality of Service-Considerations [Figure: query evaluation plans P, Q, and R plotted by cost against completeness, with a region of illegal QEPs and a mark at 40% completeness]

  39. QoS-Annotated Query Plan [Figure: plan with the operators display, thumbnail, wrap_S, and two scans, placed on the hosts client, A.com, and B.com; QoS accounts annotate plan fragments with cost, timeO, and timeN]

  40. Optimization: Open Questions • Revisit heuristics to choose winning plan • Dynamic heuristics depending on workload and/or feedback • Reverse engineering a plan • How much data should a plan read if the cost should be $5.00? • Does query optimization matter?

  41. Admission Control & Monitoring • Admission Control: • Check assumptions of optimizer • Carried out at plan instantiation time for each plan fragment (set of operators at one site) • Monitoring: • Predict quality of results at the end of execution • Carried out by special Monitoring operators • Take actions if violations are detected • ECA rules specify actions

  42. Monitoring Operators • at the end of pipelines • are non-blocking / low cost • above "receive" ops • keep statistics for predictions • differentiate between "open" and "next" phase • communicate with each other for liveliness [Figure: plan with a Join over two receive operators, fed by send operators at sites A and B, with monitor operators placed along the pipelines]
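A minimal sketch of what such a monitoring operator might look like as a pass-through iterator. It separates the "open" phase (time until the first tuple) from the "next" phase (time per further tuple) and extrapolates linearly to the expected number of tuples. The class, the linear prediction model, and the injected clock are assumptions; a real deployment would pass something like time.monotonic as the clock.

```python
import itertools


class Monitor:
    """Pass-through operator that keeps statistics about its input."""

    def __init__(self, child, expected_tuples, clock):
        self.child = iter(child)
        self.expected = expected_tuples
        self.clock = clock
        self.opened_at = clock()  # start of the "open" phase
        self.first_at = None      # arrival of the first tuple
        self.last_at = None       # arrival of the latest tuple
        self.seen = 0

    def __iter__(self):
        return self

    def __next__(self):
        item = next(self.child)   # hand the tuple through unchanged
        self.last_at = self.clock()
        if self.first_at is None:
            self.first_at = self.last_at
        self.seen += 1
        return item

    def predicted_last_tuple_time(self):
        """Linear extrapolation of the time until the last tuple arrives."""
        if self.seen < 2:
            return None
        open_phase = self.first_at - self.opened_at
        per_tuple = (self.last_at - self.first_at) / (self.seen - 1)
        return open_phase + per_tuple * (self.expected - 1)


# Fake clock that advances by one time unit per call, for a repeatable demo.
ticks = itertools.count()
monitor = Monitor(range(100), expected_tuples=10,
                  clock=lambda: float(next(ticks)))
for _ in range(3):
    next(monitor)
prediction = monitor.predicted_last_tuple_time()
print(prediction)
```

A prediction that exceeds the user's response-time limit would be the kind of violation that triggers one of the plan adaptions.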

  43. Plan Adaptions • General: Abort, Restart / Reoptimize • Response Time Violation: • compressConnection • movePlan (w/wo state) • increasePriority • removeTempResults, ... • Coverage / Result Quality Violation: • addSubPlan • Cost Violation: • movePlan, decreasePriority, ...

  44. ECA Rules for Adaptions • if cost is high and coverage is low then abort • if cost is high and coverage is high then delResults • if rt is high and cost is low and network is critical then compress
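The three rules above can be written down as data and evaluated against monitoring statistics, as in the following sketch. The threshold values, the statistic names, and the actions function are hypothetical; the talk explicitly leaves the right thresholds as an open question.

```python
# Hypothetical thresholds separating "low" from "high".
THRESHOLDS = {"cost": 5.0, "coverage": 0.5, "rt": 30.0}

# The three rules from the slide: (condition, action).
RULES = [
    ({"cost": "high", "coverage": "low"}, "abort"),
    ({"cost": "high", "coverage": "high"}, "delResults"),
    ({"rt": "high", "cost": "low", "network": "critical"}, "compress"),
]


def level(name, value):
    """Classify a numeric statistic as "high" or "low"."""
    return "high" if value >= THRESHOLDS[name] else "low"


def actions(stats):
    """Return the actions of all rules whose condition holds."""
    state = {k: level(k, v) for k, v in stats.items() if k in THRESHOLDS}
    state["network"] = stats.get("network", "ok")
    return [action for condition, action in RULES
            if all(state.get(k) == v for k, v in condition.items())]


expensive = actions({"cost": 7.0, "coverage": 0.2, "rt": 10.0})
slow = actions({"cost": 1.0, "coverage": 0.8, "rt": 45.0,
                "network": "critical"})
print(expensive, slow)
```

The first call yields abort (high cost, low coverage) and the second yields compress (slow, cheap, critical network).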

  45. Plan Adaptions: Open Questions • What is the right mix of actions? • What are the right thresholds for the rules? • How to avoid the pork cycle ("Schweinezyklus")? • How to draw the right conclusions from the statistics produced by Monitoring? • What is the right granularity of actions? Plan vs. Operator vs. Tuple

  46. Project Status • First demo presented at SIGMOD 99 • Travel information • Four Web data sources (hotels, sights, train conns) • One function provider (travel routes, top N) • Three cycle providers (two in Europe, one in US) • Online-Demo: http://db.fmi.uni-passau.de/projects/OG • Current work: more experiments • Problem: getting data from Web sources is sloooow
