
Parallel Processing with Autonomous Databases in a Cluster System


Presentation Transcript


1. Parallel Processing with Autonomous Databases in a Cluster System
Stéphane Gançarski (1), Hubert Naacke (1), Esther Pacitti (2), Patrick Valduriez (3)
(1) LIP6, University Paris 6, Paris, France, FirstName.LastName@lip6.fr
(2) IRIN, Nantes, France, Esther.Pacitti@irin.univ-nantes.fr
(3) IRIN, Nantes, France, Patrick.Valduriez@inria.fr

2. ASP (Application Service Provider) context
[Diagram: user sites connect over TCP/IP to the provider's cluster of PCs, which hosts the applications (app1, app2, ...) and the DBMSs managing the databases (DB).]

3. Potential benefits
• For the users
  • No system administration
  • High availability
  • Security
• For the provider
  • Centralized management of applications and databases
  • Use of a cluster => reduced cost
  • Economy of scale: new services usable by every application

4. Challenge for the ASP
• Exploit the cluster architecture
  • To obtain good performance/cost through parallelism
  • Applications can be update-intensive (unlike a search engine)
• Without hurting application and database autonomy
  • Applications and databases should remain unchanged
These are conflicting objectives.

5. Solutions readily available
• Solution 1: TP monitor
  • Replicate databases at multiple nodes to increase parallelism
  • But requires interfacing the applications to the TP monitor
• Solution 2: Parallel DBMS (shared disk, shared cache or shared nothing)
  • Requires a heavy migration
  • Hurts database autonomy

6. Our (optimistic) approach
• Trade consistency for performance
• Capture application profiles and consistency requirements
• Replicate applications and databases at multiple nodes
  • Without any change to them
• Use consistency requirements to perform load balancing
• Detect and repair inconsistencies
  • Using the database logs

7. Outline
• Cluster architecture
• Replication model
• Transaction model
• Execution model
• Future work

8. Conceptual architecture
[Diagram: a request (user, app) first goes through authentication and authorization against the Directory, then gets a connection to the application (app 1), which in turn connects to the DBMS for querying the database (DB).]

9. Cluster architecture
[Diagram: requests from the Internet reach an application load balancer that routes them to the applications app 1 ... app n; a transaction load balancer, guided by the Directory, dispatches their transactions to the DBMS nodes, each holding a copy of the database (DB); a preventive replication manager propagates updates among the DBMSs and a conflicts manager resolves conflicts.]

10. Outline
• Cluster architecture
• Replication model
• Transaction model
• Execution model
• Future work

11. Symmetric replication
[Diagram: two master copies of the EMP table; each client can read or update either master, and replication propagates updates between them.]
• Increases performance and availability
• But may introduce inconsistencies

12. Update propagation to replicas
• Synchronous: all replicas are updated within the same transaction (2PC)
  • Replicas are always consistent
  • But it does not scale up
• Asynchronous: each replica is updated (refreshed) in a separate transaction. We support two variants:
  • Preventive (new solution): transactions are briefly delayed and ordered by their timestamps, so no conflict can occur
  • Optimistic: the most efficient, but can create
    • conflicts to resolve
    • divergence to control
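To make the preventive variant concrete, here is a minimal sketch of its ordering idea, assuming a known bound on network delay plus clock skew. The MAX_DELAY value, the queue API, and the transaction strings are illustrative assumptions, not the system's actual implementation.

    import heapq
    import time

    MAX_DELAY = 0.1  # assumed bound on network delay + clock skew (seconds)

    class PreventiveQueue:
        """Delays and orders refresh transactions by timestamp."""
        def __init__(self):
            self._heap = []  # (timestamp, transaction) pairs

        def submit(self, timestamp, transaction):
            heapq.heappush(self._heap, (timestamp, transaction))

        def pop_ready(self, now):
            # Yield transactions whose bounded delay has elapsed, in global
            # timestamp order; since every node applies the same order,
            # conflicting refresh transactions can never race.
            while self._heap and self._heap[0][0] + MAX_DELAY <= now:
                yield heapq.heappop(self._heap)[1]

    q = PreventiveQueue()
    q.submit(time.time(), "T1: UPDATE Stock ...")
    q.submit(time.time(), "T2: UPDATE Stock ...")
    time.sleep(MAX_DELAY)
    for t in q.pop_ready(time.time()):
        print("apply", t)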

13. Outline
• Cluster architecture
• Replication model
• Transaction model
• Execution model
• Future work

14. Example: case 1
[Diagram: T1 runs on one node, T2 on the other; each node forwards its changes to the other.]
• Case 1: T1 and T2 are data-independent or commutative
  • T1's changes are sent to N2
  • T2's changes are sent to N1

15. Example: case 2
[Diagram: T1 runs on one node; T2 and query Q1 run on the other.]
• Case 2: T1 and T2 perform conflicting updates
• Conflict prevention
• Conflict detection and resolution
  • Priority-based
• Resolution mode
  • Dirty read from Q1
  • Abort or compensation
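A toy sketch of the priority-based detection and resolution path: two transactions conflict when their write sets intersect and their updates do not commute, and the lower-priority one is aborted or compensated. The write-set representation and the priority field are assumptions for illustration only.

    def conflicts(writes1, writes2, commutative):
        # Conflicting updates: intersecting write sets that do not commute.
        return bool(writes1 & writes2) and not commutative

    def pick_loser(t1, t2):
        # Priority-based resolution: the lower-priority transaction loses
        # and is aborted or compensated (undone using the database log).
        return t1 if t1["priority"] < t2["priority"] else t2

    t1 = {"id": "T1", "writes": {("Stock", 1)}, "priority": 2}
    t2 = {"id": "T2", "writes": {("Stock", 1)}, "priority": 1}
    if conflicts(t1["writes"], t2["writes"], commutative=False):
        print("abort or compensate", pick_loser(t1, t2)["id"])  # -> T2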

16. Execution rules
• Request profile (for any query or transaction)
  • stored procedure + parameter values
  • user id, priority, access control rules
• Transaction profile
  • conflict class: the data it may read or write
  • compatibility with other transactions (disjoint or commutative)
• Integrity constraints
  • Max-table-change: {(Rel, max-#tuple)}
  • Max-tuple-change: {(Rel, {(att, max-value)})}
• Query requirements
  • precision level, tolerated divergence, ...
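One plausible encoding of these rules as data structures; the class and field names below are illustrative assumptions, not the system's actual schema.

    from dataclasses import dataclass, field

    @dataclass
    class TransactionProfile:
        conflict_class: set                              # data it may read or write
        commutes_with: set = field(default_factory=set)  # compatible transaction types

    @dataclass
    class IntegrityBounds:
        max_table_change: dict  # Max-table-change: {Rel: max-#tuple}
        max_tuple_change: dict  # Max-tuple-change: {Rel: {att: max-value}}

    @dataclass
    class QueryRequirements:
        precision: float        # precision level, 1.0 = no tolerated divergence

    # Example: the Decr procedure of slide 18 commutes with itself.
    decr = TransactionProfile(conflict_class={("Stock", "quantity")},
                              commutes_with={"Decr"})
    bounds = IntegrityBounds(max_table_change={"Stock": 100},
                             max_tuple_change={"Stock": {"quantity": 50}})
    q_req = QueryRequirements(precision=0.99)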

17. Transaction processing
[Diagram: a transaction arriving from the application is matched against the execution rules to generate a run-time policy; the resulting transaction execution plan (preventive or optimistic replication) is then executed on a node chosen according to the data placement and the current load.]

18. Example
• Stock(item, quantity, threshold)
• Decrease item id by q units:
  procedure Decr(id, q)
  UPDATE Stock
  SET quantity = quantity - q
  WHERE item = id;
• How many items to renew?
  query Q:
  SELECT count(item)
  FROM Stock
  WHERE quantity < threshold;
[Diagram: nodes 1 and 2 both start with Stock[1,30,10]. (1) The commutative updates run in parallel: Decr(1,15) on node 1 yields Stock[1,15,10], Decr(1,10) on node 2 yields Stock[1,20,10]. (2) Before synchronization each node answers Q = n - 1, which is acceptable if the query tolerates imprecision. (3) After synchronization both nodes hold Stock[1,5,10] and answer Q = n with 100% precision.]
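The following plain-Python sketch (illustrative only) shows why the two Decr calls commute: replaying them in either order converges both nodes to the same state, so they can run in parallel and synchronize later.

    stock = {1: {"quantity": 30, "threshold": 10}}  # Stock[1,30,10]

    def decr(db, item, q):
        # Mirrors: UPDATE Stock SET quantity = quantity - q WHERE item = id
        db[item]["quantity"] -= q

    node1 = {1: dict(stock[1])}
    node2 = {1: dict(stock[1])}

    decr(node1, 1, 15)  # local update on node 1 -> Stock[1,15,10]
    decr(node2, 1, 10)  # local update on node 2 -> Stock[1,20,10]

    # Synchronization: each node replays the other's update; because the
    # decrements commute, both nodes converge to the same state.
    decr(node1, 1, 10)
    decr(node2, 1, 15)
    assert node1[1]["quantity"] == node2[1]["quantity"] == 5  # Stock[1,5,10]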

19. Outline
• Cluster architecture
• Replication model
• Transaction model
• Execution model
• Future work

20. Execution model
• Problem statement: given the cluster's load, the data placement, and transaction T's execution plan, find the optimal node to run T
  • The cost function includes the cost of synchronizing replicas
• Step 1: select the data access method and the replication mode (preventive or optimistic)
• Step 2: select the best node among those supporting the access method selected at step 1, and run T

21. Load balancing with optimistic replication
• The choice of the node is based on:
  • data placement
  • node consistency: {Rel, Δ#tuple-max, {(att, Δvalue-max)}}
  • synchronization cost to meet the consistency requirements
    • apply refresh transactions such that the node's consistency after applying them meets the requirements
  • transaction execution cost
    • normalized estimated response time
    • node load: (load-avg, {(running transactions, elapsed-time)})
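A rough sketch of step 2, picking the node that minimizes synchronization plus execution cost. The cost shapes and field names are assumptions; the slides only list the inputs to the cost model.

    def sync_cost(node, trans):
        # Cost of applying the pending refresh transactions needed to bring
        # the node's consistency within trans's requirements (here simplified
        # to: every pending refresh touching data that trans reads).
        return sum(r["est_time"] for r in node["pending"]
                   if r["writes"] & trans["reads"])

    def exec_cost(node, trans):
        # Normalized estimated response time, inflated by the node's load.
        return trans["est_time"] * (1 + node["load_avg"])

    def best_node(nodes, trans):
        return min(nodes, key=lambda n: sync_cost(n, trans) + exec_cost(n, trans))

    nodes = [
        {"name": "N1", "load_avg": 0.2,
         "pending": [{"writes": {"Stock"}, "est_time": 1.0}]},
        {"name": "N2", "load_avg": 0.8, "pending": []},
    ]
    trans = {"reads": {"Stock"}, "est_time": 1.0}
    print(best_node(nodes, trans)["name"])  # N2: no sync needed despite higher load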

22. Execution example
[Diagram: transactions T1, T2 and queries Q1, Q2 are waiting to be placed; nodes N1 and N2 are both consistent.]

23. Execution example
[Diagram: T1 is placed on one node; the other node now has imprecision = 1, with T1 as the transaction to synchronize.]

24. Execution example
[Diagram: T2 is placed on the other node; the first node now has imprecision = 1, with T2 as the transaction to synchronize.]

25. Execution example
[Diagram: query Q1 is routed to a node that meets its precision requirement.]

26. Execution example
[Diagram: query Q2 is routed after T1 has been synchronized on the chosen node.]

27. Experiments
• Implementation
  • LIP6 cluster (Oracle 8i / Linux)
  • benchmarking with TPC-C: 500 MB - 2 GB
  • interconnection network: 1 Gb/s
  • 5 nodes
• Objectives
  • measure the benefit on transaction response time
  • measure the benefit of load balancing for transactions with low consistency requirements

28. Validation for a hot-spot load
• Incoming load with a periodic hot spot:
  • 10 simultaneous transaction requests
  • each request lasts T/4, once per period of T
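An illustrative load generator for this workload (the period T and the stand-in transaction body are assumptions, not the authors' benchmark harness):

    import threading
    import time

    T = 8.0     # period in seconds (assumed value)
    BURST = 10  # simultaneous transaction requests per hot spot

    def run_transaction(duration):
        time.sleep(duration)  # stand-in for submitting a TPC-C transaction

    def hot_spot_load(periods=3):
        for _ in range(periods):
            # Fire 10 concurrent requests, each lasting T/4 ...
            workers = [threading.Thread(target=run_transaction, args=(T / 4,))
                       for _ in range(BURST)]
            for w in workers:
                w.start()
            for w in workers:
                w.join()
            # ... then stay idle for the rest of the period.
            time.sleep(T - T / 4)

    hot_spot_load()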

29. Hot-spot load: results
• X axis: number of nodes, from 1 to 4
• Y axis: average response time during the hot spot
• Benefit on response time
  • a factor of 2 (with 4 nodes)
• Even better:
  • if synchronization starts earlier
    • improve low-load detection (end of the hot spot)
  • if synchronization is faster than the original transactions
    • using the log to get a transaction's update set

30. Future work
• Validation by simulation up to 64 nodes
  • measure scale-up
  • measure directory access contention
• Implement divergence control
  • capture user and transaction profiles (semi-automatically)
  • generate execution rules (by inference or statistics)
  • improve node precision (n dimensions)
• Implement conflict detection and resolution
