
Inter-Transactional Parallelism for Persistent Distributed Shared Virtual Memory - Implementation and Performance -

金 泰勇, Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University


Presentation Transcript


  1. Inter-Transactional Parallelism for Persistent Distributed Shared Virtual Memory - Implementation and Performance - 金 泰勇, Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University

  2. Outline
  • Introduction
  • Overview of WAKASHI
    • Network Of Workstations
    • Persistent Distributed Shared Virtual Memory
  • Generalized Distributed Lock Protocol
    • Algorithm
    • Related Work
    • Evaluation (M-OO7 benchmark)
  • Cost-based Distributed Transaction Coordinator
    • Architecture and Algorithm
    • Related Work
    • Evaluation (TPC-C benchmark)
  • Conclusion and Future Work

  3. Introduction (1)
  • ShusseUo - an Object Database Management Group (ODMG) compliant object database system, built as a layered architecture:
    • WARASA - OQL Compiler, ODL Pre-Processor
    • INADA - ODMG Object Model, Persistent Object Manipulation Language (C++ Binding)
    • WAKASHI - Persistent Distributed Shared Virtual Memory, Transaction Management
    • Operating System

  4. Introduction (2)
  • Network Of Workstations (NOW)
  (Diagram: a NOW is a set of workstations, each with its own CPU and disk, connected by a Local Area Network)

  5. Introduction (3)
  • Characteristics of NOW [Berkeley NOW group]
    • Better performance for sequential applications than an individual workstation
    • Most sequential applications can be divided into several independent parts, and these parts can be executed in parallel on a NOW
    • Better price/performance for parallel applications than Massively Parallel Processors (MPP)
  • We use a NOW as the hardware environment of the database server

  6. Introduction (4)
  • Two Transactional Parallelisms on a NOW
  (Diagram: Inter-Transactional Parallelism runs independent transactions T1 and T2 in parallel; Intra-Transactional Parallelism splits a single transaction into sub-transactions, e.g. T1 into T1a, T1b, T1c and T2 into T2a, T2b, and runs those in parallel)

  7. Introduction (5)
  • Distributed Shared Virtual Memory (DSVM) [Kai Li 1989, Princeton University]
  (Diagram: the DSVM space spans the memories and disks of all workstations)
  • Hardware-level DSVM: [DASH, KSR1]
  • Software-level DSVM: [Munin, TreadMarks, WAKASHI]

  8. Introduction (6)
  • Persistent Distributed Shared Virtual Memory (PDSVM)
    • Transaction-integrated DSVM: all DSVM accesses are enclosed in transactions
    • Data in the DSVM space is kept persistent
  • The problems PDSVM has to face
    • Utilize the resources efficiently
    • Decrease the cost of communication among different sites
  • Two main factors determine the communication cost: the message size and the number of messages, where Cost(one n KB message) < n × Cost(1 KB message), so batching data into fewer, larger messages is cheaper (see the sketch below)
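The inequality above is the reason WAKASHI tries to ship page data in few, large messages rather than many small ones. A minimal C++ sketch of that intuition, using purely illustrative constants rather than measured WAKASHI values, might look like this:

```cpp
// Message cost = fixed per-message overhead + size-proportional cost.
// One n-KB message pays the overhead once; n 1-KB messages pay it n times.
// The constants below are illustrative assumptions, not measurements.
#include <iostream>

constexpr double kPerMessageOverhead = 0.5;  // ms: connection, header, syscall
constexpr double kPerKilobyteCost    = 0.1;  // ms per KB on the wire

double cost(int kilobytes, int messages) {
    return messages * kPerMessageOverhead + kilobytes * kPerKilobyteCost;
}

int main() {
    const int n = 8;  // 8 KB of page data to transfer
    std::cout << "one " << n << " KB message: " << cost(n, 1) << " ms\n";
    std::cout << n << " separate 1 KB messages: " << cost(n, n) << " ms\n";
}
```

With these assumed constants, one 8 KB message costs 1.3 ms while eight 1 KB messages cost 4.8 ms, which is the shape of the inequality the slide states.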

  9. Introduction (7)
  • PDSVM as implemented in WAKASHI
  (Diagram: Primary Sites hold both a DSVM mapping and a disk mapping; Mirror Sites hold only a DSVM mapping)

  10. Introduction (8)
  • PDSVM data access patterns and cost at the Primary Site
  (Diagram: Primary Read and Primary Write of a page p only swap p between memory and disk - "swap p into memory" on access, "swap p out to disk" on eviction - with no messages to other sites)

  11. Introduction (8)
  • PDSVM data access patterns at a Mirror Site
  (Diagram: Read(p) or Write(p) at a mirror site sends a remote_page_lock message to the primary site, which swaps p in from disk if necessary and ships it back in a page_transfer message; the mirror site then swaps p into its own memory. A minimal sketch of this path follows.)
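The following is a hedged sketch (not the actual WAKASHI code) of the mirror-site access path described above; the two messaging functions are hypothetical stubs standing in for whatever primitives WAKASHI really uses, but the structure shows why every first access from a mirror site costs one remote_page_lock round trip plus one page_transfer:

```cpp
// Mirror-site page access under PDSVM: lock request to the primary site,
// page contents shipped back, page mapped into the local DSVM space.
// send_remote_page_lock / receive_page_transfer are hypothetical stubs.
#include <cstdio>

enum class LockMode { Read, Write };

struct Page { int id; bool resident = false; };

void send_remote_page_lock(int page_id, LockMode mode) {
    std::printf("remote_page_lock(page=%d, %s)\n",
                page_id, mode == LockMode::Write ? "write" : "read");
}

void receive_page_transfer(int page_id) {
    std::printf("page_transfer(page=%d)\n", page_id);
}

void access_at_mirror(Page& p, LockMode mode) {
    send_remote_page_lock(p.id, mode);   // one message to the primary site
    receive_page_transfer(p.id);         // one message (plus the page) back
    p.resident = true;                   // page now mapped in local memory
}

int main() {
    Page p{42};
    access_at_mirror(p, LockMode::Read);
    access_at_mirror(p, LockMode::Write);
}
```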

  12. Generalized Distributed Lock (GDL) Protocol

  13. Generalized Distributed Lock Protocol of WAKASHI
  • Lock Release mode
  (Diagram: a timeline of transactions running w(p) at the Primary site and r(p) at two Mirror sites; the page locks a transaction holds are released when it ends, so each later transaction must acquire them again)
  • Message types: Remote Page Lock, Page Transfer, All Page Lock Release, Remote Page Lock Forward

  14. Generalized Distributed Lock Protocol of WAKASHI
  • Lock Retain mode
  (Diagram: the same timeline as slide 13, but the page locks a transaction acquires are retained at its site after it ends, so subsequent transactions on the same pages send no further lock messages; the counting sketch below contrasts the two modes)
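To make the difference concrete, here is a small, hedged sketch that only counts messages; the MirrorSite type, its message counter, and the three-transaction workload are illustrative assumptions, not part of WAKASHI:

```cpp
// Contrast Lock Release with (read) Lock Retain at a mirror site.
// Under Release every transaction re-acquires the remote page lock;
// under Retain a later read of the same page reuses the held lock.
#include <cstdio>
#include <set>

enum class Policy { Release, Retain };

struct MirrorSite {
    Policy policy;
    std::set<int> retained;   // pages whose read lock is still held locally
    int messages = 0;

    void read(int page) {
        if (!retained.count(page)) {
            ++messages;               // remote_page_lock (+ page_transfer)
            retained.insert(page);
        }
    }
    void transaction_end() {
        if (policy == Policy::Release)
            retained.clear();         // all page locks released at commit
    }
};

int main() {
    const Policy policies[] = {Policy::Release, Policy::Retain};
    for (Policy pol : policies) {
        MirrorSite site{pol};
        for (int t = 0; t < 3; ++t) {   // three read-only transactions on page 7
            site.read(7);
            site.transaction_end();
        }
        std::printf("%s: %d remote lock requests\n",
                    pol == Policy::Release ? "release" : "retain", site.messages);
    }
}
```

Under this toy workload the release policy sends three lock requests and the retain policy sends one, which is exactly the saving GDL's retain modes aim for on read-mostly pages.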

  15. Generalized Distributed Lock Protocol of WAKASHI
  • Retain Mode: a pair of a Commit Mode and an Abort Mode
  (Diagram: at Transaction Commit or Transaction Abort, a held read lock or write lock is either released (Lock Release) or kept as a Read Lock Retain or Write Lock Retain)
  • Retain Modes: LRL_LRL, RLRT_LRL, WLRT_LRL, LRL_RLRT, RLRT_RLRT, WLRT_RLRT (LRL = Lock Release, RLRT = Read Lock Retain, WLRT = Write Lock Retain)

  16. Generalized Distributed Lock Protocol of WAKASHI
  • Attaching retain modes to a transaction:
    Transaction_Begin(<h1, mode_1>, <h2, mode_2>, …);
    ...
    READ(h1, p1);
    ...
    READ(h2, p2);
    ...
    WRITE(h3, p3);
    Transaction_Commit();
  • Retain modes are decided when a transaction begins
  • A <HID, RETAIN_MODE> pair attaches a retain mode to a heap
  • The attached retain modes are only valid for the pages actually accessed during the transaction (a compilable sketch follows)
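The slide's pseudocode can be fleshed out as follows; this is a hedged sketch, not the real WAKASHI API - the function signatures, the HeapId/PageId aliases, and the RetainMode enum (named after the modes on slide 15) are assumptions made for illustration:

```cpp
// Attaching a retain mode to each heap at transaction begin.
// <commit-mode>_<abort-mode>: LRL = Lock Release, RLRT = Read Lock Retain,
// WLRT = Write Lock Retain.
#include <utility>
#include <vector>

enum class RetainMode { LRL_LRL, RLRT_LRL, WLRT_LRL,
                        LRL_RLRT, RLRT_RLRT, WLRT_RLRT };

using HeapId = int;
using PageId = int;

// Hypothetical stubs standing in for the WAKASHI transaction interface.
void Transaction_Begin(const std::vector<std::pair<HeapId, RetainMode>>&) {}
void READ(HeapId, PageId) {}
void WRITE(HeapId, PageId) {}
void Transaction_Commit() {}

int main() {
    HeapId h1 = 1, h2 = 2, h3 = 3;

    // One retain mode per heap, fixed at transaction begin; it applies only
    // to the pages this transaction actually touches in that heap.
    Transaction_Begin({{h1, RetainMode::RLRT_LRL},
                       {h2, RetainMode::LRL_LRL},
                       {h3, RetainMode::WLRT_RLRT}});
    READ(h1, /*page*/ 10);
    READ(h2, /*page*/ 20);
    WRITE(h3, /*page*/ 30);
    Transaction_Commit();
}
```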

  17. Generalized Distributed Lock Protocol of WAKASHI
  • Related Work: Lazy Release Consistency (LRC) Protocol [Rice University, 1992]
    • Locks are managed by a lock manager
    • In the client program, locks are handled by two kinds of primitive: Acquire and Release
    • A lock is not released at the manager immediately when the client program releases it
    • Their measurements show that LRC outperforms the ordinary release consistency protocol on some applications
    • In LRC, the lock primitives are set explicitly by the client programmer
    • LRC is designed for distributed parallel computing applications

  18. Generalized Distributed Lock Protocol of WAKASHI
  • Related Work: Cache Consistency Protocols in the Client-Server Database Architecture
    • Caching 2-Phase Locking (C2PL) Approach [Franklin 1992]
      • A lock is granted when a cached page is about to be accessed
      • The lock is released when the transaction ends
    • Callback (CB) Approach [Wisconsin Univ. 1992, 1997]: all caches remain valid until a callback message arrives
      • Callback-Read: when a cached page is about to be updated, callbacks are sent to the other clients that hold READ copies of it
      • Callback-All: when a cached page is about to be accessed, callbacks are sent to the other clients holding copies that conflict with the access
  • Differences from GDL
    • The architectures are different
    • GDL supports more lock processing modes

  19. Evaluation of GDL
  • Multi-User OO7 benchmark
  (Diagram: the M-OO7 database - each user owns a Private Module and all users share one Shared Module; a module has a Module Assembly hierarchy (7 levels) of Sub Modules, whose leaves are Composite Parts built from Atomic Parts; the diagram also labels a Mega Module)

  20. Evaluation of GDL
  • Transaction types
    • Read Only: traverse a module without any update
    • Update: traverse and update each atomic part
  • Operation Configuration Vector (OCV): <Pr, Pw, Sr, Sw>
    • Pr/Pw is the probability of a read/write operation occurring on a private module
    • Sr/Sw is the probability of a read/write operation occurring on the shared module
  • OCV types (a sampling sketch follows)
    • Read Only: <50, 0, 50, 0>
    • 10% Update: <45, 5, 45, 5>
    • 50% Update: <25, 25, 25, 25>
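For concreteness, a client driving the benchmark could draw each operation from the OCV like this; the sketch below is an illustrative assumption about how the probabilities are used, and the operation strings are placeholders:

```cpp
// Pick the next M-OO7 operation from an Operation Configuration Vector
// <Pr, Pw, Sr, Sw>, interpreted as percentages that sum to 100.
#include <cstdio>
#include <random>

struct OCV { int Pr, Pw, Sr, Sw; };   // private read/write, shared read/write

const char* pick(const OCV& v, std::mt19937& rng) {
    std::uniform_int_distribution<int> d(1, 100);
    const int x = d(rng);
    if (x <= v.Pr)               return "read private module";
    if (x <= v.Pr + v.Pw)        return "update private module";
    if (x <= v.Pr + v.Pw + v.Sr) return "read shared module";
    return "update shared module";
}

int main() {
    const OCV tenPercentUpdate{45, 5, 45, 5};   // the "10% Update" vector
    std::mt19937 rng(1);
    for (int i = 0; i < 5; ++i)
        std::printf("%s\n", pick(tenPercentUpdate, rng));
}
```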

  21. Evaluation of GDL
  • Testbed: seven Sun Ultra5 workstations connected by 100 Mbit Ethernet
    • CPU: Super-Sparc (400 MHz), Main Memory: 128 MB, Disk: IBM DJNA (22 GB)

  22. Evaluation of GDL
  • Non-Clustering Plan
  (Diagram: six Ultra5 sites and the database)
  • All of the modules are located in one heap
  • The heap is located at one site

  23. Evaluation Result • Read Only

  24. Evaluation Result • 10% Update

  25. Evaluation Result • 50% Update

  26. Evaluation of GDL
  • Clustering Plan
  (Diagram: seven Ultra5 sites; six hold one Private Module each and one holds the Shared Module)
  • Each module is located in 2 heaps (ReadOnly, Update)
  • Private Modules are distributed over all of the sites
  • The Shared Module is allocated at one site

  27. Evaluation Result • Read Only

  28. Evaluation Result • Number of the messages at Read Only

  29. Evaluation Result • 10% Update

  30. Evaluation Result • Number of the messages at 10% Update

  31. Evaluation Result • Number of the remote page lock messages at 10% Update

  32. Evaluation Result • Number of the remote page lock forward messages at 10% Update

  33. Evaluation Result • 50% Update

  34. Evaluation Result • Number of the messages at 50% Update

  35. Evaluation Result • Number of the remote page lock messages at 50% Update

  36. Evaluation Result • Number of the remote page lock forward messages at 50% Update

  37. Cost-based Distributed Transaction Coordinator

  38. Cost-based Distributed Transaction Coordinator
  • Transaction Coordinator goals
    • Utilize all of the workstations efficiently
    • Execute the transactions at lower cost
  • The cost of a transaction is determined by
    • The type of the transaction
    • The site where the transaction runs
  (Diagram: submitted transactions pass through the Transaction Coordinator, which dispatches them to the workstations)

  39. Cost-based Distributed Transaction Coordinator - Architecture
  (Diagram: submitted transactions T1-T4 enter the Transaction Pool of the Cost-based Transaction Coordinator, which also contains the Transaction Scheduler, Database Distribution Manager, Load Information Manager, Execute Element Manager, and an Adapter; Dispatchers forward coordinated transactions to the Execute Elements at the individual sites)
  • Functionalities of an Execute Element
    • Execute the coordinated transactions
    • Collect the load information of the executed transactions
    • Feed the load information of the executed transactions back to the Transaction Coordinator
  • Transaction Placement Policy: decides how to coordinate a transaction when it is submitted to the TC
  • Transaction Scheduling Policy: decides which blocked transaction in the Transaction Pool is executed at which site when a transaction finishes at an Execute Element

  40. Cost-based Approaches
  • Cost-based Approach 1: Static approach (CTC-Static)
    • Static Coordinator Description File (SCDF): entries of the form T → S
      • T is the type ID of a transaction
      • S is the IP address of the host where transactions of type T are executed
    • Transaction Placement Policy: select an idle EE for the submitted transaction according to the SCDF
    • Transaction Scheduling Policy: select the next blocked transaction, also according to the SCDF (a lookup sketch follows)
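A hedged sketch of the SCDF lookup is below; the file contents, the map-based representation, and the IP addresses are illustrative assumptions, since the slide does not give the concrete file format:

```cpp
// CTC-Static placement: the SCDF statically maps a transaction type T to the
// host S that executes it, so placement is a plain table lookup.
#include <cstdio>
#include <map>
#include <string>

int main() {
    // Assumed SCDF contents: transaction type ID -> IP address of the host.
    const std::map<int, std::string> scdf = {
        {1, "192.168.0.11"},
        {2, "192.168.0.12"},
        {3, "192.168.0.13"},
    };

    const int submitted_type = 2;
    // Placement: run the transaction at the (single) host assigned to its
    // type, provided an Execute Element there is idle.
    std::printf("type %d -> %s\n", submitted_type, scdf.at(submitted_type).c_str());
}
```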

  41. Cost-based Approaches
  • Cost-based Approach 2: Transaction Priority Oriented Approach (CTC-TPOA)
    • Transaction Placement Policy: look through all of the EEs to find an idle EE to execute the submitted transaction
    • Transaction Scheduling Policy: look through all of the EEs to find an idle EE to execute the blocked transaction whose arrival time is the earliest

  42. Cost-based Approaches
  • Cost-based Approach 3: Low-Cost Oriented Approach (CTC-LCOA)
    • Priority Value (PV): PV(t, s) = Cost(t, s) - PreemptionFactor(t)
      • Cost(t, s) is the cost of executing transaction t at host s
      • If a transaction whose arrival time is later than that of t is coordinated before t, the Preemption Factor of t is increased by k
    • Transaction Placement Policy: the same as in CTC-TPOA
    • Transaction Scheduling Policy: for the site whose EE has just finished a transaction, look through all of the blocked transactions and coordinate the one with the lowest PV to that site (see the sketch below)
    • The distribution of active EEs is fixed in CTC-LCOA
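The scheduling step can be sketched as follows; this is a hedged illustration, with made-up cost values and k = 1, of how PV(t, s) = Cost(t, s) - PreemptionFactor(t) picks the cheapest blocked transaction for a freed site while the preemption factor keeps passed-over transactions from starving:

```cpp
// CTC-LCOA scheduling sketch: when a site's Execute Element becomes free,
// run the blocked transaction with the lowest PV(t, site); every transaction
// that arrived earlier but was passed over gets its preemption factor
// increased by k, which lowers its PV for later decisions.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Txn {
    int id;
    int arrival;                   // arrival order (smaller = earlier)
    std::vector<double> cost;      // cost[s] = cost of running at site s
    double preemption = 0.0;
};

constexpr double k = 1.0;          // illustrative preemption increment

int schedule(std::vector<Txn>& pool, int site) {
    auto pv = [&](const Txn& t) { return t.cost[site] - t.preemption; };

    std::size_t best = 0;
    for (std::size_t i = 1; i < pool.size(); ++i)
        if (pv(pool[i]) < pv(pool[best])) best = i;

    // Transactions that arrived earlier but were passed over become more
    // urgent next time.
    for (auto& t : pool)
        if (t.arrival < pool[best].arrival) t.preemption += k;

    const int chosen = pool[best].id;
    pool.erase(pool.begin() + static_cast<std::ptrdiff_t>(best));
    return chosen;
}

int main() {
    std::vector<Txn> pool = {
        {1, 0, {5.0, 9.0}},
        {2, 1, {2.0, 3.0}},
        {3, 2, {8.0, 1.0}},
    };
    std::printf("freed site 0 runs transaction %d\n", schedule(pool, 0));
    std::printf("freed site 1 runs transaction %d\n", schedule(pool, 1));
}
```

With these made-up values, the freed site 0 runs transaction 2 and site 1 then runs transaction 3, while transaction 1's growing preemption factor guarantees it is chosen soon despite its higher cost.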

  43. Related Work
  • Degree of Multi-Programming (DMP) based algorithm [ObjectStore, 1991]
    • Limits the multi-programming level (the number of concurrent transactions)
  • Feedback-based algorithms
    • Throughput feedback based algorithm [VLDB, 1991]: addresses resource contention
    • Conflict-ratio feedback based adaptive transaction scheduling algorithm [VLDB, 1992]: addresses data contention
  • Resource contention: the currently available resources do not satisfy the required resources
  • Data contention: excessive lock conflicts degrade the performance significantly

  44. Evaluation (1)
  • TPC-C benchmark model
  • TPC-C is an Online Transaction Processing benchmark
  • Database schema
  (Diagram: the TPC-C entities - Warehouse, District (10 per warehouse), Customer (3,000 per district), History (1+), Order (1+), Order-Line (5-15 per order), New-Order (0-1 per order), Stock (100,000 per warehouse), and Item (100,000))

  45. Evaluation (2)
  • Transaction types
    • New-Order (n/a)
    • Payment (43%)
    • Order-Status (4%)
    • Delivery (4%)
    • Stock-Level (4%)
  • The measured throughput of New-Order transactions (MQTH) is reported as the performance result

  46. Evaluation (3)
  • Testbed: 1 Coordinator Site and 16 Execute Element Sites, all Sun Ultra5 workstations connected by 100 Mbit Ethernet
    • CPU: Super-Sparc (400 MHz), Main Memory: 128 MB, Disk: IBM DJNA (22 GB)

  47. Evaluation (4) • MQTH Result

  48. Evaluation (5) • Rate of Primary Accessed Pages

  49. Evaluation (6) • Distribution of Active EEs (MPL=32)

  50. Evaluation (7)
  • Why is the distribution of active Execute Elements in CTC-Static unbalanced?
  (Diagram: with the SCDF T1→S1, T2→S2, T3→S3, MPL = 3, and the submitted stream T1, T2, T2, T2, T2, T3, every T2 must run at site S2, so the Execute Elements at S2 are saturated while most of those at S1 and S3 stay idle)
