OLTP on Hardware Islands

Presentation Transcript


  1. OLTP on Hardware Islands • Danica Porobic, Ippokratis Pandis*, Miguel Branco, Pınar Tözün, Anastasia Ailamaki • Data-Intensive Application and Systems Lab, EPFL • *IBM Research - Almaden

  2. Hardware topologies have changed • SMP: high inter-core latency (~100ns) • CMP: low on-chip latency (~10ns) • SMP of CMP: variable latency (10-100ns); each multicore socket forms an Island • Should we favor shared-nothing software configurations? • Variable latencies affect performance & predictability

  3. The Dreaded Distributed Transactions • Shared-nothing OLTP configurations need to execute distributed transactions • Extra lock holding time • Extra system calls for communication • Extra logging • Extra 2PC logic/instructions • At least shared-everything provides robust performance • Some opportunities for constructive interference (e.g. Aether logging) • (Figure: throughput of shared-nothing vs. shared-everything as the % of multisite transactions grows) • Shared-nothing configurations are not the panacea!
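The costs itemized above stem largely from two-phase commit and its forced log writes. A minimal in-process sketch of the 2PC bookkeeping; the Participant type and its methods are invented for illustration, not Shore-MT's API:

```cpp
// Hedged sketch of two-phase commit, illustrating the extra messages,
// log writes, and lock holding time listed above. Not Shore-MT code.
#include <iostream>
#include <string>
#include <vector>

struct Participant {
    std::string name;
    // Phase 1: a participant force-logs PREPARE before voting yes.
    bool prepare() {
        std::cout << name << ": force-log PREPARE, vote yes\n";
        return true; // a real participant may vote no, aborting globally
    }
    // Phase 2: log the decision and finally release the locks.
    void commit() { std::cout << name << ": log COMMIT, release locks\n"; }
};

int main() {
    std::vector<Participant> sites{{"site-1"}, {"site-2"}, {"site-3"}};

    // Round 1 of messages: PREPARE to every participant, gather votes.
    bool all_yes = true;
    for (auto& s : sites) all_yes = all_yes && s.prepare();

    // The coordinator force-logs the global decision between the rounds.
    std::cout << "coordinator: force-log "
              << (all_yes ? "COMMIT" : "ABORT") << '\n';

    // Round 2 of messages: deliver the decision. Locks were held through
    // both rounds: the extra lock holding time of distributed xcts.
    if (all_yes)
        for (auto& s : sites) s.commit();
}
```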

  4. Deploying OLTP on Hardware Islands • Shared-everything: single address space, easy to program; but random read-write accesses, many synchronization points, sensitive to latency • Shared-nothing: thread-local accesses; but distributed transactions, sensitive to skew • HW + Workload -> Optimal OLTP configuration

  5. Outline • Introduction • Hardware Islands • OLTP on Hardware Islands • Conclusions and future work

  6. Multisocket multicores • Within a core (private L1/L2): <10 cycles • Within a socket, via the shared L3: ~50 cycles • Across sockets, via inter-socket links: ~500 cycles • (Figure: two multicore sockets with per-core L1/L2 caches, shared L3, memory controllers, and inter-socket links) • Communication latency varies significantly
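One way to see this gap on a concrete machine is to time dependent loads over memory placed on the local vs. a remote NUMA node. A sketch using Linux's libnuma (link with -lnuma); the node numbers and buffer size are assumptions, not the paper's methodology:

```cpp
// Pointer-chasing latency probe: local vs. remote NUMA node.
#include <numa.h>
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

static double chase(size_t* buf, size_t n, size_t steps) {
    // Random cyclic permutation: each load depends on the previous one,
    // defeating the prefetcher and exposing raw access latency.
    std::vector<size_t> idx(n);
    std::iota(idx.begin(), idx.end(), 0);
    std::shuffle(idx.begin(), idx.end(), std::mt19937(42));
    for (size_t i = 0; i < n; i++) buf[idx[i]] = idx[(i + 1) % n];

    size_t cur = idx[0];
    auto t0 = std::chrono::steady_clock::now();
    for (size_t i = 0; i < steps; i++) cur = buf[cur];
    auto t1 = std::chrono::steady_clock::now();
    if (cur == n) std::puts(""); // keep the loop from being optimized out
    return std::chrono::duration<double, std::nano>(t1 - t0).count() / steps;
}

int main() {
    if (numa_available() < 0) { std::puts("no NUMA support"); return 1; }
    const size_t n = 1 << 24;  // ~128 MB of size_t: larger than any cache
    numa_run_on_node(0);       // pin the probing thread to node 0

    for (int node : {0, numa_max_node()}) { // nearest, then farthest node
        size_t* buf = static_cast<size_t*>(
            numa_alloc_onnode(n * sizeof(size_t), node));
        std::printf("node %d: %.1f ns/access\n",
                    node, chase(buf, n, 10'000'000));
        numa_free(buf, n * sizeof(size_t));
    }
}
```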

  7. Placement of application threads • Counter microbenchmark and TPC-C Payment on 8-socket x 10-core and 4-socket x 6-core machines • Placements compared: OS default, Spread, Island • OS placement is unpredictable; the gap between placements reaches 39%, 40%, and 47% across these experiments • Islands-aware placement matters
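A minimal sketch of islands-aware placement on Linux, assuming cores 0-5 sit on socket 0 (the real mapping must be read from the machine's topology, e.g. /sys/devices/system/cpu/cpu*/topology/): pin worker threads to one socket instead of letting the OS scheduler spread them.

```cpp
// Pin worker threads to the cores of one socket (one "Island").
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <thread>
#include <vector>

// Restrict the calling thread to a given set of cores.
static void pin_to_cores(const std::vector<int>& cores) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int c : cores) CPU_SET(c, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main() {
    // Assumed topology: cores 0-5 on socket 0. Verify on your machine.
    const std::vector<int> island0{0, 1, 2, 3, 4, 5};

    std::vector<std::thread> workers;
    for (int i = 0; i < 6; i++) {
        workers.emplace_back([&island0, i] {
            pin_to_cores(island0); // all six workers share one socket
            std::printf("worker %d on cpu %d\n", i, sched_getcpu());
        });
    }
    for (auto& w : workers) w.join();
}
```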

  8. Impact of sharing data among threads • Counter microbenchmark and TPC-C Payment (local-only) on 8-socket x 10-core and 4-socket x 6-core machines • (Figure, log scale: reducing the number of sharers yields speedups of 18.7x (vs 8x), 516.8x (vs 80x), and 4.5x) • Fewer sharers lead to higher performance
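A stripped-down version of the counter microbenchmark: the same total number of increments against one counter shared by all threads versus one counter per thread. Thread and iteration counts are arbitrary illustrative choices, not the paper's parameters.

```cpp
// Shared vs. partitioned counters: contention on one cache line
// vs. thread-local increments.
#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>
#include <vector>

// One counter per cache line, so "partitioned" avoids false sharing too.
struct alignas(64) Counter { std::atomic<long> v{0}; };

static double run(int nthreads, bool shared, long iters) {
    std::vector<Counter> counters(shared ? 1 : nthreads);
    std::vector<std::thread> ts;
    auto t0 = std::chrono::steady_clock::now();
    for (int t = 0; t < nthreads; t++)
        ts.emplace_back([&counters, shared, t, iters] {
            // When shared, every thread hits counter 0 and the cache line
            // ping-pongs between cores; otherwise increments stay local.
            auto& c = counters[shared ? 0 : t].v;
            for (long i = 0; i < iters; i++)
                c.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : ts) th.join();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    const int n = std::thread::hardware_concurrency();
    const long iters = 10'000'000;
    double s = run(n, true, iters), p = run(n, false, iters);
    std::printf("%d threads: shared %.2fs, partitioned %.2fs, gap %.1fx\n",
                n, s, p, s / p);
}
```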

  9. Outline • Introduction • Hardware Islands • OLTP on Hardware Islands: experimental setup; read-only workloads; update workloads; impact of skew • Conclusions and future work

  10. Experimental setup • Shore-MT: top-of-the-line open-source storage manager; enabled shared-nothing capability • Multisocket servers: 4-socket, 6-core Intel Xeon E7530, 64GB RAM; 8-socket, 10-core Intel Xeon E7-L8867, 192GB RAM; hyper-threading disabled • OS: Red Hat Enterprise Linux 6.2, kernel 2.6.32 • Profiler: Intel VTune Amplifier XE 2011 • Workloads: TPC-C, microbenchmarks

  11. Microbenchmark workload • Single-site version: probe/update N rows from the local partition • Multi-site version: probe/update 1 row from the local partition and N-1 rows uniformly from any partition; partitions may reside on the same instance • Input size: 10 000 rows/core
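A sketch of how such a transaction could pick its rows under the parameters above (N rows per transaction, 10 000 rows per partition, remote rows drawn uniformly); pick_rows and RowRef are illustrative names, not Shore-MT's API:

```cpp
// Row selection for the single-site vs. multi-site microbenchmark.
#include <cstdio>
#include <random>
#include <vector>

struct RowRef { int partition; int row; };

// Single-site: all N rows local. Multi-site: 1 local row plus
// N-1 rows drawn uniformly from any partition (possibly local).
std::vector<RowRef> pick_rows(int local, int nparts, int n, bool multisite,
                              std::mt19937& rng) {
    std::uniform_int_distribution<int> part(0, nparts - 1), row(0, 9999);
    std::vector<RowRef> out{{local, row(rng)}}; // first row is always local
    for (int i = 1; i < n; i++)
        out.push_back({multisite ? part(rng) : local, row(rng)});
    return out;
}

int main() {
    std::mt19937 rng(1);
    for (auto [p, r] : pick_rows(/*local=*/2, /*nparts=*/24, /*n=*/10,
                                 /*multisite=*/true, rng))
        std::printf("partition %2d, row %4d\n", p, r);
}
```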

  12. Software System Configurations • Shared-everything: 1 Island • 4 Islands • Shared-nothing: 24 Islands

  13. Increasing % of multisite xcts: reads • Fine-grained shared-nothing is fastest at 0% multisite: no locks or latches • As multisite transactions grow, messaging overhead takes over • Shared-everything instead pays contention for shared data • Coarser configurations need fewer msgs per xct • Finer-grained configurations are more sensitive to distributed transactions

  14. Where are the bottlenecks? Read case • (Figure: time breakdown for 4 Islands, 10 rows per transaction) • Communication overhead dominates

  15. Increasing size of multisite xct: read case • More instances per transaction means more messages and more physical contention • Costs level off once all instances are involved in a transaction • Communication costs rise until all instances are involved in every transaction

  16. Increasing % of multisite xcts: updates • Single-site shared-nothing still needs no latches • Multisite updates add two rounds of messages (2PC), extra logging, and locks held longer • Distributed update transactions are more expensive

  17. Where are the bottlenecks? Update case • (Figure: time breakdown for 4 Islands, 10 rows per transaction) • Communication overhead dominates

  18. Increasing size of multisite xct: update case • More instances per transaction increases contention • Shared-everything benefits from efficient logging with Aether* • Shared-everything exposes constructive interference • *R. Johnson, et al: Aether: a scalable approach to logging, VLDB 2010

  19. Effects of skewed input • Under skew, few instances become highly loaded • Larger instances can balance load, but face contention for hot data • Even so, a few hot instances remain • 4 Islands effectively balance skew and contention
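The slide does not name the skew distribution; a Zipfian draw over partitions is a common way to model "few hot instances". A small sketch of how load concentrates as the skew parameter s grows; using 24 partitions (matching the 4-socket machine) is an assumption:

```cpp
// Zipfian partition skew: how much load the hottest partition attracts.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <random>
#include <vector>

// Build the CDF of a Zipf(s) distribution over n partitions.
std::vector<double> zipf_cdf(double s, int n) {
    std::vector<double> w(n), cdf(n);
    double sum = 0;
    for (int i = 0; i < n; i++) sum += (w[i] = 1.0 / std::pow(i + 1, s));
    double acc = 0;
    for (int i = 0; i < n; i++) cdf[i] = (acc += w[i] / sum);
    return cdf;
}

// Draw a partition by inverting the CDF.
int draw(const std::vector<double>& cdf, std::mt19937& rng) {
    double u = std::uniform_real_distribution<double>(0.0, 1.0)(rng);
    return int(std::lower_bound(cdf.begin(), cdf.end(), u) - cdf.begin());
}

int main() {
    std::mt19937 rng(7);
    for (double s : {0.0, 0.5, 1.0}) {          // s = 0 is uniform
        auto cdf = zipf_cdf(s, 24);             // 24 partitions assumed
        std::vector<int> hits(24, 0);
        for (int i = 0; i < 100000; i++) hits[draw(cdf, rng)]++;
        std::printf("s=%.1f: hottest partition gets %4.1f%% of accesses\n",
                    s, *std::max_element(hits.begin(), hits.end()) / 1000.0);
    }
}
```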

  20. OLTP systems on Hardware Islands • Shared-everything: stable, but non-optimal • Shared-nothing: fast, but sensitive to workload • OLTP Islands: a robust middle ground • Runs on close cores • Small instances limit contention between threads • Few instances simplify partitioning • Future work: automatically choose and set up the optimal configuration; dynamically adjust to workload changes • Thank you!
