
SLO-aware Hybrid Store



Presentation Transcript


  1. SLO-aware Hybrid Store Priya Sehgal, Kaladhar Voruganti, Rajesh Sundaram 8th March 2013 MSST 2012: http://storageconference.org/2012/Papers/21.Short.2.SLOAware.pdf

  2. Introduction
  • What is an SLO? A Service Level Objective is a specification of application requirements, independent of the underlying technology.
  • Examples:
    • Performance: average I/O latency, throughput
    • Capacity
    • Reliability
    • Security, etc.
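To make the specification concrete, here is a minimal sketch of how a per-volume latency SLO might be recorded; the `SLO` class, its field names, and the capacity values are illustrative assumptions, not an interface from the paper.

```python
from dataclasses import dataclass

@dataclass
class SLO:
    """Hypothetical per-volume SLO record (names are assumptions)."""
    volume: str
    max_latency_ms: float   # performance target: average I/O latency
    capacity_gb: int        # capacity requirement

# The two workloads from the motivating example on the next slide.
slos = [
    SLO(volume="vol1", max_latency_ms=5.0, capacity_gb=500),
    SLO(volume="vol2", max_latency_ms=10.0, capacity_gb=500),
]
```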

  3. Problem and Motivation
  • Assumptions:
    • SLO = latency (ms)
    • W1 and W2 work on different data sets
    • SSD tier is used for read caching only
  • Setup [figure]: W1 on vol1 (SLO = 5 ms) and W2 on vol2 (SLO = 10 ms) share a Hybrid Store (HyS1) built from an SSD array (read cache) and an HDD array (permanent store)
  • Problems:
    • SLO inversion
    • SLO violation
    • Sub-optimal SSD utilization
  • Need: bring SLO-awareness to SSD (read) caching in HyS

  4. Solution – Trailer: per-workload cache partitioning and dynamic cache sizing

  5. Architecture
  [Architecture diagram] The Hybrid Store (HyS) runs one controller per workload (W1 controller, W2 controller, …, Wn controller) plus a master controller that communicates with the upper layer and arbitrates SSD size increases/decreases. A monitoring daemon collects per-workload SLO stats (observed latency vs. expected latency) and SSD stats (SSD hit %, SSD utilization) and feeds them to the controllers. Each controller drives its workload's eviction engine (Eviction Engine 1 … n) to grow or shrink that workload's SSD cache partition.
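A minimal sketch of how these pieces might fit together; the class names, fields, and the proportional arbitration rule are assumptions for illustration, and the actual resize policy is the EAFC described on slide 11.

```python
from dataclasses import dataclass

@dataclass
class SLOStats:
    """Per-workload stats from the monitoring daemon (names assumed)."""
    observed_latency_ms: float   # e.g. 75th-percentile observed latency
    expected_latency_ms: float   # the SLO target

@dataclass
class SSDStats:
    hit_pct: float
    utilization_pct: float

class WorkloadController:
    """One controller per workload; proposes a new SSD partition size."""
    def __init__(self, workload: str, partition_gb: float):
        self.workload = workload
        self.partition_gb = partition_gb

    def step(self, slo: SLOStats, ssd: SSDStats) -> float:
        # Placeholder: the actual policy is the EAFC (slide 11).
        return self.partition_gb

class MasterController:
    """Arbitrates per-workload resize requests against total SSD space."""
    def __init__(self, controllers: list[WorkloadController], ssd_total_gb: float):
        self.controllers = controllers
        self.ssd_total_gb = ssd_total_gb

    def control_window(self, stats: dict[str, tuple[SLOStats, SSDStats]]) -> None:
        requests = {c.workload: c.step(*stats[c.workload]) for c in self.controllers}
        # Simple proportional scale-down if over-committed (an assumption;
        # the slide does not specify the arbitration rule).
        scale = min(1.0, self.ssd_total_gb / max(sum(requests.values()), 1e-9))
        for c in self.controllers:
            c.partition_gb = requests[c.workload] * scale
```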

  6. Controller Design Space – Dimensions
  • Un-partitioned vs. partitioned cache
  • Static vs. dynamic partitioning
  • Cache-size decrement decision: max threshold vs. min–max range

  7. When to decrease the cache size?
  • Option 1: decrease the partition size as soon as the SLO is met
    • Causes oscillations due to unnecessary evictions
  • Option 2: decrease only when the observed latency drops below a minimum threshold
    • Defines a range of operation (min and max thresholds)
    • Helps stabilize the cache size and avoid unnecessary evictions
    • The range varies depending upon the SLO and cache size
    • Conformance is tracked as a percentage (see the sketch below)
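A minimal sketch contrasting the two shrink policies above, assuming a scalar latency signal; the function names are illustrative.

```python
def should_shrink_eager(observed_ms: float, slo_ms: float) -> bool:
    # Option 1: shrink as soon as the SLO is met. This oscillates,
    # because shrinking evicts blocks and pushes latency back over the SLO.
    return observed_ms <= slo_ms

def should_shrink_ranged(observed_ms: float, lmin_ms: float, lmax_ms: float) -> bool:
    # Option 2: shrink only when latency falls below the minimum of a
    # [lmin, lmax] operating range; inside the range, hold the size steady.
    assert lmin_ms < lmax_ms
    return observed_ms < lmin_ms
```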

  8. Decrease Partition Size Problem [Figure] A decrease in partition size leading to SLO violations

  9. Insights on range of operation

  10. [Figure: four panels] (a) SLO target = 99th percentile; (b) SLO target = 90th percentile; (c) SLO target = 80th percentile; (d) SLO target = 75th percentile
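Percentile targets like these require a percentile estimate of recent observed latency; a minimal sketch, assuming the nearest-rank method over a short history window (the conclusion notes that a few hundred samples suffice):

```python
import math

def percentile_latency(history_ms: list[float], pct: float) -> float:
    # Nearest-rank percentile over recent observed latencies,
    # e.g. pct=75 for the 75th-percentile signal fed to the EAFC.
    if not history_ms:
        raise ValueError("empty latency history")
    ranked = sorted(history_ms)
    rank = max(1, math.ceil(pct / 100 * len(ranked)))
    return ranked[rank - 1]
```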

  11. Error-aware Feedback Controller (EAFC)
  [Controller diagram] Every control window (Cwindow), the controller compares the 75th-percentile observed latency lobs against the SLO operating range [lmin, lmax] and computes an error e:
  • SLO violation (lobs > lmax): e = lmax − lobs
  • Within the SLO range: e = 0
  • Below the range (lobs < lmin): e = lmin − lobs
  The proportional gain Kp is 0, Kpinc, or Kpdec according to the case: on a violation the partition grows to szold × (1 + Kpinc × |e|); below the range it shrinks to szold × (1 − Kpdec × |e|), with the eviction engine reclaiming the freed space. SSD utilization % and SSD hit % are also reported to the controller.
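A minimal sketch of one EAFC step as reconstructed above; the gain values are illustrative assumptions, not the paper's tuning.

```python
def eafc_step(sz_gb: float, lat_p75_ms: float,
              lmin_ms: float, lmax_ms: float,
              kp_inc: float = 0.05, kp_dec: float = 0.02) -> float:
    """One control-window update: return the new SSD partition size (GB)."""
    if lat_p75_ms > lmax_ms:                      # SLO violation -> grow
        e = lat_p75_ms - lmax_ms                  # |e|: distance above range
        return sz_gb * (1 + kp_inc * e)
    if lat_p75_ms < lmin_ms:                      # well under SLO -> shrink
        e = lmin_ms - lat_p75_ms                  # |e|: distance below range
        return max(0.0, sz_gb * (1 - kp_dec * e))
    return sz_gb                                  # in range: Kp = 0, e = 0
```

Choosing kp_dec smaller than kp_inc makes the controller quick to respond to violations but conservative about evicting, which, together with the [lmin, lmax] range, damps the oscillations described on slide 7.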

  12. Test Description
  • Hybrid Store prototype
  • 1 workload per volume (1 workload = all I/Os coming to a volume)
  • HDD space: 1 TB; SSD space: 160 GB; RAM size: 16 GB
  • Test 1 – SPECsfs 2008-like:
    • No. of threads = 20
    • Load/thread = 250 IOPS → 5000 IOPS total
    • Total dataset size = 600 GB
    • Target latency = 8 ms, 3 ms

  13. SPECsfs 2008 (SLO target = 8 ms) [Figure] Amount of SSD read cache required to meet the 8 ms latency target = 15 GB

  14. SPECsfs 2008 (SLO target = 3 ms) [Figure] Amount of SSD read cache required to meet the 3 ms latency target = 35 GB (2.3× more than for the 8 ms target)

  15. SPECsfs 2008 Tests
  • Filer setup same as before
  • Test 1:
    • No. of threads = 20
    • Load/thread = 250 IOPS → 5000 IOPS total
    • Total dataset size = 600 GB
    • Target latency = 8 ms, 3 ms
  • Test 2:
    • No. of threads = 20
    • Load/thread varying from 100, 200, and 250 IOPS → 2000, 4000, and 5000 IOPS total (each run for 5 hrs)
    • Target latency = 3 ms

  16. SPECsfs 2008 Test 2 (Varying loads) [Figure]

  17. Conclusion
  • Insights:
    • It is not necessary to cache the whole working set (WSS) to meet certain latency targets
    • 75th-percentile SLO conformance yields a close-to-optimal SSD size while meeting the average SLO almost all the time
    • The cache sizer needs only a few hundred history points to set the appropriate SSD size, making it light-weight
  • Objectives met:
    • SLO met close to 100% of the time
    • With a close-to-optimal amount of SSD
    • Improved SSD utilization
    • Without significant computation or memory overheads


  19. Backup Slides

