
ANALYZING STORAGE SYSTEM WORKLOADS

Paul G. Sikalinda, Pieter S. Kritzinger ({psikalin, psk}@cs.uct.ac.za), DNA Research Group, Computer Science Department, University of Cape Town, and Lourens O. Walters (Lourens.Walters@s1.com), Mosaic Software, Rondebosch, Cape Town, Republic of South Africa.


Presentation Transcript


  1. ANALYZING STORAGE SYSTEM WORKLOADS. Paul G. Sikalinda, Pieter S. Kritzinger ({psikalin, psk}@cs.uct.ac.za), DNA Research Group, Computer Science Department, University of Cape Town, and Lourens O. Walters (Lourens.Walters@s1.com), Mosaic Software, Rondebosch, Cape Town, Republic of South Africa.

  2. Presentation Outline
     - Introduction
     - Motivation and Objectives
     - Storage Systems
     - Storage System Workloads
     - The Storage System Workload Analyzed
     - Statistical Methodology
     - Workload Analysis Results
     - Conclusions
     - Future Work

  3. Introduction
     The DNA Group specializes, among other things, in using theory, formal methods and software tools in the:
     - specification of …
     - design of …
     - modelling of …
     - building of …
     - security of …
     - *workload analysis of …
     - correctness analysis of …
     - performance analysis of …
     concurrent computing systems (CCS).

  4. Introduction (cont’d): ANALYZING STORAGE SYSTEM WORKLOADS

  5. Introduction (cont’d)
     [Diagram labels: PROCESSOR, RQ, RP. Request attributes: Start Address, Operation Type, Request Size, Timestamps, etc.]

  6. Motivation and Objectives
     - A lot of effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.
     - Workload modelling is an important component in the design, performance evaluation and correctness evaluation of storage systems.
     - Common assumptions are not correct:
       - uniform distribution of start addresses,
       - exponential inter-arrival times.
     - Therefore storage system workloads should be analyzed to arrive at correct models.

  7. Motivation and Objectives (cont’d)
     - Designing storage systems.
     - Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance.
     - Understanding application behavior and requirements.
     - Deciding to pool storage system resources (SSPs).
     - Implementing intelligent storage systems.
     - etc.

  8. Motivation and Objectives (cont’d)
     Our aim was to analyze storage system workloads in terms of the inter-arrival times, sizes and “seek distances” of I/O requests, and to provide statistics for these parameters that can be used to:
     (a) derive models for storage system evaluation, and
     (b) design optimization techniques (read caching, I/O parallelism, etc.).

  9. Storage Systems: Enterprise Storage System (ESS)
     [Diagram labels: Path to host, Host/Bus adapter, Path to cache, Cache, Path to controller, Array controller, Path to disks, Disk drives]

  10. Storage Systems (cont’d)
      ESS are powerful disk storage systems with the following capabilities:
      - High performance*
      - Large capacity and availability
      - Protection against physical drive failure, which can be provided using RAID methods.
      *They still cannot match processor speeds, because of the mechanical processes in the disk drives.

  11. Storage System Workloads
      [Diagram labels: Application Software, I/O request, Operating System, File System, I/O request, Disk System]
      I/O request servicing and workload classification:
      - Logical Workloads (File System Workloads)
      - Storage System Workloads (Physical I/O Traffic)

  12. Storage System Workloads (cont’d)
      Workload parameters:
      - Logical Volume Number
      - *Start Address (seek distances)
      - *Request Size
      - Operation Type (i.e., read or write)
      - *Time Stamp (inter-arrival times)
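The starred parameters on this slide are raw trace fields from which the analyzed quantities are derived. The Python sketch below illustrates one way to do that. The `IORequest` record and its field names are hypothetical (this is not the SPC trace file layout), and the seek distance is assumed to be the signed difference between consecutive start addresses, a definition the slides do not spell out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IORequest:
    """One trace record (hypothetical layout for illustration only)."""
    volume: int          # logical volume number
    start_block: int     # start address, in blocks
    size_bytes: int      # request size
    op: str              # operation type: "r" or "w"
    timestamp_us: float  # arrival time, in microseconds


def inter_arrival_times(trace: List[IORequest]) -> List[float]:
    """Time gaps between consecutive requests, ordered by timestamp."""
    ordered = sorted(trace, key=lambda r: r.timestamp_us)
    return [b.timestamp_us - a.timestamp_us for a, b in zip(ordered, ordered[1:])]


def seek_distances(trace: List[IORequest]) -> List[int]:
    """Signed start-address differences between consecutive requests.

    Assumption: distance = current start address - previous start address.
    Volumes are not separated here; a real analysis might group by volume.
    """
    ordered = sorted(trace, key=lambda r: r.timestamp_us)
    return [b.start_block - a.start_block for a, b in zip(ordered, ordered[1:])]


# Tiny made-up trace, not taken from the analyzed workload.
trace = [
    IORequest(0, 100, 8192, "r", 0.0),
    IORequest(0, 40, 8192, "r", 500.0),
    IORequest(0, 200, 16384, "r", 2000.0),
]
print(inter_arrival_times(trace))  # [500.0, 1500.0]
print(seek_distances(trace))       # [-60, 160]
```

Signed distances are used so that backward and forward head movements stay distinguishable, which is what makes a statement about symmetry of the distribution meaningful.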

  13. The Storage System Workload Analyzed
      We analyzed the inter-arrival times, request sizes and “seek distances” of I/O requests from a system running a web search engine. We obtained the I/O trace files from the Storage Performance Council (SPC) (http://www.storageperformance.org).

  14. Statistical Methodology
      Visual techniques:
      - Histogram and
      - ECDF graphs.
      Key data statistics:
      - Sample mean,
      - Variance and standard deviation,
      - Coefficient of skew, kurtosis, and variation,
      - Five-number data summaries (minimum, lower quartile, median, upper quartile, maximum),
      - Lower and upper outlier limits.
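The key statistics listed above can be computed with Python's standard library, as in the sketch below. Several choices are assumptions, since the slides do not state which estimators were used: skew and kurtosis are the moment-based coefficients (kurtosis is not the excess form, so a normal distribution scores 3), quartiles use the default method of `statistics.quantiles`, and the outlier limits are taken to be 1.5 interquartile ranges beyond the quartiles.

```python
import math
import statistics
from typing import Dict, Sequence


def summarize(data: Sequence[float]) -> Dict[str, float]:
    """Summary statistics for one workload parameter.

    Requires at least two values, a non-zero mean and non-constant data.
    """
    n = len(data)
    mean = statistics.mean(data)
    variance = statistics.variance(data)  # sample variance (n - 1 denominator)
    std_dev = math.sqrt(variance)

    # Central moments (n denominator) for the moment-based coefficients.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n

    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1

    return {
        "mean": mean,
        "variance": variance,
        "std_dev": std_dev,
        "skew": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "coeff_variation": std_dev / mean,
        "min": min(data),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(data),
        # Assumed outlier rule: 1.5 * IQR beyond the quartiles.
        "lower_outlier_limit": q1 - 1.5 * iqr,
        "upper_outlier_limit": q3 + 1.5 * iqr,
    }


summary = summarize([1, 2, 3, 4, 5])  # made-up data
for name, value in summary.items():
    print(f"{name}: {value}")
```

The same function can be applied unchanged to inter-arrival times, request sizes and seek distances.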

  15. Results 1: inter-arrival times (µs)

  16. Results 1: inter-arrival times
      - Highly variable data: the range is 126 to 100100 microseconds.
      - The coefficient of kurtosis shows that the distribution is heavy-tailed.

  17. Results 2: Request sizes (bytes)

  18. Results 2: Request sizes
      - Distribution peaks: 8192 (60%), 16384 (10%), 24576 (9%) and 32768 (20%) bytes.
      - Reason: the OS filesystem block is 8192 bytes, and every peak is a multiple of it.
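Peaks like these are easiest to see in an empirical probability mass function. The sketch below shows one way to compute it and to check how many requests are whole multiples of the filesystem block. The function names and the sample values are illustrative and are not taken from the analyzed trace.

```python
from collections import Counter
from typing import Dict, Sequence


def empirical_pmf(values: Sequence[int]) -> Dict[int, float]:
    """Relative frequency of each distinct value, in ascending order."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in sorted(counts.items())}


def fraction_block_aligned(sizes: Sequence[int], block_size: int = 8192) -> float:
    """Share of requests whose size is a whole multiple of block_size."""
    aligned = sum(1 for size in sizes if size % block_size == 0)
    return aligned / len(sizes)


# Made-up request sizes in bytes.
sizes = [8192, 8192, 16384, 32768]
print(empirical_pmf(sizes))           # {8192: 0.5, 16384: 0.25, 32768: 0.25}
print(fraction_block_aligned(sizes))  # 1.0
```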

  19. Results 3: Seek distances (blocks)

  20. Results 3: Seek distances
      - The distribution of seek distances is symmetrical.

  21. Conclusions
      (1) Analyzing storage system workloads is necessary to model the workloads properly:
      - To model Web inter-arrival times, the Weibull, lognormal, beta, gamma and exponential probability density functions should be considered.
      - To model Web data size and seek distance, a probability mass function is more appropriate.
      *We intend to use the models in simulations of ESS.
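To illustrate how such models could feed a simulation, the sketch below generates a synthetic request stream: inter-arrival times come from a Weibull distribution and request sizes from a probability mass function. The size weights are the peak percentages reported in the request-size results (the remaining 1% or so is ignored). The Weibull scale and shape values are placeholders, not fitted parameters, and the function name is hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Peak request sizes (bytes) and their reported shares in percent.
SIZE_PEAKS = [8192, 16384, 24576, 32768]
SIZE_WEIGHTS = [60, 10, 9, 20]


def synthetic_requests(
    n: int,
    scale_us: float = 1000.0,  # placeholder Weibull scale, not a fitted value
    shape: float = 0.7,        # placeholder Weibull shape, not a fitted value
    seed: Optional[int] = None,
) -> List[Tuple[float, int]]:
    """Return n (timestamp_us, size_bytes) pairs in arrival order."""
    rng = random.Random(seed)
    now = 0.0
    requests = []
    for _ in range(n):
        # weibullvariate takes the scale first, then the shape.
        now += rng.weibullvariate(scale_us, shape)
        size = rng.choices(SIZE_PEAKS, weights=SIZE_WEIGHTS, k=1)[0]
        requests.append((now, size))
    return requests


for timestamp_us, size in synthetic_requests(5, seed=1):
    print(f"{timestamp_us:12.1f} us  {size:6d} bytes")
```

Swapping `weibullvariate` for `lognormvariate`, `gammavariate`, `betavariate` or `expovariate` gives the other candidate families listed on the slide.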

  22. Conclusions (cont’d)
      (2) The analysis results are useful when designing optimization techniques for storage systems, e.g.:
      - Cache management block size: 8192 bytes.
      - I/O rescheduling and background tasking would be ideal for the workload.
      - The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*.
      *The results are not broadly applicable.

  23. Conclusions (cont’d)
      (3) Other conclusions:
      - Request sizes are influenced by the filesystem in use.
      - Seek distances are not always uniformly distributed.
      *In summary, we have provided statistics about the parameters of the storage system workload that we analyzed and have shown how they can be used to derive models and design I/O optimization techniques.

  24. Future Work
      - Rigorously find a probability density function matching a given data set of inter-arrival times.
      - Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types).
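One possible starting point for the first item is a goodness-of-fit distance between the data and a candidate distribution. The sketch below computes the Kolmogorov-Smirnov distance between a sample and an exponential distribution whose rate is estimated from that sample. This is an illustration and not the authors' method. Because the rate is estimated from the same data, the standard Kolmogorov-Smirnov critical values do not apply directly, so the result is best used to compare candidates rather than as a formal test.

```python
import math
from typing import Sequence


def ks_distance_exponential(samples: Sequence[float]) -> float:
    """Largest gap between the empirical CDF and a fitted exponential CDF.

    The rate is the maximum-likelihood estimate 1 / sample mean.
    Samples must be non-negative with a positive mean.
    """
    xs = sorted(samples)
    n = len(xs)
    rate = n / sum(xs)
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        model_cdf = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x.
        distance = max(distance, i / n - model_cdf, model_cdf - (i - 1) / n)
    return distance


# Made-up inter-arrival times in microseconds.
gaps = [126.0, 300.0, 450.0, 900.0, 2500.0, 100100.0]
print(ks_distance_exponential(gaps))
```

The same loop works for any candidate family once `model_cdf` is replaced by that family's cumulative distribution function.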

  25. THANK YOU FOR YOUR ATTENTION! ?
