Motivation and Objectives
Storage System Workloads
The Storage System Workload Analyzed
Workload Analysis Results
The DNA Group specializes, among other things, in using theory, formal methods and software tools in the:
– specification of …
– design of …
– modelling of …
– building of …
– security of …
– *workload analysis of …
– correctness analysis of …
– performance analysis of …
concurrent computing systems (CCS).
ANALYZING STORAGE SYSTEM WORKLOADS
A lot of effort is being spent on improving the I/O subsystem because it is a bottleneck in current computer systems.
-Workload modelling is an important component in the design, performance evaluation, and correctness evaluation of storage systems.
Common assumptions that are not correct:
-Uniform distribution of start addresses,
-Exponential inter-arrival times.
Storage system workloads should therefore be analyzed to derive correct models.
-Designing storage systems.
-Designing I/O optimization techniques (read caching, write caching, pre-fetching, I/O parallelism, I/O rescheduling) to improve performance.
-Understanding application behavior and requirements.
-Deciding to pool storage system resources (SSPs).
-Implementing intelligent storage systems.
Our aim was to analyze storage system workloads in terms of
inter-arrival times, request sizes, and “seek distances” of I/O requests
and provide statistics for these parameters to be used to:
(a) derive models for storage system evaluation and
(b) design optimization techniques (read caching, I/O parallelism, etc.)
[Figure: I/O paths to cache, controller, and disks]
Enterprise Storage System (ESS)
ESSs are powerful disk storage systems with the following capabilities:
-Large capacity and high availability
-Protection against physical drive failure, which can be provided using RAID methods.
*But they still cannot match processor speeds because of the mechanical processes in the disk drives.
I/O Request Servicing and workload classification:
-Logical Workloads (File System Workloads)
-Storage System Workloads (Physical I/O Traffic)
-Logical Volume Number
*Start Address (seek distances)
-Operation Type (i.e., read or write)
*Time Stamp (inter-arrival times)
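The two starred parameters are derived quantities, so a short sketch may help show how they can be computed from raw trace records. This is a minimal illustration, not the tooling used for the analysis. The comma-separated field order (volume, address, size, operation, timestamp), the `IORequest` type, and all function names are assumptions made for this example. The seek distance is taken here to be the signed difference between consecutive start addresses on the same logical volume, which is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    volume: int          # logical volume number
    start_address: int   # start address of the request
    size: int            # request size
    op: str              # operation type: "r" (read) or "w" (write)
    timestamp: float     # arrival time of the request


def parse_record(line: str) -> IORequest:
    """Parse one record, assuming the layout volume,address,size,op,timestamp."""
    volume, address, size, op, timestamp = line.strip().split(",")[:5]
    return IORequest(int(volume), int(address), int(size), op.lower(), float(timestamp))


def inter_arrival_times(requests):
    """Time gaps between consecutive requests, in trace order."""
    stamps = [r.timestamp for r in requests]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def seek_distances(requests):
    """Signed start-address difference between consecutive requests on the same volume."""
    last_address = {}   # most recent start address seen per logical volume
    distances = []
    for r in requests:
        if r.volume in last_address:
            distances.append(r.start_address - last_address[r.volume])
        last_address[r.volume] = r.start_address
    return distances
```

A signed distance keeps forward and backward head movements apart, which is what allows a statement about the symmetry of the distribution; an unsigned variant would use `abs()` on each difference.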
We analyzed inter-arrival times, request sizes, and “seek distances” of I/O requests from a system running a web search engine.
The I/O trace files were obtained from the Storage Performance Council (SPC) (http://www.storageperformance.org).
Key Data Statistics
-Variance and standard deviation,
-Coefficient of skew, kurtosis, and variation,
-Five number data summaries (minimum, lower quartile, median, upper quartile, maximum).
-Lower and upper outlier limits
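The statistics listed above can be computed with a few lines of standard-library Python. The sketch below is illustrative only. It assumes population (not sample) moments, non-excess kurtosis (a normal distribution gives 3), the "inclusive" quartile method, and outlier limits set at 1.5 times the interquartile range beyond the quartiles. The original analysis may have used different conventions, and the function name `summarize` is hypothetical.

```python
import statistics


def summarize(data):
    """Return the key statistics for one workload parameter as a dict."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    mean = statistics.fmean(xs)
    variance = statistics.pvariance(xs, mu=mean)   # population variance
    std = variance ** 0.5

    # Third and fourth central moments for skew and kurtosis.
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n

    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1

    return {
        "variance": variance,
        "std_dev": std,
        "coeff_skew": m3 / std ** 3 if std > 0 else 0.0,
        "coeff_kurtosis": m4 / variance ** 2 if variance > 0 else 0.0,
        "coeff_variation": std / mean if mean != 0 else float("inf"),
        "five_number": (xs[0], q1, median, q3, xs[-1]),
        "lower_outlier_limit": q1 - 1.5 * iqr,     # assumed 1.5 * IQR rule
        "upper_outlier_limit": q3 + 1.5 * iqr,
    }
```

The coefficient of variation is a quick check on the exponential assumption, since an exponential distribution has a coefficient of variation of exactly 1.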
-Highly variable data: the range is 126 to 100100 microseconds.
-The coefficient of kurtosis shows that the distribution is heavy-tailed.
-Distribution peaks: 8192 (60%), 16384 (10%), 24576 (9%), and 32768 (20%).
-OS filesystem block size: 8192 bytes
-The distribution of seek distances is symmetrical.
(1) Analyzing storage system workloads is necessary to properly model the workloads:
To model Web inter-arrival times, the Weibull, lognormal, beta, gamma, and exponential probability density functions should be considered.
To model Web request sizes and seek distances, a probability mass function is more appropriate.
*We intend to use the models in simulations of ESS.
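For discrete parameters such as request sizes, a probability mass function can be estimated directly from observed frequencies and then sampled in a simulation. The following is a minimal sketch under that assumption. The function names `empirical_pmf` and `sample_from_pmf` are hypothetical and do not describe the models actually built.

```python
import random
from collections import Counter


def empirical_pmf(values):
    """Map each distinct observed value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no observations")
    return {value: count / total for value, count in sorted(counts.items())}


def sample_from_pmf(pmf, k, seed=None):
    """Draw k synthetic values according to the estimated probabilities."""
    rng = random.Random(seed)   # local generator so runs are reproducible
    return rng.choices(list(pmf), weights=list(pmf.values()), k=k)
```

Because the estimated function only assigns probability to values that were observed, the synthetic workload reproduces the peaks of the measured distribution rather than smoothing over them as a continuous density would.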
(2) The analysis results are useful when designing optimization techniques for storage systems, e.g.,
-Cache management block size – 8192 bytes.
-I/O rescheduling and background tasking would be ideal for the workload.
-The storage system handling the workload we analyzed can be optimized to handle the symmetrical behavior*.
*The results are not broadly applicable.
(3) Other conclusions:
-Request sizes are influenced by the filesystem in use.
-Seek distances are not always uniformly distributed.
*In summary, we have provided statistics about the parameters for the storage system workload that we analyzed and have shown how we can use them to derive models and design I/O optimization techniques.
-Rigorously find a probability density function matching a given data set of inter-arrival times.
-Analyze the storage system workloads in terms of other parameters (e.g., logical volume numbers and operation types)
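One possible starting point for the first item is a goodness-of-fit measure such as the Kolmogorov-Smirnov distance between the empirical distribution and a candidate model. The sketch below is only an illustration of that idea for the exponential case, with the rate estimated as the reciprocal of the sample mean. The function name is hypothetical, and the choice of this test is an assumption rather than the planned method.

```python
import math


def ks_statistic_exponential(samples):
    """Largest gap between the empirical CDF and a fitted exponential CDF."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("no observations")
    mean = sum(xs) / n
    if mean <= 0:
        raise ValueError("inter-arrival times must have a positive mean")
    rate = 1.0 / mean   # maximum-likelihood estimate of the exponential rate

    distance = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        distance = max(distance, (i + 1) / n - model_cdf, model_cdf - i / n)
    return distance
```

A small distance indicates a close fit. Since the rate is estimated from the same data, the standard Kolmogorov-Smirnov critical values are not directly applicable, so the value is better used to compare candidate distributions than as a formal test.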