Issues and challenges in the performance analysis of real disk arrays

Elizabeth Varki, Arif Merchant, Jianzhang Xu and Xiaozhou Qiu

Presented by: Soumya Eachempati


RAID Disk Arrays

  • Redundant Array of Inexpensive Disks

  • Redundancy and Striping

    • Capacity

    • Speed (Parallelism)

    • Availability of storage systems

  • Multiple disks, large caches, and array controllers that implement load balancing, request coalescing, and adaptive prefetching.


Why Performance Modeling

  • Storage systems are slow.

  • Any improvement in storage performance directly improves overall system performance for I/O-intensive applications.

  • Metrics used:

    • Response time

    • Throughput

    • Queue length


Approaches

  • Analytic - quick and dirty, less expensive

  • Simulation - more accurate, but a lot of effort

  • So which method will you choose?

  • The authors develop an analytic model that includes array controller optimizations.


Previous Work

  • Disk array performance

  • Efficient caching algorithms

  • This paper models the parallelism of disk arrays and the effects of caching on performance.

  • Presents a baseline model with known parameter values.

  • Extracts array controller optimizations using the baseline model and workloads that isolate specific optimizations.




Key Components


Some Assumptions

  • M jobs generate synchronous requests.

  • At most one outstanding request from each stream at the disk array.

  • Each job spends some time at the CPU before submitting a request to the disk array.

  • Each request stream is either read-only or write-only.

  • No queueing at the CPU/terminal.


Read Model


Write Model


Response Time

  • array_response_time[m] = cache_response_time[m] + disks_response_time[m]

  • array_throughput[m] = m / (CPU_delay + array_response_time[m])

  • cache_queue[m] = cache_response_time[m] * array_throughput[m] (Little’s law)


Response Time (continued)

  • The arrival theorem states that the number of jobs “seen” by a job arriving at a subsystem equals the mean queue length at that subsystem computed when the network has one less job.

  • The response time of the arriving job equals its own service time plus the time required to service the jobs it sees ahead of it.
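The response-time equations and the arrival theorem together give the classic exact Mean Value Analysis (MVA) recursion. The sketch below is a generic MVA, not the authors' full model: it assumes the CPU is a pure delay center (no queueing, per the assumptions slide), one cache queue, and stripe_width identical disk queues, each visited with probability disk_access_probability. It sums disk residence times as if disk visits were sequential, so it ignores the fork-join parallelism that the paper's parallelism overhead accounts for. All function and field names are illustrative.

```python
from typing import NamedTuple


class MvaResult(NamedTuple):
    m: int                      # number of jobs in the closed network
    array_response_time: float
    array_throughput: float
    cache_queue: float
    disk_queue: float           # mean queue length at ONE disk


def mva(num_jobs: int, cpu_delay: float, cache_service_time: float,
        disk_service_time: float, disk_access_probability: float,
        stripe_width: int) -> list[MvaResult]:
    """Exact MVA sketch for a hypothetical CPU + cache + disks network."""
    cache_queue = 0.0   # cache_queue[m - 1]; zero for an empty network
    disk_queue = 0.0    # disk_queue[m - 1] at each (identical) disk
    results = []
    for m in range(1, num_jobs + 1):
        # Arrival theorem: an arriving job sees the mean queues of the
        # network with m - 1 jobs and waits for them plus its own service.
        cache_response_time = cache_service_time * (1 + cache_queue)
        per_disk_time = (disk_access_probability * disk_service_time
                         * (1 + disk_queue))
        disks_response_time = stripe_width * per_disk_time
        array_response_time = cache_response_time + disks_response_time
        # Response time law, with CPU_delay acting as think time.
        array_throughput = m / (cpu_delay + array_response_time)
        # Little's law at each queueing center.
        cache_queue = cache_response_time * array_throughput
        disk_queue = per_disk_time * array_throughput
        results.append(MvaResult(m, array_response_time, array_throughput,
                                 cache_queue, disk_queue))
    return results


for row in mva(num_jobs=4, cpu_delay=0.0, cache_service_time=0.5,
               disk_service_time=8.0, disk_access_probability=0.25,
               stripe_width=4):
    print(row)
```

A useful sanity check on any MVA implementation: the mean queue lengths at all centers plus the jobs "thinking" at the CPU (array_throughput * CPU_delay) must add up to m.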


Baseline Model Parameters

  • Disk Service time

  • Parallelism Overhead

  • Disk access probability

  • Cache parameters

  • Write model input parameters.


Disk Service Time

  • Getting an accurate estimate is very important.

  • Seek distance depends on the disk scheduling policy and the location of disk I/O requests.


Disk Positioning Time

The degree of sequentiality has minimal effect on positioning time for disk_queue > 3.

For simplicity, disk positioning time is modeled as a linear function of disk_queue for disk_queue < 3.
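The positioning-time model on this slide can be sketched as a piecewise function. This is a hypothetical illustration: the measured positioning times at queue lengths 1 and 3 (t_at_1, t_at_3) and the table of measured points for longer queues (high_queue_points) are assumed inputs, and interpolating linearly between those points is an assumption, not something the slide specifies. Units are whatever the measurements use (for example, milliseconds).

```python
from bisect import bisect_left


def positioning_time(disk_queue: float, t_at_1: float, t_at_3: float,
                     high_queue_points: list[tuple[float, float]]) -> float:
    """Hypothetical mean disk positioning time for a given disk queue length.

    disk_queue < 3: straight line through the measured values at 1 and 3.
    disk_queue >= 3: piecewise-linear interpolation over measured
    (queue_length, time) points sorted by queue length; sequentiality is
    ignored because it has minimal effect at these queue lengths.
    """
    if disk_queue < 3:
        slope = (t_at_3 - t_at_1) / (3 - 1)
        return t_at_1 + slope * (disk_queue - 1)
    qs = [q for q, _ in high_queue_points]
    ts = [t for _, t in high_queue_points]
    if disk_queue <= qs[0]:
        return ts[0]
    if disk_queue >= qs[-1]:
        return ts[-1]
    i = bisect_left(qs, disk_queue)   # qs[i - 1] < disk_queue <= qs[i]
    q0, q1, t0, t1 = qs[i - 1], qs[i], ts[i - 1], ts[i]
    return t0 + (t1 - t0) * (disk_queue - q0) / (q1 - q0)


# Hypothetical measurements: 8.0 at queue 1, 6.0 at queue 3, then a table.
table = [(3, 6.0), (5, 5.0)]
print(positioning_time(2, 8.0, 6.0, table))
print(positioning_time(4, 8.0, 6.0, table))
```

Keeping the short-queue region linear matches the slide's simplification, while the table lookup for longer queues reflects the observation that sequentiality stops mattering there.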


Cyclic Dependency

  • Disk service time depends on disk queue length, which in turn depends on disk service time.

  • To break the cycle, disk queue length is treated as an input parameter.

  • If CPU_delay = 0, then disk_queue = M * disk_access_probability.


Parameters

  • parallel_overhead = max_position_time - disk_position_time

  • disk_access_probability = cache_miss_probability * num_diskIOs_per_request / stripe_width

  • cache_hit = 1 - (read_ahead_miss * re_reference_miss)

  • cache_service_time = request_size / cache_transfer_rate

  • Once destage_threshold is reached, the cache writes data to all the disks at once.

    • µ = stripe_width / 2 * disk_service_time
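The parameter formulas above translate directly into small helpers. This is a minimal sketch: function names follow the slide's variables, the numbers in the usage example are hypothetical, and µ is omitted. The last helper encodes the cyclic-dependency shortcut disk_queue = M * disk_access_probability, which the slides state only for CPU_delay = 0.

```python
def parallel_overhead(max_position_time: float,
                      disk_position_time: float) -> float:
    # Extra positioning delay of a multi-disk request over a single disk.
    return max_position_time - disk_position_time


def disk_access_probability(cache_miss_probability: float,
                            num_disk_ios_per_request: float,
                            stripe_width: int) -> float:
    # Probability that a given disk is accessed by one array request.
    return cache_miss_probability * num_disk_ios_per_request / stripe_width


def cache_hit(read_ahead_miss: float, re_reference_miss: float) -> float:
    # A request misses only if it misses both read-ahead and re-referenced data.
    return 1 - read_ahead_miss * re_reference_miss


def cache_service_time(request_size: float, cache_transfer_rate: float) -> float:
    # Time to move the request's data through the cache.
    return request_size / cache_transfer_rate


def disk_queue_no_cpu_delay(num_jobs: int, access_probability: float) -> float:
    # Valid only when CPU_delay = 0 (see the cyclic dependency slide).
    return num_jobs * access_probability


# Hypothetical case: small random reads, read-ahead off, no re-references.
hit = cache_hit(read_ahead_miss=1.0, re_reference_miss=1.0)
p = disk_access_probability(1 - hit, num_disk_ios_per_request=1, stripe_width=4)
print(hit, p, disk_queue_no_cpu_delay(num_jobs=8, access_probability=p))
```

With every cache lookup missing and one disk I/O per request on a 4-disk stripe, each disk sees a quarter of the requests, so 8 jobs keep about 2 requests queued at each disk.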


Array Controller Optimizations

  • Access coalescing policy

  • Redundancy-based load balancing policy

  • Adaptive prefetching

  • Simplest baseline model: small random read requests with CPU_delay = 0.



Setup

  • Traces of all I/O activity are collected at the device driver level.

  • Trace contains

    • I/O submission and completion times.

    • Logical addresses

    • Sizes of requests


Optimizations

  • Access coalescing: disk I/O accesses to contiguous data on the same disk are coalesced into a single disk I/O.

  • Example: a 96 KB request striped across 3 disks with a stripe unit size of 16 KB.

  • Redundancy-based load balancing

    • A single request is distributed across both a disk and its mirror (reduces transfer time).

    • Two different requests are sent to a disk and its mirror (reduces queue length).
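The coalescing example can be checked with a small sketch. It assumes plain round-robin striping (stripe unit u on disk u mod stripe_width), a request aligned to a stripe-unit boundary, and no mirroring or parity. The function name is hypothetical.

```python
import math


def count_disk_ios(request_size_kb: int, stripe_unit_kb: int,
                   stripe_width: int, coalesce: bool) -> int:
    """Number of disk I/Os for one aligned request under round-robin striping."""
    num_units = math.ceil(request_size_kb / stripe_unit_kb)
    if not coalesce:
        return num_units            # one disk I/O per stripe unit touched
    # Stripe unit u lives on disk u % stripe_width at row u // stripe_width.
    rows_per_disk: dict[int, list[int]] = {}
    for u in range(num_units):
        rows_per_disk.setdefault(u % stripe_width, []).append(u // stripe_width)
    ios = 0
    for rows in rows_per_disk.values():
        # Each run of consecutive rows on one disk is contiguous data: one I/O.
        ios += 1 + sum(1 for a, b in zip(rows, rows[1:]) if b != a + 1)
    return ios


# The slide's example: 96 KB request, 16 KB stripe unit, 3 disks.
print(count_disk_ios(96, 16, 3, coalesce=False))  # 6
print(count_disk_ios(96, 16, 3, coalesce=True))   # 3
```

The request covers 6 stripe units, 2 on each disk in adjacent rows, so coalescing halves the number of disk I/Os from 6 to 3.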


Policy Effects

  • Access coalescing and load balancing affect the number of disk I/Os per request.

  • For large requests (request_size > stripe_width * stripe_unit_size):

    • num_diskIOs_per_request = stripe_width (transfer time reduction)

    • num_diskIOs_per_request = stripe_width / 2 (disk queue length reduction)


Access Coalescing

  • Random workloads are used so that caching effects are eliminated.

  • Using large request sizes (> stripe_width / 2), both policies can be evaluated.

  • Adaptive prefetching: evaluated with a workload of sequential read requests.

  • Explicit read-ahead is turned off.

  • The re-reference hit probability is zero.


Modeling Adaptive Prefetching and Load Balancing

  • num_diskIOs_per_request = stripe_width for request_size > stripe_width * stripe_unit_size

  • num_diskIOs_per_request = ceiling(request_size / stripe_unit_size) otherwise
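The disk-I/O count used by the model can be collected into one function. This is a sketch under the slides' assumptions (requests aligned to stripe-unit boundaries; "large" meaning request_size > stripe_width * stripe_unit_size). The policy argument is a hypothetical name for choosing between the two load-balancing variants, and the queue-length variant assumes an even stripe_width, as in a mirrored array.

```python
import math


def num_disk_ios_per_request(request_size: float, stripe_unit_size: float,
                             stripe_width: int,
                             policy: str = "transfer_time") -> int:
    """Disk I/Os generated by one array request (hypothetical helper)."""
    if request_size > stripe_width * stripe_unit_size:
        if policy == "transfer_time":
            # Request split across each disk and its mirror: every disk works.
            return stripe_width
        if policy == "queue_length":
            # Disk and mirror serve different requests: half the disks each.
            return stripe_width // 2
        raise ValueError(f"unknown policy: {policy!r}")
    # Smaller requests touch one disk per stripe unit.
    return math.ceil(request_size / stripe_unit_size)


print(num_disk_ios_per_request(96, 16, 3))                   # large request
print(num_disk_ios_per_request(20, 16, 4))                   # small request
print(num_disk_ios_per_request(128, 16, 4, "queue_length"))  # load balanced
```

The result feeds disk_access_probability = cache_miss_probability * num_diskIOs_per_request / stripe_width from the parameters slide.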


Load Balancing

  • The workload consists of same-sized requests.


Write-Back Caching

  • Size: at most 2 stripes.

  • Large requests bypass the cache.
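A hypothetical sketch of the write-back path: writes larger than a bypass limit go straight to the disks, smaller writes accumulate as dirty data, and reaching destage_threshold triggers a destage to all disks, as described on the parameters slide. Reading "at most 2 stripes" as the bypass limit is an assumption, and the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class WriteBackCache:
    """Toy write-back cache model (hypothetical; sizes in KB)."""
    stripe_size: int               # data held by one full stripe
    destage_threshold: int         # dirty data that triggers a destage
    bypass_limit_stripes: int = 2  # assumption: larger writes bypass the cache
    dirty: int = 0
    destages: int = 0
    bypassed: int = 0

    def write(self, size: int) -> str:
        if size > self.bypass_limit_stripes * self.stripe_size:
            self.bypassed += 1
            return "bypass"            # large request goes directly to the disks
        self.dirty += size
        if self.dirty >= self.destage_threshold:
            self.destages += 1         # flush dirty data to all disks at once
            self.dirty = 0
            return "cached+destaged"
        return "cached"


cache = WriteBackCache(stripe_size=64, destage_threshold=128)
for size in (200, 64, 64):
    print(size, cache.write(size))
```

Counting destages and bypasses over a trace is one way to estimate how often the disks see write traffic under a given threshold.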


Validation

  • The model performs better for random than for sequential I/Os (model error):

  • Random reads - 3.7%

  • Random writes - 5.5%

  • Sequential reads - 8.8%

  • Sequential writes - 8.0%

  • Adaptive prefetching works well for disk queue lengths < 4.

  • Disk writes and disk reads show similar performance.


Challenges

  • Disk parameters such as disk queue length affect the caching policies as well.

  • Paucity of data (bus transfer rate 35% higher).

  • Modeling performance as a function of disk array controller optimizations, array caching policies, and workload distributions, along with disk service time, stripe unit size, and stripe width.

  • Better approximation of disk queue length.


THANK YOU

Questions?