
Shared Disk File Caching that Accounts for Delays in Space Reservation, Transfer and Processing



Presentation Transcript


  1. Shared Disk File Caching that Accounts for Delays in Space Reservation, Transfer and Processing PI: Ekow J. Otoo, with Frank Olken, Arie Shoshani, and Donghui Guo (Postdoc)

  2. Goals: • To develop a policy advisory module (PAM) for coordinated, optimal file caching and replication in distributed data repositories • To process file requests efficiently on distributed datasets accessed through Storage Resource Managers (SRMs) Areas of Application: • Middleware components that manage storage and data access on Data Grids • Example projects: Particle Physics Data Grid (PPDG) and Earth Science Grid (ESG)

  3. Application Context Example: Multi-tier Model of Dataset Distribution [Diagram: Tier 0: CERN; Tier 1: FNAL, RAL, IN2P3; Tier 2: Univ. A, Lab. B; departmental sites 1..N; desktop clients]

  4. Data Accesses at a Single Site [Diagram: multiple clients at a processing node use a shared disk, connected over the network to a remote Mass Storage System (MSS), for serving file requests]

  5. Simulation Model of an SRM [Diagram: four servers, each with a message queue (MQ) and a request queue (RQ): the Admission Server (S1, which also holds a delayed request queue, DRQ), the Replacement Server (S2), the Caching Server (S3), and the Processing Server (S4). Arrivals enter S1; caching requests flow to S2, stage/transfer requests to S3, and processing requests to S4.] Backward wake-up messages: S1 is woken by an arrival; S2 wakes up S1; S3 wakes up S2 and notifies S1 when a file has been cached; S4 wakes up S1. Exception messages: S1 sends exceptions to S2 and S4; S2 sends exceptions to S3. Legend: RQ – Request Queue; MQ – Message Queue; DRQ – Delayed Request Queue; advisory objects and request-for-service objects (which wake up servers) pass between servers. Administrative functions: start/stop, suspend/resume.

  6. Flow of Objects and Messages in the Model [Diagram: the same four-server model, annotated with flow conditions: a request advances only when the next server's queue is not full; servers exchange "exception or wake-up" messages; "file was cached" notifications flow back to S1.] The backward wake-up messages, exception messages, legend, and administrative functions are as on the previous slide.

  7. Role of the Policy Advisory Module Two principal components: • A disk cache replacement policy – evaluates which files are to be replaced when space is needed • An admission policy for file requests – determines which request is to be processed next; e.g., it may prefer to admit requests for files already in the cache The work completed so far concerns disk cache replacement policies, which we focus on next.
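The cache-aware admission preference mentioned above can be sketched in a few lines. This is only an illustration under assumed names (`next_request`, a FIFO pending queue); the slides note that the actual admission algorithms were still being implemented:

```python
from collections import deque

def next_request(pending, cached_files):
    """Pick the next file request to admit (illustrative sketch).

    Prefers requests for files already resident in the disk cache,
    since they need no staging or transfer; otherwise falls back to
    plain FIFO order.
    """
    for req in pending:
        if req in cached_files:
            pending.remove(req)
            return req
    return pending.popleft() if pending else None

# Requests arrived in the order a, b, c, but "b" is already cached,
# so it jumps the queue; afterwards FIFO order resumes.
pending = deque(["a", "b", "c"])
print(next_request(pending, cached_files={"b"}))  # → b
print(next_request(pending, cached_files={"b"}))  # → a
```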

  8. Main Results on Caching Policies • Popular caching algorithms, such as LRU and LRU-K, are inappropriate for disk caching over a wide-area network • Remote access cost, transfer cost, and the rate of requests all impact caching policies • We have developed an optimal replacement policy based on a cost-beneficial function computed at time t0 from the K backward references retained for each file, the cost of accessing the file, and the size of the file at time t0 • We proved analytically that this policy is optimal
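The formula for the cost-beneficial function appeared as an image in the original slide and did not survive transcription, so the sketch below is only one plausible score in the spirit described: a reference rate estimated from the K-th backward reference, weighted by access cost per byte. The function name `lcb_score` and the exact weighting are assumptions, not the authors' actual formula:

```python
def lcb_score(t0, ref_times, cost, size, K=3):
    """Illustrative cost-beneficial score for a cached file at time t0.

    ref_times: past reference timestamps, most recent last.
    Estimates the file's reference rate from its K-th most recent
    reference, then weights by re-fetch cost per byte.  Files with
    the lowest score are the cheapest to evict.
    """
    k = min(K, len(ref_times))
    if k == 0:
        return 0.0                     # never referenced: no benefit
    span = t0 - ref_times[-k]          # time covering the last k refs
    rate = k / span if span > 0 else float("inf")
    return rate * cost / size          # expected cost saved per byte

# When space is needed, evict the file with the smallest score.
files = {
    "hot_small":  dict(ref_times=[90, 95, 99], cost=10.0, size=1.0),
    "cold_large": dict(ref_times=[10],         cost=10.0, size=100.0),
}
victim = min(files, key=lambda f: lcb_score(100, **files[f]))
print(victim)  # → cold_large
```

The point of such a score is that, unlike LRU, it lets a small, expensive-to-refetch, frequently referenced file outrank a large, rarely referenced one even if the latter was touched more recently.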

  9. Main Results (Cont.) Two new practical implementations were developed: • Maximum Inter-arrival Time with K backward references (MIT-K) • Least Cost Beneficial with K backward references (LCB-K) • The main measure we targeted is "average cost per file reference", because it is the important cost in a wide-area network • LCB-K and MIT-K are shown to give the best average cost per file reference • We verified the behavior of these practical algorithms using workloads of MSS file accesses from Jefferson Laboratory, as well as synthetic workloads

  10. Comparison of Hit Ratios Replacement Policies: • RND: Random • LFU: Least Frequently Used • LRU: Least Recently Used • MIT-K: Maximum Inter-Arrival Time based on last K references • LCB-K: Least Cost Beneficial based on last K references
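The hit-ratio charts themselves are not part of the transcript, but the kind of trace-driven comparison behind them is easy to sketch. Below is a minimal harness for three of the listed policies, under simplifying assumptions (unit-size files, a synthetic toy trace, textbook eviction rules); MIT-K and LCB-K are omitted because their exact formulas are not given here:

```python
import random
from collections import OrderedDict, Counter

def simulate(trace, capacity, policy):
    """Replay a file-reference trace through a cache of `capacity`
    slots (unit-size files, for simplicity) and return the hit ratio."""
    cache = OrderedDict()            # keys in recency order
    freq = Counter()                 # reference counts, for LFU
    hits = 0
    for f in trace:
        freq[f] += 1
        if f in cache:
            hits += 1
            cache.move_to_end(f)     # maintain LRU order on a hit
        else:
            if len(cache) >= capacity:
                if policy == "LRU":
                    cache.popitem(last=False)        # evict oldest
                elif policy == "LFU":
                    victim = min(cache, key=freq.__getitem__)
                    del cache[victim]                # evict least used
                else:                                # RND
                    del cache[random.choice(list(cache))]
            cache[f] = True
    return hits / len(trace)

random.seed(0)
trace = [random.choice("abcdefgh") for _ in range(1000)]  # toy trace
for policy in ("LRU", "LFU", "RND"):
    print(policy, round(simulate(trace, 4, policy), 3))
```

On a uniform random trace like this one the three policies behave similarly; the slides' point is that skewed real workloads, with per-file transfer costs, separate them much more sharply.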

  11. Comparison of Byte Hit Ratios

  12. Comparison of Average Cost Per Reference

  13. Scatter Plot of Sample of Generated Workload

  14. Comparison of Average Cost Per Reference

  15. Future Work • Complete the implementation of admission policy algorithms • Evaluate performance under different combinations of admission and cache replacement policies • Model and evaluate the performance of multiple-site SRMs under different network configurations • Determine the impact on performance of policies that use global information, as opposed to only local information • Extend the model to include failure detection and recovery • Add PAM to SRM implementations

  16. Comparison of Times to Evaluate Replacement
