
File Caching with SSD Arrays


Presentation Transcript


1. File Caching with SSD Arrays. Wei Yang. US ATLAS Distributed Facility Workshop, University of California, Santa Cruz

2. Motivation
  • We are curious: no immediate needs, but future needs
  • Caching (only) analysis job inputs; SSD has limited write cycles
  • Other goals: see the last slide
  • File-level caching
    • Conventional LFU/LRU algorithms cannot capture the data usage pattern of ATLAS analysis jobs (if there is such a pattern)
    • Sub-file level caching would be great, but bookkeeping is hard
  • We are searching for a caching algorithm
    • Out-Bytes > In-Bytes under the ATLAS workload
    • Use LRU, but based on job usage patterns over days/weeks/months

3. Setup 1: Caching Based on File Access Frequency
  • Analysis jobs visit the SSD cache first; on a cache miss, the request is forwarded to HD storage and the cache is then filled
  • The Xrootd monitoring stream feeds a table that records the access frequency of all files
  • Columns are rotated to maintain N days of records (a rough sketch follows)
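The column-rotating frequency table can be pictured with a small sketch. This is a minimal illustration under assumed names (the class, the daily rotation hook, and the per-file layout are not from the slides), not the actual implementation behind the Xrootd monitoring stream.

```python
from collections import defaultdict, deque

class AccessFrequencyTable:
    """Keeps one access counter per file per day, for the last n_days days."""

    def __init__(self, n_days=5):
        self.n_days = n_days
        # filename -> deque of daily counts, newest day at index 0
        self.counts = defaultdict(lambda: deque([0] * n_days, maxlen=n_days))

    def record_access(self, filename):
        """Called for every file open seen in the monitoring stream."""
        self.counts[filename][0] += 1

    def rotate(self):
        """Called once per day: push a fresh column, drop the oldest one."""
        for daily in self.counts.values():
            daily.appendleft(0)

    def frequency(self, filename):
        """Total accesses over the retained window of n_days days."""
        return sum(self.counts[filename])
```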

4. Setup 2: Caching Based on Historic File Access Info
  • Analysis jobs visit the SSD cache first; on a cache miss, the request is forwarded to HD storage and the cache is then filled
  • The Xrootd monitoring stream goes to the UCSD collector
  • Every file access is recorded as event-like info and saved to ROOT files for later analysis (a possible record layout is sketched below)
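The slides do not spell out what the event-like record contains, so the fields below are assumptions chosen to support the simulations on the later slides; in the real setup the records are written to ROOT files by the UCSD collector.

```python
from dataclasses import dataclass

@dataclass
class FileAccessEvent:
    open_time: float      # Unix timestamp of the file open
    close_time: float     # Unix timestamp of the file close
    filename: str         # logical file name
    file_size: int        # bytes
    bytes_read: int       # bytes actually read by the job
    client_host: str      # where the analysis job ran

# A day of monitoring data is then just a list of such events, which can be
# replayed to evaluate any candidate caching algorithm (slides 8-11).
```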

5. Hardware of the SSD Box
  • Dell 610: 8-core 2.4 GHz, 24GB, dual Intel X520 10Gb NIC, LSI SAS 9200-8e (supports TRIM), RHEL 6 x86_64, Xrootd
  • SSD array: Dell MD1220 with 12x OCZ Talos 960GB MLC SSDs, ~11TB in total
  • Non-RAID, to support TRIM; Xrootd takes care of gluing the drives together as a single space

6. [Plots] File Access Frequency algorithm
  • 6-month plot (as of 2012-11-12): the box is a net data sink, not a cache (annotations: Sept 1, lack of jobs)
  • 3-hour plot (2012-11-05): the cache brings in ~200GB/hour
  • The box can deliver; can the caching algorithm deliver?
  • Algorithm: Bytes-read/file-size > 110% during the last 5 days, prioritized by this ratio and up to 200GB/hour (sketched below)
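The selection rule quoted on this slide (bytes read over the last 5 days exceeding 110% of the file size, highest ratio first, roughly 200GB brought into the cache per hour) can be sketched as follows; the function name, the input layout, and the budget handling are illustrative assumptions.

```python
HOURLY_BUDGET = 200 * 1024**3   # ~200GB brought into the cache per hour
RATIO_THRESHOLD = 1.10          # bytes read / file size > 110%

def select_files_to_cache(stats):
    """stats: list of (filename, file_size, bytes_read_last_5_days)."""
    candidates = [
        (bytes_read / size, name, size)
        for name, size, bytes_read in stats
        if size > 0 and bytes_read / size > RATIO_THRESHOLD
    ]
    candidates.sort(reverse=True)            # highest ratio first

    selected, budget = [], HOURLY_BUDGET
    for ratio, name, size in candidates:
        if size > budget:
            break                            # stop once the hourly budget is spent
        selected.append(name)
        budget -= size
    return selected
```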

7. [Plot: GB/hour from SSD+HDD, GB/hour from SSD, and GB/hour to SSD, against the ceiling of the 10Gb NIC. Annotations: lost monitoring data from HDD; lack of jobs for the last 4 days; UCSD collector dead.]

8. Simulate the Cache with Historic Data
  • For a given caching algorithm, what do we want to learn from the historic data?
  • For day 0: the size of all files read, bytes read from SSD+HDD, and bytes read from SSD
  • Cache size required for days [-x, -1] versus cache size required for days [-x+1, 0]; the difference between the two windows is the new data to bring into the cache (a replay sketch follows)
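A rough sketch of the day-by-day replay, assuming per-day lists of the access events from the slide-4 sketch and the simple policy "cache every file accessed during the last x days"; the names and the exact bookkeeping are assumptions.

```python
def simulate_day(events_by_day, day, x):
    """events_by_day: dict day -> list of FileAccessEvent; 'day' is the day being replayed."""
    window_prev = range(day - x, day)          # days [-x, -1] relative to 'day'
    window_now = range(day - x + 1, day + 1)   # days [-x+1, 0] relative to 'day'

    def files_in(window):
        files = {}
        for d in window:
            for ev in events_by_day.get(d, []):
                files[ev.filename] = ev.file_size
        return files

    prev, now = files_in(window_prev), files_in(window_now)

    cache_size_required = sum(now.values())    # space the cache must hold for this window
    new_data_to_cache = sum(size for name, size in now.items() if name not in prev)

    todays = events_by_day.get(day, [])
    bytes_total = sum(ev.bytes_read for ev in todays)                            # from SSD+HDD
    bytes_from_ssd = sum(ev.bytes_read for ev in todays if ev.filename in prev)  # already cached

    return cache_size_required, new_data_to_cache, bytes_from_ssd, bytes_total
```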

9. Algorithm: cache every file accessed during the last N days.

10. Algorithm: cache every file accessed during the last N days. Cache hit rate = Bytes from SSD / Bytes from SSD+HDD.
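With the simulate_day() sketch above, the hit rate defined on this slide is just the byte-level fraction served from the SSD cache; this helper is illustrative, not part of the original tooling.

```python
def cache_hit_rate(events_by_day, day, x):
    """Byte-level cache hit rate for one replayed day."""
    _, _, bytes_from_ssd, bytes_total = simulate_day(events_by_day, day, x)
    return bytes_from_ssd / bytes_total if bytes_total else 0.0
```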

11. Algorithm: Bytes-read/file-size > 110% during the last 5 days.

12. Analyzing the Historic Data
  • Trying to find a way to identify data worth caching
  • So far, not much success
  • [Plot annotation: worth caching]

13. Do the jobs tend to open the same file in a short time window?
  • If so, we may not have a chance to cache a file that is worth caching
  • If the access (open) times scatter over several hours, the file is cacheable (a check is sketched below)
  • But "scattering over several hours" does not by itself mean the file is worth caching
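One way to ask this question of the historic data is to look at the spread of open times per file; the threshold for "several hours" and the names below are assumptions, reusing the FileAccessEvent sketch from slide 4.

```python
from collections import defaultdict

CACHEABLE_SPREAD_SECONDS = 2 * 3600   # "several hours"; illustrative value

def cacheable_files(events):
    """events: iterable of FileAccessEvent; returns filename -> cacheable flag."""
    opens = defaultdict(list)
    for ev in events:
        opens[ev.filename].append(ev.open_time)

    result = {}
    for name, times in opens.items():
        spread = max(times) - min(times)   # how far apart the opens are
        result[name] = spread >= CACHEABLE_SPREAD_SECONDS
    return result
```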

14. Next Step
  • So far the focus has been on making this a good cache; more work remains
  • Should also look at: asking Panda for the input file lists of coming jobs, and the possibility of sub-file level caching
  • How much can the cache speed up analysis jobs? Compare all files in the SSD cache versus normal caching (some files in the SSD cache, some not)
