Presentation Transcript
slide1
SLASH: Scalable Lightweight Archival Storage Hierarchy
Paul Nowoczynski
Pittsburgh Supercomputing Center
pauln@psc.edu
slide2
Distributed Archival Caching System
    • More flexible than a traditional stand-alone archiver
      • Attach archiver to Lemieux via Quadrics
      • Incorporate off-the-shelf hardware into mix
      • Minimize tape reads
  • Completely user-mode: all I/Os into the archiver namespace are intercepted by a client-side library (see the sketch after this list).
  • Allows parallel I/O into separate systems while maintaining everything in a single namespace.
  • Cooperative Cache – nodes may read data from one another
  • No database needed
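
The slides do not show the interception mechanism itself. Below is a minimal sketch of the general technique (function interposition via LD_PRELOAD), assuming a hypothetical /arc archiver namespace; slash_path_is_archival() and slash_open() are illustrative stand-ins, not actual Slash internals.

    /* Sketch: user-mode open() interception via LD_PRELOAD.
     * Build: gcc -shared -fPIC -o libslashwrap.so wrap.c -ldl
     * Run:   LD_PRELOAD=./libslashwrap.so app ...
     * slash_path_is_archival() and slash_open() are hypothetical. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <fcntl.h>
    #include <stdarg.h>
    #include <string.h>
    #include <sys/types.h>

    static int slash_path_is_archival(const char *path) {
        /* Hypothetical test: is the path inside the archiver namespace? */
        return strncmp(path, "/arc/", 5) == 0;
    }

    static int slash_open(const char *path, int flags, mode_t mode) {
        /* Hypothetical: RPC to the metadata server, then open the cached copy. */
        (void)path; (void)flags; (void)mode;
        return -1;
    }

    int open(const char *path, int flags, ...) {
        mode_t mode = 0;
        if (flags & O_CREAT) {          /* a mode argument only accompanies O_CREAT */
            va_list ap;
            va_start(ap, flags);
            mode = va_arg(ap, mode_t);
            va_end(ap);
        }
        if (slash_path_is_archival(path))
            return slash_open(path, flags, mode);

        /* Non-archival paths fall through to the real libc open(). */
        int (*real_open)(const char *, int, ...) =
            (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");
        return real_open(path, flags, mode);
    }

Because the wrapper lives entirely in user space, no kernel modifications are needed, which is the portability point made on the next slide.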
slide3
Caveats
    • Works best with put/get requests
      • Edits, or files opened without O_TRUNC, are problematic because the cache filesystems are independent
    • Not a parallel filesystem
      • Multiple nodes cannot write the same file simultaneously
  • Features
    • Portability – no kernel modifications needed.
    • Archiver system can be modularly expanded without large investment
    • Metadata is held within the tree - not in a separate database
    • Convenient for monitoring and logging of I/O operations
    • Data pre-staging with coordination of system scheduler
    • Intelligent data migration to minimize thrashing of archiver disk
    • MD5 sum “receipts” for uploaded files (see the sketch after this list)
    • Good at automated file migration
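
The slides do not say how the MD5 receipts are generated. A plausible sketch using OpenSSL's EVP interface is shown below; the function name and receipt handling are assumptions.

    /* Sketch: compute an MD5 "receipt" for an uploaded file with OpenSSL's
     * EVP interface (link with -lcrypto). Receipt handling is an assumption. */
    #include <openssl/evp.h>
    #include <stdio.h>

    int md5_receipt(const char *path,
                    unsigned char out[EVP_MAX_MD_SIZE], unsigned int *out_len) {
        FILE *f = fopen(path, "rb");
        if (!f)
            return -1;

        EVP_MD_CTX *ctx = EVP_MD_CTX_new();
        EVP_DigestInit_ex(ctx, EVP_md5(), NULL);

        unsigned char buf[1 << 16];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, f)) > 0)
            EVP_DigestUpdate(ctx, buf, n);        /* stream the file through MD5 */

        EVP_DigestFinal_ex(ctx, out, out_len);    /* 16-byte digest for MD5 */
        EVP_MD_CTX_free(ctx);
        fclose(f);
        return 0;
    }

Hex-encoding the 16-byte digest and storing it alongside the archived file would give the user a verifiable upload receipt.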
Archiver Task Offloading
  • Several archival duties may be handled by Slash, allowing for greater flexibility and increased performance of the archival system
    • File Caching
      • Data cache management handled by low cost Slash Cache Nodes
    • Data Applications
      • Run on cache nodes instead of archiver CPUs.
    • Namespace Exporting
      • Namespace may be owned by an external metadata controller.
    • In the near future...
      • Will be able to control the millions of archived small files that currently plague DMF
      • Slash will own disk MSPs for variously sized files; everything will be treated as an object store (caches, DMF, Slash MSPs)
How it works
  • Intercepted read I/Os (e.g. open()) trigger RPCs that determine the cache status and location of the data (a hypothetical sketch of such an RPC follows this list).
    • When possible, data is read directly from the cooperative cache instead of from DMF/HPSS.
  • Write I/Os are handled locally by the cache node and asynchronously migrated to the archiver at a convenient time.
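
The RPC format is not defined in the slides; the structs below are a hypothetical sketch of what an open-time cache-status lookup might carry. All field names and the wire layout are assumptions, not the actual Slash protocol.

    /* Hypothetical sketch of the open-time RPC: the client library asks the
     * metadata server where the current copy of a file resides. */
    #include <limits.h>
    #include <stdint.h>

    enum slash_residency {
        SLASH_LOCAL_CACHE,      /* data already on this cache node's disk  */
        SLASH_REMOTE_CACHE,     /* data on a peer node (cooperative cache) */
        SLASH_ARCHIVER          /* data must come from DMF/HPSS            */
    };

    struct slash_open_req {
        char     path[PATH_MAX];   /* file within the archiver namespace  */
        uint32_t flags;            /* open(2) flags, e.g. O_RDONLY        */
    };

    struct slash_open_reply {
        uint8_t  residency;        /* one of enum slash_residency         */
        uint32_t cache_node_id;    /* which node currently holds the data */
        uint64_t size;             /* current file size                   */
    };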
Slash Write

[Diagram: a Slash Cache Node exchanging an RPC with the Metadata Server during a write]

1. Open(“foo”) O_WRONLY is intercepted on the cache node
2. Open(“foo”) RPC is sent to the metadata server
3. The metadata server tags the file with slash attributes
4. The RPC reply returns to the cache node
5. The cache node opens and Write()s to a local file; on close(), the file is registered with the migration service

… New data is written to cache, then sent to the archiver at some later time (after close())

… Multi-threaded and multi-process opens of the same file are supported
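
Tracing the five steps above in code, a minimal sketch of the cache node's side might look as follows; slash_rpc_open() and migrate_enqueue() are hypothetical helpers standing in for the RPC exchange and the migration service.

    /* Sketch of the write path in the diagram above: write to the local
     * cache disk, then register the file for asynchronous migration to
     * the archiver. slash_rpc_open() and migrate_enqueue() are hypothetical. */
    #include <fcntl.h>
    #include <unistd.h>

    extern int  slash_rpc_open(const char *path, int flags); /* steps 2-4 */
    extern void migrate_enqueue(const char *cache_path);     /* migration service */

    int slash_write_example(const char *cache_path, const void *buf, size_t len) {
        /* Steps 1-4: the intercepted open() exchanges an RPC with the
         * metadata server, which tags the file with slash attributes. */
        if (slash_rpc_open(cache_path, O_WRONLY) < 0)
            return -1;

        /* Step 5: all writes go to the cache node's local disk. */
        int fd = open(cache_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }
        close(fd);

        /* After close(): the migration service pushes the new data to the
         * archiver at a convenient time. */
        migrate_enqueue(cache_path);
        return 0;
    }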

slide7

Slash Read

[Diagram: a client on a cache node reading via the Metadata Server’s referral]

1. Open(“foo”) O_RDONLY is intercepted
2. Open(“foo”) RPC is sent to the metadata server
3. The metadata server gets the slash attributes
4. The RPC reply returns to the client
5. The client opens the file referred by the server and Read()s it

… Requested data may reside on local cache, remote cache, or the archiver

… The RPC requests cache status/location info from the metadata server
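
A minimal sketch of the client-side dispatch implied by the diagram: the metadata server's reply determines whether the data is read from local disk, fetched from a peer cache node, or staged from the archiver. All slash_* helpers and the residency enum here are hypothetical.

    /* Sketch of the read path in the diagram above. The reply from the
     * metadata server says where the data resides; the client opens the
     * local copy, pulls from a peer node, or falls back to the archiver. */
    #include <fcntl.h>

    enum residency { LOCAL_CACHE, REMOTE_CACHE, ARCHIVER };

    extern enum residency slash_rpc_lookup(const char *path, int *node_id);
    extern int slash_fetch_from_peer(int node_id, const char *path); /* cooperative cache */
    extern int slash_stage_from_archiver(const char *path);          /* DMF/HPSS read */

    int slash_read_open(const char *path) {
        int node_id;
        switch (slash_rpc_lookup(path, &node_id)) {       /* steps 2-4 */
        case LOCAL_CACHE:
            break;                                        /* already on local disk */
        case REMOTE_CACHE:
            if (slash_fetch_from_peer(node_id, path) < 0) /* read from a peer node */
                return -1;
            break;
        case ARCHIVER:
            if (slash_stage_from_archiver(path) < 0)      /* last resort: archiver */
                return -1;
            break;
        }
        return open(path, O_RDONLY);                      /* step 5 */
    }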

Slash Benchmark

Read and write throughput from the supercomputer to the Slash archiver via TCSIO/Quadrics:

  • Types 1, 2, and 3 represent different cache node hardware architectures
  • This slide provides baseline measurements
  • The test reads and writes 10-gigabyte data files from the supercomputer
  • No disk was involved on the supercomputer

[Charts: Raw Tcsio/QSW performance (no disk I/O); Tcsio/Slash write bandwidth (all I/O written to local cache node disk); Tcsio/Slash read bandwidth]

Cached vs. Non-Cached I/O

Upload and download between the archiver namespace and the supercomputer:

  • Up to six cache nodes were used
    • 4 SLCNs for 4 streams
    • 6 SLCNs for 8 and 16 streams
  • The blue line represents I/O directly into the archiver’s disk
    • Its scalability is poor due to the concatenation of the disk volume group
  • Uploads favored the faster cache nodes
    • Type 1 cache nodes handled more streams
    • This explains the performance drop for 16 readers, since the uploaded data was used for the read test
  • The data shows that the system scales as nodes are added

[Charts: Cached vs. Non-Cached Writes; Cached vs. Non-Cached Reads]