Silo Can we implement our own Amazon Glacier API?

Author: Steven Murray (steven.murray@cern.ch)

Scope and contents of this talk
In scope:
• Why think about Silo?
• Amazon Glacier
• Implementing the Amazon Glacier API
• Mapping CASTOR concepts to Amazon Glacier
Not in scope:
• How our own private Glacier instance would fit within the catalogue of DSS services

Why think about Silo?
• Silo is currently vaporware, so it can do anything and does not crash
• Tape drives need their own staging area
• The storage and processing of tape metadata should be separate from disk-cache metadata
• A system focusing on tape storage is easier to make efficient than a system focusing on both a disk cache and tape storage

A disk cache is not a staging area
Once a file is uploaded to a staging area, the user can no longer access it.
Once a file has been downloaded from a staging area, it will be deleted as soon as possible.

Why a staging area?
• Avoid the shoe-shining of tapes
• Hold back enough data to warrant a tape mount
• Disk is multi-stream, tape at CERN is not
• There are thousands of disk-cache hard drives and around a hundred tape drives
• If a disk cannot supply an individual file to tape at tape-drive speed, then a disk staging area is required to allow multiple disk streams to write files concurrently and then give them mutually exclusive access to tape
• Likewise if a disk cannot sink an individual file from tape at tape-drive speed…
• A multi-lane staging area can prevent one or more inefficient users from hogging a tape drive by allowing efficient users to overtake them
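One way to picture the overtaking described above is a staging area with two queues. This is a hypothetical sketch, not CASTOR or Silo code: it assumes each pending file is tagged with its client's measured upload rate (`clientMBps`, an invented metric) and that a file whose client keeps up with the tape drive is queued in the fast lane. The tape drive always drains the fast lane first.

```cpp
#include <deque>
#include <optional>
#include <string>

// Hypothetical pending file: the name plus the upload rate its client
// achieved (an assumed metric used only to choose a lane).
struct PendingFile {
  std::string name;
  double clientMBps;
};

// Two-lane staging area: efficient clients land in the fast lane and are
// served before inefficient ones, so they can overtake them.
class TwoLaneStagingArea {
 public:
  explicit TwoLaneStagingArea(double driveMBps) : m_driveMBps(driveMBps) {}

  void enqueue(const PendingFile& f) {
    if (f.clientMBps >= m_driveMBps) {
      m_fast.push_back(f);
    } else {
      m_slow.push_back(f);
    }
  }

  // Next file to write to tape: fast lane first, then slow lane.
  std::optional<PendingFile> next() {
    std::deque<PendingFile>& lane = m_fast.empty() ? m_slow : m_fast;
    if (lane.empty()) return std::nullopt;
    PendingFile f = lane.front();
    lane.pop_front();
    return f;
  }

 private:
  double m_driveMBps;
  std::deque<PendingFile> m_fast;
  std::deque<PendingFile> m_slow;
};
```

Strict priority like this can starve the slow lane; a real scheduler would probably need ageing or a share per lane.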

How do we stage today?
• The CASTOR disk cache acts as both a disk cache and a tape staging area
• CASTOR gives priority to disk-to-tape streams
• CASTOR does not give priority to tape-to-disk streams
• CASTOR does not explicitly limit the number of tape streams per disk
Bandwidth is limited by the network: more streams = less bandwidth per stream.
[Diagram: streams on disk servers: disk to disk (lower priority), disk to tape (higher priority), disk to disk (lower priority), tape to disk (lower priority).]

How big could a staging area be?
• Worst-case "back of an envelope" calculation
• Assume a simple scheduling algorithm
• 2 spindles per tape drive for pedaling: whilst the client writes to one spindle, the tape drive reads from the other, and vice versa
• × 2 spindles for RAID 0: it will take two spindles in RAID 0 to saturate a tape drive
• × 2 spindles for two data lanes: inefficient users should not hog tape drives, and efficient users should be able to overtake inefficient users
• Assume 80 tape drives always running in parallel
• Total number of spindles = 80 × 8 = 640
• Assume 3 TB disk drives
• Size of staging area = 640 × 3 TB = 1920 TB ≈ 2 PB
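The same estimate written out as compile-time arithmetic, assuming decimal terabytes and petabytes; it only restates the slide's own numbers.

```cpp
// Worst-case staging-area size, using the slide's assumptions.
constexpr int spindlesPerDrive = 2 * 2 * 2;  // pedaling x RAID 0 x two lanes = 8
constexpr int tapeDrives = 80;               // drives always running in parallel
constexpr int totalSpindles = tapeDrives * spindlesPerDrive;           // 640
constexpr int terabytesPerSpindle = 3;                                 // 3 TB disks
constexpr int stagingTerabytes = totalSpindles * terabytesPerSpindle;  // 1920 TB
constexpr double stagingPetabytes = stagingTerabytes / 1000.0;         // 1.92, roughly 2 PB

static_assert(totalSpindles == 640, "80 drives x 8 spindles");
```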

What is Amazon Glacier?
• A cold-storage data archival solution
• A REST-based web service
  • Clients GET, POST, PUT and DELETE resources identified by URLs
• Has four types of resources:
  • Archive – a user file with a unique ID generated by Glacier
  • Vault – a user-named container of archives
  • Job – either takes an inventory listing of a vault or retrieves an archive
  • Notification configuration – jobs can take up to 4 hours to complete, so a vault can be configured to notify the user when a job finishes
• The Amazon Glacier API is a potential wrapper to encapsulate all of the tape software, including a tape staging area
• If we wrapped our tape software within the Amazon Glacier API, then other tiers could replace us with the real Amazon Glacier
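Because every resource lives under /ACCOUNT_ID/vaults/VAULT_NAME, a server implementing the API can recognise the resource type from the request path alone. The following is a minimal sketch under that assumption; the function names are hypothetical, and multipart uploads (used later for large files) are included alongside the four resource types.

```cpp
#include <sstream>
#include <string>
#include <vector>

enum class Resource { Vault, Archive, Job, NotificationConfiguration, MultipartUpload, Unknown };

// Split "/a/b/c" into {"a", "b", "c"}, ignoring empty segments.
std::vector<std::string> splitPath(const std::string& path) {
  std::vector<std::string> parts;
  std::istringstream in(path);
  std::string part;
  while (std::getline(in, part, '/')) {
    if (!part.empty()) parts.push_back(part);
  }
  return parts;
}

// Classify a path of the form /ACCOUNT_ID/vaults/VAULT_NAME[/SUBRESOURCE/...].
Resource classify(const std::string& path) {
  const std::vector<std::string> p = splitPath(path);
  // p[0] is the account ID ('-' means the caller's own account)
  if (p.size() < 3 || p[1] != "vaults") return Resource::Unknown;
  if (p.size() == 3) return Resource::Vault;
  const std::string& sub = p[3];
  if (sub == "archives") return Resource::Archive;
  if (sub == "jobs") return Resource::Job;
  if (sub == "notification-configuration") return Resource::NotificationConfiguration;
  if (sub == "multipart-uploads") return Resource::MultipartUpload;
  return Resource::Unknown;
}
```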

Amazon Glacier namespace
• The Amazon Glacier namespace is simpler than that of S3
• Users create named vaults
• Users cannot name their archives
  • Amazon Glacier generates the IDs of archives
  • Users can give a one-off description of an archive of no more than 1024 characters
• A vault inventory can be taken, and it returns the description (up to 1024 characters) of each archive
• Users specify access control to their vaults via the Amazon Identity and Access Management (IAM) service
• Users cannot specify quality of service or quotas
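A Silo front end would have to validate the one-off description at upload time. This sketch checks the 1024-character limit from the slide; the restriction to printable 7-bit ASCII (codes 32 to 126) is an assumption taken from the Amazon Glacier documentation for the archive-description header, and the function name is hypothetical.

```cpp
#include <string>

// Returns true if the archive description is acceptable: at most 1024
// characters, each assumed to be printable 7-bit ASCII (32..126).
bool isValidArchiveDescription(const std::string& description) {
  if (description.size() > 1024) return false;
  for (unsigned char c : description) {
    if (c < 32 || c > 126) return false;  // control or non-ASCII byte
  }
  return true;
}
```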

Simplifications implied by the Amazon Glacier namespace – part 1
• Users cannot update files
  • Amazon Glacier always returns a new ID for an uploaded archive
  • A user cannot update an archive once it is uploaded
  • A user cannot recreate an archive with the same ID
• Users cannot specify ACLs at a finer granularity than a vault, because they cannot specify any pattern-matching expressions to identify a subset of the archives within a vault

Simplifications implied by the Amazon Glacier namespace – part 2
• Users cannot request inventory jobs of finer granularity than a vault
• There is only one level of inheritance for ACLs: from vault to archive
• There is no nesting of vaults and therefore no nesting of ACLs
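The simplifications above make the catalogue behind a Glacier-style API small. This hypothetical in-memory sketch (not Silo code) shows the consequences: IDs are always generated by the server, archives are never updated, the only listing is a whole-vault inventory, and access control is checked per vault only. The "archive-N" IDs are illustrative; real Glacier IDs are long opaque strings.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>

class Catalogue {
 public:
  // The vault carries the only ACL; its archives inherit it.
  void createVault(const std::string& vault, const std::set<std::string>& allowedUsers) {
    m_vaults[vault].allowedUsers = allowedUsers;
  }

  // Every upload yields a brand-new ID; there is no update and no reuse of IDs.
  std::string uploadArchive(const std::string& user, const std::string& vault,
                            const std::string& description) {
    Vault& v = checkAccess(user, vault);
    const std::string id = "archive-" + std::to_string(++m_lastId);
    v.archives.emplace(id, description);
    return id;
  }

  // The only namespace listing is an inventory of a whole vault.
  std::map<std::string, std::string> inventory(const std::string& user,
                                               const std::string& vault) {
    return checkAccess(user, vault).archives;
  }

 private:
  struct Vault {
    std::set<std::string> allowedUsers;
    std::map<std::string, std::string> archives;  // archive ID -> description
  };

  Vault& checkAccess(const std::string& user, const std::string& vault) {
    auto it = m_vaults.find(vault);
    if (it == m_vaults.end()) throw std::runtime_error("No such vault: " + vault);
    if (it->second.allowedUsers.count(user) == 0) throw std::runtime_error("Access denied");
    return it->second;
  }

  std::map<std::string, Vault> m_vaults;
  unsigned long m_lastId = 0;
};
```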

Buzzword mayhem
I am going to talk about S3, so...
• Amazon Glacier
  • An archive is a file stored in Amazon Glacier
  • An archive has an ID generated by Amazon Glacier
  • A vault is a container of archives
  • A vault has a name given by the user
• Amazon S3
  • An object is a file stored in Amazon S3
  • An object has a name given by the user
  • A bucket is a container of objects
  • A bucket has a name given by the user

How to access Amazon Glacier?
• Directly through the Amazon Glacier API
  • In my opinion this is not targeted at end users but rather at end-user applications that can remember the generated archive IDs
• Indirectly through the Amazon S3 API
  • In my opinion this API, as seen through client applications, is intended for direct use by end users

Direct access to Amazon Glacier
• Users specify the name of the vault
• Users have to remember the opaque IDs of their archives
• Users can request the inventory of a vault
  • A vault inventory can take up to 4 hours
  • A vault inventory is meant for disaster recovery and infrequent namespace reconciliation

Indirect access to Amazon Glacier via Amazon S3
• Users can define lifecycle rules that transition objects to the Glacier storage class
• Users can list files in real time
• Users cannot access or delete S3 objects with the Glacier storage class using the Amazon Glacier API
  • Users are not tempted to use the Amazon Glacier API
• Users cannot specify or see the destination vault
• Users do not see the archive IDs

S3 and bucket lifecycles
• A bucket lifecycle can have up to 1000 rules
• A rule has:
  • Either a relative or an absolute time
  • A prefix used to identify a group of objects by name
  • An action: either a storage-class transition or an expiration
• Amazon S3 has three storage classes:
  • Standard
  • Reduced redundancy
  • Glacier
• Transitions to the Glacier storage class are one way
• Users do not see vault names or archive IDs
• Files are temporarily restored from Glacier in order to be accessed through S3
• An expiration deletes objects whether they are archived in Amazon Glacier or not
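How a set of lifecycle rules might be evaluated for one object is sketched below. This is a hypothetical model, not the S3 API: it covers only relative times (days since creation) and assumes, as the S3 documentation describes for conflicting rules, that an expiration takes precedence over a transition.

```cpp
#include <optional>
#include <string>
#include <vector>

enum class Action { TransitionToGlacier, Expire };

struct LifecycleRule {
  std::string prefix;  // selects a group of objects by name
  int days;            // relative time since object creation
  Action action;
};

// Action due for an object of the given name and age, if any.
std::optional<Action> dueAction(const std::vector<LifecycleRule>& rules,
                                const std::string& objectName, int ageInDays) {
  bool transitionDue = false;
  for (const LifecycleRule& r : rules) {
    const bool matches = objectName.compare(0, r.prefix.size(), r.prefix) == 0;
    if (!matches || ageInDays < r.days) continue;
    if (r.action == Action::Expire) return Action::Expire;  // expiration wins
    transitionDue = true;
  }
  if (transitionDue) return Action::TransitionToGlacier;
  return std::nullopt;
}
```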

Retrieval paradigms
• Retrieval from Amazon Glacier
  • Job oriented
  • Client creates a retrieval job
  • Client queries the job status
  • Client downloads the output of the completed job
• Retrieval from Amazon Glacier via S3
  • Object (file) oriented
  • Client creates a restore job
  • Client polls the properties of the object
  • Client downloads the object once its contents are available

Storing a file <= 100MB to Glacier
Client → Glacier: POST /ACCOUNT_ID/vaults/VAULT_NAME/archives, with the file contents as the body (the account ID can be a dash '-')
Glacier → Client: x-amz-archive-id: ARCHIVE_ID
Major issue
• Amazon Glacier does not specify how to redirect
• The HTTP "Expect: 100-continue" header is NOT sent by the Amazon Java SDK, so the server cannot redirect the upload before the client has sent the whole body
• Further investigations could include:
  • Section 8.2.4 of RFC 2068 Hypertext Transfer Protocol -- HTTP/1.1
  • Client Behaviour if Server Prematurely Closes Connection

Subliminal message!
• The Amazon Glacier API assumes the client's file is safe once it has been uploaded
• Amazon can make this assumption because an uploaded file is stored in three separate storage centers
• To keep Silo simple, here at CERN we will not be able to make the same assumption
• We will have to modify the Amazon Glacier API to address this issue

Storing a file > 100MB to Glacier
Client → Glacier: POST /-/vaults/VAULT_NAME/multipart-uploads
Glacier → Client: x-amz-multipart-upload-id: MULTIPART_ID
For each part (byte ranges can come out of order and in parallel):
  Client → Glacier: PUT /-/vaults/VAULT_NAME/multipart-uploads/MULTIPART_ID
    Content-Range: bytes 0-16777215/*
    Expect: 100-continue
  Client → Glacier: the part of the file as the body, sent only if told to continue (redirection can be applied here)
Client → Glacier: POST /-/vaults/VAULT_NAME/multipart-uploads/MULTIPART_ID
Glacier → Client: x-amz-archive-id: ARCHIVE_ID
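The Content-Range header of each part follows directly from the file size and the part size. The sketch below assumes, as Amazon Glacier requires, that every part except the last has the same size (a power-of-two multiple of 1 MiB; the slide's example uses 16 MiB). The function name is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Content-Range values for uploading fileSize bytes in parts of partSize bytes.
std::vector<std::string> partRanges(std::uint64_t fileSize, std::uint64_t partSize) {
  std::vector<std::string> ranges;
  for (std::uint64_t first = 0; first < fileSize; first += partSize) {
    const std::uint64_t last = std::min(first + partSize, fileSize) - 1;  // inclusive
    ranges.push_back("bytes " + std::to_string(first) + "-" + std::to_string(last) + "/*");
  }
  return ranges;
}
```

Since each range is self-describing, the parts can be sent by several threads in any order, which is what allows the server to redirect individual parts.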

Retrieving a file from Glacier
Client → Glacier: POST /-/vaults/VAULT_NAME/jobs
Glacier → Client: x-amz-job-id: JOB_ID
Loop until the job output is ready for download:
  Client → Glacier: GET /-/vaults/VAULT_NAME/jobs/JOB_ID
  Glacier → Client: job description
Client → Glacier: GET /-/vaults/VAULT_NAME/jobs/JOB_ID/output
Glacier → Client: file contents
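The job-oriented retrieval loop above can be sketched against a small abstract client. `GlacierClient` and its methods are hypothetical stand-ins for the three HTTP exchanges on the slide, not Amazon SDK calls; an abstract interface also makes the loop easy to test with a fake.

```cpp
#include <chrono>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical client: one method per HTTP exchange on the slide.
class GlacierClient {
 public:
  virtual ~GlacierClient() = default;
  // POST /-/vaults/VAULT_NAME/jobs -> x-amz-job-id
  virtual std::string initiateRetrieval(const std::string& vault, const std::string& archiveId) = 0;
  // GET /-/vaults/VAULT_NAME/jobs/JOB_ID -> has the job completed?
  virtual bool isJobComplete(const std::string& vault, const std::string& jobId) = 0;
  // GET /-/vaults/VAULT_NAME/jobs/JOB_ID/output -> file contents
  virtual std::string jobOutput(const std::string& vault, const std::string& jobId) = 0;
};

// Create a retrieval job, poll it until complete, then download its output.
std::string retrieveArchive(GlacierClient& client, const std::string& vault,
                            const std::string& archiveId,
                            std::chrono::milliseconds pollInterval, int maxPolls) {
  const std::string jobId = client.initiateRetrieval(vault, archiveId);
  for (int i = 0; i < maxPolls; ++i) {
    if (client.isJobComplete(vault, jobId)) return client.jobOutput(vault, jobId);
    std::this_thread::sleep_for(pollInterval);
  }
  throw std::runtime_error("Retrieval job " + jobId + " did not complete in time");
}
```

Given jobs of up to 4 hours, a real client would poll every few minutes or rely on a notification configuration instead of polling.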

Storing a file to Glacier via S3
Client → S3: PUT /ObjectName (Host: BucketName.s3.amazonaws.com), with the file contents as the body
S3 → Client: 200 OK
Client → S3: PUT /?lifecycle (Host: BucketName.s3.amazonaws.com)
• S3 explicitly supports "Expect: 100-continue"
• S3 has 3 storage classes: standard, reduced redundancy and Glacier
• An S3 object must be transitioned to the Glacier storage class by a lifecycle rule

Retrieving a file from Glacier via S3
Client → S3: POST /ObjectName?restore (Host: BucketName.s3.amazonaws.com)
S3 → Client: 202 Accepted
Client → S3: HEAD /ObjectName (Host: BucketName.s3.amazonaws.com)
S3 → Client: x-amz-restore: STATUS OF RESTORE
Client → S3: GET /ObjectName (Host: BucketName.s3.amazonaws.com)
S3 → Client: file contents
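Polling "the properties of the object" means interpreting the x-amz-restore header returned by HEAD. This sketch assumes the header formats documented for S3: absent when no restore was requested, `ongoing-request="true"` while the restore runs, and `ongoing-request="false", expiry-date="..."` once the temporary copy is available. The function names are hypothetical.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

enum class RestoreState { NotRequested, InProgress, Available };

// Interpret the x-amz-restore header of a HEAD response (nullopt = header absent).
RestoreState restoreState(const std::optional<std::string>& header) {
  if (!header) return RestoreState::NotRequested;
  if (header->find("ongoing-request=\"true\"") != std::string::npos) return RestoreState::InProgress;
  if (header->find("ongoing-request=\"false\"") != std::string::npos) return RestoreState::Available;
  throw std::invalid_argument("Unrecognised x-amz-restore value: " + *header);
}

// Extract the expiry date of the restored copy, if the header carries one.
std::optional<std::string> expiryDate(const std::string& header) {
  const std::string key = "expiry-date=\"";
  const std::string::size_type start = header.find(key);
  if (start == std::string::npos) return std::nullopt;
  const std::string::size_type from = start + key.size();
  const std::string::size_type end = header.find('"', from);
  if (end == std::string::npos) return std::nullopt;
  return header.substr(from, end - from);
}
```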

Amazon Glacier and security
• Signing
  • Client and Amazon Glacier share a secret key
  • Client authenticates messages by signing them
  • String to sign = name of hash algorithm + request date + credential scope + hash of the canonical form of the request
  • Signature = hash-based message authentication code (HMAC) of the string to sign, using a signing key derived from the secret key and the credential scope
  • Message body sent without encryption
• HTTPS
  • Both header and body encrypted
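The string to sign described above can be assembled without any cryptography, which makes its structure easy to see. This sketch follows AWS Signature Version 4; the SHA-256 and HMAC steps need a crypto library and are left out, so the hashed canonical request is passed in as a hex string. The function names are hypothetical.

```cpp
#include <string>

// Credential scope: date (YYYYMMDD) / region / service / terminator.
std::string credentialScope(const std::string& date, const std::string& region,
                            const std::string& service) {
  return date + "/" + region + "/" + service + "/aws4_request";
}

// String to sign: algorithm, request date-time (YYYYMMDDTHHMMSSZ),
// credential scope and hex SHA-256 of the canonical request, one per line.
std::string stringToSign(const std::string& requestDateTime, const std::string& scope,
                         const std::string& hashedCanonicalRequest) {
  return "AWS4-HMAC-SHA256\n" + requestDateTime + "\n" + scope + "\n" + hashedCanonicalRequest;
}

// Not shown: signingKey = HMAC chain starting from "AWS4" + secret key over the
// date, region, service and "aws4_request"; signature = hex(HMAC(signingKey, stringToSign)).
```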

Amazon Glacier clients
• Amazon does NOT provide a client application
• Amazon provides an SDK for the following platforms, but…
  • Android (but Glacier is NOT supported)
  • iOS (but Glacier is NOT supported)
  • Java (Glacier IS supported)
  • .NET (Glacier IS supported)
  • Node.js (Glacier IS supported)
  • PHP (Glacier IS supported)
  • Ruby (Glacier IS supported)
• No support from Amazon for a C/C++ client

Why CMake?
• Because everybody else (EOS) is using it
• Has a simple but powerful language
• Calculates C/C++ header file dependencies itself; no other tool required
• Explicitly covers:
  • Configuration
  • Building
  • Installing
  • Packaging

CMake syntax
• Commands
• Flow control
• Regular expressions
CMakeLists.txt:

  # Split the silo library source files into test and non-test files
  file (GLOB SILO_LIB_SRC_FILES_ALL
    silo/*.cpp silo/exception/*.cpp silo/parser/*.cpp silo/utils/*.cpp)
  foreach (SRC_FILE ${SILO_LIB_SRC_FILES_ALL})
    get_filename_component (SRC_NAME ${SRC_FILE} NAME)
    # Escape the dot so that only file names ending in "Test.cpp" match
    if (SRC_NAME MATCHES "Test\\.cpp$")
      list (APPEND SILO_LIB_SRC_FILES_TST ${SRC_FILE})
    else ()
      list (APPEND SILO_LIB_SRC_FILES_NTST ${SRC_FILE})
    endif ()
  endforeach ()

CMake configuration step
CMakeLists.txt:

  set (SILO_VERSION_MAJOR 1)
  set (SILO_VERSION_MINOR 0)
  ...
  # Substitute @VAR@ placeholders in the template to generate version.hpp
  configure_file (
    "${CMAKE_CURRENT_SOURCE_DIR}/silo/version.hpp.in"
    "${CMAKE_CURRENT_SOURCE_DIR}/silo/version.hpp"
    @ONLY)

version.hpp.in:

  #pragma once
  #include <cstdint>

  namespace silo {
  const std::uint32_t c_versionMajor = @SILO_VERSION_MAJOR@;
  const std::uint32_t c_versionMinor = @SILO_VERSION_MINOR@;
  } // namespace silo

Why the Apache HTTP server?
• Mature technology
• Decided to write an Apache module in C++ because of performance and group knowledge
• Features of interest:
  • Concrete resource management (files, memory, etc.) through the use of resource pools attached to the lifespans of the server, connections and requests
  • Database connection cache via the DBD module
  • Hides chunked encoding
  • API support for adding module-specific configuration directives
  • Transparently provides HTTPS via the SSL module
  • Bucket brigades (avoids copying memory)

Wrapping the Apache HTTP server
• Reduce dependency on the Apache HTTP server API
• Provide seams for CppUnit

Resource-oriented dispatcher
Only after the dispatch logic has either returned an HTTP response object or thrown an exception does the Silo code start to construct the actual response message for the client.

  // Dispatch on the HTTP method of the Apache request record m_r
  switch(m_r->method_number) {
  case M_DELETE: return resource.httpDelete();
  case M_GET:    return resource.httpGet();
  case M_POST:   return resource.httpPost();
  case M_PUT:    return resource.httpPut();
  default:
    // Any other method is rejected before a response is constructed
    throw exception::BadRequest(EXCEPTION_LOCALE,
      std::string("Unexpected HTTP method: ") + m_r->method);
  }
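Wrapping Apache is what lets this dispatch logic be unit tested. The following is a hypothetical, Apache-free version of the dispatcher: the method is a plain string instead of `request_rec::method_number`, and `HttpResponse`, `Resource` and `BadRequest` are simplified stand-ins, not the actual Silo classes. Any `Resource` implementation, such as a test fake, can be plugged in.

```cpp
#include <stdexcept>
#include <string>

struct HttpResponse {
  int status;
  std::string body;
};

class BadRequest : public std::runtime_error {
 public:
  using std::runtime_error::runtime_error;
};

// One virtual method per supported HTTP method; a seam for mocks.
class Resource {
 public:
  virtual ~Resource() = default;
  virtual HttpResponse httpDelete() = 0;
  virtual HttpResponse httpGet() = 0;
  virtual HttpResponse httpPost() = 0;
  virtual HttpResponse httpPut() = 0;
};

// Return the resource's response object; nothing is written to the client here.
HttpResponse dispatch(Resource& resource, const std::string& method) {
  if (method == "DELETE") return resource.httpDelete();
  if (method == "GET") return resource.httpGet();
  if (method == "POST") return resource.httpPost();
  if (method == "PUT") return resource.httpPut();
  throw BadRequest("Unexpected HTTP method: " + method);
}
```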

CppUnit
• 51 implementation classes (excluding test and mock classes)
• 93 unit tests
• 23 test classes
• 5 mock classes:
  • MockCatalogue
  • MockHttpInputStream
  • MockLog
  • MockResource
  • MockTmpFileFactory
The most complex test is the mock upload of a file from the client to the local disk of the httpd daemon.

The Silo vaporware prototype
A Silo prototype would be 90% CASTOR.
[Diagram: a central server private to Silo (simplified NS, VDQM, VMGR, non-CASTOR scheduler); a tape server (rtcpd, taped, rmcd); and the staging area and client interface (Apache httpd with a disk server module exposing the Glacier API, disk, rfiod, and a disk bridge that forks and execs readtp and writetp). Links are labelled "Read / write" and "Transfer requests".]

CASTOR file classes
• A file class is a tape storage class
  • Specifies the number of copies to be stored on tape
• A file has one and only one file class
• Created by power users
• Standard users tag their files with file classes

CASTOR tape pools
• Simply a list of tapes (can span multiple libraries)
• Created by tape operations
• Used to control collocation
• Used to store different copies in different buildings
• Used to direct small files to the most suitable tape drives

CASTOR migration routes
• Decide the destination tape pool of a file based on the file's:
  • File class
  • Copy number
  • File size (big or small)
• Created by tape operators
• Decouples file classes from tape pools

Migration routes within Silo?
[Diagram: CASTOR in front of Silo, which writes to several tape pools; a CASTOR file class corresponds to a Silo vault (file class = vault).]
• Users specify, in the namespace of the disk cache, the vault they wish to store their files in
• Silo stores the migration routes of files based on vault, copy number and file size (big or small)
• The namespace of the disk cache must remember the archive ID of each file stored in Silo
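A migration-route table keyed on (vault, copy number, big or small) could look like the following hypothetical sketch. The class and method names are invented, and the 1 GB big/small threshold is an assumed value; the talk does not give one.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <tuple>

// Assumed boundary between "small" and "big" files (not from the talk).
constexpr std::uint64_t kBigFileThreshold = 1000000000ULL;  // bytes

struct RouteKey {
  std::string vault;  // plays the role of the CASTOR file class
  int copyNumber;
  bool bigFile;
  bool operator<(const RouteKey& o) const {
    return std::tie(vault, copyNumber, bigFile) < std::tie(o.vault, o.copyNumber, o.bigFile);
  }
};

class MigrationRoutes {
 public:
  void add(const std::string& vault, int copyNumber, bool bigFile, const std::string& tapePool) {
    m_routes[RouteKey{vault, copyNumber, bigFile}] = tapePool;
  }

  // Destination tape pool for one copy of a file, if a route exists.
  std::optional<std::string> tapePoolFor(const std::string& vault, int copyNumber,
                                         std::uint64_t fileSize) const {
    const RouteKey key{vault, copyNumber, fileSize >= kBigFileThreshold};
    const auto it = m_routes.find(key);
    if (it == m_routes.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<RouteKey, std::string> m_routes;
};
```

Routing copy 1 and copy 2 to different pools is how the "different copies in different buildings" policy of CASTOR tape pools would carry over.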

Conclusions and future – part 1
• Silo does not exist, apart from code developed to investigate parts of the Amazon Glacier API
• A Silo prototype would be nearly a complete CASTOR system
• Implementing the Amazon Glacier API as an Apache HTTP server module requires a lot of manpower
• The Amazon Glacier API is the right direction because it enforces a simple namespace and non-real-time downloads, but:
  • It does not include the Amazon Identity and Access Management (IAM) API
  • It does not explicitly specify redirection for file uploads
  • It does not specify how we manage our tape pools
  • It assumes a staging area that is 100% safe, which we cannot guarantee
  • There is no official C/C++ client library

Conclusions and future – part 2
• In my opinion we need to replace HTTP and modify the Amazon Glacier API so that it no longer relies on the staging area being 100% reliable
• We will have to write our own client library
• In my opinion Amazon's idea of separating the Amazon IAM API from the Amazon Glacier API is a good one, but we still need to implement this separate access control / account management module
• Architecture meetings will now take place every Wednesday
