Making Cloud Storage Provenance-Aware

Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer. Harvard School of Engineering and Applied Sciences.


Presentation Transcript


  1. Making Cloud Storage Provenance-Aware
     Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer
     Harvard School of Engineering and Applied Sciences

  2. The Cloud
     • Next generation computing environment
     • Cheap: Pay as you go
     • Provision resources (storage, CPU) on a need basis
     • Provides illusion of infinite resources
     • Companies with large batch oriented tasks can get results quickly
     • Cloud providers
     • Amazon Web Services (AWS)
     • Google AppEngine
     • Microsoft Azure
     Making a cloud Provenance-Aware - TaPP'09

  3. Provenance for the Cloud
     • As apps move to the cloud, so will the data
     • Amazon hosts scientific data for free
     • However, most cloud services are not designed to store provenance
     • Why Provenance?
     • Debug Application Results
     • Validate Data Sets
     • Improve Search Results
     • Regulatory Compliance

  4. Provenance Properties
     • We identified the following properties
     • Read Correctness
     • Causal Ancestry Ordering
     • Queryable

  5. Read Correctness
     • Data must be what is described by provenance
     • Provenance accurately describes the data object
     • Mechanisms
     • Atomicity: At storage time, both provenance and data should be stored or neither should be stored
     • Consistency: At retrieval time, data returned should be consistent with provenance

  6. Causal Ancestry Ordering
     • The provenance and data of an ancestor object must be recorded in the provenance system
     • No dangling references
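
The "no dangling references" requirement can be illustrated with a minimal sketch. It assumes provenance is held as a mapping from each object to the list of its ancestors; the function name and the data layout are hypothetical and are not taken from the talk.

```python
def dangling_references(provenance):
    """Return ancestors that are referenced but have no record of their own.

    `provenance` maps an object name to the list of objects it depends on
    (a hypothetical layout, used only for illustration).
    """
    recorded = set(provenance)
    missing = set()
    for ancestors in provenance.values():
        for ancestor in ancestors:
            if ancestor not in recorded:
                missing.add(ancestor)
    return missing


# B depends on P and P depends on A, but A itself was never recorded.
store = {"B": ["P"], "P": ["A"]}
print(dangling_references(store))   # A is reported as dangling

# Once A has a record (even an empty one), the ordering property holds.
store["A"] = []
print(dangling_references(store))   # nothing is reported
```

A store that satisfies causal ancestry ordering would keep this set empty at all times, which means ancestors have to be recorded before (or together with) their descendants.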

  7. Efficient Query
     • Provenance must be accessible to users who want to verify properties of their data or simply be aware of its lineage
     • If provenance is not readily accessible, the provenance is of questionable value.

  8. Goal
     • How do we design protocols around current cloud services such that these properties are satisfied?
     • Setting
     • Provenance-Aware Storage System (PASS) tracks and collects provenance
     • Primarily considered AWS
     • Used 3 services: S3, SimpleDB, SQS

  9. Outline
     • Introduction
     • PASS Background
     • Protocol 1: Standalone S3
     • Protocol 2: S3 + SimpleDB
     • Protocol 3: S3 + SimpleDB + SQS
     • Analysis
     • Conclusion and Status

  10. Provenance-Aware Storage System
     • Observes system calls that applications make and captures relationships between objects
     • P: read A
     • Generates record: P depends on A
     • Caches the record
     • P: write B
     • Generates record: B depends on P
     • Stores both ‘B depends on P’ and ‘P depends on A’
     • Mirrors data locally and caches provenance until it needs to be sent to AWS
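
The read/write example above can be sketched as a toy tracker. This is only an illustration of the record flow, not PASS itself: the class, its method names, and the decision to flush the whole cache on every write are assumptions made for brevity.

```python
class ProvenanceTracker:
    """Toy stand-in for PASS: caches 'depends on' records until a write."""

    def __init__(self):
        self.cache = []    # records generated but not yet stored
        self.stored = []   # records that have been flushed

    def on_read(self, process, obj):
        # "P: read A" generates "P depends on A", which is only cached.
        self.cache.append((process, obj))

    def on_write(self, process, obj):
        # "P: write B" generates "B depends on P". The new record is stored
        # together with everything cached so far (a simplification: the
        # sketch does not filter the cache by process).
        self.cache.append((obj, process))
        self.stored.extend(self.cache)
        self.cache.clear()


tracker = ProvenanceTracker()
tracker.on_read("P", "A")
tracker.on_write("P", "B")
for child, parent in tracker.stored:
    print(f"{child} depends on {parent}")
```

Storing the ancestor record "P depends on A" no later than "B depends on P" is what keeps the stored graph free of dangling references.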

  12. Simple Storage Service (S3)
     • Object store: sizes from 1 byte to 5GB
     • Objects identified by URI
     • SOAP or REST interface
     • Operations: PUT, GET, HEAD, COPY, DELETE
     • PUT: stores an object and its metadata (2KB limit)
     • HEAD: retrieves metadata of an object
     • Cost: data storage + bandwidth + number of operations
     • Eventual consistency

  13. Architecture 1: Standalone S3
     [Diagram. Components: Application, PASS, S3. Labels: User, System, Prov+Data.]

  14. Protocol 1: Standalone S3
     [Diagram. Components: PASS, S3. Labels: PUT (Prov > 1KB), OK.]

  15. Protocol 1: Standalone S3
     [Diagram. Components: PASS, S3. Labels: PUT Data, OK.]

  16. Properties

  18. SimpleDB
     • Service providing database functionality
     • Data model: items described by attribute-value pairs
     • 256 attrs maximum, name/value < 1KB
     • Operations: PutAttributes, Query, QueryWithAttributes, and SELECT
     • Query returns items
     • QueryWithAttributes returns both items and attributes
     • Cost: bandwidth + storage + number of operations + machine hours

  19. Architecture 2: S3 + SimpleDB
     [Diagram. Components: Application, PASS, SimpleDB, S3. Labels: User, System, Data, Prov.]

  20. Protocol 2: S3 + SimpleDB
     [Diagram. Components: PASS, SimpleDB, S3. Labels: PUT (rec > 1KB), OK; PutAttrs+, OK.]

  21. Protocol 2: S3 + SimpleDB
     [Diagram. Components: PASS, SimpleDB, S3. Labels: PUT Data, OK.]

  22. Properties

  24. Simple Queuing Service (SQS)
     • Distributed messaging system
     • Queues are identified by URL
     • Operations: SendMessage, ReceiveMessage, DeleteMessage
     • VisibilityTimeout: a message will not be available for x seconds after a ReceiveMessage
     • Limits: 8KB message size, at most 10 messages can be received per call
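
The visibility-timeout behaviour can be sketched with an in-memory queue. This is a toy model under stated assumptions, not the SQS API: the class and method names are made up, and time is passed in explicitly as `now` so the example is deterministic.

```python
import uuid


class ToyQueue:
    """In-memory model of a queue with a visibility timeout (not real SQS)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self.messages = {}   # message id -> [body, time it becomes visible]

    def send_message(self, body):
        msg_id = str(uuid.uuid4())
        self.messages[msg_id] = [body, 0.0]
        return msg_id

    def receive_message(self, now, max_messages=10):
        received = []
        for msg_id, entry in self.messages.items():
            if len(received) == max_messages:
                break
            if entry[1] <= now:
                # Hide the message from other receivers for a while.
                entry[1] = now + self.visibility_timeout
                received.append((msg_id, entry[0]))
        return received

    def delete_message(self, msg_id):
        self.messages.pop(msg_id, None)


queue = ToyQueue(visibility_timeout=30)
queue.send_message("B depends on P")
first = queue.receive_message(now=0)
print(len(first), len(queue.receive_message(now=10)))   # hidden at t=10
again = queue.receive_message(now=30)                   # visible again
queue.delete_message(again[0][0])                       # processed for good
```

A receiver that crashes before calling delete simply lets the timeout expire, after which the message is delivered again; this is the property a commit daemon can rely on for recovery.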

  25. Architecture 3: S3 + SimpleDB + SQS
     [Diagram. Components: Application, PASS, Queue1, SimpleDB, S3. Labels: User, System, Prov, Data.]

  26. Protocol 3: S3 + SimpleDB + SQS
     [Diagram. Components: PASS, Commitd, SQS, SimpleDB, S3. Labels: PUT: Temp copy, OK; COPY, OK; SndMsg+, OK; PutAttrs+, OK; RecvMsg+.]

  27. Protocol 3: S3 + SimpleDB + SQS
     [Diagram. Components: PASS, Commitd, SQS, SimpleDB, S3. Labels: DEL:CPY, OK; DelMsg+.]

  28. Idempotency
     • SimpleDB, S3, and SQS operations are idempotent
     • If a commit daemon crashes, comes back up, and processes a transaction again, there will be no errors

  29. Properties

  31. Analysis
     • Extracted provenance by running three workloads
     • Linux compile
     • Blast
     • Provenance challenge
     • Compute cost to store and query provenance
     • Number of ops
     • Bandwidth

  32. Storage Cost
     [Chart legend: P1 = S3; P2 = S3 + SimpleDB; P3 = S3 + SimpleDB + SQS]

  33. Query Cost
     • Dump the provenance of a given object
     • Ran it on all objects for statistical significance
     • Find all the files that were outputs of blast.
     • Find all the descendants of files derived from blast.
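
The second and third queries amount to a lookup and a graph traversal over "depends on" records. The sketch below assumes the records are already available locally as (child, parent) pairs; the file names are invented for illustration, and the talk does not say how the queries were actually implemented against S3 or SimpleDB.

```python
from collections import defaultdict, deque


def descendants(edges, roots):
    """Return every object that transitively depends on one of `roots`.

    `edges` is an iterable of (child, parent) pairs meaning
    "child depends on parent".
    """
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    seen = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical records: blast reads db.fa and writes out.txt,
# and report.pdf is later derived from out.txt.
edges = [("blast", "db.fa"), ("out.txt", "blast"), ("report.pdf", "out.txt")]

outputs_of_blast = {child for child, parent in edges if parent == "blast"}
print(sorted(outputs_of_blast))
print(sorted(descendants(edges, outputs_of_blast)))
```

Against a store that only supports per-object lookups, each step of this traversal would cost at least one operation, which is why the number of operations is one of the measured costs.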

  34. Query results

  36. Conclusions
     • Identified the properties that need to be satisfied for storing provenance in the cloud
     • Presented various protocols for storing provenance and data in the cloud
     • The cost of storing provenance is reasonable

  37. Status
     • System almost ready
     • Plan to submit it to Symposium on Operating Systems Principles (SOSP)
     • Really hard to drive up the cost
     • Jan Bill = $1.95
     • Feb Bill = $9.38

  38. Extra

  39. Protocol 1: Standalone S3
     On file close:
     • Convert the provenance into attribute-value pairs as required by S3
     • If (sizeof(record) > 1KB)
     • Store the record in a separate S3 object
     • Replace attribute-value pair with pointer to this object
     • Upload the file using PUT:
     • Arguments: object, attributes
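
The steps of Protocol 1 can be sketched against an in-memory stand-in for S3. Everything named here is an assumption made for illustration: `ToyS3` is not the S3 API, the overflow key scheme and the `s3:` pointer prefix are invented, and the 1KB test is applied per attribute value.

```python
LIMIT = 1024   # 1KB threshold from the slide


class ToyS3:
    """In-memory stand-in for S3: key -> (data, metadata)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data, metadata=None):
        self.objects[key] = (data, dict(metadata or {}))

    def head(self, key):
        return self.objects[key][1]


def store_on_close(s3, key, data, provenance):
    """Store `data` with its provenance attached as object metadata."""
    metadata = {}
    for name, value in provenance.items():
        if len(value.encode()) > LIMIT:
            # Oversized record: spill it to its own object, keep a pointer.
            overflow_key = f"{key}.prov.{name}"   # hypothetical naming
            s3.put(overflow_key, value.encode())
            value = f"s3:{overflow_key}"
        metadata[name] = value
    # A single PUT carries both the object and its attributes.
    s3.put(key, data, metadata)


s3 = ToyS3()
store_on_close(s3, "B", b"result", {"input": "P", "env": "x" * 2000})
print(s3.head("B"))
```

Because data and provenance travel in the same PUT, they are stored together or not at all; the overflow objects are written separately and fall outside that guarantee.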

  40. Protocol 2: S3 + SimpleDB
     On file close:
     • Convert the provenance into attribute-value pairs as required by SimpleDB
     • Additional record: md5sum of (file contents + version)
     • If (sizeof(record) > 1KB)
     • Store the record in a separate S3 object
     • Replace attribute-value pair with pointer to this object
     • Issue PutAttributes: store the provenance
     • One item (= one PutAttributes call) per version of the object
     • Upload the file to S3 using PUT
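
A minimal sketch of how the md5sum record could be used to check consistency at read time. Plain dictionaries stand in for SimpleDB and S3, the item naming (`name:version`) is invented, and the sketch assumes the version number is stored alongside the object so a reader can find the matching item; none of these details are confirmed by the slides. Overflow handling for records above 1KB is left out.

```python
import hashlib


def digest(data, version):
    # md5sum of (file contents + version), as listed on the slide.
    return hashlib.md5(data + str(version).encode()).hexdigest()


def store(simpledb, s3, name, version, data, provenance):
    item = f"{name}:{version}"            # one item per object version
    attrs = dict(provenance)
    attrs["md5"] = digest(data, version)
    simpledb[item] = attrs                # stands in for PutAttributes
    s3[name] = (data, version)            # stands in for PUT


def read_consistent(simpledb, s3, name):
    data, version = s3[name]
    attrs = simpledb.get(f"{name}:{version}")
    if attrs is None or attrs["md5"] != digest(data, version):
        raise ValueError("data and provenance do not match")
    return data, attrs


simpledb, s3 = {}, {}
store(simpledb, s3, "B", 1, b"result", {"input": "P"})
print(read_consistent(simpledb, s3, "B")[1]["input"])

s3["B"] = (b"newer contents", 1)          # data changed without provenance
try:
    read_consistent(simpledb, s3, "B")
except ValueError as err:
    print("rejected:", err)
```

The check detects a mismatch after the fact; it does not make the two writes atomic, since a crash between PutAttributes and PUT still leaves provenance without data.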

  41. Protocol 3: S3 + SimpleDB + SQS (I)
     • Log phase: Log data on a queue
     • Store a copy of the file in a temporary location on S3
     • Allocate a transaction id (uuid)
     • Split provenance into chunks of 8KB and enqueue them on an SQS queue
     • Tag each message with the transaction ID
     • One additional record that has a pointer to the temp S3 object
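
The log phase can be sketched with a dictionary standing in for S3 and a list standing in for the SQS queue. The message layout (`txn`, `type`, `seq`, `body`), the temporary key scheme, and the `HEADER_ROOM` allowance are all assumptions; the sketch bounds only the body of each message, whereas a real message would also have to budget for its tag.

```python
import json
import uuid

MAX_MESSAGE = 8 * 1024   # SQS message limit quoted in the talk
HEADER_ROOM = 256        # hypothetical allowance for the transaction tag


def log_phase(s3, queue, name, data, records):
    """Log one file close: temp copy on S3, provenance chunks on the queue."""
    txn = str(uuid.uuid4())
    temp_key = f"tmp/{txn}/{name}"        # hypothetical temporary location
    s3[temp_key] = data

    # json.dumps escapes non-ASCII by default, so one character is one byte.
    payload = json.dumps(records)
    size = MAX_MESSAGE - HEADER_ROOM
    chunks = [payload[i:i + size] for i in range(0, len(payload), size)]

    for seq, body in enumerate(chunks):
        queue.append({"txn": txn, "type": "chunk", "seq": seq, "body": body})

    # One additional record pointing at the temporary S3 object.
    queue.append({"txn": txn, "type": "pointer",
                  "name": name, "temp_key": temp_key})
    return txn


s3, queue = {}, []
records = [{"attr": "input", "value": "x" * 20000}]
txn = log_phase(s3, queue, "B", b"result", records)
print(len(queue), "messages logged for transaction", txn)
```

Nothing is visible at the object's permanent location after this phase; the queue holds everything a separate commit step needs to finish the transaction.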

  42. Protocol 3: S3 + SimpleDB + SQS (II)
     • Commit phase: move data from SQS to S3 and SimpleDB
     • ReceiveMessage: get messages from the queue and assemble the packets
     • Store the provenance in SimpleDB using PutAttributes call
     • Take care of overflows
     • Execute an S3 COPY and copy the object from its temporary location to permanent
     • Delete Messages from SQS
     • Delete temporary file copy
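
A minimal sketch of the commit phase, following the order of the steps listed above. Dictionaries stand in for S3 and SimpleDB and a list stands in for the queue; the message layout (`txn`, `type`, `seq`, `body`, `temp_key`) is a hypothetical one chosen for the example, and overflow handling is omitted.

```python
import json


def commit_phase(s3, simpledb, queue, txn):
    """Commit one logged transaction. Returns False if nothing is pending."""
    messages = [m for m in queue if m["txn"] == txn]
    pointer = next((m for m in messages if m["type"] == "pointer"), None)
    if pointer is None:
        return False   # already committed, or never fully logged

    # Assemble the provenance from its chunks.
    chunks = sorted((m for m in messages if m["type"] == "chunk"),
                    key=lambda m: m["seq"])
    records = json.loads("".join(m["body"] for m in chunks))

    simpledb[pointer["name"]] = records                # PutAttributes stand-in
    s3[pointer["name"]] = s3[pointer["temp_key"]]      # COPY temp -> permanent
    queue[:] = [m for m in queue if m["txn"] != txn]   # DeleteMessage
    del s3[pointer["temp_key"]]                        # delete temporary copy
    return True


# Hand-built log for transaction "t1" (layout is an assumption).
payload = json.dumps([{"attr": "input", "value": "P"}])
half = len(payload) // 2
queue = [
    {"txn": "t1", "type": "chunk", "seq": 1, "body": payload[half:]},
    {"txn": "t1", "type": "chunk", "seq": 0, "body": payload[:half]},
    {"txn": "t1", "type": "pointer", "name": "B", "temp_key": "tmp/t1/B"},
]
s3 = {"tmp/t1/B": b"result"}
simpledb = {}

print(commit_phase(s3, simpledb, queue, "t1"))   # first run commits
print(commit_phase(s3, simpledb, queue, "t1"))   # replay finds nothing to do
```

In this sketch the first two steps overwrite rather than append, so a daemon that crashes before the messages are deleted can repeat them after restart and end in the same state.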
