
Provenance for the Cloud (USENIX Conference on File and Storage Technologies (FAST '10))

Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer, Harvard School of Engineering and Applied Sciences

Presentation Transcript


  1. Provenance for the Cloud (USENIX Conference on File and Storage Technologies (FAST '10)) Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo Seltzer, Harvard School of Engineering and Applied Sciences

  2. Outline • Introduction • Background • Provenance System Property • Architecture & Protocol • Evaluation • Conclusion & Comment

  3. Introduction • Problem to solve • Implement a provenance-aware storage system on current cloud stores (using Amazon's services)

  4. Background(1/3) • Provenance • Data has two critical components • What it is (contents) • Where it came from (ancestry) • Provenance is the description of how an object was derived • The metadata that describes the history of an object • Why use provenance? • Use case: Sloan Digital Sky Survey (SDSS) • Debugging experimental results • Detecting and avoiding faulty data propagation • Improving text search results • Security

  5. Background(2/3) • Provenance can be abstractly defined as a directed acyclic graph (DAG) • Nodes • Objects: files, processes, tuples, data sets, etc. • Have attributes • Command-line arguments • Name and version number • Edges • Indicate a dependency between objects
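
The DAG model above can be sketched in a few lines. This is a minimal illustration, not the PASS data structure: the class name `ProvenanceGraph`, the attribute keys, and the example objects (`sort`, `in.txt`, `out.txt`) are hypothetical. An edge `child -> parent` records that the child was derived from the parent, and adding an edge that would close a cycle is rejected so the graph stays acyclic.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance DAG: nodes carry attributes; child -> parent edges
    record that the child depends on (was derived from) the parent."""

    def __init__(self):
        self.attrs = {}                  # node id -> attribute dict
        self.parents = defaultdict(set)  # node id -> ids it depends on

    def add_node(self, node_id, **attributes):
        self.attrs[node_id] = attributes

    def add_edge(self, child, parent):
        # Refuse edges that would make the graph cyclic.
        if child == parent or child in self.ancestors(parent):
            raise ValueError("edge would create a cycle")
        self.parents[child].add(parent)

    def ancestors(self, node_id):
        """All objects that node_id transitively depends on."""
        seen, stack = set(), list(self.parents[node_id])
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(self.parents[n])
        return seen


g = ProvenanceGraph()
g.add_node("in.txt", type="file", version=1)
g.add_node("sort", type="process", argv="sort -n in.txt", version=1)
g.add_node("out.txt", type="file", version=1)
g.add_edge("sort", "in.txt")   # the process read in.txt
g.add_edge("out.txt", "sort")  # out.txt was written by the process
print(sorted(g.ancestors("out.txt")))  # ['in.txt', 'sort']
```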

  6. [Figure: example provenance graph of an organ-donation workflow, nodes I1 to I9 (Patient Brain Death Notification, Donor Data Request, Donor Data, Data Collection Request, Blood Test Request, Blood Test Result, Blood Test Request Justification Report, Decision Request, Donation Decision) connected by edges labeled "is based on", "is caused by", "is response to", and "is justified by"]

  7. Background(3/3) • Eventual Consistency • A weaker form of data consistency • If no new updates are made for a sufficiently long period of time, all replicas in the system can be expected to become consistent
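
A toy model makes the definition concrete. The sketch below is an assumption-laden illustration, not how S3 replicates: a write lands on replica 0 and reaches the other replicas only when a hypothetical `sync()` step runs, standing in for background replication. Between the write and the sync, a reader of another replica sees stale state, which is why data and provenance stored as separate objects can be observed out of step.

```python
class EventuallyConsistentStore:
    """Toy model: writes hit replica 0 immediately and are propagated to
    the other replicas only when sync() runs (simulated replication)."""

    def __init__(self, n_replicas=3):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.pending = []  # (replica index, key, value) not yet applied

    def put(self, key, value):
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def get(self, key, replica):
        return self.replicas[replica].get(key)

    def sync(self):
        """Apply pending updates in order; with no new writes, all
        replicas end up holding the same values."""
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("report.txt", "v2")
print(store.get("report.txt", replica=2))  # None (stale read)
store.sync()
print(store.get("report.txt", replica=2))  # v2
```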

  8. Provenance System Property(1/2) • Provenance-Data Coupling • An object and its provenance must match • The provenance must accurately and completely describe the data • Multi-object Causal Ordering • Preserves the causal relationships among objects • A system must ensure that an object's ancestors and their provenance are persistent before making the object itself persistent
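
Multi-object causal ordering amounts to persisting objects in a topological order of the provenance graph. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the helper names `persist_order` and `violates_causal_order` and the `parents` mapping are hypothetical, chosen only to illustrate the property, not taken from the paper's implementation.

```python
from graphlib import TopologicalSorter


def persist_order(parents):
    """Order objects so every object's ancestors (and their provenance)
    are persisted before the object itself.

    parents maps each object to the set of objects it was derived from,
    which is exactly the predecessor map TopologicalSorter expects."""
    return list(TopologicalSorter(parents).static_order())


def violates_causal_order(order, parents):
    """True if some object is persisted before one of its ancestors."""
    position = {obj: i for i, obj in enumerate(order)}
    return any(position[p] > position[obj]
               for obj, ps in parents.items() for p in ps)


parents = {"out.txt": {"sort"}, "sort": {"in.txt"}, "in.txt": set()}
order = persist_order(parents)
print(order)  # ['in.txt', 'sort', 'out.txt']
```

A cyclic `parents` map makes `static_order()` raise `graphlib.CycleError`, which matches the requirement that provenance form a DAG.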

  9. [Figure: the organ-donation provenance graph from slide 6, shown again]

  10. Provenance System Property(2/2) • Data-Independent Persistence • A system must retain an object's provenance even after the object is removed • Efficient Query • Provenance must be accessible to users who want to query or verify provenance properties of their data
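
Data-independent persistence follows naturally if provenance lives in its own index rather than alongside the data. A minimal sketch, assuming a hypothetical in-memory `ProvenanceStore` with one list of provenance records per object name: deleting an object removes its contents but appends a deletion record to its history instead of discarding it.

```python
class ProvenanceStore:
    """Keeps provenance separate from object data so that deleting an
    object does not delete its history."""

    def __init__(self):
        self.data = {}        # object name -> contents
        self.provenance = {}  # object name -> list of provenance records

    def write(self, name, contents, record):
        self.data[name] = contents
        self.provenance.setdefault(name, []).append(record)

    def delete(self, name):
        self.data.pop(name, None)
        # The history survives; the deletion itself is recorded.
        self.provenance.setdefault(name, []).append({"event": "deleted"})

    def history(self, name):
        return self.provenance.get(name, [])
```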

  11. Architecture(1)

  12. Architecture(2) – S3 • Simple Storage Service (S3) • Amazon's storage service • An object store where objects can range in size from 1 byte to 5 GB • With each object, clients can store up to 2 KB of metadata • Accessed through SOAP or REST APIs • PUT, GET, HEAD, COPY, DELETE
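
To make the S3 constraints concrete, here is an in-memory stand-in exposing the five operations listed above. It is not the AWS API: `ToyObjectStore` is hypothetical, the size limits are the ones quoted on the slide (circa 2010), and metadata size is approximated as the UTF-8 length of all keys plus values.

```python
MAX_OBJECT_BYTES = 5 * 1024 ** 3   # 5 GB, as quoted on the slide
MAX_METADATA_BYTES = 2 * 1024      # 2 KB of metadata per object


class ToyObjectStore:
    """In-memory stand-in for an S3 bucket with PUT/GET/HEAD/COPY/DELETE."""

    def __init__(self):
        self.objects = {}  # key -> (body bytes, metadata dict)

    def put(self, key, body, metadata=None):
        metadata = metadata or {}
        meta_size = sum(len(k.encode()) + len(v.encode())
                        for k, v in metadata.items())
        if not 1 <= len(body) <= MAX_OBJECT_BYTES:
            raise ValueError("object size out of range")
        if meta_size > MAX_METADATA_BYTES:
            raise ValueError("metadata exceeds 2 KB")
        self.objects[key] = (body, dict(metadata))

    def get(self, key):
        return self.objects[key][0]

    def head(self, key):
        """Return only the metadata, as HEAD does."""
        return dict(self.objects[key][1])

    def copy(self, src, dst):
        body, meta = self.objects[src]
        self.objects[dst] = (body, dict(meta))

    def delete(self, key):
        self.objects.pop(key, None)
```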

  13. Architecture(3) - SimpleDB • SimpleDB • An Amazon service that provides indexing and querying of data • The data model consists of items described by <attribute, value> pairs • Each item can have up to 256 <attribute, value> pairs • Each attribute name and value can be up to 1 KB
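
The SimpleDB limits above determine whether a provenance record fits in one item. A small checker, assuming an item is represented as a list of `(attribute, value)` string pairs (repeated attribute names allowed); the function name `validate_item` is hypothetical and the limits are the ones quoted on the slide.

```python
MAX_PAIRS_PER_ITEM = 256
MAX_FIELD_BYTES = 1024  # each attribute name and value: up to 1 KB


def validate_item(pairs):
    """Check (attribute, value) pairs against the quoted SimpleDB limits.
    Returns a list of problems; an empty list means the item fits."""
    problems = []
    if len(pairs) > MAX_PAIRS_PER_ITEM:
        problems.append(f"{len(pairs)} pairs exceeds {MAX_PAIRS_PER_ITEM}")
    for name, value in pairs:
        for label, field in (("name", name), ("value", value)):
            if len(field.encode("utf-8")) > MAX_FIELD_BYTES:
                problems.append(f"attribute {label} too long: {field[:20]}...")
    return problems
```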

  14. Architecture(4) - SQS • Simple Queue Service • A distributed messaging system that lets users exchange messages between the distributed components of their systems • Messages are limited to 8 KB • In this paper, SQS is used as a write-ahead log (WAL)

  15. Architecture(5) -- PASS • Provenance-Aware Storage System • A storage system that automatically collects, stores, manages, and provides search for provenance • Monitors system calls • Generates provenance and sends both provenance and data to PA-S3fs

  16. Architecture(6) – PA-S3fs • Provenance-Aware S3 File System • Caches data and provenance on the client to reduce traffic to S3 • Sends data and provenance to the cloud

  17. Protocol(1)

  18. Protocol(2) • Protocol 1 (P1) • Standalone cloud store • Maps each file to an S3 object and stores the provenance as a separate S3 object • Provenance object • Named with a UUID • Contains the name of the primary object • Primary object metadata • Version number and UUID

  19. Protocol(3) • P1 does not support data coupling • But it can detect decoupling • Queries are inefficient • All provenance must be retrieved • [Figure: Client → S3: PUT provenance, OK; Client → S3: PUT data, OK]
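
The P1 layout and its decoupling check can be sketched against a plain dictionary standing in for S3 (key to `(body, metadata)`). This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, and storing the version number inside the provenance object as well is an added assumption that lets a reader compare it with the version in the data object's metadata. Provenance is written first, then the data, matching the sequence in the figure.

```python
import json
import uuid


def p1_write(store, name, data, provenance, version):
    """P1 sketch: provenance goes in its own object named by a fresh UUID
    and records the primary object's name; the primary object's metadata
    records the version number and that UUID."""
    prov_key = str(uuid.uuid4())
    # Assumption: the version is also stored in the provenance object.
    prov_doc = {"object": name, "version": version, "records": provenance}
    store[prov_key] = (json.dumps(prov_doc).encode(), {})       # PUT provenance
    store[name] = (data, {"version": str(version), "uuid": prov_key})  # PUT data
    return prov_key


def p1_is_coupled(store, name):
    """Detect decoupling: the data's metadata must point at a visible
    provenance object that names this object and the same version."""
    _data, meta = store[name]
    if meta.get("uuid") not in store:
        return False  # provenance missing or not yet visible
    prov = json.loads(store[meta["uuid"]][0])
    return prov["object"] == name and str(prov["version"]) == meta["version"]
```

P1 can only detect the mismatch after the fact; it cannot prevent a reader from seeing data whose provenance is not yet visible.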

  20. Protocol(4)

  21. Protocol(5) • Protocol 2 (P2) • Cloud store with a cloud database • Stores provenance as one SimpleDB item • If a provenance value is larger than the 1 KB SimpleDB limit • Stores it as an S3 object • Saves a pointer to it in the attribute value
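
The P2 overflow rule can be sketched as follows, with a dictionary standing in for S3. The item-naming scheme (`name_version`), the S3 key prefix, and the `s3:` pointer format are all hypothetical; a real design would also need to distinguish pointers from ordinary values that happen to start with the same prefix.

```python
import uuid

SIMPLEDB_VALUE_LIMIT = 1024  # bytes per attribute value, per the slide


def p2_provenance_item(name, version, record, s3):
    """P2 sketch: turn one object's provenance into a single SimpleDB-style
    item. Values over 1 KB are PUT to S3 and replaced by a pointer."""
    item_name = f"{name}_{version}"  # hypothetical item-naming scheme
    pairs = []
    for attr, value in record:
        if len(value.encode("utf-8")) > SIMPLEDB_VALUE_LIMIT:
            key = "prov/" + str(uuid.uuid4())   # hypothetical S3 key
            s3[key] = value.encode("utf-8")     # PUT the large value to S3
            pairs.append((attr, "s3:" + key))   # item keeps only a pointer
        else:
            pairs.append((attr, value))
    return item_name, pairs
```

The resulting pairs are what a client would then send in a single `BatchPutAttributes` call, followed by the data PUT.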

  22. Protocol(6) • Provides efficient provenance queries • Does not support data coupling • [Figure: Client → S3: PUT provenance > 1 KB, OK; Client → SimpleDB: BatchPutAttributes (provenance), OK; Client → S3: PUT data, OK]

  23. Protocol(7) • Protocol 3 (P3) • Cloud store with a cloud database and a messaging service • Uses SQS as a write-ahead log (WAL) • 8 KB message limit • Stores large objects as temporary S3 objects and records pointers to them in the WAL • Commit daemon • Reads the log records • Assembles all the records belonging to a transaction • Ignores the records if the client crashed

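  24. [Figure: P3 message sequence among Client, S3, SQS, the commit daemon (Commitd), and SimpleDB: the client PUTs a temporary copy of the data to S3 and sends the provenance to SQS (SendMessage); Commitd receives the messages (ReceiveMessage), PUTs provenance larger than 1 KB to S3, writes the provenance to SimpleDB (BatchPutAttributes), COPYs the data to its final S3 object, and then deletes the SQS messages and the temporary object]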
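
The WAL and commit-daemon flow can be sketched with dictionaries for S3 and SimpleDB and a list of JSON strings for the queue. This is a simplified, hypothetical model: the message fields, the `tmp/` key prefix, and the function names are invented for illustration; the daemon sees the whole queue at once (real SQS queues deliver at least once with best-effort ordering, so grouping by transaction ID matters); and P2's 1 KB overflow handling on the SimpleDB side is omitted.

```python
import json
import uuid

SQS_MESSAGE_LIMIT = 8 * 1024  # bytes, per the slide


def p3_client_write(s3, queue, name, data, prov_records):
    """Client side: stage data in S3, then log the transaction in the WAL."""
    txn = str(uuid.uuid4())
    temp_key = "tmp/" + txn                # hypothetical temp-object name
    s3[temp_key] = data                    # PUT temporary data copy
    for rec in prov_records:
        msg = json.dumps({"txn": txn, "kind": "prov", "rec": rec})
        if len(msg.encode()) > SQS_MESSAGE_LIMIT:
            big_key = "tmp/" + str(uuid.uuid4())
            s3[big_key] = json.dumps(rec).encode()  # large record goes to S3
            msg = json.dumps({"txn": txn, "kind": "prov_ptr", "key": big_key})
        queue.append(msg)                  # SendMessage
    queue.append(json.dumps({"txn": txn, "kind": "commit",
                             "name": name, "temp": temp_key}))
    return txn


def commit_daemon(s3, simpledb, queue):
    """Apply every committed transaction; skip those with no commit record."""
    txns = {}
    for msg in queue:                      # ReceiveMessage
        m = json.loads(msg)
        txns.setdefault(m["txn"], []).append(m)
    remaining = []
    for msgs in txns.values():
        commit = next((m for m in msgs if m["kind"] == "commit"), None)
        if commit is None:                 # client crashed mid-transaction
            remaining.extend(json.dumps(m) for m in msgs)
            continue
        recs = []
        for m in msgs:
            if m["kind"] == "prov":
                recs.append(m["rec"])
            elif m["kind"] == "prov_ptr":
                recs.append(json.loads(s3.pop(m["key"])))
        simpledb[commit["name"]] = recs    # store provenance (keyed by name only)
        s3[commit["name"]] = s3.pop(commit["temp"])  # copy to final, delete temp
    queue[:] = remaining                   # DeleteMessage for applied txns
```

Because the data reaches its final name only after its provenance is stored, a crash at any point leaves either both or neither visible, which is how P3 achieves data coupling.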

  25. Protocol(9)

  26. Evaluation(1) • Workload • CVSROOT nightly backup • I/O intensive • 240 operations • Blast • Mix of compute and I/O operations • Provenance tree has a depth of 5 • 10773 operations • Challenge • Mix of compute and I/O operations • Provenance tree has a depth of 11 • 6179 operations

  27. Evaluation(2) [Figure: panels labeled "EC2 instance" and "Local machine"]

  28. Evaluation(3) • Query performance • Q1 • Retrieve all the provenance ever recorded • Q2 • Retrieve the provenance of all versions of one object • Q3 • Find all files that were directly output by Blast • Q4 • Find all the descendants of files derived from Blast
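
The four benchmark queries can be expressed over a flat provenance table. The sketch assumes a hypothetical schema of `(object, version, attribute, value)` rows, roughly one per SimpleDB-style pair, where an `input` attribute names an object's parent; the object names are illustrative. Q4 is a breadth-first walk over child links, and here it returns the descendants of Q3's outputs, excluding those outputs themselves.

```python
from collections import defaultdict, deque


def q1_all(rows):
    """Q1: all provenance ever recorded."""
    return list(rows)


def q2_versions(rows, name):
    """Q2: provenance of every version of one object."""
    return [r for r in rows if r[0] == name]


def q3_direct_outputs(rows, program):
    """Q3: objects with an 'input' edge pointing directly at the program."""
    return {r[0] for r in rows if r[2] == "input" and r[3] == program}


def q4_descendants(rows, program):
    """Q4: everything transitively derived from the program's outputs."""
    children = defaultdict(set)
    for obj, _version, attr, value in rows:
        if attr == "input":
            children[value].add(obj)
    start = q3_direct_outputs(rows, program)
    seen, queue = set(start), deque(start)
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - start


rows = [
    ("blast", 1, "type", "process"),
    ("hits.txt", 1, "input", "blast"),
    ("hits.txt", 2, "input", "blast"),
    ("filter", 1, "input", "hits.txt"),
    ("top.txt", 1, "input", "filter"),
]
print(sorted(q4_descendants(rows, "blast")))  # ['filter', 'top.txt']
```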

  29. Evaluation(4)

  30. Conclusion • Definition of properties that provenance systems must exhibit • Design and implementation of three protocols for storing provenance and data on the cloud • All three protocols have reasonable overhead in time and minimal financial overhead

  31. Comment • Economy • Provenance cannot increase profit directly • Customer loyalty • Security • Provenance can ensure the correctness of files • But it may contain sensitive information

  32. THE END
