racs a case for cloud storage diversity n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
RACS: A Case for Cloud Storage Diversity PowerPoint Presentation
Download Presentation
RACS: A Case for Cloud Storage Diversity

Loading in 2 Seconds...

play fullscreen
1 / 33

RACS: A Case for Cloud Storage Diversity - PowerPoint PPT Presentation


  • 115 Views
  • Uploaded on

RACS: A Case for Cloud Storage Diversity. Paper by Hussam Abu-Libdeh Cornell University Lonnie Princehouse Cornell University Hakim Weatherspoon Cornell University ( 1st ACM Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10-11, 2010. ACM 2010 ) . Presented by

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'RACS: A Case for Cloud Storage Diversity' - adah


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
racs a case for cloud storage diversity

RACS: A Case for Cloud Storage Diversity

Paper by

Hussam Abu-Libdeh Cornell University

Lonnie Princehouse Cornell University

Hakim Weatherspoon Cornell University

(1st ACM Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10-11, 2010. ACM 2010)

Presented by

Chayutra Pailom CS Department, UTSA

what is this topic about

What is this topic about?

  • Motivation: Diversify cloud storage providers
  • Key issues: The risk on being service on solely particular cloud storage provider
  • Solutions: Strip data across vendors
  • Comparisons: Overhead expense and vendor mobility
  • Possible improvement: Avoid vendor lock-in, reduce the cost of switching providers and better tolerate provider outages or failures.
  • Number of References: 39 references
agenda
Agenda

Terminology

Background

Concepts

Evaluation

Conclusion

related terminology

Related Terminology

  • RAID (Redundant Array of Independent Disks)

RAID 0 (Striped) Data is spread across all drive volumes. Drive volumes must be the same size. If one drive fails, all data is lost

RAID 5 Data is spread across all drives, as is error-correcting parity information. If one drive fails, data can be reconstructed from the remaining drives.

RAID 1 (Mirrored) Data is copied to two drives simultaneously. Mirrored drive volumes must be the same size. If one drive fails, data can still be accessed from the mirrored drive.

Picture: http://www.microcenter.com/random_access/newsletters/06_newsletters/0306/in_the_lab.html

slide5

Related Terminology

  • RAID 5 (Striping With Parity)

http://www.vectortech.co.th/technic_002.php

http://www.dataclinic.com/hard-disk-raid-recovery.htm

related terminology1

Related Terminology

  • RAID (Redundant Array of Independent Disks)

Take 1 minute to take this visualization for better understanding

http://www.acnc.com/raidedu/0

slide7

Related Terminology

  • Vendor lock-in
    • A situation in which a cloud customer is “stuck” to his/her current cloud vendor due to the complexity in switching to another cloud vendor.
    • you could “unlock” your business from your current cloud vendor – but that not without “sacrifices,” in term of time, money and effort
  • RACS
    • Redundant Array of Cloud Storage
    • Use RAID 5 as a prototype
    • Disk = Cloud Storage

http://www.cloudbusinessreview.com/2011/05/03/how-to-avoid-cloud-vendor-lock-in.html

slide8

Related Terminology

  • Error Correction Code
    • Use Reed-Solomon error correction code.
    • Oversampling a polynomial constructed from the data.
    • The polynomial is evaluated at several points, and these values are sent or recorded.
    • Sampling the polynomial more often than is necessary makes the polynomial over-determined.
    • As long as it receives "many" of the points correctly, the receiver can recover the original polynomial even in the presence of a "few" bad points.
    • Used in many different kinds of commercial applications, for example in CDs, DVDs and Blu-ray Discs.

Credit: http://simple.wikipedia.org/wiki/Reed-Solomon_error_correction

slide9

Related Terminology

  • Erasure coding (human understandable version???)
    • The key idea behind a Reed-Solomon code is that the data encoded is first visualized as a polynomial.
    • The code relies on a theorem from algebra that states that any k distinct points uniquely determine a polynomial of degree at most k-1.
    • The sender determines a degree k − 1 polynomial, over a finite field, that represents the k data points. The polynomial is then "encoded" by its evaluation at various points, and these values are what is actually sent.
    • During transmission, some of these values may become corrupted. Therefore, more than k points are actually sent.
    • As long as sufficient values are received correctly, the receiver can deduce what the original polynomial was, and decode the original data.

Credit: http://simple.wikipedia.org/wiki/Reed-Solomon_error_correction

slide10

The authors mentioned that cloud storage is more reliable than hard drives so they leave the readers of this paper assumed less failure of the data and not mentioned that much!

agenda1
Agenda

Terminology

Background

Concepts

Evaluation

Conclusion

background

Background

  • The Cloud Storage Market
    • Cloud storage providers expose simple interfaces to developers. Amazon S3's data model provides at namespaces (buckets) into which named objects can be uploaded for later retrieval.
    • Other storage services can be mounted as network file systems. There is no widely agreed-upon standard interface, but S3's REST API has been adopted by smaller providers and by the open-source Eucalyptus server software.
background1

Background

  • The Cloud Storage Market
    • These interfaces differ, but are similar enough to be considered interchangeable. Storage providers are forced to compete on price rather than by offering unique services.
    • Cloud storage is a highly competitive market. These are simplified pricing schemes for the top two cloud storage providers; Amazon S3 and Rackspace.
introduction

Introduction

Pricing scheme of different cloud storage providers

background2

Background

  • Vendor lock-in
    • Expensive to switch.
    • Subject to possibility of data loss.
    • For business, it is very big issue; go cloud or not.
    • Ways to fix, e.g. Economical approach spread the data across multiples providers.
    • But high storage and bandwidth cost.
    • Use this paper’s approach!
background3

Background

  • Why should we diversify…
    • Outages and operational failures.
    • Economic failures.
    • By striping data across providers and adding appropriate redundancy, clients can tolerate outages and operational failures, as well as adapt to changes in the economic matters.
    • Avoid vendor lock-in!
background4

Background

  • Incentives for Redundant Striping
    • RACS uses Reed-Solomon error correcting codes to tolerate failures without data loss. Starting with m equal-size disks of original data, we fill k additional disks with redundant data. Any combination of m disks (data or redundant) is sufficient to reconstruct the original data. We write (m, n) to indicate that there are n = m + k total disks, any m of which are sufficient to reconstruct all original data

Credit: http://pubs.0xff.co/posters/racs.pdf

background5

Background

  • Incentives for Redundant Striping (cont.)
    • The choice of parameters m and n is a trade-off: Overhead for data storage and write operations is increased by the ratio n : m. Interestingly, read operations are not significantly more expensive, since only m disks must be read under normal operating condition
    • RAID-5 uses a similar strategy to tolerate up to one failure in an array of hard disks.
agenda2
Agenda

Terminology

Background

Concepts

Evaluation

Conclusion

redundant array of cloud storage

Redundant Array of Cloud Storage

RACS single-proxy architecture

redundant array of cloud storage1

Redundant Array of Cloud Storage

RACS single-proxy architecture

slide23

Redundant Array of Cloud Storage

  • The goal of RACS is slightly different than RAID-5. Cloud storage is assumed to be much more reliable than hard disks, so data loss prevention is a much less compelling reason to use error correcting codes.
  • RACS lowers the cost of switching providers, e.g., as a result of economic failure.
    • Only 1/m of all data needs to be moved to leave a vendor.
    • By reducing the impact of vendor lock-in, RACS increases the leverage of customers when negotiating contracts with cloud providers.
  • RACS is implemented as an HTTP proxy with the same interface as Amazon S3.
redundant array of cloud storage2

Redundant Array of Cloud Storage

Multiple RACS proxies coordinate their actions using ZooKeeper

agenda3
Agenda

Terminology

Background

Concepts

Evaluation

Conclusion

evaluation

Evaluation

  • Internet Archives Trace
    • Trace represents 18 months of activity on the Internet Archive’s FTP site.
    • What are the estimated costs associated with storing library type content such as the Library of Congress and the Internet Archive on the cloud?
    • What is the cost of changing storage providers for large organizations? What is the added cost of a DuraCloud-like (replicate data across multiples cloud providers)
    • What is the cost of avoiding vendor lock-in using RACS? And what is the added cost overhead of using RACS (depends on the coding configuration = additional data to be written)?
evaluation1

Evaluation

Inbound and outbound data transfers in the Internet Archive trace for Individual IA data transfers

Read/write requests in the Internet Archive trace for Individual IA read/write requests

evaluation2

Evaluation

  • Cost of Moving to The Cloud

Estimated monthly and cumulative costs of hosting on the cloud with different storage providers

evaluation3

Evaluation

  • Cost of Switching Vendors

The cost of switching the Internet Archive’s storage provider for various configurations

agenda4
Agenda

Terminology

Background

Concepts

Evaluation

Conclusion

conclusion

Conclusion

  • The commoditization of cloud services has brought with it the characteristics of an economy, good and bad.
  • RACS: A simple application of technology to change the structure of a market.
  • Erasure coding is applied to different types of failure (economic) than in the storage systems.
  • Microbenchmarks and larger experiments are simulated for the real-world traces.
  • RACS enables cloud storage customers to explore trade-offs between overhead and mobility.
sample references

Sample References

[16] J. Bloemer, M. Kalfane, M. Karpinski, R. Karp, M. Luby,

and D. Zuckerman. An XOR-based erasure-resilient coding

scheme. Technical Report TR-95-048, The International

Computer Science Institute, Berkeley, CA, 1995.

[30] D. Patterson, G. Gibson, and R. Katz. The case for RAID:

Redundant arrays of inexpensive disks. In Proc. of ACM

SIGMOD Conf., pages 106–113, May 1988.

[38] M. Vrable, S. Savage, and G. M. Voelker. Cumulus:

Filesystem backup to the cloud. Trans. Storage, 5(4):1–28,

2009.