
Joel Christner (jec2160) COMS-E6125 Spring 2010 Web Enhanced Information Management



Presentation Transcript


  1. An Examination of Cloud Storage Architectures for Scalable Internet and Cloud Computing Applications • Joel Christner (jec2160) • COMS-E6125 Spring 2010 Web Enhanced Information Management • Professor Kaiser

  2. Agenda • What are Cloud Computing and Cloud Storage? • Why Cloud Storage? • How Cloud Storage Works • Comparison of Cloud Storage Architectures • Summary

  3. What Are Cloud Computing and Cloud Storage?

  4. Terms and Definitions • Cloud computing: “General term for anything that involves delivering hosted services over the Internet. These services are broadly divided into three categories: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).” (Source: http://www.searchcloudcomputing.com) • This is the generally accepted “public cloud” definition • Cloud storage is a component of cloud computing, and is an elastic, on-demand, scalable platform for data storage and retrieval

  5. Architectures • Three forms of clouds (architectures): • Public cloud – infrastructure owned by an external entity, usage-based billing • Private cloud – infrastructure owned by the company itself, chargeback • Hybrid cloud – public cloud with isolated resources and secure connectivity (virtual data center extension)

  6. Attributes • Attributes of clouds: • Virtualized – abstract logical from physical resources to enable mobility (VM migration), abstract logical from physical access and integration points • Elasticity – virtualization enables elasticity (add or remove processors, memory, disk capacity) • Scalability – virtualization as abstraction via management and access middleware enables simplicity in dynamically adding or removing resources • Pay-as-you-Grow – built using commodity components (low cost) that can be added or removed as capacity needs change (virtualization and abstraction)

  7. Why Cloud Storage?

  8. Computing Evolution • Computing has shifted over the last three decades: centralized -> distributed -> centralized • Mainframes with dumb terminals • Distributed computing and the workgroup • Centralized and consolidated data centers

  9. Storage Evolution • Storage has followed this shift as well but remains the most costly element of any enterprise • Mainframe – costly, single-vendor, but completely monolithic yielding easiest data management and protection (single point) • Personal computers – cheap, modular, fully distributed, difficult to manage and protect data • Workgroup servers – cheap, modular, still distributed data, difficult to manage and protect • Centralized servers and storage networks – expensive, modular, simpler data management and protection

  10. Storage Network Fabrics • Organizations are still struggling to consolidate their data (data is still distributed), but the following storage network fabrics are in use today: • Storage area network (SAN) – block volume access over a shared network (Fibre Channel, Internet SCSI, Fibre Channel over Ethernet) • Network attached storage (NAS) – filesystem protocol access over a shared network (Common Internet File System, Network File System, both of which use IP and generally Ethernet) • Contrast with DAS (directly-attached storage)

  11. Storage Capital Cost Elements • Capital cost elements • Disk (in the workstation, in the server, in shared storage arrays) • Over-provisioned capacity (idle until used) • Storage array controllers (providing volume management and value-added capabilities over shared disk) • Storage network infrastructure (FC/Ethernet switches, HBAs/NICs, multipathing, failover software) • Data protection hardware and software (backup application, servers, tape libraries and automation, tapes) • Software licenses (snapshots, replication) • Vendors = high profit margin = high cost

  12. Storage Operational Cost Elements • Operational cost elements • Real estate (storage systems are large) • Facilities (power, space, cooling) • Failed component replacement (tapes, drives) • Off-site storage (Iron Mountain) • Volume provisioning, allocation, resizing, data migration, and ongoing management • People (salaries, benefits)

  13. Benefits of Cloud Storage? • Uses commodity components, eliminating the most costly elements of traditional storage capital costs • Virtualization and abstraction eliminate the most costly and time-consuming elements of traditional storage operational costs • Eliminates scalability issues associated with existing storage arrays (max drives, max capacity) • Public cloud storage enables pay-as-you-grow capacity • Private cloud storage enables chargeback • Hybrid cloud storage enables near public cloud storage cost with private cloud performance and security • Store virtually anything (user information, image files, documents, dynamic page structure, binaries, code files, anything) flexibly and at the lowest cost

  14. How Cloud Storage Works

  15. Cloud Storage System Components • Access software • Integrated or binary (emulating SCSI) • Access via HTTP RESTful APIs or SOAP • Control servers • Core of the system with databases (or NoSQL) • Hold consumer authentication credentials • Manage registration/removal of metadata/storage servers • Management interface • Metadata servers • Stateless, caching to scale control servers • Consumer authentication, session key management • Object location management • Read/write request routing amongst storage servers • Storage servers • Handle read/write requests (GET/PUT/POST/DELETE)

  16. Cloud Storage System Architecture • [Diagram. Components: Application Servers; Server Load Balancing; 1..n Metadata Servers (scale-out and stateless, IO optimized for metadata); N+N Control Servers (HA, no scale-out); 1..n Storage Servers (scale-out, capacity optimized for data storage). Arrow labels: HTTP RESTful API, SOAP API, Authentication, Session Keys, Locate Object, Read/Write Request Routing, Read/Write]
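
The request flow named on slides 15 and 16 (authenticate, obtain a session key, locate the object, route the read or write to a storage server) can be sketched in a few lines of Python. This is a minimal in-memory illustration and not the design of any particular product: the class names, the `request` method, and the hash-based placement rule are all assumptions made for the example, and a real system would sit behind a load balancer and speak HTTP.

```python
import hashlib
import secrets


class StorageServer:
    """Holds object bytes and serves PUT/GET/DELETE (hypothetical, in memory)."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]  # raises KeyError if the object is absent

    def delete(self, key):
        self.objects.pop(key, None)


class ControlServer:
    """Core of the system: credentials, server registry, object locations."""

    def __init__(self, credentials):
        self.credentials = dict(credentials)  # consumer -> secret
        self.storage_servers = {}             # name -> StorageServer
        self.locations = {}                   # object key -> server name

    def register_storage_server(self, name, server):
        self.storage_servers[name] = server

    def authenticate(self, consumer, secret):
        return consumer in self.credentials and self.credentials[consumer] == secret


class MetadataServer:
    """Front end: authenticates, issues session keys, routes reads/writes."""

    def __init__(self, control):
        self.control = control
        self.sessions = {}        # session key -> consumer
        self.location_cache = {}  # cached copies of control.locations entries

    def login(self, consumer, secret):
        if not self.control.authenticate(consumer, secret):
            raise PermissionError("unknown consumer or bad secret")
        session_key = secrets.token_hex(16)
        self.sessions[session_key] = consumer
        return session_key

    def _locate(self, key, create=False):
        if key not in self.location_cache:
            name = self.control.locations.get(key)
            if name is None:
                if not create:
                    raise KeyError(key)
                # Assumed placement rule: hash the key onto a registered server.
                names = sorted(self.control.storage_servers)
                digest = hashlib.sha256(key.encode()).digest()
                name = names[int.from_bytes(digest[:8], "big") % len(names)]
                self.control.locations[key] = name
            self.location_cache[key] = name
        return self.control.storage_servers[self.location_cache[key]]

    def request(self, session_key, method, key, data=None):
        if session_key not in self.sessions:
            raise PermissionError("invalid session key")
        if method == "PUT":
            self._locate(key, create=True).put(key, data)
        elif method == "GET":
            return self._locate(key).get(key)
        elif method == "DELETE":
            self._locate(key).delete(key)
            self.location_cache.pop(key, None)
            self.control.locations.pop(key, None)
        else:
            raise ValueError(f"unsupported method: {method}")
```

In this sketch the control server is consulted only on a cache miss, which is one way to read the statement that metadata servers cache "to scale control servers".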

  17. Why More Scalable? • Metadata servers scale the control servers through caching where appropriate • Brick-based approach to adding IO or storage capacity – simply add more metadata servers or storage servers • SLB provides load-balancing, scale, and HA for metadata servers • Metadata servers provide load-balancing for storage servers • Storage servers may have a replication policy for data high availability (copy objects across storage servers)
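
One way to picture a replication policy that copies objects across storage servers is a deterministic placement function. The sketch below uses rendezvous (highest-random-weight) hashing, which is a technique chosen for this illustration and is not named in the slides; the function name, server names, and default of two copies are likewise assumptions.

```python
import hashlib


def place_replicas(object_key, servers, copies=2):
    """Pick `copies` distinct storage servers for an object.

    Each server gets a pseudo-random score for this key; the highest
    scores win. Adding a server only moves the objects that the new
    server now wins, so most existing placements stay put.
    """
    if copies < 1 or copies > len(servers):
        raise ValueError("copies must be between 1 and the number of servers")

    def score(server):
        digest = hashlib.sha256(f"{server}:{object_key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return sorted(servers, key=score, reverse=True)[:copies]
```

Because the choice depends only on the key and the server list, any metadata server can compute the same placement without coordinating with its peers.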

  18. Not Infinitely Scalable, But… • Traditional enterprise storage systems have a host of scalability limitations: • Number of trays behind the controller • Number of disks behind the controller • Number of connected hosts • Number of network interfaces • Number of configurable volumes, snapshots • Cloud storage system scalability is limited by: • Number of IOPS for the control server and offload percentage via metadata servers • Control server database capacity for metadata, object location • Number of metadata servers behind an SLB • IOPS capacity per metadata server • Storage capacity per storage server • In general, cloud storage is considered multiple orders of magnitude more scalable than traditional enterprise storage
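
The first few cloud storage limits listed above can be combined into a rough back-of-envelope estimate. The model below is an assumption made for illustration and does not come from the slides: it treats the offload percentage as the fraction of requests that metadata servers answer from cache, so the control servers only see the remainder, and it caps the result at the aggregate IOPS of the metadata tier. All parameter names are hypothetical.

```python
def max_front_end_iops(control_iops, offload_fraction,
                       metadata_servers, iops_per_metadata_server):
    """Estimate sustainable request rate under a simple two-tier model.

    control_iops: requests/s the control servers can absorb
    offload_fraction: share of requests served from metadata-server caches
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    metadata_limit = metadata_servers * iops_per_metadata_server
    if offload_fraction == 1.0:
        # Nothing reaches the control servers; only the metadata tier limits.
        return float(metadata_limit)
    control_limit = control_iops / (1.0 - offload_fraction)
    return float(min(control_limit, metadata_limit))
```

Under this model, adding metadata servers helps only until the control-server term becomes the bottleneck, which matches the slide's point that the system is not infinitely scalable.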

  19. Raw $/GB Comparison • Traditional midrange enterprise storage (such as EMC’s Clariion) averages approximately $8/GB in capital costs alone • Scales to hundreds of TBs

  20. Comparable Capacity using Commodity Components • http://blog.backblaze.com/2009/09/01/petabytes-on-a-budget-how-to-build-cheap-cloud-storage/ • $8K – 67TB, or roughly $0.12/GB (about 1/67 the cost) • Need more capacity? Add more bricks!
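
The raw $/GB comparison can be checked directly from the figures on slides 19 and 20. The short calculation below assumes decimal units (1 TB = 1,000 GB) and capital cost only; the function and variable names are made up for the example.

```python
def dollars_per_gb(cost_dollars, capacity_tb):
    """Capital cost per GB, using 1 TB = 1,000 GB."""
    return cost_dollars / (capacity_tb * 1000)


# Figures from the slides: about $8K for a 67 TB commodity brick,
# versus roughly $8/GB for midrange enterprise storage.
commodity = dollars_per_gb(8_000, 67)
enterprise = 8.00
ratio = enterprise / commodity

print(f"commodity: ${commodity:.3f}/GB, enterprise is about {ratio:.0f}x more")
```

Dividing $8,000 by 67,000 GB gives about $0.12/GB, so on these inputs the commodity brick comes out well under a tenth of the enterprise figure.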

  21. Challenges with Cloud Storage • Access methods • Enterprise applications expect SCSI access to underlying disk infrastructure and overlay block devices with their own file systems • Cloud storage systems expose capacity via programmatic APIs (RESTful, SOAP), requiring translation • Not an issue for home-grown applications • Security • Cloud storage systems, particularly in public clouds, do not encrypt data • Even if the cloud provider encrypts data, it remains vulnerable due to chain of custody when the cloud provider owns the key material • Others • Performance of cloud storage systems lags raw block device access, particularly in public and virtual private cloud scenarios, due to WAN bandwidth, latency, and packet loss • Cloud storage systems use replication for high availability but provide no snapshots for enterprise backup systems
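
The translation mentioned under access methods can be illustrated with a toy adapter that maps fixed-size block reads and writes onto object keys. Everything here is hypothetical: the class name, the key layout, and the use of a plain dict in place of an HTTP object store are assumptions for the sketch, and a real translation appliance would also need caching, write ordering, and error handling.

```python
class BlockToObjectTranslator:
    """Present a block-device-like interface on top of an object store."""

    def __init__(self, store, volume, block_size=4096):
        self.store = store            # dict standing in for the object API
        self.volume = volume          # volume name used as the key prefix
        self.block_size = block_size

    def _key(self, lba):
        if lba < 0:
            raise ValueError("logical block address must be non-negative")
        return f"{self.volume}/{lba:012d}"

    def write_block(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.store[self._key(lba)] = bytes(data)  # would be an HTTP PUT

    def read_block(self, lba):
        # Unwritten blocks read back as zeros, as on a fresh volume.
        return self.store.get(self._key(lba), bytes(self.block_size))  # HTTP GET
```

Each block access becomes one object request in this sketch, which also shows why WAN latency weighs so heavily on block-oriented applications.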

  22. Summary

  23. Summary • Cloud storage architectures decrease the capital and operational expenses of today’s enterprise and Internet businesses • Cloud storage eliminates the majority of complexity and limitations associated with traditional storage (capacity limits, data migration, volume management) • Cloud storage virtually eliminates the system-level scalability limitations associated with traditional storage • Cloud storage has a series of challenges that limit its applicability in existing application environments, but remains a good fit in homegrown application environments • Innovation in the cloud storage space will improve usability (translation appliances and software), security, and performance

  24. TCP FIN
