
The Pros & Cons of Content Addressed Storage


Presentation Transcript


  1. The Pros & Cons of Content Addressed Storage
     Arun Taneja, Founder & Consulting Analyst

  2. Something Must Be Done!
     Current Data Protection Environment
     • Data Tsunami
     • No Backup Windows
     • Cost of Downtime Increasing
     • Regulations and Compliance Requirements
     • Data Protection Technology at Break Point

  3. Many New Technologies to the Rescue
     FCIP, iSCSI, NDMP, DAFS, iFCP, RDMA, CAS, GRID, TOE, RAIN, SATA, SMI-S, SAS

  4. What is CAS?
     Definition
     • Concept whereby the address of an object is computed from the content of that object
     Advantages
     • Location Independence
     • Authenticity
     • Simplified Indexing
     • Scalability to Exabytes
     • Load Balancing
     • Elimination of Duplication
     Disadvantages
     • New and Unfamiliar
     • May Require Changes to Applications
     • May Require Procedural Changes
     • May Require Abandoning Existing Applications

  5. CAS vs Networked Storage
     • SAN & NAS Use File Systems to Place and Locate Data (/abc/xyz/acme.doc)
     • Hierarchical
     • Difficult to Scale Beyond TBs
     • Application Determines if Duplication of Object Exists
     • Indexing can Become Complicated

  6. How is CAS Done?
     • Object (File, FS, Dir): File, Portion of a file, Directory or file system
     • Algorithm Applied to the Object’s Content
     • Unique 128-bit Coding Results (160 bits for Avamar)
     • 128-bit hash unique to that object (e.g. MD5)
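
As a minimal sketch of the idea on this slide (not any vendor's implementation), the Python below derives an object's address from its bytes using MD5, the 128-bit example the slide names. The `ContentStore` class and its method names are hypothetical, chosen only for illustration.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._objects = {}  # content address (hex string) -> object bytes

    @staticmethod
    def address_of(data: bytes) -> str:
        # MD5 yields a 128-bit digest, matching the slide's example.
        return hashlib.md5(data).hexdigest()

    def put(self, data: bytes) -> str:
        ca = self.address_of(data)
        # Identical content maps to the same address, so it is kept once.
        self._objects.setdefault(ca, data)
        return ca

    def get(self, ca: str) -> bytes:
        data = self._objects[ca]
        # Authenticity check: recompute the address from the stored bytes.
        if self.address_of(data) != ca:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self):
        return len(self._objects)


store = ContentStore()
a = store.put(b"acme quarterly report")
b = store.put(b"acme quarterly report")  # duplicate content
print(a == b, len(store))
```

Storing the same bytes twice returns the same address and leaves one object in the store, which is how the location independence, authenticity, and duplicate elimination listed earlier fall out of a single mechanism.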

  7. What Can CAS Be Used For?
     • Archival Storage
     • Backup and Restore
     • Disaster Recovery
     • Content Management

  8. Issues with Existing Architectures
     Backup and Restore/DR
     • Application Performance
     • Generates Tons of Data 10:1
     • Backup Windows
     • No Guarantee if Data is Recoverable
     • DR Expensive
     • DR: Potential Consistency Issues
     Archive/Content Mgmt
     • Lack of Authenticity
     • Media/Technology Changes
     • Tape Environmental Issues
     • Poor Access Times
     • TCO Expensive
     • Slow Queries from Large Reps
     • Centralized Indexing

  9. Methods for Keeping More Data Online
     • Bigger Primary Storage
     • Compression of Data
     • Hierarchical Storage Architectures
     • Data Normalization: Finding Subsets of Data That are Common and Storing Them Only Once
     • No Limit on the Effective Compression Ratio
     • Indexing Systems Super Critical

  10. Commonality Factoring Using CAS
      • Fixed Size Atomics for Database
      • Variable Size Atomics for File Systems
      • CAS Algorithms Used to Calculate CA for Each Subset
      • Data Structures Needed to Reconstruct from Atomics
      • Above Data Kept with Atomics Data
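
The bullets above can be sketched roughly as follows, assuming fixed-size atomics and MD5 as the content address. The names `factor`, `reconstruct`, and `ATOMIC_SIZE` are hypothetical, and the 4-byte atomic size is far smaller than anything practical so the behavior is easy to follow. Variable-size atomics are not shown.

```python
import hashlib

ATOMIC_SIZE = 4  # bytes; deliberately tiny for illustration


def factor(data: bytes, atomics: dict) -> list:
    """Split data into fixed-size atomics, keep each once, return the recipe."""
    recipe = []
    for i in range(0, len(data), ATOMIC_SIZE):
        chunk = data[i:i + ATOMIC_SIZE]
        ca = hashlib.md5(chunk).hexdigest()  # content address of this atomic
        atomics.setdefault(ca, chunk)        # a repeated atomic is not stored again
        recipe.append(ca)                    # ordered addresses needed to rebuild
    return recipe


def reconstruct(recipe: list, atomics: dict) -> bytes:
    """Rebuild the original data by looking up each address in order."""
    return b"".join(atomics[ca] for ca in recipe)


atomics = {}
original = b"AAAABBBBAAAABBBBCCCC"
recipe = factor(original, atomics)
assert reconstruct(recipe, atomics) == original
print(len(recipe), len(atomics))
```

Here the recipe holds five addresses but only three distinct atomics are stored. The recipe is a plain Python list in this sketch; the slide indicates that such reconstruction structures are kept alongside the atomics themselves.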

  11. CAS Example: Avamar
      • CAS Applied to BU/Restore, Archive and DR (initial focus BU/R)
      • Focus on Data Reduction
      • Typical Secondary to Primary Ratio is 10:1
      • Avamar Claims 1.2 to 1
      • Never Do Full + Incremental Backups, Only SnapUps
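
To make the two ratios concrete, here is a small worked calculation. It reads "secondary to primary ratio" as secondary capacity needed per unit of primary data, and the 10 TB primary figure is an assumption for illustration, not a number from the slides.

```python
# Assumed figure, not from the slides: 10 TB of primary data.
primary_tb = 10

typical_ratio = 10.0  # slide: typical secondary-to-primary ratio of 10:1
claimed_ratio = 1.2   # slide: Avamar's claimed ratio of 1.2 to 1

typical_secondary_tb = primary_tb * typical_ratio
claimed_secondary_tb = primary_tb * claimed_ratio

print(f"typical: {typical_secondary_tb:.1f} TB, claimed: {claimed_secondary_tb:.1f} TB")
```

Under that assumption, the secondary footprint would be 100 TB at the typical ratio versus 12 TB at the claimed one.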

  12. CAS Example: Avamar Systems Architecture
      • Distributed Backup Repository
      • Peer-to-Peer RAIN Architecture
      • Each Node has Uniform and Consistent View of Repository
      • Clients can Request Services from any Node
      • Data Striped Across Nodes (similar to RAID)
      • No Single Point of Failure
      • Requires Agent on Each Client System

  13. CAS Archival Example: EMC Centera
      [Diagram, source: EMC. Labels: Application; API; Calculate CA and extract metadata; store file (Blob); CDF (XML) holding CA and metadata; C-Clip; store CDF; Centera; CA of CDF Returned]
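
The flow in this diagram can be approximated with the sketch below. It is not the EMC Centera API: the function names, the XML element names, and the use of MD5 are all assumptions made for illustration. The file content (blob) is stored under its own content address, a small XML descriptor (CDF) records that address plus metadata, and the address of the descriptor is what the application gets back.

```python
import hashlib
import xml.etree.ElementTree as ET

blobs = {}  # content address -> file content
cdfs = {}   # content address -> CDF (XML bytes)


def ca(data: bytes) -> str:
    """Content address of a byte string (MD5 assumed for this sketch)."""
    return hashlib.md5(data).hexdigest()


def store_file(content: bytes, metadata: dict) -> str:
    # 1. Calculate the CA of the blob and store the blob once.
    blob_ca = ca(content)
    blobs.setdefault(blob_ca, content)
    # 2. Build an XML descriptor holding the blob's CA and the metadata.
    cdf = ET.Element("cdf")
    ET.SubElement(cdf, "blob", {"ca": blob_ca})
    for key, value in sorted(metadata.items()):
        ET.SubElement(cdf, "meta", {"name": key, "value": value})
    cdf_bytes = ET.tostring(cdf)
    # 3. Store the descriptor under its own CA and return that CA.
    cdf_ca = ca(cdf_bytes)
    cdfs[cdf_ca] = cdf_bytes
    return cdf_ca


def read_file(cdf_ca: str) -> bytes:
    """Follow the descriptor's CA to the blob it points at."""
    cdf = ET.fromstring(cdfs[cdf_ca])
    return blobs[cdf.find("blob").get("ca")]


h1 = store_file(b"scanned invoice bytes", {"owner": "hypothetical-a"})
h2 = store_file(b"scanned invoice bytes", {"owner": "hypothetical-b"})
assert read_file(h1) == read_file(h2) == b"scanned invoice bytes"
print(len(blobs), len(cdfs))
```

Two descriptors with different metadata end up pointing at a single stored blob, so the same file archived twice occupies blob storage once.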

  14. CAS Advantages: EMC Centera
      Due to CAS
      • No LUNs to Create or Manage
      • No Volumes to Create or Manage
      • Flat Addressing, Simple Indexing
      • Content Authentication
      • One Copy of Blob Stored
      Due to Architecture
      • RAIN = Non-disruptive Scalability
      • No Reconfigs Required
      • No Technology Obsolescence
      • Policy-based Storage of Blobs
      • Application Modification

  15. CAS Players
      • Data Center Technologies
      • Persist Technologies

  16. CAS Futures: What's Needed?
      • Flexible Scaling Capabilities
      • Integration with File Interfaces
      • Easy API-free Application Integration
      • Integrated Indexing

  17. Summary
      CAS +’s
      • Location Independence
      • Authenticity
      • Eliminate Redundancy
      • Simplify Indexing
      • Simplify Management
      • Improve Scalability
      • Single System Image of Repository
      CAS -’s
      • Many Aspects are Untested
      • May Require New Procedures/Tools
      • Disruptive Technology
      • Not Good Enough for High Performance Primary Needs

  18. No Wholesale Changes!
      Taneja Group Recommendations
      • Absolutely Test Out CAS Systems but…
      • Apply to a Project at a Time (consider the disruptive factor)
      • Keep a Fallback Position (run systems in parallel)
      • Test Out Recoverability Regularly
      • Keep in Mind… More Solutions Coming
