
Diligent Technologies An IBM Company

Diligent Technologies An IBM Company. Mukesh Singh & Ron Herrmann singhm@us.ibm.com 973-271-4284 Diligent Technologies - an IBM Company. The Enterprise De-duplication Company. “Six Storage Companies to Watch” (July 2006). Top 10 Hot Storage Startup! (March 2005).


Presentation Transcript


  1. Diligent Technologies An IBM Company Mukesh Singh & Ron Herrmann singhm@us.ibm.com 973-271-4284 Diligent Technologies - an IBM Company

  2. The Enterprise De-duplication Company • “Six Storage Companies to Watch” (July 2006) • Top 10 Hot Storage Startup! (March 2005) • “…ProtecTIER represents a breakthrough that will enable enterprise customers to fundamentally alter their use of disk for data protection and archiving.” Curtis Preston, GlassHouse Technologies – July 2006

  3. Agenda • Vision • Data Center Data Protection Market—Key Challenges • Changing the game: Protect More Store Less • Company Background • Market drivers • ProtecTIER Overview • Case Studies • De-duplication market landscape • Competition/Qualification • Summary

  4. Company Overview • Launched September 2002 • Acquired EMC R&D facility, talent and resources • Investors: • Matrix Partners • Accel Partners • Gemini Israel Funds • Moshe Yanai • ProtecTIER Data-Center Focus • Early Products: • VTF Mainframe - acquired from EMC • VTF Open - launched in 2003 • ~200 Enterprise Data Center customers • Global Distribution Partners: HDS, Sun, Overland • ~30 VARs globally • Acquired by IBM on April 17, 2008!

  5. The Origins of ProtecTIER… • Early 2003, six mathematics PhDs initiate research into massively scalable algorithms for filtering redundant data from data-storage infrastructure • Mid 2004 mission accomplished: HyperFactor is completed and systemization into ProtecTIER platform commences • Mid 2005 ProtecTIER is announced • Today, ProtecTIER is deployed in dozens of the world’s largest data-centers, reducing total storage needs by 90%-95%, enabling end-users to economically deploy disk throughout the data-protection life-cycle • Graphic labels: Data Agnostic, Enterprise De-Duplication, Performance 400MB/s, Software Only, Capacity 1PB, Simple and Non-Disruptive, 100% Data Integrity

  6. Market Conditions and Trends 2007 • May 2007: “VTLs show no signs of giving way to straight disk-based solutions….Given the expanded capabilities like data de-duplication, VTLs seem poised to be a major part of many backup and disaster recovery infrastructures for several more years.” • March 2007: Survey shows 2/3 of customers will have a VTL by 2009 • Average retention time is increasing… Half of VTL users have retention of 2 months or longer • Spring 2007: De-duplication first on “Heat Index”, VTL third • 200 F1000 firms prioritized near-term spending plans with both De-dupe and VTLs a top priority

  7. Changing the data protection game… Diligent’s ProtecTIER reduces the required backup disk capacity by up to 25 times or more, allowing you to Protect More … Store Less

  8. The Impact of HyperFactor Up to 25X the physical capacity TSM Servers

  9. Memory Resident Index “Filtered” data New Data Stream Repository HyperFactor Disk Arrays FC Switch ProtecTIER Server Existing Data Backup Servers

  10. Replication with ProtecTIER • Significant bandwidth reduction • Diagram labels: Primary Site, Master Server, PT-server based replication, Secondary Site

  11. ProtecTIER Remote Replication Benefits • Reduced bandwidth over WAN • Redundancy in systems for disaster recovery • Ability to have “tapes” off site immediately • Ability to use “cross data center” bunkering and eliminate tape transportation

  12. Sample Enterprise Deployment: Top 10 Healthcare Service Provider • ProtecTIER deployed in 3 Datacenters in Minneapolis area • Total of 26 ProtecTIER Nodes in Production • Total physical capacity is more than 3 PB • Average repository size is 80 TB physical FC storage capacity • Measured throughput performance of >400 MBPS

  13. Sample Enterprise Deployment: Top 10 Wireless Provider Overview • ProtecTIER deployed in 3 Datacenters • Total of 10 ProtecTIER Nodes in Production • Total of 300 TB Physical Repository Capacity • Total nominal capacity greater than 3.5 PB • Measured throughput performance of 350 MBPS
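The physical and nominal capacities quoted for this deployment imply a de-duplication ratio. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes); the function name `dedup_ratio` is hypothetical and used only for illustration:

```python
TB = 10**12  # decimal terabyte (assumption; the slide does not state the unit base)
PB = 10**15  # decimal petabyte (same assumption)


def dedup_ratio(nominal_bytes, physical_bytes):
    """Return how many bytes of backup data are represented per byte of physical disk."""
    return nominal_bytes / physical_bytes


# Figures from the slide: 300 TB physical repository, nominal capacity "greater than 3.5 PB".
ratio = dedup_ratio(3.5 * PB, 300 * TB)
print(f"Implied ratio: at least {ratio:.1f}x")
```

Because the slide gives the nominal capacity as a lower bound, the computed ratio is also a lower bound. It sits inside the 5x to 25x range quoted elsewhere in the deck.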

  14. Sample Enterprise Deployment: Top 10 Telecom • Two data centers in close proximity, connected via high bandwidth FC • One PT node per data center (initial deployment, with growth expectations) • 60 TB per node initial deployment • Cross data center backups

  15. Sample Enterprise Deployment: Top 100 Financial Services Overview • Two data center deployment with long-distance replication • Minnesota • 4 ProtecTIER Nodes • Total capacity 50 TB physical • Texas • 4 ProtecTIER Nodes • Total capacity 50 TB physical • Measured performance: 200 MBPS • Nominal capacity > 1 PB

  16. ProtecTIER Advantages for a TSM Environment

  17. ProtecTIER Advantages with TSM • ProtecTIER offers some unique advantages in the TSM environment. • Benefit 1: Greatly Expanded BACKUP POOL. • ProtecTIER is typically installed as DEVCLASS=TAPE in the BACKUP POOL. • Since ProtecTIER offers 5 to 25 times the storage capacity of simple disk-to-disk backup devices (DEVCLASS=DISK/FILE), the advantage is the ability to keep 5-25 times more save-sets in the BACKUP POOL.

  18. ProtecTIER Advantages with TSM • ProtecTIER offers some unique advantages in the TSM environment. • Benefit 2: Decreased Reclamation Processing. • ProtecTIER can create very small (for example, 10 GB) virtual tape cartridges. Since they are virtual drives, there are no mount-time penalties, and it doesn’t matter if a job spans five cartridges (instead of one). • Smaller cartridges mean very little reclamation processing, virtually none for virtual tapes. • ProtecTIER supports up to 256 tape drives per server, so virtual tape drives (and virtual cartridges) can be used as resources for “real” tape reclamation processing.

  19. ProtecTIER Advantages with TSM • ProtecTIER offers some unique advantages in the TSM environment. • Benefit 3: Much faster restores. • ProtecTIER allows the BACKUP POOL to be expanded to 10-25 times the size of the same traditional D2D deployments. • ProtecTIER allows customers to keep 60 to 365 days of save-sets in the BACKUP POOL! • ProtecTIER’s ability to create smaller tape cartridges equals more efficient Collocation processing. • The combination of a larger BACKUP POOL and collocated virtual tapes equals much, much faster restores.

  20. Enterprise de-dupe Requirements

  21. ProtecTIER’s Position in the De-Duplication Market

  22. A Simple View of the Backup Process (timeline diagram; time axis 8:00 PM, 2:00 AM, 8:00 AM, 8:00 PM) • Components: Backup Server, Backup Target, Tape Library, Truck • Events: Backup process starts, Backup Processing Norms, SLA Is Met, Vault/Off-site process starts, Data is Off-site

  23. Understanding the Knowledge Base • A key metric is the means used to map the user content • Balancing performance vs. capacity • With hash schemes the hash for a ‘chunk’ is remembered in an index • For example purposes imagine a chunk size of 8 KB • A 1 TByte repository has ~125,000,000 8 KB chunks • Each hash (signature) is 20 bytes long • Need a pointer scheme to reference inside 1 TByte • The hashes require 2.9 GBytes of memory – no issue • With a 100 TByte repository ~306 GBytes of memory is required • ProtecTIER maps 1 PB of data into a 4 GB index
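The index-size arithmetic on this slide can be sketched as follows. The 8 KB chunk size and 20-byte hash come from the slide; the 5-byte pointer per entry is an assumption chosen only because it reproduces the slide's 2.9 GByte figure for 1 TByte, and decimal units for the repository with binary GBytes for memory are likewise assumed. The function name `index_size` is hypothetical.

```python
TB = 10**12   # decimal terabyte (assumption)
KB = 10**3    # decimal kilobyte (assumption)
GIB = 2**30   # memory reported in binary gigabytes (assumption)


def index_size(repo_bytes, chunk_bytes=8 * KB, hash_bytes=20, pointer_bytes=5):
    """Return (chunk count, index bytes) for a hash-based de-dup index.

    pointer_bytes is a guessed per-entry overhead, not a figure from the slide.
    """
    chunks = repo_bytes // chunk_bytes
    return chunks, chunks * (hash_bytes + pointer_bytes)


for repo_tb in (1, 100):
    chunks, size = index_size(repo_tb * TB)
    print(f"{repo_tb} TB repository: {chunks:,} chunks, {size / GIB:.1f} GiB of index")
```

With these assumptions the 100 TByte case comes out somewhat below the roughly 306 GBytes quoted on the slide, so the slide's pointer overhead presumably grows with repository size. Either way the index scales linearly with capacity, which is the point being made against hash-based schemes.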

  24. Two Basic Implementations • Inline • As data is received by the target device it is • de-duplicated in real time • not temporarily stored on disk • Data written to the disk storage is de-duplicated • Post Processing • As data is received by the target device it is • temporarily stored on disk storage • Data is subsequently read back in to be processed by a de-duplication engine

  25. Step 1: Index Lookup • HyperFactor™ • Memory access even when scaled to PBytes • Processing 10 TB requires only RAM-based index searches; index search is never a bottleneck • Hash Based • Given an average of an 8 KByte data slice per fingerprint • Requires 1,250,000,000 accesses to an index to process 10 TB - a major bottleneck • Content Aware • File size dependent • Given an average file size of 1 MByte, requires 10,000,000 accesses to an index to process 10 TB - a major bottleneck
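The access counts quoted for the hash-based and content-aware approaches follow from dividing the data volume by the unit each index entry covers. A minimal sketch, assuming decimal units; `index_accesses` is a hypothetical helper name:

```python
TB = 10**12  # decimal units assumed throughout
MB = 10**6
KB = 10**3


def index_accesses(data_bytes, unit_bytes):
    """Number of index lookups if every unit (chunk or file) needs one lookup."""
    return data_bytes // unit_bytes


backup = 10 * TB
hash_based = index_accesses(backup, 8 * KB)     # one lookup per 8 KB fingerprint
content_aware = index_accesses(backup, 1 * MB)  # one lookup per 1 MB file

print(f"Hash based:    {hash_based:,} lookups")
print(f"Content aware: {content_aware:,} lookups")
print(f"Ratio:         {hash_based // content_aware}x")
```

The sketch counts lookups only. Whether they become a bottleneck depends on where the index lives, which is the contrast the slide draws with a RAM-resident index.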

  26. Case Study: Impact of Different Post-Process De-dupe Speeds • Profile: • Receive at 300 MB/s • Post-process at 100 MB/s • Backup 6 TB • Takes approximately 6 hours • Post-process then consumes the next 18 hours! • So, it takes 24 hours to process 6 TB… the real performance is: • 6 TB / 24 hours = 69.4 MBPS! • But what about resources in support of vaulting/off-site? • When de-dupe is a post-process it competes for disk resources like any other process.
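The effective-throughput figure in this case study can be reproduced with a few lines. This is a sketch under the assumptions that units are decimal and that ingest and post-processing run strictly one after the other; the helper names are hypothetical.

```python
TB = 10**12  # decimal units assumed
MB = 10**6


def phase_hours(data_bytes, rate_mb_s):
    """Hours needed to move data_bytes at rate_mb_s megabytes per second."""
    return data_bytes / (rate_mb_s * MB) / 3600


def effective_mb_s(data_bytes, total_hours):
    """End-to-end throughput in MB/s when the whole job takes total_hours."""
    return data_bytes / (total_hours * 3600) / MB


data = 6 * TB
ingest_h = phase_hours(data, 300)  # the slide rounds this up to about 6 hours
post_h = phase_hours(data, 100)    # the slide rounds this up to 18 hours

print(f"Ingest: {ingest_h:.1f} h, post-process: {post_h:.1f} h")
print(f"Effective rate, unrounded phases: {effective_mb_s(data, ingest_h + post_h):.1f} MB/s")
print(f"Effective rate, slide's 24 hours: {effective_mb_s(data, 24):.1f} MB/s")
```

Using the slide's rounded 6 + 18 hours gives its 69.4 MBPS figure; the unrounded phases give a slightly higher number. In both cases the end-to-end rate is well below either the 300 MB/s ingest rate or the 100 MB/s post-process rate alone.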

  27. Post Processing vs. Inline Processing (two timeline diagrams; time axis 8:00 PM, 2:00 AM, 8:00 AM, 8:00 PM) • First timeline: Backup Server, ProtecTIER VT, Tape Library, Truck; De-Dupe; SLA is Met • Second timeline: Backup Server, VTL, Tape Library, Truck; De-Dupe; De-Dupe Overlap

  28. Questions to consider • When looking into de-dupe based solutions make sure you ask the critical questions: • How fast is the de-dupe process in an operational environment? • If de-dupe is done in parallel to ingest, what is the impact on ingest speed? • Does capacity scale without impacting performance? • How does the solution scale in performance? • Does the system need ‘quiet’ times for space management? • Will de-dupe impact operational/production activities? • How long has their system been in production? • How many customers do they have who backup more than 10 TB per night? If you require answers to these questions, you will be better prepared for what you will deploy.

  29. Qualifying Questions • Why are they considering de-dupe? • Annual data growth rate? • Decision criteria and time frame? • Environment • Back-up app • # media servers • Back-up volumes and frequency • Retention • File types and % to total back-up • Expectations • Performance

  30. De-dupe market at a glance

  31. Overview of ProtecTIER Differentiators

  32. Analysts and Data center customers agree • Top Requirements in selecting a de-dupe solution* • High Cumulative Performance/Throughput** • 100% Data Integrity • Excellent Capacity/Scalability • Proven Vendor Reputation/Experience • *Results from an online de-duplication survey conducted by the 451 Group from March to May 2007 • **Cumulative performance takes both the ingest and any post-processing into consideration

  33. ProtecTIER: Summary of Differentiators • Ultimate disk-based data protection • Most scalable (over 25 PB) • Most cost effective (reduce required disk capacity by 25X) • Fastest de-dupe (400 MBPS) • Inline de-dupe eliminates operational impact • 100% data integrity • Hardware agnostic • Non-disruptive deployment • Graphic labels: Data Agnostic, Enterprise De-Duplication, Performance 400MB/s, Software Only, Capacity 1PB, Simple and Non-Disruptive, 100% Data Integrity
