
EMC DATA DOMAIN OVERVIEW

EMC Data Domain: Leadership and Innovation. A history of industry firsts: first deduplication NAS, first deduplication volume replication, largest deduplication array, first deduplication directory replication, first deduplication virtual tape library, first deduplication nearline storage, and fastest backup controller.

Presentation Transcript


    1. EMC DATA DOMAIN OVERVIEW Note to Presenter: Present to customers and prospects to provide them with an overview of EMC Data Domain.

    2. EMC Data Domain: Leadership and Innovation A history of industry firsts As you can see here, Data Domain has a history of leadership and innovation in the deduplication storage category—starting with the first deduplicated NAS storage system back in 2003 and spanning to 2011, when Data Domain introduced the first long-term retention system for backup and archive.

    3. Deduplication Dramatically Reduces Storage Capacity Requirements Backup can be an inefficient process that involves repetitively moving mostly the same data again and again. Deduplication dramatically reduces the amount of redundancy in backup storage and is defined as “the process of finding and eliminating duplication within sets of data.” The deduplication process uses well-understood concepts such as cryptographic hashes and content-addressed storage. Only unique segments are stored, along with the metadata needed to reconstitute the original dataset. This chart gives you an indication of why nine out of 10 respondents to TheInfoPro Wave 15 Storage Study already have, or have plans for, deduplicated backup, and shows one angle on how to look at its impact. There are two points that are important to note here: First, the effect grows over time. The more redundant data that is stored, the greater the deduplication effect between the amount stored by the backup software—the light blue area—and the amount of capacity used, which is the dark blue area on the bottom. Second, these numbers are based on a typical backup policy schedule of a full backup on a weekly basis. The amount of data reduction varies primarily on the basis of that policy and how long that data is kept. So the retention policy will guide the degree of deduplication more than any other factor. One thing is clear—the impact is significant. Note to Presenter: Details of the May 2011 release of TheInfoPro Wave 15 Storage Study can be found at this URL: http://www.theinfopro.com/2011/05/latest-it-market-study-from-theinfopro-f1000-enterprises-2011-storage-spend-continues-at-a-strong-pace/. TheInfoPro’s “Technology Heat Index” is widely regarded as an effective measure of user “demand” for a technology and, from a vendor’s perspective, a good indicator of the relative size of the market opportunity.
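
To make the “store only unique segments plus metadata” idea concrete, here is a minimal, hypothetical Python sketch of a content-addressed segment store. It assumes fixed-size segments and SHA-256 fingerprints for simplicity; production systems typically use variable-length segmentation and their own on-disk layouts, so this illustrates the concept rather than any product’s implementation.

```python
import hashlib

class SegmentStore:
    """Toy content-addressed store: keeps one copy of each unique segment."""

    def __init__(self, segment_size=8 * 1024):
        self.segment_size = segment_size
        self.segments = {}   # fingerprint -> segment bytes (stored once)
        self.files = {}      # filename -> ordered fingerprint list (metadata)

    def write(self, name, data):
        recipe = []
        for i in range(0, len(data), self.segment_size):
            seg = data[i:i + self.segment_size]
            fp = hashlib.sha256(seg).hexdigest()  # cryptographic fingerprint
            if fp not in self.segments:           # store only unique segments
                self.segments[fp] = seg
            recipe.append(fp)
        self.files[name] = recipe                 # metadata to reconstitute the file

    def read(self, name):
        return b"".join(self.segments[fp] for fp in self.files[name])

store = SegmentStore()
store.write("friday_full.bak", b"payroll" * 10_000)
store.write("saturday_incr.bak", b"payroll" * 10_000 + b"new records")
assert store.read("friday_full.bak") == b"payroll" * 10_000
print(f"unique segments stored: {len(store.segments)}")
```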

    4. Backup Data Reduction/Deduplication Time Series of Large Enterprise Implementation Note to Presenter: View in Slide Show mode for the hyperlink in the footer to work. According to the latest TheInfoPro Wave Storage Study, 48 percent of Fortune 1000 respondents have backup deduplication in use and another 40 percent have it either in pilot or in their future plans. That’s about nine in 10 respondents either with deduplication or moving to it, giving deduplication a “Technology Heat Index” rank of 1. In other words, the move is on—from tape-centric backup architectures to disk-centric backup designs based on deduplication technologies. Note to Presenter: Details of the May 2011 release of TheInfoPro Wave 15 Storage Study can be found at this URL: http://www.theinfopro.com/2011/05/latest-it-market-study-from-theinfopro-f1000-enterprises-2011-storage-spend-continues-at-a-strong-pace/. TheInfoPro’s “Technology Heat Index” is widely regarded as an effective measure of user “demand” for a technology and, from a vendor’s perspective, a good indicator of the relative size of the market opportunity.

    5. Backup Data Reduction/Deduplication Large Enterprise Note to Presenter: View in Slide Show mode for the hyperlink in the footer to work. The previous slide showed a chart from TheInfoPro Wave 15 Storage Study covering deduplication adoption and plans. This chart, from the same TheInfoPro study, shows EMC’s significant lead over competitors in the area of backup data reduction/deduplication. Note to Presenter: Details of the May 2011 release of TheInfoPro Wave 15 Storage Study can be found at this URL: http://www.theinfopro.com/2011/05/latest-it-market-study-from-theinfopro-f1000-enterprises-2011-storage-spend-continues-at-a-strong-pace/.

    6. Purpose-Built Backup Appliances Open Systems + Mainframe In addition to leadership in deduplication, IDC recently announced that EMC is the clear leader in the “purpose-built backup appliances” (PBBA) market. This is a $1.7B market that hasn’t really been tracked until now. This chart shows the total market worldwide, including mainframe-specific solutions. EMC is in a clear leadership position with more than 64 percent market share. Note to Presenter: The IDC report can be found at this URL: http://www.emc.com/collateral/analyst-reports/11530-idc-ww-pbba-2011-2015-forecast.pdf.

    7. With Data Domain Deduplication Storage Systems, You Can… Retain longer Keep backups onsite longer with less disk for fast, reliable restores, and eliminate the use of tape for operational recovery Replicate smarter Move only deduplicated data over existing networks with up to 99% bandwidth efficiency for cost-effective disaster recovery Recover reliably Continuous fault detection and self-healing ensure data recoverability to meet service level agreements Note to Presenter: View in Slide Show mode for animation. Let’s look at the kind of transformational advantages you’ll get from Data Domain. You’ll be able to: Retain backups longer. By reducing data amounts by 10 to 30 times, you can keep backups onsite longer using less disk for fast, reliable restores, and eliminate the use of tape for operational recovery. Replicate smarter. Move only deduplicated data over existing networks for up to 99 percent bandwidth efficiency and cost-effective disaster recovery. Recover reliably from disk. With continuous fault detection and system self-healing, you can ensure that data is recoverable and easily meet service level agreements.

    8. Deduplication Fundamentals The next section will focus on deduplication fundamentals.

    9. Data Domain Basics Easy integration with existing environment Now I’ll introduce you to the Data Domain storage system and move from the outside in. This is a picture of what you would see in a Data Domain deployment. A Data Domain appliance is a storage system with shelves of disks and a controller. It’s optimized first for backup and second for archive applications, and it supports most of the industry-leading backup and archiving applications. I’ll talk primarily about backup in this discussion and get to archiving later in the presentation. The list on the left is composed primarily of leading backup applications—not only EMC’s offerings with EMC NetWorker, but also Symantec, CommVault, and so on…even niche vendors like Veeam for VMware. On the way into the storage system, data can pass through either Ethernet or Fibre Channel. With Ethernet it can use NAS protocols such as NFS or CIFS; it can also use optimized protocols or products, such as Data Domain Boost, a custom integration with leading backup applications. The data is deduplicated inline as it is stored, and once stored it can be replicated for disaster recovery. Only the compressed, deduplicated, unique data segments that have been filtered out during the write process on the target tier are replicated. Within the hardware, there are best-in-class approaches for using commodity hardware to maximum effect. Data Domain was early with, for example, a RAID 6 implementation. And within this is the heart of the Data Domain added value—a file system that deduplicates inline.

    10. Data Deduplication: Technology Overview Store more backups in a smaller footprint A technology overview of data deduplication will help illustrate how you can store more backups in a smaller footprint with Data Domain. Note to Presenter: Click now in Slide Show mode for animation. On Friday, the backup application initiates the first full backup of 1 TB, but only 250 GB is stored on Data Domain. This occurs because, as the data stream is coming into Data Domain, the system is deduplicating before storing data to disk. On average this results in a two- to four-times reduction on a first full backup. Note to Presenter: Click now in Slide Show mode for animation. Over the course of the week, 100 GB daily incremental backups see a seven- to 10-times reduction and require only 10 GB each to be stored. As the graphic on the left shows, during the week the incremental backups contain data that was already protected by the first full backup. Note to Presenter: Click now in Slide Show mode for animation. Finally, on the second Friday, the second full backup contains almost all redundant data, so of the 1 TB backup dataset, only 18 GB needs to be stored. In total, over the course of a week, 2.4 TB of data was backed up to Data Domain, but the system required only 308 GB of capacity to protect this dataset. Overall, this is a 7.8-times reduction in one week.
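
The slide’s numbers can be reproduced with a few lines of arithmetic. The sketch below simply sums the illustrative figures quoted above (250 GB for the first full, 10 GB per 100 GB incremental, 18 GB for the second full); these are the presentation’s example values, not measurements.

```python
# Reproduce the one-week example: logical data backed up vs. physical capacity
# consumed after deduplication. Figures are the slide's illustrative values.
backups = [
    ("Friday full #1", 1000, 250),              # 1 TB logical -> 250 GB stored (~4x)
    ("Mon-Thu incrementals", 4 * 100, 4 * 10),  # 100 GB/day at ~10x -> 10 GB/day
    ("Friday full #2", 1000, 18),               # mostly redundant second full -> 18 GB
]

logical = sum(l for _, l, _ in backups)    # GB protected by the backup application
physical = sum(p for _, _, p in backups)   # GB actually written to disk

print(f"logical:   {logical / 1000:.1f} TB")      # 2.4 TB
print(f"physical:  {physical} GB")                # 308 GB
print(f"reduction: {logical / physical:.1f}x")    # ~7.8x
```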

    11. Retain: Store More for Longer with Less Over one year of retention in 3U of Data Domain deduplication storage Note to Presenter: View in Slide Show mode for animation. If you extend this scenario out to four months of backups, you’ll see how you could retain more backups longer with less disk by eliminating redundant data from your backup stream and reducing the necessary amount of backup storage. By doing this you’ll be able to change the economics of using disk, eliminating or minimizing the use of tape for operational recovery. This chart shows the dramatic reduction in storage required for backups. Just like the previous slide, the first column is the type of backup data—the first full backup, full backups accumulated after week one, week two, and all the way through to month four in a four-month retention policy. The cumulative logical column is next and shows you how much data has been protected and would be stored without deduplication. Then there’s the estimated reduction from deduplication in the third column, with the last column representing the actual physical storage used with Data Domain. As you can see, at the end of four months, you’ve protected the equivalent of 23.4 TB of backups but used only 1.2 TB of disk—a 20-times reduction. Viewed differently, the four-month deduplicated total is 50 percent less than the single-week total using non-deduplicated storage. This dramatic impact shows you why so many companies have redesigned their backup around disk-optimized storage.

    12. Data Integrity: Data Invulnerability Architecture Another important differentiator for Data Domain systems is the Data Invulnerability Architecture. The Data Domain Data Invulnerability Architecture lays out the industry’s best defense against data integrity issues by providing levels of data protection, data verification, and self-healing that are unavailable in conventional disk or tape systems. There are three key areas of data integrity protection described on this slide: First is end-to-end data verification at backup time. As illustrated by the graphic at the right, end-to-end verification means reading data after it is written and comparing it to what was sent to disk, proving that it is reachable through the file system to disk and that the data is not corrupted. Specifically, when the Data Domain Operating System receives a write request from backup software, it computes a checksum over the data. After analyzing the data for redundancy, it stores the new data segments and all of the checksums. After all the data has been written to disk, the Data Domain Operating System verifies that it can read the entire file from the disk platter and through the Data Domain file system, and that the checksums of the data read back match the checksums of the written data. This confirms the data is correct and recoverable from every level of the system. If there is a problem anywhere along the way—for example, if a bit has flipped on a disk drive—it will be caught. Since most restores happen within a day or two of backup, systems that verify and correct data integrity slowly over time will be too late for most recoveries. Second is a self-healing file system. Data Domain systems actively re-verify the integrity of all data every week in an ongoing background process. This scrub process finds and repairs defects on the disk before they can become a problem. In addition, real-time error detection ensures that all data returned to the user during a restore is correct. On every read from disk, the system first verifies that the block read from disk is the block expected. It then uses the checksum to verify the integrity of the data. If any issue is found, the Data Domain Operating System will self-heal and correct the data error. In addition to data verification and self-healing, there is a collection of other capabilities: RAID 6 provides double-disk-failure protection, NVRAM enables fast, safe restart, and snapshots provide point-in-time file system recoverability. Backups are the data store of last resort. The Data Domain Data Invulnerability Architecture provides extra levels of data integrity protection to detect faults and repair them, ensuring that backup data and recovery are not at risk.
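
The write-path verification described above can be illustrated with a small, hypothetical sketch: compute a checksum when the data arrives, write it, force it to stable storage, read it back through the file system and compare, then re-verify the checksum again on every restore. This is a conceptual model using SHA-256 over whole files, not the Data Domain Operating System’s implementation, which performs these checks through every internal layer of its own file system.

```python
import hashlib
import os

def checksummed_write(path, data):
    """Write data, then read it back through the filesystem and compare
    checksums, mimicking end-to-end verification at backup time."""
    expected = hashlib.sha256(data).hexdigest()   # checksum computed on receipt
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())                      # force the data to stable storage
    with open(path, "rb") as f:                   # re-read what actually landed on disk
        stored = f.read()
    if hashlib.sha256(stored).hexdigest() != expected:
        raise IOError(f"verification failed for {path}: stored data does not match")
    return expected

def verified_read(path, expected_checksum):
    """On every restore, re-verify the checksum before returning data."""
    with open(path, "rb") as f:
        data = f.read()
    if hashlib.sha256(data).hexdigest() != expected_checksum:
        raise IOError(f"checksum mismatch reading {path}")  # would trigger self-healing
    return data

cksum = checksummed_write("/tmp/segment_0001.dat", b"backup segment payload")
assert verified_read("/tmp/segment_0001.dat", cksum) == b"backup segment payload"
```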

    13. Network-Efficient Replication for True Disaster Recovery Lowers WAN costs; improves service level agreements Once the data is stored in a Data Domain system, there are a variety of replication options to move the compressed, deduplicated changes to a secondary or tertiary site, so data can be restored in multiple locations for disaster recovery. This can be done in a number of ways. There is a very high-performance, whole-system, volume-replication approach. The most popular, however, is a directory- or tape-pool-oriented approach that lets you select a part of the file system, or a virtual tape library or tape pool, and replicate only that. So a single system can be used as both a backup target and a replica for another Data Domain system. This graphic shows a number of smaller sites all replicating into one hub site. In those cases, the dialogue between the systems asks the hub whether or not it already has a given segment of data. If it doesn’t, the source sends the data; if the destination system already has the data, the source site doesn’t have to send it again. In this scenario, with multiple systems replicating to one in a many-to-one configuration, there is cross-site deduplication, further reducing the WAN bandwidth required and the cost.
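
The “ask before you send” dialogue described above can be sketched as follows. The class and function names (ReplicaHub, replicate) are hypothetical; the point is that a source site ships a segment across the WAN only when the hub does not already hold that fingerprint, which is what produces cross-site deduplication in a many-to-one topology.

```python
import hashlib

class ReplicaHub:
    """Hub-site store shared by many source sites (many-to-one topology)."""
    def __init__(self):
        self.segments = {}   # fingerprint -> segment data (compressed in a real system)

    def missing(self, fingerprints):
        """Tell a source site which fingerprints the hub does not have yet."""
        return [fp for fp in fingerprints if fp not in self.segments]

    def receive(self, batch):
        self.segments.update(batch)

def replicate(hub, local_segments):
    """Send only the segments the hub is missing; return bytes sent over the WAN."""
    fps = {hashlib.sha256(s).hexdigest(): s for s in local_segments}
    needed = hub.missing(list(fps))               # the "do you have it yet?" dialogue
    hub.receive({fp: fps[fp] for fp in needed})
    return sum(len(fps[fp]) for fp in needed)

hub = ReplicaHub()
site_a = [b"common OS image", b"site A database"]
site_b = [b"common OS image", b"site B database"]
print(replicate(hub, site_a))   # first site sends both segments
print(replicate(hub, site_b))   # cross-site dedup: only "site B database" crosses the WAN
```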

    14. Enterprise Recoverability Readiness at Disaster Recovery Site OPTIONAL SLIDE A consequence of a post-process approach, rather than Data Domain’s inline approach, is a delay in completing replication. In the Data Domain inline deduplication process, data is deduplicated as it is stored—once it hits disk and there’s a logical consistency point, replication can start, so the data can be “DR-ready” at a remote site very quickly, shortly after the backup is complete. There are two different ways that vendors use post-process styles of storage to do replication, but both end up slower when it comes to readiness at the restore site. In an adapted process, data is stored on disk, and after a small amount of data is collected the system starts deduplicating. After it finishes deduplicating, it can start to replicate that first image; meanwhile, it’s still backing up other data. The consequence is that the functions overlap, so a single controller is busy managing the I/O-intensive process of backup and the typically disk-bound process of deduplication, followed by the additional work of replication. In a scheduled post-process approach, all of the backup data is stored, then all of it is deduplicated and replicated. This goes a little faster than when the work overlaps, but it still takes longer. In some implementations the data is also not compressed when it replicates, so sending it over the same bandwidth can take additional time as well. In all of these non-inline approaches it simply takes longer, and if your recovery point objective on the disaster recovery side is to be able to restore data as soon as possible after backup, you will always be better served by an inline approach, especially the Data Domain approach. The worst case (at the bottom of the slide) is backing up to a disk storage system like a virtual tape library, copying to tape, tracking that, and then recalling tapes.

    15. DD Boost Software Distributes parts of deduplication process to backup server or application clients Licensable software works across Data Domain portfolio Supports majority of backup software market EMC Avamar and NetWorker Symantec NetBackup and Backup Exec Speeds backups by up to 50 percent Process more backups with existing resources 20–40% less overall impact to backup server 80–99% less LAN bandwidth Enables Data Domain replication management from the backup application In the traditional backup world, backup software is backup software, and storage is storage. DD Boost software distributes part of the deduplication process out of the Data Domain system and onto the backup server. This makes the backup network more efficient, makes Data Domain systems up to 50 percent faster, and makes the whole aggregate system more manageable. It works across the entire Data Domain product line and supports the majority of the backup market.

    16. Additional Data Domain Software Options In addition to DD Boost, EMC offers four additional Data Domain software options that can enhance the value of a Data Domain system in your environment. Note to Presenter: Click now in Slide Show mode for animation. The first is DD Virtual Tape Library software, which eliminates tape-related failures by enabling all Data Domain systems to emulate multiple tape devices over a Fibre Channel interface. This software option provides easy integration of deduplication storage in open systems and IBM i environments. Note to Presenter: Click now in Slide Show mode for animation. Next is DD Replicator software, which provides fast, network-efficient, encrypted replication for disaster recovery, remote office data protection, multi-site tape consolidation, and long-term offsite retention. DD Replicator asynchronously transfers only the compressed, deduplicated data over the WAN, making network-based replication cost-effective, fast, and reliable. In addition, you can replicate up to 270 remote sites into a single Data Domain system for consolidated protection of your distributed enterprise. Note to Presenter: Click now in Slide Show mode for animation. Next, DD Retention Lock software enables you to easily implement deduplication with file locking to satisfy IT governance and compliance policies for archive protection. DD Retention Lock also enables electronic data shredding on a per-file basis to ensure that deleted files have been disposed of in an appropriate and permanent manner, in order to maintain confidentiality of classified material, limit liability, and enforce privacy requirements. Note to Presenter: Click now in Slide Show mode for animation. Finally, DD Encryption software protects backup and archive data stored on Data Domain systems with encryption that is performed inline—before the data is written to disk. Encrypting data at rest satisfies internal governance rules and compliance regulations and protects against theft or loss of a physical system. The combination of inline encryption and deduplication provides the most secure data-at-rest encryption solution available.
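
As a rough illustration of the inline ordering just described (deduplicate and compress first, then encrypt before anything reaches disk), here is a hypothetical sketch. It uses the third-party cryptography package’s Fernet recipe as a stand-in cipher; the actual DD Encryption implementation and key management are not shown here.

```python
# Minimal sketch of an inline pipeline: data is deduplicated, compressed, and
# encrypted before anything is written to "disk". Requires `pip install cryptography`.
import hashlib
import zlib
from cryptography.fernet import Fernet

class InlineEncryptingStore:
    def __init__(self, key):
        self.cipher = Fernet(key)
        self.on_disk = {}                    # fingerprint -> encrypted, compressed segment

    def ingest(self, segment):
        fp = hashlib.sha256(segment).hexdigest()
        if fp in self.on_disk:               # dedup decided on plaintext fingerprints,
            return fp                        # before compression and encryption
        sealed = self.cipher.encrypt(zlib.compress(segment))
        self.on_disk[fp] = sealed            # only sealed data ever reaches "disk"
        return fp

    def restore(self, fp):
        return zlib.decompress(self.cipher.decrypt(self.on_disk[fp]))

store = InlineEncryptingStore(Fernet.generate_key())
fp = store.ingest(b"quarterly archive segment")
store.ingest(b"quarterly archive segment")   # duplicate: no new ciphertext is written
assert store.restore(fp) == b"quarterly archive segment"
```

Deduplicating on plaintext fingerprints before encrypting is what lets the two features combine: encrypting first would randomize the data and defeat deduplication.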

    17. DD Archiver Overview Cost-optimized, long-term retention Data Domain system for backup and archive Active tier: short-term data protection; less than 90 days Archive tier: scalable long-term retention; multiple years High-throughput deduplication storage Up to 9.8 TB/hr Cost optimized for long-term retention Up to 570 TB usable, 28.5 PB logical capacity Low cost per gigabyte while maintaining high throughput Fault isolation of archive units for long-term recoverability Leverage existing Data Domain system advantages Supports DD Replicator and DD Retention Lock software options Data Domain Data Invulnerability Architecture to ensure data integrity Like other Data Domain systems, Data Domain Archiver includes a controller and storage shelves, referred to as the “active tier” in this system. The active tier can be expanded to up to four storage shelves (142 TB of usable capacity), and it is used for short-term (generally less than 90 days) retention of backup and archive data. In addition, DD Archiver also incorporates an “archive tier” with up to 23 additional storage shelves (474 TB of usable capacity). Built on a standard Data Domain controller, DD Archiver leverages existing Data Domain technology to enable high throughput of up to 9.8 TB/hr. DD Archiver is cost-optimized for long-term retention of backup and archive data—up to a total of 570 TB usable or 28.5 PB logical capacity (assuming a 50:1 deduplication ratio). In addition, the system offers the unique combination of low cost per gigabyte while still maintaining high throughput. Finally, new fault isolation capabilities ensure long-term recoverability of archive units. All of this leverages existing Data Domain system advantages, including support for network-efficient replication with DD Replicator as well as DD Retention Lock for enforcing file retention. In addition, Data Domain’s Data Invulnerability Architecture ensures data integrity for the life of the system. The combination of high-throughput, cost-optimized storage built on proven Data Domain system technology makes DD Archiver the perfect tape replacement solution.

    18. Industry’s Most Scalable Inline Deduplication Systems Here’s a look at the latest Data Domain product family, including the recently introduced DD160, DD620, and DD640.

    19. Deduplication Storage Evaluation Criteria The next section will focus on deduplication storage evaluation criteria.

    20. Methodology: Inline vs. Post-Process Deduplication One of the most common alternatives to the Data Domain inline deduplication storage approach (shown on the left) is a methodology known as post-process (shown on the right). In the post-process architecture, data is stored to disk before deduplication. After it is stored, it is read back internally, deduplicated, and written again to a different area. Although this approach may sound appealing because it seems as if it would allow for faster backups and the use of fewer resources, it actually creates two problems: First, a lot more disk is needed to store the multiple pools of data, and for speed, because most of the other vendors’ deduplication approaches are spindle-bound. Because of this, there are typically three to four times more disks in a post-process configuration than you’ll see in a Data Domain deployment. Second, it’s simply easier to use an inline approach. If all data is filtered before it’s stored to disk, then it’s just like a regular storage system—it just writes data; it just reads data. There’s no separate administration involved in managing multiple pools—some with deduplication, some with regular storage—and managing the transitions between them. Less administration in the storage system is always better.
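
The two write paths can be contrasted in a short, schematic sketch. The helper names below are illustrative, not a real API; what matters is that the inline path performs one write per unique segment, while the post-process path lands a full undeduplicated copy first and then reads and rewrites it, which is where the extra disk and administration come from.

```python
import hashlib

def segment_stream(stream, size=8 * 1024):
    for i in range(0, len(stream), size):
        yield stream[i:i + size]

def fingerprint(segment):
    return hashlib.sha256(segment).hexdigest()

def inline_write(stream, dedup_pool):
    """Inline: duplicates are filtered before anything is written to disk."""
    for segment in segment_stream(stream):
        dedup_pool.setdefault(fingerprint(segment), segment)   # one write per unique segment

def post_process_write(stream, landing_pool, dedup_pool):
    """Post-process: land the raw stream first, then re-read, dedup, and rewrite it."""
    landing_pool.extend(segment_stream(stream))                # write #1: full undeduplicated copy
    for segment in landing_pool:                               # second pass reads it all back
        dedup_pool.setdefault(fingerprint(segment), segment)   # write #2: deduplicated copy
    landing_pool.clear()                                       # landing space must absorb peak backup size

data = b"nightly backup" * 50_000
inline_pool, landing, post_pool = {}, [], {}
inline_write(data, inline_pool)
post_process_write(data, landing, post_pool)
assert inline_pool.keys() == post_pool.keys()   # same end state, but with extra I/O and a second pool
```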

    21. Performance: CPU-Centric vs. Spindle-Bound This slide shows another way to look at the virtues of being CPU-centric. As mentioned before, most of the deduplication competitors for backup targets are spindle-bound, or disk-bound. It takes so many disk seeks to look up whether data has been stored before, and to sort out and then minimize the data, that it takes a lot of disk drives, or faster disk drives, to get the job done. This slide shows what has happened in the competitive environment as a result. If they’re using SATA disk drives, most deduplication storage vendors tend to need three or four times as many drives as a Data Domain system to store the same amount of deduplicated data. In some cases, such as IBM’s ProtecTIER, storage systems use Fibre Channel drives instead of SATA. This can decrease the seek time, but it comes at a significantly higher cost. Data Domain systems, by being CPU-centric and minimizing disk usage to only what is required to store the actual data, end up having a smaller footprint. This can look like a weakness, but it’s actually a strength. By keeping costs down, the Data Domain system is much better positioned to compete, for example, with a tape library and its cost per gigabyte.

    22. Data Domain Systems Trajectory Data Domain SISL Scaling Architecture: CPU-centric Due to their unique architecture, Data Domain systems have continued to scale in significant ways. The red line on this slide shows how throughput and addressable capacity have increased over time in the flagship Data Domain system for industry-standard data access protocols such as NFS, CIFS, and VTL. However, EMC changed the game with two introductions—Data Domain Boost, represented by the green line, and multi-controller systems like the Data Domain Global Deduplication Array, represented by the blue line, both of which bumped this trajectory onto a whole new level. The key to this scalability is the Data Domain CPU-centric architecture, referred to as Stream-Informed Segment Layout (SISL), which relies on CPU processing power, rather than disk spindles, to scale performance. This allows Data Domain systems to scale in performance every time Intel introduces a new CPU, versus competitors that rely on disk vendors to increase performance over time. Therefore, the benchmarks shown here are accomplished with the minimum hardware, unlike disk-based competitors who have to add extra disk for performance reasons. Since 2004, Data Domain systems have increased 175 times in throughput and 450 times in capacity, and EMC expects this trajectory to continue over time. Therefore, as CPUs get faster, Data Domain systems can get faster. And as they get faster, they get bigger and can protect more data on a single system. (Note to Presenter: Therefore, your opportunity to sell Data Domain systems into large enterprise accounts will continue to grow for years to come.)
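
The slide does not spell out how SISL keeps fingerprint lookups off disk, but a common way to make deduplication CPU-centric is to keep a compact in-memory summary of known fingerprints, for example a Bloom-filter-style bit vector, so that most “is this segment new?” questions are answered in RAM rather than by a disk seek. The sketch below illustrates that general technique only; it is not the SISL implementation.

```python
import hashlib

class FingerprintSummary:
    """Tiny Bloom-filter-style summary kept in RAM. A definite "no" answer avoids
    any on-disk index lookup for brand-new segments; a "maybe" still requires a
    full check against the on-disk index. Illustrative only."""

    def __init__(self, bits=1 << 20, hashes=4):
        self.bits = bits
        self.hashes = hashes
        self.vector = bytearray(bits // 8)

    def _positions(self, fp):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{fp}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.bits

    def add(self, fp):
        for pos in self._positions(fp):
            self.vector[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, fp):
        return all(self.vector[pos // 8] & (1 << (pos % 8)) for pos in self._positions(fp))

summary = FingerprintSummary()
summary.add("a1b2c3")                    # fingerprint of a segment already on disk
print(summary.might_contain("a1b2c3"))   # True  -> go consult the on-disk index
print(summary.might_contain("ffffff"))   # almost certainly False -> new segment, no disk seek needed
```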

    23. Why Data Domain? Less disk to resource, less to manage CPU-centric deduplication Inline deduplication Simple, mature, and flexible Simple, mature appliance Any fabric, any software, backup or archive applications Resilience and disaster recovery Storage of last resort Fast time-to-disaster recovery (DR) readiness Cross-site global compression Data center or remote office Why Data Domain? To summarize, it starts with economics. There’s less disk to resource and less to manage. The CPU-centric deduplication approach of the SISL Scaling Architecture makes the system simpler to manage as well as easier to provision and greener. In addition, Data Domain is more mature and flexible than most of its competitors. Data Domain has been sold longer, and the problems that most of EMC’s competitors are just starting to discover have already been fixed. It works as advertised, and that alone is highly differentiated in this particular category. Finally, because of their resilience and replication flexibility, Data Domain systems not only work as advertised but work reliably.

    24. Thank you.
