IBM ProtecTIER Deduplication Solutions

Stanislav Dzúrik

IBM FTSS Storage

stanislav_dzurik@sk.ibm.com

Got too much data?

And not enough (blank) to store it all?

Time, Money, People, Floor Space, Electricity, Air Conditioning

Protect More. Store Less.®


The tidal wave of data continues …

  • The amount of digital information continues to grow exponentially
  • And we need to keep more of it, longer
  • And the costs of losing data are increasingly unacceptable
    • Lost revenues
    • Lost customer confidence
    • Embarrassment in the market
    • Fines from contracts, government agencies
    • CEO and CFO could go to jail
  • But budgets are not increasing

[Chart: data created and copied per year, 2005 to 2010]

Data created and copied is expected to grow at 48% CAGR through 2010

We need to do more with less, and we need to do it smarter

Source: Various external consultant reports
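To put the 48% CAGR figure in perspective, here is a minimal arithmetic sketch. It assumes 2005 (the first year on the chart axis) is the baseline, which gives five compounding periods up to 2010; the function name is illustrative only.

```python
def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years


# Assumption: 2005 is the baseline year, so there are five compounding periods to 2010.
factor_2005_to_2010 = growth_factor(0.48, 5)
print(f"Data volume multiple after 5 years at 48% CAGR: {factor_2005_to_2010:.1f}x")
```

Under that assumption, the data volume grows roughly sevenfold over the period while, as the slide notes, budgets stay flat.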


Survey - what are your two biggest storage pain points?

* TheInfoPro Storage Study: F1000 Sample. n=149. Other n=14. *Multiple responses recorded


Storage efficiency strategies and best practices

  • Stop storing so much
  • Move data to the right place
  • Store more with what’s on the floor


A set of essential technologies enables storage efficiency

  • Stop storing so much
    • Data Compression
    • Data Deduplication
  • Move data to the right place
    • Automated Tiering
    • Automated Data Migration
  • Store more with what’s on the floor
    • Storage Virtualization
    • Thin Provisioning


The pressures on backup administrators are growing

  • Growth: More new data coming
  • Backup: Backup takes longer
  • Manage: Can’t buy more storage
  • Recover: Recovery takes longer


Using the right balance of high density tape and high performance disk will help . . .

  • Long Term Retention
    • Cost effective capacity
    • Removable & transportable
  • Compliance
    • Meet financial & regulatory requirements
    • Data encryption, WORM
  • Short Term Retention
    • Use disk for daily backup & restore operations
  • Performance
    • Fast backups
    • Even faster restores
    • Meet “backup windows”


Compression and Deduplication use less physical storage

  • Store data more efficiently
  • Lower Operating Expenses: Power, cooling, floor space
  • Keep more data online for analytics and fast restores


And data deduplication is the key to using more disk more cost effectively!


Deduplication Architectures

[Diagram: Client, LAN, Server, SAN, Storage Devices]

  • Block Storage Device
    • Transparent to clients and servers
    • Reduces load on server and client
    • Adds load to storage device
    • No file or format awareness
  • Server side
    • Allows cross correlation of data among multiple clients
    • Adds load to server
  • Client side
    • Reduces load on server
    • Reduces bandwidth on LAN
    • Adds load to client
    • No cross correlation among multiple clients


Data Deduplication Process (simplified)

Assume a data object or data stream as the subject of deduplication

  • The data object is split into chunks (fixed or variable size)
  • For each chunk, an identity characteristic is determined
  • Duplicate chunks are identified
    • Identical chunks are replaced with pointers (references) to a single stored copy
    • Non-identical chunks (single instances) are stored
    • Compression may be performed in addition

[Diagram: the chunk stream A B C D A E F F D B A F is reduced to the unique chunks A B C D E F; identical chunks are stored only once]

Required Disk-Cache is reduced
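The simplified process can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ProtecTIER's implementation: each chunk is a single letter taken from the slide's example stream, SHA-256 stands in for the "identity characteristic", and the names `deduplicate` and `reconstruct` are hypothetical.

```python
import hashlib


def deduplicate(chunks):
    """Store each distinct chunk once and record the stream as a list of references."""
    store = {}    # identity -> chunk data (single instance)
    recipe = []   # ordered identities, i.e. the pointers that replace the stream
    for chunk in chunks:
        identity = hashlib.sha256(chunk).hexdigest()
        if identity not in store:
            store[identity] = chunk   # first occurrence: actually stored
        recipe.append(identity)       # every occurrence: only referenced
    return store, recipe


def reconstruct(store, recipe):
    """Rebuild the original stream by following the references."""
    return b"".join(store[identity] for identity in recipe)


# Chunk stream from the slide: A B C D A E F F D B A F
stream = [letter.encode() for letter in "ABCDAEFFDBAF"]
store, recipe = deduplicate(stream)
print(len(stream), "chunks received,", len(store), "chunks stored")
```

Twelve chunks arrive but only six are stored, and the reference list is enough to restore the original stream in order.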


Methods for Data Chunking

File based

  • One chunk is one file; most appropriate for file systems
  • E.g. TSM incremental-forever backup helps eliminate redundant data

Fixed block

  • Data object is split into fixed blocks
  • Used by block storage devices

Format aware

  • Understands explicit data formats and chunks the data object according to its format
  • Example: breaking a PowerPoint deck into separate slides

Format agnostic

  • Chunking is based on an algorithm that looks for logical breaks or similar elements within a data object

The chunking method influences the deduplication ratio
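Why the chunking method matters can be shown with a small experiment. The sketch below compares fixed-block chunking with a toy format-agnostic (content-defined) chunker after one byte is inserted at the front of the data. The boundary rule (a windowed byte sum with a bit mask), the window size, and the block size are all assumptions chosen for brevity; they are not taken from any product.

```python
import random


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_defined_chunks(data: bytes, window: int = 16, mask: int = 0x3F) -> list[bytes]:
    """Cut after position i when the sum of the last `window` bytes has its low bits all zero."""
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        fingerprint = sum(data[i - window + 1:i + 1])
        if fingerprint & mask == 0:       # boundary depends only on nearby content
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])       # trailing remainder
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
shifted = b"X" + original                 # one byte inserted at the front

fixed_shared = set(fixed_chunks(original)) & set(fixed_chunks(shifted))
cdc_original = content_defined_chunks(original)
cdc_shifted = content_defined_chunks(shifted)
cdc_shared = set(cdc_original) & set(cdc_shifted)

print("fixed blocks still matching:", len(fixed_shared))
print("content-defined chunks still matching:", len(cdc_shared), "of", len(set(cdc_original)))
```

With fixed blocks, the inserted byte shifts every block boundary, so no block matches the earlier version. With content-defined boundaries, every chunk after the first boundary is cut at the same content position and is recognized as a duplicate.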


Method for Determining Duplicates

Hashing

  • Computes a hash (MD5, SHA-256) for each data chunk
  • Compares the hash with the hashes of all existing data
  • An identical hash means the data is most likely identical
  • Potential (small) risk of hash collisions: identical hash but non-identical data
  • Collisions must be guarded against through a secondary comparison (additional metadata, a second hash method, or binary comparison)
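The role of the secondary comparison can be sketched as follows. This is a hypothetical illustration: `VerifiedChunkStore` and `weak_hash` are made-up names, and the deliberately poor hash (first byte only) exists purely to force a collision that a real hash such as SHA-256 would make extremely unlikely.

```python
import hashlib
from typing import Callable


class VerifiedChunkStore:
    """Chunk store keyed by hash, with a byte-for-byte check before trusting a match."""

    def __init__(self, hash_fn: Callable[[bytes], bytes]):
        self.hash_fn = hash_fn
        self.buckets: dict[bytes, list[bytes]] = {}

    def put(self, chunk: bytes) -> bool:
        """Store `chunk`; return True only if an identical chunk already existed."""
        bucket = self.buckets.setdefault(self.hash_fn(chunk), [])
        if any(existing == chunk for existing in bucket):   # secondary binary comparison
            return True
        bucket.append(chunk)   # new data, even if its hash collided with another chunk
        return False

    def unique_chunks(self) -> int:
        return sum(len(bucket) for bucket in self.buckets.values())


def sha256(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def weak_hash(chunk: bytes) -> bytes:
    # Deliberately poor hash (first byte only) so that a collision is easy to show.
    return chunk[:1]


store = VerifiedChunkStore(weak_hash)
results = [store.put(c) for c in (b"alpha", b"avocado", b"alpha")]
print(results, store.unique_chunks())
```

`b"alpha"` and `b"avocado"` collide under the weak hash, yet both are kept because the binary comparison fails; only the genuine repeat of `b"alpha"` is treated as a duplicate. Without that check, the second chunk would have been silently replaced by a reference to the first.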

Binary Comparison

  • Compares all bits of similar chunks

Delta Differencing

  • Computes a “delta” between two “similar” chunks of data, where one chunk is the baseline and the second is the delta
  • Since each delta is unique, there is no possibility of collision
  • To reconstruct the original chunk, the delta(s) have to be re-applied to the baseline chunk
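Delta differencing can be illustrated with Python's standard `difflib.SequenceMatcher`. This is a sketch of the general idea only: the delta format (a list of "copy" and "data" operations) and the function names are assumptions, and the matching algorithm is whatever `difflib` uses, not HyperFactor.

```python
from difflib import SequenceMatcher


def make_delta(baseline: bytes, target: bytes):
    """Describe `target` as copies from `baseline` plus the literal bytes that differ."""
    matcher = SequenceMatcher(None, baseline, target, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))          # reuse baseline[i1:i2]
        else:                                       # replace, insert or delete
            delta.append(("data", target[j1:j2]))   # literal bytes (empty for a delete)
    return delta


def apply_delta(baseline: bytes, delta) -> bytes:
    """Re-apply the delta to the baseline to reconstruct the target chunk."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += baseline[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)


baseline = b"backup block " * 8 + b"payload-0001 " + b"trailer " * 8
target = baseline.replace(b"0001", b"0002")

delta = make_delta(baseline, target)
literal_bytes = sum(len(op[1]) for op in delta if op[0] == "data")
print(f"target is {len(target)} bytes, delta carries {literal_bytes} literal byte(s)")
```

Only the bytes that differ from the baseline need to be stored, and because the delta is applied to a specific baseline, reconstruction never depends on two different chunks sharing a hash value.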


In-Line Deduplication

[Diagram: Primary Storage, Backup, VTL with Deduplication, Secondary Storage]

Data is deduplicated before it is actually stored

Deduplication is performed as data flows into the secondary storage system

  • Advantages
    • Processes data once, eliminates additional post-processing tasks
  • Disadvantages
    • CPU intensive deduplication process can create performance bottleneck
    • One process per I/O stream


Out-of-Band Deduplication (Post-Processing)

[Diagram: Primary Storage, Backup, VTL, Secondary Storage, Deduplication, Secondary Storage]

  • Advantages
    • De-duplication CPU overhead no longer affects backup window
    • Supports multiple I/O streams
    • Potentially faster restore for first version (not deduplicated)
  • Disadvantages
    • Data is written, read and written – thus more I/O intensive
    • Deduplication window must be coordinated with the backup window, as it typically takes longer than in-line processing
    • Requires larger secondary storage because first version is not deduplicated

Data is first stored and then deduplicated in the background


3 × Deduplication in the IBM Portfolio

  • File / LUN: A-SIS (N series Gateway)
  • Tape: ProtecTIER (TS7650G)
  • TSM API: TSM R6


ProtecTIER reduces the required backup disk capacity by up to 25 times!


IBM ProtecTIER Deduplication Innovation and Leadership

[Timeline graphic, 2003 to 2011; milestones listed in slide order]

  • 6 PhDs begin researching massively scalable deduplication algorithms
  • First Deduplication Virtual Tape Library deployed into production
  • First single node system to store over 1 PB of deduplicated data
  • Fastest single node inline deduplication solution
  • First to deliver Many-to-Many replication
  • The only “true” enterprise-class deduplication solution on the market today
  • IBM acquires Diligent
  • First Deduplication solution for System z
  • Fastest restore speed: up to 2800 MB/sec!
  • First non-hash deduplication algorithm developed, designed for 100% data integrity
  • First to deliver VTL solutions for both Open and Mainframe environments
  • First true clustered system with Global Deduplication
  • IBM’s first midrange solution released

  • Installed in all major industries
  • Over 1,400 ProtecTIER systems sold to date
  • Production systems range in size from 5 TB to over 700 TB
  • Over 90 PB of physical disk capacity behind ProtecTIER servers in production, protecting thousands of PBs of backup data


IBM’s Virtual Tape De-duplication SW Products

  • ProtecTIER
    • ProtecTIER VT is a scalable and robust virtual tape solution that emulates tape libraries, enabling existing backup applications to send data to the ProtecTIER disk-based platform, rather than directly to tape.
  • HyperFactor
    • HyperFactor is a revolutionary de-duplication solution which eliminates redundant data, enabling customers to increase their effective capacity by up to 25 times. ProtecTIER is powered by HyperFactor and can radically reduce both physical disk capacity and total storage costs.


How ProtecTIER works

[Diagram: Backup Servers, New Data Stream, ProtecTIER™ Server with HyperFactor™ and Memory Resident Index, “Filtered” data, Repository]

  • Backup with inline deduplication
  • Up to 1400 MB/sec per server, or 2000 MB/sec with a 2 node cluster!
  • Only 4 GB needed to map 1 PB of physical disk!
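The "4 GB to map 1 PB" figure is easier to appreciate as a ratio. The sketch below works it out in decimal units and, for contrast, estimates the size of a conventional per-chunk hash index. The comparison case is entirely an assumption (8 KiB chunks, one 32-byte SHA-256 digest per chunk, no other metadata) and does not describe any specific product.

```python
GB = 10**9
PB = 10**15

index_bytes = 4 * GB         # memory-resident index size quoted on the slide
repository_bytes = 1 * PB    # physical disk it maps, as quoted on the slide

# How much physical disk each byte of index accounts for.
disk_bytes_per_index_byte = repository_bytes // index_bytes

# Hypothetical comparison: an index holding one SHA-256 digest per 8 KiB chunk.
ASSUMED_CHUNK_BYTES = 8 * 1024
ASSUMED_DIGEST_BYTES = 32
hash_index_bytes = (repository_bytes // ASSUMED_CHUNK_BYTES) * ASSUMED_DIGEST_BYTES

print(f"Each index byte maps {disk_bytes_per_index_byte:,} bytes of disk")
print(f"Assumed per-chunk hash index: {hash_index_bytes / 10**12:.1f} TB")
```

Under these assumptions a per-chunk digest index would be far too large to hold in memory, which is the motivation the slide gives for keeping a compact index memory resident.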


ProtecTIER Deduplication Operation and Results Example

  • Backup application writes data to ProtecTIER as it would to tape
  • Only unique data is stored, existing duplicate data is referenced
  • When data objects expire, references are removed and free space is reclaimed and reused

Backup Event            Amount Received   Amount Stored   Dedupe Ratio
First Full Backup       1 TB              250 GB          4:1
Incremental Backup      100 GB            10 GB           4.2:1
Incremental Backup      100 GB            10 GB           4.4:1
Second Full Backup      1 TB              10 GB           7.8:1
Incremental Backup      100 GB            10 GB           8:1
Third Full Backup       1 TB              10 GB           11:1
After two months . . .  7.8 TB            350 GB          22:1
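The ratio column is cumulative: each value is the total data received so far divided by the total data stored so far. A minimal sketch that reproduces the figures, assuming 1 TB = 1000 GB and using the event values from the slide (the function name is illustrative):

```python
# (event, GB received, GB stored), taken from the example; 1 TB is treated as 1000 GB
events = [
    ("First Full Backup", 1000, 250),
    ("Incremental Backup", 100, 10),
    ("Incremental Backup", 100, 10),
    ("Second Full Backup", 1000, 10),
    ("Incremental Backup", 100, 10),
    ("Third Full Backup", 1000, 10),
]


def cumulative_ratios(events):
    """Running total received divided by running total stored, after each event."""
    received = stored = 0
    ratios = []
    for _name, got, kept in events:
        received += got
        stored += kept
        ratios.append(received / stored)
    return ratios


ratios = cumulative_ratios(events)
for (name, _got, _kept), ratio in zip(events, ratios):
    print(f"{name:<20} {ratio:.2f}:1")

two_month_ratio = 7800 / 350   # "After two months" row: 7.8 TB received, 350 GB stored
print(f"{'After two months':<20} {two_month_ratio:.2f}:1")
```

The computed values (4.00, 4.23, 4.44, 7.86, 7.93, 11.00 and 22.29) agree with the slide's figures up to rounding. The ratio climbs because each repeated full backup adds a full terabyte to the received total but only a few gigabytes of new data to the repository.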


Storage Impact from ProtecTIER Deduplication

[Diagram: Master Server and Backup Server in front of the ProtecTIER Server; represented capacity compared with physical capacity]

Store up to 25 times more backup data on a given physical storage capacity


Significantly Reduces Replication Bandwidth

Deduplication enables large amounts of data to be replicated with significantly less bandwidth

[Diagram: Primary Site with Backup Server and ProtecTIER Gateway (represented capacity compared with physical capacity); IP-based WAN link; Secondary Site with Backup Server, ProtecTIER Gateway (physical capacity) and Tape library]

Virtual cartridges can be cloned to tape at the DR site


ProtecTIER Many-to-One Replication Overview

  • Up to 12 Branch Offices (spokes): Gateways and/or Appliances
  • 1 target (hub): Appliance, Gateway, single or two-node cluster
  • IP-based NR (Native Replication) links
  • Virtual cartridges can be cloned to tape by the Main-Site B/U server

[Diagram: Central / DR Site with Backup Server, ProtecTIER Gateway (physical capacity) and Tape library]


ProtecTIER Many-to-Many Native Replication Grid

  • Up to 4 hubs in a grid
  • Supports any combination of Gateways, Appliances, single or two-node clusters

[Diagram: Site A, Site B, Site C and Site D, each with a Backup Server and a ProtecTIER Gateway (physical capacity)]


ProtecTIER Support for Symantec OpenStorage (OST)

  • OST API separates the backup logic from the storage appliance logic and implementation

[Diagram, top to bottom: NetBackup Policy and Control, NetBackup Server, OpenStorage API, ProtecTIER OST Plugin, ProtecTIER Server]

IBM ProtecTIER: Backup storage appliance with Deduplication and Native Replication


IBM ProtecTIER® Deduplication Family

Scalable Capacity and Performance

  • TS7650G & TS7680 ProtecTIER Gateways
    • Highest Performance, Largest Capacity, High Availability
    • Backup: Up to 2000 MB/sec
    • Restore: Up to 2800 MB/sec
    • Up to 1 PB Useable Capacity
  • TS7650 ProtecTIER Appliances
    • Better Performance, Larger Capacity, Scalable
    • Up to 500 MB/sec
    • 7 TB to 36 TB Useable Capacity
  • TS7610 ProtecTIER Appliance Express
    • Good Performance, Entry Level, Easy to Install
    • Up to 100 MB/sec
    • 4 TB and 5.4 TB Useable Capacity


ProtecTIER Advantage: Data Integrity

  • Unique and patented HyperFactor® deduplication technology
  • The only production proven deduplication solution not based on a hash algorithm
  • Designed for 100% data integrity
  • Bit for bit comparison of data to ensure data is a duplicate
  • Can NEVER lose data due to a hash collision

Although the chance of losing data from a hash collision is low, it is NOT ZERO as it is with a ProtecTIER solution
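How low "low" is can be estimated with the standard birthday approximation, p ≈ 1 − exp(−n(n−1) / 2^(b+1)) for n chunks and a b-bit hash. The sketch below is a rough estimate under assumptions that are not in the slides: a 1 PB repository, 8 KiB chunks, and hash outputs that behave as uniformly random values.

```python
import math


def collision_probability(num_chunks: int, hash_bits: int) -> float:
    """Birthday approximation of the chance that any two distinct chunks share a hash."""
    exponent = -num_chunks * (num_chunks - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(exponent)   # 1 - exp(exponent), accurate for tiny values


# Assumptions for illustration only: 1 PB of unique data split into 8 KiB chunks.
num_chunks = 10**15 // (8 * 1024)

p_md5 = collision_probability(num_chunks, 128)
p_sha256 = collision_probability(num_chunks, 256)
print(f"chunks: {num_chunks:.3e}")
print(f"128-bit hash (MD5 length): {p_md5:.2e}")
print(f"256-bit hash (SHA-256 length): {p_sha256:.2e}")
```

Under these assumptions the probabilities are extremely small but, as the slide says, not zero. A byte-for-byte comparison removes the dependency on the hash altogether.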


ProtecTIER Advantage: Restore Performance

  • Restoring data from a ProtecTIER solution is even FASTER than backing up
  • ProtecTIER can easily restore at 2800MB/sec!
  • High restore performance is not limited to certain backup applications or specific data sets, unlike other vendors’ solutions
  • High restore performance achieved on real data with realistic 20% change rate in production environments
  • Never requires agents on backup servers

Other vendors’ “CPU-centric” architectures are optimized for processing hashes, not for moving data


ProtecTIER Advantage: Scalability

A single ProtecTIER system can support up to 1 Petabyte of useable capacity

  • ProtecTIER supports the use of any IBM storage system (DS8000, DS5000, XIV, etc.) and most third party storage systems for the repository
  • IBM has hundreds of ProtecTIER systems with over 100 TB of useable capacity in production environments throughout the world
  • IBM always states “Useable Capacity” and never uses deceptive “RAW capacity” terms as other vendors do

The hidden costs associated with managing, maintaining, powering and cooling multiple appliances are significant and should not be ignored!


ProtecTIER Advantage: Global Deduplication

ProtecTIER Cluster with true Global Deduplication has been Generally Available and in production since 2008

  • Supported with all major backup applications and available for all Open Systems, System z and System i platforms
  • No agents or backup server upgrades required
  • Other vendors’ Global Deduplication capabilities are immature and incomplete, with very few if any systems in production
  • Other vendors’ Global Dedupe is restricted to certain models, works only with NetBackup OST and requires agents to be installed

Many vendors claim to have Global Deduplication but create multiple separate repositories that may contain redundant data!


ProtecTIER Advantage: Inline Deduplication

Example: Disk activity needed to ingest and deduplicate 10 TB of backup data

Post Process Approach: Deduplicate after Storing

  • Requires:
    • More storage
    • More I/Os
    • More time
    • More effort
    • More administration

[Diagram: hash-based post process; 10 TB of data: Write 10 TB, then Read 10 TB; 2x disk activity]

ProtecTIER Inline Approach: Deduplicate before Storing

  • Results:
    • Simple
    • Faster
    • Easier
    • Cheaper
    • Efficient

[Diagram: 10 TB of data through HyperFactor: Read or Write 10 TB; 1x disk activity]
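The 2x versus 1x comparison is simple arithmetic, sketched below. The pass counts follow the slide (post-process writes the data and then reads it back; inline touches it once). The landing-capacity calculation and the 10:1 deduplication ratio are assumptions added for illustration.

```python
def disk_activity_tb(ingest_tb: float, passes: int) -> float:
    """Disk traffic when the ingested data is passed over `passes` times."""
    return ingest_tb * passes


def landing_capacity_tb(ingest_tb: float, dedup_ratio: float, inline: bool) -> float:
    """Capacity needed to land a backup: full size if deduplicated later, reduced if inline."""
    return ingest_tb / dedup_ratio if inline else ingest_tb


INGEST_TB = 10.0
ASSUMED_RATIO = 10.0   # hypothetical deduplication ratio, not taken from the slide

post_process_io = disk_activity_tb(INGEST_TB, passes=2)   # write 10 TB, then read 10 TB
inline_io = disk_activity_tb(INGEST_TB, passes=1)         # data is handled once

post_process_landing = landing_capacity_tb(INGEST_TB, ASSUMED_RATIO, inline=False)
inline_landing = landing_capacity_tb(INGEST_TB, ASSUMED_RATIO, inline=True)

print(f"post-process: {post_process_io} TB of disk activity, {post_process_landing} TB landing space")
print(f"inline:       {inline_io} TB of disk activity, {inline_landing} TB landing space")
```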


ProtecTIER Advantage: Inline Deduplication

[Diagram: two timelines running from 8:00 PM through 2:00 AM and 8:00 AM to 8:00 PM]

  • Inline Processing: Backup Server, ProtecTIER VT (Dedupe), Tape Library, Truck; SLA is met
  • Post Processing: Backup Server, VTL, Dedupe (marked “Overlap”), Tape Library, Truck


With an IBM ProtecTIER Solution you can . . .

  • Store up to 25 times more data on disk
    • Up to 25:1 reduction with 100% data integrity
  • Reduce backup and restore times
    • Fast inline deduplication up to 2000 MB/sec
    • Even faster restores up to 2800 MB/sec
  • Improve the reliability of backup operations
    • Eliminates mechanical & handling failures
  • Drive the cost of disk based backup down
    • Reduces energy, cooling, and space required
  • Increase data retention
    • Store more backup data on disk for a longer time with very little additional cost


For More Information on IBM’s ProtecTIER

IBM Customers

The main ProtecTIER Web Page

www.ibm.com/systems/storage/tape/protectier


Trademarks and Disclaimers

© IBM Corporation 1994-2011. All rights reserved.

References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.

Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.


Thank you for your attention
