
Essentials of Test Data Management


Presentation Transcript


  1. Essentials of Test Data Management Presented by: Michael Katalinich katalini@us.ibm.com 404-395-6416

  2. Section 1: Effective Test Data Management (TDM) for Improving Costs & Efficiency ...

  3. Disclaimer This presentation is intended to provide general background information, not regulatory, legal or other advice. IBM cannot and does not provide such advice. Readers are advised to seek competent assistance from qualified professionals in the applicable jurisdictions for the types of services needed, including regulatory, legal or other advice.

  4. Outline • TDM – What is it? Why is it important? • Current approaches • Key requirements for an effective approach for TDM

  5. Enterprise Application Snapshot in Time (diagram): the production application (Version 1) feeds multiple development and QA environments, each at its own version: Development/Unit Testing on V3, QA Regression Testing on V2, and User Acceptance Testing on V2.

  6. Multiple “Consumers” for Test Environments: internal and external “consumers” such as off-shore teams and partners

  7. Multiple Requirements for Test Environments • Functionality: features and capabilities • Performance: speed, availability, tolerance for load • Usability: ease with which the software can be employed • Security: vulnerability to unauthorized usage • Compliance: conformance to internal standards or external regulations

  8. Key Business Goals • Reduce Business Downtime • Get to Market Faster • Maximize Process Efficiencies • Improve Quality

  9. Some Key Considerations • Infrastructure costs: higher hardware and storage costs • Development labor: higher costs • Defects can be expensive: the cost to resolve a defect in the production environment can be 10 to 100 times greater than one caught in the development environment • Data privacy/compliance: data breaches can put you out of business

  10. Test Data Management: the strategy and approach to creating and managing test environments to meet the needs of various stakeholders and business requirements.

  11. Current Approaches • #1 - Clone Production: submit a change request for a copy, wait for the production database copy, then share the test database with everyone else; expensive, requires dedicated staff, and is an ongoing responsibility • #2 - Write SQL: complex and subject to change; extract, make changes, extract again after changes, then manually examine the results (Right data? What changed? Correct results? Unintended result? Did someone else modify it? Is referential integrity (RI) accurate?)

  12. Cloning and the Data Multiplier Effect • 1. Production: 500 GB • 2. Training: 500 GB • 3. Unit: 500 GB • 4. System: 500 GB • 5. UAT: 500 GB • 6. Integration: 500 GB • Total: 3,000 GB

  13. Some Key Issues With Current Approaches • Cloning can create duplicate copies of large databases • Large storage requirements and associated expenses • Time consuming to create • Difficult to manage on an on-going basis • Data privacy not addressed • Internally developed approaches not cost effective • Lengthy development cycles • Dedicated staff • On-going maintenance

  14. Using Subsetting for Effective TDM (diagram): PROD is cloned once to CLONED PROD; the database is resized* and re-indexed into a REDUCED CLONE that serves as the GOLD copy; extract & load then populate the TRAINING, TEST, and DEV environments. The cloning is performed only once!

  15. Effective Test Data Management Solution • Subsetting capabilities to create realistic and manageable test databases • De-identify (mask) data to protect privacy • Quickly and easily refresh test environments • Edit data to create targeted test cases • Audit/Compare ‘before’ and ‘after’ images of the test data

  16. Key Aspects of an Effective TDM Approach (diagram): data flows from the production database in the production environment into one or more test databases in the test environments through four steps: Subset, De-Identify, Refresh, and Analyze.
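The four aspects above can be read as stages in a repeatable pipeline. The sketch below only makes that ordering explicit; the function names are hypothetical placeholders (not any product API), and each stage is elaborated on the slides that follow.

```python
# Minimal pipeline sketch. The stage functions are hypothetical placeholders,
# not a real product API; each stage is detailed on the following slides.

def subset(prod_db, criteria):
    """Extract a referentially intact slice of production data."""
    ...

def de_identify(test_db, masking_rules):
    """Replace sensitive values with fictionalized but realistic ones."""
    ...

def refresh(test_env, test_db):
    """(Re)load a test environment from the prepared test database."""
    ...

def analyze(before_image, after_image):
    """Compare 'before' and 'after' images of the test data (run after tests)."""
    ...

def build_test_environment(prod_db, criteria, masking_rules, test_env):
    """Prepare a test environment: subset, de-identify, then refresh."""
    test_db = subset(prod_db, criteria)
    de_identify(test_db, masking_rules)
    refresh(test_env, test_db)
    return test_db
```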

  17. Subset: Key Capabilities • Precise subsets to build realistic, “right-sized” test databases • Application aware • Flexible criteria for determining record sets • Business-logic driven • Complete business object: referentially intact subsets • Across heterogeneous environments (e.g., legacy files, CRM, Oracle ERP, DB2, order entry systems)

  18. Subset: Complete Business Object • Cust_ID is the primary key • Referentially intact subset of data • Example: all Open - DN Call Back records related to Cust_ID 27645 (Karen Smith) • CUSTOMERS: 19101 Joe Pitt; 02134 John Jones; 27645 Karen Smith • ORDERS: 27645 / 80-2382 / 20 June 2006; 27645 / 86-4538 / 10 October 2006 • DETAILS: 86-4538 / DR1001 / System Outage; 86-4538 / CL2010 / Broken Cup Holder

  19. Subset: Complete Business Object • ITEMS is a “reference table” • CUSTOMERS: 19101 Joe Pitt; 02134 John Jones; 27645 Karen Smith • ORDERS: 27645 / 80-2382 / 20 June 2006; 27645 / 86-4538 / 10 October 2006 • DETAILS: 86-4538 / DR1001 / System Outage; 86-4538 / CL2010 / Broken Cup Holder • ITEMS: DR1001 Widget #1 25.00; CL2010 Widget #PG13 30.00; CM3002 Widget #45 28.00
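To make the “complete business object” idea concrete, here is a small self-contained sketch using Python's built-in sqlite3 module and the sample data from the two slides above. The table and column names follow the slides, but the code is only an illustration of following the Cust_ID relationship and copying the ITEMS reference table whole; it is not how any particular product implements subsetting.

```python
import sqlite3

# Build a tiny in-memory "production" database matching the slide example.
prod = sqlite3.connect(":memory:")
prod.executescript("""
CREATE TABLE customers (cust_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE orders    (cust_id TEXT, item_no TEXT, order_date TEXT);
CREATE TABLE details   (item_no TEXT, part_no TEXT, description TEXT);
CREATE TABLE items     (part_no TEXT PRIMARY KEY, name TEXT, price REAL);
INSERT INTO customers VALUES ('19101','Joe Pitt'),('02134','John Jones'),('27645','Karen Smith');
INSERT INTO orders    VALUES ('27645','80-2382','20 June 2006'),('27645','86-4538','10 October 2006');
INSERT INTO details   VALUES ('86-4538','DR1001','System Outage'),('86-4538','CL2010','Broken Cup Holder');
INSERT INTO items     VALUES ('DR1001','Widget #1',25.00),('CL2010','Widget #PG13',30.00),('CM3002','Widget #45',28.00);
""")

def subset_business_object(conn, cust_id):
    """Return a referentially intact subset for one customer (the business object)."""
    cur = conn.cursor()
    customers = cur.execute(
        "SELECT * FROM customers WHERE cust_id = ?", (cust_id,)).fetchall()
    orders = cur.execute(
        "SELECT * FROM orders WHERE cust_id = ?", (cust_id,)).fetchall()
    details = cur.execute(
        "SELECT * FROM details WHERE item_no IN "
        "(SELECT item_no FROM orders WHERE cust_id = ?)", (cust_id,)).fetchall()
    # ITEMS is a reference table: copy it in full so lookups still resolve.
    items = cur.execute("SELECT * FROM items").fetchall()
    return {"customers": customers, "orders": orders,
            "details": details, "items": items}

result = subset_business_object(prod, "27645")
print(result["orders"])      # only Karen Smith's orders
print(len(result["items"]))  # 3: the whole reference table
```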

  20. Data De-Identification (diagram: production data is transformed into test data, then validated and compared) • De-identify for privacy protection • Deploy multiple masking algorithms • Substitute real data with fictionalized yet contextually accurate data • Provide consistency across environments and iterations • No value to hackers • Enable off-shore testing • Ensure data privacy across non-production environments!

  21. Refresh (diagram: AP_INVOICES, INVOICE DIST, and ACCT EVENTS tables subset from QADB into TESTDB) • Load the test environment with a precise set of data • Subset further as required • Load utility for large volumes of data • Easily refresh environments via Insert, Update, or Load
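A minimal sketch of the refresh idea, again using sqlite3 and hypothetical table names: the same prepared subset can be pushed into a test schema by update (existing rows) or insert (new rows), and the step can simply be re-run whenever the environment needs refreshing. A dedicated load utility would replace this row-at-a-time approach for large volumes.

```python
import sqlite3

def refresh_table(test_db, table, rows, key_col, columns):
    """Upsert rows into a test table: update if the key exists, else insert.

    Hypothetical helper for small refreshes; a bulk load utility would be
    used for large data volumes.
    """
    cur = test_db.cursor()
    placeholders = ",".join("?" for _ in columns)
    col_list = ",".join(columns)
    assignments = ",".join(f"{c}=?" for c in columns if c != key_col)
    for row in rows:
        row_map = dict(zip(columns, row))
        updated = cur.execute(
            f"UPDATE {table} SET {assignments} WHERE {key_col}=?",
            [row_map[c] for c in columns if c != key_col] + [row_map[key_col]],
        ).rowcount
        if updated == 0:
            cur.execute(f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", row)
    test_db.commit()

# Example usage against an in-memory test database.
test = sqlite3.connect(":memory:")
test.execute("CREATE TABLE customers (cust_id TEXT PRIMARY KEY, name TEXT)")
refresh_table(test, "customers", [("27645", "Karen Smith")], "cust_id", ["cust_id", "name"])
refresh_table(test, "customers", [("27645", "K. Smith")], "cust_id", ["cust_id", "name"])
print(test.execute("SELECT * FROM customers").fetchall())  # [('27645', 'K. Smith')]
```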

  22. Analyzing Test Data • Version 1 INVOICES: 27645 / 86-4538 / Widget #1 / $80.00; 27645 / 86-4538 / Widget #PG13 / $20.00 (invoice total $100.00) • Version 2 INVOICES: 27645 / 86-4538 / Widget #1 / $50.00; 27645 / 86-4538 / Widget #PG13 / $50.00 (invoice total $100.00) • Both invoices total $100, but the composition is different. Could an error have been missed?

  23. Analyze Test Data (diagram: a compare process reads Source 1 and Source 2 and writes a compare file) • Compare the "before" and "after" data from an application test • Compare results after running a modified application during regression testing • Identify differences between separate databases • Audit changes to a database • Compare should analyze complete sets of data, finding changes in rows and tables • Single-table or multi-table compare • Compare file of results • Edit data to create test cases
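The compare step can be sketched very simply: take the "before" and "after" images of each table, key the rows, and write inserts, deletes, and changed rows to a compare file. The CSV format and names below are illustrative only.

```python
import csv

def compare_table(before_rows, after_rows, key_index=0):
    """Compare two images of one table; rows are tuples keyed by key_index."""
    before = {r[key_index]: r for r in before_rows}
    after = {r[key_index]: r for r in after_rows}
    diffs = []
    for key in before.keys() | after.keys():
        if key not in after:
            diffs.append(("DELETED", before[key]))
        elif key not in before:
            diffs.append(("INSERTED", after[key]))
        elif before[key] != after[key]:
            diffs.append(("CHANGED", before[key], after[key]))
    return diffs

def write_compare_file(path, table_diffs):
    """Write a multi-table compare file (one CSV row per difference)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for table, diffs in table_diffs.items():
            for diff in diffs:
                writer.writerow([table, diff[0], *map(str, diff[1:])])

# Example: the invoice composition change from the previous slide.
v1 = [("86-4538/1", "Widget #1", 80.00), ("86-4538/2", "Widget #PG13", 20.00)]
v2 = [("86-4538/1", "Widget #1", 50.00), ("86-4538/2", "Widget #PG13", 50.00)]
write_compare_file("compare_file.csv", {"INVOICES": compare_table(v1, v2)})
print(compare_table(v1, v2))  # both rows flagged CHANGED even though totals match
```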

  24. Effective TDM: Example ROI Benefits • Projected ROI = 504% (3 years) • Payback period = 13 months

  25. Summary: An Effective TDM Solution • Ability to extract precise subsets of related data to build realistic, “right-sized” test databases • Complete business object • Create referentially intact subsets • Flexible criteria for determining record sets • De-identify sensitive data in the test environment to ensure compliance with regulatory requirements for data privacy • Easily refresh test environments • Analyze test data.

  26. Section 2: Data Privacy... Closing the Gap

  27. Agenda • The Latest on Data Privacy • The Easiest Way to Expose Private Data • Understanding the Insider Threat • Considerations for a Privacy Project • Success Stories No part of this presentation may be reproduced or transmitted in any form by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of IBM

  28. The Latest on Data Privacy • 2007 statistics • $197: cost to companies per compromised record • $6.3 Million: average cost per data breach "incident" • 40%: percentage of breaches where the responsibility lay with outsourcers, contractors, consultants and business partners • 217 Million: total number of records containing sensitive personal information involved in security breaches in the U.S. since 2005 • Sources: Ponemon Institute, Privacy Rights Clearinghouse, 2007

  29. Did You Hear? • UK gov’t suffered a massive data breach in Nov. 07 • HMRC (Her Majesty's Revenue & Customs) UK equivalent to IRS • Lost 2 disks containing personal information on 25 million people (ALMOST ½ of UK population!) • Information has a criminal value of $3.1 Billion • No reported criminal activity to date

  30. How much is personal data worth? • Credit Card Number With PIN - $500 • Drivers License - $150 • Birth Certificate - $150 • Social Security Card - $100 • Credit Card Number with Security Code and Expiration Date - $7-$25 • Paypal account Log-on and Password - $7 Representative asking prices found recently on cybercrime forums. Source: USA TODAY research 10/06

  31. Where do F1000 Corporations Stand today?

  32. Cost to Company per Missing Record: $197 Over 100 million records lost at a cost of $16 Billion. Source: Ponemon Institute

  33. Where is Confidential Data Stored? [1] ESG Research Report: Protecting Confidential Data, March, 2006.

  34. What is Done to Protect Data Today? • Production “Lockdown” • Physical entry access controls • Network, application and database-level security • Multi-factor authentication schemes (tokens, biometrics) • Unique challenges in Development and Test • Replication of production safeguards not sufficient • Need “realistic” data to test accurately

  35. The Easiest Way to Expose Private Data… Internally, with the Test Environment • 70% of data breaches occur internally (Gartner) • Test environments use personally identifiable data • Standard non-disclosure agreements may not deter a disgruntled employee • What about test data stored on laptops? • What about test data sent to outsourced/overseas consultants? • What about healthcare/marketing analysis of data? • The Payment Card Industry Data Security Standard (PCI DSS), Requirement 6.3.4, states, "Production data (real credit card numbers) cannot be used for testing or development" * The Solution is Data De-Identification *

  36. The Latest Research on Test Data Usage • Overall application testing/development • 62% of companies surveyed use actual customer data instead of disguised data to test applications during the development process • 50% of respondents have no way of knowing if the data used in testing had been compromised. • Outsourcing • 52% of respondents outsourced application testing • 49% shared live data!!! • Responsibility • 26% of respondents said they did not know who was responsible for securing test data Source: The Ponemon Institute. The Insecurity of Test Data: The Unseen Crisis

  37. What is Data De-Identification? • AKA data masking, depersonalization, desensitization, obfuscation or data scrubbing • Technology that helps conceal real data • Scrambles data to create new, legible data • Retains the data's properties, such as its width, type, and format • Common data masking algorithms include random, substring, concatenation, date aging • Used in Non-Production environments as a Best Practice to protect sensitive data
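As a rough illustration of the algorithms named above, here is a small sketch of random-digit replacement, substring blanking, concatenation, and date aging. These are illustrative implementations only, not a product's actual masking routines; real masking also preserves check digits, formats, and cross-field consistency.

```python
import random
from datetime import date, timedelta

def mask_random_digits(value: str) -> str:
    """Replace each digit with a random digit, keeping width and punctuation."""
    return "".join(str(random.randint(0, 9)) if ch.isdigit() else ch for ch in value)

def mask_substring(value: str, start: int, end: int, filler: str = "X") -> str:
    """Blank out a substring while keeping the overall width."""
    return value[:start] + filler * (end - start) + value[end:]

def mask_concatenate(*parts: str) -> str:
    """Build a replacement value by concatenating fragments of other fields."""
    return "".join(parts)

def mask_date_aging(d: date, max_days: int = 60) -> date:
    """Shift a date by a bounded random number of days (date aging)."""
    return d + timedelta(days=random.randint(-max_days, max_days))

print(mask_random_digits("123-45-6789"))          # e.g. '904-12-3381'
print(mask_substring("4111111111111111", 6, 12))  # '411111XXXXXX1111'
print(mask_date_aging(date(1980, 5, 17)))         # a nearby but different date
```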

  38. Failure Story: A Real-Life Insider Threat • 28-year-old software development consultant • Employed by a large insurance company in Michigan • Needed to pay off gambling debts • Decided to sell Social Security numbers and other identity information pilfered from company databases on 110,000 customers • Attempted to sell the data via the Internet • Names/addresses/SSNs/birth dates • 36,000 people for $25,000 • Flew to Nashville to make the deal with… the United States Secret Service (Oops) • Results: sentenced to 5 years in jail; ordered to pay Sentry $520,000

  39. How is Risk of Exposure being Mitigated? • No laptops allowed in the building • Development and test devices • Do not have USB • No write devices (CD, DVD, etc.) • Employees sign documents • Off-shore development does not do the testing • The use of live data is ‘kept quiet’

  40. Encryption is not Enough • DBMS encryption protects against DBMS theft and hackers • Data decryption occurs as data is retrieved from the DBMS • Application testing displays data: web screens under development, reports, data entry/update client/server devices • If data can be seen, it can be copied: download, screen captures, a simple picture of a screen

  41. Strategic Issues for Implementing Data Privacy

  42. Data Masking Considerations • Establish a project leader/project group • Determine what you need to mask • Understand Application and Business Requirements • Top Level Masking Components • Project Methodology

  43. Data Masking Consideration – Step 1 • Establish a Project Leader/Group • Many questions to be answered/decisions to be made • Project Focus • Inter-Departmental Cooperation • Use for additional Privacy Projects

  44. Data Masking Consideration – Step 2 • Determine what you need to mask • Customer Information • Employee Information • Company Trade Secrets • Other

  45. Data Masking Consideration – Step 3 • Understand Application and Business Requirements • Where do applications exist? • What is the purpose of the application(s)? • How close does replacement data need to match the original data? • How much data needs to be masked?

  46. Data Masking Consideration – Step 4: Masking Components (Top Level) • Masking is not simple! • Many DBMSs • Legacy files • Multiple platforms • Needs to fit within existing processes • Not a point solution: consider the enterprise • Not a one-time process

  47. Component A - Consistency • Masking is a repeatable process • Subsystems need to match the originating system • The same mask needs to be applied across the enterprise • Predictable changes: random change will not work; change all 'Jane' to 'Mary' again and again (a deterministic mapping is sketched below)
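Consistency means the mask is deterministic: the same input produces the same output on every run and in every subsystem, so 'Jane' becomes 'Mary' everywhere, every time. One minimal way to sketch that is a hash-seeded lookup into a substitution list (illustrative only; a real implementation would manage substitution lists, collisions, and uniqueness far more carefully).

```python
import hashlib

# Illustrative substitution list; a real project would use large, curated lists.
FIRST_NAMES = ["Mary", "Ahmed", "Ingrid", "Kenji", "Lucia", "Piotr"]

def consistent_mask(value: str, substitutes: list[str]) -> str:
    """Deterministically map a value to a substitute.

    The same input yields the same output on every run and every system,
    unlike a random replacement, so downstream subsystems still match.
    """
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return substitutes[int(digest, 16) % len(substitutes)]

print(consistent_mask("Jane", FIRST_NAMES))  # same result every time it is run
print(consistent_mask("Jane", FIRST_NAMES))  # ... and the same here
print(consistent_mask("Anna", FIRST_NAMES))  # a different, but stable, mapping
```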

  48. Propagating Masked Data • Key propagation: propagate values in the primary key to all related tables • Necessary to maintain referential integrity • Customers Table (Cust ID | Name | Street): 08054 | Alice Bennett | 2 Park Blvd; 19101 | Carl Davis | 258 Main; 27645 | Elliot Flynn | 96 Avenue • Orders Table (Cust ID | Item # | Order Date): 27645 | 80-2382 | 20 June 2004; 27645 | 86-4538 | 10 October 2005

  49. Masking with Key Propagation • Referential integrity is maintained • Original Customers Table (Cust ID | Name | Street): 08054 | Alice Bennett | 2 Park Blvd; 19101 | Carl Davis | 258 Main; 27645 | Elliot Flynn | 96 Avenue • De-Identified Customers Table: 10000 | Auguste Renoir | Mars 23; 10001 | Claude Monet | Venus 24; 10002 | Pablo Picasso | Saturn 25 • Original Orders Table (Cust ID | Item # | Order Date): 27645 | 80-2382 | 20 June 2004; 27645 | 86-4538 | 10 October 2005 • De-Identified Orders Table: 10002 | 80-2382 | 20 June 2004; 10002 | 86-4538 | 10 October 2005
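Key propagation can be sketched as: compute the masked value for each primary key once, then apply the same old-to-new mapping to every table that references it, so the masked Customers and Orders tables still join. The data below is taken from the slide; the code itself is only an illustration.

```python
# Original data from the slide (Customers and Orders share Cust ID).
customers = [("08054", "Alice Bennett", "2 Park Blvd"),
             ("19101", "Carl Davis", "258 Main"),
             ("27645", "Elliot Flynn", "96 Avenue")]
orders = [("27645", "80-2382", "20 June 2004"),
          ("27645", "86-4538", "10 October 2005")]

# Illustrative masked replacements for names/streets (e.g. from lookup tables).
masked_people = [("Auguste Renoir", "Mars 23"),
                 ("Claude Monet", "Venus 24"),
                 ("Pablo Picasso", "Saturn 25")]

# 1. Mask the primary key once, remembering old -> new.
key_map = {old_id: str(10000 + i) for i, (old_id, _, _) in enumerate(customers)}

# 2. Rewrite the parent table with masked keys and masked attributes.
masked_customers = [(key_map[cid],) + masked_people[i]
                    for i, (cid, _, _) in enumerate(customers)]

# 3. Propagate the same key mapping to every child table.
masked_orders = [(key_map[cid], item, when) for cid, item, when in orders]

print(masked_customers)  # Cust ID 27645 becomes 10002 ...
print(masked_orders)     # ... and its orders now carry 10002, so joins still work
```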

  50. Component B - Context (diagram: a client billing application reads masked DB2 data; masked fields remain consistent) • A single mask will affect 'downstream' systems • Column/field values must still pass edits: SSN, phone numbers, e-mail ID • ZIP code must match the address and phone area code • Age must match birth date
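Context means the masked record must still pass the edits that downstream systems apply: the ZIP code has to agree with the city, the phone area code with the location, and the age with the birth date. The sketch below uses hypothetical reference data and rules purely to make that idea concrete.

```python
from datetime import date

# Hypothetical reference data a downstream edit might check against.
ZIP_TO_CITY = {"30301": "Atlanta", "10001": "New York"}
CITY_TO_AREA_CODES = {"Atlanta": {"404", "678"}, "New York": {"212", "646"}}

def passes_edits(record: dict, today: date = date(2008, 1, 1)) -> bool:
    """Check that a masked record is still internally consistent."""
    city_ok = ZIP_TO_CITY.get(record["zip"]) == record["city"]
    area_ok = record["phone"][:3] in CITY_TO_AREA_CODES.get(record["city"], set())
    years = (today - record["birth_date"]).days // 365
    age_ok = years == record["age"]
    return city_ok and area_ok and age_ok

masked = {"zip": "30301", "city": "Atlanta", "phone": "404-555-0147",
          "age": 27, "birth_date": date(1980, 5, 17)}
print(passes_edits(masked))  # True only if ZIP/city, area code, and age all agree
```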
