
A system integration test approach mitigating the unavailability of good quality data in DWT

  1. A system integration test approach mitigating the unavailability of good quality data in DWT Saurabh Shinde Partha Majhi Infosys Limited (NASDAQ: INFY)

  2. Abstract Retail banks face an ever-increasing challenge of servicing millions of customers with varied needs, and hence must maintain sound functional systems that deliver on time. Key challenges in testing such systems include proper knowledge of the interacting systems, business awareness, and the availability of good quality test data. A bad production release might impact the overall functioning of the bank, affect the credit risk analyzers, or even lose the trust of its customers. It is therefore imperative for the bank's quality management team to have a system integration test approach that also performs due diligence to ensure good test data.

  3. Abstract (cont..) • We would like to share our experience with an approach to such quality testing for a banking organization whose need was to ensure that its data is Basel compliant. Typical challenges in this area of testing are a lack of domain and technical expertise, stringent timelines for delivery with no expected issues, etc. Hence a testing approach is required that encompasses testing techniques driven by business rules and data, and possesses the means to validate with good quality test data.

  4. Target Audience • This tutorial can benefit: • Test Managers, to plan an approach for creating test data and performing the system integration test. • Test Leads/Engineers, to practice the approach to perform the system integration test along with creating valid test data.

  5. Outline of the Tutorial • Introduction • Objectives of the session • User expectations • Context setting • Banking Operations • Overview and relation to context • Data warehouse testing • Overview and relation to context • Limitations of test data encountered in testing • Managing test data • Approach to manage test data • System Integration test approach • Case Study (application of the approach) • Summarization • Closure • Q&A

  6. Objectives, Expectations… • Objective of this session – to prepare you for the situations that we come across daily in our testing life with respect to test data, and that we tend to ignore. • Your expectations for this session – it won’t solve all of your test data related woes, but we promise to handle a handful that trouble you most.

  7. Banking Operations • Retail banks provide varied payment services to their customers. • Personal accounts (checking, saving) • Cards (credit, debit) • Mortgage loans • Home equity loans • Personal loans • They have to manage the related data for day-to-day functioning. • They charge interest on the services and generate revenue, which keeps them functioning. They face the risk of ‘failure to generate revenue’ in the event that a customer defaults. Hence, it is critical for every bank to analyze and manage its risks to stay profitable. They process the data through such ‘risk application’ systems. Typically, the data flows as follows.

  8. Banking Operations • The application systems are supported by data warehouses. Testing these systems requires a sound understanding of the data processing done by them.

  9. Data Warehouse testing • Typically, DWT project testing is done in three phases: • EXTRACT • TRANSFORM • LOAD

  10. Data Warehouse testing (cont..) • The testing to be done in the EXTRACT and LOAD phases is predominantly process oriented and has little to no data dependency. A subset of production data will be sufficient to test these two phases. • The challenge comes when we are to test the TRANSFORM phase. This is the process of altering data based on a set of business rules. • Simple conversion: the value of a field is converted based on another. E.g., a currency field is converted from its native currency to the local currency value. • Complex conversion: the value of a field is derived based on the relation between multiple fields. This might also necessitate joining different tables. • Filter out: some data is filtered out based on defined business criteria. E.g., duplicate records are dropped.
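The three TRANSFORM patterns can be sketched in code. This is a minimal Python illustration, not the project's actual code; the field names (record_id, amount, currency, secured) and the exchange rates are assumed for the example.

```python
# Illustrative exchange rates; a real system would source these from a
# reference-data table.
EXCHANGE_RATES = {"EUR": 1.10, "GBP": 1.30, "USD": 1.00}

def transform(records):
    """Apply the three TRANSFORM patterns to a list of record dicts."""
    seen_ids = set()
    out = []
    for rec in records:
        # Filter out: drop duplicate records by key.
        if rec["record_id"] in seen_ids:
            continue
        seen_ids.add(rec["record_id"])

        # Simple conversion: native currency -> local currency value.
        local_amount = rec["amount"] * EXCHANGE_RATES[rec["currency"]]

        # Complex conversion: derive a field from the relation between
        # multiple other fields.
        risk_band = "HIGH" if local_amount > 1000 and rec["secured"] == "N" else "LOW"

        out.append({**rec, "local_amount": local_amount, "risk_band": risk_band})
    return out
```

Note that the filter-out step is why record counts differ between stages, which the reconciliation tests later in this tutorial must account for.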

  11. Limitations of test data encountered in testing • To visualize an example, consider a transform process that has many logical branches based on which the source data is transformed. • e.g. If (0 <= amount_field < 100) then attribute_1 = 1 • else if (100 <= amount_field < 200) then attribute_1 = 2 • else if (200 <= amount_field < 300) then attribute_1 = 3 • else if (300 <= amount_field < 400) then attribute_1 = 4 • else if (400 <= amount_field < 500) then attribute_1 = 5 • … • else if (1000 <= amount_field < 1100) then attribute_1 = A • We require test data that could exercise all the conditions in order to validate the logical branches.

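Exercising every branch of such a ladder calls for boundary-value data: the lower edge, an interior point, and the upper edge of each band. A sketch, assuming the 100-wide bands of the example above; the helper names are illustrative, not from any real system.

```python
def band(amount):
    """Mirror of the transform's branch ladder (bands of width 100)."""
    if not 0 <= amount < 1100:
        return None            # outside all branches
    idx = amount // 100 + 1    # 1..11
    return "A" if idx == 11 else idx

def boundary_values(low, high, width):
    """Lower edge, interior point, and upper edge of every band."""
    vals = []
    for start in range(low, high, width):
        vals.extend([start, start + width // 2, start + width - 1])
    return vals

# 11 bands x 3 values each = 33 targeted test amounts.
test_data = boundary_values(0, 1100, 100)
```

Feeding these 33 values through the transform guarantees each branch fires at least once, which random production samples rarely do.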

  13. Limitations of test data encountered in testing • But most often, we experience • Test data does not contain values to test all scenarios • Test data is not available for certain fields that have restricted permissions • Test data that is sourced from upstream may not be available in time for the testing cycle, leading to a reduced testing window for the release • Test data is not production-like, possibly leading us to miss specific production issues • Even if we get sanitized production test data, it may not provide coverage for all business rules covering the application domain • To mitigate this unavailability of test data, we need to manage the test data so that it resembles production data and, at the same time, covers the business scenarios and is based on our requirements.

  14. Managing test data • Managing test data is a process involving several activities. • Creating instances of test environment layers • Layer 1: Procuring data from production • Layer 2: Manipulating data as per test requirements • Layer 3: Storing data as a ‘Regression Set’ for future regression use • Layer 4: Test instance where data would be made available for test • Procure data from production • Analyze the test scope and identify the data that is to be fetched from production • Batch job programs could be created to fetch the data from production (with assistance from the Development team) into the Test Layer 1 instance • Handling production data is difficult if the volume is huge; hence, based on the test scope and test requirements, a proper subset should be chosen
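Procuring a scoped subset into the Layer 1 instance might look like the following sketch, with sqlite3 standing in for the real warehouse; the loans table and its columns are assumed purely for illustration.

```python
import sqlite3

def procure_subset(prod_conn, layer1_conn, where_clause, limit=1000):
    """Copy a scoped, size-capped subset of production rows into Layer 1."""
    rows = prod_conn.execute(
        f"SELECT record_id, amount, currency FROM loans WHERE {where_clause} LIMIT ?",
        (limit,),
    ).fetchall()
    layer1_conn.execute(
        "CREATE TABLE IF NOT EXISTS loans (record_id INTEGER, amount REAL, currency TEXT)"
    )
    layer1_conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", rows)
    layer1_conn.commit()
    return len(rows)
```

The WHERE clause and LIMIT encode the "proper subset" decision: scope the pull by test requirements rather than copying full production volume.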

  15. Managing test data (cont..) • Masking the data for security • Customer-sensitive information such as customer name, customer account number, customer address, customer phone number, customer identification number (country specific), etc., should be masked in order to provide security for such confidential data. • Analyze data for test coverage • Analyze the data fetched from production to verify it is sufficient to cover all the business scenarios of the test requirements • Usually it is found that the data does not provide coverage for all business scenarios, and performing tests with such data potentially lets defects move into production • In order to manipulate the data as per the business scenarios under test, copy the Test Layer 1 instance data to the Test Layer 2 instance • Data could be modified directly in Layer 2, or, if the volume of data is small, Excel files could be used for modification
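Masking can be done deterministically, so the same input always yields the same mask: joins across tables keep working while the real values never reach the test layers. A sketch; the field names and salt are illustrative assumptions.

```python
import hashlib

# Fields to mask; real projects would drive this from a data dictionary.
SENSITIVE_FIELDS = ("customer_name", "account_number", "phone_number")

def mask_value(value, salt="test-env-salt"):
    """Deterministic, irreversible mask for one sensitive value."""
    digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()
    return "MASKED_" + digest[:10]

def mask_record(record):
    """Mask only the sensitive fields; leave everything else untouched."""
    return {
        k: (mask_value(v) if k in SENSITIVE_FIELDS else v)
        for k, v in record.items()
    }
```

Because the mask is a salted hash rather than a random token, two tables masked independently still join on the masked key.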

  16. Managing test data (cont..) • Manipulating data as per test requirements • Study the business scenarios • Identify the data that is the nearest match to test the business scenarios • Update or create test data as needed • The test data as per the test requirements to cover the business scenarios is available in Layer 2 • Test data regression set • Identify regression test scenarios/test cases • Store the data created for these in Layer 3 for future use • Create a mapping between test scenarios/business scenarios and test data

  17. Managing test data (cont..) • Move test data to the testing environment • Once the Layer 2 data is ready, it could be copied to the desired test environment instance (Layer 4) • Other salient activities • A timely clean-up mechanism should be planned • Layer 1 and Layer 2 instances serve as temporary instances for creating the required test data • A periodic clean-up, as applicable to the testing cycles, should be scheduled

  18. Managing test data (cont..) • New project initiative • If testing is to be performed for a completely new project, and hence no production data is available, one should understand the business scenarios and create data that meets the coverage • Multiple project requirements • A mechanism for provisioning simultaneous projects requiring different sets of data should be planned

  19. System Integration test approach • An integration test could be made more effective with the following approach

  20. System Integration test approach (cont..) • Reference • Get the count of records sourced from the input source • Get the sum of critical amount fields • These will form the reference to check the loading process into the staging area • The data at this stage is selected from production • Test 1 • Compare the count of records and the sum of critical amount fields against the reference values • Take into account records that are designed to be dropped (e.g., duplicate records or null-value records) • This validates the loading process into the staging area • Manipulate the data to cover all business scenarios • This data forms the input to perform the functional test on the system • Get the count of records and the sum of critical amount fields
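The count-and-sum reconciliation used between stages can be sketched as below; the expected_drops and dropped_amount parameters account for records designed to be filtered out, and the amount field name is assumed for the example.

```python
def stage_totals(records, amount_field="amount"):
    """Record count and sum of a critical amount field for one stage."""
    return {"count": len(records),
            "amount_sum": round(sum(r[amount_field] for r in records), 2)}

def reconcile(reference, staged, expected_drops=0, dropped_amount=0.0):
    """Check staged totals against the reference, net of designed drops."""
    ok_count = staged["count"] == reference["count"] - expected_drops
    ok_sum = abs(staged["amount_sum"]
                 - (reference["amount_sum"] - dropped_amount)) < 0.01
    return ok_count and ok_sum
```

The same check is reusable at each boundary (source to staging, staging to intermediate area, intermediate to final), only the expected drops change.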

  21. System Integration test approach (cont..) • Test 2 • Compare the count of records and the sum of critical amount fields against the values from the staging area • Take into account records that are designed to be dropped or modified as per the business design

  22. System Integration test approach (cont..) • Perform thorough business functionality testing; some of the tests here would include • Straight move: • Valid values / lookup values validation: • Data derivation:

  23. System Integration test approach (cont..) • Test 3 • Compare the count of records and the sum of critical amount fields against the values from the system intermediate area. In most systems, these should match • Perform tests for critical business functionalities • This is the final system area, which is available for business users and downstream applications • Benefits of the approach • Identification of defects earlier in the system process • Identification of the precise defect origination stage within the system • Uncovering defects not just related to data, but also related to processes within the system • Early defect fixes reduce the cost to fix defects, improve system quality and accelerate the release cycle • Gains the confidence of the end user that the system is delivered with the expected quality and functionality

  24. Case Study (application of the approach) Consider a credit risk reporting system of a financial institution that gets data from multiple sources; this data is transformed and used further for regulatory reporting. There exist numerous processes that are used to transform this incoming data. We explain here the difficulty we faced when we tried to test a process whose purpose was to set a flag called the Asset flag. The Asset flag indicated what kind of asset we were dealing with. Based on a combination of parameters, assets (in this case loans) were classified into a "Commercial" or "Retail" category. The Asset flag was very important, as further BASEL calculations were done based on its value. These calculations are critical for regulatory reporting; such things make or break a bank in today's world. We have represented a subset of scenarios and test data requirements using the grid below.

  25. Case Study (cont..) • Column 1 gives the scenario number • Columns 2 to 9 are the input parameters • Column 10 is the output
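Such a grid can drive the tests directly: each row supplies the inputs and the expected Asset flag, and mismatching scenario numbers are reported. The grid contents and the classify stand-in below are illustrative only, not the real BASEL classification rules (which used eight input parameters).

```python
# Illustrative decision grid: (scenario, exposure, counterparty, expected flag).
GRID = [
    (1, 2_000_000, "CORPORATE", "Commercial"),
    (2, 50_000, "INDIVIDUAL", "Retail"),
    (3, 50_000, "CORPORATE", "Commercial"),
]

def classify(exposure, counterparty):
    """Toy stand-in for the Asset-flag transform under test."""
    if counterparty == "CORPORATE" or exposure >= 1_000_000:
        return "Commercial"
    return "Retail"

def run_grid(grid, func):
    """Return scenario numbers whose actual output mismatches the grid."""
    return [sc for sc, exposure, cpty, expected in grid
            if func(exposure, cpty) != expected]
```

With the grid as the single source of truth, each manufactured test record maps back to a scenario number, so untested branches surface as rows with no matching data rather than staying silently "passed".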

  26. Case Study (cont..) Here we are showing only a sample of the actual grid; in practice this grid had a few thousand rows. Due to the sheer volume of data (millions of records), each row cannot be validated individually. One approach to the testing was to execute these processes on month-end (production) data and validate the output using exception queries. The month-end data may or may not satisfy all the scenarios given in the grid above, so even when test scenarios were not satisfied by the data, the test passed as long as no exception was returned by the exception query. Hence many of the logical branches were not tested but still passed. When you consider a few hundred sources of data, the problem multiplies by that factor, and you end up with a huge amount of code that has not been tested but is still QA-certified. To overcome this we had to resort to test data management. What we did was create a golden copy of data, a subset of production data that satisfied the maximum number of our test scenarios. Based on the testing requirements, this data was then cloned and sanitized further. So whenever a new source of data was introduced, the data was sanitized and manipulated to satisfy all the test scenarios and was made available for testing.

  27. Case Study (cont..)

  28. Summarization • To summarize, in today's time where data warehouse testing is an inseparable part of the banking industry, there is an urgent need to incorporate test data management into the QA process so that no part of the code remains untested. • Incorporating test data management will result in: • Data for all test scenarios • Increased data quality • Better testing • Better test coverage

  29. Closure Did we meet the objectives?

  30. References • Infosys project experience • Infosys resources (www.infosys.com)

  31. Q&A: saurabhdipak_shinde@infosys.com, Partha_majhi@infosys.com
