
Dealing with large data sets and complexity in your testing



Presentation Transcript


  1. Dealing with large data sets and complexity in your testing Jae-Jin Lee

  2. Search results (Facts) • Google/Bing • The possible number of inputs is close to infinity • There is a huge number of data sources • The algorithms (placement) are very complex • Expedia • The possible inputs are not as huge as Google's, but the same input can return different results based on dates, traveler info, and other factors • There is a huge number of data sources (world, inventories) • The algorithms are very complex and have a direct impact on the business

  4. Testing Challenges • Test input selection • Data is not organized in a way that makes it easy to test • Equivalence partitioning is hard • Randomness? Coverage? • Verifying mechanism • How do we get the expected result? • Re-implement the algorithm? • 474,000,000 results for a "Seattle" search
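The slide notes that equivalence partitioning is hard for search-style inputs. A minimal sketch of what partitioning could look like for query strings (the bucket names and rules here are hypothetical, chosen only to illustrate the idea of testing one representative per class):

```python
# Sketch of equivalence partitioning for search queries.
# The classes below are illustrative assumptions, not the deck's own buckets.
def partition(query: str) -> str:
    """Assign a query to a coarse equivalence class."""
    if not query.strip():
        return "empty"
    if any(ch.isdigit() for ch in query):
        return "contains-digits"
    if len(query.split()) > 1:
        return "multi-word"
    return "single-word"

# One representative per class stands in for the (near-infinite) whole class.
representatives = {
    partition(q): q
    for q in ["", "Seattle", "Seattle hotels", "Room 101"]
}
```

Even a coarse partition like this turns "close to infinity" inputs into a handful of classes, though the hard part the slide alludes to is choosing classes that actually predict distinct behavior.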

  5. Good news • The algorithms are complex but well defined • We have full access to the data source • Historical data/statistics are available • Not all results are equally important

  6. Risk analysis / assessment

  7. Risk analysis / assessment • Question the project (Is it feasible to do it?) • Practical risk analysis • Understand the risk from a business perspective • Understand the likelihood of faults from a development perspective • Test cases • Validations to be done • Summarize them into a list and have it reviewed by the entire project team

  10. Understand the data source • The data source is the trusted source for the test case validation/verification mechanism • Modifying the data source should be a piece of cake • Insert, delete, and update rows, or execute stored procedures (sprocs) • Set up and tear down • Make no assumptions about the data source

  11. Test input selection • Historical data and statistics • Priority from risk analysis • Creativity and product knowledge to break the product • Random valid inputs from bucketing (do as much as you can and log the useful details) • Hard-coded data
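"Random valid inputs from bucketing" plus "log the useful details" might look like the sketch below. The buckets and their contents are invented for illustration; the key points are drawing from each bucket and logging the seed so a failing run is reproducible:

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("input-selection")

# Hypothetical buckets of valid inputs (the deck does not name any).
BUCKETS = {
    "popular-city": ["Seattle", "Paris", "Tokyo"],
    "misspelled": ["Seatttle", "Pariss"],
    "multi-word": ["Seattle hotels", "Paris flights"],
}


def pick_inputs(seed: int, per_bucket: int = 1) -> list[str]:
    """Draw a few valid inputs from every bucket; log the seed and
    picks so any failure can be reproduced later."""
    rng = random.Random(seed)
    picks = []
    for name, values in BUCKETS.items():
        chosen = rng.sample(values, min(per_bucket, len(values)))
        log.info("bucket=%s seed=%d picks=%s", name, seed, chosen)
        picks.extend(chosen)
    return picks
```

Seeding the generator keeps the "randomness vs. coverage" trade-off manageable: runs explore different inputs over time, but any single run can be replayed exactly.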

  12. Decompose the algorithm • Exercise each piece of logic separately by controlling the data source and dependencies • Work with dev on testability and hooks (architecture, logs, etc.) • If possible, implement the happy-path algorithms in your test automation
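The last bullet suggests re-implementing only the happy path as a test oracle. A sketch of what that could mean, using an invented ranking rule (higher rating first, ties broken by lower price) as the stand-in for a real placement algorithm:

```python
# Happy-path oracle: a deliberately simple re-implementation of a
# hypothetical ranking rule, used only to check the product on
# uncomplicated inputs -- not a second production implementation.
def oracle_rank(hotels: list[dict]) -> list[dict]:
    """Expected order for the happy path: higher rating first,
    ties broken by lower price."""
    return sorted(hotels, key=lambda h: (-h["rating"], h["price"]))


def check_against_oracle(product_results: list[dict], hotels: list[dict]) -> bool:
    """Compare what the product returned with the oracle's order."""
    return product_results == oracle_rank(hotels)
```

Keeping the oracle restricted to the happy path is what makes this tractable: the edge cases and tie-breaking subtleties of the real algorithm stay out of the test code.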

  13. A heuristic approach helps • Is there a place where a good-enough result is acceptable? • "Seatttle" (three 't's) • Is the expected item in the list? • Is it in the first 10 results?
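The slide's "is it in the first 10 results?" check is easy to express as a heuristic oracle. A minimal sketch (function name and default are assumptions):

```python
def result_good_enough(expected: str, results: list[str], top_n: int = 10) -> bool:
    """Heuristic oracle: pass if the expected item appears anywhere in
    the first top_n results, instead of demanding an exact ranking."""
    return expected in results[:top_n]
```

This trades precision for robustness: the test no longer breaks every time the ranking shifts slightly, while still catching the case where a misspelling like "Seatttle" fails to surface the intended result at all.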

  14. Hybrid approach (manual + automation) • Integration environment • Combine human intuition/product knowledge with the machine's powerful diligence • Execute manually and validate using test validation code (turning on logs) • Requires a decoupled class design in your automation • UI (JavaScript) broke the functionality
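The "decoupled class design" bullet can be sketched as validation logic that knows nothing about how results were obtained, so the same checks run against results captured from a manual UI session or fed in by full automation. The class and its checks are illustrative assumptions:

```python
# Decoupled validation sketch: checks live apart from any driver, so
# they work on results captured manually (e.g. from logs) or produced
# by an automated run. All names and checks here are hypothetical.
class SearchValidator:
    def validate(self, query: str, results: list[str]) -> list[str]:
        """Return a list of issues; empty means the results look fine."""
        issues = []
        if not results:
            issues.append(f"no results for {query!r}")
        if len(results) != len(set(results)):
            issues.append("duplicate results")
        return issues


# Manual run: a human drives the UI, then feeds the captured results in.
# Automated run: the test driver calls the same validator directly.
```

Because the validator takes plain data rather than driving the UI itself, a JavaScript/UI breakage of the kind the slide mentions does not invalidate the validation code, only the collection step.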

  15. Your thoughts?
