Anatomy of Device Physics Big Data & Machine learning Janet George Fellow/Chief Data Scientist

Western Digital, Wolfram Data Summit 2016

Presentation Transcript


  1. Anatomy of Device Physics Big Data & Machine Learning. Janet George, Fellow/Chief Data Scientist, Western Digital. Wolfram Data Summit 2016

  2. Contents • Gartner 3Vs & IBM 4Vs • Anatomy of physics data – getting to the basics • Heteroskedastic data • Complexity of the data • Yield versus Endurance • Finding Nemo – Correlations and value (non-deterministic) • Getting caught up in machine learning • A simple performance predictor example • Game changing with Data • Data Intimacy • Embracing complexity, leading change

  3. Gartner 2012 – 3Vs

  4. IBM published the four V’s

  5. Anatomy of Device Physics data (Getting to the basics) • 4th Dimension of data • High variability (manufacturing process materials are constantly changing) • Experimentation @ scale • Production @ scale • Leading edge • Extremely complex

  6. Going deeper into Device Physics Data • Heteroskedastic data • Heteroskedasticity: a statistical property describing how the variance of the errors behaves across a sample. • It is present when that variance is not constant, i.e. some subsets of the observations show different variability than others.
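
A minimal sketch of how the non-constant variance described on this slide can be checked. It is not from the presentation: the data is synthetic, the helper names (`fit_line`, `variance_ratio`, `make_data`) are hypothetical, and the check is a simplified variance-ratio comparison loosely in the spirit of the Goldfeld-Quandt test, not a full implementation of it.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def variance_ratio(xs, ys):
    """Residual variance for the upper half of x divided by the lower half.

    A ratio near 1 suggests constant error variance; a ratio far from 1
    suggests heteroskedasticity.
    """
    a, b = fit_line(xs, ys)
    pairs = sorted(zip(xs, ys))  # order observations by x
    resid = [y - (a + b * x) for x, y in pairs]
    half = len(resid) // 2
    return statistics.pvariance(resid[half:]) / statistics.pvariance(resid[:half])


def make_data(n, noise_grows, seed=0):
    """Synthetic data (assumption): y = 2 + 3x + noise, noise optionally scaling with x."""
    rng = random.Random(seed)
    xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
    ys = [
        2.0 + 3.0 * x + rng.gauss(0.0, 0.5 * x if noise_grows else 1.0)
        for x in xs
    ]
    return xs, ys


hetero_ratio = variance_ratio(*make_data(2000, noise_grows=True))
homo_ratio = variance_ratio(*make_data(2000, noise_grows=False))
print(f"noise grows with x: ratio = {hetero_ratio:.2f}")
print(f"constant noise:     ratio = {homo_ratio:.2f}")
```

With noise that scales with `x`, the ratio should come out well above 1, while the constant-noise data should stay close to 1.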

  7. Studying complexity of the data

  8. Higher Orders of Complexity

  9. Challenges in manufacturing data (New technology node creation) • Material instability: deformation, does not bond • Material build-up • Substrate overheating • Heat sink • Resistive loss • Deposition issues • Thickness • Warpage • Wetting layers • Oxidation • Diffusion barrier • DPPM known and unknown causes, complex error recovery • Coupling effects, adjacent track interference

  10. Dealing with Heteroskedastic data - Challenges • Machine learning model building requires constant optimization – training and re-training as the data changes. • Ranking and weighting correlations for DPPM – PageRank model. • Linear regression versus random forest • What works for the data/value creation
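
The slide does not say how the PageRank model was applied, so the following is only one possible reading, sketched under stated assumptions: process variables are nodes, absolute pairwise correlations are undirected edge weights, and a weighted PageRank by power iteration ranks the variables. The variable names and correlation values are invented for illustration and are not data from the presentation.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration on an undirected graph.

    weights maps (node_a, node_b) pairs to a positive edge weight.
    Every node is assumed to have at least one edge.
    """
    nodes = sorted({n for pair in weights for n in pair})
    neighbours = {n: {} for n in nodes}
    for (a, b), w in weights.items():
        neighbours[a][b] = w  # undirected: store both directions
        neighbours[b][a] = w

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src, nbrs in neighbours.items():
            total = sum(nbrs.values())
            for dst, w in nbrs.items():
                # src passes its rank to neighbours in proportion to edge weight
                new_rank[dst] += damping * rank[src] * w / total
        rank = new_rank
    return rank


# Hypothetical absolute correlations between process variables.
correlations = {
    ("thickness", "height"): 0.8,
    ("thickness", "wet"): 0.6,
    ("thickness", "oxide"): 0.5,
    ("height", "wet"): 0.2,
    ("wet", "oxide"): 0.1,
}

ranks = pagerank(correlations)
for name, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {score:.3f}")
```

In this toy graph the variable with the strongest correlations to the others receives the highest score, which is the kind of weighting the slide alludes to.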

  11. Yield versus Endurance

  12. High Yield

  13. Predicting Retention Cycles

  14. Failure Classification & Clustering: complex error recovery, tail failures

  15. Finding Nemo! Correlation and value (non-deterministic) • Permutation and combination of every known and unknown correlation • [Diagram labels: Manufacturing Data, Test, Screen; Prep, Measure, Clean, Height, Film, Tool Config, Oxide, ALO, Cover, Wet, Thickness; Par 1 to Par 5; Screen 1 to Screen 5]

  16. Caught up in Machine learning – A simple example • Problem statement • Simple enough – Predicting employee performance • Where is the data? • What data do we have?

  17. Performance Analysis

  18. Performance Variables

  19. The Weight of Important Variables

  20. Key Findings Match Bias • Bias in the data will yield biased results • Calibration is inherently biased • Line of business data likely to provide more reliable results (common example: sales target vs. actual sales) • Bias in company/management policies will yield biased results

  21. Game Changing with DATA! – New collection of Data • Unbiased data collection – getting away from bias. • Raw data – annotations, lineage • Unfitted for existing tools • ETL – traditional versus new approaches to data collection methods. • Machine learning mirrors human biases with extracted data. • Data loss • Getting to the “Holy Grail” faster. • No ETL • Evolving schemas - Avro
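
To illustrate what evolving schemas mean in practice, here is a hand-rolled sketch of one Avro schema-resolution rule: a field present in the reader's schema but absent from the written record is filled from the field's default, and fields the reader does not know are ignored. This is not the Avro library API; `resolve`, the schemas, and the field names are hypothetical.

```python
def resolve(record, reader_schema):
    """Project a decoded record onto the reader's schema.

    Mirrors the Avro rule: missing fields take the reader's default,
    unknown fields are dropped, and a missing field with no default is an error.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return resolved


# Version 1 of a hypothetical measurement schema.
schema_v1 = {
    "type": "record",
    "name": "Measurement",
    "fields": [
        {"name": "lot_id", "type": "string"},
        {"name": "thickness", "type": "double"},
    ],
}

# Version 2 adds an optional field with a default, so old data stays readable.
schema_v2 = {
    "type": "record",
    "name": "Measurement",
    "fields": schema_v1["fields"]
    + [{"name": "tool_config", "type": ["null", "string"], "default": None}],
}

old_record = {"lot_id": "A1", "thickness": 1.25}
new_record = {"lot_id": "B7", "thickness": 1.31, "tool_config": "cfg-3"}

upgraded = resolve(old_record, schema_v2)    # old data, new reader
downgraded = resolve(new_record, schema_v1)  # new data, old reader
print(upgraded)
print(downgraded)
```

Because the new field carries a default, raw data collected under the older schema can be read without a separate transformation step, which fits the slide's "No ETL" point.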

  22. Developing Intimacy with Data with Domain Experts • Asking the right questions • Critical thinking • Observing the signals • Systemic patterns

  23. Leading the industry (Running faster than competitors) • Embracing complexity and change • Creating endless possibilities • Key takeaway: leading-edge data from the Industrial Internet (machine data) is heteroskedastic (high variability)

  24. Q/A
