Data Mining Manufacturing Data

Data Mining Manufacturing Data. Dave E. Stevens, Eastman Chemical Company, Kingsport, TN. Presentation outline: Intro: Data Mining Manufacturing Data; Data Preparation; Principal Component Analysis; Partial Least Squares; PLS Discriminant Analysis.

Presentation Transcript


  1. Data Mining Manufacturing Data. Dave E. Stevens, Eastman Chemical Company, Kingsport, TN

  2. Presentation Outline • Intro: Data Mining Manufacturing Data • Data Preparation • Principal Component Analysis • Partial Least Squares • PLS Discriminant Analysis

  3. Manufacturing Data Then and Now • 40 years ago: few measurements (temperature, pressure, flows) • Today: many measurements, taken very often, creating large data sets • Purposes for measuring: process “state”; relationships (X to X, X to Y); classification; optimization

  4. Concerns With Current Manufacturing Data • Dimensionality (large): more than 1000 process variables every few seconds and more than 10 quality variables every few hours. The result is data overload: the analyst concentrates on only a few variables and ignores most of the information! • Collinearity: there are not 1000 independent things at work. Only a few underlying events affect all the variables, so the variables are highly correlated. • Noise • Missing Data
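The collinearity point can be illustrated with a small simulation. This is a sketch assuming NumPy and entirely made-up data: three hidden "events" drive 50 measured variables. The eigenvalues of the correlation matrix then show that almost all of the variation lies in just three directions, even though there are 50 variables.

```python
import numpy as np

# Simulated (hypothetical) plant: a few hidden events drive many measured variables.
rng = np.random.default_rng(0)
n_obs, n_vars, n_events = 500, 50, 3

events = rng.normal(size=(n_obs, n_events))      # the few underlying events
mixing = rng.normal(size=(n_events, n_vars))     # how each event shows up in each variable
X = events @ mixing + 0.1 * rng.normal(size=(n_obs, n_vars))  # add measurement noise

# Eigenvalues of the correlation matrix, sorted largest first
eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
explained = eigvals / eigvals.sum()

print(f"Share of variation in the first {n_events} directions: {explained[:n_events].sum():.3f}")
print(f"Share in the 4th direction: {explained[n_events]:.4f}")
```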

  5. Multivariate Data Concept [Figure: BreakLoad control chart and Elongation control chart, with the question "Is This Process In Control?"]
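The slide's question can be made concrete with the Mahalanobis distance, which is the quantity behind Hotelling's T² chart. This technique is swapped in for illustration and is not named on the slide. The sketch assumes NumPy and uses simulated, standardized data; the variable names only echo the slide's charts, and the positive correlation between them is an assumption. The new point sits inside both ±3σ univariate limits, but it is far outside the joint distribution because it breaks the correlation.

```python
import numpy as np

rng = np.random.default_rng(1)
# Simulated in-control reference data for two strongly correlated quality variables
# (named after the slide's charts; the numbers are made up and standardized).
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
ref = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)

mean = ref.mean(axis=0)
std = ref.std(axis=0, ddof=1)
inv_cov = np.linalg.inv(np.cov(ref, rowvar=False))

def inside_univariate_limits(x, k=3.0):
    """True if every variable is within its own mean +/- k*sigma control limits."""
    return bool(np.all(np.abs(x - mean) <= k * std))

def mahalanobis_sq(x):
    """Squared Mahalanobis distance of x from the reference data."""
    d = x - mean
    return float(d @ inv_cov @ d)

# High BreakLoad with low Elongation: each value alone looks normal,
# but the pair contradicts the (assumed) positive correlation.
new_point = np.array([2.0, -2.0])
print("Inside both univariate limits:", inside_univariate_limits(new_point))
print("Squared Mahalanobis distance:", round(mahalanobis_sq(new_point), 1))
```

For two variables, the squared distance approximately follows a chi-square distribution with 2 degrees of freedom. That distribution's 99.73% point, which is the usual 3σ coverage, is about 11.8, and the point above lands far beyond it.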

  6. Data Preparation • Data collected in a process data historian include both process up times and down times, as recorded from the instrumentation • A software tool is needed that makes it easy to clean the data and perform initial exploratory data analysis • JMP Software • Interactive graphing • Removal of outliers, graphically or by variable selection criteria • Joining and/or subsetting data tables • Statistical analyses
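Here is a minimal sketch of the cleaning step, assuming NumPy and a purely numeric data table. Rows where a hypothetical "running" indicator (for example, a feed flow) falls below a threshold are treated as down time. Among the remaining rows, any row with an extreme robust z-score (based on the median and MAD) is treated as an outlier. The function name, threshold and z-limit are illustrative choices, not JMP features.

```python
import numpy as np

def clean_historian_data(X, running_col, min_running, z_limit=4.0):
    """Hypothetical cleaning step for historian data.

    X           : 2-D array, rows = time stamps, columns = process variables
    running_col : index of a column that indicates the process is up (e.g. a feed flow)
    min_running : rows with running_col below this value are treated as down time
    z_limit     : rows with any robust z-score above this are treated as outliers
    """
    X_up = X[X[:, running_col] >= min_running]      # drop down-time rows

    # Robust z-scores: median and MAD are not inflated by the outliers themselves
    med = np.median(X_up, axis=0)
    mad = 1.4826 * np.median(np.abs(X_up - med), axis=0)
    mad[mad == 0] = 1.0                               # guard against constant columns
    z = np.abs(X_up - med) / mad
    return X_up[np.all(z <= z_limit, axis=1)]

# Simulated example: 200 rows, 2 variables
rng = np.random.default_rng(2)
data = rng.normal(loc=[100.0, 50.0], scale=[2.0, 1.0], size=(200, 2))
data[:20, 0] = 0.0      # first 20 rows: process down (feed flow at zero)
data[100, 1] = 80.0     # one spike in the second variable
clean = clean_historian_data(data, running_col=0, min_running=10.0)
print(data.shape, "->", clean.shape)
```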

  7. Principal Components Analysis: Understanding Relationships Between Process Variables

  8. Principal Component Analysis • Principal component analysis is a projection technique • Raw data are first “centered” and “scaled” • Each principal component represents a direction through the data that captures the maximum amount of the remaining raw data variation, orthogonal to the previous components • For each principal component $a$, a new value (score) is generated for each observation $i$ as a linear combination of the $K$ centered and scaled X variables ($k = 1, \dots, K$): $t_{i,a} = b_{a,1} x_{i,1} + b_{a,2} x_{i,2} + \cdots + b_{a,K} x_{i,K}$, where the $b$'s are the loadings (between -1 and 1) • Each successive principal component represents less of the raw data variation
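The centering, scaling and projection steps can be sketched in NumPy using the singular value decomposition. This is one standard way to compute PCA and is assumed here for illustration. When no data are missing, the iterative NIPALS algorithm gives essentially the same components, up to sign. The loadings returned in `P` play the role of the slide's $b_{a,k}$.

```python
import numpy as np

def pca(X, n_components):
    """PCA on centered and scaled data via the SVD.

    Returns scores T (n_obs x A), loadings P (n_vars x A) and the
    fraction of total variance explained by each component.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # center and scale
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    P = Vt[:n_components].T          # loadings: unit-length direction of each PC
    T = Xs @ P                       # scores: t_{i,a} = sum_k b_{a,k} x_{i,k}
    explained = (s**2 / np.sum(s**2))[:n_components]
    return T, P, explained

# Simulated data: 4 correlated variables plus 1 unrelated variable
rng = np.random.default_rng(3)
base = rng.normal(size=(100, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(100, 1)) for _ in range(4)]
              + [rng.normal(size=(100, 1))])
T, P, explained = pca(X, n_components=2)
print("variance explained:", np.round(explained, 3))
```

Each loading vector has unit length, so every loading lies between -1 and 1. This matches the range given on the slide.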

  9. Principal Component Analysis Fundamentals [Figure: observations in (X1, X2, X3) space projected onto the 1st and 2nd PC lines]

  10. PCA: Scores [Figure: observation i in (x1, x2, x3) space projected onto the 1st and 2nd PCs at $t_{i,1}$ and $t_{i,2}$] The scores $t_{i,a}$ (observation $i$, dimension $a$) are the positions along the component lines where the observations are projected.
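Geometrically, a score is the dot product of a centered and scaled observation with the unit-length PC direction. The numbers below are illustrative and do not come from the example data. The final check confirms that the part of the observation left over after projection is perpendicular to the component line.

```python
import numpy as np

# One centered-and-scaled observation in (x1, x2, x3) space (illustrative numbers)
x_i = np.array([1.2, -0.4, 0.8])

# A unit-length PC direction (illustrative; in practice it comes from the PCA fit)
p1 = np.array([0.6, 0.0, 0.8])

t_i1 = x_i @ p1                 # score: position of the projection along the PC line
projection = t_i1 * p1          # the point on the PC line closest to x_i
residual = x_i - projection     # what the PC line does not explain

print("score t_i1 =", round(float(t_i1), 3))
print("residual orthogonal to the PC:", bool(np.isclose(residual @ p1, 0.0)))
```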

  11. PCA: Loadings [Figure: the 1st PC line in (x1, x2, x3) space, with angles α1, α2, α3 to the coordinate axes] The loadings $p_{a,k}$ (dimension $a$, variable $k$) indicate the importance of variable $k$ to the given dimension. $p_{a,k}$ is the direction cosine ($\cos\alpha_k$) of the given component line relative to the $x_k$ coordinate axis.
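The direction-cosine interpretation can be checked numerically. This sketch assumes NumPy and simulated 3-variable data; the data are only centered, which keeps the geometry simple. The cosine of the angle between the first PC line and each coordinate axis equals the corresponding loading, and the squared cosines sum to one.

```python
import numpy as np

rng = np.random.default_rng(4)
# Simulated, correlated 3-variable data (illustrative only)
mix = np.array([[3.0, 1.0, 0.5],
                [0.0, 1.0, 0.2],
                [0.0, 0.0, 0.3]])
X = rng.normal(size=(200, 3)) @ mix
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
p1 = Vt[0]                                   # loadings of the 1st PC (unit length)

axes = np.eye(3)                             # unit vectors along x1, x2, x3
cosines = axes @ p1 / np.linalg.norm(p1)     # cos(alpha_k) between PC line and axis k
angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))

for k, (c, ang) in enumerate(zip(cosines, angles), start=1):
    print(f"x{k}: loading {p1[k - 1]:+.3f}, cos(alpha) {c:+.3f}, angle {ang:.1f} deg")
print("sum of squared direction cosines:", round(float(np.sum(cosines**2)), 6))
```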

  12. PCA Example • 10 process responses were obtained on each observation • Data were weekly averages of the process responses • Data spanned 10 months • Objective: determine whether the system was stable.

  13. PCA Score Plot [Figure: PC #1 vs. PC #2 scores, with a process shift marked at June 30 (5_30)]

  14. PCA Loadings Plot [Figure: PC #1 loadings vs. PC #2 loadings for variables X1 through X10]

  15. [Figure: PC #1 vs. PC #2 score plot with the process shift at June 30 (5_30)] Relative to the process shift, X1 and X5 were high in value and X4 and X8 were low in value. Positively correlated variables: X1 with X5, and X4 with X8. Negatively correlated variables: X1 and X5 versus X4 and X8.
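The loadings plot is read as follows: variables on the same side of the origin are positively correlated, and variables on opposite sides are negatively correlated. This holds for variables that are well described by the plotted components. The reading can be checked on simulated data with a hypothetical NumPy sketch. A single underlying shift pushes X1 and X5 up and X4 and X8 down. The PC #1 loadings then split into two sign groups that agree with the correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
shift = rng.normal(size=n)                  # one hypothetical underlying event
signs = np.array([1.0, 1.0, -1.0, -1.0])    # X1, X5 rise with it; X4, X8 fall
names = ["X1", "X5", "X4", "X8"]
X = np.outer(shift, signs) + 0.3 * rng.normal(size=(n, 4))

Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # center and scale
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
p1 = Vt[0]                                           # PC #1 loadings

R = np.corrcoef(X, rowvar=False)
for name, loading in zip(names, p1):
    print(f"{name}: PC #1 loading {loading:+.2f}")
print("corr(X1, X5) =", round(R[0, 1], 2))
print("corr(X1, X4) =", round(R[0, 2], 2))
```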

  16. Process variable X1 increased in value when the system shifted from the left side to the right side on the PCA Score plot

  17. Variables X1 and X5 were positively correlated

  18. Partial Least Squares Technique: Understanding Relationships Between Process and Response Variables

  19. Partial Least Squares Fundamentals [Figure: X space (X1, X2, X3) and Y space (Y1, Y2, Y3), with observations projected onto planes in each space]
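Here is a compact sketch of PLS regression for a single response, using the NIPALS algorithm in NumPy. It is a simplified illustration, not SIMCA-P's implementation. The slide's picture has several Y variables (PLS2), while this sketch handles one centered response (PLS1). Each component finds the X direction with maximum covariance with y. It then removes ("deflates") what that component explains from both X and y.

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Single-response PLS regression (PLS1) with the NIPALS algorithm.

    X (n_obs x n_vars) and y (n_obs,) must already be centered (and usually scaled).
    Returns X weights W, X loadings P, y loadings q, X scores T and
    regression coefficients B such that y_hat = X @ B (in centered units).
    """
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    n_obs, n_vars = X.shape
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    T = np.zeros((n_obs, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)        # X direction with maximum covariance with y
        t = X @ w                     # scores: projections onto that direction
        tt = t @ t
        P[:, a] = X.T @ t / tt        # X loadings
        q[a] = (y @ t) / tt           # how strongly y follows the scores
        X -= np.outer(t, P[:, a])     # remove what this component explains from X
        y -= q[a] * t                 # and likewise from y
        W[:, a], T[:, a] = w, t
    B = W @ np.linalg.solve(P.T @ W, q)
    return W, P, q, T, B

# Hypothetical data: 4 process variables, 1 response
rng = np.random.default_rng(6)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.5, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=60)
Xc, yc = X - X.mean(axis=0), y - y.mean()

W, P, q, T, B = pls1_nipals(Xc, yc, n_components=2)
print("regression coefficients:", np.round(B, 2))
```

With as many components as there are linearly independent X variables, PLS1 reproduces ordinary least squares. Using fewer components is what keeps PLS stable when the Xs are collinear.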

  20. TA Filter Example • Objective: relate filtrate, TA catalyst and dryer temperature to filter speed, vacuum, wash acid, weir level, Nash discharge pressure and feed tank temperature • Goal: keep filtrate high and TA catalyst low • Data: 12-hour averages from the PI data historian, collected over a four-month period
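High-frequency historian data can be reduced to 12-hour averages by taking non-overlapping block means. The sketch below assumes one sample per minute and a rows-are-time-stamps array layout, both for illustration only. It also assumes that missing readings are stored as NaN, and it ignores them within each block.

```python
import numpy as np

def block_average(values, samples_per_block):
    """Average consecutive samples into non-overlapping blocks (e.g. 12-hour windows).

    values: 1-D array (one tag) or 2-D array (rows = time stamps, columns = tags).
    Trailing samples that do not fill a whole block are dropped.
    NaNs (assumed here to mark missing readings) are ignored within each block.
    """
    values = np.asarray(values, dtype=float)
    n_blocks = len(values) // samples_per_block
    trimmed = values[: n_blocks * samples_per_block]
    blocks = trimmed.reshape(n_blocks, samples_per_block, *values.shape[1:])
    return np.nanmean(blocks, axis=1)

# Hypothetical example: one sample per minute for 4 days, 3 tags
samples_per_12h = 12 * 60
rng = np.random.default_rng(10)
raw = rng.normal(loc=[80.0, 1.2, 150.0], scale=[2.0, 0.05, 3.0], size=(4 * 24 * 60, 3))
raw[100:130, 1] = np.nan          # a short gap in the second tag
avg12 = block_average(raw, samples_per_12h)
print(avg12.shape)
```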

  21. TA Filter

  22. TA Filter Relationships [Figure: PLS weight plot] Higher filter speed and vacuum pressure increased filtrate flow and catalyst content but lowered the dryer temperature. Higher weir level, Nash discharge pressure and Op tank temperature increased filtrate flow. Wash acid flow had no driving effect on the responses.

  23. PLS Results • Obtain weight plots (previous slide), which show the inter-relationships between the Xs and Ys • Obtain regression coefficients, which can be used to generate response surface plots • Display Variable Importance in the Projection (VIP) • Display residual plots and distance-to-model plots
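VIP scores can be computed from a fitted PLS model. One common definition for a single response is $\mathrm{VIP}_k = \sqrt{K \sum_a SS_a (w_{k,a}/\lVert w_a \rVert)^2 / \sum_a SS_a}$, where $SS_a$ is the y variation explained by component $a$ and $K$ is the number of X variables. The sketch below assumes NumPy, simulated data and a compact NIPALS PLS1. It is not the exact SIMCA-P computation, which also handles multiple Ys. By construction, the squared VIP values average to one. This is why "VIP > 1" is often used as a rule of thumb for an important variable.

```python
import numpy as np

def pls1_weights_scores(X, y, n_components):
    """Compact NIPALS PLS1 for centered X and y: returns weights W, scores T, y loadings q."""
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    W, T, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        q_a = (y @ t) / tt
        X -= np.outer(t, X.T @ t / tt)   # deflate X by its loading on t
        y -= q_a * t                     # deflate y
        W.append(w)
        T.append(t)
        q.append(q_a)
    return np.column_stack(W), np.column_stack(T), np.array(q)

def vip(W, T, q):
    """Variable Importance in the Projection for a single-response PLS model."""
    n_vars = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)             # y variation explained by each component
    w_unit = W / np.linalg.norm(W, axis=0)       # unit-length weight vectors
    return np.sqrt(n_vars * (w_unit**2 @ ss) / ss.sum())

# Hypothetical data: y depends on x1 and x2 only
rng = np.random.default_rng(7)
X = rng.normal(size=(80, 5))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.2 * rng.normal(size=80)
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
yc = y - y.mean()

W, T, q = pls1_weights_scores(Xs, yc, n_components=2)
scores = vip(W, T, q)
for k, v in enumerate(scores, start=1):
    print(f"x{k}: VIP = {v:.2f}")
```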

  24. Correlation Does Not Always Mean Causation
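A small simulation shows how a strong correlation can arise without causation. The sketch assumes NumPy, and the data and the "ambient" driver are hypothetical. A hidden common cause drives both x and y, while y does not depend on x at all. Regressing the common cause out of both variables (a partial correlation, used here only as an illustration) makes the apparent relationship disappear.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
ambient = rng.normal(size=n)                   # hidden common cause (hypothetical)
x = ambient + 0.2 * rng.normal(size=n)         # process variable driven by ambient
y = ambient + 0.2 * rng.normal(size=n)         # response driven by ambient, NOT by x

print("corr(x, y) =", round(np.corrcoef(x, y)[0, 1], 2))

# Remove the common cause from both variables, then correlate what is left
rx = x - np.polyval(np.polyfit(ambient, x, 1), ambient)
ry = y - np.polyval(np.polyfit(ambient, y, 1), ambient)
print("corr after removing ambient =", round(np.corrcoef(rx, ry)[0, 1], 2))
```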

  25. PLS Discriminant Technique: Determine What Drives Data Groups to Be Different

  26. Objective • Given groups of data from a particular process, determine what makes the groups different with respect to the given measurements. • Example: TA %T • Measurements: 4-HMB, TMA, TPAD, 4-HBA, 4-CBA, IPA, BA, PTAD, p-TA, 2,7-DCF, 2,6-DCF, 4-4-DCB, 3,5-DCF, 9-F-2-CA, 9-F-4-CA, 2,6-DCA, 4,4-DCS, L*, a*, b*, .1%, .9%, Mean, %T • Daily Numbers • Data taken from Convey Line #1 and #2
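Two-class PLS discriminant analysis is commonly done by coding group membership as a 0/1 dummy response and fitting an ordinary PLS regression. The sketch below works under those assumptions, using NumPy and simulated data with 6 hypothetical measurements. The 0.5 cut-off and the group coding (1 = high %T, 2 = low %T) are illustrative choices. The sign of each regression coefficient indicates which group a high value of that measurement points toward.

```python
import numpy as np

def pls1_coefficients(X, y, n_components):
    """Compact NIPALS PLS1 returning regression coefficients for centered X and y."""
    X = np.asarray(X, dtype=float).copy()
    y = np.asarray(y, dtype=float).copy()
    n_vars = X.shape[1]
    W = np.zeros((n_vars, n_components))
    P = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        tt = t @ t
        P[:, a] = X.T @ t / tt
        q[a] = (y @ t) / tt
        X -= np.outer(t, P[:, a])
        y -= q[a] * t
        W[:, a] = w
    return W @ np.linalg.solve(P.T @ W, q)

def plsda_fit(X, labels, n_components=2):
    """Two-class PLS-DA: regress a 0/1 indicator of group 1 on the scaled measurements."""
    mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
    y = (labels == 1).astype(float)
    y_mean = y.mean()
    B = pls1_coefficients((X - mean) / std, y - y_mean, n_components)
    return mean, std, y_mean, B

def plsda_predict(model, X):
    """Assign group 1 if the predicted indicator is at least 0.5, else group 2."""
    mean, std, y_mean, B = model
    y_hat = ((X - mean) / std) @ B + y_mean
    return np.where(y_hat >= 0.5, 1, 2)

# Simulated groups: group 1 is higher in measurements 1-2, group 2 in measurement 5
rng = np.random.default_rng(9)
n_per_group = 40
group1 = rng.normal(size=(n_per_group, 6)) + np.array([1.5, 1.0, 0, 0, 0, 0])
group2 = rng.normal(size=(n_per_group, 6)) + np.array([0, 0, 0, 0, 1.5, 0])
X = np.vstack([group1, group2])
labels = np.array([1] * n_per_group + [2] * n_per_group)

model = plsda_fit(X, labels)
pred = plsda_predict(model, X)
print("training accuracy:", np.mean(pred == labels))
print("coefficients:", np.round(model[3], 2))
```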

  27. TA %T

  28. PLS Discriminant Analysis [Figure: High %T and Low %T groups]

  29. What Measurements Separated the Groups? The high %T group ($DA1) was high in %T, .1%, Mean and L*. The low %T group ($DA2) had several measurements that were high in value and positively correlated (see next slide for details).

  30. The low %T group ($DA2) had several variables that were correlated and high in value: 4,4’-DCS, 4-CBA, TMA and p-TA.

  31. Cat

  32. Computer Software • JMP Software (http://www.jmpdiscovery.com) • SIMCA-P from Umetrics (http://www.umetrics.com)
