## Making the Most of Process Information via Multiscale and Bayesian Methods


**Making the Most of Process Information via Multiscale and Bayesian Methods**

Bhavik R. Bakshi, Department of Chemical Engineering, Ohio State University, Columbus, OH 43210. CPACT Conference, Edinburgh, April 25-26, 2002.

**Overview of Research Group**

- Goal: Develop tools and techniques for efficient and sustainable process engineering
- Projects focus on process and global scales
  - Process scale
    - Multiscale and Bayesian methods for extracting knowledge from process data
  - Global scale
    - Economically and ecologically conscious process engineering
- Develop rigorous and systematic methods and explore their applications

**Motivation for Multiscale and Bayesian Methods**

- Processes and data are usually multiscale in nature
  - Events and features at multiple scales
  - Multirate measurements
  - Autocorrelated stochastic processes
- Variety of process knowledge and information available
  - Measured data
  - Fundamental, empirical or heuristic knowledge
- Single-scale and non-Bayesian methods lead to
  - Inferior analysis and modeling
  - Inefficient computation and use of available information
  - Disintegrated operation
- Multiscale and Bayesian methods can perform better

**Multiphase Flow**

- Flow regimes in fluidized bed
- Partial models and data are available for each regime

[Figure: intensity versus time for Homogeneous Flow, Heterogeneous Flow, and Slug Flow]

**Sheet and Film Manufacturing**

- Different sampling interval in each channel
- Dynamic models are also available

[Figure: sheet with sensor direction and machine direction marked]

**Chemical Process Operation**

[Figure: operation hierarchy with levels Planning, Scheduling, Supervisory Control, Monitoring and Diagnosis, Regulatory Control, Data Acquisition, Process]

- Efficient operation requires reasoning at different scales
- Process data and knowledge are available

**Objectives**

- Develop methods for efficient process operation that can exploit
  - Multiscale nature of processes
  - All available process data and knowledge
- Focus
on the following tasks
  - Process Monitoring
  - Fault Diagnosis
  - Empirical Modeling
  - Data Rectification and Estimation
  - Analysis of complex chemical and biological systems
- Integrate process operation tasks

**Outline**

- Introduction to
  - Bayesian methods
  - Wavelet analysis
- General Approach for Multiscale Methods
- Fault Detection and Diagnosis
  - MSPCA, MSART
- Empirical Modeling
  - Bayesian PCA, Bayesian Latent Variable Regression
- Dynamic Data Rectification
  - Linear systems with and without accurate models
  - Nonlinear systems
- Approaches are general and broadly applicable to a variety of modeling and analysis tasks

**Bayesian Estimation**

[Figure: prior knowledge, P(H) (current belief), is combined with information from data, P(D|H) (new information), to give the posterior, P(H|D) (new belief); a loss function selects a sample from the posterior as the Bayesian estimate, Ĥ. Portrait: Rev. Thomas Bayes, 1702-1761]

- Statistical framework for combining prior knowledge with empirical observations
- Posterior becomes prior at next time
- Bayes Rule: $P(H \mid D) = \dfrac{P(D \mid H)\,P(H)}{P(D)}$

**Illustration of Bayesian Estimation**

- $P(H \mid D) \to 1$ as $t \to \infty$

[Figure: timeline t = 1, t = 2, t = 3, ... showing the Prior, then a Posterior/Prior at each step, each updated with new Data]

- A newly born baby sees the sun setting and wonders, “Will it be back?” (Malakoff, 1999)
- Prior knowledge: sun may or may not rise, P(H) = 0.5
- Data obtained every day = Sun rises
- Posterior at t=k becomes prior at t=k+1

**Challenges in Bayesian Analysis**

- Need distributions for prior and likelihood
  - Bad prior can give slow convergence and misleading answer
  - Gaussian densities are mathematically convenient but may not represent reality
- Can be computationally expensive, particularly for non-Gaussian densities
- Potential solutions
  - Use Empirical Bayes methods: estimate prior from measured data
  - Combine Bayesian analysis with Multiscale analysis
  - Markov Chain Monte Carlo methods

**Multiscale Nature of Variables**

[Figure: time-frequency (t, ω) map locating equipment degradation, sensor failure, noise, disturbance, and equipment failure; below it, a process signal plotted against time from 0 to 100]

- Delta functions
- Fourier Transform
- Linear Filters
- Wavelet Transform

**Wavelets**

[Figure: Haar wavelet and Haar scaling function at m=1, k=0 and m=2, k=4; Daubechies-6 scaling function φ(x) and Daubechies-6 wavelet ψ(x), each plotted against x]

- Family of basis functions of fixed shape
- Translations and dilations of mother wavelet: $\psi_{mk}(x) = 2^{-m/2}\,\psi(2^{-m}x - k)$, where m, k are integers

**Wavelet Decomposition**

[Figure: filter bank. The original signal (m=0) passes through filters H and G at m=1 and again at m=2, giving the scaled signals, $y_m$, and the wavelet transform/detail signals, $d_m$; axes are t and ω]

**Properties of Wavelets**

- Represents signals and functions as $y(t) = \sum_{m}\sum_{k} d_{mk}\,\psi_{mk}(t) + \sum_{k} y_{Lk}\,\phi_{Lk}(t)$
- Localized in time and frequency
  - Deterministic features are captured by few large coefficients
- Approximate eigenfunctions
  - Stochastic processes are approximately decorrelated
- Can be orthonormal
- Fast computation, O(N)
- Extended to libraries of basis functions
  - Wavelet packets, cosine packets, etc.

**Multiscale Feature Extraction**

Figure panels: Original Signal; Wavelet Coef. m=1; Wavelet Coef.
m=2; Wavelet Coef. m=3; Scaled Coef. m=3; Threshold & Reconstruct

**Analysis of Stochastic Processes**

- Wavelet coefficients are approximately uncorrelated and Gaussian

[Figure: ARIMA example with ACF and PDF panels for the original signal, y0; the wavelet coefficients, d1 and d2; and the last scaled signal, y2]

**Process Operation Tasks**

- Process Monitoring / Fault Detection
  - Detect abnormal operation from measured data
- Empirical Modeling
  - Determine relationship between variables based on measured data
- Data Rectification
  - Clean measured data by removing errors and satisfying process models

**General Multiscale Methodology**

[Figure: the data X are decomposed by the wavelet transform W; the method operates on $H_L X$ and $G_L X$ at the coarse end, through $G_m X$, down to $G_1 X$ at the fine end; the inverse transform $W^T$ returns $\hat{X}$ and $\hat{\theta}$]

- Convert traditional to multiscale methods (Bakshi, 1999)
- Can use models at each scale and across scales

**Multiscale Statistical Process Control** (Bakshi, 1998; Aradhye et al., 2000a, b)

- SPC detects abnormal behavior from measured data
  - Lacks generality, best for certain types of changes
    - Shewhart charts for large shifts
    - CUSUM, EWMA for small shifts
  - Assumes uncorrelated measurements
- Multivariate SPC reduces dimensionality by linear or nonlinear modeling
- Normal and abnormal behavior usually occur at different scales
- MSSPC should perform better

**Detecting Mean Shift by MSSPC**

[Figure: signal with a mean shift, its wavelet decomposition (W) at each scale, and the reconstruction ($W^T$), all plotted against time]

- Uncorrelated data with mean shift of 2σ
- First shift detection at scale m=2
- Current shift detection in last scaled signal

**Example of Univariate MSSPC**

[Figure: SPC chart and MSSPC chart side by side]

- Mean shift of size 5 in iid Gaussian measurements
- MSSPC detection limits adapt to signal features

**General Framework for SPC**

- Existing SPC filters operate at different fixed scales
- MSSPC subsumes existing methods

[Figure: filters for CUSUM, Shewhart, MA, EWMA, Haar, and Daubechies-4 (boundary corrected)]

**Library of MSSPC Filters**

[Figure labels: Moving Avg., CUSUM, Moving Avg., Shewhart]

**Multivariate SPC**

[Figure: scatter of normal (*) and abnormal (+) data in the (x1, x2) plane with principal component axes PC1 and PC2]

- Univariate charts are inconvenient for multivariate tasks
- Multivariate modeling reduces dimensionality
  - Linear modeling (PCA, PLS)
  - Nonlinear (clustering, NLPCA)
- Detect changes in transformed space

**Clustering with ART**

- Features of Adaptive Resonance Theory (ART)
  - Adaptive clustering
  - Inspired by neural networks (Carpenter and Grossberg)
  - Useful for change detection and diagnosis

[Figure: typical process data in the (X1, X2) plane, with a cluster of normal data (*) and a cluster for a known operational event (+)]

**MSSPC - Industrial Validation**

- Case Studies
  - Change in Furnace Feed
  - Valve Leak Malfunction
  - Cold Weather Malfunction
  - Feed Malfunction
- Event start and end determined with operator input
  - Cannot perform ARL analysis
- Plot “Missed Alarm Rate” versus “False Alarm Rate” for different detection parameters
  - Better method has smaller missed alarm rate for same number of false alarms

**Data - Valve Leak Malfunction**

- Three redundant sensors

**Performance - Valve Leak**

- Multiscale methods do better

[Figure: Missed Alarm Rate versus False Alarm Rate for ART, PCA, MSART, and MSPCA]

**MSART vs.
Operator - Valve Leak**

- MSART detects leak ~200 minutes before operator

[Figure: Normal/Abnormal status versus time step (minutes), with the operator's detection marked]

**Data - Cold Weather Event**

- Valve failure due to low ambient temperature
- Single measured variable

**Performance - Cold Weather Event**

- Approximately stationary and Gaussian data
- MSPCA does best

[Figure: Missed Alarm Rate versus False Alarm Rate for ART, MSART, PCA, and MSPCA]

**MSSPC - Summary**

- MSSPC provides better average performance for a variety of types and magnitudes of faults
  - Recommended when nature of features representing process change is unknown
  - If type of feature to be detected is known a priori, better to use traditional methods
- Extension to reduce user-defined parameters, and to a bigger library of basis functions, is in progress
- Bayesian MSSPC can do better, but requires probability of faults

**Linear Regression**

- All methods determine a model of the form $Y = Zb$
- Inputs, Z, may be combined to form latent variables, T, in reduced dimension space (PCA, PLS): $T = ZP$
- Latent Variable Regression (LVR) model: $Y = ZPb$
- Ideal method
  - Handles collinear variables
  - Accounts for errors in both input and output variables
  - Integrates regression and filtering
  - Incorporates external information and multiscale behavior

**Bayesian PCA and LVR**

- Maximize posterior: $P(T, P, r, b \mid Z, Y) \propto P(Z, Y \mid T, P, b, r)\,P(T, P, r, b)$
- Approach
  - Solve conventional regression problem
  - Estimate prior from conventional solution
  - Solve Bayesian regression problem by iterating between
    - Rectification to estimate T, P
    - Parameter estimation to obtain b
- Assumptions
  - Noise and underlying measurements are Gaussian
  - Regression parameters are Gaussian
  - Rank is known

**BPCA - Example**

- Three correlated variables: $u_3 = u_1 + u_2$; $u_1 \sim N(3, 1)$; $u_2 \sim N(1, 4)$
- Measurements corrupted by additive Gaussian noise: $Z = U + e$
- MSE for 100 realizations
- Smaller coeffs. MSE for higher dimensional problems

| Method | Prior | Inputs | Coeffs. |
|--------|-----------|--------|---------|
| PCA | uniform | 2.72 | 0.187 |
| MLPCA | uniform | 2.11 | 0.093 |
| BPCA | empirical | 1.40 | 0.092 |
| BPCA | exact | 1.22 | 0.000 |

**BLVR - Example**

- Three correlated variables: $u_3 = u_1 + u_2$; $u_1 \sim N(3, 2)$; $u_2 \sim N(1, 4)$
- Noise-free output: $x = 0.8u_1 + 0.8u_2$
- Measurements corrupted by additive Gaussian noise: $y = x + e_x$; $Z = U + e_u$
- MSE for 100 realizations

| Method | Prior | Inputs | Outputs | Coeffs. |
|--------|-----------|--------|---------|---------|
| OLS | uniform | 1.32 | 0.66 | 0.010 |
| PLS | uniform | 1.18 | 0.71 | 0.012 |
| BLVR | empirical | 0.69 | 0.60 | 0.007 |
| BLVR | exact | 0.66 | 0.55 | 0.000 |

**Bayesian Regression - Summary**

- Bayesian approach can improve PCA and LVR without additional data
- Can deal with
  - Errors in all variables
  - Correlated variables
  - External information
- Prior knowledge may be obtained from
  - Data being modeled, via empirical Bayes approach
  - Historical data
- Many opportunities for further work

**Data Rectification and Estimation**

- Estimate measured variables and unknown quantities
- Bayesian problem formulation: given $y_{1:k} = \{y_1, y_2, \ldots, y_k\}$, maximize $P(x_k \mid y_{1:k})$ subject to
  - $x_k = f_{k-1}(x_{k-1}, w_{k-1})$ (state equation)
  - $y_k = h_k(x_k, v_k)$ (measurement equation)
  - $g_1(x_k) = 0$ (equality constraint)
  - $g_2(x_k) \ge 0$ (inequality constraint)
- Existing methods rely on many assumptions

**Existing Methods for NDDR** (nonlinear dynamic data rectification)

- Extended Kalman Filtering (Jazwinski, 1970)
  - Assumes fixed Gaussian distributions
  - Uses linearized models
  - Cannot satisfy constraints
- Moving Horizon Estimation (Robertson, Lee and Rawlings, 1996; Rao and Rawlings, 2002)
  - Satisfies constraints
  - Assumes fixed Gaussian distributions
  - Computationally expensive due to non-recursive solution
- Existing methods solve the convenient NDDR problem, not the real one
  - Actual probability distributions are infinite dimensional and change in size and shape

**Evolution of Probability Distributions**

- Evolution of posterior for popular adiabatic CSTR
- Gaussian approximation is even more inaccurate with constraints

**Results of CSTR Example**

- Perfect initial guess
- 100 realizations, 1600 measurements/realization, 500 samples/realization
- Work in progress
- Relevant to model predictive control, Bayesian neural networks, etc.

**Rectification without Accurate Models**

- Most processes are dynamic but lack accurate models
- Wavelet representation captures dynamics in variation of variance across scales: $w(m) \sim N(0, P_{d_m})$; $Ax(m) = H_m$; $Bx(m) = G_m$
- Rectify coefficients at each scale (Bakshi et al., 2001): $\hat{d}_m = K_m\left(C^T R_m^{-1} d_m + P_{d_m}^{-1}\,\mu_{d_m}\right)$
- Features of multiscale approach
  - More accurate than single-scale approaches
  - More computationally efficient since scales with less information can be identified before rectification

**Example**

- Level control process (Bellingham and Lees, 1977):

$$
\begin{bmatrix} h_{k+1} \\ x_{k+1} \end{bmatrix} =
\begin{bmatrix} 0.995 & -0.1373 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} h_k \\ x_k \end{bmatrix} +
\begin{bmatrix} 0.00012 & 0 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} F_{3,k} \\ e_k \end{bmatrix}
$$

- $F_{3,k}$ and $e_k$ are iid Gaussian
- $[h_k \; x_k \; F_{3,k} \; e_k]$ are corrupted by iid Gaussian noise

| Method | Model | MSE |
|--------------------|--------------|------|
| None | None | 1.00 |
| Max. Likelihood | Steady state | 0.67 |
| Single scale Bayes | Steady state | 0.40 |
| Multiscale Bayes | Steady state | 0.06 |
| Single scale Bayes | Dynamic | 0.05 |
| Multiscale Bayes | Dynamic | 0.03 |

**Data Rectification - Summary**

- Existing approaches to nonlinear estimation and rectification require assumptions
  - Gaussian noise, prior
  - Non-time varying distributions
- Assumptions are readily violated
- Proposed approach relies on Monte Carlo sampling
  - More accurate than existing methods
  - Computationally less expensive than MHE
- Many opportunities for further work

**Summary**

- Large amounts of measured data and process knowledge are available
- Existing methods do not make the most of available data and knowledge
  - Processes are multiscale, but methods are single-scale
  - Fundamental models and partial knowledge are underutilized
- Developed new multiscale and Bayesian methods for
  - Fault detection and diagnosis,
  - Dynamic data rectification, and
  - Empirical modeling
- Significant opportunities for future research and applications

**Future Work**

- Nonlinear dynamic data rectification
- Bayesian nonlinear regression/neural networks
- Estimation of multirate systems and missing data
- Integrated rectification, monitoring, diagnosis, and supervision
- Bioinformatics and genomics
- Process scale-up

**Acknowledgments**

- Graduate students and post-docs
  - Prof. Sridhar Ungarala
  - Dr. Hrishikesh Aradhye
- Collaborators
  - Prof. Prem K. Goel
  - Dr. Manabu Kano
- Financial Support
  - National Science Foundation (CTS 9733627)
  - Abnormal Situation Management Consortium
  - Du Pont Education Fund
  - Technical Association of Pulp and Paper Industry
  - American Chemical Society - Petroleum Research Fund
- Dr. Mohamed Nounou
- Mr. Wen-Shiang Chen
- Prof. Xiaotong Shen
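
The sunrise illustration on the "Illustration of Bayesian Estimation" slide can be sketched as a recursive Bayes-rule update. The slides give only the prior, P(H) = 0.5, and the data (the sun rises every day); the likelihood values below are assumptions made for illustration: under H ("the sun always rises") a sunrise has probability 1, and under the alternative it has probability 0.5. The function names are hypothetical.

```python
def bayes_update(prior, lik_h, lik_not_h):
    """Apply Bayes' rule: return P(H | D) from P(H), P(D | H) and P(D | not H)."""
    evidence = lik_h * prior + lik_not_h * (1.0 - prior)  # P(D), by total probability
    return lik_h * prior / evidence


def sunrise_posteriors(days, prior=0.5, lik_h=1.0, lik_not_h=0.5):
    """Posterior P(H | D) after each observed sunrise.

    The posterior at t=k is reused as the prior at t=k+1.
    lik_h and lik_not_h are assumed values, not taken from the slides.
    """
    belief = prior
    history = []
    for _ in range(days):
        belief = bayes_update(belief, lik_h, lik_not_h)
        history.append(belief)
    return history


posteriors = sunrise_posteriors(10)
for day, p in enumerate(posteriors, start=1):
    print(f"t={day}: P(H|D) = {p:.4f}")
```

Under these assumed likelihoods the posterior after k sunrises is 2^k / (2^k + 1), which approaches 1 as t grows, matching the limit stated on the slide.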
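
The "Wavelet Decomposition" and "Multiscale Feature Extraction" slides describe splitting a signal into scaled and detail signals and then thresholding and reconstructing. A minimal sketch with the orthonormal Haar wavelet follows. The hard-threshold rule, the threshold value, and all function names are assumptions for illustration; the slides do not specify them.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_decompose(signal, levels):
    """Return (last scaled signal, [details at m=1, ..., m=levels])."""
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2**levels")
    scaled = list(signal)
    details = []
    for _ in range(levels):
        pairs = [(scaled[i], scaled[i + 1]) for i in range(0, len(scaled), 2)]
        details.append([(a - b) * _S for a, b in pairs])  # detail signal d_m
        scaled = [(a + b) * _S for a, b in pairs]         # scaled signal y_m
    return scaled, details


def haar_reconstruct(scaled, details):
    """Invert haar_decompose, working from the coarsest scale to the finest."""
    current = list(scaled)
    for detail in reversed(details):
        finer = []
        for a, d in zip(current, detail):
            finer.append((a + d) * _S)
            finer.append((a - d) * _S)
        current = finer
    return current


def threshold_and_reconstruct(signal, levels, threshold):
    """Zero detail coefficients with magnitude <= threshold, then reconstruct."""
    scaled, details = haar_decompose(signal, levels)
    kept = [[c if abs(c) > threshold else 0.0 for c in d] for d in details]
    return haar_reconstruct(scaled, kept)


# A step from 0 to 5 with a small alternating disturbance.
noisy_step = [0.1, -0.1, 0.1, -0.1, 5.1, 4.9, 5.1, 4.9]
print(threshold_and_reconstruct(noisy_step, levels=2, threshold=0.5))
```

In this example the step is carried by the last scaled signal, while the small disturbance sits in the finest-scale detail coefficients and is removed by the threshold, which illustrates the slide's point that deterministic features are captured by a few large coefficients.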
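
The "Data Rectification - Summary" slide states that the proposed approach relies on Monte Carlo sampling but gives no algorithm. As a rough illustration of the idea, the sketch below uses a generic bootstrap particle filter, which is a swapped-in textbook technique and not necessarily the method used in the talk. The scalar model ($x_k = a x_{k-1} + w$, $y_k = x_k + v$), the constraint $x_k \ge 0$, all parameter values, and the use of the posterior mean as the estimate are assumptions.

```python
import math
import random


def bootstrap_filter(measurements, n_particles=500, a=0.9, q=0.2, r=0.5, seed=0):
    """Sequential Monte Carlo estimate of x_k given y_1..y_k.

    Assumed model (hypothetical, for illustration only):
      state:       x_k = a * x_{k-1} + w,  w ~ N(0, q^2)
      measurement: y_k = x_k + v,          v ~ N(0, r^2)
      constraint:  x_k >= 0
    """
    rng = random.Random(seed)
    # Initial samples from an assumed prior, folded to satisfy x >= 0.
    particles = [abs(rng.gauss(1.0, 1.0)) for _ in range(n_particles)]
    estimates = []
    for y in measurements:
        # Prediction: push every sample through the state equation.
        particles = [a * x + rng.gauss(0.0, q) for x in particles]
        # Update: Gaussian likelihood weight; infeasible samples get weight 0.
        weights = [
            math.exp(-0.5 * ((y - x) / r) ** 2) if x >= 0.0 else 0.0
            for x in particles
        ]
        total = sum(weights)
        if total == 0.0:
            # Degenerate case: project samples onto the constraint, weight equally.
            particles = [max(x, 0.0) for x in particles]
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Posterior mean as the point estimate (one possible loss function).
        estimates.append(sum(w * x for w, x in zip(weights, particles)) / total)
        # Resampling: the weighted posterior becomes the prior for the next step.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


rectified = bootstrap_filter([1.2, 0.8, -0.3, 0.5])
print(rectified)
```

Because the samples carry the whole posterior, no Gaussian assumption is imposed on it, and the inequality constraint is handled by discarding infeasible samples rather than by linearization.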