

Framework for Quantification and Reliability Analysis for Layered Uncertainty using Optimization: NASA UQ Challenge. Anirban Chaudhuri, Garrett Waycaster, Taiki Matsumura, Nathaniel Price, Raphael T. Haftka Structural and Multidisciplinary Optimization Group, University of Florida.



  1. Framework for Quantification and Reliability Analysis for Layered Uncertainty using Optimization: NASA UQ Challenge Anirban Chaudhuri, Garrett Waycaster, Taiki Matsumura, Nathaniel Price, Raphael T. Haftka Structural and Multidisciplinary Optimization Group, University of Florida

  2. NASA Problem Description • Combined aleatory and epistemic uncertainty • Epistemic uncertainty: 31 θ’s (sub-parameters) • Aleatory uncertainty: 21 p’s (parameters) • (Diagram labels: Design Variables, Parameters, Intermediate Variables, Constraints, Performance Metrics, Worst-case scenario)

  3. Toy Problem • G1 = 5(-P1 + P2 - (P3 - 0.5)) • G2 = 0.7 - P3 • P1: constant • P2: normal distribution • P3: beta distribution • No intermediate variables • w(p) = max(G1, G2) • (Figure: true distribution of G1)
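The toy problem above can be sketched in a few lines. The sub-parameter vector `theta` below (constant P1, mean and standard deviation of P2, shape parameters of P3) is a hypothetical layout chosen for illustration; the slide does not specify which sub-parameters are uncertain.

```python
import random

def sample_w(theta, n_samples=1000, seed=0):
    """Monte Carlo samples of the toy-problem worst case w(p) = max(G1, G2).

    theta = (p1, mu2, sigma2, a3, b3) is a hypothetical sub-parameter
    vector: P1 is the constant p1, P2 ~ Normal(mu2, sigma2),
    P3 ~ Beta(a3, b3).
    """
    p1, mu2, sigma2, a3, b3 = theta
    rng = random.Random(seed)
    ws = []
    for _ in range(n_samples):
        p2 = rng.gauss(mu2, sigma2)
        p3 = rng.betavariate(a3, b3)
        g1 = 5.0 * (-p1 + p2 - (p3 - 0.5))  # G1 = 5(-P1 + P2 - (P3 - 0.5))
        g2 = 0.7 - p3                       # G2 = 0.7 - P3
        ws.append(max(g1, g2))              # worst case over both metrics
    return ws

ws = sample_w((0.5, 0.5, 0.1, 2.0, 2.0))
```

Each epistemic realization θ induces one distribution of w(p); the aleatory loop is the sampling of P2 and P3 inside the function.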

  4. Task A: Uncertainty Characterization

  5. Assumption and Approaches • Assumption: The distribution of each uncertain parameter is modeled as a uniform distribution. • Approaches: • Bayesian-based approach • CDF matching approach • Prioritized Observation UQ approach

  6. Bayesian-Based Approach • Uncertainty models θ are updated by Bayesian inference. • The marginal distribution of each parameter θi is obtained by integration. • Each marginal (posterior) distribution is obtained as a sample distribution by the Markov chain Monte Carlo (MCMC) method. • θ: set of uncertain parameters • x1,obs: 1st set of observations • P(θ): prior distribution • L(θ|x1,obs): likelihood function • f(θ|x1,obs): posterior distribution
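The Bayesian update above can be sketched with a minimal random-walk Metropolis sampler. The 1-D posterior below (uniform prior on [0, 1] times a Gaussian likelihood centered at 0.3) is a hypothetical stand-in for the empirical G1 likelihood; `log_post` and the tuning values are illustrative, not the authors' implementation.

```python
import math
import random

def metropolis(log_post, theta0, n_draws=2000, step=0.1, seed=1):
    """Minimal random-walk Metropolis sampler (illustrative sketch).

    log_post(theta) is the unnormalized log posterior,
    log P(theta) + log L(theta | x_obs).
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_post(theta0)
    chain = []
    for _ in range(n_draws):
        prop = theta + rng.gauss(0.0, step)        # symmetric proposal
        lp_prop = log_post(prop)
        # accept with probability min(1, posterior ratio)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        chain.append(theta)
    return chain

def log_post(t):
    # Hypothetical posterior: uniform prior on [0, 1], Gaussian likelihood
    if not 0.0 <= t <= 1.0:
        return -math.inf
    return -0.5 * ((t - 0.3) / 0.05) ** 2

chain = metropolis(log_post, 0.5)
posterior = chain[len(chain) // 2:]   # discard the first half as burn-in
```

The marginal posterior of each θi is then read off as the empirical distribution of the retained draws.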

  7. CDF Matching Approach • The two-sample Kolmogorov-Smirnov (KS) test forms the basis of this method. • Use the given observations to build an empirical CDF (eCDF). • We can also get confidence bands (10-95%) for these eCDFs. • Optimize θ to match these eCDFs and the confidence bands using a modified KS statistic, Dn,n'. • Sum of distances instead of maximum distance. • For each candidate θ realization, n' samples are generated using the aleatory uncertainty to form an eCDF, which is compared against the eCDF of the given observations.
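A minimal sketch of the modified KS statistic described above: it sums the eCDF distances over the pooled sample points instead of taking the maximum, as the standard two-sample statistic Dn,n' does. The helper names are illustrative.

```python
import bisect

def ecdf(samples):
    """Empirical CDF as a callable: fraction of samples <= x."""
    xs = sorted(samples)
    return lambda x: bisect.bisect_right(xs, x) / len(xs)

def modified_ks(obs, sim):
    """Modified two-sample KS statistic: sum of |F - G| over the pooled
    sample points, rather than the maximum distance D_{n,n'}."""
    F, G = ecdf(obs), ecdf(sim)
    return sum(abs(F(x) - G(x)) for x in sorted(set(obs) | set(sim)))
```

In the approach above, an optimizer (DIRECT in the slides) would minimize this statistic over θ, with `sim` regenerated from the aleatory model at each candidate θ.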

  8. CDF Matching Approach • For the 31 eCDFs (the actual eCDF + 30 confidence bands), optimize the θ’s. • 10-95% confidence bands. • Gives a refined range of θ from the 31 possibly optimal θ’s. • The DIRECT optimizer is used (Finkel et al.).

  9. Prioritized Observation UQ • Both performance metrics measure risk. • Refine the UQ based on the amount of risk attached to an observation. • Same strategy as the CDF matching method, except the objective function is a weighted modified KS statistic. • WR is the weight of an observation according to the risk associated with it. • Could be decided based on the J2 value. • Implementation is very expensive because finding J2 requires Monte Carlo simulation. • Importance sampling or surrogate-based strategies are being explored for future work.

  10. Toy Problem Results: Posterior Distributions using Bayesian Approach • 20 observations of G1 are used. • Initially, the mean and variance of P2 are the most uncertain (widest ranges). • While MCMC reduced the ranges of the mean and variance of P2, the ranges of the other parameters remain, which makes sense! • (Figure: posterior distributions updated using 20 observations, with true values marked)

  11. Toy Problem Results: Reduced Bounds using CDF Matching • Using 20 observations of G1. • Maximum reduction in bounds for the mean and variance of P2. • Similar results to the Bayesian approach.

  12. Toy Problem Results: Effects of the Number of Observations • Started with only 5 observations of G1, then increased to 20. • Bayesian approach: MCMC provided sets of θ’s. • CDF matching: around 7000 θ’s generated in the updated range. • For each θ, 1000 G1 samples are generated and used to create an eCDF; a KS test then checks whether the hypothesis that this CDF is the same as the eCDF of all 20 given observations is rejected. • The rejection rate is substantially reduced by both approaches when 20 observations are used.

  13. NASA Problem Results: Posterior Distributions using Bayesian Approach • (Figures: posterior using the first 25 observations; posterior using all 50 observations)

  14. NASA Problem Results: Reduced Bounds using CDF Matching • Using the first 25 observations • Using all 50 observations

  15. NASA Problem Results: Effects of the Number of Observations • Started with 25 observations of x1, then increased to 50. • Bayesian approach: MCMC provided sets of θ’s. • CDF matching: around 7000 θ’s generated in the updated range. • For each θ, 1000 p samples are generated and used to create an eCDF; a KS test then checks whether the hypothesis that this CDF is the same as the eCDF of all 50 given observations is rejected. • The rejection rate is reduced by both approaches compared to the prior.

  16. Task B: Sensitivity Analysis

  17. Primary objectives • Effect of reduced sub-parameter bounds on intermediate variable uncertainty • Fix parameter values without error in intermediate variables • Effect of reduced bounds on range of values of interest, J1 and J2 • Fix parameter values without error in range of J1 or J2

  18. Intermediate Variable Sensitivity • Sensitivity analysis is based on changes in the bounds of a variable, rather than its value. • Empirical estimate of the p-box of an intermediate variable, x, using double-loop Monte Carlo simulation: sample sub-parameter values within the bounds, then subsequent parameter realizations. • Reduce the range of each sub-parameter by 25% and repeat the process: reduce the upper bound, increase the lower bound, and apply a centered reduction. • The average change in the area of the p-box brought about by these three reductions (A_revised vs. A_initial) is a measure of the sensitivity of these bounds.
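The p-box area used above can be estimated empirically as the area between the upper and lower envelopes of the eCDFs produced by the outer (epistemic) loop. This is a minimal sketch; `sample_x` is a hypothetical stand-in for the inner aleatory loop of the actual model.

```python
import bisect

def pbox_area(thetas, sample_x, n_grid=50):
    """Empirical p-box area of an intermediate variable x.

    One eCDF of x is built per epistemic realization theta, then the area
    between the upper and lower eCDF envelopes is accumulated on a grid.
    sample_x(theta) must return a list of aleatory x samples.
    """
    ecdfs = [sorted(sample_x(th)) for th in thetas]
    lo = min(xs[0] for xs in ecdfs)
    hi = max(xs[-1] for xs in ecdfs)
    dx = (hi - lo) / n_grid
    area = 0.0
    for i in range(n_grid):
        x = lo + (i + 0.5) * dx
        Fs = [bisect.bisect_right(xs, x) / len(xs) for xs in ecdfs]
        area += (max(Fs) - min(Fs)) * dx   # envelope width at x
    return area

# Illustrative aleatory model: x uniform on [theta, theta + 1],
# sampled on a deterministic grid for reproducibility
sample_x = lambda th: [th + (i + 0.5) / 200 for i in range(200)]
area = pbox_area([0.0, 1.0], sample_x)
```

Repeating the calculation with 25% tighter sub-parameter bounds and averaging the change in `area` gives the sensitivity measure described on the slide.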

  19. J1 and J2 Range Sensitivity • For J1 and J2, we use the range of values from Monte Carlo simulation. • Surrogate models are used to reduce the computational cost of evaluating J1 and J2. • Parameters are ranked based on each parameter’s sensitivity to J1 and J2 using a rank sum score.

  20. Fixing Parameter Values • We use DIRECT global optimization to maximize the remaining uncertainty (either p-box area or J1/J2 range) while fixing a single parameter. • We generate an initial large random sample of all parameters and replace one parameter with a constant. • We fix parameters for which the optimized uncertainty measure is close to the initial value.
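The fixing criterion above can be sketched as follows, with a crude grid search standing in for the DIRECT optimizer. `uncertainty` and the tolerance `tol` are hypothetical: the callable returns the remaining uncertainty measure (p-box area or J range) with one parameter fixed at `v`, or the unconstrained measure when called with `None`.

```python
def can_fix(uncertainty, bounds, n_grid=11, tol=0.05):
    """Decide whether a parameter can be fixed.

    Maximize the remaining uncertainty measure over a grid of fixed
    values (grid search stands in for DIRECT) and accept the fix if the
    maximum stays within tol of the unconstrained measure.
    Returns (ok, best_value).
    """
    base = uncertainty(None)                 # unconstrained measure
    best_v, best_u = None, -float("inf")
    for i in range(n_grid):
        v = bounds[0] + i * (bounds[1] - bounds[0]) / (n_grid - 1)
        u = uncertainty(v)                   # measure with parameter fixed
        if u > best_u:
            best_v, best_u = v, u
    return best_u >= (1.0 - tol) * base, best_v

# Illustrative measures: one nearly insensitive, one strongly sensitive
flat = lambda v: 1.0 if v is None else 1.0 - 0.01 * abs(v - 0.5)
steep = lambda v: 1.0 if v is None else 0.5
ok_flat, _ = can_fix(flat, (0.0, 1.0))
ok_steep, _ = can_fix(steep, (0.0, 1.0))
```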

  21. Toy Problem Results • G1 = 5(-P1 + P2 - (P3 - 0.5)) • G2 = 0.7 - P3 • P1: constant • P2: normal distribution • P3: beta distribution • Monte Carlo simulation introduces some error, as should be expected. • The method is able to accurately rank the sensitivities of each of the parameter bounds, and suggests fixing unimportant parameters at reasonable values.

  22. NASA Problem Results: Revised uncertainty model • Initial intermediate variable analysis: • We are able to fix nine parameters: 2, 4, 5, 7, 8, 13, 14, 15, and 17. • Based on their expected impact on both J1 and J2, we select revised models for parameters 1, 16, 18, and 21.

  23. Tasks C & D: Uncertainty Propagation & Extreme Case Analysis

  24. Primary objectives • Uncertainty Propagation • Find the range of J1 and J2 • Extreme Case Analysis • Find the epistemic realizations that yield extreme J1 and J2 values • Find a few representative realizations of x leading to J2 > 0

  25. Double Loop Sampling (DLS) • Double Loop Monte Carlo Sampling (DLS) • Parameter loop: samples sub-parameters (epistemic uncertainty), 31 distribution parameters. • Probability loop: samples parameters (aleatory uncertainty), 17 parameters (p’s). • Challenge: computationally expensive.
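The two nested loops can be sketched generically. The 1-D example at the bottom is purely illustrative (θ as the mean of p, J as the average of p); the NASA problem has 31 epistemic and 17 aleatory dimensions.

```python
import random

def double_loop(sample_theta, sample_p, metric, n_outer=100, n_inner=500):
    """Double-loop MC sketch.

    Outer loop samples epistemic sub-parameters theta; inner loop samples
    aleatory parameters p given theta; returns the range of the
    inner-loop metric (e.g. J1) over the outer loop.
    """
    vals = []
    for _ in range(n_outer):
        th = sample_theta()
        ps = [sample_p(th) for _ in range(n_inner)]
        vals.append(metric(ps))
    return min(vals), max(vals)

rng = random.Random(0)
lo, hi = double_loop(
    lambda: rng.uniform(0.0, 1.0),           # epistemic: theta in [0, 1]
    lambda th: th + rng.gauss(0.0, 0.01),    # aleatory: p ~ N(theta, 0.01)
    lambda ps: sum(ps) / len(ps),            # J = E[p]
)
```

The cost is n_outer x n_inner model evaluations, which is why the slides turn to ERR next.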

  26. Efficient Reliability Re-Analysis (ERR) (Importance Sampling Method) • Full double-loop MCS is infeasible: the black-box function g = f(x, d_baseline) is computationally expensive. • Instead of re-evaluating the constraints at each epistemic realization, we weigh existing points based on likelihood. • Not importance sampling in the traditional sense (i.e., no “important” region). • How do we handle fixed but unknown constants that lie within a given interval? • Generate initial p samples over the entire range, [0,1]. • Use a narrow normal distribution as the “true” pdf: pi ~ N(θi, 0.25θi). • [1] Farizal, F., and Efstratios Nikolaidis. “Assessment of Imprecise Reliability Using Efficient Probabilistic Reanalysis.” System, 2013: 10-17.
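The reweighting idea can be sketched as a self-normalized likelihood-ratio estimator: the expensive g evaluations are computed once under a broad sampling density and reused for every new epistemic realization. This is an illustration of the general reanalysis technique, not the authors' exact formulation.

```python
def err_pof(p_samples, g_values, pdf_new, pdf_sampling):
    """ERR sketch: estimate P(g > 0) under a new aleatory density.

    p_samples were drawn once from pdf_sampling and g_values = g(p) were
    stored; a new epistemic realization only changes pdf_new, so g is
    never re-evaluated. The estimate is self-normalized by the weights.
    """
    num = den = 0.0
    for p, g in zip(p_samples, g_values):
        w = pdf_new(p) / pdf_sampling(p)   # likelihood-ratio weight
        den += w
        if g > 0.0:                        # failure indicator
            num += w
    return num / den

# Deterministic check: p on a grid in [0, 1], failure when p > 0.7,
# both densities uniform so the weights are all 1
ps = [(i + 0.5) / 1000 for i in range(1000)]
gs = [p - 0.7 for p in ps]
pof = err_pof(ps, gs, lambda p: 1.0, lambda p: 1.0)
```

The quality of the estimate degrades when `pdf_new` concentrates where the initial samples are sparse, which is exactly the failure mode discussed for the NASA problem later.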

  27. Optimized Parameter Sampling • Optimization was used to find the epistemic realizations (θ’s) corresponding to extreme values of J1 and J2. • The optimization was repeated 4 times to minimize/maximize J1 and J2. • The objective function was based on ERR, so the computational cost is significantly reduced. • We need a global optimizer that is not gradient based: the DIRECT algorithm by Daniel E. Finkel. • Not yet implemented on the NASA problem due to high computational costs; the use of surrogates is being explored.

  28. Validation of the ERR Method on the Toy Problem • An MCS was performed using the epistemic realizations from the optimization. • The ERR method performed well compared to the more expensive DLS method for the toy problem.

  29. Results of DLS for the NASA Problem • It was only possible to use a small number of samples due to the computational time required: 400 samples of the epistemic uncertainty and 1,000 samples of the aleatory uncertainty. • Results show a significant reduction in the range of J1. • Can we trust these results with such a small sample size?

  30. Results of the ERR Method: NASA Problem • The ERR results did not correspond very well with the DLS results.

  31. Limitations of the Current Importance Sampling Based Approach • Good agreement with double-loop sampling results for the toy problem, but not for the NASA problem. • The poor performance of the importance sampling based approach is hypothesized to be due to: • Difficulty in creating an initial set of samples with good coverage of the 21-dimensional space (limited samples). • Fixed but unknown constant parameters that were modeled using narrow normal distributions. • Possible fixes: • Dimensionality reduction by fixing parameters through sensitivity analysis. • Use of surrogates to reduce computational time.

  32. Summary • Uncertainty quantification using a given set of samples was successfully performed using a Bayesian approach and a CDF matching approach. • The p-box area / reduction in range was used as the criterion to decide the sensitivity of the parameters. • An importance sampling based approach was utilized for uncertainty propagation and extreme case analysis. • A simpler toy problem was used to validate all our methods, increasing our confidence in the methods.

  33. Thank You! Questions?

  34. Back-Up Slides

  35. Reduced bounds using CDF matching • Repeated the process 50 times.

  36. MCMC Implementation (Backup Slide) • Metropolis MCMC is used. • 20 MCMC runs (m = 20) with different starting points*. • 10,000 posterior samples (2n = 10,000); the first 5,000 samples are discarded for accuracy. • The proposal distribution* is a normal distribution with a standard deviation of 10% of the prior range. • 1,000 random samples* are generated to construct an empirical PDF of G1 to calculate the likelihood. • The likelihood (empirical PDF) is calculated by kernel density estimation (MATLAB ksdensity). • *Sources of noise in the output.
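The kernel density step can be sketched with a Gaussian-kernel estimator. This is an analogue of MATLAB's ksdensity rather than a reimplementation of it; the rule-of-thumb bandwidth below is an assumption (ksdensity uses a similar normal-reference default).

```python
import math

def kde_pdf(samples, x):
    """Gaussian kernel density estimate of the pdf at x.

    Bandwidth follows Silverman's rule of thumb, 1.06 * sd * n^(-1/5),
    a common normal-reference default.
    """
    n = len(samples)
    mu = sum(samples) / n
    sd = math.sqrt(sum((s - mu) ** 2 for s in samples) / (n - 1))
    bw = 1.06 * sd * n ** -0.2
    z = n * bw * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - s) / bw) ** 2) for s in samples) / z

# Samples approximating Uniform(0, 1): the estimated density near the
# center should be close to 1
grid = [(i + 0.5) / 100 for i in range(100)]
val = kde_pdf(grid, 0.5)
```

In the slides' setup, evaluating such an estimate of the G1 density at each observation gives the empirical likelihood used inside MCMC.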

  37. MCMC Convergence (Backup Slide) • (1) Discard the first n draws. • (2) Use the Gelman and Rubin multiple sequence diagnostic. • (3) If the potential scale reduction factor R̂ is close to 1 (say, less than 1.1), the MCMC can be considered converged and the total (m x n) draws are combined into one chain. • R̂ = sqrt((((n-1)/n) W + B/n) / W), where W is the within-chain variance and B is the between-chain variance.
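The potential scale reduction factor can be sketched directly from its definition in terms of the within-chain variance W and between-chain variance B (the standard Gelman-Rubin form, shown here for equal-length chains).

```python
import math

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat from m chains of length n."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    # B: between-chain variance (scaled by n)
    B = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)
    # W: mean of the within-chain sample variances
    W = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * W + B / n   # pooled variance estimate
    return math.sqrt(var_hat / W)

# Chains with identical means mix well (R-hat near 1); chains centered
# far apart do not (R-hat much greater than 1)
r_mixed = gelman_rubin([[0.0, 1.0] * 50, [1.0, 0.0] * 50])
r_split = gelman_rubin([[0.0, 1.0] * 50, [10.0, 11.0] * 50])
```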

  38. Correlations between sub-parameters

  39. Task B Summary • Sensitivity is evaluated using the p-box and range as metrics to quantify changes. • Surrogate models are utilized to reduce the computational expense of the double-loop simulation. • Parameter values are fixed by optimizing the remaining uncertainty using DIRECT global optimization. • Refined models are requested based on the rank sum score of each parameter for both values of interest, J1 and J2. • Though the Monte Carlo simulation and surrogate models introduce approximation errors, our simple toy problem suggests this method is still adequate to provide rankings of parameter sensitivity.

  40. Other Methods That Were Tried • P-box convolution sampling: requires replacing the distributional p-box with a free p-box. • Failure domain bounding (homothetic deformations): the NASA UQ toolbox for Matlab has a steep learning curve, and the theoretical background is challenging. • Replacing the x-to-g function with surrogates: requires 8 surrogates (one for each constraint function) in 5-dimensional space; exploration of the functions indicates delta-function-type behavior that is difficult to fit with a surrogate, and attempts at creating PRS and kriging surrogates resulted in poor accuracy.

  41. Importance Sampling Formulation • Worst case requirement metric • Similarly, for probability of failure

  42. Sampling Distributions • 19 p’s are bounded between 0 and 1 (Beta, Uniform, or Constant); a uniform sampling distribution is used. • 2 p’s are normally distributed and possibly correlated, so samples must cover a large range: -5 ≤ E[pi] ≤ 5 and 1/400 ≤ V[pi] ≤ 4. • An uncorrelated multivariate normal sampling distribution with mean 0 and standard deviation 4.5 is used. • The 8 constraint functions are evaluated for 1e6 realizations of p.

  43. Epistemic Realizations Corresponding to J1/J2 Extrema: Toy Problem

  44. Updated Uncertainty Model vs. Given Uncertainty Model • (Figures: J1 and J2 under each uncertainty model)

  45. NASA Problem ERR error Percent error between MCS estimates for J1 and J2 using 1,000 p samples and ERR estimates using 1e6 initial samples
