Rudolf Žitný, Ústav procesní a zpracovatelské techniky ČVUT FS 2010

Experimental methods E181101 EXM8: Error analysis, Statistics, Regression

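EVALUATION OF EXPERIMENTAL DATA EXM8
Distribution of errors. It is assumed that the true value of a quantity is distorted by n small effects of the same magnitude, each positive or negative. The superposition of these effects results in a random error with a binomial distribution. As the number of effects goes to infinity, this distribution approaches the normal (Gauss) distribution of errors
$$\varphi(\varepsilon) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{\varepsilon^2}{2\sigma^2}\right),$$
where $\varphi(\varepsilon)\,d\varepsilon$ is the probability that an error lies within the interval $\langle\varepsilon, \varepsilon+d\varepsilon\rangle$, the Gauss integral gives the normalisation $\int_{-\infty}^{\infty}\varphi(\varepsilon)\,d\varepsilon = 1$, and $\sigma$ is the mean quadratic error, called the standard deviation. The probability that an error lies anywhere within the range $\langle-\varepsilon, \varepsilon\rangle$ is the integral distribution
$$P(\varepsilon) = \int_{-\varepsilon}^{\varepsilon} \varphi(\xi)\,d\xi = \operatorname{erf}\left(\frac{\varepsilon}{\sigma\sqrt{2}}\right).$$
Example: $P(\sigma) = 0.68$, $P(3\sigma) = 0.997$.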
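
The two example probabilities can be checked with a few lines of Python. This is only a sketch: the function name `prob_within` is an illustrative choice, and it relies on the standard identity between the integral of the normal density and the error function.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normally distributed error lies within <-k*sigma, +k*sigma>."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"P({k} sigma) = {prob_within(k):.4f}")
```

The result depends only on the ratio of the interval half-width to the standard deviation, which is why a single argument `k` is enough.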

EVALUATION OF EXPERIMENTAL DATA EXM8
The arithmetic average of repeated measurements
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
is the best estimate of the expected value. The standard deviation of a single measurement can be estimated using this average (the best estimate of the standard deviation):
$$\sigma_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$
Note that n-1, not n, appears in the denominator. The expected value is not known and is itself estimated by the arithmetic average, so the number of degrees of freedom is reduced by 1 (to n-1). The recorded data x1,…,xn also allow the standard deviation of the calculated arithmetic average to be evaluated; it is smaller than the standard deviation of the measured data by the factor $\sqrt{n}$:
$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}.$$
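
A minimal sketch of these three estimates in Python follows. The function name and the sample readings are hypothetical and serve only to illustrate the n-1 denominator and the division by the square root of n.

```python
import math


def mean_and_deviations(samples):
    """Return (mean, std of a single measurement, std of the mean)."""
    n = len(samples)
    mean = sum(samples) / n
    # n - 1 in the denominator: one degree of freedom is used up by the mean
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    std_single = math.sqrt(variance)
    std_mean = std_single / math.sqrt(n)
    return mean, std_single, std_mean


readings = [20.1, 19.8, 20.3, 20.0, 19.8]  # hypothetical repeated readings
mean, std_single, std_mean = mean_and_deviations(readings)
print(mean, std_single, std_mean)
```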

Measuring chain EXM8
The measured quantity x (e.g. temperature) is usually measured by a chain of different instruments (e.g. a thermocouple and a voltage amplifier) with generally nonlinear characteristics (for a thermocouple, voltage is not an exactly linear function of temperature). Each instrument transforms its input signal according to its characteristic, and some random error is always superposed:
x (actual value) → thermocouple f(x) → $f(x)+\varepsilon_{f,i}$ → amplifier g → $y_i = g(f(x)+\varepsilon_{f,i}) + \varepsilon_{g,i}$
Here $\varepsilon_{f,i}$ is random noise with normal distribution (standard deviation $\sigma_f$) and zero mean value, and $\varepsilon_{g,i}$ is random noise with normal distribution (standard deviation $\sigma_g$) and zero mean value.

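Measuring chain EXM8
Expected mean value for n repeated experiments, with $y_i = g(f(x)+\varepsilon_{f,i}) + \varepsilon_{g,i}$ expanded to second order in the thermocouple noise $\varepsilon_{f,i}$:
$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \approx \frac{1}{n}\sum_{i=1}^{n}\left[g(f(x)) + g'(f(x))\,\varepsilon_{f,i} + \tfrac{1}{2}\,g''(f(x))\,\varepsilon_{f,i}^2 + \varepsilon_{g,i}\right].$$
The mean value of the noise is zero, so the terms linear in $\varepsilon_{f,i}$ and $\varepsilon_{g,i}$ vanish and
$$\bar{y} \approx g(f(x)) + \tfrac{1}{2}\,g''(f(x))\,\sigma_f^2.$$
Therefore the mean value, even for a very large number of experiments n, is distorted if the function g is nonlinear, and this deviation is proportional to the variance $\sigma_f^2$ of the noise applied to the instrument f(x) (the thermocouple).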

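Measuring chain EXM8
Expected variance of y for repeated measurement of the same value x (first-order expansion of $y = g(f(x)+\varepsilon_f) + \varepsilon_g$):
$$\sigma_y^2 \approx \left[g'(f(x))\right]^2 \sigma_f^2 + \sigma_g^2,$$
where $\sigma_g^2$ is the variance of the amplifier noise and $\sigma_f^2$ is the variance of the thermocouple noise.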
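
The shift of the mean and the variance of a measuring chain can be illustrated by a small Monte Carlo experiment. The sketch below is built on assumptions that are not in the slides: a quadratic amplifier characteristic g(z) = z², a thermocouple output f(x) = 2, noise levels 0.1 and 0.05, and a fixed random seed. It compares the simulated statistics with g(f) + ½ g''(f) σ_f² and with [g'(f)]² σ_f² + σ_g².

```python
import random


def simulate_chain(f_val, g, sigma_f, sigma_g, n, seed=0):
    """Simulate y = g(f + eps_f) + eps_g and return the sample mean and variance."""
    rng = random.Random(seed)
    ys = [g(f_val + rng.gauss(0.0, sigma_f)) + rng.gauss(0.0, sigma_g)
          for _ in range(n)]
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / (n - 1)
    return mean, var


def g(z):
    """Hypothetical nonlinear amplifier characteristic (assumption)."""
    return z * z


f_val, sigma_f, sigma_g = 2.0, 0.1, 0.05   # assumed values
dg = 2.0 * f_val                            # g'(f) for g(z) = z**2
d2g = 2.0                                   # g''(f) for g(z) = z**2

mean_mc, var_mc = simulate_chain(f_val, g, sigma_f, sigma_g, n=200_000)
mean_formula = g(f_val) + 0.5 * d2g * sigma_f ** 2
var_formula = dg ** 2 * sigma_f ** 2 + sigma_g ** 2

print("mean:     simulated", mean_mc, " formula", mean_formula)
print("variance: simulated", var_mc, " formula", var_formula)
```

The variance formula is a first-order approximation, so a small residual difference from the simulation is expected even for very large n.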

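Taylor expansion EXM8
Taylor expansion of a function of M variables:
$$f(x_1+\Delta x_1,\dots,x_M+\Delta x_M) = f(x_1,\dots,x_M) + \sum_{m=1}^{M}\frac{\partial f}{\partial x_m}\,\Delta x_m + \frac{1}{2}\sum_{m=1}^{M}\sum_{k=1}^{M}\frac{\partial^2 f}{\partial x_m\,\partial x_k}\,\Delta x_m\,\Delta x_k + \dots$$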

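Variance of evaluated property EXM8
Variance of a property $f(x_1,\dots,x_M)$ calculated from M measured values (independent variables) with variances $\sigma_{x_m}^2$:
$$\sigma_f^2 = \sum_{m=1}^{M}\left(\frac{\partial f}{\partial x_m}\right)^2 \sigma_{x_m}^2.$$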

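Variance of evaluated property EXM8
Proof of the variance of the arithmetic average: the average $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ is a function of n independent measurements with the same variance $\sigma_x^2$ and with $\partial\bar{x}/\partial x_i = 1/n$, therefore
$$\sigma_{\bar{x}}^2 = \sum_{i=1}^{n}\left(\frac{1}{n}\right)^2 \sigma_x^2 = \frac{\sigma_x^2}{n}.$$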

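Variance of evaluated property EXM8
• Example related to the project of a capillary rheometer (syringe): evaluation of viscosity from the following capillary rheometer data
• Geometry:
• D diameter of needle,
• L length of needle,
• $\Delta p$ pressure drop,
• $\dot{V}$ volumetric flowrate
Hagen-Poiseuille relation for laminar flow:
$$\mu = \frac{\pi D^4\,\Delta p}{128\,L\,\dot{V}}.$$
Since $\mu$ is a product of powers of the measured quantities, the variance of a function of independent variables, $\sigma_\mu^2 = \sum_m (\partial\mu/\partial x_m)^2\sigma_{x_m}^2$, takes the relative form
$$\left(\frac{\sigma_\mu}{\mu}\right)^2 = 16\left(\frac{\sigma_D}{D}\right)^2 + \left(\frac{\sigma_{\Delta p}}{\Delta p}\right)^2 + \left(\frac{\sigma_L}{L}\right)^2 + \left(\frac{\sigma_{\dot{V}}}{\dot{V}}\right)^2.$$
The variance of the individual measured parameters can be estimated from repeated measurement, e.g. from repeated measurement of the needle length L. The variance can sometimes be estimated from instrument data sheets.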
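
As an illustration, the propagation of variance through the Hagen-Poiseuille relation can be sketched numerically. The geometry, pressure drop, flowrate and their standard deviations below are hypothetical values chosen only for the example, and the helper `propagate_std` (with central-difference derivatives) is an assumption of this sketch, not a method taken from the slides.

```python
import math


def viscosity(D, L, dp, Q):
    """Hagen-Poiseuille viscosity from needle diameter D, length L,
    pressure drop dp and volumetric flowrate Q (SI units)."""
    return math.pi * D ** 4 * dp / (128.0 * L * Q)


def propagate_std(func, values, stds, rel_step=1e-6):
    """sigma_f from sigma_f^2 = sum_m (df/dx_m)^2 sigma_m^2,
    with partial derivatives estimated by central differences."""
    variance = 0.0
    for m, (x, s) in enumerate(zip(values, stds)):
        h = rel_step * (abs(x) or 1.0)
        up = list(values)
        up[m] = x + h
        down = list(values)
        down[m] = x - h
        dfdx = (func(*up) - func(*down)) / (2.0 * h)
        variance += (dfdx * s) ** 2
    return math.sqrt(variance)


# Hypothetical measurement: D, L, dp, Q and their standard deviations
values = [0.5e-3, 0.03, 2.0e4, 1.0e-8]
stds = [0.01 * values[0], 0.01 * values[1], 0.02 * values[2], 0.02 * values[3]]

mu = viscosity(*values)
sigma_mu = propagate_std(viscosity, values, stds)
print("viscosity", mu, "Pa.s, relative uncertainty", sigma_mu / mu)
```

Because the diameter enters in the fourth power, its relative error is weighted by 4² = 16, so a 1 % uncertainty in D contributes as much as a 4 % uncertainty in any of the other quantities.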

Data Regression EXM8 (illustration: Hopper)

Data Regression EXM8
Regression analysis: approximation of the relationship between independent variables x (there can be more than one independent variable) and a dependent variable y. Let us assume that the data are arranged in a matrix of observation points (each row describes one point x,y). For example, this is a matrix with two columns and N rows if there is one independent variable x and N pairs x,y. The relationship y(x) is represented by the model
$$y = f(x, \mathbf{p}),$$
where $\mathbf{p} = (p_1,\dots,p_M)$ is the vector of model parameters. Regression analysis looks for the model parameters giving the best approximation of the observation points, i.e. minimising the goal function (chi square criterion)
$$\chi^2 = \sum_{i=1}^{N}\left(\frac{y_i - f(x_i,\mathbf{p})}{\sigma_i}\right)^2,$$
where $\sigma_i$ is the standard deviation of the dependent variable y at the point $x_i$.

Data Regression EXM8
A good model f(x,p) (one that reasonably approximates the unknown relationship y(x)) should give a chi square value of about N-M (N is the number of points and M is the number of identified parameters p). Another indicator of the quality of the selected regression model is the correlation index r:
$$r = \sqrt{1 - \frac{\sum_{i=1}^{N}\left(y_i - f(x_i,\mathbf{p})\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2}}.$$
The correlation index is r=1 in the case of an absolutely perfect fit (the model reproduces all observation points exactly). The worst case is r=0, because then the model is no better than a constant, the mean value $\bar{y}$ of the dependent variable.
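
Both quality indicators are easy to evaluate once model predictions are available. The sketch below assumes the usual definitions (chi square as a sum of squared residuals scaled by the standard deviations, and r² = 1 minus the ratio of the residual sum of squares to the sum of squares about the mean); the function names and the data are illustrative only.

```python
import math


def chi_square(y, f, sigma):
    """Sum of squared residuals, each scaled by the standard deviation of the point."""
    return sum(((yi - fi) / si) ** 2 for yi, fi, si in zip(y, f, sigma))


def correlation_index(y, f):
    """r = sqrt(1 - SS_res / SS_tot); clipped at 0 for models worse than a constant."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return math.sqrt(max(0.0, 1.0 - ss_res / ss_tot))


# Hypothetical observations, model predictions and standard deviations
y_obs = [1.1, 1.9, 3.2, 3.9]
y_model = [1.0, 2.0, 3.0, 4.0]
sigma = [0.1, 0.1, 0.1, 0.1]

print("chi square:", chi_square(y_obs, y_model, sigma))
print("correlation index:", correlation_index(y_obs, y_model))
```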

Linear regression analysis EXM8
In this case only models f(x,p) that are linear with respect to the model parameters $p_m$ are used:
$$f(x,\mathbf{p}) = \sum_{m=1}^{M} p_m\,g_m(x),$$
where $g_m(x)$ are design functions, which can be selected more or less arbitrarily; they only have to be linearly independent. Example: $g_1=1$, $g_2=x$, $g_3=x^2$, … For N observation points the design matrix A is defined as $A_{ij}=g_j(x_i)$.

Linear regression analysis EXM8
The parameters p are identified in such a way that the sum of squares
$$s^2 = \sum_{i=1}^{N}\left(y_i - \sum_{m=1}^{M} p_m\,g_m(x_i)\right)^2$$
is minimised (this corresponds to minimisation of the chi square criterion when the standard deviation of all data points is the same). The sum of squares can also be expressed in matrix notation as a scalar product of two vectors (the residual vectors of differences between the measured values of y and the predictions of the linear model):
$$s^2 = (\mathbf{y} - A\mathbf{p})^T(\mathbf{y} - A\mathbf{p}),$$
where $\mathbf{y}$ is the vector of data $y_i$ and A is the design matrix (a function of $x_i$).

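Linear regression analysis EXM8
Looking for the minimum of the sum of squares $s^2 = (\mathbf{y} - A\mathbf{p})^T(\mathbf{y} - A\mathbf{p})$ (zero gradient at the minimum):
$$\frac{\partial s^2}{\partial \mathbf{p}} = -2A^T(\mathbf{y} - A\mathbf{p}) = 0 \quad\Rightarrow\quad C\,\mathbf{p} = \mathbf{b}, \qquad C = A^TA, \quad \mathbf{b} = A^T\mathbf{y}.$$
This is a system of linear algebraic equations for the unknown vector of model parameters p, where b is the right-hand side vector and C is a square M x M matrix. This system is called the NORMAL EQUATIONS and the inverted matrix $C^{-1}$ is called the COVARIANCE MATRIX.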

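Linear regression analysis EXM8
The covariance matrix $C^{-1}$ (with $C = A^TA$, A being the design matrix) is closely related to the probable uncertainties (standard deviations) of the calculated parameters. With the variance of the measured data estimated from the residual sum of squares $s^2$ as
$$\sigma_y^2 = \frac{s^2}{N-M},$$
the variance of the calculated parameters is
$$\sigma_{p_k}^2 = \sigma_y^2\,\left[C^{-1}\right]_{kk}.$$
Proof: the solution of the normal equations $\mathbf{p} = C^{-1}A^T\mathbf{y}$ is a linear function of the data, and for independent data with the same variance $\sigma_y^2$ the covariance of p is $C^{-1}A^T(\sigma_y^2 I)\,A\,C^{-1} = \sigma_y^2\,C^{-1}$.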
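
The whole procedure (design matrix, normal equations, covariance matrix, parameter variances) fits into a short pure-Python sketch. The helper names, the quadratic design functions and the data are illustrative assumptions; for larger problems a linear algebra library would normally replace the hand-written Gauss-Jordan inversion.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    m = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(m)]
           for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def linear_regression(xs, ys, design_functions):
    """Solve the normal equations C p = b; return p, var_y and the variances of p."""
    A = [[g(x) for g in design_functions] for x in xs]      # A_ij = g_j(x_i)
    N, M = len(A), len(design_functions)
    C = [[sum(A[i][j] * A[i][k] for i in range(N)) for k in range(M)]
         for j in range(M)]                                 # C = A^T A
    b = [sum(A[i][j] * ys[i] for i in range(N)) for j in range(M)]  # b = A^T y
    C_inv = invert(C)                                       # covariance matrix
    p = [sum(C_inv[j][k] * b[k] for k in range(M)) for j in range(M)]
    residuals = [ys[i] - sum(A[i][j] * p[j] for j in range(M)) for i in range(N)]
    var_y = sum(r * r for r in residuals) / (N - M)
    var_p = [var_y * C_inv[k][k] for k in range(M)]
    return p, var_y, var_p


# Design functions g1 = 1, g2 = x, g3 = x^2 and hypothetical noise-free data
design = [lambda x: 1.0, lambda x: x, lambda x: x * x]
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0 + 2.0 * x + 3.0 * x * x for x in xs]

p, var_y, var_p = linear_regression(xs, ys, design)
print("parameters:", p)
print("variance of data:", var_y, "variances of parameters:", var_p)
```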

NonLinear regression EXM8
In this case the model cannot be decomposed into a linear combination of design functions and has the general form y=f(x,p1,…,pM). The model can be an algebraic expression, but it can also be, for example, the solution of a differential equation. The parameters p should again be calculated from the requirement that the sum of squares of deviations (or the weighted sum of squares) is the least possible. The Marquardt Levenberg method is based upon linearisation of the optimised model f(xi,p1,…,pM)=fi, where xi are the independent variables of the i-th observation point and p1,…,pM are the optimised parameters of the model:
$$f_i(\mathbf{p}+\Delta\mathbf{p}) \approx f_i + \sum_{k=1}^{M}\frac{\partial f_i}{\partial p_k}\,\Delta p_k.$$
The least squares criterion is used for optimisation:
$$s^2 = \sum_{i=1}^{N} w_i\left(y_i - f_i - \sum_{k=1}^{M}\frac{\partial f_i}{\partial p_k}\,\Delta p_k\right)^2,$$
where $\Delta p_k$ is the increment of the k-th parameter in the iteration step and $w_i$ is the weight of the i-th data point.

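NonLinear regression EXM8
Each iteration of the Marquardt Levenberg method consists in the solution of a system of linear algebraic equations for the vector of parameter increments:
$$C\,\Delta\mathbf{p} = \mathbf{b}, \qquad C_{jk} = \sum_{i=1}^{N} w_i\,\frac{\partial f_i}{\partial p_j}\,\frac{\partial f_i}{\partial p_k}, \qquad b_j = \sum_{i=1}^{N} w_i\,(y_i - f_i)\,\frac{\partial f_i}{\partial p_j}.$$
Convergence of the iterations is improved by an artificial increase of the diagonal of the C matrix, by adding a constant $\lambda$ to C11, C22,…,CMM. For very large $\lambda$ the algorithm reduces to the steepest descent method (gradient method), which is slow but reliable, while for very small $\lambda$ the iterations approach the Gauss method, which is faster but sensitive to the initial estimate of the searched parameters.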
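
A minimal sketch of such an iteration for a two-parameter model is shown below. Several things here are assumptions made for the illustration and are not taken from the slides: the model y = a·exp(b·x), the synthetic noise-free data, the starting value of the damping constant `lam`, and the common strategy of dividing `lam` by 10 after a successful step and multiplying it by 10 after a failed one.

```python
import math


def model(x, p):
    """Hypothetical nonlinear model y = a * exp(b * x)."""
    a, b = p
    return a * math.exp(b * x)


def jacobian_row(x, p):
    """Derivatives of the model with respect to a and b."""
    a, b = p
    e = math.exp(b * x)
    return e, a * x * e


def sum_of_squares(xs, ys, ws, p):
    return sum(w * (y - model(x, p)) ** 2 for x, y, w in zip(xs, ys, ws))


def levenberg_marquardt(xs, ys, ws, p0, lam=1.0, max_iter=200, tol=1e-12):
    p = list(p0)
    s2 = sum_of_squares(xs, ys, ws, p)
    for _ in range(max_iter):
        c11 = c12 = c22 = b1 = b2 = 0.0
        for x, y, w in zip(xs, ys, ws):
            j1, j2 = jacobian_row(x, p)
            r = y - model(x, p)
            c11 += w * j1 * j1
            c12 += w * j1 * j2
            c22 += w * j2 * j2
            b1 += w * j1 * r
            b2 += w * j2 * r
        # Add lam to the diagonal of C and solve the 2x2 system by Cramer's rule
        d11, d22 = c11 + lam, c22 + lam
        det = d11 * d22 - c12 * c12
        dp = ((b1 * d22 - c12 * b2) / det, (d11 * b2 - c12 * b1) / det)
        if max(abs(dp[0]), abs(dp[1])) < tol:
            break
        trial = [p[0] + dp[0], p[1] + dp[1]]
        s2_trial = sum_of_squares(xs, ys, ws, trial)
        if s2_trial < s2:      # success: accept the step, move towards the Gauss method
            p, s2 = trial, s2_trial
            lam *= 0.1
        else:                  # failure: reject the step, move towards steepest descent
            lam *= 10.0
    return p, s2


xs = [0.5 * i for i in range(11)]
ys = [2.0 * math.exp(-0.5 * x) for x in xs]   # synthetic data, a = 2, b = -0.5
ws = [1.0] * len(xs)

p, s2 = levenberg_marquardt(xs, ys, ws, p0=(1.0, -1.0))
print("identified parameters:", p, "sum of squares:", s2)
```

Accepting only steps that reduce the sum of squares keeps the iteration from diverging when the initial estimate is poor, at the price of a few extra function evaluations.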

Example Regression EXM8 Regression model .

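Example Calibration (1 of 5) EXM8
Simultaneous calibration of multiple thermocouples or pressure transducers.
[Figure: bath at temperature T with a reference thermometer Tr and probes with voltages U1, U2, …, UM connected to an A/D converter]
Consider linear characteristics of the individual channels,
$$T = k_j\,u_j + t_j, \qquad j = 1,2,\dots,M.$$
The measured data are represented by a matrix of observation points with one row per measurement,
$$\left(T_{r,i},\ u_{i1},\ u_{i2},\ \dots,\ u_{iM}\right), \qquad i = 1,2,\dots,N.$$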

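Example Calibration (2 of 5) EXM8
Calibration means identification of the constants kj and tj of all transducers, so that the evaluated temperature $k_j u_{ij} + t_j$ follows from the recorded voltage $u_{ij}$. If the reference values Tr are accurate (recorded by a standard instrument with better accuracy than the calibrated probes), the problem is quite simple: the parameters kj,tj can be identified by linear regression for each probe separately, i.e. by minimising $\sum_i (T_{r,i} - k_j u_{ij} - t_j)^2$. The normal equations $C\,\mathbf{p} = \mathbf{b}$ for probe j are
$$\begin{pmatrix} \sum_i u_{ij}^2 & \sum_i u_{ij} \\ \sum_i u_{ij} & N \end{pmatrix} \begin{pmatrix} k_j \\ t_j \end{pmatrix} = \begin{pmatrix} \sum_i u_{ij}\,T_{r,i} \\ \sum_i T_{r,i} \end{pmatrix}.$$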

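Example Calibration (3 of 5) COVARIANCE EXM8
The covariance matrix $C^{-1}$ is the inverted matrix C of the normal equations for probe j. The variances of kj and tj are
$$\sigma_{k_j}^2 = \sigma_j^2\left[C^{-1}\right]_{11}, \qquad \sigma_{t_j}^2 = \sigma_j^2\left[C^{-1}\right]_{22},$$
where the estimated variances of the transducers follow from the residuals of the linear fit with two identified parameters,
$$\sigma_j^2 = \frac{1}{N-2}\sum_{i=1}^{N}\left(T_{r,i} - k_j u_{ij} - t_j\right)^2.$$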

Example Calibration (4 of 5) SIMULTANEOUS EXM8
The actual temperature Ti in the i-th measurement is not exactly the recorded reference value Tri (due to the inaccuracy of the standard instrument), but Ti is the same for all probes, assuming good mixing of the liquid in the bath (this assumption is fulfilled even better in simultaneous calibration of pressure transducers). The question is how to use this information to improve the accuracy of the identified constants. The best estimate of the actual bath temperature in the i-th measurement is based upon minimisation of the deviation from Tri and of the deviations from the temperatures predicted by the M probes (assuming that their characteristics are known):
$$w\,(T_i - T_{r,i})^2 + \sum_{j=1}^{M}\left(T_i - k_j u_{ij} - t_j\right)^2 \rightarrow \min,$$
with the result
$$T_i = \frac{w\,T_{r,i} + \sum_{j=1}^{M}\left(k_j u_{ij} + t_j\right)}{w + M},$$
where w is the weight of the standard instrument (select a high w if the accuracy of the standard is high).

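Example Calibration (5 of 5) SIMULTANEOUS EXM8
The best approximation of the bath temperature Ti can be used instead of Tri, and the whole procedure repeated until convergence is achieved:
• Data: w, uij, Tri (i=1,2,…,N; j=1,2,…,M)
• Identify kj, tj by linear regression for each probe j=1,2,…,M
• Evaluate the best estimate of Ti for i=1,2,…,N
• Do kj, tj converge? No: repeat the regression with Ti in place of Tri. Yes: result kj, tj.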
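
The iteration can be sketched as follows. The sketch assumes linear probe characteristics T = kj·u + tj and a bath temperature estimate equal to the weighted average of the reference reading (weight w) and the M probe predictions (weight 1 each). All names, the probe constants and the readings are hypothetical.

```python
def fit_line(u, T):
    """Least-squares k, t of T = k*u + t (normal equations of a straight line)."""
    n = len(u)
    su, sT = sum(u), sum(T)
    suu = sum(x * x for x in u)
    suT = sum(x * y for x, y in zip(u, T))
    det = n * suu - su * su
    k = (n * suT - su * sT) / det
    t = (suu * sT - su * suT) / det
    return k, t


def calibrate_simultaneously(u, T_ref, w, tol=1e-10, max_iter=100):
    """u[i][j]: voltage of probe j in measurement i; T_ref[i]: reference reading."""
    N, M = len(u), len(u[0])
    T = list(T_ref)                      # first pass uses the reference values
    coeffs = None
    for _ in range(max_iter):
        new = [fit_line([u[i][j] for i in range(N)], T) for j in range(M)]
        # Best estimate of the bath temperature in each measurement
        T = [(w * T_ref[i] + sum(k * u[i][j] + t for j, (k, t) in enumerate(new)))
             / (w + M) for i in range(N)]
        converged = coeffs is not None and max(
            abs(a - b) for old, cur in zip(coeffs, new) for a, b in zip(old, cur)
        ) < tol
        coeffs = new
        if converged:
            break
    return coeffs, T


# Hypothetical data: two probes, four bath temperatures, slightly inaccurate reference
k_true, t_true = [25.0, 24.0], [1.0, -0.5]
T_bath = [20.0, 40.0, 60.0, 80.0]
u = [[(T - t) / k for k, t in zip(k_true, t_true)] for T in T_bath]
T_ref = [20.2, 39.9, 60.1, 79.8]

coeffs, T_est = calibrate_simultaneously(u, T_ref, w=1.0)
print("k, t of each probe:", coeffs)
print("estimated bath temperatures:", T_est)
```

A small weight w lets the probes dominate the temperature estimate, while a large w keeps the estimate close to the reference readings, in which case the result approaches the separate calibration of each probe.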

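Example Laser scanner (1 of 2) EXM8
How to identify a circle (centre x0, y0 and radius R), given a set of points xi, yi?
[Figure: scanned points (xi, yi) scattered around a circle with centre (x0, y0) in the x-y plane]
A least-squares fit, for example minimising $\sum_i\left(\sqrt{(x_i-x_0)^2+(y_i-y_0)^2} - R\right)^2$, requires zero derivatives with respect to x0, y0 and R. But this is a system of 3 nonlinear equations.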

Example Laser scanner (2 of 2) EXM8
How to identify a circle, given a set of points xi, yi?
[Figure: circle with centre (x0, y0) passing through three points (x1, y1), (x2, y2), (x3, y3)]
Three points define a circle. So you can evaluate all triplets (for n=100 this is 161700 radii) and estimate the radius by their average.
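
The triplet approach can be sketched with the standard circumcircle formula. The function names and the synthetic points (lying exactly on a circle of radius 2) are assumptions of this example; real scanner data would contain noise and nearly collinear triplets, which is why triplets with a vanishing determinant are skipped.

```python
import itertools
import math


def circle_through(p1, p2, p3):
    """Centre and radius of the circle through three points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        return None
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    x0 = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    y0 = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return x0, y0, math.hypot(x1 - x0, y1 - y0)


def average_radius(points):
    """Average the radii of the circles defined by all triplets of points."""
    radii = []
    for triplet in itertools.combinations(points, 3):
        circle = circle_through(*triplet)
        if circle is not None:
            radii.append(circle[2])
    return sum(radii) / len(radii)


# Synthetic points on an arc of a circle with centre (1, -1) and radius 2
points = [(1.0 + 2.0 * math.cos(0.2 * i), -1.0 + 2.0 * math.sin(0.2 * i))
          for i in range(12)]

print("number of triplets for n=100:", math.comb(100, 3))
print("estimated radius:", average_radius(points))
```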
