LECTURE 3: ANALYSIS OF EXPERIMENTAL DATA

MEASUREMENT AND INSTRUMENTATION BMCC 3743 LECTURE 3: ANALYSIS OF EXPERIMENTAL DATA Mochamad Safarudin Faculty of Mechanical Engineering, UTeM 2010

Contents • Introduction • Measures of dispersion • Parameter estimation • Criterion for rejecting questionable data points • Correlation of experimental data

Introduction • Statistical analysis is needed in all measurements with random inputs, e.g. random broadband sound/noise • Examples: tyre/road noise, rain drops, a waterfall • Some important terms are: • Random variable (continuous or discrete), histogram, bins, population, sample, distribution function, parameter, event, statistic, probability.

Terminology • Population : the entire collection of objects, measurements, observations and so on whose properties are under consideration • Sample: a representative subset of a population on which an experiment is performed and numerical data are obtained

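Measures of dispersion => Measures of data spread or variability • Deviation (error) is defined as $d_i = x_i - \bar{x}$ • Mean deviation is defined as $\bar{d} = \frac{1}{n}\sum_{i=1}^{n} |d_i|$ • Population standard deviation is defined as $\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)^2}$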

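Measures of dispersion • Sample standard deviation is defined as $S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$ • The divisor $n - 1$ is used when the data of a sample are used to estimate the population standard deviation. • Variance is the square of the standard deviation: $\sigma^2$ for a population, $S^2$ for a sample.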

Exercise • Find the mean, median, standard deviation and variance of these measurements: 1089, 1092, 1094, 1095, 1098, 1100, 1104, 1105, 1107, 1108, 1110, 1112, 1115

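Answer to exercise • Mean = 1102.2 • Median = 1104 • Standard deviation = 7.89 using the population formula ($\sigma$); the sample value is $S$ = 8.21 • Variance = 62.18 using the population formula ($\sigma^2$); the sample value is $S^2$ = 67.36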
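The dispersion measures for the exercise data can be checked with Python's standard `statistics` module, as a quick sketch: `pstdev`/`pvariance` divide by n (population formulas) and `stdev`/`variance` divide by n − 1 (sample formulas). The module has no mean-deviation function, so that one is computed directly.

```python
import statistics

# Exercise data
data = [1089, 1092, 1094, 1095, 1098, 1100, 1104,
        1105, 1107, 1108, 1110, 1112, 1115]

mean = statistics.mean(data)
median = statistics.median(data)

# Deviations d_i = x_i - mean, then the mean (absolute) deviation
deviations = [x - mean for x in data]
mean_deviation = sum(abs(d) for d in deviations) / len(data)

# Population formulas divide by n; sample formulas divide by n - 1
pop_std = statistics.pstdev(data)
pop_var = statistics.pvariance(data)
sample_std = statistics.stdev(data)
sample_var = statistics.variance(data)

print(f"mean = {mean:.1f}, median = {median}")
print(f"mean deviation = {mean_deviation:.2f}")
print(f"population: sigma = {pop_std:.2f}, sigma^2 = {pop_var:.2f}")
print(f"sample:     S = {sample_std:.2f}, S^2 = {sample_var:.2f}")
```

The gap between the two pairs of results shrinks as n grows, since n and n − 1 become nearly equal.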

Parameter estimation • In general, the population mean $\mu$ is estimated by the sample mean $\bar{x}$. • The population standard deviation $\sigma$ is estimated by the sample standard deviation $S$.

Interval estimation of the population mean • The confidence interval is the interval from $\bar{x} - \delta$ to $\bar{x} + \delta$, where $\delta$ is an uncertainty. • The confidence level is the probability that the population mean falls within the specified interval: $P(\bar{x} - \delta \le \mu \le \bar{x} + \delta)$

Interval estimation of the population mean • It is normally expressed in terms of $\alpha$, also called the level of significance, where confidence level $= 1 - \alpha$ • If n is sufficiently large (> 30), the central limit theorem can be applied to estimate the population mean.

Central limit theorem • If the original population is normal, then the distribution of the sample means $\bar{x}$ is normal (Gaussian) • If the original population is not normal and n is large, then the distribution of the sample means is normal • If the original population is not normal and n is small, then the sample means follow a normal distribution only approximately.

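Normal (Gaussian) distribution • When n is large, $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, where $\sigma/\sqrt{n}$ is the standard deviation of the sample means (for large n, $\sigma \approx S$) • Rearranged to get $\mu = \bar{x} \pm z\,\dfrac{\sigma}{\sqrt{n}}$ • Or, with confidence level $1 - \alpha$: $\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$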
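A minimal sketch of the large-sample interval for the mean, assuming n > 30 so that $\sigma$ can be replaced by $S$. Instead of reading the area table, $z_{\alpha/2}$ is obtained from `statistics.NormalDist().inv_cdf`; the function name `mean_interval_z` and the 40 readings are hypothetical, for illustration only.

```python
import math
import statistics

def mean_interval_z(data, confidence=0.95):
    """Large-sample confidence interval for the population mean.

    Assumes n > 30, so the sample standard deviation S stands in for sigma.
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    alpha = 1.0 - confidence
    # z_{alpha/2}: standard normal value with area alpha/2 to its right
    z = statistics.NormalDist().inv_cdf(1.0 - alpha / 2.0)
    delta = z * s / math.sqrt(n)                 # uncertainty of the mean
    return x_bar - delta, x_bar + delta

# 40 hypothetical readings, for illustration only
readings = [10.0 + 0.1 * (i % 7) for i in range(40)]
low, high = mean_interval_z(readings, confidence=0.95)
print(f"95% interval for the mean: {low:.3f} to {high:.3f}")
```

For 95 % confidence, `inv_cdf(0.975)` returns about 1.96, the same value the area table gives for an area of 0.475 between 0 and z.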

Table: area under the standard normal curve from 0 to z

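Student’s t distribution • When n is small, $t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$, with $\nu = n - 1$ degrees of freedom • Rearranged to get $\mu = \bar{x} \pm t\,\dfrac{S}{\sqrt{n}}$ • Or, with confidence level $1 - \alpha$: $\bar{x} - t_{\alpha/2,\nu}\dfrac{S}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2,\nu}\dfrac{S}{\sqrt{n}}$ • Table: Student’s t distribution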
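For small samples the same calculation uses $t_{\alpha/2,\nu}$ and $S$. Python's standard library has no inverse Student's t function, so in this sketch the caller passes the table value; the function name `mean_interval_t` is hypothetical. The usage applies it to the exercise data (n = 13, $\nu$ = 12), for which the two-sided 95 % critical value is $t_{0.025,12} = 2.179$.

```python
import math
import statistics

def mean_interval_t(data, t_crit):
    """Small-sample confidence interval for the population mean.

    t_crit is t_{alpha/2, nu} with nu = n - 1, read from a t table by the
    caller (the standard library has no inverse Student's t function).
    """
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)           # sample standard deviation (n - 1)
    delta = t_crit * s / math.sqrt(n)
    return x_bar - delta, x_bar + delta

# Exercise data: n = 13, nu = 12, t_{0.025, 12} = 2.179 for 95 % confidence
data = [1089, 1092, 1094, 1095, 1098, 1100, 1104,
        1105, 1107, 1108, 1110, 1112, 1115]
low, high = mean_interval_t(data, t_crit=2.179)
print(f"95% interval for the mean: {low:.1f} to {high:.1f}")
```

The resulting interval is roughly 1097.3 to 1107.2, wider than a z-based interval would be for the same S because t > z when n is small.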

Interval estimation of the population variance • Similar to the above, but now using the chi-squared distribution, $\chi^2 = \dfrac{\nu S^2}{\sigma^2}$ (always positive), where $\nu = n - 1$ is the number of degrees of freedom

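Interval estimation of the population variance • Hence, the confidence interval on the population variance, with confidence level $1 - \alpha$, is $\dfrac{\nu S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \dfrac{\nu S^2}{\chi^2_{1-\alpha/2}}$, where $\chi^2_{\alpha/2}$ is the value with area $\alpha/2$ to its right • Table: chi-squared distribution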
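A sketch of the variance interval, with the chi-squared critical values supplied by the caller from a table (the standard library has no chi-squared quantile function). `variance_interval` is a hypothetical name. For the exercise data, $\nu$ = 12 and the 95 % table values are $\chi^2_{0.025} = 23.337$ and $\chi^2_{0.975} = 4.404$, using the convention that the subscript is the area to the right.

```python
import statistics

def variance_interval(data, chi2_right, chi2_left):
    """Confidence interval for the population variance sigma^2.

    chi2_right: chi^2_{alpha/2}, the value with area alpha/2 to its right
    chi2_left:  chi^2_{1-alpha/2}, the value with area 1 - alpha/2 to its right
    Both are read from a chi-squared table with nu = n - 1.
    """
    nu = len(data) - 1
    s2 = statistics.variance(data)      # sample variance S^2 (divides by n - 1)
    return nu * s2 / chi2_right, nu * s2 / chi2_left

data = [1089, 1092, 1094, 1095, 1098, 1100, 1104,
        1105, 1107, 1108, 1110, 1112, 1115]
low_var, high_var = variance_interval(data, chi2_right=23.337, chi2_left=4.404)
print(f"95% interval for sigma^2: {low_var:.1f} to {high_var:.1f}")
```

This gives roughly $34.6 \le \sigma^2 \le 183.5$. Unlike the interval on the mean, it is not symmetric about $S^2$ because the chi-squared distribution is skewed.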


Criterion for rejecting questionable data points • To eliminate data that have a low probability of occurrence, use the Thompson τ test. • Example: the data consist of nine values, $D_n$ = 12.02, 12.05, 11.96, 11.99, 12.10, 12.03, 12.00, 11.95 and 12.16. • $\bar{D}$ = 12.03, S = 0.07 • So, calculate the deviation of each point from the mean, $\delta_i = |D_i - \bar{D}|$; the largest is $\delta_9 = |12.16 - 12.03| = 0.13$.

Criterion for rejecting questionable data points • From Thompson’s τ table, read τ for n = 9. • Compare $\delta_9$ with $\tau S$: since $\delta_9 > \tau S$, $D_9$ = 12.16 should be discarded. • Recalculate S and $\bar{D}$ to obtain 0.05 and 12.01, respectively. • Hence, for n = 8, repeating the test with the new τ shows that the remaining data stay. • Table: Thompson’s τ
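One pass of the rejection test can be sketched as follows. `thompson_pass` is a hypothetical helper, and τ must come from the Thompson τ table for the current n. The value 1.777 in the usage line is the tabulated modified Thompson τ for n = 9 at a 5 % significance level; check it against the table used in the lecture before relying on it.

```python
import statistics

def thompson_pass(data, tau):
    """One pass of the Thompson tau outlier test (hypothetical helper).

    Finds the point farthest from the mean and flags it for rejection when
    its deviation exceeds tau * S. tau depends on n and is supplied by the
    caller from a Thompson tau table.
    """
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)                   # sample standard deviation
    suspect = max(data, key=lambda x: abs(x - x_bar))
    delta = abs(suspect - x_bar)
    return suspect, delta, delta > tau * s

data = [12.02, 12.05, 11.96, 11.99, 12.10, 12.03, 12.00, 11.95, 12.16]
# 1.777: tabulated modified Thompson tau for n = 9, alpha = 0.05 (assumption)
suspect, delta, reject = thompson_pass(data, tau=1.777)
print(f"suspect = {suspect}, delta = {delta:.3f}, reject = {reject}")
```

With these values the function flags 12.16 for rejection. After a point is removed, n changes, so τ must be looked up again before the next pass.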


Correlation of experimental data • Correlation coefficient • Least-squares linear fit • Linear regression using data transformation

A) Correlation coefficient • Case I: strong, linear relationship between x and y • Case II: weak or no relationship • Case III: an apparent relationship that may be due to pure chance => use the correlation coefficient $r_{xy}$ to identify Case III

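Linear correlation coefficient • Given as $r_{xy} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$, where $\bar{x}$ and $\bar{y}$ are the means of x and y, and $-1 \le r_{xy} \le 1$ • +1 means a perfectly linear relationship with positive slope • -1 means a perfectly linear relationship with negative slope • 0 means no linear correlation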

Linear correlation coefficient • In practice, a table of critical values $r_t$ is used to identify Case III. • If the experimental value $|r_{xy}|$ is equal to or greater than $r_t$ from the table, a linear relationship exists. • If $|r_{xy}|$ is less than $r_t$, the correlation is attributed to pure chance => no linear relationship exists.
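A direct implementation of the $r_{xy}$ formula and the table decision rule, as a sketch. The x/y data are hypothetical, and 0.878 is the two-tailed 5 % critical value commonly tabulated for n = 5 ($\nu = n - 2 = 3$); the table the lecture uses may be indexed differently.

```python
import math

def correlation_coefficient(x, y):
    """Linear correlation coefficient r_xy from paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def linear_relationship_exists(r_xy, r_t):
    """Table rule: |r_xy| >= r_t means a linear relationship exists."""
    return abs(r_xy) >= r_t

# Hypothetical paired data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation_coefficient(x, y)
flag = linear_relationship_exists(r, r_t=0.878)   # assumed critical value, n = 5
print(f"r_xy = {r:.4f}, linear relationship: {flag}")
```

Comparing $|r_{xy}|$ rather than $r_{xy}$ lets the same rule cover negative slopes.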

B) Least-squares linear fit To get the best straight line through the data: • Simple approach: ruler and eye • More systematic approach: least squares • The variation in the data is assumed to be normally distributed and due to random causes • To get Y = ax + b, it is assumed that the y values vary randomly and the x values have no error.

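Least-squares best fit • For each value $x_i$, the error in the y value is $e_i = y_i - (a x_i + b)$ • Then, the sum of the squared errors is $E = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (a x_i + b)\right]^2$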

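Least-squares best fit • Minimising E by setting $\partial E/\partial a = 0$ and $\partial E/\partial b = 0$, then solving for a and b, gives $a = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$ and $b = \dfrac{\sum x_i^2 \sum y_i - \sum x_i \sum x_i y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}$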
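The closed-form expressions for a and b translate directly into code. `least_squares_fit` and the data are hypothetical; x is treated as error-free, as the method assumes.

```python
def least_squares_fit(x, y):
    """Least-squares straight line Y = a x + b (x values assumed error-free)."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sum_xx - sum_x ** 2
    a = (n * sum_xy - sum_x * sum_y) / denom      # slope
    b = (sum_xx * sum_y - sum_x * sum_xy) / denom  # intercept
    return a, b

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = least_squares_fit(x, y)
print(f"Y = {a:.2f}x + {b:.2f}")
```

For these data the fit is Y = 1.99x + 0.05. The denominator is zero only when all $x_i$ are equal, in which case no unique line exists.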

Least-squares best fit • Substituting the values of a and b into Y = ax + b gives the least-squares best fit. • To measure how well the best-fit line represents the data, calculate the standard error of estimate, $S_{y,x} = \sqrt{\dfrac{\sum_{i=1}^{n}(y_i - Y_i)^2}{n - 2}}$, where $Y_i = a x_i + b$ and $S_{y,x}$ is the standard deviation of the differences between the data points and the best-fit line. Its unit is the same as that of y.

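Coefficient of determination • The coefficient of determination is another good measure of how well the best-fit line represents the data: $r^2 = 1 - \dfrac{\sum_{i=1}^{n}(y_i - Y_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$ • For a good fit, $r^2$ must be close to unity.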
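Both goodness-of-fit measures can be computed from the residuals $y_i - Y_i$ of the least-squares line, as sketched below. The slope and intercept are computed in the mean-centred form, which is algebraically the same least-squares fit; `fit_quality` and the data are hypothetical.

```python
import math

def fit_quality(x, y):
    """Standard error of estimate S_yx and coefficient of determination r^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope a and intercept b (mean-centred form)
    a = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    b = y_bar - a * x_bar
    # Sum of squared residuals y_i - Y_i about the line Y = a x + b
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))              # same units as y
    sst = sum((yi - y_bar) ** 2 for yi in y)     # total variation about y_bar
    r2 = 1.0 - sse / sst
    return s_yx, r2

# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
s_yx, r2 = fit_quality(x, y)
print(f"S_yx = {s_yx:.3f}, r^2 = {r2:.4f}")
```

For these data $S_{y,x} \approx 0.19$ (in units of y) and $r^2 \approx 0.997$, indicating a good fit. The divisor n − 2 reflects the two parameters a and b estimated from the data.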

C) Linear regression using data transformation • Some special cases, such as $y = a e^{bx}$, can be linearised. • Taking the natural logarithm of both sides gives $\ln y = \ln a + b x$, where ln(a) is a constant, so ln(y) is linearly related to x.
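A sketch of the data-transformation approach for $y = a e^{bx}$: fit a straight line to ln(y) versus x by least squares, then recover $a = e^{\text{intercept}}$ and b as the slope. `exponential_fit` and the test data are hypothetical; all y values must be positive.

```python
import math

def exponential_fit(x, y):
    """Fit y = a * exp(b * x) via least squares on ln(y) versus x.

    Every y must be positive. The errors minimised are those in ln(y).
    """
    ln_y = [math.log(yi) for yi in y]
    n = len(x)
    x_bar = sum(x) / n
    l_bar = sum(ln_y) / n
    # Straight-line fit ln(y) = ln(a) + b x
    b = (sum((xi - x_bar) * (li - l_bar) for xi, li in zip(x, ln_y))
         / sum((xi - x_bar) ** 2 for xi in x))
    ln_a = l_bar - b * x_bar
    return math.exp(ln_a), b

# Hypothetical exact data generated from y = 2 exp(0.5 x)
x = [0, 1, 2, 3, 4]
y = [2.0 * math.exp(0.5 * xi) for xi in x]
a, b = exponential_fit(x, y)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Because the errors are minimised in ln(y) rather than in y, this weights points with small y more heavily than a direct nonlinear fit would.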

Example • Thermocouples are usually approximately linear devices over a limited temperature range. A manufacturer of a brand of thermocouple has obtained the following data for a pair of thermocouple wires: Determine the linear correlation between T and V.

Solution: Tabulate the data using this table:

Another example • The following measurements were obtained in the calibration of a pressure transducer: • Determine the best-fit straight line • Find the coefficient of determination for the best fit

Y=6.56x-0.06

From the result above, the coefficient of determination $r^2$ can be found by tabulating the following values: $r^2$ =

Next Lecture Experimental Uncertainty Analysis End of Lecture 3
