Data Analysis I

**Data Analysis I**
Anthony E. Butterfield
CH EN 4903-1

"When a man finds a conclusion agreeable, he accepts it without argument, but when he finds it disagreeable, he will bring against it all the forces of logic and reason." ~ Thucydides (460 – 395 BC)

**Data Analysis**
• Reasons for data analysis, using our π data.
• Basics of data analysis.
• Statistics.
• Probability distributions.
• Confidence intervals.
• Error propagation.
• Rejecting data.
• Hypothesis testing.
• Fitting data.

http://www.che.utah.edu/~geoff/writing/index.html

**Analysis of Our Experiment**
• Hypothesis: Stuff that looks like a circle is a circle.
• We have our data… What now? Is the hypothesis true?

| Object Name | Width | Perimeter |
|---|---|---|
| Battery | 4.4 ± 0.1 | 14.0 ± 0.1 |
| Scotch Tape | 2.6 ± 0.0 | 8.2 ± 0.0 |
| Duct Tape | 5.3 ± 0.1 | 16.8 ± 0.3 |
| Floppy | 6.3 ± 0.1 | 19.0 ± 1.0 |
| Fitting | 8.8 ± 0.0 | 27.7 ± 0.2 |
| Gold Doubloon | 3.5 ± 0.0 | 10.7 ± 0.2 |
| Red Cap | 4.1 ± 0.5 | 12.9 ± 1.0 |
| White Cap | 4.0 ± 0.0 | 12.5 ± 0.0 |
| Black Cap | 7.8 ± 0.0 | 24.6 ± 0.0 |
| Soup Can | 6.7 ± 0.5 | 21.3 ± 0.1 |
| Frisbee | 8.8 ± 0.1 | 27.8 ± 0.5 |
| Poker Chip | 27.0 ± 0.5 | 85.0 ± 1.0 |
| Toy Wheel | 5.6 ± 0.1 | 17.1 ± 0.2 |
| Spool of Wire | 25.9 ± 0.1 | 81.5 ± 0.1 |
| Plastic Cup | 9.8 ± 0.0 | 31.4 ± 0.0 |
| Paper Cup | 2.9 ± 0.0 | 9.3 ± 0.0 |

**Results from Our Experiment**
• Good news: The average "π" we found is pretty close to π.
• But is it close enough?
• Other issues: Precision, accuracy, types of error?

**For or Against**
• "π" ≈ π
  • Confidence in our hypothesis is increased.
  • Nothing is "proven".
  • Publish results: A.E. Butterfield, et al., "The Circularity of Circular Looking Stuff", Nature, 2009.
• "π" Does Not ≈ π
  • Confidence in our hypothesis is diminished.
  • Going against robust "theory": Check methods, calculations, take more data…
  • Good luck publishing…

**Data Analysis, Big Picture**
• We need an objective means to avoid Thucydides' criticism and to decide impartially whether our data support or undermine our favored hypothesis.
• "The method of science, as stodgy and grumpy as it may seem, is far more important than the findings of science." ~ Carl Sagan, The Demon-Haunted World

**Types of Data Analysis**
• Quality vs. Quantity
  • Quantitative: "The temperature is 45.2 ± 0.1 °C (95% CL)."
  • Semi-Quantitative: "The temperature is above 0 °C."
  • Qualitative: "It's hot."
• Structural Analysis – What is its structure?
• Content Analysis – What is in it?
• Distribution Analysis – Where is it?
• Process Analysis – When does it occur?

**Basics of Statistics**
• Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Deviation: $d_i = x_i - \bar{x}$
• Standard Deviation: $\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} d_i^2}$
• Variance: $\sigma^2$

**Discrete Probability Distributions**
• Random variable x can take on n different values, x1, x2, …, xn, with probabilities P1, P2, …, Pn, respectively.
• Examples:

**Continuous Probability Distributions**
• A probability density function describes the probability that a continuous variable will fall within a particular range.
• Examples:

**Central Limit Theorem**
• The sum of a sufficiently large number of independent and identically distributed random variables with finite variance is approximately normally distributed, regardless of the original distribution.

**Normal Distribution**
• AKA: Gaussian distribution, bell curve.
• One of the most common distributions in nature and, therefore, in data analysis.
• Probability density function (PDF): $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$

**Normal Cumulative Distribution Function**
• Integrate the PDF from -∞ to x.
• The probability that a value will be below x.

**Normal Examples**
• What is the probability, with μ = 0 and σ = 1, of the measurement being exactly 0? 0%
• What is the probability of measuring a value between -0.5 and 1.5, with μ = 0 and σ = 1? 62%
• Between -1 and 3, with μ = 0 and σ = 2? 62%

**An Abnormal Distribution**
• Log-normal Distribution
• Used when random variables multiply.
• Particle sizes often follow this distribution.

**Normal Confidence Intervals**
• A range within which a parameter lies, with a given probability.
• Confidence intervals for normal distributions:

**C.I. for Single Measurements**
• Gauges / Rulers.
  • Estimated by the distinguishable increments.
  • In our experiment?
• Digital readouts.
  • Often ± the smallest digital precision available.
• Fluctuating values.
  • Use the range of fluctuation over an appropriate amount of time.
• Example readings: 2.41 ± 0.01 (0.4%), 3.11 ± 0.01 (0.3%), 1.27 ± 0.02 (1.6%)

**Error Propagation**
• For addition or subtraction, z = x ± y, intuition may suggest: $\sigma_z = \sigma_x + \sigma_y$
• But it is unlikely that the extremes of error will occur twice: $\sigma_z = \sqrt{\sigma_x^2 + \sigma_y^2}$
• Multiplication or division: $\frac{\sigma_z}{|z|} = \sqrt{\left(\frac{\sigma_x}{x}\right)^2 + \left(\frac{\sigma_y}{y}\right)^2}$
• Our π data, with perimeter P and width W: $\frac{\sigma_\pi}{\pi} = \sqrt{\left(\frac{\sigma_P}{P}\right)^2 + \left(\frac{\sigma_W}{W}\right)^2}$
• In general, for f(x1, x2, …, xn): $\sigma_f^2 = \sum_{i=1}^{n} \left(\frac{\partial f}{\partial x_i}\right)^2 \sigma_i^2$

**Some Examples**
• Calculate interfacial tension between a liquid and a solid:
• If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v? The uncertainty in v combines T's contribution and P's contribution.

**A Better, Numerical Method**
• Can be used for problems that are solved numerically.
• Adding σi versus subtracting it may give different results.

**An Example**
• If T = 25 ± 1 °C and P = 101 ± 2 kPa, what is v of an ideal gas?
• P is the biggest source of error.

**Chauvenet's Criterion**
• A statistically justifiable means of rejecting outlying data (illegitimate error) may be desired.
• A datum may be rejected if the probability of a measurement deviating that far from the mean on a normal distribution, times the number of measurements, is less than 50%.
• Tossing data out is suspect, though; avoid it.

**Example of Chauvenet's Criterion**
• Data from our circle experiment:
• We could toss the "floppy" datum.
• Would make our average π = 3.1465, versus 3.138.
• Further from π.
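
The Chauvenet check on the circle data can be sketched in a few lines of Python. This is a minimal sketch, not the method used to prepare the slides: the function name `chauvenet_flags` is hypothetical, the use of the sample standard deviation (n - 1) is an assumption, and the ratios are computed from the rounded widths and perimeters in the data table, so the averages may differ slightly from the 3.138 and 3.1465 quoted above.

```python
import math
import statistics

# (width, perimeter) copied from the data table; uncertainties are omitted.
DATA = {
    "Battery": (4.4, 14.0), "Scotch Tape": (2.6, 8.2),
    "Duct Tape": (5.3, 16.8), "Floppy": (6.3, 19.0),
    "Fitting": (8.8, 27.7), "Gold Doubloon": (3.5, 10.7),
    "Red Cap": (4.1, 12.9), "White Cap": (4.0, 12.5),
    "Black Cap": (7.8, 24.6), "Soup Can": (6.7, 21.3),
    "Frisbee": (8.8, 27.8), "Poker Chip": (27.0, 85.0),
    "Toy Wheel": (5.6, 17.1), "Spool of Wire": (25.9, 81.5),
    "Plastic Cup": (9.8, 31.4), "Paper Cup": (2.9, 9.3),
}


def chauvenet_flags(values):
    """Return indices of values that Chauvenet's criterion would reject.

    A value is flagged when n * P(|Z| >= z) < 0.5, where z is its
    distance from the mean in standard deviations.
    """
    n = len(values)
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # assumption: sample std (n - 1)
    flagged = []
    for i, value in enumerate(values):
        z = abs(value - mean) / std
        tail = math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
        if n * tail < 0.5:
            flagged.append(i)
    return flagged


names = list(DATA)
ratios = [perimeter / width for width, perimeter in DATA.values()]
flagged = [names[i] for i in chauvenet_flags(ratios)]

mean_all = statistics.mean(ratios)
kept = [r for name, r in zip(names, ratios) if name not in flagged]
mean_kept = statistics.mean(kept)

print("flagged:", flagged)
print(f"average with all data:    {mean_all:.4f}")
print(f"average without flagged:  {mean_kept:.4f}")
```

The criterion is applied once here; re-applying it to the reduced data set would be a separate choice and is not done in this sketch.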
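
Returning to the ideal gas example from the error propagation slides, the numerical method can be sketched as follows. The sketch assumes v = RT/P with R = 8.314 J/(mol K), shifts each input by its uncertainty one at a time, and combines the resulting changes in quadrature. The function names `molar_volume` and `propagate` are hypothetical, and the choice to shift each input upward (rather than downward) is an assumption.

```python
import math

R = 8.314  # J/(mol K), gas constant (assumed value)


def molar_volume(t_celsius, p_kpa):
    """Ideal-gas molar volume in L/mol for T in deg C and P in kPa."""
    return R * (t_celsius + 273.15) / p_kpa


def propagate(f, values, sigmas):
    """Shift each input by its sigma in turn and combine the changes in quadrature."""
    base = f(*values)
    contributions = []
    for i, sigma in enumerate(sigmas):
        shifted = list(values)
        shifted[i] += sigma  # assumption: forward shift only
        contributions.append(f(*shifted) - base)
    total = math.sqrt(sum(c * c for c in contributions))
    return base, total, contributions


v, sigma_v, (dv_t, dv_p) = propagate(molar_volume, [25.0, 101.0], [1.0, 2.0])

print(f"v = {v:.2f} +/- {sigma_v:.2f} L/mol")
print(f"T's contribution: {dv_t:+.3f} L/mol")
print(f"P's contribution: {dv_p:+.3f} L/mol")
```

Passing negative uncertainties, for example `[-1.0, -2.0]`, subtracts them instead and gives slightly different contributions, which is the asymmetry the numerical-method slide mentions.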