
Bruce Mayer, PE Licensed Electrical & Mechanical Engineer BMayer@ChabotCollege

Engr/Math/Physics 25. Chp7 Statistics-1. Bruce Mayer, PE Licensed Electrical & Mechanical Engineer BMayer@ChabotCollege.edu. Learning Goals: Use MATLAB to solve Problems in Statistics and Probability. Use Monte Carlo (random) Methods to Simulate Random processes.


Presentation Transcript


  1. Engr/Math/Physics 25 • Chp7 Statistics-1 • Bruce Mayer, PE • Licensed Electrical & Mechanical Engineer • BMayer@ChabotCollege.edu

  2. Learning Goals • Use MATLAB to solve Problems in • Statistics • Probability • Use Monte Carlo (random) Methods to Simulate Random processes • Properly Apply Interpolation or Extrapolation to Estimate values between or outside of known data points

  3. Histogram • Histograms are COLUMN Plots that show the Distribution of Data • Height Represents Data Frequency • Some General Characteristics • Used to represent continuous grouped, or BINNED, data • BIN ≡ SubRange within the Data • Usually Does not have any gaps between bars • Areas represent %-of-Total Data

  4. HistoGram ≡ Frequency Chart • A HistoGram shows how OFTEN some event Occurs • Histograms are often constructed using Frequency Tables

  5. The Simplest Histograms In MATLAB • MATLAB has 6 Forms of the Histogram Cmd • hist(y) Generates a Histogram with 10 bins • Example: Max Temp at Oakland AirPort in Jul-Aug08 • TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, 67, 69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, 71, 66, 66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, 70, 68, 65, 67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, 69, 67]; • The Plot Statement: hist(TmaxOAK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland Airport - Jul-Aug08')

  6. hist Result for Oakland • It was COLD in Summer 08 • Bin Width = (91-63)/10 = 2.8 °F
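The bin-width arithmetic on this slide can be checked from the command line. The following is a minimal sketch, assuming the TmaxOAK vector from the previous slide; the variable names nBins, binWidth, counts, and centers are chosen here for illustration.

```matlab
% Oakland daily max temperatures (degF), Jul-Aug 2008, copied from the slide
TmaxOAK = [70, 75, 63, 64, 65, 66, 65, 65, 67, 78, 75, 73, 79, 71, 72, ...
           67, 69, 69, 70, 74, 71, 72, 71, 74, 77, 77, 86, 90, 90, 70, ...
           71, 66, 66, 72, 68, 73, 72, 82, 91, 82, 76, 75, 72, 72, 69, ...
           70, 68, 65, 67, 65, 63, 64, 72, 70, 68, 71, 77, 65, 63, 69, ...
           69, 67];

nBins = 10;                                        % hist(y) uses 10 bins by default
binWidth = (max(TmaxOAK) - min(TmaxOAK)) / nBins;  % (91 - 63)/10

% Request the counts and bin centers instead of a plot
[counts, centers] = hist(TmaxOAK);

fprintf('Bin width from data range: %.1f degF\n', binWidth);
fprintf('Spacing of bin centers:    %.1f degF\n', centers(2) - centers(1));
fprintf('Total days counted:        %d\n', sum(counts));
```

Both width figures should agree at 2.8 °F, and the counts should add up to the 62 days in the data set. In recent MATLAB releases the documentation recommends histogram over hist, although hist still works.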

  7. Histograms In MATLAB • Next Example: Max Temp at Stockton AirPort in Jul-Aug08 • hist(y) Generates a Histogram with 10 bins • TmaxSTK = [94, 98, 93, 94, 91, 96, 93, 87, 89, 94, 100, 99, 103, 103, 103, 97, 91, 83, 84, 90, 89, 95, 94, 99, 97, 94, 102, 103, 107, 98, 86, 89, 95, 91, 84, 93, 98, 104, 105, 107, 103, 91, 90, 96, 93, 86, 92, 93, 95, 95, 86, 81, 93, 97, 96, 97, 101, 92, 89, 92, 93, 94]; • The Plot Statement: hist(TmaxSTK), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton Airport - Jul-Aug08')

  8. hist Result for Stockton • It was HOT in Summer 08 • Bin Width = (107-81)/10 = 2.6 °F

  9. hist Command Refinements • Adjust The number and width of the bins using hist(y,N) or hist(y,x) • Where • N ≡ an integer specifying the NUMBER of Bins • x ≡ A vector that Specs the CENTERs of the Bins • Consider Summer 08 Max-Temp Data from Oakland and Stockton • Make 2 Histograms • 17 bins • 60 °F → 110 °F in steps of 2.5 °F

  10. hist Plots: 17 Bins >> hist(TmaxOAK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08') >> hist(TmaxSTK,17), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')

  11. hist Plots: Same Scale >> x = [60:2.5:110]; >> hist(TmaxOAK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Oakland, CA - Jul-Aug08') >> hist(TmaxSTK,x), ylabel('No. Days'), xlabel('Max. Temp (°F)'), title('Stockton, CA - Jul-Aug08')

  12. hist Numerical Output • hist can also provide numerical Data about the Histogram • n = hist(y) Gives the number of values in each of the (default) 10 Bins • For the Stockton data: k = 2 5 1 10 16 7 9 2 7 3 • We can also spec the number and/or Width of Bins >> k13 = hist(TmaxSTK,13) k13 = 2 2 4 4 6 10 10 7 5 2 6 2 2 >> k2_5s = hist(TmaxOAK,x)

  13. hist Numerical Output • Bin-Count and Bin-Locations (Frequency Table) for the Oakland Data >> [u, v] = hist(TmaxOAK,x) u = 0 3 11 7 15 9 6 4 1 2 1 0 3 0 0 0 0 0 0 0 0 v = 60.0000 62.5000 65.0000 67.5000 70.0000 72.5000 75.0000 77.5000 80.0000 82.5000 85.0000 87.5000 90.0000 92.5000 95.0000 97.5000 100.0000 102.5000 105.0000 107.5000 110.0000

  14. Histogram Commands - 1

  15. Histogram Commands - 2

  16. Data Statistics Tool - 1 • Make a Line-Plot of the Temp Data for Stockton, CA • Use the Tools Menu to find the Data Statistics Tool • Time for LIVE Demo

  17. Data Statistics Tool - 2 • Use the Tool to Add Plot Lines for The Mean ± StdDev

  18. Data Statistics Tool - 3 • The Result: Quite a Nice Tool, Actually • The Avg Max Temp Was 96.97 °F

  19. Probability • Probability ≡ The LIKELIHOOD that a Specified OutCome Will be Realized • The “Odds” Run from 0% to 100% • Class Question: What are the Odds of winning the California MEGA-MILLIONS Lottery? • Exactly! 175 711 536 : 1

  20. 175 711 536 ... EXACTLY???!!! • To Win the MegaMillions Lottery • Pick five numbers from 1 to 56 • Pick a MEGA number from 1 to 46 • The Odds for the 1st ping-pong Ball = 5 out of 56 • The Odds for the 2nd ping-pong Ball = 4 out of 55, and so On • The Odds for the MEGA are 1 out of 46

  21. 175 711 536 ... Calculated • Calc the OverAll Odds as the PRODUCT of each of the Individual OutComes • This is Technically a COMBINATION
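The product of individual odds can be evaluated directly. This is a minimal sketch, assuming the per-ball odds listed on the previous slide (5/56, 4/55, and so on down to 1/52, then 1/46 for the MEGA ball); the variable names are illustrative only.

```matlab
% Odds of matching the five regular balls, in any order:
% 5/56 * 4/55 * 3/54 * 2/53 * 1/52
pFive = prod((5:-1:1) ./ (56:-1:52));

% Odds of matching the MEGA ball
pMega = 1/46;

% Overall odds as the product of the individual outcomes
oddsProduct = 1 / (pFive * pMega);

% Same count from the combination formula: C(56,5) ways to pick 5 of 56
oddsCombo = nchoosek(56, 5) * 46;

fprintf('Odds, product form:     %.0f : 1\n', oddsProduct);
fprintf('Odds, combination form: %d : 1\n', oddsCombo);
```

Both forms give 175 711 536 : 1, which is why the product of the per-ball odds is technically a combination.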

  22. 175 711 536 ... is a DEAL! • The ORDER in Which the Ping-Pong Balls are Drawn Does NOT affect the Winning Odds • If we Had to Match the Pull-Order: • This is a PERMUTATION
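To see how much of a "deal" the unordered draw is, the ordered (permutation) count can be compared with the unordered (combination) count. A sketch under the same 5-of-56 plus 1-of-46 rules stated earlier; the variable names are hypothetical.

```matlab
% Ordered draws: 56 choices for the 1st ball, 55 for the 2nd, ... 52 for the 5th,
% then 46 choices for the MEGA ball
nOrdered = prod(56:-1:52) * 46;

% Unordered draws: combinations of 5 from 56, times 46
nUnordered = nchoosek(56, 5) * 46;

% The ratio equals 5! because each set of 5 balls can be drawn in 5! orders
ratio = nOrdered / nUnordered;

fprintf('Match the pull-order: %.0f : 1\n', nOrdered);
fprintf('Any order:            %d : 1\n', nUnordered);
fprintf('Ratio:                %d (5! = %d)\n', ratio, factorial(5));
```

Having to match the pull-order would make the odds 120 times longer, about 21.1 billion to 1.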

  23. Normal Distribution - 1 • Consider Data on the Height of a sample group of 20 year old Men • We can Plot this Frequency Data using bar >> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1]; >> xbins = [64:0.5:75]; >> bar(xbins, y_abs), ylabel('No.'), xlabel('Height (inches)'), title('Height of 20 Yr-Old Men')

  24. Normal Distribution - 2 • We can also SCALE the Bar/Hist such that the AREA UNDER the CURVE equals 1.00, exactly • The Game Plan for Scaling • Calc the Total Area = [Bin Width] x [Σ(individual counts)] • The individual Bar Area = [Bin Width] x [individual count] • %-Area of any one bar → [Bar Area]/[Total Area]

  25. Normal Distribution - 3 • We can Use bar to Plot the Scaled-Area Hist. >> y_abs=[1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1]; >> xbins = [64:0.5:75]; >> TotalArea = sum(0.5*y_abs) >> y_scale = 100*y_abs/TotalArea; >> bar(xbins, y_scale), ylabel('Fraction (%/inch)'), xlabel('Height (inches)'), title('Height of 20 Yr-Old Men')

  26. Normal Distribution - 4 • This is a Good Time for a UNITS Check • Remember, our GOAL → the Area Under the Curve = 1 • Recall From the Plot the UNITS for the y-axis → %/inch (?) • The Units come from these MATLAB Statements • TotalArea = sum(0.5*y_abs), where 0.5 is the Bin Width in INCHES • So TotalArea is in inches•No. • Now y_scale: y_scale = 100*y_abs/TotalArea; • Cont. on Next Slide

  27. Normal Distribution - 5 • The Units Analysis for y_scale: y_scale = 100*y_abs/TotalArea; • Recall From MTH1 the Area Under the Curve for y = f(x) displayed in BAR Form

  28. Normal Distribution - 6 • In this Case y(x) → y_scale in %/inch • Δx → Bin Width = 0.5 in inches • Then The Units Analysis for Our “integration” • Check the integration Example

  29. Normal Distribution - 7 • Example → 71” • The 71” Bar Area = Hgt•Width • Alternatively from the Absolute values • The Total Abs Area = 50 No.•inch
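A sketch of the bar-area arithmetic, assuming the y_abs counts and the 0.5 in bin width from the earlier slides (the 71-inch bin holds a count of 7, so its scaled height is 14 %/in). The symbols y_k, Δx, and A_71 are notation chosen here, not taken from the slides.

```latex
\begin{align*}
\text{Area under the bars:}\quad
  A &= \sum_k y_k\,\Delta x
  \quad\Rightarrow\quad
  \left[\frac{\%}{\text{in}}\right]\cdot\left[\text{in}\right] = \left[\%\right] \\[4pt]
\text{71-inch bar, scaled values:}\quad
  A_{71} &= 14\,\frac{\%}{\text{in}} \times 0.5\,\text{in} = 7\,\% \\[4pt]
\text{71-inch bar, absolute values:}\quad
  A_{71} &= \frac{7\,\text{No.} \times 0.5\,\text{in}}{50\,\text{No.}\cdot\text{in}} = 0.07 = 7\,\%
\end{align*}
```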

  30. Probability Distribution Fcn (PDF) • Because the Area Under the Scaled Plot is 1.00, exactly, The FRACTIONAL Area under any bar, or set-of-bars, gives the probability that any randomly Selected 20 yr-old man will be that height • e.g., from the Plot we Find • 67.5 in → 8 %/in • 68 in → 16 %/in • 68.5 in → 22 %/in • Summing → 46 %/in • Multiply by the Uniform BinWidth of 0.5 in → 23% of 20 yr-old men are 67.25-68.75 inches tall
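The same probability can be read from the data rather than from the plot. A minimal sketch, assuming the y_abs and xbins vectors defined on the earlier slides; binWidth, inRange, and pRange are names introduced here for illustration.

```matlab
% Height counts and bin centers from the earlier slides
y_abs = [1,0,0,0,2,4,5,4,8,11,12,10,9,8,7,5,4,4,3,1,1,0,1];
xbins = 64:0.5:75;
binWidth = 0.5;                            % inches

% Scale so that the area under the bars is 100 %
TotalArea = sum(binWidth * y_abs);         % No.*inch
y_scale = 100 * y_abs / TotalArea;         % %/inch

% Area check: the scaled bars should cover 100 %
areaCheck = sum(y_scale * binWidth);

% Probability of a height in the bins centered at 67.5, 68 and 68.5 inches
inRange = (xbins >= 67.5) & (xbins <= 68.5);
pRange = sum(y_scale(inRange)) * binWidth; % percent

fprintf('Total area under scaled bars: %.1f %%\n', areaCheck);
fprintf('P(67.25 to 68.75 in):         %.1f %%\n', pRange);
```

The logical mask selects the three bars, so the sum reproduces the 46 %/in figure before the multiplication by the bin width.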

  31. Random Variable • A random variable x takes on a defined set of values with different probabilities; e.g., • If you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with equal probability of one-sixth. • If you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 101” is also a random variable • the %-age will be slightly different every time you poll. • Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (“frequentist” view)

  32. Random variables can be Discrete or Continuous • Discrete random variables have a countable number of outcomes • Examples: Dead/Alive, Red/Black, Heads/Tails, dice, counts, etc. • Continuous random variables have an infinite continuum of possible values. • Examples: blood pressure, weight, Air Temperature, the speed of a car, the real numbers from 1 to 6.

  33. Probability Distribution Functions • A Probability Distribution Function (PDF) maps the possible values of x against their respective probabilities of occurrence, p(x) • p(x) is a number from 0 to 1.0, or alternatively, from 0% to 100%. • The area under a probability distribution function curve is always 1 (or 100%).

  34. Discrete Example: Roll The Die • p(x=1) = 1/6 • p(x=2) = 1/6 • p(x=3) = 1/6 • p(x=4) = 1/6 • p(x=5) = 1/6 • p(x=6) = 1/6 • [Plot: p(x) vs. x, equal probability of 1/6 at each of x = 1, 2, 3, 4, 5, 6]
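The die-roll distribution is a natural first Monte Carlo experiment. The sketch below simulates a fair die with randi and estimates p(x) from the observed frequencies; the number of rolls and all variable names are arbitrary choices made here, not values from the slides.

```matlab
nRolls = 60000;                  % number of simulated rolls (arbitrary choice)

% Each roll is a uniformly distributed integer from 1 to 6
rolls = randi(6, nRolls, 1);

% Count how often each face came up, using bin centers 1..6
counts = hist(rolls, 1:6);
counts = counts(:).';            % force a row vector

% Estimated probability of each face
pEst = counts / nRolls;

for face = 1:6
    fprintf('p(x=%d) estimate = %.4f   (exact = %.4f)\n', face, pEst(face), 1/6);
end
```

The estimates vary slightly from run to run and move closer to 1/6 as nRolls grows, which is the frequentist view of probability described on the Random Variable slide.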

  35. Continuous Case • The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1. • The Probabilities associated with continuous functions are just areas under a Region of the curve (→ Definite Integrals) • Probabilities are given for a range of values, rather than a particular value • e.g., the probability of getting a math SAT score between 700 and 800 is 2%.

  36. Continuous Case PDF Example • Recall the negative exponential function (in probability, this is called an “exponential distribution”): p(x) = e^(-x) • This Function Integrates to 1 from zero to infinity, as required for all PDFs
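As a sketch of that normalization, assuming the density p(x) = e^(-x) for x ≥ 0 (the form labeled on the next slide's plot):

```latex
\[
\int_{0}^{\infty} e^{-x}\,dx
  = \Bigl[-e^{-x}\Bigr]_{0}^{\infty}
  = 0 - (-1)
  = 1
\]
```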

  37. Continuous Case PDF Example • The probability that x is any exact value (e.g.: 1.9976) is 0 → NO Area Under a LINE • we can ONLY assign Probabilities to possible RANGES of x • For example, the probability of x falling within 1 to 2 • [Plots: p(x) = e^(-x) vs. x, one showing the range from x = 1 to x = 2, one showing a single line]
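The range probability is a definite integral, which MATLAB can evaluate numerically. A minimal sketch, assuming p(x) = e^(-x) as on the slide; the names p, P12, Pexact, and Ptotal are introduced here.

```matlab
% Exponential PDF from the slide
p = @(x) exp(-x);

% Probability that x falls between 1 and 2 (numerical integration)
P12 = integral(p, 1, 2);

% Closed-form check: the antiderivative of exp(-x) is -exp(-x)
Pexact = exp(-1) - exp(-2);

% The whole PDF should integrate to 1
Ptotal = integral(p, 0, Inf);

fprintf('P(1 <= x <= 2), numeric: %.4f\n', P12);
fprintf('P(1 <= x <= 2), exact:   %.4f\n', Pexact);
fprintf('Total area, 0 to Inf:    %.4f\n', Ptotal);
```

The probability comes out near 0.2325, or about 23%. Shrinking the range toward a single value drives the integral to zero, which is the "no area under a line" point.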

  38. Gaussian Curve • The Man-Height HistoGram had some Limited, and thus DISCRETE, Data • If we were to Measure 10,000 (or more) young men we would obtain a HistoGram like this • As We increase the number and fineness of the measurements The PDF approaches a CONTINUOUS Curve

  39. Gaussian Distribution • A Distribution that Describes Many Physical Processes is called the GAUSSIAN or NORMAL Distribution • Gaussian (Normal) distribution • Gaussian → famous “bell-shaped curve” • Describes IQ scores, how fast horses can run, the no. of Bees in a hive, wear profile on old stone stairs... • All these are cases where: • deviation from mean is equally probable in either direction • Variable is continuous (or large enough integer to look continuous)

  40. Normal Distribution • Real-valued PDF: f(x) → −∞ < x < +∞ • 2 independent fitting parameters: µ, σ (central location and width) • Properties: • Symmetrical about Mode at µ • Median = Mean = Mode • Inflection points at µ ± σ • Area (probability of observing event) within: • ± 1σ = 0.683 • ± 2σ = 0.955 • For larger σ, bell shaped curve becomes wider and lower (since area = 1 for any σ)

  41. Normal Distribution • Mathematically • Where • σ² = Variance • µ = Mean • The Area Under the Curve
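For reference, the standard textbook form of the normal PDF and its unit-area property, written with the slide's symbols µ and σ; the slide's own typesetting may differ.

```latex
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,
       \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right],
\qquad
\int_{-\infty}^{+\infty} f(x)\,dx = 1
\]
```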

  42. 68-95-99.7 Rule for Normal Dist • 68% of the data lie within ±1σ • 95% of the data lie within ±2σ • 99.7% of the data lie within ±3σ

  43. 68-95-99.7 Rule in Math terms… • Using Definite-Integral Calculus
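The three percentages can be reproduced numerically. The sketch below integrates the standard normal PDF (µ = 0, σ = 1, an assumption made here for convenience) over ±1σ, ±2σ, and ±3σ, and cross-checks with the error function; all names are illustrative.

```matlab
k = 1:3;                                   % number of standard deviations

% Standard normal PDF (mean 0, standard deviation 1)
f = @(z) exp(-z.^2 / 2) / sqrt(2*pi);

% Definite integral of the PDF from -k to +k for each k
pInt = arrayfun(@(kk) integral(f, -kk, kk), k);

% Closed form: P(|z| <= k) = erf(k / sqrt(2))
pErf = erf(k / sqrt(2));

for i = 1:numel(k)
    fprintf('within +/-%d sigma: %.4f (integral), %.4f (erf)\n', ...
            k(i), pInt(i), pErf(i));
end
```

The results are approximately 0.6827, 0.9545, and 0.9973, which round to the 68-95-99.7 rule.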

  44. How Good is the Rule for Real? • Check some example data: • The mean, µ, of the weight of a large group of women Cross Country Runners = 127.8 lbs • The standard deviation (σ) for this Group = 15.5 lbs

  45. ±1σ → 112.3 to 143.3 lbs (mean 127.8) • 68% of 120 = 0.68 x 120 ≈ 82 runners • In fact, 79 runners fall within 1σ (15.5 lbs) of the mean

  46. ±2σ → 96.8 to 158.8 lbs (mean 127.8) • 95% of 120 = 0.95 x 120 = 114 runners • In fact, 115 runners fall within 2σ of the mean

  47. ±3σ → 81.3 to 174.3 lbs (mean 127.8) • 99.7% of 120 = 0.997 x 120 = 119.6 runners • In fact, all 120 runners fall within 3σ of the mean
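The comparison on the last three slides can be tabulated in a few lines. A sketch using only the figures quoted on the slides (µ = 127.8 lbs, σ = 15.5 lbs, 120 runners, observed counts 79, 115, and 120); the expected counts here use the unrounded normal fractions from erf, so they differ slightly from the rounded 68/95/99.7 values.

```matlab
mu = 127.8;  sigma = 15.5;  N = 120;       % values quoted on the slides
k = 1:3;                                   % number of standard deviations

lowerLim = mu - k*sigma;                   % lbs
upperLim = mu + k*sigma;                   % lbs

expectedN = N * erf(k / sqrt(2));          % runners predicted by the normal model
observedN = [79 115 120];                  % runners actually counted (from the slides)

for i = 1:numel(k)
    fprintf('+/-%d sigma: %.1f to %.1f lbs, expected %.1f, observed %d\n', ...
            k(i), lowerLim(i), upperLim(i), expectedN(i), observedN(i));
end
```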

  48. Estimating µ & σ (1) • The Location & Width Parameters, µ & σ, are Calculated from the ENTIRE POPULATION • Mean, µ • Variance, σ² • Standard Deviation, σ • For LARGE Populations it is usually impractical to measure all the xk • In this case we take a Finite SAMPLE to ESTIMATE µ & σ
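For reference, the usual population definitions, assuming a population of N values x_k (N and the index k are notation assumed here):

```latex
\[
\mu = \frac{1}{N}\sum_{k=1}^{N} x_k,
\qquad
\sigma^{2} = \frac{1}{N}\sum_{k=1}^{N}\left(x_k - \mu\right)^{2},
\qquad
\sigma = \sqrt{\sigma^{2}}
\]
```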

  49. Estimating µ & σ (2) • Say we want to characterize Miles/Yr driven by Every Licensed Driver in the USA • We assume that this is Normally Distributed, so we take a Sample of N = 1013 Drivers • We Take the Mean of the SAMPLE • Use the SAMPLE-Mean to Estimate the POPULATION-Mean

  50. Estimating µ & σ (3) • Now Calc the SAMPLE Variance & StdDev • Estimate • standard deviation: positive square root of the variance • small std dev: observations are clustered tightly around a central value • large std dev: observations are scattered widely about the mean • Number decreased from N to (N – 1) To Account for case where N = 1 • In this case x-bar = x1, and the S² result is meaningless
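The sample estimates can be computed by hand and compared with MATLAB's built-ins, which divide by (N – 1) by default. This is a sketch only: the data vector below is hypothetical (it is not the drivers' sample from the slides), and the names xbar, S2, and S are chosen here.

```matlab
% Hypothetical sample, in thousands of miles per year (illustrative values only)
x = [11.2, 13.5, 9.8, 12.1, 14.7, 10.4];
N = numel(x);

% Sample mean: estimate of the population mean
xbar = sum(x) / N;

% Sample variance with the (N - 1) divisor, and its positive square root
S2 = sum((x - xbar).^2) / (N - 1);
S  = sqrt(S2);

fprintf('Sample mean:     %.4f (mean gives %.4f)\n', xbar, mean(x));
fprintf('Sample variance: %.4f (var gives  %.4f)\n', S2, var(x));
fprintf('Sample std dev:  %.4f (std gives  %.4f)\n', S, std(x));
```

For a sample of one value, the (N – 1) divisor is zero, so MATLAB cannot return a meaningful spread estimate either. Passing 1 as the second argument, as in var(x,1) or std(x,1), switches the divisor to N.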
