Statistics: The Science of Learning from Data - Data Collection, Analysis, Interpretation, Prediction, and Taking Action

This class focuses on the fundamental concepts of statistics, including data collection, analysis, interpretation, prediction, and the importance of taking action based on statistical predictions.

rogerv

Presentation Transcript


  1. Statistics: The Science of Learning from Data • Data Collection • Data Analysis • Interpretation • Prediction • Take Action • W.E. Deming: “The value of statistics and statisticians is to make predictions that form the basis for action”

  2. Data Collection • Observational Studies – to study correlation between variables; prediction is OK, but inferring causation is not • Sampling Surveys – to estimate population totals, ratios, etc. • Experimental Designs – to study cause-and-effect relationships: “If you want to predict what will happen in the future when you do something”

  3. The only way to find out what happens when you manipulate a variable is to go ahead and manipulate it, then observe the result!

  4. Typical Purposes for Experimentation • To determine principal causes of variation in a measured response • To find conditions that give rise to a maximum or minimum response • To determine if there is a difference (and how big that difference is) between responses achieved at different settings of controllable variables • To obtain a mathematical model in order to predict future responses when controllable variables are changed

  5. Goals of This Class • Students should be able to choose an experimental design plan that is appropriate for the research problem at hand • Students should be able to construct the design (including performing proper randomization and determining the required number of replicates) • Execute the plan to collect the data (or advise a researcher to do it) • Determine the appropriate model to fit the data • Fit the model to the data and check the appropriateness of the model • Interpret and explain the results in a meaningful way to answer the research question

  6. Some Basic Definitions • Experiment or Run – experimenter changes at least one of the items under study and observes the effect of his action • Experimental Unit – the “material” under study upon which something is changed • Treatment Factor or Independent Variable – a variable under study which is controlled at some level during a given experiment and varied from experiment to experiment, at the will of the experimenter

  7. Some Basic Definitions • Treatment Factor Levels – the different settings of the treatment factor that will be used throughout the course of experimentation • Background or Lurking Variable – a variable the experimenter is unaware of, or cannot control, that may affect the outcome • Response or Dependent Variable – measurements of experimental units that depend upon settings of the factors

  8. Some Basic Definitions • Effect – change in the response caused by a change in the factor level • Replication – more than one experimental unit assigned to the same combination of treatment factor levels • Repeated Measurements (Duplicates) – more than one measurement of the same characteristic of an experimental unit • Subsamples – observational units, random subsamples of the larger experimental unit

  9. Some Basic Definitions • Experimental Design – Collection of experiments or runs to be made • Confounded Factors – two or more factors are changed at the same time resulting in confused effects • Biased Factor – Background variable changes when factor is changed resulting in confused effect.

  10. Some Basic Definitions • Experimental Error – the difference between the response for a given experiment and the long run average of all potential experiments that could be made at the same factor settings. This is usually caused by inherent differences in experimental units • Sources of noise – anything that could cause the response for one experiment to be different than another (treatment factors, nuisance factors – variation in experimental units)
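
The definition of experimental error above can be written compactly. The notation here is an assumption for illustration and does not appear in the slides: y_ij is the response of the j-th experiment at factor setting i, and mu_i is the long-run average of all potential experiments at that setting.

```latex
\varepsilon_{ij} = y_{ij} - \mu_i
```

Under this notation, replication at the same factor setting gives several values of y_ij that share one mu_i, which is what allows the size of the experimental error to be estimated.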

  11. Examples of Experimental Units • Medical experiments – human subjects • Agriculture – individual plots of land • Manufacturing – batch of raw materials • If an experiment has to be run over a period of time, with observations collected sequentially, the time of the run or trial (or the conditions that exist at that time) may be regarded as the experimental unit • Experimental units should be representative of the material and conditions to which the conclusions of the experiment are applied

  12. Blocking • The act of grouping the experimental units together into similar groups or Blocks • Each treatment factor level will be tested on at least one experimental unit within each Block

  13. Purpose of Blocking • Increase precision of treatment factor level comparisons by comparing treatment factor levels within homogeneous groups of experimental units • Broaden the scope of the results by including blocks which are representative of all conditions where conclusions are to be applied
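
As a rough illustration of the blocking idea above, the sketch below assigns every treatment level once within each block, in a random order per block. The function name, the block and unit labels, and the one-unit-per-level layout are all assumptions made for this example; they are not taken from the slides.

```python
import random


def randomized_block_plan(blocks, levels, seed=None):
    """Assign each treatment level to one unit in every block.

    blocks: dict mapping a block name to the list of units in that block
            (hypothetical labels; each block needs len(levels) units).
    levels: list of treatment factor levels.
    Returns a dict mapping (block, unit) -> level.
    """
    rng = random.Random(seed)
    plan = {}
    for block, units in blocks.items():
        if len(units) != len(levels):
            raise ValueError("each block needs one unit per treatment level")
        # Shuffle the levels separately inside each block, so comparisons
        # between levels are made within a homogeneous group of units.
        order = list(levels)
        rng.shuffle(order)
        for unit, level in zip(units, order):
            plan[(block, unit)] = level
    return plan


example_blocks = {"block1": ["u1", "u2", "u3"], "block2": ["u4", "u5", "u6"]}
block_plan = randomized_block_plan(example_blocks, ["A", "B", "C"], seed=3)
```

Because every level appears in every block, block-to-block differences do not get mixed into the comparison between levels.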

  14. Randomization • The act of assigning treatment factor levels to experimental units in a random manner (utilizing a table of random numbers or randomization computer algorithm)
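
A minimal sketch of a "randomization computer algorithm", assuming a completely randomized layout in which every treatment level gets the same number of units. The function name, unit labels, and level labels are hypothetical and chosen only for this example.

```python
import random


def randomize_assignment(units, levels, replicates, seed=None):
    """Randomly assign each treatment level to `replicates` experimental units.

    Returns a dict mapping unit -> level.
    """
    if len(units) != len(levels) * replicates:
        raise ValueError("need exactly len(levels) * replicates units")
    rng = random.Random(seed)
    # One slot per (level, replicate); shuffling the slots decides
    # which unit receives which level.
    slots = [level for level in levels for _ in range(replicates)]
    rng.shuffle(slots)
    return dict(zip(units, slots))


assignment = randomize_assignment(
    units=["u1", "u2", "u3", "u4", "u5", "u6"],
    levels=["A", "B", "C"],
    replicates=2,
    seed=1,
)
```

Passing a seed makes the assignment reproducible, which helps when the plan has to be documented before the experiment is run.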

  15. Purpose of Randomization • Prevent experimenter bias • Prevent systematic bias • Ensure independence of experimental error

  16. Types of Experimental Designs

  17. Types of Experimental Designs • Classify sources of variation • Screen important factors • Constrained optimization • Unconstrained optimization • Mechanistic modeling

  18. Planning Experiments • Define objectives • Identify experimental units • Define meaningful and measurable response • List independent and lurking variables • Run pilot tests • Make flow diagram of the experimental procedure for each run • Choose experimental design • Determine number of replicates • Randomize experimental conditions to experimental units • Describe the method of data analysis • Provide timetable and budget

  19. Example 1 Problem: Orange cookies spread out while cooking

  20. Recipe is the same for both chocolate and orange cookies up to the point of adding the syrup • Baking time and temperature are the same for both

  21. Hypothesis: Maybe the baking temperature must be modified for the orange cookie recipe • Plan: Vary the oven temperature from one sheet of orange cookies to the next, measure the diameter of each cookie, and calculate the average for each tray of cookies • What is the purpose for experimenting?

  22. The Plan

  23. What is the treatment factor? What is the response? What is the experimental unit? What is an experiment? What is the experimental design? Are there any replicates or repeated measurements? What other sources of noise exist (besides treatment factor levels)? Could blocking or randomization help?

  24. Example 2 Problem: Want to increase average flight time of paper helicopters made from one 8.5×11 sheet of paper

  25. Adding to the tail width and length only increases Fw, therefore hold them constant at minimum

  26. Hypothesis: Changing wing-length and wing-width should affect average flight time, and if so an optimal combination should exist • Plan: Construct four different prototypes to test, test each repeatedly, and compare the average flight times
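
One way to read "four different prototypes" is as every combination of two wing lengths and two wing widths. That reading is an assumption, as are the level labels, the function name, and the number of tests per prototype used below. The sketch lists the combinations and puts the individual test drops in random order.

```python
import itertools
import random


def factorial_runs(factors, tests_per_prototype, seed=None):
    """List every combination of factor levels, repeated and in random order.

    factors: dict mapping a factor name to its list of levels.
    Returns a list of dicts, one per test run.
    """
    names = list(factors)
    combos = [
        dict(zip(names, combo))
        for combo in itertools.product(*(factors[name] for name in names))
    ]
    # Repeat each combination, then shuffle so the test order is not tied
    # to anything that drifts over time (drafts, operator fatigue, etc.).
    runs = [dict(combo) for combo in combos for _ in range(tests_per_prototype)]
    random.Random(seed).shuffle(runs)
    return runs


helicopter_runs = factorial_runs(
    {"wing_length": ["short", "long"], "wing_width": ["narrow", "wide"]},
    tests_per_prototype=3,
    seed=0,
)
```

Whether repeated drops of one prototype count as replicates or as repeated measurements depends on what is taken to be the experimental unit, which is the question the next slide raises.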

  27. What are the treatment factor(s)? What is the experimental unit? What other sources of variation exist (besides treatment factor levels)?

  28. Tomato Experiment (Box, Hunter and Hunter, 1978) • Why only 11 plants? • How will yields be measured? • Will plants be planted far enough apart to prevent fertilizer bleeding over?

  29. Analysis 1 - Plot the Data! • Observations: • quite a bit of variation (factor of 2-3 in yield, low to high) • one possible outlier • evidence of a trend toward decreasing yield along the row from position 1 to 11 • Conclusion: It matters more where you plant than what fertilizer you use! • Has the positional trend been noted before?
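
A plot comes first, but a positional trend like the one noted above can also be summarized with a simple least-squares slope of the response against row position. The sketch below is only an illustration: the function name is hypothetical, and the values in the usage lines are made up for the example and are not the tomato yields from the experiment.

```python
def positional_slope(values):
    """Least-squares slope of values against positions 1..n.

    A clearly negative slope suggests the response decreases along the row.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two positions")
    positions = range(1, n + 1)
    mean_x = sum(positions) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(positions, values))
    sxx = sum((x - mean_x) ** 2 for x in positions)
    return sxy / sxx


# Made-up placeholder values for 11 positions (NOT the real yields).
made_up_yields = [30.0, 29.0, 28.0, 27.0, 26.0, 25.0, 24.0, 23.0, 22.0, 21.0, 20.0]
slope = positional_slope(made_up_yields)
```

A slope alone does not separate position from fertilizer if the two are tied together in the layout, which is one reason treatments should be assigned to positions at random.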
