Statistics and ANOVA

ME 470

Spring 2012


Product Development Process

Planning → Concept Development → System-Level Design → Detail Design → Testing and Refinement → Production Ramp-Up

Concept Development Process

Mission Statement → Identify Customer Needs → Establish Target Specifications → Generate Product Concepts → Select Product Concept(s) → Test Product Concept(s) → Set Final Specifications → Plan Downstream Development → Development Plan

Activities that run alongside these steps: Perform Economic Analysis; Benchmark Competitive Products; Build and Test Models and Prototypes.


We will use statistics to make good design decisions!

We will characterize populations by their mean and standard deviation, and we will use control charts to determine whether a process is in control.

We may need to run experiments to characterize our system. We will use valid statistical tools such as Linear Regression, Design of Experiments (DOE), and Robust Design methods to make those characterizations.


Let’s consider the Toyota problem.

What was the first clue that there was a problem?

Starting in 2003, NHTSA received reports of accelerator pedals operating improperly.

How many reports should it take for the manufacturer to suspect a problem?

To issue a recall, NHTSA would need to prove that a substantial number of failures attributable to the defect have occurred or are likely to occur in consumers’ use of the vehicle or equipment, and that the failures pose an unreasonable risk to motor vehicle safety.

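ODI (NHTSA’s Office of Defects Investigation) conducted an assessment of UA (unintended acceleration) rates on the subject Lexus, based on VOQs (Vehicle Owner Questionnaires), in comparison to two peer vehicles, and concluded that the Lexus LS400t vehicles were not overrepresented in the VOQ database.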

How might we look at two populations and decide this?


How can we use statistics to make sense of data that we are getting?

  • Quiz for the day

  • What can we say about our M&Ms?


What kinds of questions can we answer?

  • What does the data look like?

  • What is the mean, the standard deviation?

  • What are the extreme points?

  • Is the data normal?

  • Is there a difference between years? Did one class get more M&Ms than another?

  • If you were packaging the M&Ms, are you doing a good job?

  • If you are the designer, what factors might cause the variation?


Stat > Basic Statistics > Display Descriptive Statistics


Results for 2008, 2010, 2011 (from the Session window)
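Display Descriptive Statistics reports N, mean, standard deviation, minimum, quartiles, and maximum for each column. A minimal sketch of the same summary in Python follows, assuming a plain list of per-bag M&M totals (the counts are hypothetical, not class data). We believe the `"exclusive"` quartile method in Python's `statistics.quantiles`, which interpolates at position p(n+1), matches Minitab's quartile convention, but check it against the Session output before relying on it.

```python
import statistics

def describe(data):
    """Return a Minitab-style descriptive summary of a numeric sample."""
    xs = sorted(data)
    # "exclusive" interpolates at position p*(n+1), the (n+1)p quartile convention
    q1, median, q3 = statistics.quantiles(xs, n=4, method="exclusive")
    return {
        "N": len(xs),
        "Mean": statistics.mean(xs),
        "StDev": statistics.stdev(xs),  # sample standard deviation (divides by n - 1)
        "Minimum": xs[0],
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Maximum": xs[-1],
    }

# Hypothetical per-bag totals
bag_counts = [17, 18, 18, 19, 20, 20, 21, 22, 23]
for name, value in describe(bag_counts).items():
    print(f"{name:8s} {value:.3f}" if isinstance(value, float) else f"{name:8s} {value}")
```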



Assessing Shape: Boxplot

[Figure: boxplot of BSNOx annotated with the largest value excluding outliers, Q3, the median (Q2), Q1, and the smallest value excluding outliers; outliers are marked as ‘*’.]

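Values lying between 1.5 and 3 interquartile ranges (IQR = Q3 - Q1, the width of the middle 50% of the data) beyond the edges of the box are outliers; values more than 3 IQR beyond the box are extreme outliers.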
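The outlier rule can be sketched in a few lines of Python. This sketch assumes the same (n+1)p quartile convention as above and uses hypothetical bag totals; it separates points between the 1.5 and 3 IQR fences from points beyond 3 IQR, although Minitab's boxplot marks both kinds with ‘*’.

```python
import statistics

def boxplot_outliers(data):
    """Classify points using fences 1.5 and 3 IQR beyond the box."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1  # width of the middle 50% of the data
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr
    # Outliers: beyond the 1.5*IQR fences but no more than 3*IQR from the box
    outliers = [x for x in data
                if outer_low <= x < inner_low or inner_high < x <= outer_high]
    # Extreme outliers: more than 3*IQR beyond the box
    extreme = [x for x in data if x < outer_low or x > outer_high]
    inside = [x for x in data if inner_low <= x <= inner_high]
    # Whiskers stop at the most extreme values that are not outliers
    return {"whiskers": (min(inside), max(inside)),
            "outliers": outliers, "extreme": extreme}

counts = [17, 18, 18, 19, 20, 20, 21, 22, 23, 30]  # hypothetical bag totals
print(boxplot_outliers(counts))
```

Here Q1 = 18 and Q3 = 22.25, so the 1.5 IQR fence sits at 28.625 and the 3 IQR fence at 35; a bag of 30 is an outlier but not an extreme one.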



Anderson-Darling normality test:

Used to determine whether data follow a normal distribution. If the p-value is lower than the predetermined level of significance, we reject the hypothesis that the data follow a normal distribution.


Anderson-Darling Normality Test

Measures the area between the fitted line (based on the chosen distribution) and the nonparametric step function (based on the plot points). The statistic is a squared distance that is weighted more heavily in the tails of the distribution. Smaller Anderson-Darling values indicate that the distribution fits the data better.

The Anderson-Darling Normality test is defined as:

H0:  The data follow a normal distribution.  

Ha:  The data do not follow a normal distribution.  

Another quantitative measure for reporting the result of the normality test is the p-value. A small p-value is evidence against the null hypothesis. (Remember: If p is low, H0 must go.)

P-values are often used in hypothesis tests, where you either reject or fail to reject a null hypothesis. The p-value is the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed. If you reject H0 whenever the p-value is below a chosen level α, the probability of making a Type I error (rejecting the null hypothesis when it is true) is α. The smaller the p-value, the stronger the evidence against the null hypothesis.

It is customary to call the test statistic (and the data) significant when the null hypothesis H0 is rejected, so we may think of the p-value as the smallest level α at which the data are significant.
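For readers who want to see the computation, here is a minimal sketch of the Anderson-Darling statistic for normality, with the mean and standard deviation estimated from the data. With z_i the sorted standardized values and F the standard normal CDF, A² = -n - (1/n) Σ (2i - 1)[ln F(z_i) + ln(1 - F(z_(n+1-i)))]. The p-value uses the D'Agostino-Stephens small-sample adjustment and piecewise approximation; that choice is our assumption, and Minitab's exact p-value calculation may differ slightly. Both data sets are hypothetical.

```python
import math
import statistics

def anderson_darling_normal(data):
    """Return (A2, p_value) for an Anderson-Darling test of normality."""
    xs = sorted(data)
    n = len(xs)
    mu, sd = statistics.mean(xs), statistics.stdev(xs)
    phi = statistics.NormalDist().cdf  # standard normal CDF
    z = [(x - mu) / sd for x in xs]
    total = 0.0
    for i in range(1, n + 1):
        # ln F(z_i) + ln(1 - F(z_{n+1-i})), using 1 - F(t) = F(-t)
        total += (2 * i - 1) * (math.log(phi(z[i - 1])) + math.log(phi(-z[n - i])))
    a2 = -n - total / n
    # Small-sample adjustment, then the piecewise p-value approximation
    a_star = a2 * (1 + 0.75 / n + 2.25 / n**2)
    if a_star < 0.2:
        p = 1 - math.exp(-13.436 + 101.14 * a_star - 223.73 * a_star**2)
    elif a_star < 0.34:
        p = 1 - math.exp(-8.318 + 42.796 * a_star - 59.938 * a_star**2)
    elif a_star < 0.6:
        p = math.exp(0.9177 - 4.279 * a_star - 1.38 * a_star**2)
    else:
        p = math.exp(1.2937 - 5.709 * a_star + 0.0186 * a_star**2)
    return a2, p

print(anderson_darling_normal([-1.5, -1.0, -0.6, -0.3, 0.0, 0.3, 0.6, 1.0, 1.5]))
print(anderson_darling_normal([1, 1, 1, 1, 1, 1, 1, 1, 2, 50]))
```

The symmetric, bell-shaped first sample gives a small A² and a large p-value. The second sample, with one extreme value, gives a p-value far below 0.05.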


Note that our p-value is quite low, which suggests rejecting the hypothesis that the data are normal. However, in assessing the closeness of the points to the straight line, “imagine a fat pencil lying along the line. If all the points are covered by this imaginary pencil, a normal distribution adequately describes the data.” Montgomery, Design and Analysis of Experiments, 6th Edition, p. 39

If you are unsure whether to treat the data as normal, it is best to consult a statistician if you can. The author has observed statisticians who are quite comfortable assuming normality even with very fat pencils.

For more on Normality and the Fat Pencil

http://www.statit.com/support/quality_practice_tips/normal_probability_plot_interpre.shtml


Walter Shewhart

Developer of control charts in the 1920s

You covered control charts in DFM. There the emphasis was on tolerances; here the emphasis is on determining whether a process is in control. If the process is in control, we want to know its capability.

www.york.ac.uk/.../ histstat/people/welcome.htm


What does this data tell us about our process?

SPC (Statistical Process Control) is a continuous improvement tool that minimizes tampering, or unnecessary adjustments (which increase variability), by distinguishing between special-cause and common-cause sources of variation.

Control Charts have two basic uses:

Give evidence whether a process is operating in a state of statistical control and to highlight the presence of special causes of variation so that corrective action can take place.

Maintain the state of statistical control by extending the control limits as a basis for real-time decisions.

If a process is in a state of statistical control, then capability studies may be undertaken. (But not before! If a process is not in a state of statistical control, you must first bring it under control.)

SPC applies to design activities in that we use data from manufacturing to predict the capability of a manufacturing system. Knowing the capability of the manufacturing system plays a crucial role in selecting the concepts.


Voice of the Process

Control limits are not spec limits.

Control limits define the amount of fluctuation that a process with only common cause variation will have.

Control limits are calculated from the process data.

Any fluctuations within the limits are simply due to the common cause variation of the process.

Anything outside of the limits would indicate a special cause (or change) in the process has occurred.

Control limits are the voice of the process.
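Because control limits are calculated from the process data, a short sketch helps. It computes the limits for an Individuals (I) chart using the common moving-range method: the average of the moving ranges between consecutive points is divided by the constant d2 = 1.128 (the value for ranges of two points) to estimate sigma. We believe this matches Minitab's default for Individuals charts, but treat that as an assumption. The data are hypothetical bag totals in packaging order.

```python
import statistics

D2_FOR_SPAN_2 = 1.128  # bias-correction constant d2 for moving ranges of two points

def individuals_limits(data):
    """Return (LCL, center line, UCL) for an Individuals chart."""
    center = statistics.mean(data)
    # Moving ranges: absolute differences between consecutive observations
    moving_ranges = [abs(b - a) for a, b in zip(data, data[1:])]
    sigma_within = statistics.mean(moving_ranges) / D2_FOR_SPAN_2
    return center - 3 * sigma_within, center, center + 3 * sigma_within

counts = [20, 21, 19, 22, 20, 21, 20, 19, 21, 20]  # hypothetical, in time order
lcl, cl, ucl = individuals_limits(counts)
print(f"LCL = {lcl:.2f}, CL = {cl:.2f}, UCL = {ucl:.2f}")
```

Note that no specification limits appear anywhere in this calculation. The limits come only from the process data.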


The capability index is defined as:

Cp = (allowable range)/(6s) = (USL - LSL)/(6s), where s is the process standard deviation

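[Figure: process distribution shown with the specification limits LSL and USL (Lower and Upper Specification Limits) and the control limits LCL and UCL (Lower and Upper Control Limits).]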

http://lorien.ncl.ac.uk/ming/spc/spc9.htm
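The Cp formula translates directly into code. This sketch uses the sample standard deviation as s, following the formula on the slide. The bag totals and the specification limits LSL = 16 and USL = 26 are hypothetical. Minitab's capability analysis may estimate sigma differently (for example, from within-subgroup variation), so its Cp can differ from this value.

```python
import statistics

def cp_index(data, lsl, usl):
    """Process capability Cp = (USL - LSL) / (6 s)."""
    s = statistics.stdev(data)  # sample standard deviation as the estimate of sigma
    return (usl - lsl) / (6 * s)

counts = [20, 21, 19, 22, 20, 21, 20, 19, 21, 20]  # hypothetical bag totals
print(f"Cp = {cp_index(counts, lsl=16, usl=26):.2f}")  # hypothetical spec limits
```

Cp = 1 means the 6s spread of the process exactly fills the allowable range. Larger values leave more room between the process spread and the specification limits.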



Individuals > Individuals (select all tests)


These test failures fall into the category of “special cause variations”, statistically unlikely events that are worth looking into as possible problems
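Two of the special-cause tests can be sketched in Python. We believe Minitab's Test 1 is "one point more than 3 standard deviations from the center line" and its Test 2 is "nine points in a row on the same side of the center line", but check the Tests tab in Minitab, which offers more tests than these two. The helper name `special_cause_tests` and the example data are hypothetical. The center line and sigma are inputs, for example from an Individuals-chart calculation.

```python
def special_cause_tests(data, center, sigma):
    """Return indices flagged by two run rules (hypothetical helper).

    Test 1: one point more than 3 sigma from the center line.
    Test 2: nine points in a row on the same side of the center line.
    """
    test1 = [i for i, x in enumerate(data) if abs(x - center) > 3 * sigma]
    test2 = []
    run_side, run_length = 0, 0
    for i, x in enumerate(data):
        side = (x > center) - (x < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == run_side:
            run_length += 1
        else:
            # A new side (or a point on the line) starts a fresh run
            run_side, run_length = side, (1 if side != 0 else 0)
        if run_length >= 9:
            test2.append(i)  # flag the point that completes or extends the run
    return {"test1": test1, "test2": test2}

data = [20.5] * 9 + [19, 25]  # hypothetical observations
print(special_cause_tests(data, center=20, sigma=1))
```

Here the ninth consecutive point above the center line (index 8) fails Test 2, and the value 25, five sigma above the center line, fails Test 1.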


[Figure: Individuals chart with the Upper Control Limit and Lower Control Limit marked.]


Are the 2 Distributions Different?

Choosing a tool by the type of Y data and X data:

  • Single Y, discrete

    • Single X, discrete: Chi-Square

    • Single X, continuous: Logistic Regression

    • Multiple Xs, discrete: Multiple Logistic Regression

    • Multiple Xs, continuous: Multiple Logistic Regression

  • Single Y, continuous

    • Single X, discrete: One-sample t-test, Two-sample t-test, ANOVA

    • Single X, continuous: Simple Linear Regression

    • Multiple Xs, discrete: ANOVA

    • Multiple Xs, continuous: Multiple Linear Regression

  • Multiple Ys


When to use ANOVA

  • The use of ANOVA is appropriate when

    • Dependent variable is continuous

    • Independent variable is discrete, i.e. categorical

    • Independent variable has 2 or more levels under study

    • Interested in the mean value

    • There is one independent variable or more

  • We will first consider just one independent variable


Practical Applications

  • Compare 3 different suppliers of the same component

  • Compare 4 test cells

  • Compare 2 performance calibrations

  • Compare 6 combustion recipes through simulation

  • Compare our brake failure rate with other companies

  • Compare 3 distributions of M&M’s

  • And MANY more …


ANOVA: Analysis of Variance

  • Used to determine the effects of categorical independent variables on the average response of a continuous variable

  • Choices in MINITAB

    • One-way ANOVA

      • Use with one factor, varied over multiple levels

    • Two-way ANOVA

      • Use with two factors, varied over multiple levels

    • Balanced ANOVA

      • Use with two or more factors and equal sample sizes in each cell

    • General Linear Model

      • Use anytime!
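For a single factor, the F test that Minitab reports reduces to the classic one-way ANOVA. The minimal sketch below computes the sums of squares, degrees of freedom, and F by hand. The group data are hypothetical. The p-value requires the F distribution, which Python's standard library lacks. In SciPy, `scipy.stats.f.sf(F, df_factor, df_error)` gives it, and `scipy.stats.f_oneway` runs the whole test.

```python
import statistics

def one_way_anova(groups):
    """Sums of squares, degrees of freedom, and F for a one-way ANOVA.

    groups: one list of responses per factor level (e.g. one list per year).
    """
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Between-level variation: how far each level mean sits from the grand mean
    ss_factor = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-level (error) variation: spread of responses around their own level mean
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_factor = len(groups) - 1
    df_error = len(all_values) - len(groups)
    f_stat = (ss_factor / df_factor) / (ss_error / df_error)
    return {"SS factor": ss_factor, "DF factor": df_factor,
            "SS error": ss_error, "DF error": df_error, "F": f_stat}

# Hypothetical bag totals for three levels of one factor
print(one_way_anova([[18, 19, 20], [21, 22, 23], [24, 25, 26]]))
```

A large F means the level means differ by much more than the within-level scatter would explain.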


Stat > ANOVA > General Linear Model



Stat > ANOVA > General Linear Model: select Comparisons

We use the Tukey comparison to determine if the years are different. Confidence intervals that contain zero suggest no difference.


Tukey Comparison

Zero is contained in the interval.

The years are NOT significantly different.

Zero is NOT contained in the interval.

The years are significantly different.



What do you see with the boxplot?


What do you see with the boxplot?


Do we see anything that looks unusual?


General Linear Model: stackedTotal versus StackedYear

Factor       Type   Levels  Values
StackedYear  fixed       4  2004, 2005, 2006, 2009

Analysis of Variance for stackedTotal, using Adjusted SS for Tests

Source        DF   Seq SS   Adj SS  Adj MS       F      P
StackedYear    3  1165.33  1165.33  388.44  149.39  0.000   (Look at the low P-value!)
Error        266   691.63   691.63    2.60
Total        269  1856.96

S = 1.61249   R-Sq = 62.75%   R-Sq(adj) = 62.33%

Unusual Observations for stackedTotal

Obs  stackedTotal      Fit  SE Fit  Residual  St Resid
 25       27.0000  23.4667  0.2082    3.5333      2.21 R
 34       20.0000  23.4667  0.2082   -3.4667     -2.17 R
209       40.0000  21.7917  0.1700   18.2083     11.36 R
215       21.0000  17.4917  0.2082    3.5083      2.19 R

R denotes an observation with a large standardized residual.
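The entries in this table are linked by simple arithmetic. The sketch below recomputes Adj MS, F, S, R-Sq, and R-Sq(adj) from the sums of squares and degrees of freedom printed above, using the standard ANOVA identities. We have not checked Minitab's internal formulas, but these reproduce the printed values to rounding: F ≈ 149.39, S ≈ 1.6125, R-Sq ≈ 62.75%, R-Sq(adj) ≈ 62.33%.

```python
import math

def anova_summary(ss_factor, df_factor, ss_error, df_error):
    """Recompute the derived ANOVA quantities from SS and DF."""
    ms_factor = ss_factor / df_factor      # Adj MS for the factor
    ms_error = ss_error / df_error         # Adj MS for error
    ss_total = ss_factor + ss_error
    df_total = df_factor + df_error
    return {
        "F": ms_factor / ms_error,
        "S": math.sqrt(ms_error),          # pooled standard deviation
        "R-Sq": 100 * ss_factor / ss_total,
        # Adjusted R-Sq compares the error variance with the total variance
        "R-Sq(adj)": 100 * (1 - ms_error / (ss_total / df_total)),
    }

print(anova_summary(1165.33, 3, 691.63, 266))
```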


Grouping Information Using Tukey Method and 95.0% Confidence

StackedYear   N  Mean  Grouping
2004         60  23.5  A
2006         90  21.8  B
2005         60  20.7  C
2009         60  17.5  D

Means that do not share a letter are significantly different.

Tukey 95.0% Simultaneous Confidence Intervals
Response Variable stackedTotal
All Pairwise Comparisons among Levels of StackedYear
StackedYear = 2004 subtracted from:

StackedYear   Lower  Center   Upper  -------+---------+---------+---------
2005         -3.531  -2.775  -2.019  (---*---)
2006         -2.365  -1.675  -0.985  (-*--)
2009         -6.731  -5.975  -5.219  (--*--)
                                     -------+---------+---------+---------
                                         -5.0      -2.5       0.0

Zero is not contained in the intervals. Each year is statistically different. (2004 got the most!)


StackedYear = 2005 subtracted from:

StackedYear   Lower  Center   Upper  -------+---------+---------+---------
2006          0.410   1.100   1.790  (-*--)
2009         -3.956  -3.200  -2.444  (--*--)
                                     -------+---------+---------+---------
                                         -5.0      -2.5       0.0

StackedYear = 2006 subtracted from:

StackedYear   Lower  Center   Upper  -------+---------+---------+---------
2009         -4.990  -4.300  -3.610  (--*--)
                                     -------+---------+---------+---------
                                         -5.0      -2.5       0.0
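The zero-containment rule for these intervals can be sketched using the Tukey-Kramer form: center = mean_j - mean_i, half-width = (q/√2) · S · √(1/n_i + 1/n_j). Here q is the studentized-range critical value for 4 levels and 266 error DF. Python's standard library cannot compute q, so it is an input (recent SciPy releases provide `scipy.stats.studentized_range`). The value q = 3.63 below is an assumption chosen because it reproduces the printed interval widths. S, the sample sizes, and the 2004, 2006, and 2009 means come from the GLM output above. The 2005 mean (20.6917) is back-calculated from the printed center of -2.775.

```python
import math

def tukey_interval(mean_i, mean_j, n_i, n_j, s, q):
    """Tukey-Kramer simultaneous interval for mean_j - mean_i.

    Returns (lower, center, upper, differs); differs is True when zero
    lies outside the interval.
    """
    center = mean_j - mean_i
    half_width = (q / math.sqrt(2)) * s * math.sqrt(1 / n_i + 1 / n_j)
    lower, upper = center - half_width, center + half_width
    return lower, center, upper, not (lower <= 0 <= upper)

S = 1.61249        # pooled standard deviation from the GLM output
Q_ASSUMED = 3.63   # assumed critical value, chosen to match the printed widths
means = {2004: 23.4667, 2005: 20.6917, 2006: 21.7917, 2009: 17.4917}
sizes = {2004: 60, 2005: 60, 2006: 90, 2009: 60}
for year in (2005, 2006, 2009):
    lo, c, hi, differs = tukey_interval(means[2004], means[year],
                                        sizes[2004], sizes[year], S, Q_ASSUMED)
    print(f"{year} - 2004: ({lo:.3f}, {c:.3f}, {hi:.3f}) different={differs}")
```

The unequal sample size for 2006 (90 bags) is why its interval is narrower than the others.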


Implications for design

  • Is there a difference in production performance between the plain and peanut M&Ms?


Individual Quiz

Name:____________ Section No:__________ CM:_______

You will be given a bag of M&M’s. Do NOT eat the M&M’s.

Count the number of M&M’s in your bag. Record the number of each color, and the overall total. You may approximate if you get a piece of an M&M. When finished, you may eat the M&M’s. Note: You are not required to eat the M&M’s.


Instructions for Minitab Installation


Minitab on DFS:

