Basic concept of statistics (PowerPoint presentation transcript)
Basic concept of statistics
  • Measures of central tendency
  • Measures of dispersion & variability
Measures of central tendency

Arithmetic mean (= simple average)

  • The best estimate of the population mean is the sample mean, X̄:

X̄ = (Σᵢ₌₁ⁿ Xᵢ) / n

where Xᵢ is a measurement in the population, Σ denotes summation, n is the sample size, and i is the index of measurement.
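A minimal sketch of this formula (the data values are taken from the body-weight samples that appear later in the slides):

```python
# Sample mean X-bar: the sum of the measurements divided by the sample size n.
def sample_mean(x):
    return sum(x) / len(x)

# One sample of n = 5 body weights (kg), as drawn later in the slides.
weights = [43, 44, 45, 44, 44]
print(sample_mean(weights))  # 44.0
```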

measures of variability
Measures of variability

All describe how “spread out” the data are.

  • Sum of squares (SS): the sum of squared deviations from the mean
  • For a sample, SS = Σᵢ₌₁ⁿ (Xᵢ − X̄)²

Why n − 1? (see degrees of freedom, next slide)

  • The average (mean) sum of squares is the variance, s²:
  • For a sample, s² = SS / (n − 1)
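These two definitions can be sketched directly (same n = 5 body-weight sample as above):

```python
# Sum of squared deviations from the mean, and the sample variance s^2 = SS / (n - 1).
def sum_of_squares(x):
    xbar = sum(x) / len(x)
    return sum((xi - xbar) ** 2 for xi in x)

def sample_variance(x):
    # Divide by n - 1 (the degrees of freedom), not n, for a sample.
    return sum_of_squares(x) / (len(x) - 1)

print(sum_of_squares([43, 44, 45, 44, 44]))    # 2.0
print(sample_variance([43, 44, 45, 44, 44]))   # 0.5
```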
slide5
n − 1 represents the degrees of freedom, ν (Greek letter “nu”), or the number of independent quantities in the estimate s².

  • Because the deviations from the mean must sum to zero, once n − 1 of all deviations are specified, the last deviation is already determined.
Standard deviation, s
  • Variance has squared measurement units; to regain the original units, take the square root
  • For a sample, s = √(SS / (n − 1))
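A sketch of the square-root step, again on the same n = 5 sample:

```python
import math

# Sample standard deviation: the square root of the sample variance,
# which restores the original measurement units (kg here).
def sample_sd(x):
    xbar = sum(x) / len(x)
    ss = sum((xi - xbar) ** 2 for xi in x)
    return math.sqrt(ss / (len(x) - 1))

print(sample_sd([43, 44, 45, 44, 44]))  # ~0.707 kg
```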
Standard error of the mean
  • Standard error of the mean is a measure of variability among the means of repeated samples from a population.
  • For a sample, s_X̄ = s / √n
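A sketch of the standard error as estimated from a single sample:

```python
import math

# Standard error of the mean: s / sqrt(n), estimated from one sample.
def standard_error(x):
    n = len(x)
    xbar = sum(x) / n
    s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
    return s / math.sqrt(n)

print(standard_error([43, 44, 45, 44, 44]))  # ~0.316
```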
Body Weight Data (kg)

A Population of Values (N = 28, μ = 44, σ² = 1.214):

44 45 43 44 44 43 42 46 44 44 44 46 43 44
44 43 42 44 43 44 43 46 44 43 44 45 45 46

Repeated random sampling, each with sample size n = 5 values, draws samples such as:

  • 43 44 45 44 44 (sample mean = 44)
  • 46 44 46 45 44 (sample mean = 45)
  • 42 42 43 45 43 (sample mean = 43)
For a large enough number of large samples, the frequency distribution of the sample means (the sampling distribution) approaches a normal distribution.
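This can be illustrated with a quick simulation (a hypothetical sketch using only the standard library) that repeats the slides' sampling experiment many times on the 28-value body-weight population:

```python
import random
import statistics

# The population of 28 body weights (kg) from the slides (mu = 44).
population = [44, 45, 43, 44, 44, 43, 42, 46, 44, 44, 44, 46, 43, 44,
              44, 43, 42, 44, 43, 44, 43, 46, 44, 43, 44, 45, 45, 46]

random.seed(1)  # arbitrary seed, only for reproducibility
# Draw many samples of n = 5 (without replacement) and record each sample mean.
means = [statistics.mean(random.sample(population, 5)) for _ in range(10_000)]

print(round(statistics.mean(means), 2))   # centers close to mu = 44
print(round(statistics.stdev(means), 2))  # spread of the sampling distribution
```

The mean of the sample means sits near the population mean, and their spread is what the standard error of the mean estimates.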
Testing statistical hypotheses between 2 means
  • State the research question in terms of statistical hypotheses.

We always start with a statement that hypothesizes “no difference”, called the null hypothesis, H0.

  • E.g., H0: Mean bill length of female hummingbirds is equal to mean bill length of male hummingbirds
Then we formulate a statement that must be true if the null hypothesis is false, called the alternative hypothesis, HA.
  • E.g., HA: Mean bill length of female hummingbirds is not equal to mean bill length of male hummingbirds

If we reject H0 as a result of sample evidence, then we conclude that HA is true.

William Sealy Gosset

(a.k.a. “Student”)

  • Choose an appropriate statistical test that would allow you to reject H0 if H0 were false.

E.g., Student’s t test for hypotheses about means

The t statistic:

t = (X̄₁ − X̄₂) / s_(X̄₁ − X̄₂)

where X̄₁ and X̄₂ are the means of samples 1 and 2, and s_(X̄₁ − X̄₂) is the standard error of the difference between the sample means.

To estimate s_(X̄₁ − X̄₂), we must first know the relation between the two populations.

How to evaluate the success of this experimental design class
  • Compare the Statistics and Experimental Design scores of several students
  • Compare the Experimental Design scores of several students from two serial classes
  • Compare the Experimental Design scores of several students from two different classes
Comparing the Statistics and Experimental Design scores of several students

  • Similar students → dependent populations (variance identical or not identical)
  • Different students → independent populations (variance identical or not identical)

Comparing the Experimental Design scores of several students from two serial classes

  • Different students → independent populations (variance identical or not identical)

Comparing the Experimental Design scores of several students from two classes

  • Different students → independent populations (variance identical or not identical)

Relation between populations
  • Dependent populations
  • Independent populations
  • Identical (homogeneous) variance
  • Not identical (heterogeneous) variance
Dependent Populations

For a sample of pairs, the null hypothesis is that the mean difference is equal to zero.

Null distribution: t with n − 1 df, where n is the number of pairs.

Compute the test statistic and compare: how unusual is this test statistic?

  • P > 0.05 → fail to reject H0
  • P < 0.05 → reject H0
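The paired test statistic above can be sketched as follows; the before/after scores are hypothetical illustration data, not from the slides:

```python
import math

# Dependent (paired) t statistic: t = d-bar / (s_d / sqrt(n)),
# with n - 1 df, where n is the number of pairs.
def paired_t(before, after):
    d = [a - b for a, b in zip(after, before)]  # pairwise differences
    n = len(d)
    dbar = sum(d) / n
    s_d = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    return dbar / (s_d / math.sqrt(n))

# Hypothetical scores for four students measured twice:
print(round(paired_t([10, 12, 11, 13], [12, 13, 13, 14]), 3))  # 5.196
```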

When sample sizes are small, the sampling distribution is described better by the t distribution than by the standard normal (Z) distribution.

The shape of the t distribution depends on the degrees of freedom, ν = n − 1.

[Figure: t distributions for ν = 1, 5, and 25; as ν → ∞, t(ν) approaches the standard normal, Z = t(ν = ∞).]

The distribution of a test statistic is divided into an area of acceptance and an area of rejection. For α = 0.05, the area of acceptance is 0.95, lying between the lower and upper critical values (centered on 0), with an area of rejection of 0.025 in each tail of the t distribution.

Independent T-test
  • Compares the means of one variable for TWO groups of cases.
  • Statistical formula:

Meaning: compare ‘standardized’ mean difference

  • But this is limited to two groups. What if there are more than two groups?
    • Pairwise t-tests (previous example)
    • ANOVA (ANalysis Of Variance)
From T Test to ANOVA

1. Pairwise T-Test

If you compare three or more groups using t-tests at the usual 0.05 level of significance, you would have to compare each pair (A to B, A to C, B to C), so the chance of getting at least one wrong result would be:

1 − (0.95 × 0.95 × 0.95)   =   14.3%

Multiple t-tests will increase the false-alarm (Type I error) rate.
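The arithmetic behind that 14.3% figure:

```python
# Probability of at least one false positive across the three pairwise
# comparisons (A-B, A-C, B-C), each tested at alpha = 0.05:
alpha = 0.05
familywise = 1 - (1 - alpha) ** 3
print(round(familywise, 3))  # 0.143, i.e. about 14.3%
```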

From T Test to ANOVA

2. Analysis Of Variance

  • A t-test uses the mean difference; similarly, ANOVA compares the observed variance among group means.
  • The logic behind ANOVA:
    • If the groups are from the same population, the variance among means will be small. (Note that the means from the groups are not exactly the same.)
    • If the groups are from different populations, the variance among means will be large.
What is ANOVA?
  • ANOVA (Analysis of Variance) is a procedure designed to determine if the manipulation of one or more independent variables in an experiment has a statistically significant influence on the value of the dependent variable.
  • Assumption
    • Each independent variable is categorical (nominal scale). Independent variables are called Factors and their values are called levels.
    • The dependent variable is numerical (ratio scale)
  • The basic idea is that the “variance” of the dependent variable given the influence of one or more independent variables {Expected Sum of Squares for a Factor} is checked to see if it is significantly greater than the “variance” of the dependent variable (assuming no influence of the independent variables) {also known as the Mean-Square-Error (MSE)}.
ANOVA TABLE OF 2 POPULATIONS

Source of variation (SV)   SS          DF                          Mean square (MS)
Between populations        SSbetween   DFB = 1                     MSB = SSB / DFB
Within populations         SSwithin    DFW = (r1 − 1) + (r2 − 1)   MSW = SSW / DFW
TOTAL                      SStotal     r1 + r2 − 1

Rationale for ANOVA
  • We can break the total variance in a study into meaningful pieces that correspond to treatment effects and error. That’s why we call this Analysis of Variance.

Notation for the formulas that follow: the grand mean, taken over all observations; the mean of any group; the mean of a specific group (group 1 in this case); and the observation or raw data for the ith subject.

The ANOVA Model

Y_ij = μ + τ_j + ε_ij

where μ is the grand mean, τ_j is a treatment effect, and ε_ij is the error for trial i.

Note: SS Total = SS Treatment + SS Error

Analysis of Variance
  • Analysis of Variance (ANOVA) can be used to test for the equality of three or more population means using data obtained from observational or experimental studies.
  • Use the sample results to test the following hypotheses:

H0: μ1 = μ2 = μ3 = . . . = μk

Ha: Not all population means are equal

  • If H0 is rejected, we cannot conclude that all population means are different.
  • Rejecting H0 means that at least two population means have different values.
Assumptions for Analysis of Variance
  • For each population, the response variable is normally distributed.
  • The variance of the response variable, denoted σ², is the same for all of the populations.
  • The effect of the independent variable is additive.
  • The observations must be independent.
Analysis of Variance: Testing for the Equality of k Population Means
  • Between-Treatments Estimate of Population Variance
  • Within-Treatments Estimate of Population Variance
  • Comparing the Variance Estimates: The F Test
  • ANOVA Table
Between-Treatments Estimate of Population Variance
  • A between-treatments estimate of σ² is called the mean square due to treatments (MSTR).
  • The numerator of MSTR is called the sum of squares due to treatments (SSTR).
  • The denominator of MSTR represents the degrees of freedom associated with SSTR.
Within-Treatments Estimate of Population Variance
  • The estimate of σ² based on the variation of the sample observations within each treatment is called the mean square due to error (MSE).
  • The numerator of MSE is called the sum of squares due to error (SSE).
  • The denominator of MSE represents the degrees of freedom associated with SSE.
Comparing the Variance Estimates: The F Test
  • If the null hypothesis is true and the ANOVA assumptions are valid, the sampling distribution of MSTR/MSE is an F distribution with MSTR d.f. equal to k − 1 and MSE d.f. equal to nT − k.
  • If the means of the k populations are not equal, the value of MSTR/MSE will be inflated because MSTR overestimates σ².
  • Hence, we will reject H0 if the resulting value of MSTR/MSE appears to be too large to have been selected at random from the appropriate F distribution.
Test for the Equality of k Population Means
  • Hypotheses

H0: μ1 = μ2 = μ3 = . . . = μk

Ha: Not all population means are equal

  • Test Statistic

F = MSTR/MSE

Test for the Equality of k Population Means
  • Rejection Rule

Using the test statistic: Reject H0 if F > Fα

Using the p-value: Reject H0 if p-value < α

where the value of Fα is based on an F distribution with k − 1 numerator degrees of freedom and nT − k denominator degrees of freedom

Sampling Distribution of MSTR/MSE

[Figure: the rejection region associated with a level of significance equal to α, where Fα denotes the critical value; do not reject H0 when MSTR/MSE is below Fα, reject H0 when MSTR/MSE exceeds Fα.]

ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatment             SSTR             k − 1                MSTR          MSTR/MSE
Error                 SSE              nT − k               MSE
Total                 SST              nT − 1

SST divided by its degrees of freedom, nT − 1, is simply the overall sample variance that would be obtained if we treated the entire nT observations as one data set.

What does ANOVA tell us?

ANOVA will tell us whether we have sufficient evidence to say that measurements from at least one treatment differ significantly from at least one other.

It will not tell us which ones differ, or how many differ.

ANOVA vs t-test
  • ANOVA is like a t-test among multiple data sets simultaneously
    • t-tests can only be done between two data sets, or between one set and a “true” value
  • ANOVA uses the F distribution instead of the t-distribution
  • ANOVA assumes that all of the data sets have equal variances
    • Use caution on close decisions if they don’t
ANOVA – a Hypothesis Test
  • H0: There is no significant difference among the results provided by treatments.
  • Ha: At least one of the treatments provides results significantly different from at least one other.
Linear Model

Y_ij = μ + τ_j + ε_ij, where, by definition, Σ_{j=1..t} τ_j = 0.

The experiment produces (r × t) Y_ij data values.

The analysis produces estimates of μ and τ_1, …, τ_t. (We can then get estimates of the ε_ij by subtraction.)

The data form an r × t array, one column per treatment:

Column:  1    2    3    …   t
         Y11  Y12  Y13  …   Y1t
         Y21  Y22  Y23  …   Y2t
         ⋮    ⋮    ⋮        ⋮
         Yr1  Yr2  Yr3  …   Yrt
Means:   Ȳ•1  Ȳ•2  Ȳ•3  …   Ȳ•t

Ȳ•1, Ȳ•2, …, Ȳ•t are the column means.

Ȳ•• = (Σ_{j=1..t} Ȳ•j) / t = the “GRAND MEAN”

(assuming the same number of data points in each column; otherwise, Ȳ•• = the mean of all the data)

MODEL: Y_ij = μ + τ_j + ε_ij

Ȳ•• estimates μ.

Ȳ•j − Ȳ•• estimates τ_j (= μ_j − μ), for all j.

These estimates are based on Gauss’ (1796) PRINCIPLE OF LEAST SQUARES, and on COMMON SENSE.

MODEL: Y_ij = μ + τ_j + ε_ij

If you insert the estimates into the MODEL:

(1) Y_ij = Ȳ•• + (Ȳ•j − Ȳ••) + ε̂_ij

It follows that our estimate of ε_ij is

(2) ε̂_ij = Y_ij − Ȳ•j

Then, Y_ij = Ȳ•• + (Ȳ•j − Ȳ••) + (Y_ij − Ȳ•j)

or,

(3) (Y_ij − Ȳ••) = (Ȳ•j − Ȳ••) + (Y_ij − Ȳ•j)

i.e., TOTAL VARIABILITY in Y = variability in Y associated with X + variability in Y associated with all other factors

If you square both sides of (3), and double-sum both sides (over i and j), you get [after some unpleasant algebra, but lots of terms which “cancel”]:

Σ_{j=1..t} Σ_{i=1..r} (Y_ij − Ȳ••)² = r · Σ_{j=1..t} (Ȳ•j − Ȳ••)² + Σ_{j=1..t} Σ_{i=1..r} (Y_ij − Ȳ•j)²

TSS (TOTAL SUM OF SQUARES) = SSBC (SUM OF SQUARES BETWEEN COLUMNS) + SSW, also written SSE (SUM OF SQUARES WITHIN COLUMNS)

ANOVA TABLE

Source of variation (SV)         SS     DF          Mean square (MS)
Between columns (due to brand)   SSBC   t − 1       MSBC = SSBC / (t − 1)
Within columns (due to error)    SSW    (r − 1)·t   MSW = SSW / ((r − 1)·t)
TOTAL                            TSS    tr − 1

Hypotheses:

H0: τ1 = τ2 = • • • = τc = 0
H1: not all τj = 0

or, equivalently (all column means are equal):

H0: μ1 = μ2 = • • • = μc
H1: not all μj are EQUAL

The probability law of MSBC / MSW = “Fcalc”, assuming H0 is true, is the F distribution with (t − 1, (r − 1)·t) degrees of freedom; compare Fcalc with the table value.

Example: Reed Manufacturing

Reed would like to know if the mean number of hours worked per week is the same for the department managers at her three manufacturing plants (Buffalo, Pittsburgh, and Detroit).

A simple random sample of 5 managers from each of the three plants was taken, and the number of hours worked by each manager in the previous week was recorded.
Example: Reed Manufacturing
  • Sample Data

Observation       Plant 1 (Buffalo)   Plant 2 (Pittsburgh)   Plant 3 (Detroit)
1                 48                  73                     51
2                 54                  63                     63
3                 57                  66                     61
4                 54                  64                     54
5                 62                  74                     56
Sample Mean       55                  68                     57
Sample Variance   26.0                26.5                   24.5

Example: Reed Manufacturing
  • Hypotheses

H0: 1= 2= 3

Ha: Not all the means are equal

where:

 1 = mean number of hours worked per week by the managers at

Plant 1

 2 = mean number of hours worked per week by the managers at

Plant 2

 3 = mean number of hours worked per week by the managers at

  • Plant 3
Example: Reed Manufacturing
  • Mean Square Due to Treatments

Since the sample sizes are all equal, the grand mean is

x̄ = (55 + 68 + 57)/3 = 60

SSTR = 5(55 − 60)² + 5(68 − 60)² + 5(57 − 60)² = 490

MSTR = 490/(3 − 1) = 245

  • Mean Square Due to Error

SSE = 4(26.0) + 4(26.5) + 4(24.5) = 308

MSE = 308/(15 − 3) = 25.667

Example: Reed Manufacturing
  • F - Test

If H0 is true, the ratio MSTR/MSE should be near 1 because both MSTR and MSE are estimating σ².

If Ha is true, the ratio should be significantly larger than 1 because MSTR tends to overestimate σ².

Example: Reed Manufacturing
  • Rejection Rule

Using test statistic: Reject H0 if F > 3.89

Using p-value : Reject H0 if p-value < .05

where F.05 = 3.89 is based on an F distribution with 2 numerator degrees of freedom and 12 denominator degrees of freedom

Example: Reed Manufacturing
  • Test Statistic

F = MSTR/MSE = 245/25.667 = 9.55

  • Conclusion

F = 9.55 > F.05 = 3.89, so we reject H0. The mean number of hours worked per week by department managers is not the same at each plant.

Example: Reed Manufacturing
  • ANOVA Table

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Treatments            490              2                    245           9.55
Error                 308              12                   25.667
Total                 798              14
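The whole Reed Manufacturing calculation can be reproduced by hand in a few lines; this sketch follows the SSTR/SSE definitions given earlier:

```python
# One-way ANOVA computed by hand for the Reed Manufacturing data.
def one_way_anova(groups):
    k = len(groups)                                  # number of treatments
    n_total = sum(len(g) for g in groups)            # nT
    grand = sum(sum(g) for g in groups) / n_total    # grand mean
    # Between-treatments sum of squares (SSTR) and within (SSE):
    sstr = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    sse = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups)
    mstr = sstr / (k - 1)
    mse = sse / (n_total - k)
    return sstr, sse, mstr, mse, mstr / mse

buffalo = [48, 54, 57, 54, 62]
pittsburgh = [73, 63, 66, 64, 74]
detroit = [51, 63, 61, 54, 56]
sstr, sse, mstr, mse, f = one_way_anova([buffalo, pittsburgh, detroit])
# Matches the table: SSTR = 490, SSE = 308, MSTR = 245, MSE = 25.667, F = 9.55
print(sstr, sse, mstr, round(mse, 3), round(f, 2))
```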

Using Excel’s Anova: Single Factor Tool
  • Step 1: Select the Tools pull-down menu
  • Step 2: Choose the Data Analysis option
  • Step 3: Choose Anova: Single Factor from the list of Analysis Tools

Using Excel’s Anova: Single Factor Tool
  • Step 4: When the Anova: Single Factor dialog box appears:

Enter B1:D6 in the Input Range box

Select Grouped By Columns

Select Labels in First Row

Enter .05 in the Alpha box

Select Output Range

Enter A8 (your choice) in the Output Range box

Click OK

Using Excel’s Anova: Single Factor Tool
  • Value Worksheet (top portion)
Using Excel’s Anova: Single Factor Tool
  • Value Worksheet (bottom portion)
Using Excel’s Anova: Single Factor Tool
  • Using the p-Value
    • The value worksheet shows that the p-value is .00331
    • The rejection rule is “Reject H0 if p-value < .05”
    • Thus, we reject H0 because the p-value = .00331 < α = .05
    • We conclude that the mean number of hours worked per week by the managers differs among the three plants