
MGMT 276: Statistical Inference in Management Spring, 2014


Welcome

Green sheets

Please click in

My last name starts with a letter somewhere between

A. A – D

B. E – L

C. M – R

D. S – Z

For our class: due Tuesday, April 29th


Remember…

In a negatively skewed distribution:

mean < median < mode

97 = mode = tallest point

87 = median = middle score

83 = mean = balance point
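A quick way to check this ordering is to compute all three measures. The sketch below uses Python's standard `statistics` module on a small set of made-up scores (hypothetical, not the class's exam data) that have a long tail toward the low end:

```python
import statistics

# Hypothetical exam scores (NOT the class data): a few low scores
# stretch the left tail, giving a negatively skewed distribution.
scores = [50, 60, 70, 80, 87, 90, 95, 97, 97, 97]

mean = statistics.mean(scores)      # balance point (pulled toward the tail)
median = statistics.median(scores)  # middle score
mode = statistics.mode(scores)      # most frequent score = tallest point

print(mean, median, mode)
print(mean < median < mode)  # True for this negatively skewed sample
```

The mean is pulled furthest toward the low scores because every extreme value enters its calculation, while the median only depends on rank order.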

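[Figure: histogram of exam scores (y-axis: Frequency; x-axis: Score on Exam) with the mean, median and mode marked. Notes: always label the y-axis “frequency”; include axis labels and numbers.]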

Readings for next exam (Exam 4: May 1st)

Lind

Chapter 13: Linear Regression and Correlation

Chapter 14: Multiple Regression

Chapter 15: Chi-Square

Plous

Chapter 17: Social Influences

Chapter 18: Group Judgments and Decisions

- Exam 4 – Optional Times for Final
- Two options for completing Exam 4
- Thursday (5/1/14) – The regularly scheduled time
- Tuesday (5/6/14) – The optional later time
- Must sign up by Tuesday (4/29) to take Exam 4 on Tuesday (5/6)

- You only need to take one exam; these are two optional times

Homework due – Thursday (April 24th)

- On class website:
- Please print and complete homework worksheet #18
- Hypothesis Testing with Correlations

Use this as your study guide

Logic of hypothesis testing with Correlations

Interpreting correlations and scatterplots

Simple and multiple regression

Using correlation for predictions

r versus r²

- Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent)
- Coefficient of correlation is the name for “r”
- Coefficient of determination is the name for “r²” (remember it is never negative – no direction info)
- Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation)

The coefficient of regression is “b” for each variable (like the slope)

Rory owns a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople, and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold.

Do more sales calls result in more sales made?

Step 1: Draw scatterplot

Step 2: Estimate r

[Figure: scatterplot with one point per salesperson (labeled points include Ava, Emily, Isabella, Emma, Ethan, Joshua, Jacob). x-axis: Number of sales calls made (independent variable, 0–4); y-axis: Number of systems sold (dependent variable, 0–70).]


Step 3: Calculate r
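The slides report r = 0.71 for Rory's data without showing the arithmetic, and the raw sales figures are not listed here. As a sketch of how r is calculated, using hypothetical numbers, Pearson's r divides how much x and y vary together by the square root of the product of how much each varies on its own (`pearson_r` is an illustrative name):

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Co-variation: sum of cross-products of deviation scores
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Variation of each variable: sum of squared deviation scores
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for illustration only (not Rory's sales figures)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # 0.775
```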


- Step 4: Is it a significant correlation?
- n = 10, df = 8
- alpha = .05 (two-tailed)
- Observed r is larger than critical r (0.71 > 0.632), therefore we reject the null hypothesis.
- Yes, it is a significant correlation
- r(8) = 0.71; p < 0.05
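The comparison in Step 4 can be sketched in Python. Instead of reading critical r straight from a table, this version converts a two-tailed critical t into the equivalent critical r with r_crit = t / √(t² + df). The value t = 2.306 (df = 8, alpha = .05) is taken from a standard t table and is an assumption worth checking against the table used in class; the function names are illustrative.

```python
import math

# Two-tailed critical t for alpha = .05 with df = 8, read from a standard
# t table (assumed value; check it against the table you use in class).
T_CRIT_DF8 = 2.306


def critical_r(t_crit: float, df: int) -> float:
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def r_is_significant(r: float, n: int, t_crit: float) -> bool:
    """Reject the null hypothesis when |observed r| exceeds critical r."""
    df = n - 2
    return abs(r) > critical_r(t_crit, df)


n = 10
r_crit = critical_r(T_CRIT_DF8, n - 2)
print(round(r_crit, 3))                       # 0.632
print(r_is_significant(0.71, n, T_CRIT_DF8))  # True
```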


Step 1: Draw prediction line

r = 0.71

b = 11.579 (slope)

a = 20.526 (intercept)

Draw a regression line and regression equation

What are we predicting?
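The slides give b = 11.579 and a = 20.526 but not the raw data behind them. A minimal least-squares sketch, on hypothetical data, shows where such numbers come from: b is the co-variation of x and y divided by the variation of x, and a places the line through the point (mean of x, mean of y). `least_squares_line` is an illustrative name.

```python
def least_squares_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (a, b) for Y' = a + b*x, the line that minimizes the
    sum of squared vertical distances between Y and Y'."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data for illustration only (not Rory's sales figures)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 3), round(b, 3))  # 2.2 0.6
```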


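Interpret slope & intercept: each additional sales call predicts about 11.579 more systems sold (b), and a salesperson who makes zero calls is predicted to sell 20.526 systems (a).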

Step 1: Predict sales for a certain number of sales calls

Step 2: State the regression equation

Y’ = a + bx

Y’ = 20.526 + 11.579x

If a salesperson makes one sales call:

Step 3: Solve for some value of Y’

Y’ = 20.526 + 11.579(1)

Y’ = 32.105

What should you expect from a salesperson who makes 1 call?

They should sell 32.105 systems.

If they sell more, they are over-performing.

If they sell fewer, they are under-performing.

[Figure: scatterplot with the prediction line; labeled points include Madison and Joshua. Callout: “You should sell 32.105 systems.”]

Step 1: Predict sales for a certain number of sales calls

Step 2: State the regression equation

Y’ = a + bx

Y’ = 20.526 + 11.579x

If a salesperson makes two sales calls:

Step 3: Solve for some value of Y’

Y’ = 20.526 + 11.579(2)

Y’ = 43.684

What should you expect from a salesperson who makes 2 calls?

They should sell 43.684 systems.

If they sell more, they are over-performing.

If they sell fewer, they are under-performing.

[Figure: scatterplot with the prediction line; labeled points include Isabella and Jacob. Callout: “You should sell 43.684 systems.”]

Step 1: Predict sales for a certain number of sales calls

Step 2: State the regression equation

Y’ = a + bx

Y’ = 20.526 + 11.579x

If a salesperson makes three sales calls:

Step 3: Solve for some value of Y’

Y’ = 20.526 + 11.579(3)

Y’ = 55.263

What should you expect from a salesperson who makes 3 calls?

They should sell 55.263 systems.

If they sell more, they are over-performing.

If they sell fewer, they are under-performing.

[Figure: scatterplot with the prediction line; labeled points include Ava and Emma. Callout: “You should sell 55.263 systems.”]

Step 1: Predict sales for a certain number of sales calls

Step 2: State the regression equation

Y’ = a + bx

Y’ = 20.526 + 11.579x

If a salesperson makes four sales calls:

Step 3: Solve for some value of Y’

Y’ = 20.526 + 11.579(4)

Y’ = 66.842

What should you expect from a salesperson who makes 4 calls?

They should sell 66.842 systems.

If they sell more, they are over-performing.

If they sell fewer, they are under-performing.

[Figure: scatterplot with the prediction line; labeled point: Emily. Callout: “You should sell 66.842 systems.”]
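The four worked predictions above all use the same equation, so they can be generated in a single loop. A small sketch, where `predict_sales` is an illustrative name and a and b are the values from the slides:

```python
A = 20.526  # intercept (a) from the regression equation above
B = 11.579  # slope (b) from the regression equation above


def predict_sales(calls: float) -> float:
    """Predicted systems sold: Y' = a + b*x."""
    return A + B * calls


for calls in range(1, 5):
    print(calls, round(predict_sales(calls), 3))
# 1 32.105
# 2 43.684
# 3 55.263
# 4 66.842
```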

Step 1: Compare expected sales levels to actual sales levels

What should you expect from each salesperson?

They should sell x systems depending on their number of sales calls.

If they sell more, they are over-performing.

If they sell fewer, they are under-performing.

[Figure: scatterplot with the prediction line; labeled points: Ava, Emma, Isabella, Emily, Madison, Joshua, Jacob]

Step 1: Compare expected sales levels to actual sales levels

70 - 55.3 = 14.7

The difference between actual Y and expected Y’ (Y – Y’) is called the “residual” (it’s a deviation score).

How did Ava do?

Ava sold 14.7 more systems than expected, taking into account how many sales calls she made: she is over-performing.

[Figure: Ava’s residual (14.7) marked on the scatterplot]

Step 1: Compare expected sales levels to actual sales levels

20 - 43.7 = -23.7

How did Jacob do?

Jacob sold 23.684 fewer systems than expected, taking into account how many sales calls he made: he is under-performing.

[Figure: residuals marked on the scatterplot: Ava 14.7, Jacob -23.7]
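Ava's and Jacob's residuals can be reproduced from the regression equation and their actual sales (Ava: 3 calls, 70 systems; Jacob: 2 calls, 20 systems). A sketch, with `residual` as an illustrative helper name:

```python
A, B = 20.526, 11.579  # intercept and slope from the regression equation


def residual(actual: float, calls: float) -> float:
    """Residual = Y - Y'. Positive: over-performing; negative: under-performing."""
    predicted = A + B * calls
    return actual - predicted


ava = residual(actual=70, calls=3)    # 70 - 55.263
jacob = residual(actual=20, calls=2)  # 20 - 43.684
print(round(ava, 1), round(jacob, 1))  # 14.7 -23.7
```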

Step 1: Compare expected sales levels to actual sales levels

[Figure: scatterplot with the prediction line and each salesperson’s residual (Ava, Emma, Isabella, Emily, Madison, Joshua, Jacob); labeled residuals: Ava 14.7, Emily -6.8, Madison 7.9, Jacob -23.7]

Does the prediction line perfectly predict the predicted variable from the predictor variable?

No, we are wrong sometimes…

How can we estimate how much “error” we have?

The green lines show how much “error” there is in our prediction line…how much we are wrong in our predictions.

[Figure: scatterplot with green vertical lines (residuals) from each point to the prediction line; labeled: Ava 14.7, Jacob -23.7]

Any residuals?

Perfect correlation = +1.00 or -1.00

Each variable perfectly predicts the other

No variability in the scatterplot

The dots fall exactly on a straight line (every residual is zero)

Residual scores

How do we find the average amount of error in our prediction?

Ava is 14.7

Jacob is -23.7

Emily is -6.8

Madison is 7.9

The average amount by which actual scores

deviate on either side of the predicted score

Step 1: Find the error for each value (just the residuals): Y – Y’

Step 2: Add up the residuals. Big problem: Σ(Y – Y’) = 0

Square the deviations: Σ(Y – Y’)²

How would we find our “average residual”? As with a mean (Σx / N), we divide, but here we divide by df = n - 2.

Take the square root:

$$\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$

How do we find the average amount of error in our prediction?

Sound familiar?? It works like finding a standard deviation from deviation scores (Diallo is 0”, Preston is 2”, Mike is -4”, Hunter is -2”).

Step 1: Find the error for each value (just the residuals): Y – Y’

Step 2: Find the average:

Standard error of the estimate (line) =

$$\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$

These would be helpful to know by heart – please memorize these formulas
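The steps for the standard error of the estimate can be checked on a small hypothetical data set whose least-squares line is Y’ = 2.2 + 0.6x (made-up numbers, not Rory's data). The raw residuals cancel out, so they are squared, divided by df = n - 2, and square-rooted:

```python
import math

# Hypothetical data and its least-squares line Y' = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

# Step 1: residuals Y - Y'
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Step 2's "big problem": the raw residuals sum to zero
# (up to floating-point rounding), so they cannot be averaged directly
print(sum(residuals))

# Square, sum, divide by df = n - 2, then take the square root
ss_residual = sum(r ** 2 for r in residuals)
see = math.sqrt(ss_residual / (len(x) - 2))
print(round(see, 3))  # 0.894
```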

When we calculate the regression line we try to:

- minimize the distance between predicted Ys and actual (data) Y points (the length of the green lines)
- remember: because the negative and positive values cancel each other out, we have to square those distances (deviations)
- so we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”

How well does the prediction line predict the predicted variable when using the predictor variable?

Standard error of the estimate (line):

- a measure of the average amount of predictive error

- the average amount that Y’ scores differ from Y scores

- roughly, the typical length of the green lines (a root-mean-square, not a simple mean)

- Slope doesn’t give “variability” info
- Intercept doesn’t give “variability” info

- Correlation “r” does give “variability” info

- Residuals do give “variability” info

A note about curvilinear relationships and patterns of the residuals

How well does the prediction line predict the Ys from the Xs?

Residuals

- Shorter green lines suggest better prediction – smaller error

- Longer green lines suggest worse prediction – larger error

- Why are green lines vertical?
- Remember, we are predicting the variable on the Y axis
- So, error would be how we are wrong about Y (vertical)

- For each value of X, there is a group of Y values

- These Y values are normally distributed.
- The means of these normal distributions of Y values all lie on the straight line of regression.

- The standard deviations of these normal distributions are equal.

Which minimizes error better?

Is the regression line better than just guessing the mean of the Y variable? How much does the information about the relationship actually help?

How much better does the regression line predict the observed results?

r²

Wow!

r² = The proportion of the total variance in one variable that is predictable by its relationship with the other variable
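One way to see r² as the answer to “is the regression line better than guessing the mean?” is to compare the squared errors from both strategies. A sketch on hypothetical data (least-squares line Y’ = 2.2 + 0.6x, r ≈ 0.775), where r² = 1 - (error with the line) / (error with the mean):

```python
# Hypothetical data and its least-squares line Y' = 2.2 + 0.6x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6
mean_y = sum(y) / len(y)

# Squared error if we just guess the mean of Y for everyone
ss_total = sum((yi - mean_y) ** 2 for yi in y)
# Squared error left over when we use the regression line instead
ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Proportion of the total variance the line accounts for
r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 3))  # 0.6
```

For this data r ≈ 0.775 and 0.775² ≈ 0.6, so the two routes to r² agree.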

Examples

If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height?

.64 because (.8)² = .64

If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height?

.36 because (1.0 - .64) = .36

or

36% because 100% - 64% = 36%

If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature?

.25 because (.5)² = .25

If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature?

.75 because (1.0 - .25) = .75

or

75% because 100% - 25% = 75%
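The arithmetic in these examples, plus Rory's r = 0.71, fits in a tiny helper; `variance_explained` is an illustrative name, not a library function:

```python
def variance_explained(r: float) -> tuple[float, float]:
    """Return (r^2, 1 - r^2): proportion of variance accounted for / not accounted for."""
    r2 = r ** 2
    return r2, 1 - r2


for label, r in [("mother/daughter height", 0.8),
                 ("ice cream sales/temperature", 0.5),
                 ("sales calls/systems sold", 0.71)]:
    explained, unexplained = variance_explained(r)
    print(f"{label}: r = {r}, r^2 = {explained:.2f}, not accounted for = {unexplained:.2f}")
```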

- Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent)
- Coefficient of correlation is the name for “r”
- Coefficient of determination is the name for “r²” (remember it is never negative – no direction info)
- Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation)

1. What is regression used for?

- Include an example

2. What is a residual? How would you find it?

3. What is Standard Error of the Estimate? (How is it related to residuals?)

4. Give one fact about r²

5. How is a regression line like a mean?

1. What is regression used for?

- Include an example

Regressions are used to take advantage of relationships between variables described in correlations. We choose a value on the independent variable (on the x axis) to predict values for the dependent variable (on the y axis). For example, Rory uses the number of sales calls (x) to predict the number of systems sold (y).

2. What is a residual? How would you find it?

Residuals are the difference between our predicted y (y’) and the actual y data points. Once we choose a value on our independent variable and predict a value for our dependent variable, we look to see how close our prediction was. We are measuring how “wrong” we were, or the amount of “error” for that guess.

Y – Y’

3. What is Standard Error of the Estimate? (How is it related to residuals?)

The average length of the residuals

The average error of our guess

The average length of the green lines

The standard deviation of the data points around the regression line

4. Give one fact about r2

5. How is a regression line like a mean?

Thank you!

See you next time!!