Class 20: Thurs., Nov. 18
  • Specially Constructed Explanatory Variables
    • Dummy variables for categorical variables
    • Interactions involving dummy variables
  • I will e-mail you HW8 tomorrow. It will be due Tuesday, Nov. 30th.
  • Schedule:
    • Tuesday, Nov. 23rd: One-way ANOVA
    • Tuesday, Nov. 30th: Review
    • Thursday, Dec. 2nd: Midterm II
    • Tuesday, Dec. 7th, Thursday, Dec. 9th: Two-way ANOVA
Categorical variables
  • Categorical (nominal) variables: Variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County).
  • How to use categorical variables as explanatory variables in regression analysis:
    • If the variable has two categories (e.g., sex (male/female), rain or not rain, snow or not snow), we have defined a variable that equals 1 for one of the categories and 0 for the other category.
Predicting Emergency Calls to the AAA Club

  • Rain forecast = 1 if rain is in forecast, 0 if not
  • Snow forecast = 1 if snow is in forecast, 0 if not
  • Weekday = 1 if weekday, 0 if not

Comparing Toy Factory Managers
  • An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (A, B and C). Data in toyfactorymanager.JMP.
  • How do the managers compare?
Marginal Comparison
  • Marginal comparison could be misleading. We know that large production runs with more toys take longer than small runs with few toys.
How can we be sure that Manager C’s advantage is not simply due to having supervised smaller production runs?
  • Solution: Run a multiple regression in which we include size of the production run as an explanatory variable, along with manager, in order to control for size of the production run.
Including Categorical Variable in Multiple Regression: Wrong Approach
  • We could assign codes to the managers, e.g., Manager A = 0, Manager B=1, Manager C=2.
  • This model says that for the same run size, Manager B is 31 minutes faster than Manager A and Manager C is 31 minutes faster than Manager B.
  • This model restricts the difference between Manager A and B to be the same as the difference between Manager B and C – we have no reason to do this.
  • If we use a different coding for Manager, we get different results, e.g., Manager B=0, Manager A=1, Manager C=2

    • With this coding, Manager A is estimated to be 5 min. faster than Manager B.

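Why the numeric coding is restrictive can be seen in a small sketch. Only the -31 minutes per code step comes from the slide; the intercept, the run-size slope and the function name are made-up values for illustration.

```python
# Coding from the slide: Manager A = 0, Manager B = 1, Manager C = 2.
CODES = {"A": 0, "B": 1, "C": 2}

SLOPE_CODE = -31.0      # minutes per step of the manager code (from the slide)
INTERCEPT = 250.0       # ASSUMED value, illustration only
SLOPE_RUN_SIZE = 0.25   # ASSUMED value, illustration only


def predicted_time(manager, run_size):
    """Predicted run time when manager enters the model as a single number."""
    return INTERCEPT + SLOPE_RUN_SIZE * run_size + SLOPE_CODE * CODES[manager]


# Gaps between managers at the same run size.
gap_ab = predicted_time("A", 100) - predicted_time("B", 100)
gap_bc = predicted_time("B", 100) - predicted_time("C", 100)
print(gap_ab, gap_bc)
```

Whatever values the intercept and run-size slope take, the A-to-B gap and the B-to-C gap are forced to be the same, because a single coefficient multiplies the code.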
Including Categorical Variable in Multiple Regression: Right Approach
  • Create an indicator (dummy) variable for each category.
  • Manager[a] = 1 if Manager is A, 0 if Manager is not A
  • Manager[b] = 1 if Manager is B, 0 if Manager is not B
  • Manager[c] = 1 if Manager is C, 0 if Manager is not C

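A minimal sketch of building one 0/1 indicator column per category. The function name and the short list of managers are illustrative and are not taken from toyfactorymanager.JMP.

```python
def indicator_columns(values):
    """Return one 0/1 column per distinct category, keyed by category."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Illustrative manager labels for five production runs.
managers = ["A", "B", "C", "A", "C"]
columns = indicator_columns(managers)

for level, column in columns.items():
    print(f"Manager[{level.lower()}] = {column}")
```

In every row exactly one indicator equals 1, so the three columns add up to the intercept column. This is why software either drops one indicator or imposes a constraint on the coefficients.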
For a run size of 100, the estimated run times for Managers A, B and C differ only through the manager coefficients (38.41 for A, -14.65 for B, -23.76 for C).
  • For the same run size, Manager A is estimated to be on average 38.41-(-14.65)=53.06 minutes slower than Manager B and 38.41-(-23.76)=62.17 minutes slower than Manager C.

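The pairwise comparisons are differences of the manager coefficients. A short sketch using the three coefficients implied by the arithmetic above; the helper name is illustrative.

```python
# Manager coefficients from the expanded estimates, in minutes.
EFFECTS = {"A": 38.41, "B": -14.65, "C": -23.76}


def minutes_slower(manager, other):
    """Estimated extra mean run time of `manager` over `other`, same run size."""
    return EFFECTS[manager] - EFFECTS[other]


for other in ("B", "C"):
    print(f"A vs {other}: {minutes_slower('A', other):.2f} minutes slower")
```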
Categorical Variables in Multiple Regression in JMP
  • Make sure that the categorical variable is coded as nominal. To change the coding, right-click on the column of the variable, click Column Info and change Modeling Type to nominal.
  • Use Fit Model and include the categorical variable into the multiple regression.
  • After Fit Model, click red triangle next to Response and click Estimates, then Expanded Estimates (the initial output in JMP uses a different, more confusing, coding of the dummy variables).
The coefficients on Manager A, Manager B and Manager C add up to zero. So the positive coefficient on Manager A means that Manager A is slower than the average (of Manager A, B and C) and the negative coefficients on Manager B and Manager C mean that these two managers are faster than the average (of Manager A, B and C).
  • The coefficients on the indicator variables will always add up to zero in JMP.
  • Caution: Different software uses different coding for indicator variables. It doesn’t change the predictions from the multiple regression but does change the interpretation.
Equivalence of Using One 0/1 Dummy Variable and Two 0/1 Dummy Variables when Categorical Variable has two categories

The two models give equivalent predictions. The difference in the mean number of emergency calls between a day with a rain forecast and a day without a rain forecast, holding all other variables fixed, is 214.85-(-214.85), or approximately 429.7.

Effect Tests
  • Effect test for manager: H0: manager[a] = manager[b] = manager[c] vs. Ha: not all of manager[a], manager[b], manager[c] are equal. The null hypothesis is that all managers are the same (in terms of mean run time) when run size is held fixed; the alternative hypothesis is that not all managers are the same (in terms of mean run time) when run size is held fixed.
  • p-value for Effect Test <.0001. Strong evidence that not all managers are the same when run size is held fixed.
  • Note: this is equivalent to testing H0: manager[a] = manager[b] = manager[c] = 0, because JMP imposes the constraint manager[a]+manager[b]+manager[c]=0.

  • Effect test for Run size tests null hypothesis that Run Size coefficient is 0 versus alternative hypothesis that Run size coefficient isn’t zero. Same p-value as t-test.
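
An effect test of this kind can be computed as a partial F-test, which compares the residual sum of squares with and without the manager terms. The sketch below uses made-up data (not the toy factory data) and a baseline coding for the indicators. The F statistic does not depend on which coding is used. The p-value would come from an F distribution with the printed degrees of freedom and is not computed here.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def residual_ss(rows, y):
    """Residual sum of squares from an ordinary least squares fit."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# MADE-UP data: (manager, run size, run time).
data = [
    ("A", 100, 260), ("A", 200, 290), ("A", 300, 310),
    ("B", 100, 200), ("B", 200, 225), ("B", 300, 255),
    ("C", 100, 190), ("C", 200, 220), ("C", 300, 240),
]
y = [float(t) for _, _, t in data]

# Full model: intercept, run size, indicators for B and C (A is the baseline).
full = [[1.0, float(s), float(m == "B"), float(m == "C")] for m, s, _ in data]
# Reduced model under H0: the manager terms are dropped.
reduced = [[1.0, float(s)] for _, s, _ in data]

rss_full = residual_ss(full, y)
rss_reduced = residual_ss(reduced, y)

q = 2                  # manager coefficients removed under H0
df_resid = len(y) - 4  # observations minus coefficients in the full model
f_stat = ((rss_reduced - rss_full) / q) / (rss_full / df_resid)
print(f"F = {f_stat:.2f} on ({q}, {df_resid}) degrees of freedom")
```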
The effect test shows that the managers are not equal.
  • For the same run size, Manager C is best (lowest mean run time), followed by Manager B and then Manager A.
  • The above model assumes no interaction between Manager and run size – the difference between the mean run time of the managers is the same for all run sizes.
Interaction Model in JMP
  • To add interactions involving categorical variables in JMP, follow the same procedure as with two continuous variables. Run Fit Model in JMP, add the usual explanatory variables first, then highlight one of the variables in the interaction in the Construct Model Effects box and highlight the other variable in the interaction in the Columns box and then click Cross in the Construct Model Effects box.
Interaction Model
  • Interaction between run size and Manager: The effect on mean run time of increasing run size by one is different for different managers.
  • Effect Test for Interaction:
  • Manager*Run Size Effect test tests null hypothesis that there is no interaction (effect on mean run time of increasing run size is same for all managers) vs. alternative hypothesis that there is an interaction between run size and managers. p-value =0.0333. Evidence that there is an interaction.
The runs supervised by Manager A appear abnormally time consuming. Manager B has higher initial fixed setup costs than Manager C (186.565>149.706) but has lower per unit production time (0.136<0.259).
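
Taking the two fitted lines at face value (intercept as setup time, slope as time per toy), a short sketch shows the run size at which the line with the higher setup time and lower per-unit time drops below the other. The four numbers are the ones quoted above; the function names are illustrative.

```python
# (intercept, slope) per manager: setup time and time per unit produced.
LINES = {"B": (186.565, 0.136), "C": (149.706, 0.259)}


def mean_run_time(manager, run_size):
    """Estimated mean run time from the manager's own fitted line."""
    intercept, slope = LINES[manager]
    return intercept + slope * run_size


def crossover_run_size(first, second):
    """Run size at which the two managers' fitted lines intersect."""
    (a1, b1), (a2, b2) = LINES[first], LINES[second]
    return (a1 - a2) / (b2 - b1)


size = crossover_run_size("B", "C")
print(f"Lines cross at a run size of about {size:.0f}")
print(mean_run_time("B", 100), mean_run_time("C", 100))
print(mean_run_time("B", 400), mean_run_time("C", 400))
```

Below the crossing point the manager with the lower setup time has the lower estimated run time, and above it the manager with the lower per-unit time does. This is what an interaction between manager and run size means in practice.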
Interaction Profile Plot

Lower left-hand plot shows mean run time vs. run size for the three managers a, b and c.

Interactions Involving Categorical Variables: General Approach
  • First fit model with an interaction between categorical explanatory variable and continuous explanatory variable. Use effect test on interaction to see if there is evidence of an interaction.
  • If there is evidence of an interaction (p-value <0.05 for effect test), use interaction model.
  • If there is not strong evidence of an interaction (p-value >0.05 for effect test), use model without interactions.
Example: A Sex Discrimination Lawsuit
  • Did a bank discriminatorily pay higher starting salaries to men than to women? Harris Trust and Savings Bank was sued by a group of female employees who accused the bank of paying lower starting salaries to women. The data in harrisbank.JMP are the starting salaries for all 32 male and all 61 female skilled, entry-level clerical employees hired by the bank between 1969 and 1977, as well as the education levels and sex of the employees.
Discrimination Case Regression Results
  • Strong evidence that there is a difference in the mean starting salaries of women and men of the same education level.
  • Estimated difference: Men have 345.904+345.904=$691.81 higher mean starting salaries than women of the same education level.
  • 95% confidence interval for mean difference = (2*$214.55,2*$477.25)=($429.10,$954.50).
  • Bank’s defense: Omitted variable bias. Variables such as Seniority, Age, Experience also need to be controlled for.
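
The doubling in the estimated difference and in its confidence interval follows from the sum-to-zero coding: one sex gets +345.904 and the other gets -345.904, so the gap is twice the coefficient and the interval endpoints scale by the same factor. A quick arithmetic check, with illustrative variable names:

```python
# Coefficient on the sex indicator and its 95% interval, from the slides.
COEF_SEX = 345.904
CI_COEF = (214.55, 477.25)

# Men minus women at the same education level: (+coef) - (-coef) = 2 * coef.
difference = 2 * COEF_SEX
ci_difference = tuple(2 * bound for bound in CI_COEF)

print(f"Estimated difference: ${difference:.2f}")
print(f"95% CI: (${ci_difference[0]:.2f}, ${ci_difference[1]:.2f})")
```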