Lecture 3-3

Lecture 3-3 Summarizing relationships among variables

Topics covered in this lecture note • We will cover several topics about ordinary least squares (OLS) estimation. • Testing the statistical significance of an estimated coefficient using t-statistics (i.e., testing whether advertisement spending has any effect on revenue). • Ordinary least squares estimation when there is more than one explanatory variable. • An introduction to panel data (repeated observations over time)

1. Testing the statistical significance of the estimated coefficient: Example • The graph above shows the relationship between advertisement spending and revenue, along with the estimated linear equation. • The estimated slope coefficient is 13.4. This means that for every additional 1000 yen you spend on advertisement, revenue is estimated to increase by 13.4 thousand yen.

Testing the statistical significance of the estimated coefficient: Example, contd However, the graph also seems to indicate that the relationship between advertisement spending and revenue is weak. When we estimate a linear equation, we typically would like to know whether advertisement has any effect on the revenue. To answer such a question, just estimating β0 and β1 is not enough. We need more information.

Testing the statistical significance of the estimated coefficient: Example, contd The following slides describe the procedure to answer the following question: “Would the advertisement have any impact on the revenue?”

Testing the statistical significance of the estimated coefficient: Example, contd • To test whether advertisement spending has any impact on the revenue, we need to test whether the slope coefficient is “significantly” different from zero. • If the slope coefficient is significantly different from zero, we may conclude that advertisement spending has some effect on the revenue. • If the slope coefficient is not significantly different from zero, we have no evidence that advertisement spending has an effect on the revenue. • Then, what would be the criterion to decide whether the slope coefficient is “significantly” different from zero? See next slide

Testing the statistical significance of the estimated coefficient: Example, contd • To decide whether the slope coefficient is significantly different from zero, we use the “t-statistic”. • The OLS estimation procedure estimates much more than β0 and β1; its output also includes the t-statistic. Now, we will obtain some of this extra information from OLS estimation using Excel.

Testing the statistical significance of the estimated coefficient: Example, contd • Open the data set “OLS Exercise 2-Advertisement and Revenue”. This is the data set used to produce the graph in the previous slides. Now, use “Data Analysis” to estimate the following model: (Revenue)=β0+β1(Advertisement Spending)

Testing the statistical significance of the estimated coefficient: Example, contd • The table above is the result of the OLS regression. • Intercept coefficient (β0)=15440.18 • Slope coefficient (β1)=13.45 • We have some extra information, such as the standard error and the t-statistic (t-Stat in the table). These are the pieces of information needed to test whether the slope coefficient is significantly different from zero. See next slides

Testing the statistical significance of the estimated coefficient: Example-Standard Error- Since the data contain a lot of noise (unexpected rises and falls in revenue, etc.), the effect of advertisement on revenue (β1) is estimated with some error. Standard errors show the expected size of the error in the estimated coefficients.

Testing the statistical significance of the estimated coefficient: Example-Standard Error, contd- • For example, the standard error for the slope coefficient is 60.3. This means that the estimate of the slope coefficient (β1) would be off by about ±60.3 on average. • Thus, the smaller the standard error for β1 is, the more precise the estimate of the impact of advertisement is.
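To make the idea of a standard error concrete, the following is a minimal sketch of how the slope, intercept, and the slope's standard error can be computed by hand for a model with one explanatory variable. It uses the conventional textbook formulas. The function name `simple_ols` and the five data points are made up for illustration and are not the lecture's data set.

```python
import math


def simple_ols(x, y):
    """Fit y = b0 + b1*x by OLS and return (b0, b1, standard error of b1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual variance uses n - 2 because two parameters are estimated
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in residuals) / (n - 2)
    se_b1 = math.sqrt(s2 / sxx)
    return b0, b1, se_b1


# Hypothetical data (NOT the lecture's data set)
ad_spending = [1, 2, 3, 4, 5]
revenue = [2, 4, 5, 4, 5]
b0, b1, se_b1 = simple_ols(ad_spending, revenue)
print(b0, b1, se_b1)
```

Noisier data produce larger residuals, which raise the residual variance and therefore the standard error of the slope.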

Testing the statistical significance of the estimated coefficient: Example-t statistic- • The t-statistic is obtained by dividing the coefficient by its standard error. For example, the t-statistic for the slope coefficient is • 13.45107/60.32825=0.222965 • Our confidence that advertisement spending has some impact on revenue increases as the absolute value of the t-statistic increases (this happens when the standard error decreases or when the coefficient moves farther from zero). • We use the t-statistic to test whether the slope coefficient is significantly different from zero.
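The division above can be reproduced in a couple of lines. This is only a sketch of the arithmetic, using the coefficient and standard error reported on the slide; the variable names are chosen here for illustration.

```python
# Values taken from the regression output quoted on the slide
slope_coefficient = 13.45107
standard_error = 60.32825

# t-statistic = coefficient divided by its standard error
t_statistic = slope_coefficient / standard_error
print(round(t_statistic, 6))
```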

The procedure to test the statistical significance of the estimated coefficient • The following is the procedure to test whether a coefficient is significantly different from zero. • Obtain the t-statistic. • Check whether the absolute value of the t-statistic is greater than or equal to 2 (that is, t-stat≤‒2 or t-stat≥+2). • If the absolute value of the t-statistic is greater than or equal to 2, the coefficient is statistically significantly different from zero. • If the absolute value of the t-statistic is smaller than 2, the coefficient is not statistically significantly different from zero.
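The procedure above can be sketched as a small function. The name `is_significant` and the default threshold argument are assumptions made for this illustration; the rule itself (absolute value of the t-statistic at least 2) is the one stated on the slide.

```python
def is_significant(t_stat, threshold=2.0):
    """Rule of thumb from the slide: significant if |t| >= threshold."""
    return abs(t_stat) >= threshold


# t-statistic of the advertisement slope from the earlier example
print(is_significant(0.222965))  # not significant under the rule

# Hypothetical t-statistics, to show that the sign does not matter
print(is_significant(-2.5))
print(is_significant(2.0))
```

Taking the absolute value first means that a large negative t-statistic is treated the same way as a large positive one.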

A note on the test of statistical significance of the estimated coefficient 1 • When the coefficient is statistically significantly different from zero, we simply say “the coefficient is statistically significant”. • If the coefficient is statistically significant, we conclude that advertisement spending has some impact on the revenue. • If the coefficient is not statistically significant, we conclude that we found no evidence that advertisement spending has an impact on the revenue.

A note on the test of statistical significance of the estimated coefficient 2 (Optional) The criterion value for the t-statistic that we used for testing statistical significance was 2. More precisely, this criterion value depends on the number of observations and the number of parameters to be estimated. This topic will be discussed in more detail later in the class. When you use the criterion value of 2, roughly speaking, you are testing the statistical significance of the slope coefficient at the 5% significance level.

Exercise • Exercise 1: Open data “Statistical Significance Exercise”. Use Product A data to estimate the effect of promotion on the revenue by estimating the following model. Pay particular attention to the statistical significance of the slope coefficient. (Revenue)=β0+β1(Number of promotion) • Exercise 2: Use data “Statistical Significance Exercise”. Use Product C data to estimate the same model.

Exercise 1 Answer The estimated effect of the promotion on the revenue is 99060.15, with a t-statistic equal to 5.07. Since the t-statistic is greater than 2, we conclude that the effect of the promotion on the revenue is statistically significant. Given the statistical significance of the coefficient, the estimated slope coefficient of 99060 indicates that, if we increase the number of promotions by one, the revenue is likely to increase by 99060 yen.

Exercise 2 Answer The estimated effect of promotion on the revenue is -11751.1, with a t-statistic equal to -1.3. Since the absolute value of the t-statistic is smaller than 2, we conclude that the slope coefficient is not statistically significant. In other words, we did not find evidence that promotion has any impact on the revenue from product C.

2. OLS with multiple explanatory variables: Introduction • So far, we have considered a model with only one explanatory variable. Y=β0+β1X • Often, we have more than one explanatory variable. For example, in addition to promotion, the company may increase the number of sales persons. If we have data about the number of sales persons, we can also incorporate such a variable.

OLS with multiple regressors-Example: Returns on Education- • Suppose you are considering pursuing more education (going to graduate school, etc.). Then you may want to know whether this is worth your effort.

OLS with multiple regressors-Example: Returns on Education- • To investigate by how much extra education increases your future salary, we can use OLS regression. • Open the data “Returns on education”. This data set contains three variables, collected for 935 persons. For each person, the data contain the weekly wage in dollars, the number of years of education, and the number of years of work experience. • As an exercise, find the mean, variance and standard deviation for the three variables.

OLS with multiple regressors-Example: Returns on Education- • To investigate the effect of education on wage, we may estimate the OLS regression: (wage)=β0+β1(education). • However, wage is affected not only by education, but also by the number of years of work experience. Therefore, it seems better to incorporate “work experience” in the model. • The simplest way to incorporate experience in the model is the following: (wage)=β0+β1(education)+β2(experience) • Notice that this OLS equation has two explanatory variables on the right-hand side of the equation.

OLS with multiple regressors-Example: Returns on Education- • Excel estimates the coefficients β0, β1 and β2 automatically: (wage)=β0+β1(education)+β2(experience) • The estimated β1 is the effect of education on wage, holding experience constant. This is the big advantage of OLS with multiple explanatory variables. When we look at data, education and experience vary at the same time, so it is difficult to see the effect of education separately from the effect of experience just by looking at the data. By incorporating these two variables, we can separate the effect of experience from the effect of education. • Exercise: Estimate the model above using Excel.
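The lecture uses Excel for the estimation. Purely as an illustration of what the software does behind the scenes, the sketch below estimates a model with two explanatory variables by solving the normal equations (X'X)b = X'y. The helper names and the six observations are made up for this example; the wages are constructed exactly from wage = 100 + 50(education) + 10(experience) with no noise, so the estimation should recover those three numbers.

```python
def solve_linear_system(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def ols(rows, y):
    """Return [b0, b1, b2, ...] for y = b0 + b1*x1 + b2*x2 + ..."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical (education, experience) pairs, NOT the lecture's data
people = [(12, 10), (16, 5), (12, 3), (18, 8), (14, 12), (16, 1)]
wage = [100 + 50 * educ + 10 * exper for educ, exper in people]

b0, b1, b2 = ols(people, wage)
print(b0, b1, b2)
```

Because education and experience enter the same equation, b1 is the change in wage for one more year of education with experience held fixed.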

OLS with multiple regressors-Example: Returns on Education- • Estimated β0=-272.5, β1=76.2 and β2=17.6 • Also notice that the t-statistic for β1 is 12.1, which is greater than 2. Therefore, the estimated β1 is statistically significant, and we conclude that education does have an impact on wage. • Given the statistical significance of β1, we can say that, holding experience constant, increasing education by one year would increase the weekly wage by $76.2. • This also means that if you go to graduate school for 2 years, your annual salary would increase by $76.2*(52 weeks)*(2 years)=$7924.8
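The calculation on this slide can be checked with a short sketch. The coefficients are the estimates quoted above; the function name `predicted_weekly_wage` and the example person (16 years of education, 10 years of experience) are assumptions made for illustration.

```python
# Estimated coefficients quoted on the slide
B0 = -272.5   # intercept
B1 = 76.2     # education (dollars per week, per year of education)
B2 = 17.6     # experience (dollars per week, per year of experience)


def predicted_weekly_wage(education, experience):
    """Predicted weekly wage in dollars from the estimated equation."""
    return B0 + B1 * education + B2 * experience


# Annual gain from 2 extra years of education, holding experience constant
weeks_per_year = 52
extra_years = 2
annual_gain = B1 * weeks_per_year * extra_years
print(round(annual_gain, 1))

# Hypothetical person: 16 years of education, 10 years of experience
print(round(predicted_weekly_wage(16, 10), 1))
```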

Exercise 2 • Open Data “Returns on education 2” • This is the same data set as “Returns on education 1”, except that it has more variables. This data set contains information about the age of the person, and IQ test score of the person. Exercise: Add IQ to the model. Does this change the results?

OLS with multiple variables: Application-Making a model more flexible- • When you specify a model for OLS estimation, the first criterion is simplicity. (Revenue)=β0+β1(Promotion) • Such a simple equation gives a clear idea of the effect of promotion on revenue. • However, simplicity comes at a cost: a simple model is often not flexible.

OLS with multiple variables: Application-Making a model more flexible- • The model implicitly assumes that the effect on revenue of increasing the number of promotion by one is always the same. That is, the model assumes that the effect of increasing the number of promotion from 10 to 11 is the same as the effect of increasing the number of promotion from 40 to 41. • However, it is reasonable to think that the effect of promotion would diminish due to the law of diminishing marginal returns. • See the next example.

-Making a model more flexible: An example- • Open the data set “Making a model more flexible”. These data show the relationship between the number of promotion and revenue for product D. • Plot the relationship between the number of promotion and revenue, then describe the relationship.

-Making a model more flexible: An example- • The relationship seems to be a curve, not a straight line. • The effectiveness of promotion seems to be diminishing as the number of promotion increases. • How do we incorporate the “diminishing effectiveness” of promotion in the model?

-Making a model more flexible: An example- • To incorporate the “diminishing effectiveness” in the model, we need to specify a model that can “curve”. • A simple way to achieve this is to estimate the following model: (Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)²

-Making a model more flexible: Exercise- • Use the data “Making a model more flexible” and estimate the following model: (Revenue)=β0+β1(Number of promotion)+β2(Number of promotion)²

Exercise: Answer • The estimated equation is (Revenue)=-295299.7+181554.72(Number of promotion)-2629.38(Number of promotion)² • Note that both β1 and β2 are statistically significant.

More exercises • Exercise 1: Using the estimated equation compute “predicted” revenue for each observation. • Exercise 2: Now plot the predicted revenue and the number of promotions. Also plot the actual revenue and promotions, on the same graph. See how well the model predicts the outcome.

More exercises • Exercise 3: Using the estimated results, compute the expected increase in revenue when you increase the number of promotion from 10 to 11, and from 25 to 26.
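One possible way to approach Exercises 1 and 3 outside Excel is sketched below. It plugs the estimated coefficients from the answer slide into the quadratic equation. The function names `predicted_revenue` and `revenue_gain` are chosen here for illustration only.

```python
# Estimated coefficients from the quadratic model on the answer slide
B0 = -295299.7
B1 = 181554.72
B2 = -2629.38


def predicted_revenue(promotions):
    """Predicted revenue for a given number of promotion."""
    return B0 + B1 * promotions + B2 * promotions ** 2


def revenue_gain(start, end):
    """Expected change in revenue when promotion goes from start to end."""
    return predicted_revenue(end) - predicted_revenue(start)


# The gain from one more promotion depends on the starting level
print(round(revenue_gain(10, 11), 2))
print(round(revenue_gain(25, 26), 2))
```

Because β2 is negative, the gain from one additional promotion shrinks as the number of promotion grows, which is the diminishing effectiveness the squared term was added to capture.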

OLS with multiple variables: Application 2-Dummy Variables- • Often, our data contain qualitative variables. For example, if you have data about your clients, for each client you may have data about whether the person is male or female. Such data (about gender) is not a quantitative variable but a qualitative variable.

OLS with multiple variables: Application 2-Dummy Variables- • However, such a qualitative variable is also important in analyzing data. For example, you may want to answer the following question: “Which gender consumes more?”

To incorporate such a qualitative variable into the OLS equation, we first convert the qualitative information into a quantitative variable called a “dummy variable”. • A dummy variable is a variable that takes the value 1 if a particular criterion is satisfied, and 0 otherwise. • If you would like to incorporate gender information in your model, create the following dummy variable: Female=1 if the client is female, Female=0 if the client is male. Then you can estimate (Consumer spending)=β0+β1(Number of promotion)+β2(Female)
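The conversion described above can be sketched in a few lines. The client list and the function name `female_dummy` are made up for this illustration; the coding (1 for female, 0 otherwise) follows the definition on the slide.

```python
def female_dummy(gender):
    """Return 1 if the client is female, 0 otherwise."""
    return 1 if gender.strip().lower() == "female" else 0


# Hypothetical qualitative data about four clients
client_gender = ["female", "male", "Female", "male"]

# The resulting 0/1 column can be used as an explanatory variable in OLS
female = [female_dummy(g) for g in client_gender]
print(female)
```

In the estimated equation, β2 is then read as the difference in consumer spending between female and male clients, holding the number of promotion constant.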

OLS with multiple variables: Application 2-Dummy Variables- • A dummy variable is very versatile. Suppose you would like to know whether there are any wage differentials among different races (for example, between white and black). Then you can use a dummy variable that takes the value 1 if the person is black, and 0 otherwise. • A dummy variable can be created for many other situations. The use of dummy variables is one of the most important techniques in regression analysis.

Dummy variable exercise • Open the data “Dummy variable Exercise”. This data set contains four dummy variables. Black=1 if the person is black, =0 otherwise. Married=1 if the person is married, =0 otherwise. South=1 if the person lives in the South of the USA, =0 otherwise. Urban=1 if the person lives in an urban area, =0 otherwise.

Dummy variable exercise • Exercise 1: Estimate the following model: (Wage)=β0+β1(Education)+β2(Experience) +β3(Age)+ β4(IQ) +β5(Black) Then interpret the results.

Dummy variable exercise: Answer The coefficient for the dummy variable for a black person is -124.6. The t-statistic is -3.19; the absolute value of the t-statistic is greater than 2. Therefore, the coefficient is statistically significant. The results indicate that, holding education, experience, age, and IQ constant, the weekly wage is lower for a black person by $124.6. There seems to be a large wage gap between white and black workers.

Dummy variable: More exercises • Use the data “Dummy Variable Exercise”. Specify your own model, estimate it, and interpret the results.
