
# Discovering and Describing Relationships - PowerPoint PPT Presentation

### Discovering and Describing Relationships

Farideh Dehkordi-Vakil

• Scatter plots

• Represent the relationship between two different continuous variables measured on the same subjects.

• Each point in the plot represents the values for one subject for the two variables.

• Example:

Data reported by the Organisation for Economic Co-operation and Development (OECD) on its 29 member nations in 1998.

• Per capita gross domestic product is on the x-axis.

• Per capita health care expenditures are on the y-axis.

• We can describe the overall pattern of a scatter plot by its

• Form or shape

• Direction

• Strength

• Form or shape

• The form shown by the scatter plot is linear if the points lie in a straight-line pattern.

• Strength

• The relationship is strong if the points lie close to a line, with little scatter.

• Direction

• Positive and negative association

• Two variables are positively associated when above-average values of one variable tend to occur in individuals with above-average values of the other variable, and below-average values of both also tend to occur together.

• Two variables are negatively associated when above-average values of one tend to occur in subjects with below-average values of the other, and vice versa.

• Per capita health care example

• The “subjects” studied are countries.

• The form of the relationship is roughly linear.

• The direction is positive

• The relationship is strong.

• It is often useful to have a measure of the degree of association between two variables. For example, you may believe that sales are affected by advertising expenditures and want to measure the degree of association between sales and advertising.

• The correlation coefficient is a numerical measure of the direction and strength of the linear relationship between two continuous variables.

• The sample correlation coefficient is denoted $r$.

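• There are several alternative ways to write the algebraic expression for the correlation coefficient. The following is one:

$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$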

• $X$ and $Y$ represent the two variables of interest, for example advertising and sales, or per capita gross domestic product and per capita health care expenditure.

• n is the number of subjects in the sample

• The population correlation coefficient is denoted $\rho$.

• r has no units.

• r > 0 indicates a positive association; r < 0 indicates a negative association

• r is always between –1 and +1

• Values of r near 0 imply a very weak linear relationship

• Correlation measures only the strength of linear association.
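A minimal Python sketch of the computational (sums) form of $r$, built from $n$, $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$ and $\sum Y^2$. The function name `pearson_r` and the data values are hypothetical, made up only for illustration. The last lines check two of the properties listed above: rescaling a variable leaves $r$ unchanged, and an exact decreasing line gives $r = -1$.

```python
import math


def pearson_r(x, y):
    """Sample correlation coefficient r from the computational (sums) formula."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be the same length, with at least 2 pairs")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
print(f"r = {r:.4f}")

# r has no units: rescaling either variable leaves it unchanged.
r_rescaled = pearson_r([v * 1000 for v in x], [v / 3 for v in y])
print(math.isclose(r, r_rescaled))  # True

# An exact decreasing line gives the extreme value r = -1.
print(pearson_r(x, [10 - 2 * v for v in x]))  # -1.0
```

The sums form mirrors hand calculation, but it can lose precision when the values are large relative to their spread; subtracting the means first is the numerically safer route for real data.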

• We can perform a hypothesis test to determine whether the value of a sample correlation coefficient ($r$) gives us reason to believe that the population correlation ($\rho$) is significantly different from zero.

• The hypothesis test would be

$H_0: \rho = 0$

$H_a: \rho \neq 0$

• The test statistic would be

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

• Under $H_0$, the test statistic has a t-distribution with $n-2$ degrees of freedom.

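• Reject $H_0$ at significance level $\alpha$ if $|t| > t_{\alpha/2,\,n-2}$, the upper $\alpha/2$ critical value of the t-distribution with $n-2$ degrees of freedom.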
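A standard-library sketch of this test, assuming no statistics package is available. It computes $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and obtains a p-value by integrating the Student t density numerically with Simpson's rule, an illustrative substitute for a t table or a statistics library. The function names and the `alternative` parameter are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t], via Simpson's rule."""
    h = t / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3


def correlation_t_test(r, n, alternative="two-sided"):
    """Test H0: rho = 0 with t = r*sqrt(n-2)/sqrt(1-r^2) on n-2 df."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    tail = t_upper_tail(abs(t), n - 2)  # P(T > |t|)
    if alternative == "two-sided":
        return t, 2 * tail
    if alternative == "greater":  # Ha: rho > 0, p = P(T > t)
        return t, tail if t >= 0 else 1 - tail
    if alternative == "less":  # Ha: rho < 0, p = P(T < t)
        return t, tail if t <= 0 else 1 - tail
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


t, p = correlation_t_test(0.3535, 59, alternative="greater")
print(f"t = {t:.3f}, one-sided p = {p:.4f}")  # t prints as 2.853
```

With the bank-wage figures from the example in this section ($r = 0.3535$, $n = 59$, one-sided alternative), the sketch gives $t \approx 2.853$ and a one-sided p-value below 0.005, consistent with the table-based conclusion.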

• Many factors affect the wages of workers: the industry they work in, their type of job, their education and experience, and changes in general wage levels. We will look at a sample of 59 married women who hold customer service jobs in Indiana banks. The following table gives their weekly wages at a specific point in time and their length of service with their employer, in months. The size of the workplace is recorded simply as “large” (100 or more workers) or “small.” Because industry, job type, and the time of measurement are the same for all 59 subjects, we expect to see a clear relationship between wages and length of service.

• The correlation between wages and length of service for the 59 bank workers is r = 0.3535.

• We expect a positive correlation between length of service and wages in the population of all married female bank workers. Is the sample result convincing evidence that this is true?

• To compute the correlation, we need:

• Substituting these into the formula

• We want to test

$H_0: \rho = 0$ versus $H_a: \rho > 0$

The test statistic is

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.3535\sqrt{57}}{\sqrt{1-0.3535^2}} \approx 2.853$$

• Comparing $t = 2.853$ with critical values from the t table with $n - 2 = 57$ degrees of freedom helps us make our decision.

• Conclusion:

• Since $P(t > 2.853) < 0.005$, we reject $H_0$.

• The data provide strong evidence of a positive correlation between wages and length of service in the population.

• In evaluating time series data, it is useful to look at the correlation between successive observations over time.

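• This measure of correlation is called autocorrelation and may be calculated as follows:

$$r_k = \frac{\sum_{t=k+1}^{n} (y_t - \bar{y})(y_{t-k} - \bar{y})}{\sum_{t=1}^{n} (y_t - \bar{y})^2}$$

• $r_k$ = autocorrelation coefficient for a lag of $k$ periods.

• $\bar{y}$ = mean of the time series.

• $y_t$ = value of the time series at period $t$.

• $y_{t-k}$ = value of the time series $k$ periods before period $t$.

• $n$ = number of observations.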
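A direct Python translation of the lag-$k$ autocorrelation, assuming the usual definition in which the numerator sums $(y_t - \bar{y})(y_{t-k} - \bar{y})$ over $t = k+1, \dots, n$ and the denominator sums $(y_t - \bar{y})^2$ over all $n$ periods, with the full-series mean $\bar{y}$ in both. The function name `autocorrelation` is hypothetical.

```python
def autocorrelation(y, k):
    """Lag-k autocorrelation r_k using the full-series mean in numerator and denominator."""
    n = len(y)
    if not 0 <= k < n:
        raise ValueError("lag k must satisfy 0 <= k < len(y)")
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    denominator = sum(d * d for d in dev)  # sum over all n periods
    if denominator == 0:
        raise ValueError("constant series: r_k is undefined")
    # Python indices are 0-based: pair y[t] with y[t - k] for t = k, ..., n - 1
    numerator = sum(dev[t] * dev[t - k] for t in range(k, n))
    return numerator / denominator


series = [1.0, 2.0, 3.0, 4.0]
print(autocorrelation(series, 1))  # 0.25
```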

• Autocorrelation coefficients for different time lags can be used to answer the following questions about time series data.

• Are the data random?

• If so, the autocorrelations between $y_t$ and $y_{t-k}$ are close to zero for every lag; successive values of the series are not related to each other.

• Is there a trend?

• If the series has a trend, $y_t$ and $y_{t-k}$ are highly correlated.

• The autocorrelation coefficients are significantly different from zero for the first few lags and then gradually drop toward zero.

• The autocorrelation coefficient at lag 1 is often very large (close to 1).

• A series that contains a trend is said to be non-stationary.

• Is there a seasonal pattern?

• If a series has a seasonal pattern, there will be a significant autocorrelation coefficient at the seasonal time lag or multiples of the seasonal lag.

• The seasonal lag is 4 for quarterly data and 12 for monthly data.

• Is it stationary?

• A stationary time series is one whose basic statistical properties, such as the mean and variance, remain constant over time.

• Autocorrelation coefficients for a stationary series decline to zero fairly rapidly, generally after the second or third time lag.
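To make the patterns just described concrete, the sketch below computes $r_1, \dots, r_8$ for two made-up, deterministic series: a steady upward trend and a quarterly pattern that repeats every 4 periods. Both series and the helper name `acf` are hypothetical illustrations, not data from the slides. The trend series gives large, gradually declining coefficients (about 0.93 at lag 1), while the seasonal series gives large positive spikes at the seasonal lag 4 and its multiple 8.

```python
def acf(y, max_lag):
    """Autocorrelations r_1, ..., r_max_lag (full-series mean and denominator)."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [v - y_bar for v in y]
    denominator = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denominator
            for k in range(1, max_lag + 1)]


# Hypothetical, deterministic series for illustration only.
trend = [float(t) for t in range(1, 41)]   # steady upward trend, 40 periods
seasonal = [10.0, 14.0, 8.0, 6.0] * 10     # repeats every 4 periods (quarterly)

for name, series in [("trend", trend), ("seasonal", seasonal)]:
    coefficients = acf(series, 8)
    print(name, " ".join(f"{rk:+.2f}" for rk in coefficients))
```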

• To determine whether the autocorrelation at lag $k$ is significantly different from zero, the following hypotheses and rule of thumb may be used.

• $H_0: \rho_k = 0$, $H_a: \rho_k \neq 0$

• For any $k$, reject $H_0$ if $|r_k| > 2/\sqrt{n}$,

• where $n$ is the number of observations.

• This rule of thumb is for $\alpha = 5\%$.
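A small sketch of the $2/\sqrt{n}$ rule of thumb (the approximate 5% limit): it returns every lag whose coefficient lies outside $\pm 2/\sqrt{n}$. The function name `significant_lags` and the coefficient values in the usage line are hypothetical.

```python
import math


def significant_lags(r, n):
    """Lags k (starting at 1) whose |r_k| exceeds the 2/sqrt(n) rule-of-thumb limit."""
    limit = 2 / math.sqrt(n)
    return [k for k, rk in enumerate(r, start=1) if abs(rk) > limit]


# Hypothetical r_1..r_4 for a series of n = 12 observations (limit is about 0.577).
print(significant_lags([0.80, 0.60, 0.30, -0.65], 12))  # [1, 2, 4]
```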

• The hypothesis test developed to determine whether a particular autocorrelation coefficient is significantly different from zero is:

• Hypotheses

• H0: k= 0, Ha: k  0

• Test Statistic:
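The test statistic for an individual $r_k$ can take several forms. One common form, assumed here and possibly different from the one on the original slide, divides $r_k$ by Bartlett's approximate standard error $SE(r_k) = \sqrt{\left(1 + 2\sum_{i=1}^{k-1} r_i^2\right)/n}$, which reduces to $1/\sqrt{n}$ at lag 1. The function name `acf_t_statistic` and the coefficients in the usage line are hypothetical, and the resulting $t$ still has to be compared with an appropriate critical value.

```python
import math


def acf_t_statistic(r, k, n):
    """t = r_k / SE(r_k), with Bartlett's SE(r_k) = sqrt((1 + 2*sum_{i<k} r_i^2) / n).

    r holds r_1, r_2, ... (r[0] is r_1); n is the number of observations.
    """
    if not 1 <= k <= len(r):
        raise ValueError("need r_1, ..., r_k to compute the statistic at lag k")
    se = math.sqrt((1 + 2 * sum(ri * ri for ri in r[:k - 1])) / n)
    return r[k - 1] / se


# Hypothetical coefficients, for illustration only.
print(acf_t_statistic([0.5, 0.4], 2, 100))
```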

• The plot of the autocorrelations versus time lag is called a correlogram.

• The horizontal axis is the time lag.

• The vertical axis is the autocorrelation coefficient.

• Patterns in a correlogram are used to analyze key features of the data.
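A correlogram can be approximated in plain text when no plotting library is at hand. This hypothetical helper, `text_correlogram`, draws one bar per lag (left of `|` for negative, right for positive), scaled so that $|r_k| = 1$ fills `width` characters, and appends `*` to any lag outside the $\pm 2/\sqrt{n}$ limits. The coefficients in the usage lines are made up for illustration.

```python
import math


def text_correlogram(r, n, width=20):
    """One text line per lag: a bar left (negative) or right (positive) of '|'.

    Lags with |r_k| > 2/sqrt(n) are flagged with '*'.
    """
    limit = 2 / math.sqrt(n)
    lines = []
    for k, rk in enumerate(r, start=1):
        bar = "#" * round(abs(rk) * width)  # |r_k| = 1 fills the full width
        left = bar.rjust(width) if rk < 0 else " " * width
        right = bar.ljust(width) if rk > 0 else " " * width
        flag = "*" if abs(rk) > limit else " "
        lines.append(f"{k:>3} {left}|{right} {rk:+.3f}{flag}")
    return lines


# Hypothetical coefficients for a series of n = 40 observations.
for line in text_correlogram([0.90, 0.70, 0.40, 0.10, -0.35], 40):
    print(line)
```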

• Correlograms for the mobile home shipments data

• Note that these are quarterly data.

• As the world’s economy becomes increasingly interdependent, various exchange rates between currencies have become important in making business decisions. For many U.S. businesses, the Japanese exchange rate (in yen per U.S. dollar) is an important decision variable. A time series plot of the Japanese yen/U.S. dollar exchange rate is shown below. On the basis of this plot, would you say the data are stationary? Is there any seasonal component in this time series?

• Here is the autocorrelation structure for EXRJ.

• With a sample size of 12, the critical value is $2/\sqrt{12} \approx 0.577$.

• This is the approximate 95% critical value for rejecting the null hypothesis of zero autocorrelation at lag $k$.

• The correlogram for EXRJ is given below.

• Since the autocorrelation coefficients fall below the critical value after just two lags, we can conclude that there is no trend in the data.

• To check for seasonality at $\alpha = 0.05$:

• The hypotheses are:

• $H_0: \rho_{12} = 0$, $H_a: \rho_{12} \neq 0$

• Test statistic is:

• Reject H0 if

• Since

• We do not reject $H_0$; therefore, seasonality does not appear to be an attribute of the data.