
Ch 2 and 9.1 Relationships Between 2 Variables


- More than one variable can be measured on each individual.
- Examples:
- Gender and Height
- Size and Cost
- Eye color and Major

- We want to look at the relationship among these variables.
- Is there an association between these two variables?
- Two variables measured on the same individuals are associated if some values of one variable tend to occur more often with some values of the second variable than with other values of that variable.

- If we expect one variable to influence another, we call it the ___________ variable.
- Explains or influences changes in the response variable

- The variable that is influenced is called the ____________ variable.
- Measures an outcome of a study

- In each of the following examples, identify the explanatory and response variables
- Gender and blood pressure
- Class attendance and course grade
- Number of beers and BAC

- We may be interested in relationships of different types of variables.
- Categorical and Numeric
- Categorical and Categorical
- Numeric and Numeric

- We are interested in comparing the numerical variable across each of the levels of the categorical variable.
- Examples:
- Compare high speeds for 4 different car brands
- Compare sucrose levels for 5 different types of fruit
- Compare GPR for 20 different majors

- Graphical Comparison
- Example: Sucrose levels of fruits (fictitious data)

- Numerical Comparison
- We could also look at summary statistics for each group.
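As a rough sketch of such a numerical comparison, the snippet below computes the same summary statistics for each level of a categorical variable. The fruit names and sucrose values are hypothetical, invented only for illustration, and `summarize` is an illustrative helper name, not something from the slides.

```python
from statistics import mean, median, stdev

# Hypothetical sucrose measurements (not from the slides), grouped by fruit type.
sucrose = {
    "apple": [2.1, 2.5, 1.9, 2.3],
    "banana": [2.4, 2.8, 2.6, 2.2],
    "peach": [4.7, 5.1, 4.4, 4.9],
}

def summarize(groups):
    """Return {level: (n, mean, median, sd)} for each level of the categorical variable."""
    return {
        level: (len(values), mean(values), median(values), stdev(values))
        for level, values in groups.items()
    }

summary = summarize(sucrose)
for level, (n, avg, med, sd) in summary.items():
    print(f"{level:>6}: n={n} mean={avg:.2f} median={med:.2f} sd={sd:.2f}")
```

Lining up the means and standard deviations side by side plays the same role as side-by-side boxplots in the graphical comparison.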

- Depending on the situation, one of the variables is the explanatory variable and the other is the response variable.
- In this case, we look at the percentages of one variable for each level of the other variable.
- Examples:
- Gender and Soda Preference
- Country of Origin and Marital Status
- Smoking Habits and Socioeconomic Status

- Two-way tables come about when we are interested in the relationship between two categorical variables.
- One of the variables is the _____________.
- The other is the _______________.
- The combination of a row variable and a column variable is a ______________.

[Figure: layout of a two-way table, labeling the row variable, column variable, cells, row totals, column totals, and overall total.]

- Example:

- Example: Gender and Highest Degree Obtained
- Joint Distribution: How likely are you to have a bachelor’s degree and be a male? _____________
- Marginal Distribution: What is the least likely highest degree obtained? _____________
- Conditional Distribution: If you are a female, how likely are you to have obtained a graduate degree? ______________

Shows the percentages for the joint, marginal, and conditional distributions.
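The three kinds of distributions differ only in what the cell count is divided by. The sketch below illustrates this with a hypothetical two-way table; the counts are invented and are not the table from the slides.

```python
# Hypothetical counts (not the slide's table): rows = gender, columns = highest degree.
counts = {
    "male": {"high school": 30, "bachelor": 15, "graduate": 5},
    "female": {"high school": 25, "bachelor": 20, "graduate": 5},
}

# Row totals, column totals, and the overall total of the two-way table.
row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
col_totals = {}
for cells in counts.values():
    for col, n in cells.items():
        col_totals[col] = col_totals.get(col, 0) + n
overall = sum(row_totals.values())

# Joint: cell count divided by the overall total.
joint_male_bachelor = counts["male"]["bachelor"] / overall

# Marginal: column total divided by the overall total.
marginal_degree = {col: n / overall for col, n in col_totals.items()}
least_likely_degree = min(marginal_degree, key=marginal_degree.get)

# Conditional: cell count divided by its row total (condition on being female).
graduate_given_female = counts["female"]["graduate"] / row_totals["female"]

print(joint_male_bachelor, least_likely_degree, graduate_given_female)
```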

- Depending on the situation, one of the variables is the explanatory variable and the other is the response variable.
- There is not always an explanatory-response relationship.
- Examples:
- Height and Weight
- Income and Age
- SAT scores on math exam and on verbal exam
- Amount of time spent studying for an exam and exam score

- Scatterplots
- Look for overall pattern and any striking deviations from that pattern.
- Look for outliers, values falling outside the overall pattern of the relationship
- You can describe the overall pattern of a scatterplot by the form, direction, and strength of the relationship.
- Form: Linear or clusters
- Direction
- Two variables are _____________________ when above-average values of one tend to accompany above-average values of the other and likewise below-average values also tend to occur together.
- Two variables are _____________________ when above-average values of one variable accompany below-average values of the other variable, and vice-versa.

- Strength: how close the points lie to a line

___________Association

- Example:
- Response: MPG
- Explanatory: Weight

[Scatterplot: response variable on the y-axis, explanatory variable on the x-axis.]

__________ Association

- Relationships between two numeric variables
- Example
- Vehicle Weight
- Horsepower

- Example

- ___________ or r: measures the direction and strength of the linear relationship between two numeric variables
- General Properties
- r must be between -1 and 1 (-1 ≤ r ≤ 1).
- If r is negative, the relationship is negative.
- If r = -1, there is a perfect negative linear relationship (extreme case).
- If r is positive, the relationship is positive.
- If r = 1, there is a perfect positive linear relationship (extreme case).
- If r is 0, there is no linear relationship.
- r measures the strength of the linear relationship.
- If explanatory and response are switched, r remains the same.
- r has no units of measurement associated with it
- Scale changes do not affect r
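Several of these properties can be checked numerically. The sketch below computes r from its definition and checks that swapping the variables or changing the units leaves it unchanged. The weight and MPG values are hypothetical, and `correlation` is an illustrative helper rather than a function from any particular library.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation r between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: vehicle weight in pounds and fuel economy in MPG.
weight_lb = [2000, 2500, 3000, 3500, 4000]
mpg = [38, 33, 27, 24, 18]

r = correlation(weight_lb, mpg)

# Switching explanatory and response leaves r unchanged.
r_swapped = correlation(mpg, weight_lb)

# A scale change (pounds to kilograms) leaves r unchanged.
weight_kg = [w * 0.4536 for w in weight_lb]
r_rescaled = correlation(weight_kg, mpg)

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3))
```

Here r is close to -1 because heavier vehicles in this made-up data get consistently lower MPG along a nearly straight line.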


- Examples of extreme cases

r = 1

r = 0

r = -1

- Match each correlation to its scatterplot

r = 0.04

r = 0.43

r = -0.84

r = 0.76

r = 0.21

It is possible for there to be a strong relationship between two variables and still have r ≈ 0.

EX.

- Important notes:
- Association does not imply causation
- Correlation does not imply causation
- Slope is not correlation
- A scale change does not change the correlation.
- Correlation doesn’t measure the strength of a non-linear relationship.
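Two of these notes are easy to demonstrate with made-up numbers. In the sketch below, y is completely determined by x through a curve, yet r is 0. Separately, two lines with very different slopes both give r = 1. All data are hypothetical and `correlation` is an illustrative helper.

```python
from math import sqrt
from statistics import mean

def correlation(x, y):
    """Pearson correlation r between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]

# A perfect but non-linear (U-shaped) relationship: r is 0.
y_curve = [v ** 2 for v in x]
r_curve = correlation(x, y_curve)

# Slope is not correlation: a steep and a shallow line are both perfectly linear.
y_steep = [10 * v for v in x]
y_shallow = [0.1 * v for v in x]
r_steep = correlation(x, y_steep)
r_shallow = correlation(x, y_shallow)

print(r_curve, round(r_steep, 6), round(r_shallow, 6))
```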

- A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
- A regression line summarizes the relationship between two variables, but only in a specific setting: when one of the variables helps explain or predict the other.
- We often use a regression line to predict the value of y for a given value of x.
- Regression, unlike correlation, requires that we have an explanatory variable and a response variable

- Fitting a line to data means drawing a line that comes as close as possible to the points.
- Extrapolation: the use of a regression line for prediction far outside the range of values of the explanatory variable x that you used to obtain the line.
- Such predictions are often not accurate.

- The least-squares regression line of y on x is the line that makes the sum of squares of the vertical distances of the data points from the line as small as possible.
- These vertical distances are called the residuals, or the error in prediction, because they measure how far the point is from the line: $\text{residual} = y - \hat{y}$, where $y$ is the observed point and $\hat{y}$ is the predicted point.

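- The equation of the least-squares regression line of y on x is $\hat{y} = b_0 + b_1 x$, with slope $b_1 = r \, \frac{s_y}{s_x}$ and intercept $b_0 = \bar{y} - b_1 \bar{x}$, where $s_x$ and $s_y$ are the standard deviations of x and y.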

- The expression for slope, b1, says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y.
- The slope, b1, is the amount by which y changes when x increases by one unit.
- The intercept, b0, is the value of y when x = 0.
- The least-squares regression line ALWAYS passes through the point $(\bar{x}, \bar{y})$.
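A minimal sketch of these facts, assuming the standard formulas $b_1 = r \, s_y / s_x$ and $b_0 = \bar{y} - b_1 \bar{x}$: the code fits the line to a small hypothetical data set, checks that it passes through the point of means, and computes the residuals. The helper name `least_squares` and the data are illustrative.

```python
from statistics import mean, stdev

def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    # Correlation r from the sample standard deviations.
    r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / ((len(x) - 1) * sx * sy)
    b1 = r * sy / sx   # slope: r standard deviations of y per standard deviation of x
    b0 = my - b1 * mx  # intercept: forces the line through (x-bar, y-bar)
    return b0, b1

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares(x, y)

# The fitted value at x-bar equals y-bar.
y_hat_at_mean = b0 + b1 * mean(x)

# Residuals: observed y minus predicted y-hat.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(round(b0, 3), round(b1, 3), [round(e, 3) for e in residuals])
```

For a least-squares line with an intercept, the residuals sum to zero up to rounding error, which is a quick sanity check on any hand calculation.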

- The square of the correlation, r², is the fraction of the variation in the values of y that is explained by the least-squares regression of y on x.
- Use r² as a measure of how successfully the regression explains the response.
- Interpret r² as the “percent of variation explained”
- For Simple Linear Regression, r² is simply the square of the correlation coefficient.
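The "fraction of variation explained" reading can be checked directly. The sketch below compares the squared correlation with 1 minus the ratio of the residual sum of squares to the total sum of squares of y. The data are hypothetical and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, invented for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)  # total variation in y

# Correlation and least-squares line.
r = sxy / sqrt(sxx * syy)
b1 = sxy / sxx
b0 = my - b1 * mx

# Variation left unexplained by the line (sum of squared residuals).
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

r_squared_from_r = r ** 2
fraction_explained = 1 - sse / syy

print(round(r_squared_from_r, 3), round(fraction_explained, 3))
```

Both numbers agree, so in this made-up example the line explains 60% of the variation in y.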

- Example
How much of the variation is explained by the least squares line of y on x? ______

What is the correlation coefficient? ______

Horsepower = -10.78 + 0.04*weight (Equation of the line.)

__________: y-value or response (horsepower) when line crosses the y-axis.

_______: increase in response for a unit increase in explanatory variable.

So if weight increases by one pound, horsepower increases by 0.04 units (on average).
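To make the slope and intercept concrete, the sketch below wraps the fitted equation from the slide in a function. The function name and the example weight of 3000 pounds are assumptions for illustration. The range of weights used to fit the line is not given here, so which inputs count as extrapolation is not known.

```python
def predict_horsepower(weight):
    """Predicted horsepower from the slide's fitted line: -10.78 + 0.04 * weight."""
    return -10.78 + 0.04 * weight

# Prediction for a hypothetical 3000-pound vehicle.
hp_3000 = predict_horsepower(3000)

# Slope: one extra pound changes the prediction by about 0.04.
slope_check = predict_horsepower(3001) - predict_horsepower(3000)

# Intercept: the prediction at weight 0. A negative horsepower is meaningless,
# which shows why predictions far outside the observed data are not trustworthy.
hp_at_zero = predict_horsepower(0)

print(round(hp_3000, 2), round(slope_check, 2), hp_at_zero)
```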

Lurking Variable: A variable that is not among the explanatory or response variables in a study and yet may influence the interpretation of relationships among those variables.

Simpson’s Paradox: An association or comparison that holds for all of several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s Paradox. This can happen when a lurking variable is present. Please see Examples 9.9 and 9.10 in the text.
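The reversal is easiest to see with numbers. The sketch below uses hypothetical success counts for two treatments, A and B, split by case severity (the lurking variable). These counts are invented and are not Examples 9.9 or 9.10 from the text.

```python
# Hypothetical (successes, trials) counts, invented for illustration.
results = {
    "mild cases": {"A": (90, 100), "B": (800, 1000)},
    "severe cases": {"A": (500, 1000), "B": (40, 100)},
}

def rate(successes, trials):
    """Success rate as a proportion."""
    return successes / trials

# Success rates within each severity group.
within = {
    group: {treatment: rate(*cell) for treatment, cell in arms.items()}
    for group, arms in results.items()
}

# Success rates after combining the groups into one.
combined = {}
for treatment in ("A", "B"):
    successes = sum(results[group][treatment][0] for group in results)
    trials = sum(results[group][treatment][1] for group in results)
    combined[treatment] = rate(successes, trials)

print(within)
print(combined)
```

Treatment A has the higher success rate within each group, yet B looks better once the groups are combined, because A was mostly given to severe cases.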

- An outlier is an observation that lies outside the overall pattern of the other observations.
- An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation.
- Points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line.

Child 18 is an outlier in the x direction. Because of its extreme position on the age scale, this point has a strong influence on the position of the regression line.

r² is also affected by the influential observation. With Child 18, r² = 41%, but without Child 18, r² = 11%. The apparent strength of the association was largely due to a single influential observation.

The dashed line was calculated leaving out Child 18. The solid line is with Child 18.
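The same effect can be reproduced on a small scale. The sketch below fits a least-squares line with and without a single point that is far out in the x direction. The data are hypothetical, not the Child 18 data set, and `fit` is an illustrative helper.

```python
from statistics import mean

def fit(x, y):
    """Return (slope, intercept, r_squared) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data; the last point (15, 10) is an outlier in the x direction.
x = [1, 2, 3, 4, 5, 15]
y = [2, 1, 3, 2, 3, 10]

with_point = fit(x, y)
without_point = fit(x[:-1], y[:-1])

print("with:   ", [round(v, 3) for v in with_point])
print("without:", [round(v, 3) for v in without_point])
```

In this made-up example, dropping the one extreme point roughly halves the slope and sharply lowers r², which is what makes the point influential.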