A linear transformation changes the original value x into a new variable x_new.
x_new is given by an equation of the form
    x_new = a + bx.
Example 1.21 on page 45 in IPS.
(i) A distance x measured in km can be expressed in miles as follows:
    x_new = 0.62x.
(ii) A temperature x measured in degrees Fahrenheit can be converted to degrees Celsius by
    x_new = (5/9)(x - 32) = -160/9 + (5/9)x.
Multiplying each observation in a data set by a number b multiplies the measures of center (mean, median, and trimmed means) by b and the measures of spread (range, standard deviation, and IQR) by |b|, the absolute value of b.
Adding the same number a to each observation in a data set adds a to measures of center, quartiles and percentiles but does not change the measures of spread.
Linear transformations do NOT change the overall shape of a distribution.
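The rules above can be checked numerically. The following is a minimal sketch, assuming the transformation has the form x_new = a + bx (a is the added constant, b the multiplier). The temperature values and the helper name linear_transform are hypothetical and serve only as an illustration.

```python
import statistics


def linear_transform(data, a, b):
    """Apply x_new = a + b*x to every observation (hypothetical helper)."""
    return [a + b * x for x in data]


# Hypothetical temperatures in degrees Fahrenheit (illustrative only).
fahrenheit = [32.0, 50.0, 68.0, 86.0, 104.0]

# Fahrenheit to Celsius: x_new = (5/9)(x - 32) = -160/9 + (5/9)x
a, b = -160 / 9, 5 / 9
celsius = linear_transform(fahrenheit, a, b)

mean_f, sd_f = statistics.mean(fahrenheit), statistics.stdev(fahrenheit)
mean_c, sd_c = statistics.mean(celsius), statistics.stdev(celsius)

# The mean is transformed like a single observation: a + b * mean.
print(mean_c, a + b * mean_f)
# The standard deviation is only multiplied by |b|; adding a has no effect.
print(sd_c, abs(b) * sd_f)
```

In each printed pair the two numbers should agree up to rounding error, which is what the rules for measures of center and spread predict.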
A sample of 20 employees of a company was taken and their salaries were recorded. Suppose each employee receives a $300 raise in salary for the next year.
State whether the following statements are true or false.
The IQR of the salaries will
increase by $300
be multiplied by $300
The mean of the salaries will
increase by $300
be multiplied by $300
A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it.
The median of a distribution described by a density curve is the point that divides the area under the curve in half.
A mode of a distribution described by a density curve is a peak point of the curve, the location where the curve is highest.
Quartiles of a distribution can be roughly located by dividing the area under the curve into quarters as accurately as possible by eye.
The normal distribution with mean μ and standard deviation σ is denoted by N(μ, σ).
In the normal distribution with mean μ and standard deviation σ,
Approx. 68% of the observations fall within σ of the mean μ.
Approx. 95% of the observations fall within 2σ of the mean μ.
Approx. 99.7% of the observations fall within 3σ of the mean μ.
The distribution of heights of women aged 18-24 is approximately N(64.5, 2.5), that is, normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches.
The 68-95-99.7 rule says that the middle 95% (approx.) of women are between 64.5 - 5 and 64.5 + 5 inches tall.
The other 5% have heights outside the range from 59.5 to 69.5 inches, and 2.5% of the women are taller than 69.5 inches.
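The 68-95-99.7 rule is an approximation, and it can be compared with the exact normal areas. The sketch below uses Python's standard-library statistics.NormalDist with the N(64.5, 2.5) heights distribution from the example. The helper name prob_within is hypothetical.

```python
from statistics import NormalDist

# Heights of women aged 18-24: N(64.5, 2.5), as in the example above.
heights = NormalDist(mu=64.5, sigma=2.5)


def prob_within(dist, k):
    """Proportion of observations within k standard deviations of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)


for k in (1, 2, 3):
    print(k, round(prob_within(heights, k), 4))

# Exact proportion taller than 69.5 inches (two standard deviations above the mean).
print(round(1 - heights.cdf(69.5), 4))
```

The exact areas are close to 68%, 95% and 99.7%. The exact upper tail beyond 69.5 inches is slightly below the 2.5% given by the rule.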
1) The middle 68% (approx.) of women are between ____to ___
2) ___% of the women are taller than 66.75.
3) ___% of the women are taller than 72.
If x is an observation from a distribution that has mean μ and standard deviation σ, the standardized value of x is given by
    z = (x - μ)/σ.
A standardized value is often called a z-score.
A z-score tells us how many standard deviations the original observation falls away from the mean of the distribution.
Standardizing is a linear transformation that transforms the data into the standard scale of z-scores. Therefore, standardizing does not change the shape of a distribution, but it changes the values of the mean and standard deviation.
The heights of women are approximately normal with mean μ = 64.5 inches and standard deviation σ = 2.5 inches.
The standardized height is
    z = (height - 64.5)/2.5.
The standardized value (z-score) of height 68 inches is
    z = (68 - 64.5)/2.5 = 1.4,
or 1.4 std. dev. above the mean.
A woman 60 inches tall has standardized height
    z = (60 - 64.5)/2.5 = -1.8,
or 1.8 std. dev. below the mean.
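The two standardized heights can be reproduced with a short function. This is a minimal sketch; the function name z_score is hypothetical, and the formula assumed is z = (x - μ)/σ.

```python
def z_score(x, mu, sigma):
    """Standardized value: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma


# Heights of women: mean 64.5 inches, standard deviation 2.5 inches.
z_68 = z_score(68, 64.5, 2.5)
z_60 = z_score(60, 64.5, 2.5)

print(round(z_68, 2))  # 1.4, i.e. 1.4 std. dev. above the mean
print(round(z_60, 2))  # -1.8, i.e. 1.8 std. dev. below the mean
```

A positive z-score means the observation is above the mean, and a negative z-score means it is below the mean.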
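If X has the N(μ, σ) distribution, then the standardized variable Z = (X - μ)/σ has the standard normal distribution N(0, 1).
The standard normal table gives the cumulative proportion P(Z ≤ z),
e.g. P(Z ≤ 1.4) = 0.9192.
The table shows the area to the left of z under the standard normal curve.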
What proportion of observations on a standard normal variable Z are
a) less than z = 1.4 ?
b) greater than z = 1.4 ?
c) greater than z = -1.96 ?
d) between z = 0.43 and z = 2.15 ?
P(a ≤ Z ≤ b) = P(Z ≤ b) – P(Z ≤ a)
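Areas that would normally be read from the standard normal table can also be computed directly. The sketch below uses statistics.NormalDist from the Python standard library in place of the printed table. The helper names prob_below, prob_above and prob_between are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist()


def prob_below(z):
    """Area to the left of z, the quantity tabulated as P(Z <= z)."""
    return Z.cdf(z)


def prob_above(z):
    """Area to the right of z: 1 - P(Z <= z)."""
    return 1 - Z.cdf(z)


def prob_between(a, b):
    """P(a <= Z <= b) = P(Z <= b) - P(Z <= a)."""
    return Z.cdf(b) - Z.cdf(a)


print(round(prob_below(1.4), 4))        # 0.9192, matching the table entry
print(round(prob_above(1.4), 4))        # 0.0808
print(round(prob_between(-1, 1), 4))    # middle area within one std. dev.
```

Table values are rounded to four decimal places, so small differences in the last digit are possible when comparing with software.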
If X has the N(μ, σ) distribution, then Z = (X - μ)/σ has a standard normal distribution and any calculations about X can be done using the following rules:
P(X ≤ k) = P(Z ≤ (k - μ)/σ)
The value k with P(X ≤ k) = p is
k = μ + σ z_p
where z_p is the value z from the standard normal table that has area (and cumulative proportion) p below it, i.e. z_p is the pth percentile of the standard normal distribution.
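The backward calculation k = μ + σ z_p can be sketched as follows. Here z_p is obtained from NormalDist().inv_cdf instead of the printed table, and the function name value_at_percentile is hypothetical. The heights distribution N(64.5, 2.5) from the earlier example is used for illustration, with p = 0.90 chosen arbitrarily.

```python
from statistics import NormalDist


def value_at_percentile(mu, sigma, p):
    """Return k such that a proportion p of an N(mu, sigma) population lies below k."""
    z_p = NormalDist().inv_cdf(p)  # standard normal value with area p below it
    return mu + sigma * z_p


# Height below which 90% of women fall, assuming heights are N(64.5, 2.5).
k = value_at_percentile(64.5, 2.5, 0.90)
print(round(k, 2))

# Check by going forward again: P(X <= k) should be 0.90.
print(round(NormalDist(64.5, 2.5).cdf(k), 4))
```

With the table, z_p would be rounded to two decimal places, so a hand calculation may differ slightly from the software value.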
1. The marks of STA221 students have the N(65, 15) distribution. Find the proportion of students having marks
(a) less than 50.
(b) greater than 80.
(c) between 50 and 80.
2. Example 1.30 on page 65 in IPS:
Scores on SAT verbal test follow approximately the N(505, 110) distribution. How high must a student score in order to place in the top 10% of all students taking the SAT?
3. The time it takes to complete a stat220 term test is normally distributed with mean 100 minutes and standard deviation 14 minutes. How much time should be allowed if we wish to ensure that at least 9 out of 10 students (on average) can complete it? (final exam Dec. 2001)
5. In a survey of patients of a rehabilitation hospital the mean length of stay in the hospital was 12 weeks with a std. dev. of 1 week. The distribution was approximately normal.
If the points on a normal quantile plot lie close to a straight line, the plot indicates that the data are normal.
Systematic deviations from a straight line indicate a nonnormal distribution.
Outliers appear as points that are far away from the overall pattern of the plot.
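The coordinates of a normal quantile plot can be computed by hand. This is a rough sketch with two assumptions: the plotting position (i - 0.5)/n is only one of several conventions, and statistical software such as MINITAB may use a different one. The sample values and the function name normal_quantile_points are hypothetical.

```python
from statistics import NormalDist


def normal_quantile_points(data):
    """Return (z, x) pairs: standard normal quantile against ordered observation."""
    n = len(data)
    std_normal = NormalDist()
    ordered = sorted(data)
    # Assumption: the i-th smallest value is placed at cumulative proportion (i - 0.5)/n.
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


# Hypothetical sample of heights in inches (illustrative only).
sample = [61.0, 62.5, 63.0, 64.0, 64.5, 65.0, 66.0, 67.5, 68.0]
points = normal_quantile_points(sample)

for z, x in points:
    print(f"{z:7.3f}  {x:5.1f}")
```

Plotting x against z gives the normal quantile plot. If the printed pairs rise roughly along a straight line, the data are consistent with a normal distribution.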
A scatterplot shows the relationship between two quantitative variables measured on the same individuals.
Each individual in the data appears as a point in the plot fixed by the values of both variables for that individual.
Always plot the explanatory variable, if there is one, on the horizontal axis (the x axis) of a scatterplot.
Examining and interpreting Scatterplots
Look for overall pattern and striking deviations from that pattern.
The overall pattern of a scatterplot can be described by the form, direction and strength of the relationship.
An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.
There is some evidence that drinking moderate amounts of wine helps prevent heart attacks. A data set contains information on yearly wine consumption (liters per person) and yearly deaths from heart disease (deaths per 100,000 people) in 19 developed nations. Answer the following questions.
What is the explanatory variable?
What is the response variable?
Examine the scatterplot below.
MINITAB command: Graph > Plot
To add a categorical variable to a scatterplot, use a different colour or symbol for each category.
The scatterplot below shows the relationship between the world record times for the 10,000 m run and the year for both men and women.
Family income and annual savings in thousands of dollars for a sample of eight families are given below.
savings  income        C3        C4       C5
      1      36  -1.42887  -1.45101  2.07331
      2      39  -1.02062  -1.03144  1.05271
      2      42  -0.61237  -0.61187  0.37469
      5      45  -0.20412  -0.19230  0.03925
      5      48   0.20412   0.22727  0.04639
      6      51   0.61237   0.64684  0.39611
      7      54   1.02062   1.06641  1.08840
      8      56   1.42887   1.34612  1.92343
Sum of C5 = 6.99429
r = 6.99429/7 = 0.999185
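The calculation above divides the sum of products of standardized values by n - 1. It can be sketched in code as follows. The income values are taken from the table. The savings values 1 through 8 are an assumption: they are the values whose standardized scores reproduce column C3, and they differ from the printed savings column in two rows. The function name correlation is hypothetical.

```python
import statistics


def correlation(x, y):
    """r = sum of products of standardized values, divided by n - 1."""
    n = len(x)
    mean_x, sd_x = statistics.mean(x), statistics.stdev(x)
    mean_y, sd_y = statistics.mean(y), statistics.stdev(y)
    products = [
        ((xi - mean_x) / sd_x) * ((yi - mean_y) / sd_y)
        for xi, yi in zip(x, y)
    ]
    return sum(products) / (n - 1)


income = [36, 39, 42, 45, 48, 51, 54, 56]   # from the table
savings = [1, 2, 3, 4, 5, 6, 7, 8]          # ASSUMPTION: values consistent with column C3

r = correlation(savings, income)
print(round(r, 6))
```

Under this assumption the result agrees with r = 0.999185 to the precision shown. Exchanging the two arguments gives the same r, because the products of standardized values do not depend on which variable is called x.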
MINITAB command: Stat > Basic Statistics > Correlation
The strength of the linear relationship increases as r moves away from 0. Values of r close to -1 or 1 indicate that the points lie close to a straight line.
The MINITAB analysis of math and verbal SAT scores is given below.
Variable    N    Mean  Median  TrMean   StDev  SE Mean
Verbal    200  595.65  586.00  595.57   73.21     5.18
Math      200  649.53  649.00  650.37   66.35     4.69
GPA       200  2.6300  2.6000  2.6439  0.5803   0.0410
Variable  Minimum  Maximum
Verbal     361.00   780.00
Math       441.00   800.00
GPA        0.3000   3.9000
Stem-and-leaf of Verbal N = 200
Leaf Unit = 10
1 3 6
4 4 034
19 4 566888888889999
52 5 000000122222222333333333444444444
(56) 5 55555555555556666666777777777777778888888888888889999999
92 6 00000000011111111222222333333333444444444444444
45 6 555555666666666778888888889999
15 7 0011112244
5 7 55568
Stem-and-leaf of Math N = 200
Leaf Unit = 10
1 4 4
3 4 79
12 5 001222234
38 5 55555666677777778888889999
(63) 6 000000000000001111111111112222222222222222333333333344444444444
99 6 555555555666666666666667777777777788888889999999
51 7 000000000011111111111112222222333334444
12 7 5566777789
2 8 00
g) Give a rough sketch of how a normal probability plot would look if the verbal scores were
h) For verbal scores, aside from running through the data and tallying, can you determine the approximate percentage of scores that fall between 523 and 668? If so, give the percentage.