In this lecture, we delve into various data transformation techniques, emphasizing the importance of properly displaying and interpreting raw data. We explore unit conversions, such as transforming height from feet to centimeters, and introduce transformations like square roots to spread out data. We discuss how these methods can either clarify insights or misrepresent information. With practical examples drawn from student datasets (height, shoe size, etc.), the lecture illustrates how different transformations impact the mean, median, and variance of the data, ensuring a deeper understanding of effective data analysis.
STAT131 Week 3 Lecture 1a Transformations Anne Porter
Lecture Outline • Review • Exploring and Displaying data • Making sense of Raw Data • Good Graphics
Measuring Units and transformations
Task 1: A sample of data collected from students

Height  ShoeSize  Sex  MobilePhone
5’6”    8         f    y
6’      11        m    y
6’6”    15        m    y
180cm   13        m    y
170     9.5       m    y

• What do you notice with this data?
• What went wrong?
Transforming Height • 6 feet = ? cm
Transforming Height • 6 feet = 6 × 12 inches = 72 inches = 72 × 2.54 cm = 182.88 cm
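The conversion above can be sketched in code so that every height in the Task 1 sample ends up in centimetres. This is only an illustration: the helper names `feet_inches_to_cm` and `height_to_cm` are hypothetical, and the rule that a bare number such as 170 is already in centimetres is an assumption, not something the data confirms.

```python
import re

CM_PER_INCH = 2.54
INCHES_PER_FOOT = 12


def feet_inches_to_cm(feet, inches=0.0):
    """Convert feet and inches to centimetres (feet -> inches -> cm)."""
    return (feet * INCHES_PER_FOOT + inches) * CM_PER_INCH


def height_to_cm(raw):
    """Parse heights written as 5'6", 6', 180cm or 170 (hypothetical helper).

    Assumption: a value with no unit is already in centimetres.
    """
    # Normalise typographic quotes to plain ones before matching.
    text = raw.strip().replace("’", "'").replace("”", '"')
    match = re.fullmatch(r"(\d+)'\s*(?:(\d+(?:\.\d+)?)\")?", text)
    if match:
        feet = int(match.group(1))
        inches = float(match.group(2) or 0)
        return feet_inches_to_cm(feet, inches)
    # Otherwise treat the value as centimetres, with or without a "cm" suffix.
    return float(text.lower().replace("cm", "").strip())


# The five heights from the Task 1 sample, as recorded by the students.
for recorded in ["5'6\"", "6'", "6'6\"", "180cm", "170"]:
    print(recorded, "->", round(height_to_cm(recorded), 2), "cm")
```

Once all heights share one unit, summaries such as the mean height become meaningful.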
Activity 2 Transformation: Square root • Given a set of points X, transform each x by taking the square root and mark it on Z • X: 1 4 9 16 25 36 • What does taking the square root do to this data?
Activity 2 Transformation: Square root • Given the points X: 1 4 9 16 25 36, taking the square root of each gives Z: 1 2 3 4 5 6 • What does taking the square root do to this data? It draws back a tail of high values
Activity 2 Transformation: Square root • Given a set of points Z, transform them by squaring them and mark them on X • Z: 1 2 3 4 5 6 • What can we do to spread out a set of data?
Activity 2 Transformation: Square root • Given the points Z: 1 2 3 4 5 6, squaring each gives X: 1 4 9 16 25 36 • What can we do to spread out a set of data? Square each value
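A small sketch of Activity 2, using the X values from the slides. The helper `gaps` is a hypothetical name introduced here; it lists the distance between neighbouring points, which is one simple way to see a tail being drawn in or spread out.

```python
import math

x = [1, 4, 9, 16, 25, 36]


def gaps(values):
    """Distances between neighbouring points (hypothetical helper)."""
    return [b - a for a, b in zip(values, values[1:])]


# Square root: the widening gaps in the upper tail become equal.
z = [math.sqrt(v) for v in x]
print(gaps(x))  # [3, 5, 7, 9, 11]
print(gaps(z))  # [1.0, 1.0, 1.0, 1.0, 1.0]

# Squaring reverses the effect and spreads the larger values out again.
squared = [v ** 2 for v in z]
print(gaps(squared))
```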
Power transformations • Table: Common Transformations (Griffiths, 1998, p. 40)
Lies, damned lies and statistics • Is it cheating or misrepresenting when we transform data?
Revisiting Transformations • Convert units • Why else do we transform data? • Spread out dense clusters • Contract values that are widely spaced • Reduce asymmetry and make numerical values more representative of the data
Revisiting Transformations • Transformations allow us to see data from a different perspective • It is simpler to explain our data in terms of height in cm • ...but there is no reason that we cannot measure on other scales, such as log(height) • We may see different things when data is measured differently
Specific transformations • Square root • Square • Logarithm • Add and subtract constants • Multiply and divide by constants • Z scores to standardise data
Ex: Choose your marker • If X and Y are the marks given by two markers, who do you want to mark your work? Why? • What is the mean mark for both?
Ex: Choose your marker • The X marks total 15, giving a mean of 3 • The Y marks total 25, giving a mean of 5
What might you do to fix the problem? • Add 2 marks to everyone marked by X and call the adjusted marks Z
Ex: Compare Z and Y • What is the median mark for Z? • What is the mean for Z? • For Y: total = 25, mean = 5, median = 5
Ex: Calculate mean • What is the mean for Z? • What is the median mark for both? • Who do you want to mark your work, Y or Z? Why? • Totals: Z = 25, Y = 25 • Means: Z = 5, Y = 5 • Medians: Z = 5, Y = 5
Ex: Calculate mean • Who do you want to mark your work, Y or Z? Why? • The spread of marks is different • Good students want the top mark of 11 and choose Y • Students low in confidence will take Z, as its lowest mark is 3 rather than 1 • Means: Z = 5, Y = 5 • Medians: Z = 5, Y = 5
Transforming by adding and subtracting constants
Changes occur in the • Mean • Median • Quartiles
No changes occur in the • Range • Interquartile range • Variance • Standard deviation
• Check this by doing the relevant calculations on Z and X
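The check suggested above can be sketched as follows. The marks in `x` are hypothetical, chosen only to have a mean of 3; they are not the marks from the lecture's table. The helper `summary` is also an assumption, and it uses Python's default quartile convention, so quartile values may differ slightly from those produced by other software.

```python
import statistics as st


def summary(data):
    """Location and spread measures for one data set (hypothetical helper)."""
    q1, _, q3 = st.quantiles(data, n=4)
    return {
        "mean": st.mean(data),
        "median": st.median(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "variance": st.variance(data),  # sample variance (divides by n - 1)
        "stdev": st.stdev(data),
    }


x = [1, 2, 3, 4, 5]         # hypothetical marks, not the lecture's data
z = [v + 2 for v in x]      # add 2 marks to everyone

sx, sz = summary(x), summary(z)
for name in sx:
    print(f"{name:9s} X = {sx[name]:<8.3f} Z = {sz[name]:<8.3f}")
```

The mean and median move up by 2, while the range, interquartile range, variance and standard deviation are the same for X and Z.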
How might we alter the spread of marks? • Divide or multiply the scores? • We’ll use an easier set of data!
Ex: Calculate standard deviation, variance and range • Z: 4 8 16 10 2
Ex: Calculate standard deviation, variance and range

Z             4    8   16   10    2    (mean = 8)
Z - mean     -4    0    8    2   -6
(Z - mean)²  16    0   64    4   36    (sum = 120)

Variance = 120/4 = 30
Range = 16 - 2 = 14
Ex: Let y = Z/2 and calculate the mean of y • y: 2 4 8 5 1, so the mean of y is 20/5 = 4 • The mean of Z is 40/5 = 8 • Dividing Z by 2 has halved the mean
Ex: Calculate standard deviation of y • y = Z/2: 2 4 8 5 1
Ex: Calculate standard deviation of y

y = Z/2       2    4    8    5    1    (mean = 4)
y - mean     -2    0    4    1   -3
(y - mean)²   4    0   16    1    9    (sum = 30)

Variance = 30/4 = 7.5
Range = 8 - 1 = 7
Ex: Compare Z and Y, where Y = Z/2 • The standard deviation of Y is half the standard deviation of Z • The variance of Y is ¼ of the variance of Z • The range of Y is half the range of Z
Dividing a data set by a positive constant (here 2) • Changed in the same way: • Mean is halved • Median is halved • Range is halved • Standard deviation is halved • But the variance is quartered
Multiplying or dividing by a positive constant • Changed in the same way: • Mean • Median • Quartiles • Range • Standard deviation • But for the variance the situation is different: it is multiplied or divided by the square of the constant
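The worked example with Z = 4, 8, 16, 10, 2 and y = Z/2 can be checked with a short sketch. It uses the sample variance (dividing by n - 1), which matches the slide values of 30 and 7.5; the variable names are illustrative only.

```python
import statistics as st

z = [4, 8, 16, 10, 2]       # data from the worked example
y = [v / 2 for v in z]      # divide every value by 2

measures = {
    "mean": st.mean,
    "median": st.median,
    "range": lambda d: max(d) - min(d),
    "stdev": st.stdev,        # sample standard deviation
    "variance": st.variance,  # sample variance (divides by n - 1)
}

for name, func in measures.items():
    ratio = func(y) / func(z)
    print(f"{name:9s} Z = {func(z):<8.3f} y = {func(y):<8.3f} ratio = {ratio:.2f}")
```

Every ratio is 0.50 except the variance, whose ratio is 0.25, because squaring the deviations also squares the constant.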
Z score transformation • Transforms data so that it has a mean of 0 and a standard deviation of 1 • When two sets of scores are each standardised in this manner, using their own mean and standard deviation, we can compare the standardised scores
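A minimal sketch of the Z score transformation, assuming the usual definition: subtract the mean from each value, then divide by the standard deviation. The function name `z_scores` is hypothetical, and the sample standard deviation (n - 1) is an assumption made to stay consistent with the variance calculations earlier in the lecture.

```python
import statistics as st


def z_scores(data):
    """Standardise data to mean 0 and standard deviation 1 (hypothetical helper)."""
    mean = st.mean(data)
    sd = st.stdev(data)  # assumption: sample standard deviation (n - 1)
    return [(v - mean) / sd for v in data]


z = [4, 8, 16, 10, 2]       # data from the earlier worked example
y = [v / 2 for v in z]      # the same data on a halved scale

standardised_z = z_scores(z)
standardised_y = z_scores(y)

print([round(v, 3) for v in standardised_z])
print([round(v, 3) for v in standardised_y])
print(round(st.mean(standardised_z), 3), round(st.stdev(standardised_z), 3))
```

Although Z and y are on different scales, their standardised scores agree, which is what makes standardised scores comparable across data sets.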
Introduction to Correlation • Video Unit 13, Correlation • Note the use of standardized scores • Examines correlation as a measure of similarity • We will use the Z score transformation a lot!