Lecture Outline 2: Organizing Data. Please note: Not all slides/material from the lectures are included here. This presents only a detailed outline of the lecture.
Bibliography for This Lecture: Greer, B. & Mulhern, G. (2002). Making Sense of Data and Statistics in Psychology. Houndmills: Palgrave. Levin, J. & Fox, J. A. (2006). Elementary Statistics in Social Research: The Essentials (2nd Edition). Pearson Education, Inc. Walsh, A. & Ollenburger, J. C. (2001). Essential Statistics for the Social and Behavioral Sciences: A Conceptual Approach. Prentice Hall. Weiss, N. A. (2008). Elementary Statistics (7th Edition). Pearson Education, Inc.
From last week: a couple of important points • Reliability & validity in measurement: Reliability refers to the repeatability of results; validity refers to whether the variable actually measures what our research question intends it to measure. • Interval-ratio levels: For most statistical tests, we will treat the interval and ratio levels as the same. • Treatment of ordinal as interval: Sometimes an ordinal-level variable is treated as a quantitative variable, but this may call the validity of our measurement into question.
Example: • A commonly used example for reliability and validity is a scale. When you measure your weight, the scale should be accurate, i.e. valid (it should measure your weight correctly), and it should be reliable (each time you step on it, provided that your weight has not changed, it should show the same value). A scale can be reliable but not valid: if it systematically shows 2 kg less than your true weight, for example, it will show the same weight each time, but that weight will not be accurate.
Further Issues on Levels of Measurement (LoM): • For most statistical tests, the interval and ratio levels need not be distinguished; i.e., if a variable seems to be at least at the interval level, use the relevant test. • Sometimes ordinal-level variables are treated as interval in order to have more statistical options (Shively recommends this approach). Do this only when you can assume that the distances between the categories are fairly even. • Ex: How interested do you consider yourself in politics? • Not at all interested • Somewhat interested • Moderately interested • Fairly interested • Very interested • The same question in numeric form: On a scale of 1 to 5, how interested would you consider yourself in politics? 1 2 3 4 5
Organizing Data • A teacher asked her students how many novels they had read in the previous six months. The results were: 0, 1, 5, 4, 2, 1, 3, 2, 2, 7, 2, 5, 0, 1, 0, 1, 1, 2, 6, 0, 2, 3, 1, 2, 7, 1, 4, 2, 3, 1, 7, 0, 0, 2, 1, 1, 0, 6, 1, 7. What can we conclude here? What can we say about the novel-reading habits of these students?
Frequency Distribution-Definitions • Frequency (f): The number of times a value of the variable occurs. • Cumulative frequency: The frequency of a category added to the frequencies of all categories preceding it. Helps to identify the total number of cases at or below (or at or above) a certain category. • Proportion/relative frequency: Compares a part of the distribution with the whole (f / N). • Percentage: A proportion standardized to 100, i.e. the proportion multiplied by 100. • Ratio: A comparison of two quantities relative to one another.
Frequency Distribution Ex: Table 1: Number of novels read in the past 6 months
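The definitions above can be applied directly to the 40 novel counts listed earlier. The following is a minimal Python sketch of one way to tabulate them; the column layout and the variable names (`novels`, `rows`) are assumptions made for illustration and are not necessarily the layout of Table 1.

```python
from collections import Counter

# Number of novels read by each of the 40 students (data from the example above).
novels = [0, 1, 5, 4, 2, 1, 3, 2, 2, 7, 2, 5, 0, 1, 0, 1, 1, 2, 6, 0,
          2, 3, 1, 2, 7, 1, 4, 2, 3, 1, 7, 0, 0, 2, 1, 1, 0, 6, 1, 7]

N = len(novels)            # total number of cases
counts = Counter(novels)   # frequency (f) of each value

rows = []
cumulative = 0
for value in sorted(counts):
    f = counts[value]
    cumulative += f                    # cumulative frequency (cf)
    proportion = f / N                 # relative frequency
    rows.append((value, f, cumulative, proportion, proportion * 100))

# Print a simple frequency table.
print(f"{'Novels':>6} {'f':>3} {'cf':>3} {'prop':>6} {'%':>6}")
for value, f, cf, p, pct in rows:
    print(f"{value:>6} {f:>3} {cf:>3} {p:>6.3f} {pct:>6.1f}")
```

Read this way, the most common answer is one novel (11 students), and 27 of the 40 students (67.5%) read two novels or fewer.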
Frequency Distribution Ex: The following table gives the number of TV sets per household for 50 randomly selected households. How do we organize these findings?
Table 4: Frequency distribution of number of TVs per household
Grouped Frequencies: Terms Used in Grouping • Classes/intervals: Categories for grouping data. • Lower cutpoint/limit: The smallest value that could go in a class. • Upper cutpoint/limit: The highest value that could go in a class. • Midpoint: The middle of a class, found by averaging its cutpoints. • Width: The difference between the cutpoints of a class.
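The grouping terms can be illustrated with a short Python sketch. The scores and the class boundaries below are invented for illustration only, and the rule that a value equal to a shared cutpoint goes into the higher class is an assumption of this sketch, not something stated in the lecture.

```python
# Hypothetical exam scores, invented for illustration only.
scores = [52, 67, 71, 58, 88, 93, 75, 64, 79, 81, 69, 55]

# Assumed classes as (lower cutpoint, upper cutpoint) pairs.
# Assumed convention: lower <= value < upper.
classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]

grouped = []
for lower, upper in classes:
    f = sum(1 for s in scores if lower <= s < upper)  # class frequency
    grouped.append({
        "class": f"{lower}-{upper}",
        "midpoint": (lower + upper) / 2,  # average of the cutpoints
        "width": upper - lower,           # difference between the cutpoints
        "f": f,
    })

for row in grouped:
    print(row)
```

Every class here has the same width (10), which keeps the class frequencies directly comparable.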
Grouped Frequency Distributions of Interval Data • Table note a: The percentages as they appear add to only 99.99%. We write the sum as 100% instead, because we know that .01% was lost in rounding.
Summarizing Data • How can we summarize what is going on in our data set? • Basic descriptive statistics: Percentages, proportions, ratios, and rates. • Measures of central tendency: Mean, median, and mode. These give us an idea of the typical values. • Measures of dispersion: Range, variance, quartiles, and standard deviation. These show us how spread out the data are.
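The rounding loss described in the note is easy to reproduce. The sketch below uses three equally sized categories, invented purely for illustration (they are not the data behind the table), with each percentage rounded to two decimals.

```python
# Three equally sized categories, invented purely for illustration.
frequencies = [1, 1, 1]
N = sum(frequencies)

# Round each percentage to two decimals, as one would in a printed table.
rounded = [round(f / N * 100, 2) for f in frequencies]

# Sum of the rounded percentages (rounded again to hide floating-point noise).
total_of_rounded = round(sum(rounded), 2)

print(rounded, total_of_rounded)
```

Each category is reported as 33.33%, so the printed column adds to 99.99% even though the categories together make up the whole.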
Percentages and Proportions • Report relative size. • Compare the number of cases in a specific category to the number of cases in all categories. • Compare a part (specific category) to a whole (all categories). • The part is the numerator (f). • The whole is the denominator (N). • Proportion = f / N; percentage = (f / N) * 100.
Percentages and Proportions-Ex. • In a group of 97 male and 132 female social science majors, what percentage of them is male? • The whole is the number of people in the group. • The part is the number of males.
What % of social science majors is male? • of (whole) = all social science majors • 97 + 132 = 229 • is (part) = male social science majors • 97 • (97/229) * 100 = (.4236) * 100 = 42.36% • 42.36% of social science majors are male
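The part-over-whole calculation can be checked with a few lines of Python. The helper names `proportion` and `percentage` are hypothetical, chosen here for illustration.

```python
def proportion(f, N):
    """Part (f) divided by whole (N)."""
    return f / N


def percentage(f, N):
    """Proportion standardized to 100."""
    return proportion(f, N) * 100


males, females = 97, 132
whole = males + females  # all social science majors

print(round(proportion(males, whole), 4))  # proportion of majors who are male
print(round(percentage(males, whole), 2))  # the same share as a percentage
```

The same two functions give the female share when called with `females` as the part.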
Ratios • Compare the relative sizes of categories. • Compare parts to parts. • Ratio = f1 / f2 • f1 = number of cases in the first category • f2 = number of cases in the second category
Ratios-Ex. • In a class of 23 females and 19 males, the ratio of males to females is: • 19/23 = 0.83 • For every female, there are 0.83 males. • In the same class, the ratio of females to males is: • 23/19 = 1.21 • For every male, there are 1.21 females.
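A short Python sketch of the same part-to-part comparison follows; the function name `ratio` is a hypothetical helper used for illustration.

```python
def ratio(f1, f2):
    """Number of cases in the first category per case in the second."""
    return f1 / f2


males, females = 19, 23

males_per_female = ratio(males, females)
females_per_male = ratio(females, males)

print(round(males_per_female, 2))
print(round(females_per_male, 2))
```

The order of the two categories matters: swapping them gives the reciprocal, so the two ratios multiplied together equal 1.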
Rate • Expresses the number of actual occurrences of an event (births, deaths, homicides) relative to the number of possible occurrences, per unit of time.
Rates • The birth rate is the number of births in a year divided by the population size, multiplied by 1000. • If a town of 2300 had 17 births last year, the birth rate is: • (17/2300) * 1000 = (.00739) * 1000 = 7.39 • The town had 7.39 births for every 1000 residents.
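The birth-rate arithmetic can be sketched in Python as follows. The function name `rate` and its `base` parameter are hypothetical, introduced only to make the "per 1000" step explicit.

```python
def rate(events, population, base=1000):
    """Occurrences per `base` members of the population (per 1000 by default)."""
    return events / population * base


births = 17
town_population = 2300

birth_rate = rate(births, town_population)
print(round(birth_rate, 2))  # births per 1000 residents in the year
```

Passing a different `base` (for example 100000, a common choice for homicide rates) rescales the same calculation.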
Expressing change over time • Measures the relative increase or decrease in a variable over time.
Change over time • Percentage change = ((f2 - f1) / f1) * 100 • f1 is the first (or earlier) frequency. • f2 is the second (or later) frequency. • Change can also be calculated with percentages, rates, or other values in place of frequencies.
Percentage Change: Example • In 1990, a state had a murder rate of 7.3. • By 2000, the rate had increased to 10.7. • What was the relative change? • ((10.7 - 7.3) / 7.3) * 100 = (3.4 / 7.3) * 100 = 46.58% • The rate increased by 46.58%.
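The murder-rate example can be verified with a small Python sketch; `percentage_change` is a hypothetical helper name used for illustration.

```python
def percentage_change(f1, f2):
    """Relative change from the earlier value f1 to the later value f2, in percent."""
    return (f2 - f1) / f1 * 100


rate_1990 = 7.3
rate_2000 = 10.7

change = percentage_change(rate_1990, rate_2000)
print(round(change, 2))
```

The subtraction has to be carried out before the division; a positive result means an increase and a negative result a decrease.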