Action Research Measurement Scales and Descriptive Statistics

Action Research Measurement Scales and Descriptive Statistics. INFO 515, Glenn Booker. Measurement Needs: need a long set of measurements for one project, and/or many projects, to examine statistical trends. Could use measurements to test specific hypotheses.

Presentation Transcript


  1. Action Research Measurement Scales and Descriptive Statistics. INFO 515, Glenn Booker. Lecture #2

  2. Measurement Needs • Need a long set of measurements for one project, and/or many projects to examine statistical trends • Could use measurements to test specific hypotheses • Other realistic uses of measurement are to help make decisions and track progress • Need scales to make measurements!

  3. Measurement Scales • There are four types of measurement scales • Nominal • Ordinal • Interval • Ratio • Completely optional mnemonic: to remember the sequence, I think of ‘NOIR’ like in the expression ‘film noir’ (‘noir’ is French for ‘black’)

  4. Nominal Scale • A nominal (“name”) scale groups or classifies things into categories, which: • Must be jointly exhaustive (cover everything) • Must be mutually exclusive (one thing can’t be in two categories at once) • Can be listed in any sequence (none better or worse) • So a nominal variable puts things into buckets which have no inherent order to them

  5. Nominal Scale • Examples include • Gender (though some would dispute limitations of only male/female categories) • Dewey decimal system • The Library of Congress system • Academic majors • Makes of stuff (cars, computers, etc.) • Parts of a system

  6. Ordinal Scale • This measurement ranks things in order • Sequence is important, but the intervals between ranks are not defined numerically • Rank is relative, such as “greater than” or “less than” • E.g. letter grades, urgency of problems, class rank, inspection ratings • So now the buckets we’re using have some sense of order or direction

  7. Interval Scale • An interval scale measures quantitative differences, not just relative rank • Addition and subtraction are allowed • E.g. common temperature scales (°F or °C), a single date (Feb 15, 1999), maybe IQ scores • Let me know if you find any more examples • A zero point, if any, is arbitrary (90 °F is *not* six times hotter than 15 °F!)

  8. Ratio Scale • A ratio scale is an interval scale with a non-arbitrary zero point • Allows division and multiplication • The “best” type of scale to use, if possible • E.g. defect rates for software, test scores, absolute temperature (Kelvin or Rankine), the number or count of almost anything, size, speed, length, …

  9. Summary of Scales • Nominal: names different categories, not ordered, not ranked (Male, Female, Republican, Catholic, ...) • Ordinal: categories are ordered (Low, High; Sometimes, Never) • Interval: fixed intervals, no absolute zero (IQ, Temperature) • Ratio: fixed intervals with an absolute zero point (Age, Income, Years of Schooling, Hours/Week, Weight) • Age could be measured as ratio (years), ordinal (young, middle, old), or nominal (baby boomer, gen X) • Scale of measurement affects (may determine) the type of statistics that you can use to analyze the data

  10. Scale Hierarchy • Measurement scales are hierarchical: ratio (best) / interval / ordinal / nominal • Lower level scales can always be derived from data which uses a higher scale • E.g. defect rates (a ratio scale) could be converted to {High, Medium, Low} or {Acceptable, Not Acceptable} (ordinal scales)

  11. Reexamine Central Tendencies • If data are nominal, only the mode is meaningful • If data are ordinal, both median and mode may be used • If data are ratio or interval (called “scale” in SPSS), you may use mean, median, and mode
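
As a rough illustration of the rule above, the following Python sketch applies only the measures that suit each scale. All three data sets (car makes, letter grades, defect counts) are made up for the example and are not from the lecture.

```python
from statistics import mean, median, mode

# Nominal data (hypothetical car makes): only the mode is meaningful.
makes = ["Ford", "Honda", "Ford", "Toyota", "Ford"]
nominal_mode = mode(makes)

# Ordinal data (hypothetical letter grades): convert to ranks, then take
# the middle rank. The list has an odd length, so the middle is unambiguous.
grade_order = ["F", "D", "C", "B", "A"]  # worst to best
grades = ["B", "A", "C", "B", "D"]
ranks = sorted(grade_order.index(g) for g in grades)
ordinal_median = grade_order[ranks[len(ranks) // 2]]

# Interval or ratio data (hypothetical defect counts): all three apply.
defects = [3, 5, 5, 8, 9]
ratio_summary = (mean(defects), median(defects), mode(defects))

print(nominal_mode, ordinal_median, ratio_summary)
```

Note that the grades are never averaged: turning "A" into 4 and "B" into 3 and taking a mean would quietly treat an ordinal scale as an interval one.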

  12. Reexamine Variables • Discrete variables use counting units or specific categories • Example: makes of cars, grades, … • Use Nominal or Ordinal scales • Continuous = Integer or Real Measurements • Example: IQ Test scores, length of a table, your weight, etc. • Use Ratio or Interval scales

  13. Refine Research Types • Qualitative Research tends to use Nominal and/or Ordinal scale variables • Quantitative Research tends to use Interval and/or Ratio scale variables

  14. Frequency Distributions • Frequency distributions describe how many times each value occurs in a data set • They are useful for understanding the characteristics of a data set • Frequencies are the count of how many times each possible value appears for a variable (gender = male, or operating system = Windows 2000)

  15. Frequency Distributions • They are most useful when there is a fixed and relatively small number of options for that variable • They’re harder to use for variables which are numbers (either real or integer) unless there are only a few specific options allowed (e.g. test responses 1 to 5 for a multiple choice question)

  16. Generating Frequency Distributions • Select the command Analyze / Descriptive Statistics / Frequencies… • Select one or more “Variable(s):” • Note that the Frequency (count) and percent are included by default; other outputs may be selected under the “Statistics...” button • A bar chart can be generated as well using the “Charts…” button; see another way later

  17. Sample Frequency Output

  18. Analysis of Frequency Output • The first, unlabeled column has the values of data – here, it first lists all Valid values (there are no Invalid ones, or it would show those too) • The Frequency column is how many times that value appears in the data set • The Percent column is the percent of cases with that value; in the fourth row, the value 15 appears 116 times, which is 24.5% of the 474 total cases (116/474*100 = 24.5%)

  19. Analysis of Frequency Output • The Valid Percent column divides each Frequency by the total number of Valid cases (= Percent column if all cases valid) • The Cumulative Percent adds up the Valid Percent values going down the rows; so the first entry is the Valid Percent for first row, the second entry is from 11.2 + 40.1 = 51.3%, next is 51.3 + 1.3 = 52.5% (the apparent discrepancy is round-off error in the displayed values) and so on
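
The four columns described above can be reproduced by hand. The sketch below is not SPSS output; it uses a small made-up list of responses, with `None` standing in for a missing (invalid) case, to show how Percent, Valid Percent, and Cumulative Percent differ.

```python
from collections import Counter

# Hypothetical responses; None marks a missing (invalid) case.
responses = [1, 2, 2, 3, 3, 3, 3, None, 2, 1]

total = len(responses)                        # all cases, including missing
valid = [r for r in responses if r is not None]
counts = Counter(valid)

rows = []
cumulative = 0.0
for value in sorted(counts):
    freq = counts[value]
    percent = 100 * freq / total              # divides by all cases
    valid_percent = 100 * freq / len(valid)   # divides by valid cases only
    cumulative += valid_percent               # running total down the rows
    rows.append((value, freq, percent, valid_percent, cumulative))

print("Value Freq Percent Valid% Cum%")
for value, freq, pct, vpct, cum in rows:
    print(f"{value:>5} {freq:>4} {pct:7.1f} {vpct:6.1f} {cum:5.1f}")
```

Because the running total is accumulated from unrounded values and only rounded for display, the printed cumulative column can differ in the last digit from the sum of the printed Valid Percent values, which is the round-off effect noted on the slide.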

  20. Generating Frequency Graphs • Frequency is often shown using a bar graph • Bar graphs help make small amounts of data more visible • To generate a frequency graph alone • Click on the Charts menu and select “Bar…” • Leave the “Simple” graph selected, and leave “Summaries are for groups of cases” selected; click the “Define” button

  21. Generating Frequency Graphs • Let the Bars Represent remain “N of cases” • Click on variable “Educational Level (years)” and move it into the Category Axis field • Click “OK” • You should get the graph on the next slide. Notice that the text below the X axis is the Label for the Category Axis.

  22. Sample Frequency Output • Notice that the exact same graph can be generated from Frequencies, or just as a bar graph

  23. Frequency Distributions • A frequency distribution is a tabulation that indicates the number of times a score or group of scores occurs • Bar charts are best used to graph frequency of nominal & ordinal data • Histograms are best used to display the shape of interval & ratio data

  24. Frequency Distribution Example • SPSS for Windows, Student Version

  25. Basic Measures - Ratio • Used for two exclusive populations (every case fits into one OR the other) • Ratio = (# of testers) / (# of developers) • E.g. tester to developer ratio is 1:4

  26. Proportions and Fractions • Used for multiple (> 2) populations • Proportion = (Number of this population) / (Total number of all populations) • Sum of all proportions equals unity (one) • E.g. survey results • Proportions are based on integer units • Fractions are based on real numbered units

  27. Percentage • A proportion or fraction multiplied by 100 becomes a percentage • Only report percentages when N (total population measured) is above ~30 to 50; and always provide N for completeness • Why? Otherwise a percentage will imply more accuracy than the data supports • If 2 out of 3 people like something, it’s misleading to report that 66.667% favor it

  28. Percents • Percent = the percentage of cases having a particular value. • Raw percent = divide the frequency of the value by the total number of cases (including missing values) • Valid percent = calculated as above but excluding missing values

  29. Percent Change • The percent increase in a measurement is the new value, minus the old one, divided by the old value; negative means decrease: % increase = (new - old) / old • The percent change is the absolute value of the percent increase or decrease: % change = | % increase |

  30. Percent Increase • $\text{Percent increase} = \frac{\text{Later Value} - \text{Earlier Value}}{\text{Earlier Value}}$ • So if a collection goes from 50,000 volumes in 1965 to 150,000 in 1975, the percent increase is: $\frac{150{,}000 - 50{,}000}{50{,}000} = 2 = 200\%$ • Always divide by where you started • Carpenter and Vasu, (1978)
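
The two definitions above translate directly into code. This is a minimal sketch; the function names `percent_increase` and `percent_change` are chosen here for illustration, and the result is multiplied by 100 so it reads as a percentage.

```python
def percent_increase(old, new):
    """(new - old) / old, as a percentage; a negative result means a decrease."""
    return (new - old) / old * 100


def percent_change(old, new):
    """Absolute value of the percent increase or decrease."""
    return abs(percent_increase(old, new))


# Collection example from the slide: 50,000 volumes growing to 150,000.
growth = percent_increase(50_000, 150_000)

# Going the other way is NOT a 200% decrease, because the divisor is
# always the starting value.
shrink = percent_increase(150_000, 50_000)

print(growth, round(shrink, 1), round(percent_change(150_000, 50_000), 1))
```

The asymmetry between `growth` and `shrink` is the reason for the advice to always divide by where you started.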

  31. Percentiles • A percentile is the point in a distribution at or below which a given percentage of scores fall • The median is the 50th percentile • Think of SAT scores: your percentile for verbal, math, etc. tells you roughly what percent of people did worse than you
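
One simple way to make the definition concrete is a percentile rank: the percent of scores at or below a given score. The sketch below uses that "at or below" convention and a made-up list of test scores; statistical packages use several slightly different percentile conventions, so treat this as one option rather than the definitive method.

```python
def percentile_rank(scores, score):
    """Percent of scores that are at or below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical test scores for ten people.
scores = [400, 450, 500, 550, 600, 650, 700, 750, 800, 800]

rank_600 = percentile_rank(scores, 600)   # half the scores are at or below 600
rank_800 = percentile_rank(scores, 800)   # every score is at or below 800

print(rank_600, rank_800)
```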

  32. Rate • Rate conveys the change in a measurement, such as over time, dx/dt: Rate = (# observed events) / (# of opportunities) * constant • Rate requires exposure to the risk being measured • E.g. defects per KSLOC (1000 lines of code) = (# defects) / (# of lines of code) * 1000
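
A short sketch of the rate formula, assuming code size is supplied in raw source lines so that the constant of 1000 converts the result to defects per KSLOC. The defect and line counts are hypothetical.

```python
def rate(events, opportunities, constant=1):
    """(# observed events) / (# of opportunities) * constant."""
    # Multiply first so whole-number inputs stay exact for as long as possible.
    return events * constant / opportunities


# Hypothetical project: 45 defects found in 30,000 source lines of code.
defects = 45
sloc = 30_000

defects_per_ksloc = rate(defects, sloc, constant=1000)
print(defects_per_ksloc)
```

If the size were already expressed in KSLOC (30 here), the constant would be 1, since the scaling has already been applied.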

  33. Exponential Notation • You might see output of the form +2.78E-12 • The ‘E’ means ‘times ten to the power of’ • This is $+2.78 \times 10^{-12}$ (+2.78*10**-12) • A negative exponent, e.g. –12, makes it a very small number • $10^{-12}$ = 0.000000000001 • $10^{+12}$ = 1,000,000,000,000 • The leading number, here +2.78, controls whether it is a positive or negative number

  34. Exponential Notation • Positive: +5*10**+12 (a positive number >> 1) • +5*10**-12 (a positive number << 1) • 0 • Negative: -5*10**-12 (a negative number with magnitude << 1) • -5*10**+12 (a negative number with magnitude >> 1)
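
Python happens to read and write the same E notation, which makes it easy to check the points above. The values are the ones from the slides; the variable names are only for illustration.

```python
import math

# Parse the E notation from the slide and compare it with the written-out form.
value = float("+2.78E-12")
same = math.isclose(value, 2.78 * 10**-12)

# Format a number back into E notation with two decimal places.
formatted = f"{value:.2E}"

# The sign of the exponent controls the size; the sign of the leading
# number controls whether the value is positive or negative.
big_pos, small_pos = 5e12, 5e-12
small_neg, big_neg = -5e-12, -5e12
ordered = sorted([small_pos, big_neg, big_pos, small_neg])

print(same, formatted, ordered)
```

Sorting places the four values along the number line in the same order as the slide read from bottom to top: large negative, small negative, small positive, large positive.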

  35. Precision • Keep your final output to a consistent level of precision (significant digits) • Don’t report one value as “12” and another as “11.86257523454574123” • Pick a level of precision to match the accuracy of your inputs (or one digit more), and make sure everything is reported that way consistently (e.g. 12.0 and 11.9)

  36. Data Analysis • Raw data is collected, such as the dates a particular problem was reported and closed • Refined data is extracted from raw data, e.g. the time it took a problem to be resolved • Derived data is produced by analyzing refined data, such as the average time to resolve problems

  37. Descriptive Statistics • Descriptive statistics describes the key characteristics of one set of data (univariate) • Mean, median, mode, range (see also last week) • Standard deviation, variance • Skewness • Kurtosis • Coefficient of variation

  38. Mean • A.k.a.: Average Score • The mean is the arithmetic average of the scores in a distribution • Add all of the scores • Divide by the total number of scores • The mean is greatly influenced by extreme scores; they pull it off center

  39. Mean Calculation • Holdings in 7 different libraries:

        X
        7400
        6500
        6200
        5900
        5100
        4300
        3800

      Here, sum every data value: $\sum X = 39200$

      $\text{Mean} = \frac{\sum X}{N} = \frac{39200}{7} = 5600$

  40. Mean with a Frequency Distribution

        X (IQ)   F = Freq   FX = F*X
        140      2          280
        135      1          135
        132      2          264
        130      1          130
        128      1          128
        126      1          126
        125      4          500
        123      1          123
        120      4          480
        110      3          330
        101      1          101
        Total    21         2597

      $N = \sum F = 21$

      $\text{Mean} = \frac{\sum FX}{N} = \frac{2597}{21} = 123.67 \approx 124$ (round off)
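
The same calculation can be checked in a few lines of Python. The dictionary below holds the IQ table from the slide as value-to-frequency pairs; the variable names are chosen for this sketch.

```python
# IQ score -> frequency, copied from the slide's table.
iq_freq = {
    140: 2, 135: 1, 132: 2, 130: 1, 128: 1, 126: 1,
    125: 4, 123: 1, 120: 4, 110: 3, 101: 1,
}

n = sum(iq_freq.values())                         # N = sum of F
sum_fx = sum(x * f for x, f in iq_freq.items())   # sum of F*X
mean_iq = sum_fx / n

print(n, sum_fx, round(mean_iq, 2))
```

Weighting each score by its frequency gives the same answer as listing all 21 scores individually and averaging them.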

  41. Central Tendency Example • Staff salaries:

        $4100
         6000
         6000
         6000
         8000
         9000
        10000
        11000
        20000

      Mode = $6000

      Median = the $\frac{9 + 1}{2}$ = 5th value = $8000

      $\text{Mean} = \frac{\sum X}{N} = \frac{80100}{9}$ = $8900

      Carpenter and Vasu, (1978)
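
For comparison, here is the salary example run through Python's standard `statistics` module. The extra calculation without the $20,000 salary is an addition for illustration, not part of the original slide.

```python
from statistics import mean, median, mode

# Staff salaries from the slide, in dollars.
salaries = [4100, 6000, 6000, 6000, 8000, 9000, 10000, 11000, 20000]

salary_mode = mode(salaries)
salary_median = median(salaries)
salary_mean = mean(salaries)

# Dropping the one extreme salary moves the mean far more than the median.
without_top = salaries[:-1]
mean_without_top = mean(without_top)
median_without_top = median(without_top)

print(salary_mode, salary_median, salary_mean)
print(mean_without_top, median_without_top)
```

The mean sits above the median in the full data set because the single $20,000 value pulls it upward.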

  42. Handling Extreme Values • In cases where you have an extreme value (high or low) in a distribution, it is helpful to report both the median and the mean • Reporting both values gives some indication (through comparison) of a skewed distribution

  43. Measures of Variation • Measures which indicate the variation, or spread of scores in a distribution • Range (see last week) • Variance • Standard Deviation

  44. Standard Deviation, Variance • Standard deviation is the average amount the data differ from the mean (average): $SD = \sqrt{\sum (X_i - \bar{X})^2 / (N-1)}$, that is, $SD = \sqrt{\text{Variance}}$ • Variance is the standard deviation squared: $\text{Variance} = \sum (X_i - \bar{X})^2 / (N-1)$ [per ISO 3534-1, para 2.33 and 2.34]

  45. Standard Deviation • The standard deviation is the square root of the variance. It is expressed in the same units as the original data. • Since the variance is expressed in “squared units”, it doesn’t make much practical sense. For example, what are “squared books” or “squared man-hours?”

  46. Computing the Variance • $S^2 = \frac{\sum (X - \text{Mean})^2}{N}$ • 1. Subtract the mean from each score • 2. Square the result • 3. Sum the squares for all data points • 4. Divide by the N of cases

  47. Divide by N or N-1??? • You’ll see different formulas for variance and standard deviation – some divide by N, some by N-1 (e.g. slides 44 and 46); why? • If your data covers the entire population (you have all of the possible data to analyze), then divide by N • If your data covers a sample from the population, divide by N-1
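
The difference between the two divisors is easy to see in code. This sketch reuses the library holdings from the mean calculation slide and compares a hand calculation with Python's `statistics` module, where `pvariance` divides by N and `variance` divides by N-1.

```python
import math
from statistics import pvariance, variance

# Holdings in 7 libraries, from the mean calculation slide.
holdings = [7400, 6500, 6200, 5900, 5100, 4300, 3800]

n = len(holdings)
mean = sum(holdings) / n

# Steps 1-3: subtract the mean, square, and sum.
sum_of_squares = sum((x - mean) ** 2 for x in holdings)

# Step 4: divide by N (whole population) or N-1 (sample).
population_variance = sum_of_squares / n
sample_variance = sum_of_squares / (n - 1)

# The standard library makes the same distinction.
matches = (
    math.isclose(population_variance, pvariance(holdings))
    and math.isclose(sample_variance, variance(holdings))
)

print(round(math.sqrt(population_variance), 1),
      round(math.sqrt(sample_variance), 1),
      matches)
```

Dividing by N-1 always gives the larger value, and the gap shrinks as N grows, so the choice matters most for small samples like this one.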

  48. Standard Deviation for Freq Dist. • Standard Deviation of Bookmobile Distribution

        X      F     FX     X²     FX²
        17     2     34     289    578
        16     4     64     256    1024
        14     5     70     196    980
        10     2     20     100    200
        9      3     27     81     243
        6      1     6      36     36
        Total  17    221           3061

      $\sigma = \sqrt{\frac{\sum FX^2 - (\sum FX)^2 / N}{N}} = \sqrt{\frac{3061 - (221)^2 / 17}{17}} = \sqrt{\frac{3061 - 2873}{17}} \approx 3.3$

      Notice that $FX^2$ is $F \cdot (X^2)$, not $(F \cdot X)^2$
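
The computational formula on this slide can be verified as follows. The dictionary holds the bookmobile table as value-to-frequency pairs; the variable names are chosen for this sketch.

```python
import math

# Value -> frequency, copied from the bookmobile table.
bookmobile = {17: 2, 16: 4, 14: 5, 10: 2, 9: 3, 6: 1}

n = sum(bookmobile.values())                              # sum of F
sum_fx = sum(f * x for x, f in bookmobile.items())        # sum of F*X
sum_fx2 = sum(f * x ** 2 for x, f in bookmobile.items())  # sum of F*(X^2)

# Population standard deviation (divides by N, as on the slide).
sigma = math.sqrt((sum_fx2 - sum_fx ** 2 / n) / n)

print(n, sum_fx, sum_fx2, round(sigma, 1))
```

Writing `f * x ** 2` rather than `(f * x) ** 2` is exactly the distinction the slide warns about.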

  49. Std Dev Reflects Consistency

        Distance from         Frequency
        Target (meters)   Battery A   Battery B
         200              2           0
         150              4           1
         100              5           5
          50              7           10
           0              9           13
         -50              7           10
        -100              5           5
        -150              4           1
        -200              2           0

        Mean              0           0
        Standard D.       102.74      65.83

      Runyon and Haber (1984)
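
The two standard deviations in the table can be reproduced from the frequencies. The sketch below divides by N (the population form), which is the version that matches the slide's figures; the helper name `freq_mean_sd` is made up for this example.

```python
import math

distances = [200, 150, 100, 50, 0, -50, -100, -150, -200]  # meters from target
battery_a = [2, 4, 5, 7, 9, 7, 5, 4, 2]                    # frequencies
battery_b = [0, 1, 5, 10, 13, 10, 5, 1, 0]


def freq_mean_sd(values, freqs):
    """Mean and population standard deviation of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return mean, math.sqrt(var)


mean_a, sd_a = freq_mean_sd(distances, battery_a)
mean_b, sd_b = freq_mean_sd(distances, battery_b)

print(mean_a, round(sd_a, 2))
print(mean_b, round(sd_b, 2))
```

Both batteries are centered on the target, so the mean alone cannot tell them apart; the smaller standard deviation shows that Battery B is the more consistent of the two.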

  50. Standard Deviation vs. Std. Error • To be precise, the standard error is the standard deviation of a statistic used to estimate a population parameter [per ISO 3534-1, para 2.56 and 2.50] • So standard error pertains to sample data, while standard deviation should describe the entire population • We often use them interchangeably
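
To make the distinction concrete, the sketch below computes one common standard error, the standard error of the mean, using the usual estimate of sample standard deviation divided by the square root of N. That formula is not given in the slides and is included here as an assumption; the data are the library holdings from the mean calculation slide, treated as a sample.

```python
import math
from statistics import stdev

# Library holdings, treated here as a sample from a larger population.
holdings = [7400, 6500, 6200, 5900, 5100, 4300, 3800]

sample_sd = stdev(holdings)                       # spread of the data (N-1 form)
standard_error = sample_sd / math.sqrt(len(holdings))  # spread of the sample mean

print(round(sample_sd, 1), round(standard_error, 1))
```

The standard deviation describes how much individual libraries vary, while the standard error describes how much the sample mean itself would be expected to vary from sample to sample, which is why it is the smaller number.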
