Understanding Statistics: Importance and Everyday Applications

Section I. Statistics: What do they mean and why are they important?

What do stats mean? • To be an intelligent consumer of statistics, your first reflex must be to question the statistics that you encounter. A remark often attributed to the British Prime Minister Benjamin Disraeli runs, "There are three kinds of lies -- lies, damned lies, and statistics." • It is important to think about the numbers, their sources, and, above all, the procedures used to generate them.

Top 10 ways you use statistics every day • Weather forecasts • Emergency preparedness • Predicting disease • Medical studies • Genetics • Political campaigns • Insurance • Consumer goods • Quality testing • Stock market

But I’m never going to do research! • Good reasons to study statistics • to be able to conduct research effectively, • to be able to read and evaluate journal articles, • to further develop critical thinking and analytic skills, • to act as an informed consumer, • and to know when you need to hire outside statistical help. • Even Florence Nightingale did it!

Why nursing research • Increasing emphasis on evidence-based practice • Informs nurses’ decisions and actions • Empowers nurses to make clinical decisions that benefit their patients, whether individual or community • Friendly nursing research environment required for Magnet status • Increases recognition for nursing’s contribution to health care and policy

Variables • The characteristics we are measuring • Vary according to the population, patient, event, intervention • Data levels of measurement help us measure the variables • Nominal • Ordinal • Interval • Ratio

Data levels of measurement: Nominal • Sometimes called categorical or qualitative • Permissible statistics: mode, chi-squared • Lowest form of data, least sophisticated • Names • Characteristics/Descriptive (e.g. pain: throbbing, stabbing, dull) • Letters (e.g. M/F, Y/N) • Numbers may be assigned to designate categories but have no numerical meaning (e.g. M=1, F=2)

Data Levels of measurement: Ordinal • Permissible statistics: median, percentile • Can’t be added • Rank order • 1st, 2nd, 3rd • Rating • Pain rating 0-10 • Likert scale

Likert scales • Very dissatisfied, somewhat dissatisfied, neither satisfied nor dissatisfied, somewhat satisfied, very satisfied • No numerical data to quantify • Answers run on a continuum

Data Levels of measurement: Interval • Permissible statistics: mean, SD, correlation, regression, ANOVA • Rank ordering of objects. • Equivalent distance between each measurement • The Fahrenheit scale is a clear example of the interval scale of measurement • The zero point is arbitrary: it does not mean the absence of the quantity being measured, and values below zero are possible

Data Levels of measurement: Ratio • Highest level of measurement • Permissible statistics: same as interval plus more • The ratio scale of measurement is similar to the interval scale in that it also represents quantity and has equality of units. • Has an absolute zero (no values exist below zero). Very often, physical measures will represent ratio data (for example, height and weight). Example: measuring the length of a piece of wood in centimeters: you have quantity, equal units, and the measure can’t go below zero centimeters.

Examples of data levels of measurement

Question 1 • The colors of M&M candies would be which type of measurement? • Interval • Nominal • Ordinal • Ratio

Question 2 • Height, weight, lab test results, and age are examples of which type of data measurement? A. Ratio B. Nominal C. Interval D. Ordinal

Rankin Scale • The Rankin scale is used to assess functional status after stroke. Measurements are: • 0 = no symptoms at all • 1 = symptoms with no significant disability • 2 = slight disability; unable to carry out previous activities • 3 = moderate disability; needs some assistance, can walk alone • 4 = moderately severe disability; unable to walk or attend to bodily functions without assistance • 5 = severe disability; bedridden, incontinent, needs constant nursing care • 6 = dead

Question 3 • The Rankin scale is which type of measurement? A. Ratio B. Nominal C. Interval D. Ordinal

Section II. Descriptive Statistics and Intro to the Normal Distribution

Descriptive Statistics = Describing the Data • For any study, consider what parts would be useful to describe in numbers • Sample • Variables of interest • In any study where the data are numerical, data analysis should begin with descriptive statistics. • The appropriate choice of descriptive statistics depends on the level of data that was collected!

Types of Summary Statistics • Frequency distributions • Ungrouped • Grouped • Percentages • Measures of central tendency • Measures of dispersion

Ungrouped Frequency Distributions • The number of times something happened. • Used with categorical data (ordinal, nominal) • As simple as a tally or count http://www.gigawiz.com/histograms.html

Example • Using ungrouped frequency distributions to describe research variables • How often newborns fit each demographic criterion or a birth attendant reported a particular behavior (e.g. using CHG vs. not) From Rhee et al. (2008). Maternal and birth attendant hand washing and neonatal mortality in Southern Nepal. Archives of Pediatrics and Adolescent Medicine, 162(7), 603-608

Grouped Frequency Distributions • The number of times something happened. • Used to break continuous data (often things like age, weight, income) into groups. • You will always lose some information by doing this • There are conventions for groupings • Groups ideally have equal ranges, but open-ended groups may appear at the ends of the data spectrum • All data points must fit into a group • Not too many groups, not too few (you don’t want to lose patterns in the data)
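The grouping conventions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ages are invented, the helper name `grouped_frequencies` is hypothetical, and the sketch assumes whole-number values that are all at or above the chosen starting point.

```python
from collections import Counter

def grouped_frequencies(values, start, width):
    """Count how many whole-number values fall in each equal-width group."""
    # Group index 0 covers start .. start + width - 1, index 1 the next range, etc.
    counts = Counter((v - start) // width for v in values)
    last = max(counts)
    # Keep empty groups too, so gaps in the data stay visible.
    return {
        (start + i * width, start + (i + 1) * width - 1): counts.get(i, 0)
        for i in range(last + 1)
    }

# Hypothetical ages, invented for illustration only.
ages = [21, 24, 25, 29, 30, 33, 34, 38, 41, 47, 52, 58]
age_groups = grouped_frequencies(ages, start=20, width=10)

for (low, high), n in age_groups.items():
    print(f"{low}-{high}: {n}")
```

Every age lands in exactly one group, and the groups have equal ranges. The individual ages are no longer recoverable from the table, which is the information loss the slide warns about.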

Percentage Distributions • What percentage of the time something happened. • Useful when comparing to studies with different numbers of participants • Often presented with other frequency distributions in the following format: No.(%) • Often graphically represented using pie charts, bar charts
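A frequency table in the No. (%) format can be built with a simple tally. The sketch below is an illustration only: the responses are invented and the helper name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(responses):
    """Return {category: (count, percent)}, percent rounded to one decimal."""
    counts = Counter(responses)
    total = len(responses)
    return {
        category: (n, round(100 * n / total, 1))
        for category, n in counts.most_common()
    }

# Hypothetical yes/no answers, invented for illustration only.
responses = ["yes"] * 13 + ["no"] * 7
table = frequency_table(responses)

for category, (n, pct) in table.items():
    print(f"{category}: {n} ({pct}%)")
```

Because percentages are scaled to the sample size, a table like this can be compared with one from a study that enrolled a different number of participants.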

Example • Questionnaires given to parents of under-immunized children. • The tables indicate the number and percentage of participants selecting each response. Luthy, K., Beckstrand, R., & Peterson, N. (2009). Parental hesitation as a factor in delayed Childhood Immunization

Question • Which measure of central tendency is being used here to summarize participants’ ages: • A- Mode • B- Median • C- Mean • D- Standard deviation

Measure of Central Tendency • Used to describe a “typical” result or the middle of the dataset • Most common measures: • Median • Mode • Mean

Median • Literally the number in the middle of the dataset when there is an odd number of scores; with an even number of scores, the average of the two middle scores • 50% of scores above and 50% of scores below this point (known as the 50th percentile) • Most appropriately used for ordinal data • Because the focus is on the middle score, the median is less affected by outliers

Mode • The most common score(s) • May or may not be in the “middle” but is always a number in the dataset • Most appropriate for nominal data (ex. Most answers are “yes”).

Mean • = Sum of Scores / Total # of Scores • Also known as the average • Data must be continuous to generate a mean (interval and ratio level data only!) • Most affected by outliers • May be denoted in a number of ways (M, X̄)
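The three measures of central tendency, and their different sensitivity to outliers, can be compared with Python's standard `statistics` module. The scores below are invented for illustration.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 4, 5, 7, 11]
with_outlier = scores + [45]  # add one extreme score

summary = {
    "mean": statistics.mean(scores),      # sum of scores / number of scores
    "median": statistics.median(scores),  # middle score of the sorted data
    "mode": statistics.mode(scores),      # most common score
}
summary_outlier = {
    "mean": statistics.mean(with_outlier),
    "median": statistics.median(with_outlier),  # even count: average of two middle scores
    "mode": statistics.mode(with_outlier),
}

print(summary)
print(summary_outlier)
```

Adding the single extreme score doubles the mean, while the median moves only slightly and the mode does not change.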

Measures of Dispersion • How spread out are the data? Or how different are the scores from one another? • Range • Subtract the lowest number from the highest number in the set. Tells the total distance between the ends of the data set. • Variance (interval or ratio levels only!) • Computed as the average squared distance of the scores from the mean; provides data on dispersion or spread • Standard deviation (interval or ratio levels only!) • Relates dispersion of values to the mean • Is the square root of the variance • Usually reported as SD
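A short worked example shows how range, variance, and standard deviation relate. The scores are invented, and the slides do not say whether the population or the sample formula is intended, so the sketch computes both: the population versions divide by the number of scores, and the sample versions divide by one less.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)        # highest minus lowest
pop_variance = statistics.pvariance(scores)   # divides by n
pop_sd = statistics.pstdev(scores)            # square root of pop_variance
sample_variance = statistics.variance(scores) # divides by n - 1
sample_sd = statistics.stdev(scores)          # square root of sample_variance

print(f"range = {data_range}")
print(f"population variance = {pop_variance}, SD = {pop_sd}")
print(f"sample variance = {sample_variance:.2f}, SD = {sample_sd:.2f}")
```

In both versions the standard deviation is the square root of the variance, which returns the measure of spread to the same units as the original scores.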

Normal Distribution • In a true normal distribution, the mean, median, and mode are equal • No real distribution fits exactly • However, in many sets of data, the distribution is similar to the normal curve

Normal Distribution • Unique properties • All possible values fall under the curve • Probability of any score occurring is related to its location under the curve • Important SDs: • 68.3% of all values within 1 SD from mean • 95.4% within 2 SD from mean • 99.7% within 3 SD from mean
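The share of values within 1, 2, and 3 SDs of the mean can be computed from the normal curve itself rather than memorized. The sketch below uses `NormalDist` from Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    standard_normal = NormalDist()  # mean 0, SD 1
    # Area under the curve between -k and +k standard deviations.
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD of the mean: {share_within(k):.1%}")
```

The result does not depend on the particular mean or SD, because any normal distribution can be rescaled to the standard one.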

Section III. Statistical Theory: Hypotheses, Type I and Type II Errors, Level of Significance, Power

Probability Theory (p values) • Deductive • Used to explain: • Extent of a relationship • Probability of an event occurring • Probability that an event can be accurately predicted • Expressed as lowercase p, with values given as decimals between 0 and 1

Probability • If probability is 0.23, then p = 0.23. • There is a 23% probability that a particular event will occur. • In research, results are usually considered statistically significant when p < 0.05. • Example? Patients who have a cardiac arrest in the operating room have a 5% chance of death.

Decision Theory • Inductive reasoning • Assumes all groups in a study are the same • Up to the researcher to provide evidence (NEVER use the word PROVE!) that there really is a difference • To test the assumption of no difference, a cutoff point is selected before analysis.

Hypothesis • Statement of the expected outcome • Example? • Nursing students who study in the library have higher GPAs than nursing students who study in their dorm rooms/apartments.

Characteristics of a Hypothesis • Testable • Logical • Directly related to the research problem • Theoretically or Factually based • States relationship between variables • Stated so that it can be accepted or rejected

Research Hypothesis • Directional • explains and predicts the direction and existence of a specific relationship • relationship will be either positive or negative • more specific than the non-directional hypothesis • cause-and-effect hypothesis • Non-directional

Null hypothesis • Statistical statement that there is no difference between the groups under study

Cutoff Point • Level of significance, or alpha (α) • Point at which the results of statistical analysis are judged to indicate a statistically significant difference between groups • For most nursing studies, the level of significance is 0.05.

Cutoff Point (cont’d) • Absolute: NO “CLOSE ENOUGH” • If the value is only a fraction above the cutoff point, the groups are treated as being from the same population. • A result significant at 0.001 is not considered “more significant” than one that just meets the cutoff point.
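The all-or-nothing nature of the cutoff point can be written as a one-line decision rule. This is a minimal sketch: the function name `is_significant` is hypothetical, and it assumes the common convention that a result is significant when p is strictly below alpha.

```python
def is_significant(p_value, alpha=0.05):
    """Apply a preset cutoff: significant only if p is below alpha."""
    return p_value < alpha

# Hypothetical p values, invented for illustration only.
for p in (0.001, 0.049, 0.051):
    verdict = "significant" if is_significant(p) else "not significant"
    print(f"p = {p}: {verdict}")
```

The rule returns the same answer for p = 0.001 and p = 0.049, and it gives no credit to p = 0.051 for being close.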

Inference • A conclusion/judgment based on evidence • Judgments are made based on statistical results • Statistical inferences must be made cautiously and with great care

Generalization • A generalization is the application of information that has been acquired from a specific instance to a general situation. • Example?

Normal Curve • A theoretical frequency distribution of all possible values in a population. • Levels of significance and probability are based on the logic of the normal curve.

Normal Curve

One-Tailed Test (cont’d)

Two-Tailed Test

Type I and Type II Errors • A Type I error occurs when the researcher rejects the null hypothesis when it is true. The results indicate that there is a significant difference, when in reality there is not. • A Type II error occurs when the researcher fails to reject the null hypothesis when it is false. The results indicate there is no significant difference, when in reality there is a difference.

Reasons for Errors • Type I • Risk is greater at the .05 level than at the .01 level • Type II • Risk is greater at the .01 level than at the .05 level • Flaws in research methods • Multiple variables interact • Precision of instruments • Small samples
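The link between the level of significance and the Type I error rate can be checked by simulation. The sketch below rests on assumptions that are not in the slides: it draws samples from a population in which the null hypothesis is true, and it uses a two-tailed z-test with a known SD as the test. The function names, sample size, and number of trials are all hypothetical choices.

```python
import random
from statistics import NormalDist, mean

def two_tailed_p(sample, mu0, sigma):
    """p value of a z-test for H0: population mean equals mu0 (sigma known)."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_rates(trials=2000, n=30, seed=1):
    """Simulate studies where H0 is true and count how often it is rejected."""
    rng = random.Random(seed)
    p_values = [
        two_tailed_p([rng.gauss(0, 1) for _ in range(n)], mu0=0, sigma=1)
        for _ in range(trials)
    ]
    # Every rejection here is a Type I error, because H0 is true by construction.
    return {alpha: sum(p < alpha for p in p_values) / trials
            for alpha in (0.05, 0.01)}

rates = type_i_rates()
for alpha, rate in rates.items():
    print(f"alpha = {alpha}: rejected a true null in {rate:.1%} of studies")
```

The false-rejection rate tracks the chosen alpha, so a stricter cutoff produces fewer Type I errors. The price is a higher chance of missing a real difference, which is a Type II error.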
