BASIC STATISTICS For the HEALTH SCIENCES Fifth Edition

By Kuzma

CHAPTER 1 Statistics and How They Are Used

OUTLINE 1.1 The Meaning of Statistics Formally defines the term statistics and illustrates it by describing what a statistician does. 1.2 The Uses of Statistics Shows how descriptive statistics are used to describe data and how inferential statistics are used to reach conclusions from the analysis of data. 1.3 Why Study Statistics? Explains how the study of statistics is important for research, for writing publishable reports, for understanding scientific journals, and for discriminating between appropriate and inappropriate uses of statistics. 1.4 Sources of Data Discusses surveys and experiments, two main sources of data, and further classifies surveys as retrospective or prospective and as descriptive or analytical. 1.5 Clinical Trials Describes the use of a clinical trial to determine the value of a new drug or procedure. 1.6 Planning of Surveys Previews some hints on how to maximize the value of survey data. 1.7 How to Succeed in Statistics Offers some tips on getting the most out of class and other resources.

LEARNING OBJECTIVES • 1. Define statistics • 2. List several reasons for studying statistics • 3. Distinguish clearly between • a. descriptive and inferential statistics • b. surveys and experiments • c. retrospective and prospective studies • d. descriptive and analytical surveys • 4. Define bias • 5. Describe the purpose and components of a clinical trial

THE MEANING OF STATISTICS • A. What Does Statistics Mean? • 1. Refers to a recorded number • 2. Denotes characteristics calculated for a set of data, for example • a. Standard deviation • b. Correlation coefficient • 3. A body of techniques and procedures dealing with the collection, organization, analysis, interpretation, and presentation of information that can be stated numerically • B. What Do Statisticians Do? • 1. Work on challenging scientific tasks • 2. Are primarily concerned with developing and applying methods that can be used in collecting and analyzing data • 3. Their tasks are as follows • a. To guide the design of an experiment or survey • b. To analyze data • c. To present and interpret results
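To make the two example statistics concrete, here is a minimal Python sketch that computes a sample standard deviation and a Pearson correlation coefficient for a small set of paired measurements. The height and weight values are hypothetical, chosen only for illustration. `pearson_r` is a hypothetical helper written out by hand so the formula stays visible; `statistics.stdev` comes from Python's standard library.

```python
import math
import statistics

# Hypothetical paired measurements for five subjects (illustrative values only)
heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]
weights_kg = [55.0, 60.0, 66.0, 70.0, 79.0]

# Sample standard deviation (uses n - 1 in the denominator)
sd_height = statistics.stdev(heights_cm)

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about each variable's mean."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(heights_cm, weights_kg)
print(f"SD of height: {sd_height:.2f} cm")
print(f"Correlation of height and weight: r = {r:.3f}")
```

Each statistic reduces the whole data set to a single number that describes it: the standard deviation describes spread, and r describes how closely the two variables move together.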

THE USES OF STATISTICS • A. Descriptive Statistics – deals with the enumeration, organization, and graphical representation of data • Example: Census • B. Inferential Statistics – concerned with reaching conclusions from incomplete information, that is, generalizing from the specific (a sample) to the general (a population) • Example: Opinion Poll (Gallup Poll) • C. Statistical methods provide a logical basis for making decisions in a variety of areas when only incomplete information is available

WHY STUDY STATISTICS? • A. Essential for both understanding and conducting research • B. Used to analyze data • C. Can help to discriminate between fact and fiction • D. Helpful in knowing when, and for what purpose, a statistician should be consulted

SOURCES OF DATA • A. Surveys • B. Experiments • C. Retrospective Studies (case-control studies) • 1. Disadvantage – the data were usually collected for other purposes and may be incomplete • 2. Advantages • a. economical • b. answers usually obtained relatively quickly

SOURCES OF DATA • D. Prospective Studies (cohort studies) • 1. Advantages • a. collect relevant data • b. collect data under uniform conditions and for specific reasons • c. better opportunities to draw appropriate conclusions or make appropriate comparisons while limiting or controlling the amount of bias • 2. Disadvantage – typically not used to establish or “prove” a causal relationship because variables cannot be randomly assigned or manipulated

SOURCES OF DATA • E. Comparison of Ratios • F. Descriptive Surveys – provide estimates of a population's characteristics • G. Analytical Surveys – seek to determine the degree of association between a variable and a factor in the population

CLINICAL TRIALS • A. Definition: a carefully designed experiment that is generally considered to be the best method for evaluating the effectiveness of a new drug or treatment • B. Protocol • 1. Describes in detail the design of the proposed research • 2. Clearly defined hypothesis • 3. Detailed delineation of inclusion and exclusion criteria for study subjects • 4. Descriptions of the proposed interventions and the randomization process • 5. Detailed explanation of how bias may be minimized • 6. Description of the procedures to minimize errors in the collection and analysis of data

CLINICAL TRIALS • C. Two key features • 1. Blinding – the study subjects and/or the investigators do not know who is in the control group and who is in the experimental group; the purpose is to reduce bias • 2. Randomization – subjects are randomly assigned to either the experimental or the control group
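Randomization can be sketched in a few lines of Python. This minimal example assumes simple randomization into two equal-sized groups by shuffling the subject IDs and splitting the list in half; real trials often use more elaborate schemes (for example, block or stratified randomization), which this sketch does not attempt. `randomize` is a hypothetical helper, and the seed is fixed only so the allocation can be reproduced and audited.

```python
import random

def randomize(subject_ids, seed=None):
    """Simple randomization: shuffle the IDs, then assign the first half to the
    experimental group and the rest to control (control gets the extra subject
    when the count is odd)."""
    rng = random.Random(seed)
    ids = list(subject_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    assignment = {sid: "experimental" for sid in ids[:half]}
    assignment.update({sid: "control" for sid in ids[half:]})
    return assignment

# Hypothetical trial with 10 enrolled subjects, numbered 1 to 10
groups = randomize(range(1, 11), seed=2024)
for sid in sorted(groups):
    print(sid, groups[sid])
```

Because chance alone decides each allocation, neither the subjects nor the investigators can steer a particular kind of patient into one group.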

PLANNING SURVEYS • A. Formulate a clear plan of action before starting a survey • B. Outline the major steps to be followed

HOW TO SUCCEED IN STATISTICS • A. Scan the chapter outline • B. Read the conclusion and vocabulary list • C. Review the learning objectives before coming to class • D. After class, learn the relevant terms, concepts, principles, and formulas • E. After doing the assigned exercises, try to reformulate the objectives as questions and then answer them • F. Read essays dealing with the application of statistics to a variety of fields

CONCLUSION A statistician designs efficient and unbiased investigations that provide data that he or she then analyzes, interprets, and presents to others so that decisions can be made. To do this work the statistician uses techniques that are collectively called “statistics.”

CHAPTER 2 Populations and Samples

OUTLINE 2.1 Selecting Appropriate Samples Explains why the selection of an appropriate sample has an important bearing on the reliability of inferences about a population. 2.2 Why Sample? Gives a number of reasons sampling is often preferable to census taking. 2.3 How Samples Are Selected Explains how samples are selected. 2.4 How to Select a Random Sample Illustrates with a specific example the method of selecting a random sample using a computer statistical package. 2.5 Effectiveness of a Random Sample Demonstrates the credibility of the random sampling process. 2.6 Missing and Incomplete Data Explains the problem of missing or incomplete data and offers suggestions on how to minimize this problem.

LEARNING OBJECTIVES • 1. Distinguish between • a. populations and samples • b. parameters and statistics • c. various methods of sampling • 2. Explain why the method of sampling is important • 3. State why samples are used • 4. Define random sample • 5. Explain why it is important to use random sampling • 6. Select a random sample using a computer statistical program • 7. Suggest methods for dealing with missing data

SELECTING APPROPRIATE SAMPLES • A. Population – a set of persons (or objects) having a common observable characteristic • B. Sample – a subset of a population • C. The WAY a sample is selected is more important than the size of the sample • D. An appropriate sample should be representative of the population • E. A set of observations may be summarized by a descriptive measure; a measure computed for an entire population is called a parameter, and one computed from a sample is called a statistic

SELECTING APPROPRIATE SAMPLES • F. Random sample • 1. Every subject has an equal chance of being selected • 2. Technique most likely to yield a representative sample • 3. Obstacles • a. Response rate – how many will respond • b. Sampling bias – some segment of the population may be over- or underrepresented • c. May be too costly

WHY SAMPLE? • A. Random sampling – each subject in the population has an equal chance of being selected • 1. Avoids known and unknown biases on average • 2. Helps convince others that the trial was conducted properly • 3. Is the basis for the statistical theory that underlies hypothesis tests and confidence intervals • B. Convenience samples • 1. selected at will or from a particular program • 2. seldom representative of the underlying population • 3. used when random samples are virtually impossible to select

WHY SAMPLE? • C. Systematic sampling • 1. used when a sampling frame (a complete, nonoverlapping list of the persons or objects constituting the population) is available • 2. randomly select a first case, then proceed by selecting every nth case (e.g., every 10th) • D. Stratified sampling – used when we wish the sample to represent the various strata (subgroups) of the population proportionately or to increase the precision of the estimate • E. Cluster sampling • 1. select a simple random sample of clusters (e.g., a number of city blocks) and study the persons within the selected clusters • 2. more economical than random selection of persons throughout the city
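The three sampling designs on this slide can be sketched with Python's `random` module. Everything here is a hypothetical illustration: the sampling frame of 100 person IDs, the rule that even IDs are "female", and the city of 10 blocks are made up, and `systematic_sample`, `stratified_sample`, and `cluster_sample` are hypothetical helpers, not functions from any statistical package. The stratified version assumes proportionate allocation (the same sampling fraction in every stratum).

```python
import random

def systematic_sample(frame, k, rng):
    """Systematic sample: a random start among the first k cases, then every kth case."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Proportionate stratified sample: a simple random sample of the same
    fraction is drawn independently within each stratum."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Cluster sample: randomly select whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

def sex(pid):
    """Hypothetical stratum label: even IDs are 'female', odd IDs 'male'."""
    return "female" if pid % 2 == 0 else "male"

rng = random.Random(7)
frame = list(range(1, 101))  # hypothetical sampling frame of 100 persons
# Hypothetical city of 10 blocks with 10 residents each
blocks = {f"block{b:02d}": list(range(b * 10 + 1, b * 10 + 11)) for b in range(10)}

sys_s = systematic_sample(frame, k=10, rng=rng)
strat = stratified_sample(frame, sex, fraction=0.1, rng=rng)
clus = cluster_sample(blocks, n_clusters=3, rng=rng)

print("systematic:", sys_s)
print("stratified:", sorted(strat))
print("cluster:   ", clus)
```

The systematic sample needs only one random number (the start); the stratified sample guarantees that each sex contributes in proportion to its size; the cluster sample only requires a list of blocks, not of every resident, which is where its economy comes from.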

HOW TO SELECT A RANDOM SAMPLE • Computer statistical packages – most widely used
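Any general-purpose language can play the role of a statistical package here. As a minimal sketch, assuming a hypothetical sampling frame of 500 numbered persons, Python's standard `random.sample` draws a simple random sample without replacement, so every possible set of 25 IDs is equally likely to be chosen.

```python
import random

# Hypothetical sampling frame: one ID number per member of a population of 500
population_ids = list(range(1, 501))

# A fixed seed makes the draw reproducible, so others can check how it was done
rng = random.Random(12345)

# Sampling without replacement; every subset of 25 IDs is equally likely
sample_ids = sorted(rng.sample(population_ids, 25))
print(sample_ids)
```

Recording the seed serves the same purpose as documenting the random-number table and starting point in a manual selection.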

EFFECTIVENESS OF A RANDOM SAMPLE • A. Reliability is usually demonstrated by • 1. defining a fairly small population • 2. selecting from it all conceivable samples of a particular size • 3. computing the mean of each sample • 4. observing the variation among these sample means • 5. comparing these sample means (statistics) with the population mean (parameter), which neatly demonstrates the credibility of the sampling scheme
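The demonstration on this slide can be carried out exactly with a computer. In this minimal Python sketch the population of five values is hypothetical; `itertools.combinations` enumerates every conceivable sample of size 2, and the sample means are then compared with the population mean.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical (deliberately tiny) population of five measurements
population = [2, 4, 6, 8, 10]
mu = mean(population)  # population mean (a parameter)

# Every conceivable sample of size 2, and the mean (a statistic) of each
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

print("population mean:     ", mu)
print("number of samples:   ", len(samples))
print("sample means:        ", sample_means)
print("mean of sample means:", mean(sample_means))
print("SD of population:    ", round(pstdev(population), 3))
print("SD of sample means:  ", round(pstdev(sample_means), 3))
```

Individual sample means range from 3 to 9, but the average of all 10 of them equals the population mean of 6 exactly, and the sample means vary less than the individual observations do.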

MISSING AND INCOMPLETE DATA • A. Bias may be introduced because of possible differences between respondents and nonrespondents • B. Missing data limit the ability to accurately draw inferences about the population • C. Subjects may drop out of the study • D. Ways to deal with missing data • 1. Last observation carried forward (LOCF) – take the last observed value prior to dropout and treat it as the final value
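Last observation carried forward is easy to express in code. This is a minimal Python sketch, assuming each subject's repeated measurements are stored in visit order with `None` marking a missed visit; `locf` is a hypothetical helper and the blood pressure readings are made up.

```python
def locf(visits):
    """Last observation carried forward: replace each missing value (None) with
    the most recent observed value. Values before the first observation stay None."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

# Hypothetical systolic blood pressure readings (mm Hg) at five visits;
# None marks the visits missed after the subject dropped out
subject_a = [142, 138, 135, None, None]
print(locf(subject_a))  # [142, 138, 135, 135, 135]
```

Note that LOCF assumes the subject's value would have stayed unchanged after dropout, which is itself a possible source of bias.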

CONCLUSION Assessing all individuals in a population may be impossible, impractical, expensive, or inaccurate, so it is usually to our advantage to instead study a sample from the original population. To do this, we must clearly identify the population, be able to list it in a sampling frame, and utilize an appropriate sampling technique. Although several methods of selecting samples are possible, random sampling is usually the most desirable technique. It is easy to apply, limits bias, provides estimates of error and meets the assumptions necessary for many statistical tests. Missing or incomplete data can also introduce bias. Carry-forward analysis is one technique for accounting for missing or incomplete data. The effectiveness of random sampling can easily be demonstrated by comparing sample statistics with population parameters. The statistics obtained from a sample are used as estimates of the unknown parameters of the population.

CHAPTER 3 Organizing and Displaying Data

OUTLINE 3.1 CLASSIFYING AND ORGANIZING DATA Explains and illustrates numerical scales and distinguishes among qualitative data, discrete quantitative data, and continuous quantitative data. 3.2 FIGURES, TABLES, AND GRAPHS Gives a brief overview of each. 3.3 CREATING TABLES Gives instructions on how to organize data in the form of a frequency table. 3.4 GRAPHING DATA Discusses and illustrates various methods of graphing, with an emphasis on those that apply specifically to frequency distributions.

LEARNING OBJECTIVES • 1. Distinguish between • a. qualitative and quantitative variables • b. discrete and continuous variables • c. symmetrical, bimodal, and skewed distributions • d. positively and negatively skewed distributions • 2. Construct and interpret a frequency table that includes class intervals, class frequency, valid percent, and cumulative percent • 3. Indicate the appropriate types of graphs for displaying quantitative and qualitative data • 4. Distinguish which forms of data presentation are appropriate for different situations

CLASSIFYING AND ORGANIZING DATA • A. General Data Organization/Presentation Methods • 1. Tables • 2. Graphs • 3. Numerical Techniques • B. Common Scales Used to Measure Data • 1. Qualitative Data – variables that yield nominal- or ordinal-level data • a. Nominal – primarily used for grouping or categorizing data • b. Ordinal – categories that form an ordered series of relationships • 2. Quantitative Data – numerically measured variables • a. Interval – the number zero is an artificial zero, e.g., temperature in degrees Celsius or Fahrenheit • b. Ratio – the number zero is true or absolute, indicating total absence of the characteristic being measured, e.g., dollars in your wallet

CLASSIFYING AND ORGANIZING DATA • C. Discrete Quantitative Variables • 1. discontinuous variables • 2. can take only separate, distinct values, usually whole numbers (integers) such as counts • D. Continuous Quantitative Variables • 1. may take fractional values • 2. Examples • a. age • b. height • c. weight

CLASSIFYING AND ORGANIZING DATA • E. Spreadsheet Data Hints • 1. Verify the accuracy of manually input data • 2. For nominal or ordinal data, change the computer's default decimal setting to zero decimal places • 3. Subject ID numbers • a. usually go in the first column • b. set the number of decimal places to zero

FIGURES, TABLES, AND GRAPHS As defined by the Publication Manual of the American Psychological Association (APA), Fifth Edition

FIGURES, TABLES, AND GRAPHS • A. FIGURES • 1. any type of illustration other than a table • 2. examples • a. charts • b. graphs • c. photographs • d. drawings • B. GRAPH – one particular type of figure • C. TABLE – typically used to display quantitative data • D. Primary Purpose of Graphs & Tables – to display information visually in a manner that makes it easy for readers to comprehend

FREQUENCY TABLES • A. Frequency – the number of cases with a particular value (or within a particular class interval) • B. Percent • 1. Valid Percent – percentage out of 100, using only those subjects with data • 2. Cumulative Percent – percentage of cases in the current interval plus all previous intervals • C. Class Intervals – usually equal in width, which makes comparisons between intervals easier • D. Interval Width – the number of units spanned by a class interval, i.e., the distance between its lower and upper class boundaries • E. Range – difference between the highest and lowest values • F. Class Boundaries – true limits; points that demarcate the true upper limit of one class and the true lower limit of the next
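The columns of a frequency table can be computed directly. This minimal Python sketch assumes integer data, equal-width class intervals starting at a chosen lower limit, and `None` for a subject with no data; the exam scores and the `frequency_table` helper are hypothetical.

```python
from collections import Counter

def frequency_table(values, start, width):
    """Frequency table for integer data grouped into equal-width class intervals.

    Each row is (lower limit, upper limit, frequency, valid percent, cumulative percent).
    Missing values (None) are excluded before percentages are computed, which is
    what makes them "valid" percents. Assumes every value is >= start.
    """
    valid = [v for v in values if v is not None]
    n = len(valid)
    counts = Counter((v - start) // width for v in valid)  # class index of each value
    rows, cumulative = [], 0
    for i in range(max(counts) + 1):
        lower = start + i * width
        upper = lower + width - 1  # class limits; boundaries are lower - 0.5 and upper + 0.5
        freq = counts[i]           # Counter returns 0 for an empty class
        cumulative += freq
        rows.append((lower, upper, freq, 100 * freq / n, 100 * cumulative / n))
    return rows

# Hypothetical exam scores; None marks a subject with no recorded score
scores = [72, 85, 91, 68, 77, 85, None, 94, 88, 79, 63, 81]

table = frequency_table(scores, start=60, width=10)
print("Interval  Freq  Valid %  Cum %")
for lower, upper, freq, pct, cum in table:
    print(f"{lower}-{upper:<6}{freq:>4}  {pct:7.1f}  {cum:5.1f}")
```

The cumulative percent of the last interval is always 100, which is a quick check that no valid case was dropped.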

GRAPHING DATA • A. Must be self-explanatory • 1. descriptive title • 2. labeled axes • 3. indication of units of observation

GRAPHING DATA • B. Histograms • 1. pictorial representation of the frequency table • 2. Components • a. Abscissa • i. horizontal axis, which depicts the class boundaries (not the class limits) • b. Perpendicular Ordinate • i. vertical axis, which depicts the frequency (or relative frequency) of observations • ii. should begin at zero • c. Height of the vertical scale should be three-fourths the length of the horizontal scale

GRAPHING DATA • C. Frequency Polygons • 1. Construction • a. uses the same axes as the histogram • b. constructed by marking a point (at the same height as the histogram's bar) at the midpoint of the class interval • c. these points are then connected • 2. Superior to histograms for comparing two frequency distributions • 3. Shapes • a. Symmetrical Distribution – bell-shaped • b. Bimodal Distribution – two peaks • c. Rectangular Distribution – each class interval is equally represented

GRAPHING DATA • D. Cumulative Frequency Polygons • 1. Also called an ogive • 2. Horizontal scale – same as the histogram • 3. Vertical scale indicates cumulative or relative cumulative frequency • 4. Construction • a. place a point at the upper class boundary of each class interval • b. each point represents the cumulative relative frequency for that class • c. the points are then connected • 5. Percentiles – may be obtained from the ogive
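Reading a percentile off an ogive amounts to linear interpolation between its plotted points. The Python sketch below is a minimal illustration under two stated assumptions: the grouped data are hypothetical (four classes of width 10 with lowest class boundary 59.5 and frequencies 2, 3, 4, 2), and values are taken to be spread evenly within each class, which is what connecting the points with straight lines implies. `ogive_points` and `percentile_from_ogive` are hypothetical helpers.

```python
from bisect import bisect_left

def ogive_points(lower_boundary, width, freqs):
    """(upper class boundary, cumulative relative frequency) for each class,
    preceded by a starting point at zero on the lowest class boundary."""
    total = sum(freqs)
    points = [(lower_boundary, 0.0)]
    cumulative = 0
    for i, f in enumerate(freqs):
        cumulative += f
        points.append((lower_boundary + (i + 1) * width, cumulative / total))
    return points

def percentile_from_ogive(points, p):
    """Read the pth percentile (0 < p <= 100) off the ogive by linear
    interpolation between the two points that bracket p/100."""
    target = p / 100
    cum = [c for _, c in points]
    j = bisect_left(cum, target)  # first point whose cumulative value >= target
    (x0, c0), (x1, c1) = points[j - 1], points[j]
    return x0 + (target - c0) / (c1 - c0) * (x1 - x0)

points = ogive_points(59.5, 10, [2, 3, 4, 2])
print(points)
print("estimated median (50th percentile):", round(percentile_from_ogive(points, 50), 2))
```

This grouped-data estimate generally differs somewhat from the median of the raw values, because it uses only the class frequencies.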

GRAPHING DATA • E. Stem-and-Leaf Displays • 1. Technique for summarizing data that combines features of a frequency table and a histogram while retaining the individual values • 2. Stems – represent the class intervals (the leading digits) • 3. Leaves – the trailing digits of the individual values within each class interval
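A basic stem-and-leaf display for two-digit data can be produced in a few lines. This minimal Python sketch assumes non-negative integers, uses the tens digit as the stem and the units digit as the leaf, and lists empty stems so gaps in the data stay visible; `stem_and_leaf` is a hypothetical helper and the scores are made up.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Stem-and-leaf display for non-negative integers:
    stem = all digits except the last, leaf = the units digit."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # include empty stems
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return "\n".join(lines)

# Hypothetical exam scores
scores = [72, 85, 91, 68, 77, 85, 94, 88, 79, 63, 81]
display = stem_and_leaf(scores)
print(display)
```

Turned on its side, the display has the same outline as a histogram with class intervals 60–69, 70–79, and so on, yet every original score can still be read back from it.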

GRAPHING DATA • F. Bar Charts • 1. Particularly useful for displaying nominal or ordinal data • 2. Relative frequencies are shown by the heights of the bars • 3. Scale on the vertical axis should begin at zero • G. Pie Charts • 1. A common device for displaying data arranged in categories • 2. Useful for conveying data that consist of a small number of categories

GRAPHING DATA • H. Box-and-Whisker Plots • 1. Use median and quartile statistics to graphically examine data • 2. Median – the score that divides a ranked series into two equal halves • 3. If there is an even number of scores, the median is the mean (average) of the two middle scores • 4. Quartiles • a. locate the median in the ordered list of observations • - 1st quartile is the median of the observations below this median • - 3rd quartile is the median of the observations above the original median
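The quartile rule on this slide (take the median of the observations below the median, and of those above it) translates directly into code. This minimal Python sketch computes the five numbers a box-and-whisker plot is drawn from; the scores and both helpers are hypothetical. Statistical packages use several different quartile definitions, so their results may differ slightly from this rule.

```python
def median_of(sorted_vals):
    """Median of an already sorted list: the middle value, or the mean of the two middle values."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum. Q1 and Q3 are the medians of the
    observations below and above the median (the median itself is excluded
    when n is odd). Assumes at least two observations."""
    x = sorted(values)
    n = len(x)
    lower_half = x[: n // 2]        # observations below the median
    upper_half = x[(n + 1) // 2 :]  # observations above the median
    return x[0], median_of(lower_half), median_of(x), median_of(upper_half), x[-1]

scores = [72, 85, 91, 68, 77, 85, 94, 88, 79, 63, 81]  # hypothetical data
minimum, q1, med, q3, maximum = five_number_summary(scores)
print(f"min={minimum} Q1={q1} median={med} Q3={q3} max={maximum}")
# min=63 Q1=72 median=81 Q3=88 max=94
```

The box runs from Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.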

GRAPHING DATA • I. Computerized Graphing • 1. Easily generated by a variety of statistical programs • 2. Standard programs can be found at: • a. www.minitab.com • b. www.JMP.com • c. www.spss.com • 3. Microsoft Excel • 4. Freeware sites: • a. www.statsci.org/free.html • b. www.statistics.com

CONCLUSION The principles of tabulating and graphing data are essential if we are to understand and evaluate the flood of data with which we are bombarded. By proper use of these principles, statisticians can present data accurately and lucidly. It is also important to know which method of presentation to choose for each specific type of data. Tables are usually comprehensive, but they do not convey the information as quickly or as impressively as do graphs. Remember that graphs and tables must tell their own story and stand on their own.

CHAPTER 4 Summarizing Data

OUTLINE 4.1 MEASURES OF CENTRAL TENDENCY Describes the mean, median, and mode and discusses which average to use. 4.2 MEASURES OF VARIATION Describes several measures of variation or variability, including the standard deviation. 4.3 COEFFICIENT OF VARIATION Defines the coefficient of variation, useful in comparing levels of variation. 4.4 MEASURING AND INTERPRETING SKEWNESS Explains how to measure skewness and how to determine if a distribution is symmetrical or skewed. 4.5 MEANS AND STANDARD DEVIATIONS OF POPULATIONS Contrasts the equations for the parameters of a population with those for the statistics of a sample.

LEARNING OBJECTIVES • 1. Compute and distinguish between the uses of measures of central tendency: mean, median, and mode • 2. Compute and list some uses for measures of variation: range, variance, and standard deviation • 3. Compare sets of data by computing their coefficients of variation • 4. Be able to compute the mean and standard deviation for grouped and ungrouped data • 5. Determine if a data set is symmetrical or skewed • 6. Understand the distinction between the population mean and the sample mean

MEASURES OF CENTRAL TENDENCY • A. The Mean • 1. the arithmetic or simple mean is computed by summing all the observations in the sample and dividing the sum by the number of observations • 2. there are also harmonic and geometric means • 3. the arithmetic mean may also be considered the balance point, or fulcrum, of the distribution • 4. symbolically, the mean is represented by $\bar{X} = \frac{\sum X}{n}$, where $n$ is the number of observations • B. The Median • 1. the observation that divides the distribution into two equal parts • 2. considered the most typical observation in a distribution • 3. the value above which there are the same number of observations as there are below it

MEASURES OF CENTRAL TENDENCY • C. The Mode • 1. the observation that occurs most frequently • 2. if all values are different, there is no mode • D. Which Average Should You Use? • 1. the arithmetic mean is the most commonly used • 2. the median gives the typical observation for a distribution – good for skewed data such as income
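The three averages can be compared on a small skewed data set. In this minimal Python sketch the incomes are hypothetical; `statistics.mean` and `statistics.median` come from Python's standard library, while `modes` is a hypothetical helper written so that it returns an empty list when every value is different, matching the rule that such a data set has no mode (the library's `statistics.multimode` would instead return every value in that case).

```python
import statistics
from collections import Counter

def modes(values):
    """Most frequent value(s); an empty list when no value repeats (no mode)."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []
    return [v for v, c in counts.items() if c == top]

# Hypothetical annual incomes in thousands of dollars; one very large value skews the data
incomes = [28, 31, 35, 35, 40, 42, 250]

print("mean:  ", round(statistics.mean(incomes), 1))  # 65.9, pulled upward by the 250
print("median:", statistics.median(incomes))          # 35
print("mode:  ", modes(incomes))                      # [35]
```

Six of the seven incomes lie below the mean, while the median sits in the middle of the data; this is why the median is preferred for skewed distributions such as income.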

MEASURES OF VARIATION • A. Range • 1. the difference in value between the highest (maximum) and lowest (minimum) observation • Range = $X_{\max} - X_{\min}$ • 2. can be computed easily but is not very useful because it considers only the extremes
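The weakness of the range is easy to show: two data sets can share the same extremes while differing greatly in between. In this minimal Python sketch both data sets are hypothetical, `data_range` is a hypothetical helper, and `statistics.pstdev` (the population standard deviation, a measure covered later in this chapter) is included only to show that the spread really does differ.

```python
from statistics import pstdev

def data_range(values):
    """Range = maximum observation minus minimum observation."""
    return max(values) - min(values)

# Two hypothetical data sets with identical extremes but very different spread
a = [10, 50, 50, 50, 50, 50, 90]
b = [10, 20, 30, 50, 70, 80, 90]

print(data_range(a), data_range(b))  # 80 80
print(round(pstdev(a), 1), round(pstdev(b), 1))
```

Both ranges are 80 because only the minimum (10) and maximum (90) enter the calculation; the values between them are ignored.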
