The Nature of Statistics:

scooney

Presentation Transcript


  1. The Nature of Statistics: The art of learning about and understanding our world through data. • Initial Questions • Essentials • Terms & Definitions • Example Application • Sample Problem • Additional Topics

  2. Initial Questions: 69.2 76 80 • What do you know about these numbers? • What do they mean to you? • What is missing?

  3. 69.2 76 80 Revisited • 69.2 inches: average height of an adult male in the United States (female: 63.8 inches). • 76 percent of respondents to a Zogby International survey (August 2006) could identify the Three Stooges; 42 percent of respondents to the same survey could identify the three branches of the U.S. government. • 80 percent of the world’s population eats insects intentionally.

  4. Essentials: The Nature of Statistics (a.k.a. the bare minimum I should take along from this topic) • Providing a context for the data • Definitions and relationships as presented on the “Anatomy of the Basics: Statistical Terms and Relationships” sheet • Identification of variables and their characteristics • Careful review of data and their presentation • Why use percentages rather than numeric counts when making comparisons

  5. Terminology to Know (at least some of them…): • Categorical • Census • Continuous • Data • Descriptive Stats • Discrete • Enumeration • Experimental Study • Inferential Stats • Interval • Measurement Levels • Nominal • Observational Study • Ordinal • Parameter • Population • Precision • Qualitative • Quantitative • Ratio • Sample • Simulation Study • Statistic • Unit (Element) • Values • Variable

  6. Okay, so What is Statistics? (or is that What ARE Statistics?) Statistics is the study of how to collect, organize, analyze, interpret and report numerical information in order to make decisions. Statistics are the numeric data we use to better understand our world. They may take the form of frequencies, means, percentages, variances, etc.

  7. What is a Study? • 3 Basic Types of Studies: • Observational: observe and measure; can identify association, not causation. • Experimental: impose a treatment and observe the resulting characteristics; can help establish causation. • Simulation: use computers to simulate situations that are not practical to study in real time.

  8. Basic Terminology • DATA: Numbers with a context, i.e. numbers with meaning. • Examples: not 48.2, but 48.2 kg; not 5.23, but 5.23 inches. • VARIABLE: A characteristic or property of an individual population unit that varies from one person or thing to another. • Examples: age, pulse and blood pressure represent three variables associated with an individual’s health assessment. • Variables have values. Example: The variable hair color has the values brown, blonde, red, etc. • UNIT (Element): Any individual member of the defined population. • Examples: Each bottle of soda in a production run is a unit; each penny in a roll of pennies is a unit; each person enrolled in a class is a unit.

  9. Anatomy of the Basics: Statistical Terms and Relationships
     Statistics is the study of how to collect, organize, analyze, interpret and report numerical information.
     • Descriptive Statistics: methods for organizing and summarizing information. E.g. number of students in this class by major, baseball standings, housing sales by month.
     • Inferential Statistics: methods for drawing conclusions and measuring the reliability of those conclusions using sample results. E.g. political views of all 4-year college students.
     Population vs. Sample
     • Population: all individuals, items, or objects whose characteristics are being studied.
     • Census: data collected from ALL members of the population.
     • Parameter: numerical characteristic of a population.
     • Sample: a portion of the population selected for study.
     • Statistic: numerical characteristic of a sample.
     • Variable: a characteristic or property of an individual unit. Variables have values.
     • Qualitative: a variable that cannot be measured numerically. E.g. gender, eye color.
     • Quantitative: a variable that can be measured numerically. E.g. income, height, number of siblings one has.
     • Discrete: a variable whose values are countable. It can only assume certain values, with no intermediate values. E.g. number of auto accidents in Oneonta in 1998.
     • Continuous: a variable that can assume any numerical value over an interval or intervals. E.g. time.
     Scaling of Variables (Measurement Levels)
     No Arithmetic Operations: individual observations can only be categorized.
     • Nominal: grouping individual observations into qualitative categories or classes. E.g. grouping individuals by whether they are left-handed or right-handed.
     • Ordinal: individual observations are assigned a number or “ranking.” There is a sense of “more than,” but you cannot say “how much” more than. E.g. military ranks.
     Arithmetic Operations: individual observations have meaningful numeric values.
     • Interval: variables have no true zero point. Differences are meaningful, but you cannot say “how many times” more. E.g. temperature (°F or °C), IQ scores.
     • Ratio: variables have a true zero point. Can say “how many times” more. E.g. weight, height.

  10. Population: Basic Terminology • POPULATION: • Complete collection of all elements or units (usually people, objects, transactions, or events) that we are interested in studying. • In terms of data, a population is the collection of all outcomes, responses, measurements, or counts that are of interest. • CENSUS: A complete enumeration (or accounting) of the population, i.e. collecting data from every element (or unit) in the population. • PARAMETER: A numeric value associated with a population. Example: the average height of ALL students in this class, given that the class has been defined as the population.

  11. Sample: Basic Terminology • SAMPLE: A subset of a population from which information is collected. • Example: 25 cans of corn (sample) randomly obtained from a full day’s production (population). • STATISTIC: A numeric value associated with a sample. • Example: the average height of 10 individuals randomly selected from the class (defined population). • INFERENCE: An estimate, prediction, or some other generalization about a population based on information contained in a sample. • Example: Based upon a randomly selected sample of 25 flights at JFK International Airport (the sample; individual flights are units) taken from all flights on Dec. 24, 2009 (defined population), we can state with a degree of confidence that the mean delay for the population of the day’s flights was 35 minutes (a sample statistic, in context, being inferred to the population).
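
The parameter/statistic distinction on slides 10 and 11 can be illustrated with a minimal Python sketch. The class heights here are made up (generated with a fixed seed), and the class size of 40 is an assumption for illustration only; the point is just that the parameter uses every unit while the statistic uses a random subset.

```python
import random
import statistics

# Hypothetical population: heights (inches) of all 40 students in a class.
# These values are simulated for illustration, not taken from the slides.
rng = random.Random(0)
population = [round(rng.gauss(67, 3), 1) for _ in range(40)]

# Parameter: computed from every unit in the population (a census).
parameter_mean = statistics.mean(population)

# Statistic: computed from a random sample of 10 units.
sample = rng.sample(population, 10)
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.2f}")
print(f"sample mean (statistic):     {statistic_mean:.2f}")
```

The two means will usually differ a little; quantifying how far the statistic is likely to sit from the parameter is the job of inferential statistics.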

  12. In Summary • Parameter goes with Population; Statistic goes with Sample. • To include ALL units, you are looking at: POPULATION, CENSUS, PARAMETERS. • To work with a subset of all units, you are looking at: SAMPLE, STATISTICS, INFERENCES to a population.

  13. Example: Identifying Data Sets In a recent survey, 1708 adults in the United States were asked if they think global warming is a problem that requires immediate government action. Nine hundred thirty-nine of the adults said yes. Describe the data set. Identify: • The population • The sample • A variable being studied • Values of the variable. Source: Adapted from Pew Research Center; Larson/Farber 4th ed.

  14. Solution: Identifying Data Sets [Diagram: responses of adults in the U.S. (population), containing responses of adults in survey (sample).] • The population consists of the responses of all adults in the U.S. • The sample consists of the responses of the 1708 adults in the U.S. in the survey. • The sample is a subset of the responses of all adults in the U.S. • A variable being studied is … ? • The variable’s values are … ? • The data set consists of 939 yes’s and 769 no’s. Source: Larson/Farber 4th ed.
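
As a quick check of the counts on this slide, the Python sketch below recomputes the number of "no" responses and converts both counts to percentages. The variable names are illustrative; the two input numbers come from the slide.

```python
# Counts reported on the slide: 939 "yes" out of 1708 adults surveyed.
yes_count = 939
sample_size = 1708
no_count = sample_size - yes_count   # 769, matching the slide

# The sample proportion is a statistic: it describes the sample only.
p_yes = yes_count / sample_size
p_no = no_count / sample_size

print(f"yes: {yes_count} ({p_yes:.1%}), no: {no_count} ({p_no:.1%})")
```

Reporting the share (about 55% yes) rather than the raw count makes the result comparable with surveys of a different size.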

  15. Examples: Populations & Samples • Smoking: Identify the population and the sample. • In a survey, 250 college students at Union College were asked if they smoked cigarettes regularly. Thirty-five of the students said yes. • Hospital Care: Decide whether the numerical value describes a population parameter or a sample statistic. • A survey of 450 A.O. Fox Hospital patients found that 38% rated their care while in the hospital as “excellent.” • For both of the above studies: • What are the units of the population/sample? • Identify a variable being studied. • Identify values of the variable.

  16. Descriptive Statistics: • DESCRIPTIVE STATISTICS: Organize and summarize information using numerical and graphical methods. • Examples: • Summarizing the age of cars driven by students in a frequency table. • Graphing the ages of students. • Identifying the mean speed of cars driving in a 30 mph zone. • A descriptive statement describes some aspect of the data. (Select a statistical measure and put it into sentence format.) • Examples: • Thirty-eight percent of the orange trees suffered damage due to the cold temperatures. • The average weight for the 23 cars studied was 2,738 lb. • The mean number of days Otsego Lake was frozen per winter was 88.69 days.
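
A minimal Python sketch of the three descriptive tools named above (a frequency table, a mean, and a median). The vehicle ages are hypothetical values chosen for illustration, not data from the slides.

```python
import statistics
from collections import Counter

# Hypothetical vehicle ages in years (illustrative only).
ages = [2, 3, 3, 5, 7, 7, 7, 9, 12, 15]

freq = Counter(ages)                   # frequency table: age -> count
mean_age = statistics.mean(ages)       # sum of ages / number of cars
median_age = statistics.median(ages)   # middle value of the sorted ages

for age in sorted(freq):
    print(f"{age:>2} years: {freq[age]} car(s)")
print(f"The mean age of the {len(ages)} cars was {mean_age} years.")
print(f"The median age of the {len(ages)} cars was {median_age} years.")
```

The last two lines follow the slide's advice: pick a statistical measure and put it into sentence form.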

  17. Descriptive Statistics at Work: SUNY Oneonta Car Registrations Numeric tables, pictures (graphs & charts), and text are three methods used to present data. During the 2006 year there were 1,346 cars registered at SUNY Oneonta. Car registrations contain many variables, such as car type, car color, year of car, and license plate number. Noted below are ways descriptive statistics are used to convey information about the selected variables: a frequency table of Registrant Type (i.e. who registered the car); a graphic presentation of Vehicle Age; and text (a written descriptive statement) presenting the mean Vehicle Age of the registered cars. [Frequency table and histogram not reproduced in the transcript.] Mean & Median: The mean age of cars driven by students was 7.45 years (vs. 6.19 years for employees). The median age of registered vehicles for students was 7.0 years (5.0 years for employees).

  18. Inferential Statistics: • INFERENTIAL STATISTICS uses sample data to make estimates, decisions, predictions, or other generalizations about the population. • The aim of inferential statistics is to make an inference about a population, based on a sample (as opposed to a population census), AND to provide a measure of precision for the method used to make the inference. • An inferential statement uses data from a sample and applies it to a population.

  19. Examples of Inferential Statistics: • A Gallup Poll found that 57% of dating teens had been out with somebody of another race or ethnic group (+/- 4.5%; 95% CI). • Interpretation: We are 95% confident that between 52.5% and 61.5% (57% +/- 4.5%) of dating teens have been out with someone of a different race/ethnicity. • A Gallup Poll found that 40% of Americans would quit their job if they won the lottery (+/- 4%; 95% CI). • Interpretation: We are 95% confident that the true population proportion of Americans who would quit their job if they were to win a lottery lies between 36% and 44%.
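
The interval arithmetic behind both interpretations is just the estimate plus and minus the margin of error. A minimal Python sketch, with `interval` as a hypothetical helper name:

```python
def interval(estimate_pct, margin_pct):
    """Return (lower, upper) bounds of estimate +/- margin, in percentage points."""
    return estimate_pct - margin_pct, estimate_pct + margin_pct

# Values taken from the two poll examples on the slide.
dating_low, dating_high = interval(57, 4.5)
lottery_low, lottery_high = interval(40, 4)

print(f"dating teens: {dating_low}% to {dating_high}%")
print(f"lottery:      {lottery_low}% to {lottery_high}%")
```

This only reproduces the bounds; how the margin of error itself is obtained depends on the sample size and method, which the slide does not give.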

  20. Example: Descriptive and Inferential Statistics A large sample of men, aged 48, was studied for 18 years. For unmarried men, approximately 70% were alive at age 65. For married men, 90% were alive at age 65. Decide which part of the study represents the descriptive branch of statistics. What conclusions might be drawn from the study using inferential statistics? Source: (The Journal of Family Issues) Larson/Farber 4th ed.

  21. Solution: Descriptive and Inferential Statistics Descriptive statistics involves statements such as “For unmarried men, approximately 70% were alive at age 65” and “For married men, 90% were alive at 65.” A possible inference drawn from the study is that being married is associated with a longer life for men. Source: Larson/Farber 4th ed.

  22. Two Types of Data • Characteristics of Data: Before conducting any data analysis, the characteristics of the variable under study must be identified. This will result in utilizing appropriate tables, graphs and statistical analysis. • Qualitative Data: Variables can be separated into different categories (values) that are distinguished by some nonnumeric characteristic. Qualitative data are also referred to as categorical or attribute data. • Examples include gender, eye color, and car brands. • Note that the values of this type of variable are differentiated by words rather than numeric values. Example: Eye Color values include blue, brown, hazel, etc.

  23. Quantitative Data are “number-based” and represent counts or measurements. This type of data may be subdivided into two categories... N.B.: Qualitative data cannot be classified as discrete or continuous. • Discrete Data result when the number of possible values is either a finite or a countably infinite number. • Examples: Siblings, Cars, and Coins in a jar (think of whole number counts here, even if you cannot count them all). • Continuous Data result from infinitely many possible values corresponding to some continuous scale that covers a range of values without gaps, interruptions, or jumps. Continuous data can assume any value, including fractional parts. • Examples: Height, Weight, Time

  24. Example: Classifying Data by Type The base prices of several vehicles are shown in the table. Which data are qualitative data and which are quantitative data? (Source: Ford Motor Company) Source: Larson/Farber 4th ed.

  25. Solution: Classifying Data by Type • Qualitative Data (names of vehicle models are nonnumerical entries) • Quantitative Data (base prices of vehicle models are numerical entries) Source: Larson/Farber 4th ed.

  26. 4 Levels of Measurement The level of measurement determines which statistical calculations are meaningful. The four levels of measurement, lowest to highest, are: nominal, ordinal, interval, and ratio.

  27. Levels of Measurement (cont.) • Nominal: characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme. Qualitative data. • Examples: Gender, Yes/No, Political Party affiliation, names of students. • Ordinal: characterized by data that can be arranged in some order, but the differences between data values either cannot be determined or are meaningless. These variables may be either qualitative (categorical) data or quantitative (numerical) data. • Examples: Military Rank, Position in a race, Attitude scales.

  28. Levels of Measurement (cont.) • Interval: like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, there is no natural zero starting point. Quantitative data. • Examples: Temperature (°F or °C); longitude; Calendar Years. • Ratio: the interval level modified to include the natural zero starting point. At this level, differences and ratios are both meaningful. Quantitative data. • Examples: Height, Weight, Time, Age.
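
One way to see why ratios are not meaningful at the interval level: the "ratio" of two temperatures changes when the unit changes, while the ratio of two weights does not. The Python sketch below uses made-up values (10 °C and 20 °C, 2 kg and 4 kg) and an approximate kg-to-lb factor purely for illustration.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

KG_TO_LB = 2.20462  # approximate conversion factor

# Interval level: temperature has no natural zero, so the ratio depends on the unit.
cool_c, warm_c = 10.0, 20.0
ratio_in_c = warm_c / cool_c                   # 2.0
ratio_in_f = c_to_f(warm_c) / c_to_f(cool_c)   # 68 / 50 = 1.36

# Ratio level: weight has a true zero, so the ratio survives a change of unit.
light_kg, heavy_kg = 2.0, 4.0
ratio_in_kg = heavy_kg / light_kg
ratio_in_lb = (heavy_kg * KG_TO_LB) / (light_kg * KG_TO_LB)

print(f"temperature ratio: {ratio_in_c:.2f} (°C) vs {ratio_in_f:.2f} (°F)")
print(f"weight ratio:      {ratio_in_kg:.2f} (kg) vs {ratio_in_lb:.2f} (lb)")
```

So "20 °C is twice as hot as 10 °C" is not a meaningful statement, while "4 kg is twice as heavy as 2 kg" is.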

  29. Summary of Levels of Measurement

      Level of      Put data in   Arrange data   Subtract      Determine if one data value
      measurement   categories    in order       data values   is a multiple of another
      Nominal       Yes           No             No            No
      Ordinal       Yes           Yes            No            No
      Interval      Yes           Yes            Yes           No
      Ratio         Yes           Yes            Yes           Yes
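
The summary table can also be written as a small lookup, which is handy when checking whether a planned calculation suits a variable. This is a sketch only; the operation labels ("categorize", "order", "subtract", "ratio") and the helper `is_meaningful` are names invented here to stand for the four table columns.

```python
# Operations that are meaningful at each level, transcribed from the summary table.
LEVELS = {
    "nominal":  {"categorize"},
    "ordinal":  {"categorize", "order"},
    "interval": {"categorize", "order", "subtract"},
    "ratio":    {"categorize", "order", "subtract", "ratio"},
}

def is_meaningful(level, operation):
    """Return True if the operation is meaningful at the given measurement level."""
    return operation in LEVELS[level]

print(is_meaningful("ordinal", "subtract"))   # military ranks: differences are not meaningful
print(is_meaningful("interval", "subtract"))  # calendar years: differences are meaningful
print(is_meaningful("interval", "ratio"))     # calendar years: ratios are not meaningful
```

Each level permits everything the level below it permits, plus one more operation, which is why the levels are described as running from lowest to highest.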

  30. Example: Classifying Data by Level Two data sets are shown. Which data set consists of data at the nominal level? Which data set consists of data at the ordinal level? (Source: Nielsen Media Research) Source: Larson/Farber 4th ed.

  31. Solution: Classifying Data by Level Nominal level (lists the call letters of each network affiliate. Call letters are names of network affiliates.) Ordinal level (lists the rank of five TV programs. Data can be ordered. Difference between ranks is not meaningful.) Source: Larson/Farber 4th ed.

  32. Example: Classifying Data by Level Two data sets are shown. Which data set consists of data at the interval level? Which data set consists of data at the ratio level? (Source: Major League Baseball) Source: Larson/Farber 4th ed.

  33. Solution: Classifying Data by Level Ratio level (Can find differences and write ratios.) Interval level (Quantitative data. Can find a difference between two dates, but a ratio does not make sense.) Source: Larson/Farber 4th ed.

  34. Anatomy of the Basics: Statistical Terms and Relationships (repeat of slide 9).

  35. Example Application Study: Climate Change in Otsego County, NY, USA (1850-2010) • Variable of interest: time period Otsego Lake was frozen • Values measured in days • Sample or population data? • Descriptive or inferential? • Are the data qualitative or quantitative? • Measurement level? • Possible analysis approaches: descriptive & inferential

  36. Data: One variable (here unidentified, i.e. no context), multiple values [Figure panels: “Raw” data (N=160); “Organized” raw data (N=160); labels: Unit; 73 “different” numbers]

  37. Time Period Otsego Lake was Frozen (days) [Figure panels: Grouped Data; Raw Data]

  38. Time Period Otsego Lake was Frozen (days)

  39. Data: Two Variables: year and days; multiple values

  40. Time Period Otsego Lake was Frozen: Mean Days/Decade

  41. Time Period Otsego Lake was Frozen: Mean Days/Decade

  42. So is the Greenhouse Effect at work here? To be studied through further statistical analysis, such as the use of ANOVA…

  43. Sample Problem In an experiment to determine the effect of a certain drug on serum cholesterol level (measured in mg/100 ml) in 30-year-old males, the following data [listed to right] were recorded for the drug-treated group. [Duncan, Knapp & Miller p. 17] Determine the following: 1) Purpose of the study 2) Type of study 3) Sample or population 4) Variable being collected [values are…] 5) Qualitative or quantitative 6) Measurement level 7) Descriptive or inferential

  44. Additional Topics: Misuse of Statistics
     • Misleading Graphs: visual distortions of data. Beware of a truncated y-axis in graphs.
     • Pictographs: the “crescive cow.” All figures need to be drawn to the same dimensions.
     • Pollster Pressure: public bathrooms and the politically correct answer. When people outside a public bathroom were asked if they had washed their hands, a high percentage said yes. Observation inside the bathroom showed the percentage was far lower.
     • Small/Bad Samples: 67% were suspended from a special education program for disruptive school students. Sounds pretty ineffective, but then there were only three students in the program…
     • Self-Selected Surveys: CNN & ESPN phone-in surveys. Self-selection negates any ability to make an inference beyond those responding. Ah yes… the old “torture the data long enough and they will confess to anything” routine...
     • Precise Numbers: “Tonight’s paid attendance was 56,423.” Is that more meaningful than just noting more than 54,000?
     • Guesstimates: “Millions to be in Times Square for the New Year’s Ball Drop.” Millions! Really??? It’s not that big a space…
     • Distorted Percentages: “New and improved with 50% more...” The 50% might not be a meaningful amount.
     • Partial Pictures: Toyota Camry ads cite a high percentage of the last 10 years’ cars still on the road, but most of those cars are newer. Car insurance ads focus on the number joining a company, but what about those leaving?
     • Loaded Questions (Bias): line item veto. “The President should have the line item veto,” vs. “The President should have the line item veto to eliminate wasteful spending.”

  45. Pictograph: “This year my business profits doubled!”

  46. Visual Presentations of Data – Beware Source: http://findarticles.com

  47. Data Considerations Anecdotal Evidence – basing our conclusions on a few individual cases. e.g. We remember the airplane crash that kills several hundred people and fail to notice that data for all flights show that flying is much safer than driving. Lurking Variables – almost all relationships between two variables are influenced by other variables lurking in the background.

  48. Airline Flights: Alaska Airlines vs. America West. Which would you choose to fly?

                         On Time         Delayed
      Alaska Airlines    3274 (86.7%)    501 (13.3%)
      America West       6438 (89.1%)    787 (10.9%)

  49. We now know that America West has a better overall “On Time” record, but Alaska Airlines has a better “On Time” record at every airport. How can that be?

                            Alaska Airlines                 America West
      Departure Location    On Time         Delayed         On Time         Delayed
      Los Angeles           497 (88.9%)     62 (11.1%)      694 (85.6%)     117 (14.4%)
      Phoenix               221 (94.8%)     12 (5.2%)       4840 (92.1%)    415 (7.9%)
      San Diego             212 (91.4%)     20 (8.6%)       383 (85.5%)     65 (14.5%)
      San Francisco         503 (83.1%)     102 (16.9%)     320 (71.3%)     129 (28.7%)
      Seattle               1841 (85.8%)    305 (14.2%)     201 (76.7%)     61 (23.3%)
      TOTAL                 3274 (86.7%)    501 (13.3%)     6438 (89.1%)    787 (10.9%)
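
The reversal on this slide (commonly called Simpson's paradox) can be reproduced from the flight counts. The Python sketch below recomputes the per-city and pooled on-time rates; the function names are illustrative, and the counts are transcribed from the slide.

```python
# (on_time, delayed) flight counts by departure city, transcribed from the slide.
alaska = {
    "Los Angeles": (497, 62),
    "Phoenix": (221, 12),
    "San Diego": (212, 20),
    "San Francisco": (503, 102),
    "Seattle": (1841, 305),
}
america_west = {
    "Los Angeles": (694, 117),
    "Phoenix": (4840, 415),
    "San Diego": (383, 65),
    "San Francisco": (320, 129),
    "Seattle": (201, 61),
}

def on_time_rate(on_time, delayed):
    """Share of flights that were on time."""
    return on_time / (on_time + delayed)

def overall_rate(flights):
    """On-time rate after pooling the counts from every city."""
    total_on_time = sum(on for on, _ in flights.values())
    total_delayed = sum(late for _, late in flights.values())
    return on_time_rate(total_on_time, total_delayed)

for city in alaska:
    a = on_time_rate(*alaska[city])
    w = on_time_rate(*america_west[city])
    print(f"{city:<14} Alaska {a:.1%}  America West {w:.1%}")

print(f"{'Overall':<14} Alaska {overall_rate(alaska):.1%}  "
      f"America West {overall_rate(america_west):.1%}")
```

Departure city acts as the lurking variable: most America West flights (5255 of 7225) leave from Phoenix, where delays are rare for both airlines, while most Alaska Airlines flights (2146 of 3775) leave from Seattle, where delays are more common. Pooling the cities therefore weights each airline toward a different airport.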

  50. End of Slides