Uses of Statistics: Descriptive and Inferential Analysis

Review: Two Main Uses of Statistics Descriptive: To describe or summarize a collection of data points The data set in hand = all the data points of interest No “going beyond the data” Inferential: To make decisions, draw inferences about a population, or draw conclusions under conditions of uncertainty The data set in hand = merely a sample from some larger population (that is our real interest) Use these data to make a calculated guess about what we would find if we had full information Use probability theory to make calculated guesses (with some estimated margin of error)

Statistical inferences involve two possible tasks: Estimation: Use sample data to infer population parameter  e.g., lifetime risk of being a victim of a violent crime according to NCVS data Hypothesis Testing: Use sample data to make a decision about the correctness of some hypothesis or prediction  e.g., whether civil orders of protection will lower the risk recurrent violence against spouses

Both tasks rely on using Samples to make statements about populations: Sample: A limited number of cases selected to represent the larger population of data points Key Terms/Ideas in Sampling: Representativeness  degree to which sample is an exact replica in miniature of the population Sampling Error  degree to which sample statistic deviates from population value Sampling Method  procedure used to draw cases from the population of data points

Two main types of sampling methods: Probability Sampling Selection where each data point has a known probability for being selected into the sample Simple Random sample every data point has an equal likelihood of being selected Other types of probability samples? Systematic Stratified Weighted Cluster Doesn’t guarantee representativeness each time

Two main types of sampling methods: Non-probability Sampling: Selection procedure in which probability of selection is unknown Specific types of Non-probability samples? Accidental Convenience Purposive Snowball Volunteer No guarantee of representativeness Inferences become more “iffy”

Why use one sample method versus another? Maximize representativeness of data Minimize sampling error and bias in data Maximize the validity of our statistical inferences Note: Inferential statistics always assume simple random sampling as a basic premise

What exactly is “randomness”? What does it look like? How can we tell if we have it? Randomness is a property of our data selection procedure not the data points Compromises to randomness: Any deliberate departure from sampling Refusals, dropouts, nonresponses are (almost) never random Our data are always a more-or-less imperfect approximation to real random sample

Making inferences from sample statistics involves 3 distributions: Population distribution: unobserved in population from which cases drawn Sample distribution: observed in cases from which data were collected Sampling distribution:unobserved but calculable distribution of statistics for samples of same size/type as ours (drawn from the same population)  This distribution is the key to making inferences

“Sampling Distribution”: what is it? A hypothetical population of samples (and sample statistics) from drawn from the same population Has a describable theoretical distribution (based on repeatedly drawing a sample an infinite number of times) Has certain parameters determined by the population from which the sample is drawn and the size of the sample

e.g.: If we draw a sample of 25 cases and compute the sample mean The sample mean has a theoretical sampling distribution whose characteristics are exactly determined by the distribution of the population (μ & σ) and by the sample size (n=25) The mean of the sampling distribution = the mean of the population (μs = μpop) In this case: the σ of the sampling distribution = σ/5 (i.e., one-fifth the σ of the population)

Important features of Sampling distributions: If the variable is normally distributed in the population, then the sampling distribution of sample means will also be normal The mean of the sampling distribution = the mean of the population The σ of the sampling distribution = σ/√n Use this information to compute the likelihood of any sample mean being drawn from the population (using the standard normal [z] table)

Important features of Sampling distributions to remember: The σ of the sampling distribution will always be smaller than the σ of the population The large the sample size, the smaller the standard error of the sampling distribution The mean of the sampling distribution will always be the population mean The sampling distribution will become more Normal as the sample size gets larger – no matter the distribution of the population! [this is called the Central Limit Theorem]

Using Sample statistics to make inferences about population parameters: The best estimate of the population mean is the sample mean The usual sample estimate of σ is slightly too low; it needs to be adjusted to be unbiased Thus there are two different formulas for the sample variance/standard deviation: (descriptive) (inferential/estimated)

Basic Steps in Estimating Population Parameters: Select valid estimator (unbiased, consistent) Select valid data sample Corresponds to population of interest Random sample Complete (no censoring or omissions) Compute value of statistical estimate Compute standard error for the estimate Compute confidence interval (i.e., plausible margin of sampling error)

Two Approaches to estimation: Point Estimation: Use sample data to infer exact value of population parameter Highly likely to be wrong or off-mark to some degree e.g., infer that 30% of adults will be victims of violent crime in their lifetimes (could actually be 35% or 25%) Interval Estimation: Instead use sample data to compute a range of values (“confidence intervals”) within which the actual parameter is located (with some calculated margin of certainty or confidence) Yields more approximate but more plausible (or confident) estimates.

Confidence Interval Estimation: Compute the sample mean Compute the sample standard error From the population (σ) From the sample (s or ) Compute the confidence interval or

Note: For population value: For sample value:

Uses of Statistics: Descriptive and Inferential Analysis

Uses of Statistics: Descriptive and Inferential Analysis

Presentation Transcript

Uses of FDI statistics

Two Sorts of Statistics

Two main strategies . . .

Statistics Of Two Variables

Chapter 3 Review Two Variable Statistics

Review of Statistics

Two Main Categories of Fossils

Statistics of Two Variables

Statistics of Two Variables

Two main Categories of Drugs

Statistics of Two Variables

Two main objectives

Two main tasks in inferential statistics: (revisited)

Policy Uses of Federal Statistics

Uses of the Energy Statistics.

Two main organizations:

Who Uses Statistics?

Uses of Statistics

3 main uses of nuclear science

Uses of Python | Top 9 main Uses of Python Programming

Main Uses Of Polariscope

Main Uses Of Polariscope