Ch 2: probability sampling, SRS
  • Overview of probability sampling
  • Establish basic notation and concepts
    • Population distribution of Y : object of inference
    • Sampling distribution of an estimator under a design: assessing the quality of the estimate used to make inference
  • Apply these to SRS
    • Selecting a SRS sample
    • Estimating population parameters (means, totals, proportions)
    • Estimating standard errors and confidence intervals
    • Determining the sample size
Assume ideal setting
  • Sampled population = target population
    • Sampling frame is complete and does not contain any OUs beyond the target pop
    • No unit nonresponse
  • Measurement process is perfect
    • All measurements are accurate
    • No missing data (no item nonresponse)
  • That is, nonsampling error is absent
Survey error model
  • Total Survey Error = Sampling Error + Nonsampling Error
    • Sampling error: due to the sampling process (i.e., we observe only part of the population)
    • Nonsampling error: measurement error, nonresponse error, frame error
  • Assessed via bias and variance
Probability sample
  • DEFN: A sample in which each unit in the population has a known, nonzero probability of being included in the sample
  • Known probability ⇒ we can quantify the probability that an SU is included in the sample
    • Assigned during design, used in estimation
  • Nonzero probability ⇒ every SU has a positive chance of being included in the sample
    • Proper survey estimates represent the entire target population (under our ideal setting)
Probability sampling relies on random selection methods
  • Random sampling is NOT a haphazard method of selection
    • Involves very specific rules that include an element of chance as to which unit is selected
    • Only the outcome of the probability sampling process (i.e., the resulting sample) is random
  • More complicated than non-random sampling, but provides important advantages
    • Avoids bias that can be induced by the person selecting the sample
    • Required to calculate valid statistical estimates (e.g., mean) and measures of the quality of the estimates (e.g., standard error of the mean)
Representative sample
  • Goal is to have a “representative sample”
  • Probability sampling is used to achieve this by giving each OU in the target population an explicit chance to be included in the sample
    • Sample reflects variability in the population
    • Applies to the sample as a whole, but not to each OU/SU (don’t expect each observation to be a “typical” pop unit)
  • Can create legitimate sample designs that deliberately skew the sample to include adequate numbers of important parts of the variation
    • Common example: oversampling minorities, women
    • MUST use estimation procedures that take into account the sample design to make inferences about the target population (e.g., sample weights)
Basic sampling designs
  • Simple selection methods
    • Simple random sampling (Ch 2 & 3)
      • Select the sample using, e.g., a random number table
    • Systematic sampling (2.6, 5.6)
      • Random start, take every k-th SU
    • Probability proportional to size (6.2.3)
      • “Larger” SUs have a higher chance of being included in the sample
  • Selection methods with explicit structure
    • Stratified sampling (Ch 4)
      • Divide population into groups (strata)
      • Take sample in every stratum
    • Cluster sampling (Ch 5 & 6)
      • OUs aggregated into larger units called clusters
      • SU is a cluster
Examples
  • Select a sample of n faculty from the 1500 UNL faculty on campus
    • Goal: estimate total (or average) number of hours faculty spend per week teaching courses
  • Simple random sampling (SRS)
    • Number faculty from 1 to 1500
    • Select a set of n random numbers (integers) between 1 and 1500
    • Faculty with ids that match the random numbers are included in the sample
Examples - 2
  • Systematic sampling (SYS)
    • Choose a random integer between 1 and k, where k = 1500/n is the sampling interval
    • Select the faculty member with that id, and then take every k-th faculty member in the list
  • SRS / SYS
    • Each faculty member has an equal chance of being included in sample
    • Each sample of n faculty is equally likely
Examples - 3
  • Probability proportional to size (PPS)
    • With a PPS design, we assign a selection probability to each faculty member that is proportional to the number of courses taught by that faculty member that semester
    • “Size” measure = # of courses taught by faculty member
    • Faculty who teach more courses are more likely to be included in the sample, but those who teach fewer still have a positive chance of being included
      • Motivation: faculty that spend more hours on courses are more critical to getting good estimate of total hours spent
    • Data from faculty with higher inclusion probabilities will be “down weighted” relative to those with lower probabilities during the estimation process
      • Typically accomplished using weights for each observation in the dataset
Examples - 4
  • Stratified random sampling (STS)
    • Organize list of faculty by college
      • Stratum = college
    • Allocate n (divide sample size) among colleges so that we select $n_h$ faculty in the h-th college
      • Sum of $n_h$ over strata equals n
    • Use SRS, e.g., to select sample in each of the college strata
      • Could use SYS or PPS rather than SRS
      • Could have different selection methods in each stratum
Examples - 5
  • Cluster sampling (CS)
    • Aggregate faculty into departments
      • OU = faculty member, SU = dept
    • Select a sample of departments, e.g., using SRS
    • Very common to use PPS for selecting clusters
      • “Size” measure = number of OUs in the cluster SU
    • Many variants for cluster sampling
      • After selecting clusters, may want to select a sample of OUs in the cluster rather than taking data on every OU
      • E.g., select 15 depts in the first stage of sampling, then select 10 faculty in each dept in a second stage of sampling
      • This is called 2-stage sampling
Examples - 6
  • Complex sample designs (Ch 7)
    • Combine basic selection methods (SRS, SYS, PPS) with different methods of organizing the population for sampling (strata, clusters)
  • Typically have more than one stage of sampling (multi-stage design)
    • Often cannot create a frame of all OUs in the population
      • Need to select larger units first and then construct a frame
    • Stratification and systematic sampling are often used to encourage spread across the population
      • This improves chances of obtaining a representative sample
    • Costs are often reduced by selecting clusters of OUs, although cluster sampling may lead to less precision in estimates
Notation for target population
  • The total number of OUs in the population (also called the universe) is denoted by N
    • Note UPPER CASE
    • Ideally for SRS, the sampling frame is a list of the N OUs in the pop
    • EX: there are N = 4 households in our class
  • Index set (labels) for all OUs in the population (or universe) is called U
    • U = {1, 2, …, N}
    • A different index set could be our names, or our SSNs
  • Each household has a value for the characteristic of interest, or random variable Y, the number of people in the household
    • The value of Y for household i is denoted by yi
    • Values in the population are y1, y2, …, yN
Notation for sample
  • Sample size is denoted by n
    • Note lower case
    • n is always less than or equal to N (n = N is a census)
  • Index set (labels) for OUs in the sample is denoted by S
    • To select a sample, we are selecting n indices (labels) from the universe U , consisting of N indices for the population
    • U is our sampling frame in this simple setting
    • Labels in S may not be sequential because we are selecting a subset of U
Class example
  • Suppose n = 2 households are selected from a population of N = 4 households in the class
    • U = {1, 2, 3, 4}
  • Randomly select sample using SRS and get 2 and 3
    • S = {2, 3}
  • The data collected on OUs in the sample are values for Y = number of people in the household
    • Data: $y_2$ and $y_3$
Summary of probability sampling framework
  • Assumptions (for now)
    • Observation unit = sampling unit
  • Target population = sampling universe = sampling frame
    • N = finite number of OUs in the population
    • U = {1, 2, …, N} is the index set for the OUs in the population
  • Sample
    • n = sample size (n is less than or equal to N )
    • S = index set for n elements selected from population of N units (S is a subset of U)
Conceptual basis for probability sampling
  • Conceptual framework for selecting samples
    • Enumerate all possible samples of size n from the population of size N
    • Each sample has a known probability of being selected
      • P(S) = probability of selecting sample S
      • Use this probability scheme to randomly choose the sample
    • Using the probability scheme for the samples, we can determine the inclusion probability for each SU
      • $\pi_i$ = probability that a sample is selected that includes unit i, i.e., $\pi_i = \sum_{S \ni i} P(S)$
Simple example
  • Population of 4 students in study group, take a random sample of 2 students
  • Setting
    • U = {1, 2, 3, 4}
    • N = 4
    • n = 2
  • All possible samples of size n = 2 from N = 4 elements
  • Note: n < N and $S \subset U$
Simple example - 2
  • All possible samples

S1 = {1, 2} S3 = {1, 4} S5 = {2, 4}

S2 = {1, 3} S4 = {2, 3} S6 = {3, 4}

  • Design is determined by assigning a selection probability to each possible sample

P(S1) = 1/3 P(S3) = 1/2 P(S5) = 0

P(S2) = 1/6 P(S4) = 0 P(S6) = 0

Simple example - 3
  • Inclusion probability definition?
  • What is the probability that student 1 is included in the sample?
    • $\pi_1$ =
  • Inclusion probability for student 2, 3, 4?
    • $\pi_2$ =
    • $\pi_3$ =
    • $\pi_4$ =
  • Is this a probability sample?
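One way to check these answers is to add up P(S) over the samples that contain each unit. The Python sketch below does this for the design P(S1) = 1/3, P(S2) = 1/6, P(S3) = 1/2 (all other samples 0); the helper name inclusion_probabilities is illustrative, not from the text.

```python
from fractions import Fraction

# Design from the simple example: probability of each possible sample of size 2
design = {
    (1, 2): Fraction(1, 3),
    (1, 3): Fraction(1, 6),
    (1, 4): Fraction(1, 2),
    (2, 3): Fraction(0),
    (2, 4): Fraction(0),
    (3, 4): Fraction(0),
}

def inclusion_probabilities(design, universe):
    """pi_i = sum of P(S) over all samples S that contain unit i."""
    return {i: sum((p for s, p in design.items() if i in s), Fraction(0))
            for i in universe}

pi = inclusion_probabilities(design, [1, 2, 3, 4])
# Probability sample: every unit has a nonzero inclusion probability
is_probability_sample = all(p > 0 for p in pi.values())
print(pi)
print(is_probability_sample)
```

Exact fractions avoid rounding noise; as a sanity check, the inclusion probabilities of a fixed-size design sum to n = 2.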
Population distribution
  • Response variables represent values associated with a characteristic of interest for the i-th OU
    • Y is the random variable for the characteristic of interest (CAP Y)
    • $y_i$ = value of the characteristic for OU i (small y)
  • The population distribution is the distribution of Y for the target population
    • Y is a discrete random variable with a finite number of possible values (<= N values)
    • Use discrete probability distribution to represent the distribution of Y
Population distribution - 2
  • A discrete probability distribution is denoted by a series of pairs corresponding to
    • Value of the random variable Y, denoted by y
    • Relative frequency of the value y for the random variable Y in the population, denoted by P(Y = y)
    • Pair is { y , P(Y = y) }
  • Constructing a probability distribution
    • List all unique values y of random variable Y
    • Record the relative frequency of y in the population, P(Y = y)
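The two construction steps map directly onto counting unique values and dividing by N. A minimal Python sketch, using as an illustration the population values y = (3, 5, 1, 3) that appear in the simple example later in the chapter; population_distribution is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction

def population_distribution(values):
    """Return {y: P(Y = y)} as exact relative frequencies."""
    counts = Counter(values)          # step 1: unique values y and their frequencies
    N = len(values)
    return {y: Fraction(c, N) for y, c in sorted(counts.items())}  # step 2: divide by N

y = [3, 5, 1, 3]                      # y_1, ..., y_4
dist = population_distribution(y)
for value, prob in dist.items():
    print(value, prob)                # pairs {y, P(Y = y)}
```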
Class example - 2
  • Back to # of people in household for each class member
  • What are the unique values in the pop?
  • What is the frequency of each value?
  • What is the relative frequency of each value?
  • Construct a histogram depicting the variation in values
Summarizing the population distribution
  • Use population parameters to summarize the population distribution
  • Mean or expected value of y (parameter: $\bar{y}_U$)
  • Proportion of population having a particular characteristic = mean of a binary (0, 1) variable (parameter: p)
  • For finite populations, population total of y is often of interest (parameter: t)
  • Variance of y (parameter: $S^2$)
Mean of Y for population
  • Expected value, or population mean, of Y: $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$
    • Mean is in y-units per OU-unit
    • Measure of central tendency (middle of distn)
    • Related to population total (t) and proportion (p)
  • Examples
    • Average number of miles driven per week by adults in the US
    • Average number of phone lines per household
Class example - 3
  • What is the mean household size for people in this classroom?
Total of Y in population
  • Population total of Y: $t = \sum_{i=1}^{N} y_i = N\bar{y}_U$
    • Total number of y-units in the population
  • Examples
    • Number of households in market area with DSL
      • $y_i = 1$ if household i has DSL, $y_i = 0$ if not
      • N = number of households in market area
    • Number of deer in Iowa
      • $y_i$ = number of deer observed in area i
      • N = number of observation areas in Iowa
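The mean and total are simple sums over U, and with a 0/1 variable such as the DSL indicator the total counts the units having the characteristic while the mean is their proportion. A minimal Python sketch; y = (3, 5, 1, 3) are the population values from the chapter's simple example, and the dsl indicator values are hypothetical.

```python
from fractions import Fraction

def population_total(y):
    """t = sum of y_i over all N units in U."""
    return sum(y)

def population_mean(y):
    """ybar_U = t / N."""
    return Fraction(population_total(y), len(y))

y = [3, 5, 1, 3]
dsl = [1, 0, 0, 1]   # hypothetical 0/1 indicator: 1 = household has DSL

print(population_mean(y), population_total(y))
# For a 0/1 variable: total = number of households with DSL, mean = proportion p
print(population_total(dsl), population_mean(dsl))
```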
Class example - 4
  • What is the total number of people living in households of people in the classroom?
Proportion
  • Proportion (p) of population having a particular characteristic
    • Mean of binary variable: $p = \frac{1}{N}\sum_{i=1}^{N} y_i$, where $y_i = 1$ if OU i has the characteristic and $y_i = 0$ otherwise
Class example - 5
  • What proportion of people in the classroom have a cell phone?
Population variance of Y
  • Population variance of Y: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$
  • Measure of spread or variability in the population’s response values
    • Analogous to $\sigma^2$ in other stat classes
    • Not the standard error of an estimate
    • Note this is CAP $S^2$
Coefficient of variation for Y
  • $CV = S / \bar{y}_U$
  • Variation relative to mean (unitless)
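The N − 1 divisor in $S^2$ is easy to miss when computing by hand. A minimal Python sketch for the illustrative population y = (3, 5, 1, 3) from the chapter's simple example; the function names are hypothetical.

```python
import math
from fractions import Fraction

def population_variance(y):
    """S^2 = sum (y_i - ybar_U)^2 / (N - 1): note the N - 1 divisor."""
    N = len(y)
    ybar_U = Fraction(sum(y), N)
    return sum((yi - ybar_U) ** 2 for yi in y) / (N - 1)

def population_cv(y):
    """CV = S / ybar_U (unitless)."""
    S = math.sqrt(population_variance(y))
    return S / (sum(y) / len(y))

y = [3, 5, 1, 3]
print(population_variance(y))        # 8/3
print(round(population_cv(y), 4))
```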
Class example - 6
  • What is the population variance for number of people in households of people in the classroom?
  • What is the CV?
Summary of population distribution of Y
  • Basic pop unit: OU (i)
  • Number of units or size of pop: N
  • Random variable: Y
  • Parameters: characterize the target population
    • Mean $\bar{y}_U$
    • Total t
    • Proportion (mean) p
    • Variance $S^2$
    • Coefficient of variation $CV = S / \bar{y}_U$
  • STATIC: it is the object of inference and never changes with design or estimator
What’s next
  • Population distribution of Y is object of inference
  • Use SRS to select a sample and estimate the parameters of the population distribution
    • How to select a sample
    • Estimators for population parameters of Y under SRS
      • Sample mean estimates population mean
      • N x sample mean estimates population total
      • Sample variance estimates population variance
    • Assessing the quality of an estimator of a population parameter under SRS
      • Sampling distribution
      • Bias, standard error, confidence intervals for the estimator
Simple random sample (SRS)
  • DEFN: A SRS is a sample in which every possible subset of n SUs has an equal chance of being selected as the sample
    • ⇒ every sampling unit has an equal chance of being included in the sample
    • Example of an “equal probability” sample
    • The converse does not hold: a sample in which each SU has the same inclusion probability is not necessarily a SRS
      • Other non-SRS designs can generate equal probability samples
Simple random sampling (SRS)
  • Two types
    • SRSWR (SRS with replacement)
      • Return SU after each step in the selection process
    • SRSWOR (SRS without replacement)
      • Do not return SU after it has been selected
  • Selection probability
    • Probability that a unit is selected in a single draw
      • Constant throughout SRSWR process
      • Changes with each draw in the SRSWOR process
    • NOT an inclusion probability, which considers the probability of drawing a sample that includes unit i
SRSWR (SRS with replacement)
  • Selection procedure
    • Select one OU with probability 1/N from N OUs
      • This is the selection probability for each draw
    • Return the selected OU to the universe
    • Repeat n times
  • Procedure is like drawing n independent samples of size 1
    • Can draw a sampling unit twice – duplicate units
    • Unappealing for finite populations – no additional info in having a duplicate unit
    • Useful in theoretical development for large populations
Focus: SRSWOR (SRS without replacement)
  • Selection procedure
    • Select one OU from universe of size N with probability 1/N
    • DON’T return selected unit to universe
    • Select 2nd OU from remaining units in universe with probability 1/(N - 1)
    • DON’T return selected unit to universe
    • Repeat until n sampling units have been selected
  • Selection probabilities change with each draw
    • 1/N, then 1/(N -1), then 1/(N -2), …, 1/(N – n +1)
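The two procedures can be contrasted in a short Python sketch. The function names srswr and srswor are illustrative; Python's standard random.sample(frame, n) also draws a sample without replacement in which every subset of size n is equally likely. The frame of ids 1 to 1500 reuses the faculty example; the seed is arbitrary.

```python
import random

def srswr(frame, n, rng):
    """n independent draws, each unit with probability 1/N; duplicates possible."""
    return [rng.choice(frame) for _ in range(n)]

def srswor(frame, n, rng):
    """Draws with probability 1/N, then 1/(N - 1), ..., 1/(N - n + 1)."""
    remaining = list(frame)
    sample = []
    for _ in range(n):
        unit = rng.choice(remaining)   # equal chance among units not yet drawn
        remaining.remove(unit)         # DON'T return the unit to the universe
        sample.append(unit)
    return sample

rng = random.Random(2024)              # arbitrary seed, for reproducibility only
frame = list(range(1, 1501))           # faculty ids 1..1500
print(srswr(frame, 10, rng))
print(srswor(frame, 10, rng))
```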
SRSWOR (SRS without replacement)
  • Probability of selecting a sampling unit in a single draw depends on the number of SUs already selected (conditional probability)
    • On the c-th step of the process, c − 1 SUs have already been selected for a sample of size n
    • Probability of selecting any one of the remaining N − c + 1 SUs in the next draw is $1/(N - c + 1)$
  • Inclusion probability for SU i (unconditional probability): $\pi_i = n/N$
    • (see p. 44 in text)
SRSWOR (SRS without replacement)
  • Number of possible SRSWOR samples of size n from universe of size N: $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$
  • Probability of selecting a sample S: $P(S) = 1 / \binom{N}{n}$

(Probability is the same for all samples)
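For small N the count of samples, P(S), and the inclusion probability $\pi_i = n/N$ can all be verified by brute-force enumeration. A minimal Python sketch for N = 4, n = 2, using itertools.combinations and math.comb from the standard library.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 4, 2
universe = range(1, N + 1)

# Every subset of size n is a possible SRSWOR sample
samples = list(combinations(universe, n))
assert len(samples) == comb(N, n)            # C(4, 2) = 6

# Under SRS each sample has the same probability 1 / C(N, n)
p_sample = Fraction(1, comb(N, n))

# Inclusion probability: sum P(S) over the samples that contain unit i
pi = {i: sum(p_sample for s in samples if i in s) for i in universe}
print(len(samples), p_sample, pi)
```

Each unit appears in 3 of the 6 samples, so every $\pi_i$ equals 3/6 = n/N.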

Selecting a SRS using SRSWOR
  • Create a sampling frame
    • List of sampling units in the universe or population
    • Assigns an index to each sampling unit
  • Determine a selection procedure that performs SRSWOR
    • Procedure must generate n unique sampling units such that each SU has an equal chance of being included in the sample
    • Random number generator or table is common basis
    • Need rules to identify when the selected unit is included in the sample or tossed
  • Select random numbers and determine sampled units
Using random numbers to select a SRSWOR sample
  • Determine a rule to assign random numbers to the sampling universe index set U
    • Rule must give each unit an equal chance of being included in the sample
  • Select the set of random numbers, e.g., using computer or printed random number table
    • Apply the rule to each random number to determine the sampled OU
    • Check to see if this OU has already been selected
      • If already selected, ignore it
    • Keep going until you have n SUs in the sample
Census of Agriculture example
  • Select 300 counties from 3078 counties in the US
    • N = 3078
    • n = 300
  • Sampling frame = ?
  • Generate random numbers between 0 and 1 on the computer
    • Need n or more random numbers depending on rule
  • Multiply each random number by N = 3078 and round up to the nearest integer
    • Random number = 0.61663
    • Multiply random # by N: 3078 × 0.61663 = 1897.98714
    • Round up to 1898
    • Take the 1898th county in the frame
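The rule above (multiply a uniform random number by N, round up, ignore repeats) can be sketched in Python as follows. The function name is hypothetical and the seed arbitrary; Python's random.random() returns values in [0, 1), so the sketch also skips the edge case u = 0, which the slide's rule implicitly excludes.

```python
import math
import random

def select_srswor_by_random_numbers(N, n, rng):
    """index = ceil(u * N) for u ~ Uniform[0, 1); repeated indices are
    ignored until n distinct units are in the sample."""
    selected = []
    seen = set()
    while len(selected) < n:
        u = rng.random()               # u in [0, 1)
        index = math.ceil(u * N)       # round up to an integer in 1..N
        if index == 0:                 # only when u == 0.0 exactly; draw again
            continue
        if index in seen:              # already selected: ignore it
            continue
        seen.add(index)
        selected.append(index)
    return selected

print(math.ceil(0.61663 * 3078))       # the slide's worked random number
rng = random.Random(1992)              # arbitrary seed
sample = select_srswor_by_random_numbers(3078, 300, rng)
print(len(sample), min(sample), max(sample))
```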
Estimating population mean under SRS
  • Target population mean: $\bar{y}_U = \frac{1}{N}\sum_{i=1}^{N} y_i$
  • Estimator of $\bar{y}_U$ for SRS sample of size n is the sample mean: $\bar{y} = \frac{1}{n}\sum_{i \in S} y_i$
  • Note
    • “Estimator” refers to the formula
    • “Estimate” refers to the value obtained from using the formula with data
Class example - 7
  • Estimate the average household size for our classroom
Estimating population total
  • Target population total: $t = \sum_{i=1}^{N} y_i$
  • Estimator of t for SRS sample of size n: $\hat{t} = N\bar{y} = \frac{N}{n}\sum_{i \in S} y_i$
Class example - 8
  • Estimate the total number of people living in the households of people in this classroom
Estimating population proportion
  • Target population proportion: $p = \frac{1}{N}\sum_{i=1}^{N} y_i$
    • Y takes on values 0 or 1, where 1 means the unit has the characteristic of interest
  • Estimator of p for SRS sample of size n: $\hat{p} = \frac{1}{n}\sum_{i \in S} y_i$ (the sample mean of the 0/1 values)
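All three estimators are sample means or multiples of one. A minimal Python sketch, assuming the population values y = (3, 5, 1, 3) from the chapter's simple example and the sample S = {2, 3}; the 0/1 indicator "household has more than 2 people" is a hypothetical characteristic chosen only to illustrate $\hat{p}$.

```python
def sample_mean(values):
    """ybar = (1/n) * sum of y_i over the sample S."""
    return sum(values) / len(values)

def estimate_total(values, N):
    """t_hat = N * ybar."""
    return N * sample_mean(values)

y = {1: 3, 2: 5, 3: 1, 4: 3}   # y_i for i in U (simple example)
N = len(y)
S = [2, 3]                      # labels of the sampled units
sample_y = [y[i] for i in S]

ybar = sample_mean(sample_y)             # estimate of ybar_U
t_hat = estimate_total(sample_y, N)      # estimate of t
# p_hat is the sample mean of a 0/1 variable (hypothetical indicator: y_i > 2)
p_hat = sample_mean([1 if v > 2 else 0 for v in sample_y])
print(ybar, t_hat, p_hat)
```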
Class example - 9
  • Estimate the proportion of people with cell phones in this classroom
Estimating population variance
  • Target population variance: $S^2 = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - \bar{y}_U)^2$
  • Estimator of $S^2$ for SRS sample of size n is the sample variance: $s^2 = \frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2$

(note lower case s)

Class example - 10
  • Estimate the variance of number of people in households of people in this classroom
Estimating population standard deviation and CV
  • Standard deviation of Y, S ?
  • Estimator of standard deviation of Y?
  • CV of population distribution?
  • Estimator of CV?
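One natural set of answers is $\hat{S} = s = \sqrt{s^2}$ and $\widehat{CV} = s/\bar{y}$. The Python sketch below computes them for the sample S = {2, 3} of the chapter's simple example (values 5 and 1), and then averages $s^2$ over all six SRS samples of size 2 to show that it lands exactly on $S^2 = 8/3$, even though a single sample can be far off. Function names are illustrative.

```python
import math
from itertools import combinations

def sample_variance(values):
    """s^2 = sum (y_i - ybar)^2 / (n - 1)."""
    n = len(values)
    ybar = sum(values) / n
    return sum((v - ybar) ** 2 for v in values) / (n - 1)

sample_y = [5, 1]                               # y_2, y_3 from the simple example
s2 = sample_variance(sample_y)                  # estimates S^2
s = math.sqrt(s2)                               # estimates S
cv_hat = s / (sum(sample_y) / len(sample_y))    # estimates CV = S / ybar_U
print(s2, round(s, 4), round(cv_hat, 4))

# Average s^2 over all C(4, 2) = 6 equally likely SRS samples
population_y = [3, 5, 1, 3]
pairs = list(combinations(population_y, 2))
avg_s2 = sum(sample_variance(p) for p in pairs) / len(pairs)
print(avg_s2)                                   # equals S^2 = 8/3 up to float rounding
```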
What would happen if we took another sample?
  • S =
  • Data =
  • Estimates
    • Mean
    • Total
    • Proportion
    • Standard deviation
    • CV
Sampling distribution
  • Need to assess the quality of our estimates
    • Is $\bar{y}$ a good estimator of $\bar{y}_U$ ?
    • Is $\hat{p}$ a good estimator of p ?
    • Is $s^2$ a good estimator of $S^2$ ?
  • Use the sampling distribution to assess the quality of the estimator
    • Distribution of estimator over all possible samples
    • EX: distribution of $\bar{y}$ over all possible SRS samples of size n from a population of size N
Measures of quality
  • Denote
    • Population parameter as $\theta$ [think pop mean $\bar{y}_U$]
    • Estimator of $\theta$ as $\hat{\theta}$ [think sample mean $\bar{y}$]
  • Mean of the sampling distribution is the expected value of the estimator, $E[\hat{\theta}]$
    • An estimator is unbiased if $E[\hat{\theta}] = \theta$
  • Variance of the sampling distribution, $V(\hat{\theta})$
    • Precision: want variance of estimator to be small
  • Coefficient of variation, $CV(\hat{\theta}) = \sqrt{V(\hat{\theta})} / E[\hat{\theta}]$
    • Relative precision: want CV to be small
Sampling distribution of estimator
  • Basic pop unit: sample selected using a specific design, S
  • Number of units or size of pop: number of possible samples
    • Need probability of selecting sample, P(S)!
  • Random variable: estimator of parameter, $\hat{\theta}$
  • Parameters: characterize the quality of the estimator
    • Mean $E[\hat{\theta}]$ (assesses bias of the estimator)
    • Variance $V(\hat{\theta})$, SE, CV (assesses precision of estimator)
  • DEPENDS on population parameter, estimator of population parameter, sample design
Population distribution vs. sampling distribution
  • Population distribution
    • Basic unit: OU (i)
    • Total number of units: N
    • Random variable: characteristic of interest, Y
    • Parameters: characterize the target population
      • Mean $\bar{y}_U$, proportion p (central tendency)
      • Total t
      • Variance $S^2$, std dev S, CV (spread of distn)
    • STATIC: once you identify Y, the pop distribution is the object of inference and never changes with design or estimator
  • Sampling distribution
    • Basic unit: sample selected using a specific design, S
    • Total number of units: number of possible samples
    • Random variable: estimator of parameter, $\hat{\theta}$
    • Parameters: characterize the quality of the estimator
      • Mean $E[\hat{\theta}]$ (used to assess bias of the estimator)
      • Variance $V(\hat{\theta})$, SE, CV (precision of estimator)
    • DEPENDS on population parameter, estimator of population parameter, sample design
Conceptual framework for a sampling distribution - 1
  • List out all possible samples of size n from the population of size N
    • A sample is the BASIC UNIT for the population of all possible samples
    • We determine the probability of selecting the sample
      • Unequal probability sample (now)
      • Simple random sample
    • NOTE: sampling distribution depends on the design selected
Simple example from earlier lecture (not SRS!)
  • All possible samples

S1 = {1, 2} S3 = {1, 4} S5 = {2, 4}

S2 = {1, 3} S4 = {2, 3} S6 = {3, 4}

  • Design is determined by assigning a selection probability to each possible sample

P(S1) = 1/3 P(S3) = 1/2 P(S5) = 0

P(S2) = 1/6 P(S4) = 0 P(S6) = 0

Conceptual framework for a sampling distribution - 2
  • List
  • Using the n data values associated with each sample, calculate the value of the estimator for each sample
    • The estimator is the random variable of our distribution
    • Example: sample mean is calculated for each of the possible samples
    • NOTE: the sampling distribution depends on the estimator selected
Simple example from earlier lecture - 2
  • Population values for Y
    • i:      1  2  3  4
    • $y_i$:  3  5  1  3
  • All possible samples of size n = 2

S1 = {1, 2}, S2 = {1, 3}, S3 = {1, 4}, S4 = {2, 3}, S5 = {2, 4}, S6 = {3, 4}

  • Values of $\bar{y}$ corresponding to each sample
Conceptual framework for a sampling distribution - 3
  • List
  • Using
  • Sampling distribution is described by pairs of values for estimator from the sample and relative frequency of obtaining that value
    • We are using the steps we used before for creating a discrete distribution
Representing the sampling distribution
  • Probability distribution: pairs of $\{c, P(\hat{\theta} = c)\}$
    • $\hat{\theta}$ is a random variable, c is a value of $\hat{\theta}$
Simple example from previous lecture - 3
  • Number of possible samples: 6
  • Probability of selecting sample: P(S) as assigned by the design
  • Probability distribution: unique values of $\bar{y}$ and relative frequency

    c                  2.0   3.0   4.0
    $P(\bar{y} = c)$   1/6   1/2   1/3

Conceptual framework for a sampling distribution - 4
  • List
  • Using
  • Sampling distribution
  • Parameters summarize sampling distribution
    • Mean of sampling distribution
    • Variance, std dev (SE) of sampling distribution
    • CV of sampling distribution
Ex: mean and variance of sampling distribution for $\bar{y}$ - 4
  • Mean of sampling distribution: $E[\bar{y}] = \sum_{S} P(S)\,\bar{y}_S$, where $\bar{y}_S$ is the value of $\bar{y}$ for sample S
    • Same concept of expected value used with population distribution
  • Variance of sampling distribution: $V(\bar{y}) = \sum_{S} P(S)\,(\bar{y}_S - E[\bar{y}])^2$
    • Use more general formula for variance
    • Later, we’ll use reductions that are easier to calculate
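The general formulas can be applied mechanically: compute $\bar{y}_S$ for each sample, pool P(S) over samples sharing a value, then take the weighted mean and variance. A minimal Python sketch for the non-SRS design P(S1) = 1/3, P(S2) = 1/6, P(S3) = 1/2 (others 0) with population values y = (3, 5, 1, 3); the helper name sampling_distribution is illustrative.

```python
from collections import defaultdict
from fractions import Fraction

y = {1: 3, 2: 5, 3: 1, 4: 3}
design = {(1, 2): Fraction(1, 3), (1, 3): Fraction(1, 6), (1, 4): Fraction(1, 2),
          (2, 3): Fraction(0), (2, 4): Fraction(0), (3, 4): Fraction(0)}

def sampling_distribution(design, y):
    """Return {c: P(ybar = c)} by accumulating P(S) over samples with ybar_S = c."""
    dist = defaultdict(Fraction)          # Fraction() == 0
    for S, p in design.items():
        ybar_S = Fraction(sum(y[i] for i in S), len(S))
        dist[ybar_S] += p
    return dict(dist)

dist = sampling_distribution(design, y)
mean = sum(c * p for c, p in dist.items())                  # E[ybar]
var = sum(p * (c - mean) ** 2 for c, p in dist.items())     # V(ybar)
print(sorted(dist.items()), mean, var)
```

Here $E[\bar{y}] = 19/6$, which differs from $\bar{y}_U = 3$, so under this design the sample mean is biased.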
What if we took a SRS of size n from N units?
  • List out all possible samples
    • # possible samples: $\binom{N}{n}$
  • Determine the probability of a sample: $P(S) = 1 / \binom{N}{n}$
  • Calculate estimator for each sample
    • Examples: $\bar{y}$, $\hat{t}$, $\hat{p}$
  • Create a discrete probability distribution
  • Calculate summary parameters
Back to example with SRS
  • Number of possible samples: $\binom{4}{2} = 6$
  • Probability of selecting sample: $P(S) = 1/6$
  • Probability distribution: unique values of $\bar{y}$ and relative frequency

    c                  2.0   3.0   4.0
    $P(\bar{y} = c)$   1/3   1/3   1/3

Example: mean of sampling distribution for $\bar{y}$ under SRS
  • Mean of sampling distribution: $E[\bar{y}] = \frac{1}{3}(2.0) + \frac{1}{3}(3.0) + \frac{1}{3}(4.0) = 3.0$
  • Mean of population distribution: $\bar{y}_U = (3 + 5 + 1 + 3)/4 = 3.0$
Bias of an estimator
  • Estimation bias of $\hat{\theta}$: $Bias(\hat{\theta}) = E[\hat{\theta}] - \theta$
    • Note that this is the mean of the estimator (from sampling distribution) minus the population parameter (from population distribution)
  • If $Bias(\hat{\theta}) = 0$, then $\hat{\theta}$ is said to be an unbiased estimator of $\theta$
Variance of sample mean under SRS
  • Don’t have to use the general formula
  • Variance of sample mean (derived using theory): $V(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$
    • Similar to infinite population formula ($\sigma^2/n$)
    • Has an extra factor, $1 - n/N$, called the finite population correction factor (FPC)
Example
  • Variance of sampling distribution for
  • Other measures of dispersion for sampling distribution
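For the N = 4, n = 2 example with y = (3, 5, 1, 3), enumerating all equally likely SRS samples and applying the general variance formula should agree with the closed form $(1 - n/N)S^2/n$. A minimal Python check using exact fractions:

```python
import math
from fractions import Fraction
from itertools import combinations
from math import comb

y = [3, 5, 1, 3]
N, n = len(y), 2

ybar_U = Fraction(sum(y), N)
S2 = sum((v - ybar_U) ** 2 for v in y) / (N - 1)          # population variance

# Enumerate all C(N, n) equally likely SRS samples and the sample mean of each
means = [Fraction(sum(c), n) for c in combinations(y, n)]
p = Fraction(1, comb(N, n))
E_ybar = sum(p * m for m in means)                         # general formula: mean
V_ybar = sum(p * (m - E_ybar) ** 2 for m in means)         # general formula: variance

# Closed-form SRS variance with the finite population correction
V_formula = (1 - Fraction(n, N)) * S2 / n
SE = math.sqrt(V_ybar)                                     # standard error of ybar
print(E_ybar, ybar_U, V_ybar, V_formula, round(SE, 4))
```

Both routes give $V(\bar{y}) = 2/3$, so $SE(\bar{y}) = \sqrt{2/3} \approx 0.82$.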
Finite population correction factor (FPC)
  • FPC = $1 - n/N$
  • Sampling fraction is the proportion of the population sampled, or n/N
  • Larger sample ⇒
    • Larger fraction of population
    • Smaller FPC
    • Smaller variance of sample mean
Impact of FPC on estimated variance of parameter estimate
  • Often FPC is very close to 1
    • Sample of 3000 households from total of 1,200,000 households: FPC = 1 − 3000/1,200,000 = 0.9975
  • In cases where the sampling fraction is very small and the FPC is very close to 1, the FPC has no practical effect on the SE or estimated variance of the parameter estimate
  • Sampling fraction n/N is not a good measure of whether your estimate will be precise
  • Given the population variance, the sample size n is the most important part of the variance or SE formulas
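A quick numerical look at both points: for 3000 of 1,200,000 households the FPC barely changes the SE, while quadrupling n roughly halves it. A minimal Python sketch; se_of_mean is a hypothetical helper, and the unit value of s² is an arbitrary placeholder (the ratios printed do not depend on it).

```python
import math

def se_of_mean(s2, n, N=None):
    """SE(ybar) = sqrt((1 - n/N) * s2 / n); N=None drops the FPC."""
    fpc = 1.0 if N is None else 1 - n / N
    return math.sqrt(fpc * s2 / n)

n, N = 3000, 1_200_000
s2 = 1.0                                   # placeholder variance; only ratios matter here

print(1 - n / N)                           # the FPC
ratio_fpc = se_of_mean(s2, n, N) / se_of_mean(s2, n)
ratio_4n = se_of_mean(s2, 4 * n, N) / se_of_mean(s2, n, N)
print(round(ratio_fpc, 5), round(ratio_4n, 4))
```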
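Estimating population variance under SRS
  • Do not know variance of population distribution, $S^2$
  • Unbiased estimator for $S^2$: $s^2 = \frac{1}{n-1}\sum_{i \in S} (y_i - \bar{y})^2$
  • Estimator for $V(\bar{y})$: $\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$
  • Note that $SE(\bar{y}) = \sqrt{\hat{V}(\bar{y})}$ is the standard error of the sample mean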
Ag example
  • Interested in average number of acres per county devoted to farms
  • Sample 300 counties from list of 3078
  • Collect data and get following summary statistics
  • What are estimated mean and standard error?
Rounding rules
  • Always keep all of the digits while you are doing calculations
  • Round only when you get ready to report the result at the end of the calculation …
    • Round the estimated SE to 2 significant digits
      • 107,789 is rounded to 110,000
      • 0.0325329 is rounded to 0.033
    • Round estimate to precision of the SE
      • If SE is 110,000, round estimate to nearest 10,000 (xx0,000)
      • If SE is 0.033, round estimate to nearest 1/1000 (x.xxx)
    • Estimated variances are usually reported to 5 significant digits
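The rounding rules can be automated. A minimal Python sketch; round_sig and round_estimate_to_se are hypothetical helpers, the SE values are the two from the slide, and the estimates paired with them are made-up illustrations.

```python
import math

def round_sig(x, digits=2):
    """Round x to the given number of significant digits."""
    if x == 0:
        return 0
    exponent = math.floor(math.log10(abs(x)))
    return round(x, digits - 1 - exponent)

def round_estimate_to_se(estimate, se):
    """Round the SE to 2 significant digits, then round the estimate to the
    decimal place of the rounded SE's second significant digit."""
    se_r = round_sig(se, 2)
    place = math.floor(math.log10(abs(se_r))) - 1   # power of 10 of the 2nd sig digit
    return round(estimate, -place), se_r

print(round_sig(107_789), round_sig(0.0325329))
print(round_estimate_to_se(2_345_678, 107_789))     # hypothetical estimate
print(round_estimate_to_se(0.51234, 0.0325329))     # hypothetical estimate
```

Only the final reported numbers are rounded; intermediate calculations keep full precision, as the slide advises.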
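Sampling distribution for $\bar{y}$ using SRS of size n from N
  • $\bar{y}$ is an unbiased estimator of $\bar{y}_U$
    • Mean of sampling distribution is always equal to population mean under SRS: $E[\bar{y}] = \bar{y}_U$
  • Variance of $\bar{y}$ is $V(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{S^2}{n}$
  • Estimate the variance of $\bar{y}$ using sample variance $s^2$: $\hat{V}(\bar{y}) = \left(1 - \frac{n}{N}\right)\frac{s^2}{n}$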
Sampling distribution of $\hat{t}$ under SRS
  • Mean of $\hat{t} = N\bar{y}$ for population total t under SRS: $E[\hat{t}] = E[N\bar{y}] = N\,E[\bar{y}]$
  • Expectation of a linear function of a random variable

If a, b are constants & Y is a random variable, then $E[aY + b] = a\,E[Y] + b$

  • Is $\hat{t}$ an unbiased estimator of t ?
Sampling distribution of $\hat{t}$ under SRS - 2
  • Variance of estimator of total under SRS: $V(\hat{t}) = V(N\bar{y}) = N^2 V(\bar{y}) = N^2\left(1 - \frac{n}{N}\right)\frac{S^2}{n}$
  • Variance of a linear function of a random variable

If a, b are constants & Y is a random variable, then $V(aY + b) = a^2\,V(Y)$

Sampling distribution of $\hat{t}$ under SRS - 3
  • Estimator for variance of $\hat{t}$ under SRS: $\hat{V}(\hat{t}) = N^2\left(1 - \frac{n}{N}\right)\frac{s^2}{n}$
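Because $\hat{t} = N\bar{y}$, its SE is simply N times $SE(\bar{y})$. The Python sketch below computes both from summary statistics. The Ag example's actual summary statistics are not reproduced in this transcript, so the inputs ybar = 250,000 and s = 300,000 are hypothetical placeholders; only n = 300 and N = 3078 come from the example.

```python
import math

def srs_mean_and_total(ybar, s, n, N):
    """Estimated mean and total with SEs under SRSWOR (FPC included)."""
    fpc = 1 - n / N
    se_mean = math.sqrt(fpc * s ** 2 / n)   # SE(ybar) = sqrt((1 - n/N) s^2 / n)
    t_hat = N * ybar                        # t_hat = N * ybar
    se_total = N * se_mean                  # SE(t_hat) = N * SE(ybar)
    return ybar, se_mean, t_hat, se_total

# Hypothetical summary statistics; n and N from the Ag example
result = srs_mean_and_total(ybar=250_000.0, s=300_000.0, n=300, N=3078)
print(result)
```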
Ag example - 2
  • Estimated total acres devoted to farms in the US in 1992?
  • Estimated variance of estimated total?
  • Other measures of dispersion for sampling distribution?
    • Estimated SE
Sampling distribution of $\hat{p}$ under SRS
  • Mean of estimator $\hat{p}$ for population proportion p under SRS: $\hat{p}$ is $\bar{y}$ for a 0/1 variable, so $E[\hat{p}] = E[\bar{y}]$
  • Is $\hat{p}$ unbiased for p ?
sampling distribution of under srs 287
Sampling distribution of under SRS - 2
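  • Variance of the sample proportion (derived from theory): $V(\hat{p}) = \frac{N-n}{N-1}\,\frac{p(1-p)}{n}$
    • Very similar to the infinite-population formula $p(1-p)/n$
    • The extra factor $\frac{N-n}{N-1}$ arises from the finite population and is NOT the same as the FPC $1 - \frac{n}{N}$
  • The estimator $\hat{V}(\hat{p}) = \left(1 - \frac{n}{N}\right)\frac{\hat{p}(1-\hat{p})}{n-1}$ does have the FPC in the formula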
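A quick enumeration shows that the factor $(N - n)/(N - 1)$ in $V(\hat{p})$ differs from the FPC $1 - n/N$. The 0/1 population below (N = 10 units, four with the characteristic) is hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical 0/1 population: 1 = unit has the characteristic of interest.
population = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
N, n = len(population), 3
p = mean(population)

p_hats = [mean(s) for s in combinations(population, n)]   # all C(10, 3) = 120 samples
V_exact = mean([(ph - p) ** 2 for ph in p_hats])          # p_hat is unbiased, so center at p

factor = (N - n) / (N - 1)   # finite-population factor in V(p_hat): 7/9 here
fpc = 1 - n / N              # the usual FPC, for comparison: 0.7 here
print(V_exact, factor * p * (1 - p) / n)   # these agree
print(factor, fpc)                         # not the same
```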
Ag example - 3
  • Suppose we are interested in the proportion of counties with fewer than 200,000 acres devoted to farms in 1992
  • Data from our sample of 300 indicate that 153 counties have fewer than 200,000 acres devoted to farms
  • Estimated population proportion?
  • Estimated SE of estimated proportion?
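These estimates follow from $\hat{p} = 153/300$ and the estimator $\hat{V}(\hat{p}) = (1 - n/N)\,\hat{p}(1-\hat{p})/(n-1)$. The population size is not given on this slide; N = 3078 counties is our assumption (the size of the county frame in the textbook's agricultural data), so substitute the N used earlier in the chapter if it differs.

```python
import math

n, x = 300, 153   # sample size; counties with fewer than 200,000 acres in farms
N = 3078          # ASSUMED number of counties in the population (not stated on this slide)

p_hat = x / n                                          # estimated proportion
v_hat = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)    # estimated variance of p_hat
se = math.sqrt(v_hat)                                  # estimated SE
print(p_hat, se)
```

Following the rounding rules, the SE is reported to 2 significant digits and $\hat{p}$ to the same decimal place.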
Quality of estimates (Fig 2.2, p. 29)
  • Estimator under a given design is unbiased
    • On average over a large number of samples, the mean of the estimates “hits” the target population parameter (centered on the bull’s eye)
  • Estimator under a given design is precise
    • Over a large number of samples, estimates tend to be close to one another, indicating that the variance of the sampling distribution for the estimator is small
    • Estimates form a tight clump that may not be centered on the bull’s eye (precise but biased)
  • Estimator under a given design is accurate
    • Estimates are centered near the target and tightly clumped (small bias and high precision)
    • Assess this with the mean squared error (MSE)
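Mean Squared Error of an Estimator
  • Mean squared error (MSE) of $\hat{\theta}$: $MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = V(\hat{\theta}) + [Bias(\hat{\theta})]^2$
  • Combines measures of bias and precision to provide an index of the accuracy of an estimator under a given design
    • Sometimes we are willing to accept a little bias to get a more precise estimator, if the MSE is improved
  • If $Bias(\hat{\theta}) = E(\hat{\theta}) - \theta = 0$, then $MSE(\hat{\theta}) = V(\hat{\theta})$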
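The identity MSE = V + Bias² can be checked by enumerating all SRSWOR samples. The estimator $0.9\bar{y}$ below is a hypothetical biased estimator chosen only to make the bias nonzero, and the population is also hypothetical.

```python
from itertools import combinations
from statistics import mean

population = [1, 2, 4, 4, 7, 7, 7, 8]   # hypothetical small population
N, n = len(population), 4
theta = mean(population)                # target parameter: the population mean


def shrunk_mean(sample):
    """Hypothetical biased estimator, for illustration only."""
    return 0.9 * mean(sample)


estimates = [shrunk_mean(s) for s in combinations(population, n)]  # all 70 equally likely samples
E_est = mean(estimates)
bias = E_est - theta
var = mean([(e - E_est) ** 2 for e in estimates])
mse = mean([(e - theta) ** 2 for e in estimates])
print(bias, var, mse, var + bias**2)    # mse equals var + bias^2
```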
MSE of SRS estimators
  • All of these estimators are unbiased under SRS (Bias = 0)
  • So under SRS, $MSE(\bar{y}) = V(\bar{y})$, $MSE(\hat{t}) = V(\hat{t})$, and $MSE(\hat{p}) = V(\hat{p})$
Confidence intervals
  • Estimate the variance, SE, CV, or MSE of an estimator under a design to indicate the quality of the estimate
  • Another approach
    • Estimate a confidence interval to express precision of estimate
Book example 2.7, p. 35-6
  • True parameter value: t = 40
  • CI of interest:
  • List 70 possible samples of size n = 4
  • Each sample has a probability of selection P(S)
  • For each sample, record $u(S) = 1$ if the CI from sample S includes t = 40, and $u(S) = 0$ otherwise
  • Confidence coefficient: $1 - \alpha = \sum_{S} u(S)\,P(S)$
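The enumeration can be sketched directly. We assume the book example's population is y = (1, 2, 4, 4, 7, 7, 7, 8): its total is 40 and it has C(8, 4) = 70 samples of size 4, matching the slide, but check this against the text. The slide's own CI formula did not survive extraction, so `ci_total` uses a hypothetical $\hat{t} \pm 1.96\,\widehat{SE}(\hat{t})$ interval as a stand-in.

```python
import math
from itertools import combinations
from statistics import mean, variance

population = [1, 2, 4, 4, 7, 7, 7, 8]   # ASSUMED population for the book example
N, n = len(population), 4
t = sum(population)                      # true total

samples = list(combinations(population, n))   # all SRSWOR samples of size 4
P_S = 1 / len(samples)                         # equal selection probability P(S)


def ci_total(sample, z=1.96):
    """HYPOTHETICAL stand-in interval t_hat +/- z * SE(t_hat); replace with the slide's CI."""
    t_hat = N * mean(sample)
    se = math.sqrt(N**2 * (1 - n / N) * variance(sample) / n)
    return t_hat - z * se, t_hat + z * se


# u(S) = 1 if the interval from sample S covers t, else 0
u = [1 if low <= t <= high else 0 for low, high in map(ci_total, samples)]
confidence_coefficient = sum(u_S * P_S for u_S in u)
print(len(samples), t, confidence_coefficient)
```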
Ex – 2: Assume SRSWOR
  • If 60 of the 70 SRSWOR samples resulted in CIs that included the true total, what is the confidence coefficient?
  • What is alpha?
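Under SRSWOR each of the 70 samples has $P(S) = 1/70$, so the confidence-coefficient formula $1 - \alpha = \sum_S u(S)\,P(S)$ reduces to simple arithmetic for this exercise:

```latex
1 - \alpha = \sum_{S} u(S)\,P(S) = 60 \times \frac{1}{70} = \frac{6}{7} \approx 0.857,
\qquad
\alpha = \frac{10}{70} \approx 0.143
```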
What is a 95% confidence interval (CI) under SRS?
  • Heuristic definition
    • Take repeated samples of size n from population of size N
    • Collect data on Y
    • Calculate an estimate of a population parameter using data from n observations
    • Calculate 95% CI for parameter estimate using data from n observations
  • Expect 95% of the CIs to contain the true value of the parameter
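The repeated-sampling definition can be mimicked with a small Monte Carlo simulation. This sketch assumes a hypothetical, roughly normal population of N = 2000 values, n = 100, and the interval $\bar{y} \pm 1.96\,\widehat{SE}(\bar{y})$; the observed coverage should come out close to, but not exactly, 0.95.

```python
import math
import random
from statistics import mean, variance

rng = random.Random(2024)   # fixed seed so the run is reproducible
# Hypothetical population of N = 2000 roughly normal values.
population = [rng.gauss(50, 10) for _ in range(2000)]
N, n, R = len(population), 100, 2000
ybar_U = mean(population)   # true parameter

covered = 0
for _ in range(R):                          # repeated SRSWOR samples of size n
    s = rng.sample(population, n)
    ybar = mean(s)
    se = math.sqrt((1 - n / N) * variance(s) / n)
    if ybar - 1.96 * se <= ybar_U <= ybar + 1.96 * se:
        covered += 1

print(covered / R)                          # proportion of CIs containing ybar_U
```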
Interpreting CIs in general
  • More generally (for any design), a $(1-\alpha)100\%$ CI has the interpretation
    • There is a $(1-\alpha)100\%$ chance of selecting a sample for which the CI will include the true population parameter
  • Note
    • The upper and lower limits of the CI are random variables, calculated from the sample data
    • The true parameter value is either included or not included in a single CI
    • Confidence coefficient of a CI has a relative frequency interpretation across samples
Confidence interval definition
  • Standard estimator for a $(1-\alpha)100\%$ confidence interval (CI): $\hat{\theta} \pm z_{\alpha/2}\,\widehat{SE}(\hat{\theta})$
Standard normal distribution
  • Z ~ N(0, 1)
    • Z is the random variable
    • Mean E{Z} = 0 and variance V{Z} = 1
  • Two-sided (1-)100% confidence interval
    • Use critical value
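The critical value and the standard interval can be computed with the standard-library `statistics.NormalDist` (Python 3.8+). The helper names `z_critical` and `normal_ci` are hypothetical, and the inputs in the usage lines are illustrative numbers only.

```python
from statistics import NormalDist


def z_critical(alpha: float) -> float:
    """Two-sided critical value z_{alpha/2}: P(Z > z) = alpha / 2 for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def normal_ci(estimate: float, se: float, alpha: float = 0.05) -> tuple[float, float]:
    """Standard (1 - alpha)100% CI: estimate +/- z_{alpha/2} * SE."""
    z = z_critical(alpha)
    return estimate - z * se, estimate + z * se


print(round(z_critical(0.05), 3))   # 1.96
print(normal_ci(0.51, 0.027))       # illustrative estimate and SE
```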
Infinite vs. finite populations
  • In other stat classes …
    • Assume SRS with replacement from infinite pop
    • Justify CI by applying the Central Limit Theorem (CLT)
  • In sample surveys, we have a finite number of possible samples
    • Can calculate the exact confidence coefficient $1-\alpha$ for a stated interval (see previous example)
    • In practice, it is not possible to list all possible samples, so we have a special CLT that relies on a “superpopulation” framework
Superpopulation framework
  • Asymptotic framework for SRSWOR in finite populations
    • Population is part of a larger superpopulation
    • There is a series of increasingly larger superpopulations
    • Use superpopulation concept to derive a Central Limit Theorem for SRSWOR
  • Bottom line
    • We will use the standard CI estimator with a different theoretical justification
When is CLT justified?
  • Confidence coefficient is approximate
    • Quality of approximation depends on n and the distribution of the underlying random variable, Y
    • “n is large enough for CLT” is less clear for finite populations
      • n = 30 rule in other stat classes does NOT apply
  • Rules of thumb
    • If distribution of Y is close to normal, n = 50
    • Need larger n if distribution of Y deviates from normal, e.g., skewed
    • Y categorical: if p is the proportion with the characteristic of interest, need $np \ge 5$ and $n(1-p) \ge 5$
Determining sample size – a general approach
  • Specify tolerable error (level of precision, level of confidence)
  • Identify the appropriate equation relating tolerable error ($e$, $\alpha$) to sample size ($n$)
  • Estimate unknown parameters in equation
  • Solve for n
  • Evaluate (and return to first step)
    • Can you afford sample size?
    • What expectations can be altered?
Specify tolerable error
  • Two parameters
    • $e$: margin of error or half-width of CI
    • $\alpha$: $(1-\alpha)100\%$ is the confidence level
  • Absolute expression (half-width of CI): estimate within e of the true population parameter
  • Relative expression: estimate within 100e% of the true population parameter $\theta$
Equation linking e, , and n
  • Most common equation is half-width of CI
  • Example: sample mean under SRSWOR

Note for

    • For p , use S2p(1-p)
    • For  = 0.05, use
    • n0 is sample size under SRSWR (ignoring FPC)
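The half-width equation for $\bar{y}$ under SRSWOR, solved for n, gives $n_0 = z_{\alpha/2}^2 S^2/e^2$ and $n = n_0/(1 + n_0/N)$. This sketch assumes the normal critical value from `statistics.NormalDist` and rounds n up so the target e is met; `srs_sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist
from typing import Optional


def srs_sample_size(e: float, S2: float, N: Optional[int] = None, alpha: float = 0.05) -> int:
    """Sample size giving a CI half-width of about e under SRS.

    n0 = z^2 S^2 / e^2 ignores the FPC (SRSWR); if N is given, use n = n0 / (1 + n0 / N).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    n0 = z**2 * S2 / e**2
    n = n0 if N is None else n0 / (1 + n0 / N)
    return math.ceil(n)                        # round up so the target e is met


# Proportion with no prior information: S^2 ~ p(1 - p) with p = 0.5.
print(srs_sample_size(e=0.05, S2=0.25))           # 385 with z = 1.96
print(srs_sample_size(e=0.05, S2=0.25, N=2000))   # FPC-adjusted, hypothetical N
```

When only a plausible range for y is known, `S2` could be guessed as `(range_95 / 4) ** 2` under the normality rule of thumb; this is a rough planning value, not an estimate.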
Estimate unknowns: population variance of y, $S^2$
  • Use estimator for variance, $s^2$
    • Pilot study
    • Previous study
      • Careful about comparability
  • Use CV from previous study
    • Careful about comparability
  • Guess variance under normality
    • Estimate of S ≈ (range covering about 95% of values) / 4
    • Estimate of S ≈ (range covering about 99.7% of values) / 6
Estimating unknowns: population proportion, p
  • Use estimates from pilot or previous study
  • If know nothing of true proportion
    • Use p = 0.5
    • $p(1-p)$ is largest at p = 0.5, so this gives the maximum possible variance for the estimated proportion under SRS and a conservative (larger) n
    • Commonly used
Practicalities for determining n
  • Sampling fraction rarely important
    • Most populations are large enough that sampling fraction n/N is small for practical values of n
  • Planned subpopulation estimates should influence the sample size
  • 95% CI for a proportion ($\alpha = 0.05$, $p = 0.5$, $z_{\alpha/2} \approx 2$)
    • Implies $e \approx 2\sqrt{p(1-p)/n} = 1/\sqrt{n}$ (ignoring the FPC)
    • n = 400 for e ≈ 0.05 (whole sample)
    • n = 100 for e ≈ 0.10 (subpopulation)
    • n = 50 for e ≈ 0.14 (subpopulation)
    • n = 500 for e ≈ 0.045 (little gain over 400)
SRS: pros and cons
  • Cons
    • SRS is rarely the “best” design
    • May not have a list of all OUs → need a different design
    • May have additional info on pop to create a more efficient design (improve precision)
  • Pros / uses
    • Standard stat procedures can be used with little or no bias
    • Mainly interested in regression rather than estimating population parameters (the sample design can be ignored, although a better design could still give a better sample)