# Fundamentals of Sampling Method

### Fundamentals of Sampling Method

Week 4

Research Methods & Data Analysis

Tutorials
• Thursday 30th October, 9-11, AG GL 20 (M. Mazzocchi)
• Tuesday 4th November, 11-1pm (H. Neeliah)
• You may attend:
  • One (the most convenient for you)
  • Both (it may be very useful)
  • None (not really advised…)

Lecture outline
• Key notions of statistics
• Simple random sampling
• Sampling error
• Sampling size
• Other sampling methods

Distributions
• The set of values taken by the data, together with their:
  • Absolute frequencies
  • Relative frequencies (probabilities)

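Relative and cumulative frequencies

f_i = n_i / N, where n_i is the absolute frequency of the i-th value and N is the total number of observations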
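
As a minimal sketch of the formula above, the following Python snippet computes absolute, relative and cumulative frequencies. The data values are invented for illustration and are not part of the lecture.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: lunches bought per week by N = 10 students.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
N = len(data)

counts = Counter(data)                       # absolute frequencies n_i
values = sorted(counts)
relative = [counts[v] / N for v in values]   # relative frequencies f_i = n_i / N
cumulative = list(accumulate(relative))      # running sum of the f_i

for v, f, F in zip(values, relative, cumulative):
    print(f"value={v}  n_i={counts[v]}  f_i={f:.2f}  cumulative={F:.2f}")
```

The relative frequencies always sum to 1, so the last cumulative frequency is 1 as well.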

Distributions of random variables
• The distribution of possible values together with their probabilities (probability density function, p.d.f.)

The normal (Gaussian) distribution
• …is the distribution representing perfect randomness around a mean value
• In statistics, the normal distribution plays a key role in the theory of errors
• The central limit theorem implies that “averaging” almost always gives rise to a normal distribution (the error on the average is random), provided that the number of observations is large (>40)
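
The central limit theorem can be checked with a small simulation. This is only a sketch under assumptions that are not in the slides: the data come from a skewed exponential distribution with mean 1 and standard deviation 1, the sample size is 50, and the seed is fixed so the run is repeatable.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)     # fixed seed so the run is repeatable
n = 50                     # sample size, above the rule of thumb of 40
replications = 5000

# Exponential distribution with rate 1: skewed, mean = 1, std dev = 1.
mu, sigma = 1.0, 1.0
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(replications)
]

# If the sample mean is approximately normal, about 95% of the sample
# means should fall within mu +/- 1.96 * sigma / sqrt(n).
half_width = 1.96 * sigma / sqrt(n)
coverage = sum(abs(m - mu) <= half_width for m in sample_means) / replications
print(f"share of sample means within 1.96 standard errors: {coverage:.3f}")
```

Even though the individual observations are far from normal, the share printed should be close to 0.95.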

The normal distribution

[Figure: normal density p; 95% of values lie between μ − 1.96σ and μ + 1.96σ, with probability 0.025 in each tail]

The student-t distribution
• When a variable is normally distributed in the population with unknown variance, the sample mean standardized with the sample standard deviation follows a t distribution with n − 1 degrees of freedom
• The t-distribution is similar to the normal distribution, apart from having heavier tails (higher tail probabilities)
• The larger the sample, the closer the t-distribution is to the normal distribution
• For samples with more than 30-40 units, the difference between the two distributions is negligible
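
To see how quickly the t-distribution approaches the normal, the sketch below computes the two-sided 95% critical value of t for several sample sizes and compares it with z. It is an illustration rather than a production routine: the t quantile is found by integrating the density numerically (Simpson's rule) and bisecting, so that only the Python standard library is needed. The function names are hypothetical.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, integrating the density on [0, x] with Simpson's rule."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_quantile(p, df):
    """Value t with P(T <= t) = p, for p > 0.5, found by bisection."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist().inv_cdf(0.975)
for n in (5, 10, 20, 30, 40, 100):
    print(f"n={n:3d}  t={t_quantile(0.975, n - 1):.3f}  z={z:.3f}")
```

The gap between t and z shrinks as n grows, which is why the normal approximation is usually accepted for samples of more than 30-40 units.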

The t-distribution

[Figure: t-distribution centred on the sample mean x̄, with bounds x̄ − t_{α/2}·s_x̄ and x̄ + t_{α/2}·s_x̄]

t_{α/2} and z_{α/2}: tabled values

Population parameters (in a population of N elements)
• Mean
• Variance
• Standard deviation

Sampling
• A sample is a subgroup of the population selected for the study
• Sample statistics allow inference about the population parameters, through estimation and hypothesis testing
• The sample space is the complete set of all possible results of the sampling procedure

Simple random sampling
• Each element of the population has a known and equal probability of selection
• Every element is selected independently from other elements
• The probability of selecting a given sample of n elements is computable (known)
• The Central Limit Theorem guarantees that, for simple random samples with a sufficiently large sample size (n > 40), the sample mean approximately follows the normal distribution

Sample statistics
• Sample mean
• Sample variance
• Sample standard deviation

• Unbiasedness: a sample statistic is unbiased when its expected value equals the population parameter
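
Unbiasedness can be verified by brute force on a tiny example. The sketch below assumes a hypothetical population of four values and enumerates the whole sample space of ordered samples of size 2 drawn with replacement; it then averages the sample variance computed with the n − 1 divisor and with the n divisor.

```python
from itertools import product
from statistics import fmean, pvariance

population = [2, 4, 6, 8]   # hypothetical population, N = 4
n = 2                       # sample size

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    m = fmean(sample)
    return sum((x - m) ** 2 for x in sample)

# Sample space: every ordered sample of size n drawn with replacement.
sample_space = list(product(population, repeat=n))

mean_div_n_minus_1 = fmean([sum_sq_dev(s) / (n - 1) for s in sample_space])
mean_div_n = fmean([sum_sq_dev(s) / n for s in sample_space])

print(f"population variance:              {pvariance(population):.2f}")
print(f"average sample variance (n - 1):  {mean_div_n_minus_1:.2f}")
print(f"average sample variance (n):      {mean_div_n:.2f}")
```

Averaged over all possible samples, the n − 1 version reproduces the population variance, while dividing by n underestimates it.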

Standard deviation and standard error
• The standard deviation measures the variability of a given variable (e.g. X) within the population or sample
• The standard error refers to the accuracy (variability) of a sample statistic (e.g. the mean), i.e. the error due to the fact that the statistic is computed on a sample rather than on the population (sampling error)

Basic SRS sample statistics (unknown population variance)

• Computed for two cases: the mean case and the proportion case (p)
• Sample standard deviation of X
• Standard error of the mean/proportion: the ACCURACY of sample estimates
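
The formulas on this slide did not survive in the transcript, so the sketch below assumes the usual textbook forms: the standard error of the mean is s/√n, with s computed using the n − 1 divisor, and the standard error of a proportion is √(p(1 − p)/n). Some texts use n − 1 in the proportion case; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def standard_error_mean(sample):
    """Standard error of the sample mean: s / sqrt(n), s with n - 1 divisor."""
    return stdev(sample) / sqrt(len(sample))

def standard_error_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n) (assumed form)."""
    return sqrt(p * (1 - p) / n)

# Hypothetical lunch spending (in pounds) for 8 students.
lunch_spend = [3.2, 4.1, 2.8, 5.0, 3.6, 4.4, 3.9, 2.5]
print(f"mean = {fmean(lunch_spend):.2f}, s = {stdev(lunch_spend):.2f}, "
      f"standard error = {standard_error_mean(lunch_spend):.3f}")

# Hypothetical proportion: 30% of 100 respondents bring lunch from home.
print(f"standard error of p = {standard_error_proportion(0.30, 100):.4f}")
```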

Finite population correction factor
• For finite populations (…i.e. all populations in social research), when the sample is large (more than 10% of N) the usual formula tends to overestimate the standard error of the sample mean (proportion)
• To account for that, a finite population correction is applied to the standard error
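
The correction formula itself is missing from the transcript. A commonly used form, assumed here, multiplies the standard error by √((N − n)/(N − 1)); some texts use √(1 − n/N) instead. The function names below are hypothetical.

```python
from math import sqrt

def fpc(n, N):
    """Finite population correction factor (assumed form): sqrt((N - n) / (N - 1))."""
    return sqrt((N - n) / (N - 1))

def corrected_standard_error(s, n, N):
    """Standard error of the mean, s / sqrt(n), shrunk by the correction factor."""
    return (s / sqrt(n)) * fpc(n, N)

s, N = 1.25, 200   # hypothetical standard deviation and population size
for n in (10, 20, 50, 100):
    print(f"n={n:3d}  uncorrected={s / sqrt(n):.4f}  "
          f"corrected={corrected_standard_error(s, n, N):.4f}")
```

The factor is close to 1 when n is small relative to N and falls to 0 when the whole population is sampled, at which point there is no sampling error left.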

Level of confidence (1 − α) and z parameter

The level of confidence 1 − α is the probability that the identified confidence interval contains the true population mean

For the normal distribution, given a value of α, the corresponding z_{α/2} value is tabulated; for example, α = 0.05 gives z_{α/2} = 1.96

[Figure: normal curve centred on x̄ with probability α/2 in each tail; the central area is the confidence interval for x̄ at confidence level 1 − α]
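
Instead of reading the table, the z value can be computed directly. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-sided critical value: leaves probability alpha / 2 in each tail."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    print(f"alpha={alpha:.2f}  confidence={1 - alpha:.0%}  z={z_critical(alpha):.3f}")
```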

Confidence intervals
• Calculate the sample mean
• Decide on a level of confidence (usually 95% or 99%)
• Choose whether to use the Student-t distribution or the Normal distribution
• Compute the sample standard error
• Define the lower and upper bounds of the confidence interval

Exercise
• Suppose that you have interviewed 20 students out of 200 in the agricultural building, asking them how much they paid for lunch yesterday
• You get an average of £3.67
• The standard deviation is £1.25
• Compute the 95% confidence interval
• Compute the 99% confidence interval
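
One possible way to work the exercise is sketched below. It assumes the Student-t distribution is the right choice because n = 20 is below 30-40, takes the critical values for 19 degrees of freedom from a standard t table, and leaves out the finite population correction because 20 out of 200 is exactly 10% of N, not more. The function name is hypothetical.

```python
from math import sqrt

n, mean, s = 20, 3.67, 1.25

# Two-sided critical values of t with n - 1 = 19 degrees of freedom,
# copied from a standard t table.
T_TABLE_DF19 = {0.95: 2.093, 0.99: 2.861}

def confidence_interval(mean, s, n, t_value):
    """Return (lower, upper) bounds: mean +/- t * s / sqrt(n)."""
    standard_error = s / sqrt(n)
    return mean - t_value * standard_error, mean + t_value * standard_error

for level, t_value in T_TABLE_DF19.items():
    lower, upper = confidence_interval(mean, s, n, t_value)
    print(f"{level:.0%} confidence interval: £{lower:.2f} to £{upper:.2f}")
```

Under these assumptions the 95% interval is roughly £3.08 to £4.26 and the 99% interval roughly £2.87 to £4.47; the higher the confidence level, the wider the interval.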

Determining sample size

Factors influencing sample size (n):

• Size of the population (N)
• Variability of the population (σ)
• Desired level of accuracy (θ)
• Level of confidence (1 − α)
• Budget constraint

Simple random sampling: determining sample size
• Relative sampling error (r.s.e.)
• Determining sample size for a given r.s.e. (approximate formula)
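
The slide's formulas are not in the transcript, so the following is only a sketch under a stated assumption: the relative sampling error is taken to be the half-width of the confidence interval divided by the mean, r = z_{α/2}·(s/√n)/x̄, which gives the approximate sample size n ≈ (z_{α/2}·s/(r·x̄))². The finite population correction is ignored and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size(mean, s, rse, alpha=0.05):
    """Approximate n so that z * (s / sqrt(n)) / mean <= rse (assumed definition)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * s / (rse * mean)) ** 2)

# Reusing the lunch example as a pilot estimate: mean 3.67, std dev 1.25.
for rse in (0.10, 0.05, 0.02):
    print(f"r.s.e. = {rse:.0%}  ->  n = {sample_size(3.67, 1.25, rse)}")
```

Because n grows with the square of 1/r, halving the tolerated error roughly quadruples the required sample size.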

The sampling design process
• Define the target population, its elements and the sampling units
• Determine the sampling frame (list)
• Select a sampling technique
  • Sampling with/without replacement
  • Probability/Nonprobability sampling
• Determine the sample size
  • Precision versus costs
  • The marginal gain in precision from each additional sampling unit decreases
• Execute the sampling process
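
The decreasing marginal value of additional sampling units follows from the standard error shrinking with the square root of n. A short sketch, assuming a hypothetical standard deviation of 1.25:

```python
from math import sqrt

s = 1.25   # hypothetical sample standard deviation

# Standard error of the mean, s / sqrt(n), for increasing sample sizes.
se_by_n = {n: s / sqrt(n) for n in (25, 100, 400, 1600)}

for n, se in se_by_n.items():
    print(f"n={n:5d}  standard error={se:.4f}")
```

Each fourfold increase in sample size, and hence in cost, only halves the standard error.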

The sampling techniques
• Probabilistic samples
  • Simple random sampling
  • Systematic sampling
  • Stratified sampling
  • Cluster sampling
  • Other sampling techniques
• Nonprobabilistic samples
  • Convenience sampling
  • Judgmental sampling
  • Quota sampling
  • Snowball sampling

Representativeness
• A sample can be considered as “representative” when it is expected to exhibit the average properties of the population

Selection bias
• Improper selection of sample units (ignoring a relevant “control variable” that generates bias), so that the values observed in the sample are biased and the sample is not representative.

Example:

A survey is conducted to measure goat milk consumption, but the interviewers only select people in urban areas, who on average drink less goat milk.
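
The goat milk example can be mimicked with a small simulation. Every number below (population sizes, average consumption, spread) is invented purely for illustration and does not come from the lecture.

```python
import random
from statistics import fmean

rng = random.Random(1)   # fixed seed so the run is repeatable

# Hypothetical population: litres of goat milk per month (invented numbers).
urban = [max(0.0, rng.gauss(1.0, 0.5)) for _ in range(6000)]
rural = [max(0.0, rng.gauss(3.0, 1.0)) for _ in range(4000)]
population = urban + rural

true_mean = fmean(population)
biased_sample = rng.sample(urban, 200)        # interviewers only visit urban areas
random_sample = rng.sample(population, 200)   # simple random sample of everyone

print(f"population mean:        {true_mean:.2f}")
print(f"urban-only sample mean: {fmean(biased_sample):.2f}")
print(f"random sample mean:     {fmean(random_sample):.2f}")
```

The urban-only sample underestimates consumption however large it is, because the error comes from how units are selected, not from the sample size.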

Simple random sampling
• Each element of the population has a known and equal probability of selection
• Every element is selected independently from other elements
• The probability of selecting a given sample of n elements is computable (known)
• Advantages
  • Statistical inference is possible
  • It is easily understood
• Disadvantages
  • Representative samples are large and expensive
  • Standard errors are larger than in other probabilistic sampling techniques
  • Sometimes it is difficult to execute a truly random sampling
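
Drawing a simple random sample from a sampling frame takes a few lines in Python. The frame of 200 student identifiers below is hypothetical; `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

rng = random.Random(42)   # fixed seed so the draw is repeatable

# Hypothetical sampling frame: a list of all N = 200 students.
frame = [f"student_{i:03d}" for i in range(1, 201)]
n = 20

srs_without_replacement = rng.sample(frame, n)   # no unit can appear twice
srs_with_replacement = rng.choices(frame, k=n)   # a unit may appear more than once

print(srs_without_replacement[:5])
print(srs_with_replacement[:5])
```

In both cases every unit in the frame has the same probability of being picked at each draw.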

Systematic sampling
• A list of N elements in the population is compiled, ordered according to a specified variable
  • Unrelated to the target variable (similar to SRS)
  • Related to the target variable (increased representativeness)
• A sample size n is chosen
• A systematic step of k = N/n is set
• A random number s between 1 and k is extracted and represents the first element to be included
• Then the other elements selected are s+k, s+2k, s+3k…
• Advantages
  • Cheaper and easier than SRS
  • More representative if the order is related to the variable of interest (monotone)
  • Sampling frame not always necessary
• Disadvantages
  • Less representative (biased) if the order is cyclical
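
A minimal sketch of the systematic procedure follows. It assumes N is a multiple of n, so that the step k is an integer, and it draws the random start between 1 and k so that all n selected positions stay inside the list. The function name and the frame are hypothetical.

```python
import random

def systematic_sample(frame, n, rng):
    """Select every k-th element of an ordered frame after a random start."""
    N = len(frame)
    k = N // n                  # systematic step
    start = rng.randint(1, k)   # random start, a 1-based position in 1..k
    return [frame[start - 1 + j * k] for j in range(n)]

# Hypothetical ordered frame: positions 1..200 in the list.
frame = list(range(1, 201))
sample = systematic_sample(frame, 20, random.Random(0))
print(sample)
```

Only one random number is needed, which is what makes the method cheaper than simple random sampling.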

Stratified sampling
• The population is partitioned into strata through control variables (stratification variables) closely related to the target variable, so that there is homogeneity within each stratum and heterogeneity between strata
• A simple random sample is drawn in each stratum of the population
  • Proportionate sampling: the size of the sample from each stratum is proportional to the relative size of the stratum in the total population
  • Disproportionate sampling: the size is also proportional to the standard deviation of the target variable in each stratum
• Advantages
  • Gains in precision
  • Includes all relevant subpopulations, even if small
• Disadvantages
  • Stratification variables may not be easily identifiable
  • Stratification can be expensive
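
Proportionate stratified sampling can be sketched as follows. The strata (undergraduates, postgraduates, staff) and their sizes are hypothetical, and the allocation uses a largest-remainder rule so that the stratum sample sizes always add up to n; that rounding rule is a choice made here, not something stated in the slides.

```python
import random

def proportionate_allocation(stratum_sizes, n):
    """Split n across strata in proportion to stratum size (largest remainder)."""
    N = sum(stratum_sizes.values())
    exact = {h: n * size / N for h, size in stratum_sizes.items()}
    alloc = {h: int(share) for h, share in exact.items()}
    leftover = n - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_sample(strata, n, rng):
    """Draw a simple random sample (without replacement) inside each stratum."""
    alloc = proportionate_allocation({h: len(u) for h, u in strata.items()}, n)
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}

# Hypothetical strata in a population of N = 200.
strata = {
    "undergraduate": [f"ug_{i}" for i in range(120)],
    "postgraduate": [f"pg_{i}" for i in range(60)],
    "staff": [f"st_{i}" for i in range(20)],
}
sample = stratified_sample(strata, 20, random.Random(7))
print({h: len(units) for h, units in sample.items()})
```

Even the smallest stratum is guaranteed some representation here, which a simple random sample of 20 would not ensure.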

Cluster sampling
• 1. The population is partitioned into clusters
  • Elements within each cluster should be as heterogeneous as possible with respect to the variable of interest (e.g. area sampling)
• 2. A random sample of clusters is extracted through SRS (with probability proportional to the cluster size)
  • 2a. All the elements of the cluster are selected (one-stage cluster sampling)
  • 2b. A probabilistic sample is extracted from the cluster (two-stage cluster sampling)
• Advantages
  • Reduced costs
  • Higher feasibility
• Disadvantages
  • Less precision
  • Inference can be difficult
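
The one-stage and two-stage variants can be sketched with the same helper. This simplified version selects clusters by plain simple random sampling with equal probabilities; selection with probability proportional to cluster size is not implemented. The villages and households are hypothetical, as is the function name.

```python
import random

def cluster_sample(clusters, n_clusters, rng, per_cluster=None):
    """Pick clusters at random, then take all units (one-stage) or a
    simple random subsample of per_cluster units in each (two-stage)."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for c in chosen:
        units = clusters[c]
        if per_cluster is None:
            sample[c] = list(units)                                     # one-stage
        else:
            sample[c] = rng.sample(units, min(per_cluster, len(units)))  # two-stage
    return sample

# Hypothetical population: 10 villages (clusters) with 30 households each.
clusters = {
    f"village_{v}": [f"v{v}_household_{h}" for h in range(30)]
    for v in range(10)
}
rng = random.Random(3)
one_stage = cluster_sample(clusters, 3, rng)
two_stage = cluster_sample(clusters, 3, rng, per_cluster=5)
print({c: len(u) for c, u in one_stage.items()})
print({c: len(u) for c, u in two_stage.items()})
```

Interviewers only need to travel to the selected villages, which is where the cost saving comes from.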

### Non probabilistic samples

Convenience sampling
• Only “convenient” elements enter the sample
• Advantages
  • Cheapest method
  • Quickest method
• Disadvantages
  • Selection bias
  • Non-representativeness
  • Inference is not possible

Judgmental sampling
• Selection based on the judgment of the researcher
• Advantages
  • Low cost
  • Quick
• Disadvantages
  • Non-representativeness
  • Inference is not possible
  • Subjective

Quota sampling
• Define control categories (quotas) for the population elements, such as sex, age…
• Apply a “restricted judgmental sampling”, so that quotas in the sample are the same as those in the population
• Advantages
  • Cheapest method
  • Quickest method
• Disadvantages
  • There is no guarantee that the sample is representative (it depends on the relevance of the control characteristics chosen)
  • Many sources of selection bias
  • No assessment of sampling error

Snowball sampling
• A first small sample is selected randomly
• Respondents are asked to identify others who belong to the population of interest
• The referrals will have demographic and psychographic characteristics similar to the referrers
• Advantages
  • Lower costs
  • Low variability
  • Useful for “rare” populations
• Disadvantages
  • Inference is not possible
