Introduction to Statistical Inference

Introduction to Statistical Inference (Session 03)

Learning Objectives
By the end of this session, you will be able to:
• explain what is meant by statistical inference
• explain what is meant by an estimate of a population parameter
• explain what is meant by the sampling distribution of an estimate
• calculate and interpret the standard error of a sample mean from data of a simple random sample

What is statistical inference?
• Inference is about drawing conclusions about population characteristics using information gathered from the sample
• It will be assumed for the remainder of this module that the sample is representative of the population
• We shall further assume that the sample has been drawn as a simple random sample from an infinite population

Estimating population parameters
• Population characteristics (parameters) are unknown, so Greek letters are used to denote them: $\mu$ for the population mean and $\sigma$ for the population standard deviation
• Sample characteristics are measurable and known, so Latin letters are used: $\bar{x}$ for the sample mean and $s$ for the sample standard deviation. These form estimates of the population values.
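A minimal Python sketch of the parameter/estimate distinction, assuming a hypothetical simulated normal population whose parameters we fix ourselves ($\mu$ = 50, $\sigma$ = 10, chosen purely for illustration). In a real study $\mu$ and $\sigma$ are unknown; fixing them here only lets us compare the estimates $\bar{x}$ and $s$ with the truth. The function name `draw_sample` is our own.

```python
import random
import statistics

# Hypothetical population parameters, chosen for illustration only.
# In a real study these would be unknown.
MU = 50.0     # population mean (Greek: mu)
SIGMA = 10.0  # population standard deviation (Greek: sigma)

def draw_sample(n: int, seed: int = 1) -> list[float]:
    """Draw a simple random sample of size n from a normal population."""
    rng = random.Random(seed)
    return [rng.gauss(MU, SIGMA) for _ in range(n)]

sample = draw_sample(n=40)

# Sample statistics (Latin letters): known once the data are in, used as estimates.
x_bar = statistics.mean(sample)   # estimate of MU
s = statistics.stdev(sample)      # estimate of SIGMA (divisor n - 1)

print(f"x_bar = {x_bar:.2f}  (estimates mu = {MU})")
print(f"s     = {s:.2f}  (estimates sigma = {SIGMA})")
```

Changing the seed gives a different sample and hence different estimates, while the parameters stay fixed.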

An example of statistical inference
• What is the mean landholding size owned by rural households in Kilindi district, in the Tanga region of Tanzania?
• Data from 404 households surveyed in this district gave a mean landholding size of 7.62 acres, with a standard deviation of 6.81 acres.
• Our best estimate of the mean landholding size in Kilindi district is therefore 7.62 acres. What results are likely if we sampled again with a different set of households?

A brief return to Practical 2…
• In Practical 2, you sampled 5 Ugandan districts twice. Look back at the mean and standard deviation of each sample.
• You will notice that the answers differ each time you sample, i.e. there is variability in the sample means.
• If we took many more samples, we could produce a histogram of the means of these samples. An example follows…

The distribution of means
• Suppose 10 university students were given a standard meal and the time each took to consume it was recorded.
• Suppose the 10 values gave a mean of 11.24 and a standard deviation of 0.864.
• Let’s assume this exercise was repeated 50 times with different samples of 10 students.
• A histogram of the resulting 500 observations appears below, followed by a histogram of the 50 sample means, one from each sample.

Histogram of raw data
[Figure: histogram of the 500 raw meal-consumption times]
The data appear to follow a normal distribution.

Histogram of the 50 sample means
[Figure: histogram of the 50 sample means]
The distribution of the sample means is called the sampling distribution of the mean.
Notice that the variability of this distribution is smaller than the variability of the raw data.
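The pattern in the two histograms can be reproduced with a small simulation. This is a sketch under a stated assumption: the meal times are treated as coming from a normal population with mean 11.24 and standard deviation 0.864, i.e. the single sample's values are used as hypothetical stand-ins for the unknown $\mu$ and $\sigma$.

```python
import random
import statistics

MU, SIGMA = 11.24, 0.864   # assumed population values (hypothetical stand-ins)
N_PER_SAMPLE = 10          # students per sample
N_SAMPLES = 50             # number of repeated samples

rng = random.Random(2024)

# Draw 50 independent samples of 10 observations each (500 raw observations).
samples = [[rng.gauss(MU, SIGMA) for _ in range(N_PER_SAMPLE)]
           for _ in range(N_SAMPLES)]
raw = [x for sample in samples for x in sample]

# One mean per sample: these 50 values form an empirical sampling distribution.
means = [statistics.mean(sample) for sample in samples]

sd_raw = statistics.stdev(raw)
sd_means = statistics.stdev(means)
print(f"sd of 500 raw observations: {sd_raw:.3f}")
print(f"sd of 50 sample means:      {sd_means:.3f}")
```

With these settings the spread of the means should come out at roughly a third of the raw spread, since $1/\sqrt{10} \approx 0.32$; the exact figures depend on the random seed.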

Back to estimation…
The estimate of the mean landholding size in Kilindi district is 7.62 acres. Is this sufficient for reporting purposes, given that this answer is based on one particular sample? What we have is an estimate based on a sample of size 404. But how good is this estimate? We need a measure of its precision, i.e. of how much it would vary from sample to sample…

Sampling Variability
• The accuracy of the sample mean $\bar{x}$ as an estimate of $\mu$ depends on:
(i) the sample size, $n$, since the more data we collect, the more we know about the population, and
(ii) the inherent variability in the data, $\sigma^2$
• These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. a low standard error!
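To see how the two quantities combine, the sketch below tabulates $\sigma/\sqrt{n}$, the standard error of the mean, for a few illustrative values of $\sigma$ and $n$. The values and the helper name `standard_error` are hypothetical, chosen only to show the pattern.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

# Illustrative values only: two levels of variability, four sample sizes.
for sigma in (5.0, 10.0):
    row = [f"n={n:>4}: {standard_error(sigma, n):.3f}" for n in (25, 100, 400, 1600)]
    print(f"sigma={sigma:>4}: " + "  ".join(row))
```

Reading across a row, quadrupling $n$ halves the standard error; reading down a column, doubling $\sigma$ doubles it.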

Standard error of the mean
The precision of $\bar{x}$ as an estimate of $\mu$ is given by $\sigma/\sqrt{n}$, the standard error of the mean.
• Also written as s.e.m., or sometimes s.e.
• Estimate it from sample data as $s/\sqrt{n}$.
For the landholding example, s.e. $= 6.81/\sqrt{404} = 6.81/20.1 = 0.339$ acres.
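The arithmetic of the landholding example can be checked in a few lines of Python. The inputs $s$ = 6.81 acres and $n$ = 404 are the survey figures quoted for Kilindi district; the function name `standard_error_of_mean` is our own.

```python
import math

def standard_error_of_mean(s: float, n: int) -> float:
    """Estimated standard error of the mean, s / sqrt(n)."""
    return s / math.sqrt(n)

s = 6.81   # sample standard deviation of landholding size (acres)
n = 404    # number of households surveyed in Kilindi district

se = standard_error_of_mean(s, n)
print(f"sqrt(n) = {math.sqrt(n):.1f}, s.e. = {se:.3f} acres")
# prints: sqrt(n) = 20.1, s.e. = 0.339 acres
```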

Summary
If we had repeated samples (of the same size) taken from the same population:
• sample means would vary
• the standard error of the mean is a measure of the variability of sample means over (hypothetically drawn) repeated samples
• the distribution of sample means over repeated samples is called the sampling distribution of the mean: $\bar{x} \sim N(\mu, \sigma^2/n)$ (exactly when the data are normally distributed, approximately for large $n$)
• the lower the standard error, the greater the precision of the estimate
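The "hypothetically drawn repeated samples" in the summary can be mimicked by simulation. This minimal sketch assumes a hypothetical normal population with $\mu$ = 100 and $\sigma$ = 20 and samples of size $n$ = 25 (all values are illustrative, not from any study), and compares the spread of 2000 simulated sample means with $\sigma/\sqrt{n} = 4$.

```python
import math
import random
import statistics

# Hypothetical population and design, for illustration only.
MU, SIGMA, N = 100.0, 20.0, 25
REPLICATES = 2000

rng = random.Random(7)

# Draw many samples of the same size from the same population; keep each mean.
means = [statistics.mean([rng.gauss(MU, SIGMA) for _ in range(N)])
         for _ in range(REPLICATES)]

empirical_se = statistics.stdev(means)   # spread of the sample means
theoretical_se = SIGMA / math.sqrt(N)    # sigma / sqrt(n)

print(f"mean of sample means: {statistics.mean(means):.2f} (mu = {MU})")
print(f"sd of sample means:   {empirical_se:.3f} (sigma/sqrt(n) = {theoretical_se})")
```

The simulated means centre on $\mu$ and their standard deviation should lie close to 4, which is what the standard error of the mean describes.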


Practical work follows to ensure learning objectives are achieved…
