Probability & Statistical Inference: Lecture 4
MSc in Computing (Data Analytics)

Lecture Outline

Modern statistics uses a number of mathematical results to relate descriptive statistics and probability theory. These can be divided (roughly) under three headings:
- Central Limit Theorem (large samples)
- Maximum Likelihood Methods (large samples)
- Small sample results
We have 2 different estimates
Note that the histogram becomes more Normal as the sample size increases
Note the spread decreases with increasing sample size
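The two observations above can be checked with a small simulation. This is only a sketch: the exponential population, the sample sizes (2, 10, 50), the number of replicates and the seed are all assumptions chosen for illustration, not values from the lecture. Skewness is used here as a rough stand-in for "how Normal the histogram looks" (a Normal distribution has skewness 0).

```python
import random
import statistics


def simulate_sample_means(sample_size, replicates=5000, seed=0):
    """Draw `replicates` samples of `sample_size` values; return their means."""
    rng = random.Random(seed)
    # Assumed population: exponential with mean 1 (strongly right-skewed).
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(replicates)
    ]


def skewness(values):
    """Sample skewness; close to 0 for a symmetric, Normal-looking histogram."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


results = {}
for n in (2, 10, 50):
    means = simulate_sample_means(n)
    results[n] = (statistics.stdev(means), skewness(means))
    print(f"n={n:>2}  spread of means={results[n][0]:.3f}  "
          f"skewness={results[n][1]:.3f}")
```

Both columns should shrink as n grows: the spread because the standard error falls, the skewness because the distribution of the mean moves towards the Normal shape.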
A sample mean (x̄) can be considered a random variable sampled from a probability distribution of possible sample means of the same size, called the Sampling Distribution of the Mean.
For a sample size of 2, the standard error of the mean should be σ/√n = 3959 / √2 ≈ 2,799
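The same calculation can be repeated for other sample sizes. In this sketch the standard deviation of 3959 is taken from the example above; the function name `standard_error` and the sample sizes other than 2 are illustrative assumptions.

```python
import math


def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 3959  # standard deviation used in the example above

# Sample sizes other than 2 are illustrative only.
for n in (2, 10, 30, 100):
    print(f"n={n:>3}  standard error={standard_error(sigma, n):,.0f}")
```

Because of the square root, quadrupling the sample size only halves the standard error.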
can be considered as:
For a Normal distribution, we know that 95% of values will be within 1.96 standard deviations of the mean.
So, given one estimate x̄, we can say that this estimate is within 1.96 standard errors of the actual population mean μ, with 95% confidence
(from large enough sample):
So, we would say that the average lifetime of all components (μ) is between 4,456 and 7,290 hours with 95% confidence
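A large-sample interval of this kind can be computed as x̄ ± z · σ/√n. The sketch below uses the standard deviation of 3959 from the earlier example. The sample mean (5,873) and sample size (30) are assumptions, not given in the text; they are chosen only so that the result is comparable with the interval quoted above.

```python
import math
from statistics import NormalDist


def normal_ci(sample_mean, sigma, n, confidence=0.95):
    """Large-sample confidence interval for the population mean."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# sigma comes from the lecture example; sample_mean and n are hypothetical.
low, high = normal_ci(sample_mean=5873, sigma=3959, n=30)
print(f"95% CI: {low:,.0f} to {high:,.0f} hours")
```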
=> So, we need to estimate σ as well as μ
=> we get this estimate from the standard deviation of the sample (s)
When sample size is
e.g. in the rats experiment – different and unrelated rats should be used – not 1 rat tested 100 times.
Example: if the sample size is 15, then use a t distribution with degrees of freedom 15 − 1 = 14.
The t probability density function with k degrees of freedom:
f(x) = Γ((k+1)/2) / (√(πk) · Γ(k/2)) · (1 + x²/k)^(−(k+1)/2)
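The density can be evaluated directly with the standard library. This is a minimal sketch assuming the standard Student's t density (the formula is repeated in the docstring); the function name `t_pdf` is illustrative.

```python
import math


def t_pdf(x, k):
    """Student's t density with k degrees of freedom.

    f(x) = Gamma((k+1)/2) / (sqrt(pi*k) * Gamma(k/2)) * (1 + x^2/k)^(-(k+1)/2)
    """
    coeff = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coeff * (1 + x * x / k) ** (-(k + 1) / 2)


# Compare tail heights: smaller k gives heavier tails than larger k.
for k in (1, 4, 14, 30):
    print(f"k={k:>2}  f(0)={t_pdf(0.0, k):.4f}  f(3)={t_pdf(3.0, k):.4f}")
```

`math.gamma` overflows for very large arguments, so this direct form is only suitable for moderate k (a few hundred at most).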
t(n−1, α/2) is the value from the t distribution with df = n−1 at the specified α level.
100(1 − α)% of values lie within that range around the mean.
Note: as α gets smaller, the CI gets wider;
as df gets smaller, the CI gets wider.
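Both effects can be seen by tabulating the critical value for a few combinations of df and significance level. The sketch below avoids external libraries by integrating the t density numerically (Simpson's rule) and inverting it by bisection; this numerical approach, the function names and the chosen df and level values are assumptions for illustration. In practice the values would be read from t tables or a statistics package.

```python
import math


def t_pdf(x, k):
    """Student's t density with k degrees of freedom."""
    coeff = math.gamma((k + 1) / 2) / (math.sqrt(k * math.pi) * math.gamma(k / 2))
    return coeff * (1 + x * x / k) ** (-(k + 1) / 2)


def t_cdf(x, k, steps=1000):
    """P(T <= x) via Simpson's rule on [0, |x|], using the density's symmetry."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, k) + t_pdf(a, k)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, k)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(df, alpha):
    """Two-sided critical value t(df, alpha/2), found by bisection on [0, 100]."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# The CI half-width is t_critical * s / sqrt(n), so a larger critical value
# means a wider interval.
for df in (4, 14, 30):
    row = "  ".join(f"alpha={a:.2f}: {t_critical(df, a):.3f}" for a in (0.10, 0.05, 0.01))
    print(f"df={df:>2}  {row}")
```

Reading along a row, the critical value grows as the significance level shrinks; reading down a column, it grows as df shrinks.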
23.01, 22.22, 22.04, 22.62, 22.59
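With five observations, a t-based interval for the mean uses df = 5 − 1 = 4. The sketch below assumes a two-sided 95% interval (the confidence level is not stated with the data) and takes the critical value 2.776 from standard t tables for df = 4.

```python
import math
import statistics

data = [23.01, 22.22, 22.04, 22.62, 22.59]  # the five values listed above

# Assumption: 95% two-sided interval. 2.776 is the tabulated t value
# for df = 4 and upper-tail area 0.025.
T_CRIT_DF4_95 = 2.776

n = len(data)
mean = statistics.fmean(data)
s = statistics.stdev(data)        # sample standard deviation (n - 1 divisor)
std_error = s / math.sqrt(n)
half_width = T_CRIT_DF4_95 * std_error
ci = (mean - half_width, mean + half_width)

print(f"n={n}, mean={mean:.3f}, s={s:.3f}, SE={std_error:.3f}")
print(f"95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
```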
np = 100 * 0.25 = 25
n(1-p) = 100 * (1 - 0.25) = 75
Both figures are greater than 5, so you can use the large number method.
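The check above is easy to wrap in a helper. This is a sketch only: the function name `large_sample_ok` is hypothetical, and the threshold of 5 follows the rule of thumb used in this example.

```python
def large_sample_ok(n, p, threshold=5):
    """Check the rule of thumb: both n*p and n*(1-p) must exceed the threshold.

    Returns (ok, expected_successes, expected_failures).
    """
    expected_successes = n * p
    expected_failures = n * (1 - p)
    ok = expected_successes > threshold and expected_failures > threshold
    return ok, expected_successes, expected_failures


# Values from the example above: n = 100, p = 0.25.
print(large_sample_ok(100, 0.25))

# A hypothetical small sample where the rule fails (n*p = 2.5).
print(large_sample_ok(10, 0.25))
```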