Probability & Statistical Inference Lecture 4
MSc in Computing (Data Analytics)
Lecture Outline
Modern statistics uses a number of mathematical results to relate descriptive statistics and probability theory. These can be divided (roughly) under three headings:
- Central Limit theorem (large samples)
- Maximum Likelihood Methods (large samples)
- Small sample results
We have 2 different estimates
Distribution of the Sample Means varying the sample size
Note that the histograms become more Normal as the sample size increases
Same result, but plotted on the same scale
Note the spread decreases with increasing sample size
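The two notes above can be checked with a small simulation. This is only a sketch: the slides do not say which population was used, so the exponential population and its mean of 5,000 hours are assumptions, and `sample_means` is a hypothetical helper name.

```python
import random
import statistics


def sample_means(sample_size, n_samples=2000, population_mean=5000.0, seed=0):
    """Draw n_samples samples of the given size and return their means.

    The exponential population (mean 5000) is an assumption for illustration;
    any skewed population would show the same qualitative behaviour.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0 / population_mean) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


# Spread (standard deviation) of the simulated sample means for each sample size.
spread_by_size = {n: statistics.stdev(sample_means(n)) for n in (2, 5, 30)}
for n, spread in spread_by_size.items():
    print(f"n={n:>2}  sd of sample means = {spread:,.0f}")
```

The printed spread should shrink roughly in proportion to 1/√n as the sample size grows.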
A sample mean (x̄) can be considered a random variable drawn from a probability distribution of all possible sample means for samples of the same size, called the Sampling Distribution of the Mean.
From the simulation above;
For a sample size of 2, the standard error of the mean should be
σ / √n = 3959 / √2 ≈ 2,799
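The same arithmetic as a short sketch, using the population standard deviation of 3959 from the simulation; `standard_error` is a hypothetical helper name, not something from the slides.

```python
import math


def standard_error(population_sd, sample_size):
    """Standard error of the mean: sd divided by the square root of n."""
    return population_sd / math.sqrt(sample_size)


se_n2 = standard_error(3959, 2)
print(round(se_n2))  # 2799
```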
can be considered as:
95% in shaded area
For a Normal distribution, we know that 95% of values will be within 1.96 standard deviations of the mean (μ)
So, given one estimate (x̄), we can say that this estimate is within 1.96 standard errors of the actual population mean (μ), with 95% confidence
(from large enough sample):
So, we would say that the average lifetime of all components (μ) is between 4,456 and 7,290 hours with 95% confidence
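A minimal sketch of how such an interval is formed: estimate ± 1.96 × standard error. The slides give only the end points, so the sample mean of 5,873 and the standard error of about 723 used here are back-calculated from the quoted interval and are assumptions; `normal_ci` is a hypothetical helper name.

```python
def normal_ci(estimate, standard_error, z=1.96):
    """Large-sample confidence interval: estimate +/- z * standard error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# 5873 and 723 are back-calculated from the interval on the slide (assumed values).
low, high = normal_ci(5873, 723)
print(round(low), round(high))  # 4456 7290
```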
=> So, we need to estimate σ as well as μ
=> we get this estimate from the standard deviation of the sample (s)
When sample size is
e.g. in the rats experiment – different and unrelated rats should be used – not 1 rat tested 100 times.
Example: if the sample size is 15, then use a t distribution with degrees of freedom 15 − 1 = 14.
The t probability density function with k degrees of freedom:
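The density formula itself did not survive on this slide. As a sketch, the standard Student t density, f(t) = Γ((k+1)/2) / (√(kπ) Γ(k/2)) × (1 + t²/k)^(−(k+1)/2), can be evaluated with the standard library alone; `t_pdf` is a hypothetical helper name.

```python
import math


def t_pdf(t, k):
    """Student t density with k degrees of freedom.

    Uses lgamma (log of the gamma function) so large k does not overflow.
    """
    log_coef = math.lgamma((k + 1) / 2) - math.lgamma(k / 2)
    coef = math.exp(log_coef) / math.sqrt(k * math.pi)
    return coef * (1 + t * t / k) ** (-(k + 1) / 2)


# The density at 0 rises towards the standard Normal value (about 0.3989) as k grows.
for k in (1, 14, 1000):
    print(k, round(t_pdf(0.0, k), 4))
```

With k = 14, as in the example above, the curve is already close to the Normal but has slightly heavier tails.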
t(n−1, α/2) is a value from the t distribution with df = n − 1 and a specified α level.
100(1 − α)% of values lie within that range around the mean.
Note: as α gets smaller, the CI gets wider
as df gets smaller, the CI gets wider
23.01, 22.22, 22.04, 22.62, 22.59
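The slide does not say what these five values are for. Assuming they are a small sample for which a 95% t-based confidence interval is wanted, a sketch of the calculation follows. The critical value 2.776 is the standard table value for df = 4 at α = 0.05 (two-sided); it is hard-coded here rather than computed.

```python
import math
import statistics

data = [23.01, 22.22, 22.04, 22.62, 22.59]

# Two-sided 95% critical value for df = n - 1 = 4, taken from standard t tables.
T_CRIT_DF4_95 = 2.776

n = len(data)
mean = statistics.fmean(data)
s = statistics.stdev(data)           # sample standard deviation (n - 1 divisor)
se = s / math.sqrt(n)                # estimated standard error of the mean
margin = T_CRIT_DF4_95 * se

ci_low, ci_high = mean - margin, mean + margin
print(f"mean = {mean:.3f}, s = {s:.3f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Using 1.96 in place of 2.776 would give a narrower interval, which understates the uncertainty for a sample this small.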
n(1 − p) = 100 × (1 − 0.25) = 75
Both figures are greater than 5, therefore you can use the large number method
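The check above can be sketched as follows, with n = 100 and p = 0.25. The interval at the end assumes that the "large number method" means the usual Normal-approximation interval p ± 1.96 √(p(1 − p)/n); the slides do not spell this out, and both function names are hypothetical.

```python
import math


def large_sample_ok(n, p, threshold=5):
    """True when both n*p and n*(1-p) exceed the threshold."""
    return n * p > threshold and n * (1 - p) > threshold


def proportion_ci(n, p, z=1.96):
    """Normal-approximation interval for a proportion (assumed method)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


n, p = 100, 0.25
print(n * p, n * (1 - p), large_sample_ok(n, p))  # 25.0 75.0 True

p_low, p_high = proportion_ci(n, p)
print(f"95% CI for the proportion: ({p_low:.3f}, {p_high:.3f})")
```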