GG 313 Lecture 6. Probability Distributions.
When we sample many phenomena, we find that the probability of occurrence of an event will be distributed in a way that is easily described by one of several well-known functions called “probability distributions”. We will discuss these functions and give examples of them.
The uniform distribution is simple: it is the probability distribution found when all outcomes are equally probable. For example, consider the probability of throwing an “x” using a 6-sided die: P(x), x=1,2,3,4,5,6. If the die is “fair”, then P(x)=1/6, x=1,2,3,4,5,6.
This is a discrete probability distribution, since x can only have integer or fixed values.
If we add up the probabilities for all x, the sum = 1:
$\sum_{x=1}^{6} P(x) = 6 \cdot \frac{1}{6} = 1$
In most cases, the probability distribution is not uniform, and some events are more likely than others.
For example, we may want to know the probability of hitting a target x times in n tries. We can’t get the solution unless we know the probability of hitting the target in one try: P(hit)=p. Once we know P(hit), we can calculate the probability of one particular sequence of x hits and n-x misses:
$p^x q^{n-x}$
where q=1-p. Recall that this is the probability where the ORDER of the x hits matters, which is not what we want. We want to count every possible ordering, so we multiply by the number of combinations of n things taken x at a time, nCx, as defined earlier:
$P(x) = {}_nC_x \, p^x q^{n-x} = \frac{n!}{x!\,(n-x)!} \, p^x q^{n-x}, \quad x = 0, 1, \ldots, n$
This is known as the binomial probability distribution, used to predict the probability of x successes in n tries.
Using our example above, what is the probability of hitting a target x times in n tries if p=0.1 and n=10?
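% binomial distribution
n=10; p=0.1; x=0:n; % number of tries, P(hit), possible numbers of hits
px=arrayfun(@(k) nchoosek(n,k),x).*p.^x.*(1-p).^(n-x); % P(x) for each x
sum(px) % check to be sure that the sum = 1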
The probability of 1 hit is about 0.39 with p=0.1, which makes 1 hit the most likely outcome.
What if we change the probability of hitting the target in any one shot to 0.3? What is the most likely number of hits in 10 shots?
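One way to answer this is to evaluate the binomial formula for every x and pick the largest value. The following is a minimal sketch using only base Matlab functions; the variable names (px, xmode, pmax) are our own choices, not part of the original listing.

```matlab
% Binomial probabilities for n = 10 shots with p = 0.3 per shot
n = 10;            % number of tries
p = 0.3;           % probability of a hit in one try
q = 1 - p;         % probability of a miss in one try
x = 0:n;           % possible numbers of hits
px = zeros(size(x));
for k = 1:numel(x)
    % nchoosek counts the ways to place x(k) hits among n tries
    px(k) = nchoosek(n, x(k)) * p^x(k) * q^(n - x(k));
end
[pmax, imax] = max(px);   % largest probability and its index
xmode = x(imax);          % most likely number of hits
fprintf('Most likely number of hits: %d (P = %.3f)\n', xmode, pmax);
fprintf('Sum of all probabilities: %.6f\n', sum(px));
```

With these values the most likely outcome is 3 hits, with a probability of about 0.27.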
Continuous populations, such as the temperature of the atmosphere, depth of the ocean, concentration of pollutants, etc., can take on any value within their range. We may only sample them at particular values, but the underlying distribution is continuous.
Rather than the SUM of the distribution equaling 1.0, for continuous distributions the integral of the distribution (the area under the curve) over all possible values must equal 1.
P(x) is called a probability density function, or PDF.
Because these distributions are continuous, the probability of any particular value being observed is zero. We discuss instead the probability of a value being between two limits, a-Δ and a+Δ:
$P(a-\Delta \le x \le a+\Delta) = \int_{a-\Delta}^{a+\Delta} P(x)\,dx$
As Δ→0, the probability also approaches zero.
We also define the cumulative probability distribution, giving the probability that an observation will have a value less than or equal to a:
$P_c(a) = \int_{-\infty}^{a} P(x)\,dx$
This distribution is bounded by 0 ≤ Pc(a) ≤ 1.
As a→∞, Pc(a)→1.
Any continuous, non-negative function with an area under the curve of 1 can be a probability distribution, but in reality some functions are far more common than others. The Normal Distribution, or Gaussian Distribution, is the most common and most valued. This is the classic “bell-shaped curve”. Its distribution is defined by:
$P(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
where µ and σ are the mean and standard deviation defined earlier.
We can make a Matlab m-file to generate the normal distribution:
% Normal distribution
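A minimal sketch of such an m-file follows. The values mu=0 and sigma=1, the grid limits of ±6 standard deviations, and the variable names are illustrative assumptions. The script evaluates the standard Gaussian formula and checks numerically that the area under the curve is 1.

```matlab
% Normal distribution (mu and sigma are illustrative assumptions)
mu = 0;                      % mean
sigma = 1;                   % standard deviation
x = linspace(mu - 6*sigma, mu + 6*sigma, 1201);   % evaluation grid
% Gaussian probability density function evaluated on the grid
px = exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
area = trapz(x, px);         % numerical area under the curve
[pk, ipk] = max(px);         % peak height and its position on the grid
fprintf('Area under the curve: %.6f\n', area);
fprintf('Peak value %.4f at x = %.2f\n', pk, x(ipk));
```

Adding plot(x,px) after these lines draws the bell-shaped curve.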
Or, make an Excel spreadsheet plot:
We can define a new variable that will normalize the distribution to σ=1 and µ=0:
$z = \frac{x-\mu}{\sigma}$
And the defining equation reduces to:
$P(z) = \frac{1}{\sqrt{2\pi}} \, e^{-z^2/2}$
The values on the horizontal axis are now in units of standard deviations: ±1 corresponds to ±1 standard deviation, and so on.
This distribution is very handy. We expect 68.27% of our results to be within 1 standard deviation of the mean, 95.45% to be within 2 standard deviations, and 99.73% to be within 3 standard deviations. This is why we can feel reasonably confident about eliminating points that are more than 3 standard deviations away from the mean.
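The cumulative distribution of the standardized normal variable z can be written as:
$P_c(a) = \frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{a}{\sqrt{2}}\right)\right]$
Erf(x) is called the error function.
For any value z,
$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-u^2}\,du$
so the probability that z lies between two limits a and b is:
$P(a \le z \le b) = P_c(b) - P_c(a) = \frac{1}{2}\left[\mathrm{erf}\left(\frac{b}{\sqrt{2}}\right) - \mathrm{erf}\left(\frac{a}{\sqrt{2}}\right)\right]$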
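The percentages quoted above for 1, 2, and 3 standard deviations can be checked with Matlab's built-in erf function. This is a short sketch; it relies on the fact that, for a standard normal variable, the probability of falling within ±k standard deviations of the mean is erf(k/√2). The variable names are our own.

```matlab
% Probability of falling within k standard deviations of the mean
k = 1:3;                         % number of standard deviations
pwithin = erf(k ./ sqrt(2));     % P(-k <= z <= k) for a standard normal z
for i = 1:numel(k)
    fprintf('Within %d standard deviation(s): %.2f%%\n', k(i), 100*pwithin(i));
end
```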
Example: Estimates of the strength of olivine yield a normal distribution given by $\mu = 1.0 \times 10^{11}$ Nm and $\sigma = 1.0 \times 10^{10}$ Nm. What is the probability that a sample estimate will be between $9.8 \times 10^{10}$ Nm and $1.1 \times 10^{11}$ Nm?
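First convert to normal scores:
$z_1 = \frac{9.8 \times 10^{10} - 1.0 \times 10^{11}}{1.0 \times 10^{10}} = -0.2, \qquad z_2 = \frac{1.1 \times 10^{11} - 1.0 \times 10^{11}}{1.0 \times 10^{10}} = 1.0$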
And calculate with the formula above.
DO THIS NOW, either in Excel or Matlab.
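One possible way to carry out the calculation in Matlab is sketched below. It assumes the cumulative normal distribution is written with the error function as Pc(z) = ½[1 + erf(z/√2)]; the helper Pc and the variable names are our own.

```matlab
% Olivine strength example: probability of an estimate between two limits
mu = 1.0e11;                 % mean, in the units quoted in the example
sigma = 1.0e10;              % standard deviation, same units
xlo = 9.8e10;                % lower limit
xhi = 1.1e11;                % upper limit
z1 = (xlo - mu) / sigma;     % normal score of the lower limit (-0.2)
z2 = (xhi - mu) / sigma;     % normal score of the upper limit (1.0)
Pc = @(z) 0.5 * (1 + erf(z / sqrt(2)));   % cumulative normal distribution
prob = Pc(z2) - Pc(z1);      % probability of falling between the limits
fprintf('z1 = %.2f, z2 = %.2f, probability = %.4f\n', z1, z2, prob);
```

The result is a probability of about 0.42.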
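The normal distribution is a good approximation to the binomial distribution for large n (actually, when np and (1-p)n are both >5). The mean and standard deviation of the binomial distribution become:
$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$
so that
$P(x) \approx \frac{1}{\sqrt{2\pi np(1-p)}} \, e^{-\frac{(x-np)^2}{2np(1-p)}}$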
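I had a difficult time getting this to work in Excel because the term -(x-np)^2 is evaluated as (-(x-np))^2. Excel applies the leading minus sign before the exponent, so the term should be entered as -((x-np)^2).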
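The quality of the approximation can be checked numerically. The sketch below compares the exact binomial probabilities with the normal curve using µ=np and σ=√(np(1-p)). The values n=50 and p=0.3 are illustrative assumptions, chosen so that np and (1-p)n are both greater than 5. Note that Matlab, unlike Excel, applies the power before the leading minus sign.

```matlab
% Compare the binomial distribution with its normal approximation.
% n and p are illustrative assumptions (np = 15 and (1-p)n = 35, both > 5).
n = 50;
p = 0.3;
x = 0:n;
pb = zeros(size(x));                 % exact binomial probabilities
for k = 1:numel(x)
    pb(k) = nchoosek(n, x(k)) * p^x(k) * (1 - p)^(n - x(k));
end
mu = n*p;                            % binomial mean
sigma = sqrt(n*p*(1 - p));           % binomial standard deviation
% Matlab evaluates -(x-mu).^2 as -((x-mu).^2), which is what we want
pn = exp(-(x - mu).^2 ./ (2*sigma^2)) ./ (sigma*sqrt(2*pi));
maxdiff = max(abs(pb - pn));         % largest disagreement at any x
fprintf('Largest difference between the two curves: %.4f\n', maxdiff);
```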
One way to think of a Poisson distribution is that it is like a normal distribution that gets pushed close to zero but can’t go through zero. For large means, they are virtually the same.
The Poisson distribution is a good approximation to the binomial distribution when the probability of a single event is small but n is large:
$P(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$
where λ=np is the rate of occurrence. This is used to evaluate the probability of rare events.
Note that the Poisson distribution approaches the normal distribution for large λ.
Example: The number of floods in a 50-year period on a particular river has been shown to follow a Poisson distribution with λ=2.2. That is, the mean number of floods in a 50-year period is a bit larger than 2.
What is the probability of having at least 1 flood in the next 50 years?
The probability of having NO floods (x=0) is $e^{-2.2}$, or 0.11.
The probability of having at least 1 flood is (1-P(0))=0.89.
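The flood numbers can be reproduced in a few lines of Matlab. This is a sketch that assumes the standard Poisson formula P(x) = λ^x e^(-λ)/x!. Stopping the range of x at 10 is an arbitrary choice for display, and the variable names are our own.

```matlab
% Poisson probabilities for the flood example
lambda = 2.2;                % mean number of floods per 50 years
x = 0:10;                    % numbers of floods to evaluate
px = lambda.^x .* exp(-lambda) ./ factorial(x);   % Poisson probabilities
p0 = px(1);                  % probability of no floods (x = 0)
pAtLeast1 = 1 - p0;          % probability of one or more floods
[pmax, imax] = max(px);      % most probable count and its probability
fprintf('P(0 floods) = %.2f\n', p0);
fprintf('P(at least 1 flood) = %.2f\n', pAtLeast1);
fprintf('Most likely number of floods: %d\n', x(imax));
```

The single most likely count is 2 floods, while the mean is 2.2.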
% exponential distribution
lambda=.5; % lambda=rate of occurrence
x=0:0.1:10; px=lambda*exp(-lambda*x); % probability density for x>=0
plot(x,px)
Example: The height of Pacific seamounts has approximately an exponential distribution with $P_c(h) = 1 - e^{-h/340}$, where h is in meters. This gives the probability of a height less than h meters. What’s the probability of a seamount with a height greater than 4 km?
$P_c(4000) = 1 - e^{-4000/340}$, which is approximately 0.99999, so the probability of such a large seamount is about 0.00001. (Which I don’t believe….)
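The seamount calculation is easy to check in Matlab. The sketch below evaluates the tail probability directly as e^(-h/340), which is the same quantity as 1-Pc(h) but avoids subtracting two nearly equal numbers. The variable names are our own.

```matlab
% Seamount height example: probability of a height greater than h
h0 = 340;                    % scale height in meters, from Pc(h) above
h = 4000;                    % height of interest in meters (4 km)
Pc = 1 - exp(-h / h0);       % probability of a height less than h
pTaller = exp(-h / h0);      % probability of a height greater than h
fprintf('Pc(%d) = %.6f\n', h, Pc);
fprintf('P(height > %d m) = %.2e\n', h, pTaller);
```

This gives a probability of roughly 8×10^-6, consistent with the rounded value of 0.00001.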
In some situations, distributions are greatly skewed. This is seen, for example, in grain size distributions and when errors are large and propagate as products rather than sums.
In such cases, taking the log of the data may result in a normal distribution. The statistics of that normal distribution can be obtained and then exponentiated to obtain the actual values of uncertainty.
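The procedure can be sketched as follows. The data vector d is hypothetical, made up purely to illustrate a skewed sample, and the variable names are our own. The mean and standard deviation are computed on the logged values and then exponentiated, which gives a central value (the geometric mean) and an asymmetric range around it.

```matlab
% Log-transform statistics for a skewed sample.
% d is a HYPOTHETICAL data set used only for illustration.
d = [0.05 0.08 0.10 0.12 0.20 0.25 0.40 0.80 1.50 3.00];
logd = log(d);               % natural log of each observation
m = mean(logd);              % mean of the logged data
s = std(logd);               % standard deviation of the logged data
center = exp(m);             % back-transformed mean (geometric mean)
lower = exp(m - s);          % one standard deviation below, back-transformed
upper = exp(m + s);          % one standard deviation above, back-transformed
fprintf('Central value: %.3f\n', center);
fprintf('Range for one standard deviation: %.3f to %.3f\n', lower, upper);
```

The back-transformed range is not symmetric about the central value: the uncertainty acts as a multiplicative factor rather than an additive one.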