The Meaning of Independence

in Probability and Statistics

Henry Mesa

Use your keyboard’s arrow keys to move the slides forward (▬►) or backward (◄▬).

If you want to stop the slide show, press the Esc key on your keyboard.

As you view the slides, have paper and pencil handy. Take notes, and when asked to guess at a result, do so before going on. If something does not make sense, go back through the slides using the backward (◄▬) key on your keyboard. If the slides still do not make sense to you, write down your question and ask your instructor.

Students often view independence as a cause-and-effect issue. This is evident when a student is asked whether two events are independent: the initial response is often, “One event has nothing to do with the other event, so they are independent.”

The concept of independence is elusive for students.

At other times, students confuse disjoint events with independent events; “The two events are disjoint, so they must be independent” is a common response.

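In fact, if two events are disjoint and each has a probability greater than zero, then the events cannot be independent: knowing that one of them occurred drops the probability of the other to zero.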
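To see why disjoint events with nonzero probabilities cannot be independent, here is a minimal Python sketch using one fair die, assuming every face is equally likely. The helper name `prob` is purely illustrative. Events A (roll a one) and B (roll a two) are disjoint, so once the sample space is restricted to B there is no room left for A.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}  # sample space for one fair die

def prob(event, space):
    """P(event) when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

A = {1}  # event: roll a one
B = {2}  # event: roll a two; A and B share no outcomes (disjoint)

p_a = prob(A, die)        # whole sample space: 1/6
p_a_given_b = prob(A, B)  # sample space restricted to B: 0

print(p_a, p_a_given_b)   # 1/6 0
print("independent" if p_a == p_a_given_b else "not independent")
```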

What follows is an attempt to make the meaning absolutely clear. In plain words, independence has to do with whether a probability changes; there is more to say as we continue. First, we need to make sure we understand the basics of how to measure a probability.

So what is independence then?

If you were asked for the probability of rolling a fair die and having a three appear, you would not hesitate to say one-sixth.

Suppose that you are in a classroom with 20 people, and you are told that four of them were born on your birthday (month and day). What is the probability of choosing a person at random and having that person share your birthday?

I am sure you would not have any trouble saying that it is one-fifth.

In both cases you made some major assumptions. For the die, you assumed that every side is equally likely; after all, it was a fair die. You also realized that there are six sides, of which only one shows a three. Thus, P(three) = 1/6.

For the classroom situation, you used the same logic as for the die: there are twenty people, and four of them share your birth date.

The key is the sample space. The sample space contains all outcomes of some random phenomenon. For the die there are six outcomes in the sample space, all equally likely; thus, P(three) = 1/6.

For the classroom situation, there are twenty people in the sample space, of which four meet your criterion; thus, P(shares your birthday) = 4/20 = 1/5.
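Both calculations follow the same recipe: count the outcomes that meet the criterion and divide by the size of the sample space. A minimal Python sketch of that recipe, assuming equally likely outcomes; the function name `probability` and the True/False encoding of the classroom are hypothetical, chosen only for illustration.

```python
from fractions import Fraction

def probability(sample_space, criterion):
    """Count the outcomes meeting `criterion` and divide by the size of the sample space.
    Assumes every outcome in `sample_space` is equally likely."""
    favorable = sum(1 for outcome in sample_space if criterion(outcome))
    return Fraction(favorable, len(sample_space))

die = [1, 2, 3, 4, 5, 6]
p_three = probability(die, lambda face: face == 3)

# Hypothetical encoding of the classroom: True marks a person who shares your birthday.
classroom = [True] * 4 + [False] * 16
p_shared = probability(classroom, lambda shares: shares)

print(p_three, p_shared)  # 1/6 1/5
```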

The sample space represents the whole: everything that can occur in a random phenomenon. Think about what a fraction can represent: a part (the outcomes that meet your criterion) out of the whole (the sample space).

Is this important in order to understand independence?

YES!

Why? Because the concept of independence depends on the sample space.

Here is how independence is going to be explained. What is the chance of rolling a sum of three when you roll two dice and add up the faces?

Now ask the same question, “What is the chance of rolling a sum of three?”, but roll three dice.

While both ask the same question, the sample space has changed: in one scenario you are tossing two dice (all possible outcomes of two dice), and in the other, three dice. This is at the heart of the concept of independence.

You ask a question based on a particular sample space. Now change the sample space and ask the same question. If the probability stays the same, then we have independence!

If the answer to both questions had been, for example, 0.3, then it would not matter that I added another die: while the sample space changed (to all possible combinations of three dice), the probability would not have changed.

In fact, the probability does change: 2/36 ≈ 0.0556 with two dice versus 1/216 ≈ 0.00463 with three dice. This means that we don’t have independence.
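The two numbers can be checked by listing every outcome in each sample space. This Python sketch assumes fair six-sided dice and uses `itertools.product` to build all 36 (or 216) equally likely rolls; `prob_sum` is an illustrative name, not something from the slides.

```python
from fractions import Fraction
from itertools import product

def prob_sum(num_dice, target):
    """P(sum of faces == target) for `num_dice` fair six-sided dice,
    found by listing every equally likely outcome in the sample space."""
    outcomes = list(product(range(1, 7), repeat=num_dice))
    hits = sum(1 for roll in outcomes if sum(roll) == target)
    return Fraction(hits, len(outcomes))

two = prob_sum(2, 3)    # (1, 2) and (2, 1): 2/36
three = prob_sum(3, 3)  # only (1, 1, 1): 1/216

print(two, round(float(two), 4))      # 1/18 0.0556
print(three, round(float(three), 5))  # 1/216 0.00463
```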

The concept of independence depends on whether the probability changes when we change the sample space.

P(event A) = a

P(event A in a different sample space) = ?

If P(event A in a different sample space) = P(event A), then we have independence.

Organization in statistics is vital for communicating your meaning clearly to others, as well as to yourself. Yes, yourself. Have you ever written something down that was very clear as you wrote it, then, hours later, revisited those same notes and been confused about what you meant?

Thus, we need notation that clearly shows when we have switched sample spaces. In everyday writing such switches occur all the time, and it is up to the reader to recognize when a change in sample space has occurred.

“Ten percent of the adult women in Texas are infected with the human papilloma virus (HPV). Of the 18-24 group, 25% of the women are infected with HPV.”

Notice that the first probability (proportion), the 10%, concerns adult women in Texas.

However, the second probability does not concern all adult women in Texas, but a subgroup of the original group, consisting of women aged 18 to 24.

First sample space is all adult women in Texas; P(HPV) = 0.1

The second sample space concerns all adult women in Texas in the age group 18-24; P(HPV) = 0.25.

To denote that the sample space has changed with respect to the original probability, I will use the following notation, called conditional probability notation.

P(HPV) = 0.1

P(HPV | 18 – 24) = 0.25 (conditional probability)

The vertical line is read “given that.” It alerts the reader that the group (sample space) under consideration has changed. The question has not changed, but the group has. The vertical line signals the condition (the change of group, or sample space) under which the question is asked.

The notation P(A | B) is conditional probability notation. It denotes the probability of event A, given that we now consider only the sample space defined by event B.

“I think the New York Yankees have a 70% chance of making the World Series this year,” comments Bob. “Haven’t you heard?” exclaims Tanya. “Derek Jeter and Alex Rodriguez are both out of the lineup for the entire season! I give them a 30% chance.”

Both people are giving the chance of the Yankees making the World Series, but they are speaking from two different perspectives (sample spaces). I could encode the first speaker’s probability statement as

P(make W.S.) = 0.70

But to denote a change in sample space for the second speaker I can use conditional notation.

P(make W.S. | No DJ and No AR) = 0.3.

The first speaker assumed that both of the players mentioned are in the lineup, but the second probability makes it clear that they are not.

So how does conditional probability notation help us understand what independence is?

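If P(A | B) = P(A), then we have independence between events A and B. Also, if the above is true, so is P(B | A) = P(B) (as long as P(A) and P(B) are both greater than zero, so that both conditional probabilities make sense).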

What?!

What the above notation says is that if, for example, P(A) = 0.7 and P(A | B), “the probability of A from the perspective of the sample space named B,” is also 0.7, then our probability of event A has not changed even though we changed the sample space.

P(A | B) = P(A)

And thus the events are independent.

Note that a sample space can also be an event. A sample space is defined by the user, just like an event. If I toss a die but decide to ignore any roll that shows a 1, then my sample space is {2, 3, 4, 5, 6}.
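Conditional probability can be computed exactly the way these slides describe it: shrink the sample space to B, then ask the same question. A sketch in Python, assuming equally likely outcomes; `prob` and `cond_prob` are hypothetical helper names. It uses the die whose 1s are ignored.

```python
from fractions import Fraction

def prob(event, space):
    """P(event) when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

def cond_prob(a, b, space):
    """P(A | B): restrict the sample space to the outcomes in B, then ask the same question."""
    return prob(a, b & space)

die = {1, 2, 3, 4, 5, 6}
three = {3}
no_ones = {2, 3, 4, 5, 6}  # the sample space when rolls of 1 are ignored

print(prob(three, die))                # 1/6
print(cond_prob(three, no_ones, die))  # 1/5
```

Because 1/5 differs from 1/6, the events “roll a three” and “do not roll a one” are not independent: removing the 1s from the sample space changed the probability.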

It may seem crazy and arbitrary to eliminate an actual possibility, but people do this all the time: “If the ball lands beyond this line, it does not count.”

Please trust me. We are on a journey of discovery, and discovery takes time. What we need is another simple example to start putting these ideas together.

Consider a standard deck of 52 cards.

There are four suits: Diamonds, Clubs, Hearts, and Spades. Each suit consists of 13 cards.

Now, here is the first question.

I choose a card at random from a shuffled deck. I don’t let you see it, but I ask you, “What is the probability that the card I hold is an ace?”

Since you have no other information, you would say P(ace) = 4/52, because there are four aces in a deck of 52 cards. You are assuming that every card in the deck is equally likely to be chosen.

I peek at my card and tell you I am going to give you a hint: the card I hold is a diamond. By saying this, we have established that we are no longer considering all 52 cards; the sample space has changed from a deck of 52 cards to the 13 diamond cards.

Using notation, we have P(ace | diamond) = ?

But wait, has the hint helped at all? P(ace | diamond) = 1/13, which is the same as 4/52. I have changed the sample space, but my probability has not changed!

What does this mean? WE HAVE INDEPENDENCE!

The event, “a card is an ace” and the event, “a card is a diamond card,” are independent.

Using notation, we have P(ace | diamond) = P(ace)

Also note that P(diamond) = 13/52 = ¼, and P(diamond | ace) = ¼ as well.
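The card calculation can be reproduced by building the 52-card deck and counting. A sketch in Python, assuming a well-shuffled deck in which every card is equally likely; the rank and suit labels and the helper `prob` are arbitrary choices for illustration.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Diamonds", "Clubs", "Hearts", "Spades"]
deck = set(product(ranks, suits))  # 52 equally likely (rank, suit) cards

aces = {card for card in deck if card[0] == "A"}
diamonds = {card for card in deck if card[1] == "Diamonds"}

def prob(event, space):
    """P(event) when every card in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

p_ace = prob(aces, deck)                    # 4/52 = 1/13
p_ace_given_diamond = prob(aces, diamonds)  # sample space: the 13 diamonds
p_diamond = prob(diamonds, deck)            # 13/52 = 1/4
p_diamond_given_ace = prob(diamonds, aces)  # sample space: the 4 aces

print(p_ace, p_ace_given_diamond)      # 1/13 1/13
print(p_diamond, p_diamond_given_ace)  # 1/4 1/4
```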

If P(A | B) = P(A) then we have independence. Also, if the above is true, so is P(B | A) = P(B).

What does this mean at a practical level? It means the new information did not change my probability, and this is very important!

Consider the following exchange.

A glum-looking man and a woman walk into a doctor’s office. The woman suddenly blurts out, “My husband has HIV!”

The doctor is taken by surprise. He asks, “What makes you think so?”

“He took an over-the-counter HIV test, and it came out positive.”

The doctor is familiar with this test. He knows that it produces a positive result 75% of the time if the person has HIV, and a positive result 4% of the time if the person does not have HIV.

P(positive result | HIV) = 0.75

P(positive result | not HIV) = 0.04

Notice that the two probabilities of a positive result involve two different populations (sample spaces): people with HIV and people without HIV.

The HIV rate for all adults in the county the man is from is 0.001, a very low rate.

So the doctor will try to find out whether the man belongs to a special group within the county. The doctor knows that someone who walks in from the street at random has a 0.001 chance of being infected. But are all groups the same?

Trying to be helpful, the wife adds, “My husband bowls regularly.” The doctor looks puzzled at that revelation, since he cannot think why it is relevant. The doctor wants to see whether the man in front of him is from a high-risk group, which would raise the probability above 0.001 and make a positive test result more meaningful. But bowling is not what the doctor has in mind.

In fact, the doctor is thinking that for a regular bowler the chance of having HIV is 0.001, the same as for the whole population: P(HIV | regular bowler) = 0.001. That is, having HIV and being a regular bowler are independent events. The proportion of HIV cases among those who bowl regularly is the same as in the population those bowlers come from. Notice that being told that the person bowls regularly added no information (no change in probability); changing the sample space without changing the probability means independence.

- The doctor was thinking of some potentially very embarrassing questions, whose answers could put the man in a high-risk group and thus give more credibility to the positive test result.
- Does the husband or wife engage in extramarital affairs?
- Does the husband use injected drugs, with the potential for using contaminated needles?
- In other words, the doctor is thinking of placing the man in a high-risk group (changing the man’s group is the same as changing the sample space).
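The doctor’s reasoning can be made numerical with Bayes’ rule, which these slides do not cover. The Python sketch below uses the slides’ 75% and 4% figures and the county rate of 0.001; the 0.10 rate for a high-risk group is purely hypothetical, chosen only to show the effect of moving to a different sample space, and `prob_hiv_given_positive` is an illustrative name. Because P(HIV | regular bowler) = P(HIV), knowing the man bowls leaves the meaning of the positive test unchanged.

```python
def prob_hiv_given_positive(prior, sensitivity=0.75, false_positive=0.04):
    """Bayes' rule: P(HIV | positive) from P(HIV) = prior,
    P(positive | HIV) = sensitivity and P(positive | not HIV) = false_positive."""
    p_positive = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / p_positive

county = prob_hiv_given_positive(0.001)    # man drawn at random from the county
bowler = prob_hiv_given_positive(0.001)    # P(HIV | regular bowler) is also 0.001
high_risk = prob_hiv_given_positive(0.10)  # hypothetical high-risk group rate

print(round(county, 4), round(bowler, 4), round(high_risk, 4))  # 0.0184 0.0184 0.6757
```

With a rate of 0.001, a positive result raises the chance of HIV only to about 1.8%; under the hypothetical 10% rate it would be roughly 68%. That is why the doctor looks for a group that changes the probability, and why an independent fact like bowling does not help.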

We need another example to help us better understand independence. Suppose a virus is affecting a community: out of 200,000 people, 40,000 are infected.

What is the probability that someone is infected?

P(infected) = 40,000/200,000 = 0.2

Thus, 20% of the population is infected with the virus.

Suppose we further break down those who are infected according to their age.

Suppose that a person is 18 – 30 years of age. What is the probability that this person is infected?

Notice that the question suggests that the sample space has changed!

Thus, I will use the correct function notation to describe the question.

P(infected | 18 – 30)

Notice that the sample space is not the entire 200,000. We are told the person is 18 – 30 years of age, so the population has been reduced to the 50,000 people in that age group.

P(infected | 18 – 30) = 0.2

The probability continues to be 0.2, no change from P(infected).

This means that the event “being 18 – 30 years old” is independent of the event “being infected.” That is, 18 – 30 year olds are infected at the same rate as the entire population.

We changed the sample space from the entire population of 200,000 to the 50,000 people aged 18 – 30 within it.
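Using the counts from the slides, the comparison can be written in Python with exact fractions. The figure of 10,000 infected 18 – 30 year olds is not stated on the slides; it is inferred from P(infected | 18 – 30) = 0.2 applied to 50,000 people, so treat it as an assumption.

```python
from fractions import Fraction

population = 200_000
infected = 40_000
aged_18_30 = 50_000
# Assumption: not stated directly; implied by P(infected | 18 - 30) = 0.2 among 50,000 people.
infected_18_30 = 10_000

p_infected = Fraction(infected, population)                    # whole population
p_infected_given_18_30 = Fraction(infected_18_30, aged_18_30)  # restricted sample space

print(p_infected, p_infected_given_18_30)    # 1/5 1/5
print(p_infected == p_infected_given_18_30)  # True, so the events are independent
```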

The questions that follow are of the same type.

A person is chosen at random from this population. What is the probability that the person is under 18? Try to find the answer before continuing.

P(under 18) = 0.17

An infected person is chosen at random. What is the probability that this person is under 18? Try to find the answer before continuing.

P(under 18 | infected) = 0.17

Since P(under 18 | infected) = P(under 18), the two events are independent. This implies that the percentage of under-18-year-olds among infected people is the same as the percentage of under-18-year-olds in the whole population.

Let’s compare the following question with the two previous questions.

What is the probability that a person chosen at random is a 30- to 50-year-old who is not infected? Attempt to write the question using function notation with the correct conjunction.

P(30 – 50 AND Not Infected) = 0.312

Notice that in this question we are not assuming that either event has occurred. The sample space is still the original population.

The table below shows the class of passenger aboard the Titanic and who survived the accident.

Is the survival rate on the Titanic independent of passenger class?

To answer the question, let’s restate it more specifically: is survival independent of being a first-class passenger, for example? One way to answer this is to check whether P(Alive) = P(Alive | First) or whether P(First | Alive) = P(First). If one is true, so is the other. Try to answer this on your own first.

This says that about 32.26% of the passengers on the Titanic survived; roughly one-third.

P(Alive) ≈ 0.3226

The second result says that about 62.15% of the first-class passengers survived. Clearly, we do not have independence: the chance of surviving on the Titanic was better if you were a first-class passenger. Notice that both questions concern surviving the accident, but in the second question we have changed the sample space.

P(Alive | First) ≈ 0.6215

Let’s ask a similar question.

Are the events a person is Alive and a person is a Third class passenger independent? Try to answer the question on your own first.

It turns out that of the third-class passengers (the new sample space), only about 25.21% survived. Clearly, the events are not independent: while about 1/3 of those aboard the Titanic survived, only about 25% of third-class passengers did.

P(Alive | Third) ≈ 0.2521

Does it matter what I make the new sample space?

No. Let’s answer the same question again, this time using the survivors as the new sample space.

Are the events a person is Alive and a person is a Third class passenger independent?

This says that about 32.08% of the passengers on the Titanic were in third class; roughly one-third.

P(Third) ≈ 0.3208

So, if you interviewed a Titanic survivor chosen at random, there would be roughly a ¼ chance that this person was from third class. The conclusion is the same: we do not have independence.

P(Third | Alive) ≈ 0.2507
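Since the underlying Titanic table is not reproduced here, this Python sketch works only with the rounded probabilities reported on the slides. The tolerance of 0.005 used to call two rounded values “equal,” and the helper name `looks_independent`, are assumptions for illustration. The last lines also show why checking either direction is enough: P(Third | Alive) = P(Alive | Third) × P(Third) / P(Alive), so if one conditional probability matches its unconditional counterpart, so does the other.

```python
# Rounded values reported on the slides (the underlying table is not reproduced here).
p_alive = 0.3226
p_third = 0.3208
p_alive_given_first = 0.6215
p_alive_given_third = 0.2521
p_third_given_alive = 0.2507

def looks_independent(p_conditional, p_unconditional, tol=0.005):
    """Treat two rounded probabilities as equal if they differ by less than `tol` (assumed)."""
    return abs(p_conditional - p_unconditional) < tol

print(looks_independent(p_alive_given_first, p_alive))  # False
print(looks_independent(p_alive_given_third, p_alive))  # False
print(looks_independent(p_third_given_alive, p_third))  # False

# Both directions agree because P(Third | Alive) = P(Alive | Third) * P(Third) / P(Alive).
implied = p_alive_given_third * p_third / p_alive
print(round(implied, 4))  # 0.2507
```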

From the examples, you can see that a lack of independence shows up as a change in the probability of an event when the sample space changes.

We ask a probability question and get a response. Then we change the sample space and ask the same question. If the probability does not change, then we have independence.

If P(A | B) = P(A) then we have independence. Also, if the above is true, so is P(B | A) = P(B).

If the probability does change, that is, P(A | B) ≠ P(A), then we do not have independence.

Why is independence important?

For probability theory, it enables us to calculate probabilities in a different manner: for example, if A and B are independent, then P(A AND B) = P(A) × P(B).
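One such different manner is the multiplication rule: when A and B are independent, P(A AND B) = P(A) × P(B). A Python sketch, reusing the ace and diamond events from the card example and assuming a well-shuffled 52-card deck, checks this against a direct count; the helper `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Diamonds", "Clubs", "Hearts", "Spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) cards

def prob(criterion):
    """P(card meets criterion) for one card drawn at random from the 52-card deck."""
    return Fraction(sum(1 for card in deck if criterion(card)), len(deck))

p_ace = prob(lambda c: c[0] == "A")                        # 1/13
p_diamond = prob(lambda c: c[1] == "Diamonds")             # 1/4
p_ace_and_diamond = prob(lambda c: c == ("A", "Diamonds"))  # counted directly: 1/52

print(p_ace_and_diamond, p_ace * p_diamond)  # 1/52 1/52
```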

For statistics, independence indicates that changing the sample space provides no new information.

“The Effect of Country Music on Suicide” (S. Stack and J. Gundlach; Wayne State University and Auburn University; 1992): “The greater the airtime devoted to country music, the greater the white suicide rate.”

According to the authors, Steven Stack and Jim Gundlach, the paper "assesses the link between country music and metropolitan suicide rates. Country music is hypothesized to nurture a suicidal mood through its concerns with problems common in the suicidal population, such as marital discord, alcohol abuse, and alienation from work. The results of a multiple regression analysis of 49 metropolitan areas show that the greater the airtime devoted to country music, the greater the white suicide rate. The effect is independent of divorce, southernness, poverty, and gun availability. The existence of a country music subculture is thought to reinforce the link between country music and suicide."

Notice that the second-to-last sentence describes an attempt to change the sample space to see whether the suicide rate changes. The authors suggest that the new sample spaces (people who are divorced, living in a Southern state, living in poverty, or owning a gun) did not alter the rate (probability).

The End

To make the most of these slides, read your text and attempt some problems; then, regardless of how you do on those homework problems, view these slides again. If you did well on the homework, you will feel comfortable with the ideas presented, but make sure that you view these slides actively, with paper and pencil in hand. You should be able to anticipate the answers to the questions posed. If you did not do well on the homework, see whether a missing part of your understanding can be found in the slides, and write down as specifically as possible what you believe is missing. What is it that you do not understand?