Clustering and Probability (Chap 7)

By gwyn

Review from Last Lecture

Defined the K-means problem for formalizing the notion of clustering.

Discussed the K-means algorithm.

Noted that the K-means algorithm was “quite good” in discovering “concepts” from data (based on features).

Noted the important distinction between “attributes” and “features”.

Example of K-means - 1

Let the data points be (1,1), (2,1), (3,4) and (4,5), and the initial centroids C1 = (1,1) and C2 = (2,1).

Example of K-means - 2

C1 = (1,1); C2 = ((2+3+4)/3, (1+4+5)/3) = (3, 3.33)

Example of K-means - 3

C1 = ((1+2)/2, (1+1)/2) = (1.5, 1); C2 = ((3+4)/2, (4+5)/2) = (3.5, 4.5)

Example of K-means - 4

C1 = ((1+2)/2, (1+1)/2) = (1.5, 1); C2 = ((3+4)/2, (4+5)/2) = (3.5, 4.5). The centroids are unchanged from the previous step, so the algorithm has converged.
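The iterations above can be reproduced with a short plain-Python sketch (an illustration, not the lecture's code; the data points (1,1), (2,1), (3,4), (4,5) are inferred from the centroid arithmetic on the slides):

```python
# Minimal K-means sketch on the worked example:
# points (1,1), (2,1), (3,4), (4,5); initial centroids C1=(1,1), C2=(2,1).

def kmeans(points, centroids, iters=10):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new = [(sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
               for cl in clusters]
        if new == centroids:   # centroids unchanged: converged
            break
        centroids = new
    return centroids, clusters

points = [(1, 1), (2, 1), (3, 4), (4, 5)]
cents, clusters = kmeans(points, [(1.0, 1.0), (2.0, 1.0)])
print(cents)   # → [(1.5, 1.0), (3.5, 4.5)]
```

Running it passes through the same intermediate centroids as slides 2 and 3 before stopping.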

Example: 2 Clusters

[Figure: four points A(-1,2), B(1,2), C(-1,-2), D(1,-2), symmetric about the origin (0,0).]

K-means Problem: the optimal solution is centroids (0,2) and (0,-2), with clusters {A,B} and {C,D}.

K-means Algorithm: suppose the initial centroids are (-1,0) and (1,0); then {A,C} and {B,D} end up as the two clusters.
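This gap between the optimal solution and what the algorithm finds can be checked numerically by comparing the K-means cost (within-cluster sum of squared distances to the centroid) of the two partitions; a minimal sketch:

```python
# Compare the K-means cost of the optimal partition {A,B},{C,D}
# with the partition {A,C},{B,D} that the algorithm converges to
# from initial centroids (-1,0) and (1,0).

def cost(clusters):
    total = 0.0
    for cl in clusters:
        cx = sum(x for x, _ in cl) / len(cl)   # cluster centroid
        cy = sum(y for _, y in cl) / len(cl)
        total += sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in cl)
    return total

A, B, C, D = (-1, 2), (1, 2), (-1, -2), (1, -2)
good = cost([[A, B], [C, D]])   # optimal: centroids (0,2) and (0,-2)
bad = cost([[A, C], [B, D]])    # local optimum: centroids (-1,0) and (1,0)
print(good, bad)                # → 4.0 16.0
```

Both partitions are stable for the update rule, but the second has four times the cost: K-means can get stuck in a local optimum depending on initialization.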

Several other issues regarding clustering

- How do you select the initial centroids?
- How do you select the right number of clusters?
- How do you deal with non-Euclidean distance/similarity measures?
- Other approaches (hierarchical, spectral, etc.)
- The curse of high dimensionality.

Question

What should the “prediction” be for the flower?

Prediction and Probability

- When we make predictions we should assign “probabilities” with the prediction.
- Examples:
- 20% chance it will rain tomorrow.
- 50% chance that the tumor is malignant.
- 60% chance that the stock market will fall by the end of the week.
- 30% chance that the next president of the United States will be a Democrat.
- 0.1% chance that the user will click on a banner-ad.

- How do we assign probabilities to complex events? Using smart data algorithms… and counting.

Probability Basics

- Probability is a deep topic, but in most cases the rules are straightforward to apply.
- Terminology
- Experiment
- Sample Space
- Events
- Probability
- Rules of probability
- Conditional Probability
- Bayes Rule

Probability: Sample Space

- Consider an experiment and let S be the space of possible outcomes.
- Example:
- Experiment is tossing a coin; S={h,t}
- Experiment is rolling a pair of dice: S={(1,1),(1,2),…(6,6)}
- Experiment is a race consisting of three cars: 1,2 and 3. The sample space is {(1,2,3),(1,3,2),(2,1,3),(2,3,1),(3,1,2),(3,2,1)}

Probabilities

Let the sample space S = {1,2,…,m}.

Consider numbers p1, p2, …, pm with each pi ≥ 0 and p1 + p2 + … + pm = 1.

pi is the probability that the outcome of the experiment is i.

Suppose we toss a fair coin. The sample space is S = {h,t}. Then ph = 0.5 and pt = 0.5.

Probability

- Experiment: Will it rain or not in Sydney : S = {rain, no-rain}
- Prain = 138/365 = 0.38; Pno-rain = 227/365 = 0.62

- Assigning (or rather how to) probabilities is a deep philosophical problem.
- What is the probability that the green object standing outside my house is a burglar dressed in green?

Probability

- An Event A is a set of possible outcomes of the experiment. Thus A is a subset of S.
- Let A be the event of getting a seven when we roll a pair of dice.
- A = {(1,6),(6,1),(2,5),(5,2),(4,3),(3,4) }
- P(A) = 6/36 = 1/6

- In general, P(A) is the sum of pi over all outcomes i in A.
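The dice event above can be checked by enumerating the sample space; a small sketch using exact fractions:

```python
# Enumerate the sample space of a pair of dice and count the event
# A = "the sum is 7". For equally likely outcomes, P(A) = |A| / |S|.
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]   # 36 outcomes
A = [(i, j) for (i, j) in S if i + j == 7]               # event "sum is 7"
p = Fraction(len(A), len(S))
print(p)   # → 1/6
```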

Probability

- The sample space S and events are “sets”.
- P(S) = 1; P(Φ) = 0
- Addition: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
- Often A and B are disjoint (A ∩ B = Φ), and then P(A ∪ B) = P(A) + P(B).

- Complement: P(not A) = 1 − P(A)

Example

- Suppose the probability of rain today is 0.4, the probability of rain tomorrow is also 0.4, and the probability of rain on both days is 0.1. What is the probability that it does not rain on either day?
- S={(R,N), (R,R),(N,N),(N,R)}
- Let A be the event that it will rain today and B it will rain tomorrow. Then
- A ={(R,N), (R,R)} ; B={(N,R),(R,R)}

- Rain today or tomorrow: P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 0.4 + 0.4 − 0.1 = 0.7
- Will not rain on either day: 1 − 0.7 = 0.3
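As a sketch, the same answer follows from summing outcome probabilities directly; the individual outcome probabilities below are derived from the slide's numbers (P(R,R) = 0.1 forces P(R,N) = P(N,R) = 0.3, leaving P(N,N) = 0.3):

```python
# Rain example via the sample space S = {(R,N),(R,R),(N,N),(N,R)}.
# Outcome probabilities derived from P(A) = P(B) = 0.4, P(AB) = 0.1.
P = {("R", "R"): 0.1, ("R", "N"): 0.3, ("N", "R"): 0.3, ("N", "N"): 0.3}

A = {("R", "N"), ("R", "R")}   # rain today
B = {("N", "R"), ("R", "R")}   # rain tomorrow

p_union = sum(P[o] for o in A | B)   # rain today or tomorrow
p_neither = 1 - p_union              # rain on neither day
print(p_union, p_neither)            # ≈ 0.7 and ≈ 0.3 (up to float rounding)
```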

Conditional Probability

- One of the most important concepts in all of Data Mining and Machine Learning
- P(A|B) = P(AB)/P(B), assuming P(B) ≠ 0.
- Conditional probability of A given B has occurred.

- Probability it will rain tomorrow given it has rained today.
- P(A|B) = P(AB)/P(B) = 0.1/0.4 = 1/4 = 0.25
- In general P(A|B) is not equal to P(B|A)
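A quick dice illustration (my own example, not from the slides) of why P(A|B) and P(B|A) generally differ:

```python
# A = "the sum is at least 9", B = "the first die shows 6".
# P(A|B) = P(AB)/P(B) and P(B|A) = P(AB)/P(A) come out different.
from fractions import Fraction

S = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def prob(event):
    # Probability of an event under equally likely outcomes.
    return Fraction(len([o for o in S if event(o)]), len(S))

A = lambda o: o[0] + o[1] >= 9       # 10 outcomes
B = lambda o: o[0] == 6              # 6 outcomes
AB = lambda o: A(o) and B(o)         # 4 outcomes

p_A_given_B = prob(AB) / prob(B)     # (4/36)/(6/36)  = 2/3
p_B_given_A = prob(AB) / prob(A)     # (4/36)/(10/36) = 2/5
print(p_A_given_B, p_B_given_A)      # → 2/3 2/5
```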

We need conditional probability to answer….

What should the “prediction” be for the flower?

Bayes Rule

- P(A|B) = P(AB)/P(B); P(B|A) = P(BA)/P(A)
- Now P(AB) = P(BA)
- Thus P(A|B)P(B) = P(B|A)P(A)
- Thus P(A|B) = [P(B|A)P(A)]/[P(B)]
- This is called Bayes Rule
- Basis of almost all prediction
- Recent theories hypothesize that human memory and action are Bayes rule in action.

Bayes Rule: Example

The ASX market goes up 60% of the days of a year; 40% of the time it stays the same or goes down. On days when the ASX is up, there is a 50% chance that the Shanghai Index is up; on other days there is a 30% chance that Shanghai goes up. Suppose the Shanghai market is up. What is the probability that the ASX was up?

Define A1 as “ASX is up”; A2 is “ASX is not up”

Define S1 as “Shanghai is up”; S2 is “Shanghai is not up”

We want to calculate P(A1|S1).

P(A1) = 0.6; P(A2) = 0.4

P(S1|A1) = 0.5; P(S1|A2) = 0.3

P(S2|A1) = 1 − P(S1|A1) = 0.5

P(S2|A2) = 1 − P(S1|A2) = 0.7

Bayes Rule: Example

We want to calculate P(A1|S1).

P(A1) = 0.6; P(A2) = 0.4

P(S1|A1) = 0.5; P(S1|A2) = 0.3

P(S2|A1) = 1 − P(S1|A1) = 0.5

P(S2|A2) = 1 − P(S1|A2) = 0.7

P(A1|S1) = P(S1|A1)P(A1)/(P(S1))

How do we calculate P(S1) ?

Bayes Rule: Example

P(S1) = P(S1,A1) + P(S1,A2) [Key Step]

= P(S1|A1)P(A1) + P(S1|A2)P(A2)

= 0.5 x 0.6 + 0.3 x 0.4

= 0.42

Finally,

P(A1|S1) = P(S1|A1)P(A1)/P(S1)

= (0.5 x 0.6)/0.42 ≈ 0.71
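The whole calculation as a few lines of Python:

```python
# ASX/Shanghai example: the law of total probability gives P(S1),
# then Bayes rule gives P(A1|S1).
p_A1, p_A2 = 0.6, 0.4        # ASX up / not up
p_S1_given_A1 = 0.5          # Shanghai up on days the ASX is up
p_S1_given_A2 = 0.3          # Shanghai up on other days

# Law of total probability: P(S1) = P(S1|A1)P(A1) + P(S1|A2)P(A2)
p_S1 = p_S1_given_A1 * p_A1 + p_S1_given_A2 * p_A2   # ≈ 0.42

# Bayes rule: P(A1|S1) = P(S1|A1)P(A1) / P(S1)
p_A1_given_S1 = p_S1_given_A1 * p_A1 / p_S1          # ≈ 0.714
print(p_S1, p_A1_given_S1)
```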

Example: Iris Flower

F=Flower; SL=Sepal Length; SW = Sepal Width;

PL=Petal Length; PW =Petal Width

[Slide table: the observed feature values for the flower. For each flower type F, compute P(F|Data) and choose the maximum.]

Example: Iris Flower

- So how do we compute: P(Data|F=A) ?
- This is a non-trivial question… [subject to much research]
- How many times does “Data” appear in the “database” when F=A?
- In this case “Data” is a 4-dimensional “data vector.” Each component takes 3 values (small, medium, large). Thus the number of combinations is 3^4 = 81.

Example: Iris Flower

- Conditional Independence
- P(Data|F=A) = P(SL=Large,SW=Small,PL=Medium,PW=Small|F=A)
- ~= P(SL=Large|F=A)P(SW=Small|F=A)P(PL=Medium|F=A)P(PW=Small|F=A)
- The above is an assumption to make the “computation easier.”
- Surprisingly, evidence suggests that this works reasonably well in practice.

- This prediction method (which exploits conditional independence) is called “Naïve Bayes Classifier.”
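A toy sketch of a Naïve Bayes classifier in the spirit of this example. The table of categorical rows below is invented for illustration only; it is not real iris data:

```python
# Naive Bayes by counting, under the conditional-independence assumption.
# Each made-up row: (SL, SW, PL, PW, flower type).
from collections import Counter, defaultdict

data = [
    ("large", "small", "medium", "small", "A"),
    ("large", "small", "large", "small", "A"),
    ("large", "small", "medium", "medium", "A"),
    ("small", "large", "small", "small", "B"),
    ("small", "large", "medium", "small", "B"),
    ("small", "small", "small", "small", "B"),
]

classes = Counter(row[-1] for row in data)          # class counts
cond = defaultdict(int)                             # (feature, value, class) counts
for row in data:
    for i, v in enumerate(row[:-1]):
        cond[(i, v, row[-1])] += 1

def score(x, cls):
    # P(cls) * product over features of P(feature=value | cls),
    # each factor estimated by counting.
    s = classes[cls] / len(data)
    for i, v in enumerate(x):
        s *= cond[(i, v, cls)] / classes[cls]
    return s

x = ("large", "small", "medium", "small")           # query flower
pred = max(classes, key=lambda c: score(x, c))
print(pred)   # → A
```

A fuller implementation would smooth the counts (e.g. Laplace smoothing) so that a single unseen feature value does not zero out an entire class, as happens for class B here.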
