## Ch10 Machine Learning: Symbol-Based
Machine Learning Outline

- The book presents four chapters on machine learning, reflecting four approaches to the problem:
- Symbol Based
- Connectionist
- Genetic/Evolutionary
- Stochastic

Ch.10 Outline

- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning

The Framework Example

- Data
- The representation:
- Size(small)^color(red)^shape(round)
- Size(large)^color(red)^shape(round)

The Framework Example

- A set of operations:

Based on

- Size(small)^color(red)^shape(round)

Replacing a single constant with a variable produces the generalizations below (a short code sketch follows the list):

Size(X)^color(red)^shape(round)

Size(small)^color(X)^shape(round)

Size(small)^color(red)^shape(X)
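To make the operation concrete, here is a minimal Python sketch of the replace-a-constant-with-a-variable operator; the dictionary representation and the `generalizations` name are our own illustration, not from the text:

```python
# A conjunctive concept as attribute -> value pairs; None stands in for a
# variable, i.e., "matches any value".
def generalizations(concept):
    """Yield every concept obtained by replacing one constant with a variable."""
    for attr, value in concept.items():
        if value is not None:
            g = dict(concept)
            g[attr] = None        # size(small) -> size(X), etc.
            yield g

example = {"size": "small", "color": "red", "shape": "round"}
for g in generalizations(example):
    print(g)   # the three generalizations listed above, one per line
```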

The Framework Example

- The concept space
- The learner must search this space to find the desired concept.
- The complexity of this concept space is a primary measure of the difficulty of a learning problem

The Framework Example

- Heuristic search:

Based on

- Size(small)^color(red)^shape(round)

The learner takes that example as the candidate “ball” concept; this concept correctly classifies the only positive instance

If the algorithm is given a second positive instance

- Size(large)^color(red)^shape(round)

The learner may generalize the candidate “ball” concept to

- Size(Y)^color(red)^shape(round)
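The same representation supports this step: wherever the two positive instances disagree, the constant is replaced by a variable. A hypothetical `generalize_pair` helper, continuing the sketch above:

```python
def generalize_pair(c1, c2):
    """Minimal generalization covering both instances: keep attribute values
    where the instances agree, introduce a variable (None) where they differ."""
    return {attr: (v if c2[attr] == v else None) for attr, v in c1.items()}

pos1 = {"size": "small", "color": "red", "shape": "round"}
pos2 = {"size": "large", "color": "red", "shape": "round"}
print(generalize_pair(pos1, pos2))
# {'size': None, 'color': 'red', 'shape': 'round'}, i.e.
# size(Y)^color(red)^shape(round)
```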

Learning process

- The training data is a series of positive and negative examples of the concept: examples of blocks-world structures that fit the category, along with near misses.
- The latter are instances that almost belong to the category but fail on one property or relation

Learning process

- This approach was proposed by Patrick Winston (1975)
- The program performs a hill climbing search on the concept space guided by the training data
- Because the program does not backtrack, its performance is highly sensitive to the order of the training examples
- A bad order can lead the program to dead ends in the search space

Ch.10 Outline

- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning

ID3 Decision Tree

- ID3, like candidate elimination, induces concepts from examples
- It is particularly interesting for
- Its representation of learned knowledge
- Its approach to the management of complexity
- Its heuristic for selecting candidate concepts
- Its potential for handling noisy data

ID3 Decision Tree

- The previous table can be represented as the following decision tree:

ID3 Decision Tree

- In a decision tree, each internal node represents a test on some property
- Each possible value of that property corresponds to a branch of the tree
- Leaf nodes represent classifications, such as low or moderate risk

ID3 Decision Tree

- A simplified decision tree for credit risk management

ID3 Decision Tree

- ID3 constructs decision trees in a top-down fashion.
- ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples
- The algorithm recursively constructs a subtree for each partition
- This continues until all members of the partition are in the same class, as in the sketch below
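A compact Python sketch of this top-down recursion, assuming each example is a dictionary of attribute values plus a class label; the helper names (`entropy`, `id3`) and this interface are ours, and the gain heuristic anticipates the slides that follow:

```python
import math
from collections import Counter

def entropy(examples, target):
    """Expected information (in bits) needed to classify an example."""
    counts = Counter(e[target] for e in examples)
    n = len(examples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def id3(examples, attributes, target):
    """Grow a decision tree top-down, always splitting on the attribute
    that yields the greatest information gain."""
    classes = {e[target] for e in examples}
    if len(classes) == 1:                  # partition is pure: make a leaf
        return classes.pop()
    if not attributes:                     # no tests left: majority class
        return Counter(e[target] for e in examples).most_common(1)[0][0]

    def info_after(attr):                  # expected info after splitting on attr
        total = 0.0
        for v in {e[attr] for e in examples}:
            part = [e for e in examples if e[attr] == v]
            total += len(part) / len(examples) * entropy(part, target)
        return total

    best = min(attributes, key=info_after)     # max gain == min remaining info
    subtree = {}
    for v in {e[best] for e in examples}:
        part = [e for e in examples if e[best] == v]
        subtree[v] = id3(part, [a for a in attributes if a != best], target)
    return {best: subtree}
```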

ID3 Decision Tree

- For example, ID3 selects income as the root property for the first step

ID3 Decision Tree

- How does ID3 select the first node (and the following nodes)?
- ID3 measures the information gained by making each property the root of the current subtree
- It picks the property that provides the greatest information gain

ID3 Decision Tree

- If we assume that all the examples in the table occur with equal probability, then:
- P(risk is high)=6/14
- P(risk is moderate)=3/14
- P(risk is low)=5/14
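Plugging these probabilities into the expected-information formula gives the I[6,3,5] term used on the next slide:

$$I[6,3,5] = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 1.531 \text{ bits}$$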

ID3 Decision Tree

- The information gain from income is:

Gain(income) = I[6,3,5] − E[income] = 1.531 − 0.564 = 0.967

Similarly,

- Gain(credit history) = 0.266
- Gain(debt) = 0.063
- Gain(collateral) = 0.206

ID3 Decision Tree

- Since income provides the greatest information gain, ID3 will select it as the root of the tree

Attribute Selection Measure: Information Gain (ID3/C4.5)

- Select the attribute with the highest information gain
- Let $p_i$ be the probability that an arbitrary tuple in D belongs to class $C_i$, estimated by $|C_{i,D}|/|D|$
- Expected information (entropy) needed to classify a tuple in D:

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$$

Attribute Selection Measure: Information Gain (ID3/C4.5)

- Information needed (after using A to split D into v partitions) to classify D:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$$

- Information gained by branching on attribute A:

$$Gain(A) = Info(D) - Info_A(D)$$

Decision Tree Example

- Info(Tenured) = I(3,3) = −(3/6)log2(3/6) − (3/6)log2(3/6) = 1
- How to compute log2 with a common-log calculator: log2(12) = log(12)/log(2) = 1.07918/0.30103 = 3.584958
- Tutorial on log2: http://www.ehow.com/how_5144933_calculate-log.html
- Convenient tool: http://web2.0calc.com/

Decision Tree Example

- Info_RANK(Tenured) =

3/6 I(1,2) + 2/6 I(1,1) + 1/6 I(1,0) =

3/6 × (0.918) + 2/6 × (1) + 1/6 × (0) = 0.79

- 3/6 I(1,2) means “Assistant Prof” accounts for 3 of the 6 samples, with 1 yes and 2 nos.
- 2/6 I(1,1) means “Associate Prof” accounts for 2 of the 6 samples, with 1 yes and 1 no.
- 1/6 I(1,0) means “Professor” accounts for 1 of the 6 samples, with 1 yes and 0 nos.

Decision Tree Example

- Info_YEARS(Tenured) =

1/6 I(1,0) + 2/6 I(0,2) + 1/6 I(0,1) + 2/6 I(2,0) = 0

- 1/6 I(1,0) means “years = 2” accounts for 1 of the 6 samples, with 1 yes and 0 nos.
- 2/6 I(0,2) means “years = 3” accounts for 2 of the 6 samples, with 0 yes and 2 nos.
- 1/6 I(0,1) means “years = 6” accounts for 1 of the 6 samples, with 0 yes and 1 no.
- 2/6 I(2,0) means “years = 7” accounts for 2 of the 6 samples, with 2 yes and 0 nos.
- Every partition is pure, so each I term is 0 and the weighted sum is 0 (these numbers are checked in the sketch below)
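A short Python check of the numbers on these two slides (the `I` helper is ours; the counts are taken from the bullets above):

```python
import math

def I(*counts):
    """Entropy in bits of a class distribution given as raw counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

info = I(3, 3)                                                      # 1.0
info_rank = 3/6 * I(1, 2) + 2/6 * I(1, 1) + 1/6 * I(1, 0)           # ~0.792
info_years = 1/6*I(1, 0) + 2/6*I(0, 2) + 1/6*I(0, 1) + 2/6*I(2, 0)  # 0.0

print(round(info, 3), round(info_rank, 3), round(info_years, 3))
print("Gain(rank) =", round(info - info_rank, 3))    # 0.208
print("Gain(years) =", round(info - info_years, 3))  # 1.0 -> years wins
```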

Ch.10 Outline

- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning

Unsupervised Learning

- The learning algorithms discussed so far implement forms of supervised learning
- They assume the existence of a teacher, some fitness measure, or other external method of classifying training instances
- Unsupervised learning eliminates the teacher and requires that the learners form and evaluate concepts on their own

Unsupervised Learning

- Science is perhaps the best example of unsupervised learning in humans
- Scientists do not have the benefit of a teacher.
- Instead, they propose hypotheses to explain observations

Unsupervised Learning

- The clustering problem starts with (1) a collection of unclassified objects and (2) a means for measuring the similarity of objects
- The goal is to organize the objects into classes that meet some standard of quality, such as maximizing the similarity of objects in the same class

Unsupervised Learning

- Numerical taxonomy is one of the oldest approaches to the clustering problem
- A reasonable similarity metric treats each object as a point in n-dimensional space
- The similarity of two objects is the Euclidean distance between them in this space, shown below
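For two objects x and y described by n numeric features, this distance is:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$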

Unsupervised Learning

- Using this similarity metric, a common clustering algorithm builds clusters in a bottom-up fashion, an approach known as agglomerative clustering (see the sketch after this list):
- Examine all pairs of objects, select the pair with the highest degree of similarity, and make that pair a cluster
- Define the features of the cluster as some function (such as the average) of the features of the component members, then replace the component objects with this cluster definition
- Repeat this process on the collection of objects until all objects have been reduced to a single cluster
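A minimal Python sketch of this bottom-up loop, assuming numeric feature vectors and using the average as the cluster-definition function (all names here are ours):

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(objects):
    """Repeatedly merge the closest pair until one cluster remains.
    Returns a binary tree: leaves are inputs, internal nodes are pairs."""
    items = [(obj, obj) for obj in objects]          # (tree, feature vector)
    while len(items) > 1:
        i, j = min(
            ((i, j) for i in range(len(items)) for j in range(i + 1, len(items))),
            key=lambda p: distance(items[p[0]][1], items[p[1]][1]),
        )
        (t1, v1), (t2, v2) = items[i], items[j]
        merged = tuple((a + b) / 2 for a, b in zip(v1, v2))   # average features
        items = [it for k, it in enumerate(items) if k not in (i, j)]
        items.append(((t1, t2), merged))
    return items[0][0]

print(agglomerate([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]))
# ((5.0, 5.0), ((0.0, 0.0), (0.0, 1.0)))
```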

Unsupervised Learning

- The result of this algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size
- We may also extend this algorithm to objects represented as sets of symbolic features.

Unsupervised Learning

- Object1 = {small, red, rubber, ball}
- Object2 = {small, blue, rubber, ball}
- Object3 = {large, black, wooden, ball}
- Treating similarity as the proportion of features two objects share, this metric computes the similarity values (a code sketch follows):
- Similarity(object1, object2) = 3/4
- Similarity(object1, object3) = 1/4
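In Python, with each object as a set of feature values and similarity as the proportion of shared features (the metric implied by the 3/4 and 1/4 values above):

```python
def similarity(a, b):
    """Proportion of common features, assuming objects list the same
    number of features (four, in this example)."""
    return len(a & b) / len(a)

obj1 = {"small", "red", "rubber", "ball"}
obj2 = {"small", "blue", "rubber", "ball"}
obj3 = {"large", "black", "wooden", "ball"}
print(similarity(obj1, obj2))   # 0.75  ({small, rubber, ball} shared)
print(similarity(obj1, obj3))   # 0.25  ({ball} shared)
```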

Partitioning Algorithms: Basic Concept

- Given k, find a partition into k clusters that optimizes the chosen partitioning criterion
- Global optimal: exhaustively enumerate all partitions
- Heuristic methods: k-means and k-medoids algorithms
- k-means (MacQueen’67): Each cluster is represented by the center of the cluster
- k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster

The K-Means Clustering Method

- Given k, the k-means algorithm is implemented in four steps (see the sketch below):
- Partition the objects into k nonempty subsets
- Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster)
- Assign each object to the cluster with the nearest seed point
- Go back to step 2; stop when the assignments no longer change
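A minimal sketch of these four steps in Python, one-dimensional for simplicity so it matches the worked example below; the function name and interface are our own:

```python
def k_means(points, centroids):
    """Repeat: assign each point to the nearest centroid (step 3), recompute
    each centroid as its cluster's mean (step 2), stop on no change (step 4)."""
    while True:
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [sum(c) / len(c) if c else m
                         for c, m in zip(clusters, centroids)]
        if new_centroids == centroids:       # assignments have stabilized
            return centroids, clusters
        centroids = new_centroids
```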

The K-Means Clustering Method

- [Figure: scatter plots of one k-means run with K = 2: arbitrarily choose K objects as the initial cluster centers, assign each object to the most similar center, update the cluster means, then reassign and update until the clusters stabilize.]
Example

- Run K-means clustering with 3 clusters (initial centroids: 3, 16, 25) for at least 2 iterations

Example

- Centroids:

3 → {2, 3, 4, 7, 9}, new centroid: 5

16 → {10, 11, 12, 16, 18, 19}, new centroid: 14.33

25 → {23, 24, 25, 30}, new centroid: 25.5

Example

- Centroids:

5 → {2, 3, 4, 7, 9}, new centroid: 5

14.33 → {10, 11, 12, 16, 18, 19}, new centroid: 14.33

25.5 → {23, 24, 25, 30}, new centroid: 25.5

- The assignments no longer change, so the algorithm has converged (the code below reproduces this trace)
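Running the `k_means` sketch from above on this data reproduces the trace:

```python
points = [2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30]
centroids, clusters = k_means(points, [3, 16, 25])
print([round(c, 2) for c in centroids])   # [5.0, 14.33, 25.5]
print(clusters)  # [[2, 3, 4, 7, 9], [10, 11, 12, 16, 18, 19], [23, 24, 25, 30]]
```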

In-Class Practice

- Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations
