
A rule induction algorithm


ID3 (1986), the Interactive Dichotomizer 3, followed by C4.5 (1993) and then C5.0 (after 2000) [Ross Quinlan].

The training set is partitioned into smaller and smaller subsets.

A selection criterion forms the basis on which the training set is subdivided.

Uses a 'divide and conquer' method to build the tree.

Data is divided into subsets until each subset contains a single class.

The algorithm is recursive.

The basic algorithm

PROCEDURE BuildTree(ExampleSubset)
    NumberOfClasses = calculate the number of classes in ExampleSubset
    IF NumberOfClasses = 0 THEN
        create a null leaf
    ELSE
        IF NumberOfClasses = 1 THEN
            store the output class as a leaf in the tree
        ELSE
            DecisionNodeInput = determine the input (attribute) to split on
            IF DecisionNodeInput = 0 THEN
                Error: more than one class has all the same attribute values
            ELSE
                create a decision node for DecisionNodeInput
                FOR all values of DecisionNodeInput
                    determine the NewExampleSubset for this input value
                    BuildTree(NewExampleSubset)
                ENDFOR
            ENDIF
        ENDIF
    ENDIF
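
To make the pseudocode concrete, here is a minimal Python sketch of BuildTree. The names (build_tree, print_tree, the "CLASS" key, the domains argument) are my own, and the attribute to split on is simply the next one in a caller-supplied order rather than the information-gain measure ID3 itself uses; treat it as an illustration, not Quinlan's implementation.

def build_tree(examples, attributes, domains):
    # examples:   list of dicts mapping attribute names (and "CLASS") to values
    # attributes: attribute names still available, in the order they will be tried
    # domains:    dict mapping each attribute name to its full set of legal values
    classes = {e["CLASS"] for e in examples}

    if len(classes) == 0:               # empty subset -> null leaf
        return "NULL"
    if len(classes) == 1:               # pure subset -> store the class as a leaf
        return classes.pop()
    if not attributes:                  # nothing left to split on -> conflicting examples
        return "ERROR: " + "/".join(sorted(classes))

    attr, rest = attributes[0], attributes[1:]   # next attribute in the fixed order
    branches = {}
    for value in domains[attr]:                  # one branch per legal value
        subset = [e for e in examples if e[attr] == value]
        branches[value] = build_tree(subset, rest, domains)
    return (attr, branches)                      # decision node


def print_tree(tree, indent=""):
    # decision nodes are (attribute, branches) tuples; everything else is a leaf
    if isinstance(tree, tuple):
        attr, branches = tree
        print(indent + attr + "?")
        for value, subtree in branches.items():
            print(indent + "  " + value + ":")
            print_tree(subtree, indent + "    ")
    else:
        print(indent + tree)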

An example (animal classification)

4 input attributes:
    hair [T, F]
    swims [T, F]
    colour [white, brown, gray]
    size [small, medium, large]

3 classes:
    A = KANGAROO
    B = DOLPHIN
    C = WHALE

#   hair   swims   colour   size     CLASS
1   T      F       gray     medium   KANGAROO
2   T      F       brown    medium   KANGAROO
3   F      T       gray     large    DOLPHIN
4   F      T       white    medium   DOLPHIN
5   F      T       brown    large    WHALE
6   T      F       gray     large    KANGAROO
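
For illustration, the six training examples can be written as Python data for the build_tree sketch above (the dictionary layout and the DOMAINS table are assumptions of that sketch, not part of the original slides):

DOMAINS = {
    "hair":   ["T", "F"],
    "swims":  ["T", "F"],
    "colour": ["white", "brown", "gray"],
    "size":   ["small", "medium", "large"],
}

EXAMPLES = [
    {"hair": "T", "swims": "F", "colour": "gray",  "size": "medium", "CLASS": "KANGAROO"},  # 1
    {"hair": "T", "swims": "F", "colour": "brown", "size": "medium", "CLASS": "KANGAROO"},  # 2
    {"hair": "F", "swims": "T", "colour": "gray",  "size": "large",  "CLASS": "DOLPHIN"},   # 3
    {"hair": "F", "swims": "T", "colour": "white", "size": "medium", "CLASS": "DOLPHIN"},   # 4
    {"hair": "F", "swims": "T", "colour": "brown", "size": "large",  "CLASS": "WHALE"},     # 5
    {"hair": "T", "swims": "F", "colour": "gray",  "size": "large",  "CLASS": "KANGAROO"},  # 6
]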


Using the same training set, order of attribute selection: hair, swims, colour, size.
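
Hand-tracing the algorithm (or running the sketch above) with this ordering should give a tree along the following lines; this is my own reconstruction rather than the slide's diagram:

tree = build_tree(EXAMPLES, ["hair", "swims", "colour", "size"], DOMAINS)
print_tree(tree)
# Expected shape:
#   hair?
#     T: KANGAROO                 (examples 1, 2, 6)
#     F: swims?
#          T: colour?
#               white: DOLPHIN    (4)
#               brown: WHALE      (5)
#               gray:  DOLPHIN    (3)
#          F: NULL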


Change order of attribute selection: hair, colour, swims, size.
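
With colour tried before swims, the hair = F subset is separated in a single step, so the tree should come out shallower (again my reconstruction, using the earlier sketch):

tree = build_tree(EXAMPLES, ["hair", "colour", "swims", "size"], DOMAINS)
print_tree(tree)
# Expected shape:
#   hair?
#     T: KANGAROO                 (1, 2, 6)
#     F: colour?
#          white: DOLPHIN         (4)
#          brown: WHALE           (5)
#          gray:  DOLPHIN         (3)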


Change order of attribute selection: size, swims, colour, hair
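
Splitting on size first should give a noticeably larger tree, which is the point of the comparison (reconstruction as before):

tree = build_tree(EXAMPLES, ["size", "swims", "colour", "hair"], DOMAINS)
print_tree(tree)
# Expected shape:
#   size?
#     small:  NULL
#     medium: swims?
#               T: DOLPHIN           (4)
#               F: KANGAROO          (1, 2)
#     large:  swims?
#               T: colour?
#                    white: NULL
#                    brown: WHALE    (5)
#                    gray:  DOLPHIN  (3)
#               F: KANGAROO          (6)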


Introduce a conflicting example (the 'non-swimming' smallish whale!)

#   hair   swims   colour   size     CLASS
1   T      F       gray     medium   KANGAROO
2   T      F       brown    medium   KANGAROO
3   F      T       gray     large    DOLPHIN
4   F      T       white    medium   DOLPHIN
5   F      T       brown    large    WHALE
6   T      F       gray     large    KANGAROO
7   T      F       gray     medium   WHALE

Order of attribute selection: hair, colour, swims, size
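
Continuing the earlier sketch, adding example 7 and rebuilding with the same ordering should end in the algorithm's error branch, because examples 1 and 7 agree on every attribute value but not on the class:

conflict = EXAMPLES + [
    {"hair": "T", "swims": "F", "colour": "gray", "size": "medium", "CLASS": "WHALE"},  # 7
]
tree = build_tree(conflict, ["hair", "colour", "swims", "size"], DOMAINS)
print_tree(tree)
# The hair = T, colour = gray, swims = F, size = medium branch becomes an
# "ERROR: KANGAROO/WHALE" leaf; every other branch is unaffected.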


The tree is:

hair?                                        (examples 1 2 3 4 5 6 7)
    T: colour?                               (1 2 6 7)
        white: NULL
        brown: KANGAROO                      (2)
        gray:  swims?                        (1 6 7)
            T: NULL
            F: size?                         (1 6 7)
                small:  NULL
                medium: error                (1 and 7: same attribute values, different classes)
                large:  KANGAROO             (6)
    F: colour?                               (3 4 5)
        white: DOLPHIN                       (4)
        brown: WHALE                         (5)
        gray:  DOLPHIN                       (3)

How might an algorithm of this type handle missing data?

As the attribute value set has to be finite and discrete, the simplest way is to treat a missing value as an extra attribute value, e.g.

#   hair   swims   colour   size     CLASS
1   T      ?       gray     medium   KANGAROO
2   T      F       brown    medium   KANGAROO
3   F      T       gray     large    DOLPHIN
4   F      T       white    medium   DOLPHIN
5   F      T       brown    large    WHALE
6   T      F       gray     large    KANGAROO

The value set for swims now becomes {T, F, ?}
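
One way to realise this with the earlier sketch is simply to add "?" to the value set and use it literally in the data (swims is ordered first here only so that the extra branch is visible in the output):

DOMAINS_MISSING = dict(DOMAINS, swims=["T", "F", "?"])
with_missing = [dict(e) for e in EXAMPLES]
with_missing[0]["swims"] = "?"                      # example 1's swims value is unknown
tree = build_tree(with_missing, ["swims", "hair", "colour", "size"], DOMAINS_MISSING)
print_tree(tree)
# swims? now has three branches: T, F and "?", and example 1 follows the "?" branch.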

The rule sets can now be written down from the decision trees.

For example:

IF hair = T THEN KANGAROO

IF hair = F AND colour = white THEN DOLPHIN

IF hair = F AND colour = brown THEN WHALE

IF hair = F AND colour = gray THEN DOLPHIN

The corresponding tree is:

hair?
    T: KANGAROO
    F: colour?
        white: DOLPHIN
        gray:  DOLPHIN
        brown: WHALE
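
A small sketch (using the same hypothetical tree representation as above) that reads the rules off a tree: each root-to-leaf path becomes one IF ... THEN rule, skipping NULL and ERROR leaves.

def extract_rules(tree, conditions=()):
    if not isinstance(tree, tuple):                          # leaf reached
        if tree != "NULL" and not tree.startswith("ERROR"):
            lhs = " AND ".join(f"{a} = {v}" for a, v in conditions)
            print(f"IF {lhs} THEN {tree}")
        return
    attr, branches = tree
    for value, subtree in branches.items():
        extract_rules(subtree, conditions + ((attr, value),))

extract_rules(build_tree(EXAMPLES, ["hair", "colour", "swims", "size"], DOMAINS))
# Prints:
#   IF hair = T THEN KANGAROO
#   IF hair = F AND colour = white THEN DOLPHIN
#   IF hair = F AND colour = brown THEN WHALE
#   IF hair = F AND colour = gray THEN DOLPHIN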

Comments:

The choice of which attribute to split on is crucial: different orders can give trees of very different size and depth.

The algorithm can deal with missing data.

The algorithm can deal with conflict by flagging that it exists.

Next time …