
# A rule induction algorithm

ID3 (1986), the Iterative Dichotomiser 3, was followed by C4.5 (1993) and then C5.0 (post-2000) [Ross Quinlan]. The training set is partitioned into smaller and smaller subsets; a selection criterion forms the basis on which the training set is subdivided.


#### Presentation Transcript

ID3 (1986), the Iterative Dichotomiser 3, was followed by C4.5 (1993) and then C5.0 (post-2000) [Ross Quinlan].

- The training set is partitioned into smaller and smaller subsets.
- A selection criterion forms the basis on which the training set is subdivided.
- The algorithm uses a 'divide and conquer' method to build the tree.
- Data is divided into subsets until each subset contains a single class.
- The algorithm is recursive.

The basic algorithm

```
PROCEDURE BuildTree(ExampleSubset)
    NumberOfClasses = calculate the number of classes in ExampleSubset
    IF NumberOfClasses = 0 THEN
        Null leaf
    ELSE
        IF NumberOfClasses = 1 THEN
            Store the output class as a leaf in the tree
        ELSE
            DecisionNodeInput = determine the input to split on
            IF DecisionNodeInput = 0 THEN
                Error: more than one class has all the same attributes
            ELSE
                Create a decision node for DecisionNodeInput
                FOR all values of DecisionNodeInput
                    Determine the NewExampleSubset for this input value
                    BuildTree(NewExampleSubset)
                ENDFOR
            ENDIF
        ENDIF
    ENDIF
```
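The pseudocode above can be sketched as a short, runnable Python function. One assumption: the split attribute is taken from a fixed, caller-supplied order rather than chosen by ID3's information-gain criterion, which the slides do not spell out.

```python
def build_tree(examples, attributes, values):
    """examples: list of (attribute_dict, class_label) pairs.
    Returns a class-label leaf, None (null leaf), or an
    (attribute, {value: subtree}) decision node."""
    classes = {label for _, label in examples}
    if len(classes) == 0:
        return None                      # null leaf: no examples reach this branch
    if len(classes) == 1:
        return classes.pop()             # leaf: store the single output class
    if not attributes:                   # no input left to split on
        raise ValueError("more than one class has all the same attributes")
    attr, rest = attributes[0], attributes[1:]
    branches = {}
    for value in values[attr]:           # FOR all values of the decision input
        subset = [(a, c) for a, c in examples if a[attr] == value]
        branches[value] = build_tree(subset, rest, values)
    return (attr, branches)

# The six animal examples used in the slides that follow.
VALUES = {"hair": ["T", "F"], "swims": ["T", "F"],
          "colour": ["white", "brown", "gray"],
          "size": ["small", "medium", "large"]}
EXAMPLES = [
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "medium"}, "KANGAROO"),
    ({"hair": "T", "swims": "F", "colour": "brown", "size": "medium"}, "KANGAROO"),
    ({"hair": "F", "swims": "T", "colour": "gray",  "size": "large"},  "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "white", "size": "medium"}, "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "brown", "size": "large"},  "WHALE"),
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "large"},  "KANGAROO"),
]

tree = build_tree(EXAMPLES, ["hair", "swims", "colour", "size"], VALUES)
```

With the order hair, swims, colour, size, the root splits on hair: the T branch is pure KANGAROO and becomes a leaf immediately, while the F branch recurses on the remaining attributes.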

### A rule induction algorithm

An example (animal classification)

4 input attributes:

- hair [T, F]
- swims [T, F]
- colour [white, brown, gray]
- size [small, medium, large]

3 classes:

- A = KANGAROO
- B = DOLPHIN
- C = WHALE

| # | hair | swims | colour | size | class |
|---|------|-------|--------|--------|----------|
| 1 | T | F | gray | medium | KANGAROO |
| 2 | T | F | brown | medium | KANGAROO |
| 3 | F | T | gray | large | DOLPHIN |
| 4 | F | T | white | medium | DOLPHIN |
| 5 | F | T | brown | large | WHALE |
| 6 | T | F | gray | large | KANGAROO |


### A rule induction algorithm


Order of attribute selection will be hair, swims, colour, size


### A rule induction algorithm


Change the order of attribute selection: hair, colour, swims, size


### A rule induction algorithm


Change the order of attribute selection: size, swims, colour, hair
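The point of re-running the induction with different orders is that the tree's shape depends on which attributes are tested first. A minimal sketch, under the same fixed-order assumption as before (the helper names are illustrative, not from the slides): the hair-first tree needs fewer decision nodes than the size-first tree on the same six examples.

```python
def build_tree(examples, attributes, values):
    """Induce a tree with a fixed attribute order (assumption for illustration)."""
    classes = {label for _, label in examples}
    if len(classes) == 0:
        return None                      # null leaf
    if len(classes) == 1:
        return classes.pop()             # single-class leaf
    if not attributes:
        raise ValueError("more than one class has all the same attributes")
    attr, rest = attributes[0], attributes[1:]
    return (attr, {v: build_tree([(a, c) for a, c in examples if a[attr] == v],
                                 rest, values)
                   for v in values[attr]})

def count_decision_nodes(tree):
    """Leaves (labels or None) contribute 0; each decision node contributes 1."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + sum(count_decision_nodes(s) for s in tree[1].values())

VALUES = {"hair": ["T", "F"], "swims": ["T", "F"],
          "colour": ["white", "brown", "gray"],
          "size": ["small", "medium", "large"]}
EXAMPLES = [
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "medium"}, "KANGAROO"),
    ({"hair": "T", "swims": "F", "colour": "brown", "size": "medium"}, "KANGAROO"),
    ({"hair": "F", "swims": "T", "colour": "gray",  "size": "large"},  "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "white", "size": "medium"}, "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "brown", "size": "large"},  "WHALE"),
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "large"},  "KANGAROO"),
]

hair_first = build_tree(EXAMPLES, ["hair", "swims", "colour", "size"], VALUES)
size_first = build_tree(EXAMPLES, ["size", "swims", "colour", "hair"], VALUES)
```

Here hair-first produces 3 decision nodes and size-first produces 4; this sensitivity to attribute order is exactly what ID3's selection criterion (information gain) is designed to manage.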


### A rule induction algorithm

Introduce a conflicting example (the 'non-swimming', medium-sized whale!)

| # | hair | swims | colour | size | class |
|---|------|-------|--------|--------|----------|
| 1 | T | F | gray | medium | KANGAROO |
| 2 | T | F | brown | medium | KANGAROO |
| 3 | F | T | gray | large | DOLPHIN |
| 4 | F | T | white | medium | DOLPHIN |
| 5 | F | T | brown | large | WHALE |
| 6 | T | F | gray | large | KANGAROO |
| 7 | T | F | gray | medium | WHALE |

Order of attribute selection: hair, colour, swims, size
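Before building the tree, the conflict can be located directly: a sketch (example numbering as in the table above) that groups the rows by their full attribute tuple and reports any group containing more than one class.

```python
# Group examples by their complete attribute tuple; a group with two or more
# distinct class labels is a conflict the tree cannot resolve.
from collections import defaultdict

ROWS = [
    (1, ("T", "F", "gray",  "medium"), "KANGAROO"),
    (2, ("T", "F", "brown", "medium"), "KANGAROO"),
    (3, ("F", "T", "gray",  "large"),  "DOLPHIN"),
    (4, ("F", "T", "white", "medium"), "DOLPHIN"),
    (5, ("F", "T", "brown", "large"),  "WHALE"),
    (6, ("T", "F", "gray",  "large"),  "KANGAROO"),
    (7, ("T", "F", "gray",  "medium"), "WHALE"),   # the non-swimming whale
]

by_attrs = defaultdict(set)
for number, attrs, label in ROWS:
    by_attrs[attrs].add((number, label))

conflicts = [members for members in by_attrs.values()
             if len({label for _, label in members}) > 1]
```

Rows 1 and 7 share the attribute tuple (T, F, gray, medium) but carry different classes, so whatever attribute order is used, the branch containing them eventually runs out of attributes and hits the algorithm's error case.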


### A rule induction algorithm

The tree is (the example numbers reaching each node are shown after #):

```
Hair?                                  #1 2 3 4 5 6 7
├─ T → Colour?                         #1 2 6 7
│      ├─ white → NULL
│      ├─ gray  → Swims?               #1 6 7
│      │          ├─ T → NULL
│      │          └─ F → Size?         #1 6 7
│      │                 ├─ small  → NULL
│      │                 ├─ medium → error      #1 7
│      │                 └─ large  → KANGAROO   #6
│      └─ brown → KANGAROO             #2
└─ F → Colour?                         #3 4 5
       ├─ white → DOLPHIN              #4
       ├─ gray  → DOLPHIN              #3
       └─ brown → WHALE                #5
```

Examples 1 and 7 have identical attribute values but different classes, so the medium branch cannot be split any further and triggers the "more than one class has all the same attributes" error.

### A rule induction algorithm

How might an algorithm of this type handle missing data?

As the attribute value set has to be finite and discrete, the simplest approach is to treat a missing value as an extra attribute value, e.g.

| # | hair | swims | colour | size | class |
|---|------|-------|--------|--------|----------|
| 1 | T | ? | gray | medium | KANGAROO |
| 2 | T | F | brown | medium | KANGAROO |
| 3 | F | T | gray | large | DOLPHIN |
| 4 | F | T | white | medium | DOLPHIN |
| 5 | F | T | brown | large | WHALE |
| 6 | T | F | gray | large | KANGAROO |

The value set for swims now becomes {T, F, ?}
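In code, this amounts to appending '?' to the attribute's value set, after which partitioning proceeds exactly as before. A small sketch (the example rows are trimmed to the one relevant attribute for brevity):

```python
# Treat a missing value as just another attribute value, as the slide suggests.
VALUES = {"swims": ["T", "F"]}
VALUES["swims"] = VALUES["swims"] + ["?"]       # value set becomes {T, F, ?}

# Partitioning on swims now yields three subsets instead of two; example 1's
# unrecorded value simply routes it down the '?' branch.
examples = [
    ({"swims": "?"}, "KANGAROO"),   # example 1, swims unrecorded
    ({"swims": "F"}, "KANGAROO"),
    ({"swims": "T"}, "DOLPHIN"),
]
subsets = {v: [e for e in examples if e[0]["swims"] == v]
           for v in VALUES["swims"]}
```

No other part of the algorithm changes; '?' is indistinguishable from a genuine discrete value from the tree builder's point of view.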

The rule sets can now be written down from the decision trees.

For example

```
IF hair = T THEN KANGAROO
IF hair = F AND colour = white THEN DOLPHIN
IF hair = F AND colour = brown THEN WHALE
IF hair = F AND colour = gray THEN DOLPHIN
```

```
Hair?
├─ T → KANGAROO
└─ F → Colour?
       ├─ white → DOLPHIN
       ├─ gray  → DOLPHIN
       └─ brown → WHALE
```
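Writing the rules down mechanically is a walk from the root to each class leaf, collecting one IF-condition per edge. A sketch, using the same (attribute, {value: subtree}) encoding assumed in the earlier examples:

```python
def tree_to_rules(tree, conditions=()):
    """Emit one 'IF ... THEN class' rule per class leaf; null leaves yield none."""
    if tree is None:                      # null leaf: no rule
        return []
    if isinstance(tree, str):             # class leaf: one finished rule
        test = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {test} THEN {tree}"]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        rules += tree_to_rules(subtree, conditions + ((attribute, value),))
    return rules

# The tree shown above.
TREE = ("hair", {"T": "KANGAROO",
                 "F": ("colour", {"white": "DOLPHIN",
                                  "gray":  "DOLPHIN",
                                  "brown": "WHALE"})})
rules = tree_to_rules(TREE)
```

The four rules produced match the rule set listed above, one per leaf of the tree.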