A rule induction algorithm

ID3 (1986), the Interactive Dichotomizer 3, was followed by C4.5 (1993) and then C5.0 (>2000) [Ross Quinlan].

The training set is partitioned into smaller and smaller subsets.

A selection criterion forms the basis on which the training set is subdivided.

The algorithm uses a 'divide and conquer' method to build the tree.

The data is divided into subsets until each subset contains a single class.

The algorithm is recursive.

The basic algorithm

    PROCEDURE BuildTree(ExampleSubset)
        NumberOfClasses = calculate the number of classes in the example subset
        IF NumberOfClasses = 0 THEN
            Null leaf
        ELSE
            IF NumberOfClasses = 1 THEN
                Store the output class as a leaf in the tree
            ELSE
                DecisionNodeInput = determine the input attribute to split on
                IF DecisionNodeInput = 0 THEN
                    Error: more than one class has all the same attributes
                ELSE
                    Create a decision node for the DecisionNodeInput
                    FOR all values of the DecisionNodeInput
                        Determine the NewExampleSubset for this input value
                        BuildTree(NewExampleSubset)
                    ENDFOR
                ENDIF
            ENDIF
        ENDIF
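The pseudocode above can be sketched in Python. This is a minimal sketch, not the algorithm as Quinlan specified it: the slides leave the selection criterion open (ID3 proper uses information gain), so here the split attribute is simply taken from a caller-supplied ordering, `"ERROR"` stands in for the error leaf, and branches are created only for attribute values that actually occur in the subset (the pseudocode would also create NULL leaves for absent values).

```python
def build_tree(examples, attributes):
    """examples: list of (attribute-dict, class) pairs;
    attributes: attribute names still available, in the order to try them."""
    classes = {cls for _, cls in examples}
    if len(classes) == 0:
        return None                  # Null leaf: no examples reached this node
    if len(classes) == 1:
        return classes.pop()         # one class left: store it as a leaf
    if not attributes:
        # more than one class has all the same attributes
        return "ERROR"
    attr = attributes[0]
    branches = {}
    for value in {ex[attr] for ex, _ in examples}:
        subset = [(ex, cls) for ex, cls in examples if ex[attr] == value]
        branches[value] = build_tree(subset, attributes[1:])
    return (attr, branches)          # decision node: (attribute, {value: subtree})

# The six training examples from the animal-classification table
animals = [
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "medium"}, "KANGAROO"),
    ({"hair": "T", "swims": "F", "colour": "brown", "size": "medium"}, "KANGAROO"),
    ({"hair": "F", "swims": "T", "colour": "gray",  "size": "large"},  "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "white", "size": "medium"}, "DOLPHIN"),
    ({"hair": "F", "swims": "T", "colour": "brown", "size": "large"},  "WHALE"),
    ({"hair": "T", "swims": "F", "colour": "gray",  "size": "large"},  "KANGAROO"),
]

tree = build_tree(animals, ["hair", "swims", "colour", "size"])
```

With this ordering, hair = T immediately becomes a KANGAROO leaf (examples 1, 2, 6 share that class), and the hair = F branch goes on to split on swims and then colour.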

A rule induction algorithm
An example (animal classification)

4 input attributes

hair [T, F]

swims [T, F]

colour [white, brown, gray]

size [small, medium, large]

3 Classes

A = KANGAROO

B = DOLPHIN

C = WHALE

| # | hair | swims | colour | size | CLASS |
|---|------|-------|--------|--------|----------|
| 1 | T | F | gray | medium | KANGAROO |
| 2 | T | F | brown | medium | KANGAROO |
| 3 | F | T | gray | large | DOLPHIN |
| 4 | F | T | white | medium | DOLPHIN |
| 5 | F | T | brown | large | WHALE |
| 6 | T | F | gray | large | KANGAROO |



Order of attribute selection will be hair, swims, colour, size



Change the order of attribute selection to: hair, colour, swims, size



Change order of attribute selection: size, swims, colour, hair


A rule induction algorithm

| # | hair | swims | colour | size | CLASS |
|---|------|-------|--------|--------|----------|
| 1 | T | F | gray | medium | KANGAROO |
| 2 | T | F | brown | medium | KANGAROO |
| 3 | F | T | gray | large | DOLPHIN |
| 4 | F | T | white | medium | DOLPHIN |
| 5 | F | T | brown | large | WHALE |
| 6 | T | F | gray | large | KANGAROO |
| 7 | T | F | gray | medium | WHALE |

Order of attribute selection: hair, colour, swims, size


A rule induction algorithm

Same as example 1


The tree is (examples reaching each node shown in braces):

    Hair?
    ├─ T → {1, 2, 6, 7} → colour?
    │   ├─ white → NULL
    │   ├─ brown → {2} KANGAROO
    │   └─ gray  → {1, 6, 7} → swims?
    │       ├─ T → NULL
    │       └─ F → {1, 6, 7} → size?
    │           ├─ small  → NULL
    │           ├─ medium → {1, 7} error: conflicting examples
    │           └─ large  → {6} KANGAROO
    └─ F → {3, 4, 5} → colour?
        ├─ white → {4} DOLPHIN
        ├─ gray  → {3} DOLPHIN
        └─ brown → {5} WHALE
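The error arises because examples 1 and 7 agree on every attribute but carry different classes, so no split can separate them. Such conflicts can also be detected directly, without building the tree, by grouping the examples on their full attribute tuple. A small sketch (the tuple layout hair, swims, colour, size simply follows the table's column order):

```python
from collections import defaultdict

# Examples as (attribute-tuple, class): (hair, swims, colour, size)
examples = [
    (("T", "F", "gray",  "medium"), "KANGAROO"),  # 1
    (("T", "F", "brown", "medium"), "KANGAROO"),  # 2
    (("F", "T", "gray",  "large"),  "DOLPHIN"),   # 3
    (("F", "T", "white", "medium"), "DOLPHIN"),   # 4
    (("F", "T", "brown", "large"),  "WHALE"),     # 5
    (("T", "F", "gray",  "large"),  "KANGAROO"),  # 6
    (("T", "F", "gray",  "medium"), "WHALE"),     # 7
]

def conflicts(examples):
    """Map each full attribute tuple to its set of classes; keep the
    tuples with more than one class (these force an error leaf)."""
    by_attrs = defaultdict(set)
    for attrs, cls in examples:
        by_attrs[attrs].add(cls)
    return {a: c for a, c in by_attrs.items() if len(c) > 1}
```

Here `conflicts(examples)` reports the single clashing tuple (T, F, gray, medium) with classes {KANGAROO, WHALE}.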

A rule induction algorithm

How might an algorithm of this type handle missing data?

Since the attribute value sets must be finite and discrete, the simplest way is to treat a missing value as an extra attribute value, e.g.

| # | hair | swims | colour | size | CLASS |
|---|------|-------|--------|--------|----------|
| 1 | T | ? | gray | medium | KANGAROO |
| 2 | T | F | brown | medium | KANGAROO |
| 3 | F | T | gray | large | DOLPHIN |
| 4 | F | T | white | medium | DOLPHIN |
| 5 | F | T | brown | large | WHALE |
| 6 | T | F | gray | large | KANGAROO |

The value set for swims now becomes {T, F, ?}
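A minimal sketch of this preprocessing step, assuming missing entries arrive as Python `None` and are mapped to the token `"?"`, after which `"?"` behaves like any other discrete value of the attribute:

```python
# Training examples with a missing swims value in example 1 (None)
raw = [
    {"hair": "T", "swims": None, "colour": "gray",  "size": "medium"},
    {"hair": "T", "swims": "F",  "colour": "brown", "size": "medium"},
    {"hair": "F", "swims": "T",  "colour": "gray",  "size": "large"},
]

def fill_missing(example):
    """Replace each missing entry with the extra value '?'."""
    return {attr: ("?" if value is None else value)
            for attr, value in example.items()}

cleaned = [fill_missing(ex) for ex in raw]

# The value set for swims now includes the extra value
swims_values = sorted({ex["swims"] for ex in cleaned})
```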

The rule sets can now be written down from the decision trees.

For example:

IF hair = T THEN KANGAROO

IF hair = F AND colour = white THEN DOLPHIN

IF hair = F AND colour = brown THEN WHALE

IF hair = F AND colour = gray THEN DOLPHIN

A rule induction algorithm

    Hair?
    ├─ T → KANGAROO
    └─ F → colour?
        ├─ white → DOLPHIN
        ├─ gray  → DOLPHIN
        └─ brown → WHALE
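Reading the rules off such a tree is a straightforward recursive walk: each root-to-leaf path becomes one IF ... THEN rule. A sketch, with the tree written as nested (attribute, {value: subtree}) tuples, a representation chosen here for illustration rather than one the slides prescribe:

```python
# The six-example tree above, as nested (attribute, {value: subtree}) tuples
tree = ("hair", {
    "T": "KANGAROO",
    "F": ("colour", {
        "white": "DOLPHIN",
        "gray":  "DOLPHIN",
        "brown": "WHALE",
    }),
})

def rules(node, conditions=()):
    """Collect one IF ... THEN rule per root-to-leaf path."""
    if not isinstance(node, tuple):          # a leaf: emit the finished rule
        conds = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {conds} THEN {node}"]
    attr, branches = node
    out = []
    for value, subtree in branches.items():
        out.extend(rules(subtree, conditions + ((attr, value),)))
    return out

rule_set = rules(tree)
```

The four rules produced match the rule set written down above.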

A rule induction algorithm

The choice of which attribute to split on is crucial

The algorithm can deal with missing data

The algorithm can deal with conflict by flagging that it exists

Next time …