Ordinal Classification

Rob Potharst

Erasmus University Rotterdam

SIKS-Advanced Course on Computational Intelligence, October 2001


What is ordinal classification?

Company: catering service Swift
  • total liabilities / total assets 1
  • net income / net worth 3
  • … …
  • managers' work experience 5
  • market niche-position 3
  • …

bankruptcy risk + (acceptable)


Data set: 39 companies

2 2 2 2 1 3 5 3 5 4 2 4  +
4 5 2 3 3 3 5 4 5 5 4 5  +
3 5 1 1 2 2 5 3 5 5 3 5  +
2 3 2 1 2 4 5 2 5 4 3 4  +
3 4 3 2 2 2 5 3 5 5 3 5  +
3 5 3 3 3 2 5 3 4 4 3 4  +
3 5 2 3 4 4 5 4 4 5 3 5  +
1 1 4 1 2 3 5 2 4 4 1 4  +
3 4 3 3 2 4 4 2 4 3 1 3  +
3 4 2 1 2 2 4 2 4 4 1 4  +
2 5 1 1 3 4 4 3 4 4 3 4  +
3 3 4 4 3 4 4 2 4 4 1 3  +
1 1 2 1 1 3 4 2 4 4 1 4  +
2 1 1 1 4 3 4 2 4 4 3 3  +
2 3 2 1 1 2 4 4 4 4 2 5  +
2 3 4 3 1 5 4 2 4 3 2 3  +
2 2 2 1 1 4 4 4 4 4 2 4  +
2 1 3 1 1 3 5 2 4 2 1 3  +
2 1 2 1 1 3 4 2 4 4 2 4  +
2 1 2 1 1 5 4 2 4 4 2 4  +
2 1 1 1 1 3 2 2 4 4 2 3  ?
1 1 3 1 2 1 3 4 4 4 3 4  ?
2 1 2 1 1 2 4 3 3 2 1 2  ?
1 1 1 1 1 1 3 2 4 4 2 3  ?
2 2 2 1 1 3 3 2 4 4 2 3  ?
2 2 1 1 1 3 2 2 4 4 2 3  ?
2 1 2 1 1 3 2 2 4 4 2 4  ?
1 1 4 1 3 1 2 2 3 3 1 2  ?
3 4 4 3 2 3 3 4 4 4 3 4  ?
3 1 3 3 1 2 2 3 4 4 2 3  ?
1 1 2 1 1 1 3 3 4 4 2 3  -
3 5 2 1 1 1 3 2 3 4 1 3  -
2 2 1 1 1 1 3 3 3 4 3 4  -
2 1 1 1 1 1 2 2 3 4 3 4  -
1 1 2 1 1 1 3 1 4 3 1 2  -
1 1 3 1 2 1 2 1 3 3 2 3  -
1 1 1 1 1 1 2 2 4 4 2 3  -
1 1 3 1 1 1 1 1 4 3 1 3  -
2 1 1 1 1 1 1 1 2 1 1 2  -

20: + (acceptable)

9: - (unacceptable)

10: ? (uncertain)

from: Greco, Matarazzo, Slowinski (1996)
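As a minimal sketch, one of these rows could be represented in Python as an (attribute vector, class) pair. Encoding the ordinal classes '-' < '?' < '+' as 0 < 1 < 2 is an assumption for illustration; the slide gives only the symbols.

CLASS_ORDER = {'-': 0, '?': 1, '+': 2}   # assumed ordinal encoding

def parse_row(line):
    """Split a data row into (attribute_vector, class_label)."""
    *attrs, label = line.split()
    return tuple(map(int, attrs)), CLASS_ORDER[label]

print(parse_row("2 2 2 2 1 3 5 3 5 4 2 4 +"))
# ((2, 2, 2, 2, 1, 3, 5, 3, 5, 4, 2, 4), 2)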

Possible classifier

if man.exp. > 4, then class = ‘+’

if man.exp. < 4 and net.inc/net.worth = 1, then class = ‘-’

all other cases: class = ‘?’

  • when applied to the dataset of 39 companies: 3 mistakes (a code sketch of these rules follows)
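A direct transcription of the three rules as a Python function. The two attribute values are passed in explicitly because the slides do not say which of the 12 data columns they correspond to:

def classify(man_exp, net_inc_net_worth):
    """The slide's three-rule classifier, transcribed verbatim."""
    if man_exp > 4:
        return '+'
    if man_exp < 4 and net_inc_net_worth == 1:
        return '-'
    return '?'            # all other cases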

What is classification?

The act of assigning objects to classes, using the values of relevant features of those objects

  • So we need:
  • objects (individuals, cases), all belonging to some domain
  • classes, number and kind prescribed
  • features (attributes, variables)
  • a classifier (classification function) that assigns a class to any object

Building classifiers
  • = induction from a training set of examples:
    • data without noise
    • data with noise

Induction methods (especially from the AI world)
  • decision trees: C4.5, CART (from 1984 on)
  • neural networks: backpropagation (from 1986, with a false start from 1974)
  • rule induction algorithms: CN2 (1989)
  • newer methods: rough sets, fuzzy methods, decision lists, pattern-based methods, etc.

Decision tree: example

man.exp. < 3?
  y: gen.exp./sales = 1?
       y: tot.liab/cashfl = 1?
            y: class = -
            n: class = ?
       n: class = ?
  n: class = +

classifies 37 out of 39 examples correctly
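The same tree as nested ifs. The branch orientation is reconstructed from the flattened slide layout, so treat it as illustrative:

def tree_classify(man_exp, gen_exp_sales, tot_liab_cashfl):
    """Example decision tree as code (reconstructed branch layout)."""
    if man_exp < 3:
        if gen_exp_sales == 1:
            return '-' if tot_liab_cashfl == 1 else '?'
        return '?'
    return '+'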

Ordinal classification
  • features have ordinal scale
  • classes have ordinal scale
  • the ordering must be preserved!

Preservation of ordering

A classifier is monotone iff:

if A ≤ B, then also class(A) ≤ class(B)
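A brute-force check of this property on a finite set of labelled examples, as a sketch (classes must already be on an ordinal numeric scale, e.g. the 0 < 1 < 2 encoding used earlier):

from itertools import combinations

def dominates(a, b):
    """a <= b in the componentwise order on attribute vectors."""
    return all(x <= y for x, y in zip(a, b))

def is_monotone(examples):
    """examples: list of (attribute_vector, class_label) pairs."""
    for (x, cx), (y, cy) in combinations(examples, 2):
        if dominates(x, y) and cx > cy:
            return False
        if dominates(y, x) and cy > cx:
            return False
    return True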

Relevance of ordinal classification
  • selection problems
  • credit worthiness
  • pricing (e.g. real estate)
  • etc.

Induction of monotone decision trees
  • using C4.5 or CART: non-monotone trees
  • needed: an algorithm that is guaranteed to generate only monotone trees
  • Makino, Ibaraki et al. (1996): only for 2-class problems, cumbersome
  • Potharst & Bioch (2000): for k-class problems, fast and efficient

The algorithm

try to split subset T:

1) update D for subset T
2) if D ∩ T is homogeneous then
      assign class label to T and make T a leaf definitively
   else
      split T into two non-empty subsets TL and TR, using entropy
      try to split subset TL
      try to split subset TR
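A Python skeleton of this recursion, as a sketch. Here T is the set of attribute vectors in the current subset, D maps vectors to class labels, and update_d and best_entropy_split stand for the subroutines described on the following slides; Leaf and Node are illustrative containers, not names from the paper:

from dataclasses import dataclass

@dataclass
class Leaf:
    label: int

@dataclass
class Node:
    test: str      # e.g. "A1 <= 0"
    left: object   # subtree for the 'yes' branch
    right: object  # subtree for the 'no' branch

def try_to_split(T, D, update_d, best_entropy_split):
    """Recursive step from the slide; helpers are passed in."""
    update_d(D, T)                             # 1) update D for T
    labels = {D[x] for x in T if x in D}
    if len(labels) == 1:                       # 2) D ∩ T homogeneous
        return Leaf(labels.pop())              #    -> make T a leaf
    test, TL, TR = best_entropy_split(T, D)    # entropy-based split
    return Node(test,
                try_to_split(TL, D, update_d, best_entropy_split),
                try_to_split(TR, D, update_d, best_entropy_split))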

The update rule

update D for T:

1) if min(T) is not in D then
   - add min(T) to D
   - class( min(T) ) = the maximal value allowed, given D
2) if max(T) is not in D then
   - add max(T) to D
   - class( max(T) ) = the minimal value allowed, given D
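The update rule as a Python sketch, with T given as a set of attribute vectors (so min(T) and max(T) are its componentwise corners). The helpers max_allowed and min_allowed are defined on the next two slides and sketched there:

def update_d(D, T, n_classes):
    """Add the corners of T to D with the most permissive labels
    that keep D monotone (see the next two slides)."""
    lo = tuple(map(min, zip(*T)))   # min(T)
    hi = tuple(map(max, zip(*T)))   # max(T)
    if lo not in D:
        D[lo] = max_allowed(lo, D, n_classes)  # maximal value allowed
    if hi not in D:
        D[hi] = min_allowed(hi, D)             # minimal value allowed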

The minimal value allowed given D
  • For each x ∈ X \ D it is possible to calculate the minimal and the maximal class value possible, given D.
  • Let ↓x be the downset { y ∈ X | y ≤ x } of x
  • Let y* be an element of D ∩ ↓x with highest class value
  • Then the minimal class value possible for x is class(y*).

The maximal value allowed given D
  • Let ↑x be the upset { y ∈ X | y ≥ x } of x
  • Let y* be an element of D ∩ ↑x with lowest class value
  • Then the maximal class value possible for x is class(y*)
  • if there is no such element, take the maximal class value (or the minimal, in the former case)
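Both bounds as a Python sketch. Classes are assumed to be 0 … n_classes-1, and dominates is repeated here so the snippet is self-contained:

def dominates(a, b):
    """a <= b in the componentwise order."""
    return all(x <= y for x, y in zip(a, b))

def min_allowed(x, D):
    """Highest class among D-elements in the downset of x,
    or the bottom class 0 if there are none."""
    down = [c for y, c in D.items() if dominates(y, x)]
    return max(down) if down else 0

def max_allowed(x, D, n_classes):
    """Lowest class among D-elements in the upset of x,
    or the top class if there are none."""
    up = [c for y, c in D.items() if dominates(x, y)]
    return min(up) if up else n_classes - 1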

Example

X: all vectors 0 0 0, 1 0 0, 0 1 0, …, up to 2 2 2

D (attribute vector → class):
0 0 1 → 0
0 0 2 → 1
1 1 2 → 2
2 0 2 → 2
2 1 2 → 3

attr. 1: values 0, 1, 2
attr. 2: values 0, 1, 2
attr. 3: values 0, 1, 2
classes: 0, 1, 2, 3

Let us calculate the min and max possible value for x = 022:

min value: y* = 002, so the min value = 1

max value: there is no y*, so the max value = 3
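Running the min_allowed/max_allowed sketch from above on this D reproduces the slide's numbers:

D = {(0, 0, 1): 0, (0, 0, 2): 1, (1, 1, 2): 2,
     (2, 0, 2): 2, (2, 1, 2): 3}

print(min_allowed((0, 2, 2), D))      # 1: y* = 002 has class 1
print(max_allowed((0, 2, 2), D, 4))   # 3: empty upset -> top class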

Tracing the algorithm

Try to split subset T = X:

update D for X:

min(X) = 000 is not in D; max value of 000 is 0

add 000 with class 0 to D

max(X) = 222 is not in D; min value of 222 is 3

add 222 with class 3 to D

D ∩ X is not homogeneous,

so consider all the possible splits:

A1 ≤ 0; A1 ≤ 1; A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

Updated D (attribute vector → class):
0 0 0 → 0
0 0 1 → 0
0 0 2 → 1
1 1 2 → 2
2 0 2 → 2
2 1 2 → 3
2 2 2 → 3

The entropy of each split

The split A1 ≤ 0 splits X into TL = [000,022] and TR = [100,222]

D ∩ TL:
0 0 0 → 0
0 0 1 → 0
0 0 2 → 1
Entropy = 0.92

D ∩ TR:
1 1 2 → 2
2 0 2 → 2
2 1 2 → 3
2 2 2 → 3
Entropy = 1

Average entropy of this split = 3/7 × 0.92 + 4/7 × 1 = 0.97
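A quick Python check of these numbers, as a sketch; entropy here is the standard Shannon entropy over the class labels:

from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

left  = [0, 0, 1]       # classes in D ∩ TL
right = [2, 2, 3, 3]    # classes in D ∩ TR
n = len(left) + len(right)
avg = len(left) / n * entropy(left) + len(right) / n * entropy(right)
print(f"{entropy(left):.2f} {entropy(right):.2f} {avg:.2f}")
# 0.92 1.00 0.96  (the slide's 0.97 comes from rounding 0.92 first)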

Going on with the trace

The split with the lowest entropy is A1 ≤ 0, so we go on with T = TL = [000,022]:

Try to split subset T= [000,022]:

update D for T:

min(T) = 000 is already in D

max(T) = 022 has min value 1, so it is added to D

Updated D (attribute vector → class):
0 0 0 → 0
0 0 1 → 0
0 0 2 → 1
0 2 2 → 1
1 1 2 → 2
2 0 2 → 2
2 1 2 → 3
2 2 2 → 3

D ∩ T is not homogeneous, so we go on to consider

the following splits: A2 ≤ 0; A2 ≤ 1; A3 ≤ 0; A3 ≤ 1

Lowest entropy: A3 ≤ 1

We now have the following tree:

A1 ≤ 0?
  y: A3 ≤ 1?
       y: ?
       n: ?
  n: ?

Going on...

The split A3 ≤ 1 splits T into TL = [000,021] and TR = [002,022]

We go on with T = TL = [000,021]

Try to split subset T= [000,021]:

min(T) = 000 is already in D

max(T) = 021 has min value 0, so it is added to D

D ∩ T is homogeneous, so we stop and make T into a leaf with class value 0

Next, we go on with T = TR = [002,022], etc.

Finally...

A1 ≤ 0?
  y: A3 ≤ 1?
       y: class 0
       n: class 1
  n: A1 ≤ 1?
       y: class 2
       n: A2 ≤ 0?
            y: class 2
            n: class 3
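The final tree as nested ifs. The branch layout is reconstructed from the trace (leaf intervals [000,021] → 0, [002,022] → 1, and so on), so treat the orientation as illustrative:

def final_tree(a1, a2, a3):
    """Reconstructed final monotone tree from the trace."""
    if a1 <= 0:
        return 0 if a3 <= 1 else 1
    if a1 <= 1:
        return 2
    return 2 if a2 <= 0 else 3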

A monotone tree for the Bankruptcy problem
  • can be seen on p. 107 of the paper that was handed out with this course
  • a tree with 6 leaves
  • uses the same attributes as those that come up in an ordinal version of the rough set approach: see Viara Popova's lecture

Conclusions and remaining problems
  • We described an efficient algorithm for the induction of monotone decision trees, for the case of a monotone dataset
  • We also have an algorithm to repair a non-monotone decision tree, but it makes the tree larger
  • What if we have noise in the dataset?
  • Is it possible to repair by pruning?
