
Knowledge Integration by Genetic Algorithms

Prof. Tzung-Pei Hong

Department of Electrical Engineering, National University of Kaohsiung

Outline
  • Introduction
  • Review
    • GAs
    • Fuzzy Sets
    • Related Studies
  • Knowledge Integration Strategies
    • Classification Rules
    • Association Rules
  • Conclusions
Why Knowledge Integration
  • Four reasons
    1. Knowledge is distributed among sources (rule bases RB1, ..., RBi, ..., RBn integrated into a global rule base GRB, used by an expert system through a user interface)
    2. It increases the reliability of knowledge-based systems
    3. Knowledge can be reused
    4. It reduces the effort of developing an expert system or decision support system

Why Using GAs?
  • Integrating rule bases RB1, ..., RBi, ..., RBn must satisfy four criteria:
    1. Completeness
    2. Correctness
    3. Consistency
    4. Conciseness
  • Knowledge integration is therefore a multi-objective optimization problem
  • GAs are good at finding optimal or nearly optimal solutions to such problems

Vague Knowledge
  • In real-world applications, knowledge sources or data (RB1, ..., RBi, ..., RBn) often contain linguistic or ambiguous information
  • This vagueness greatly influences the resulting knowledge base

Benefits
  • Medsker [95]
    • Knowledge integrated from different sources has good validity
    • Integrated knowledge can deal with more complex problems
    • Knowledge integration may improve the performance of the knowledge base
    • Integrating would facilitate building bigger and better systems cheaply
Traditional Knowledge Integration
  • Problems
    • When conflicts occur, domain experts must intervene in the integration process
    • Subjective
    • Time consuming
    • Limited integration: only a small number of knowledge sources can be handled
    • With more knowledge sources, integration becomes more difficult and complex
Our Goals
  • Solve potential conflicts and contradictions
  • Integrate knowledge without human experts' intervention
  • Improve the integration speed
  • Scale up to a large number of knowledge sources
History of GAs
  • GA: Genetic Algorithm
  • History
    • John Holland (1975)
    • K. A. De Jong
    • D. E. Goldberg

Idea of GA
  • Survival of the fittest
  • Iterative Procedure
  • Genetic operators
    • Reproduction
    • Crossover
    • Mutation
  • Near optimal solution
Simple Genetic Algorithms
  • Start: initialize a population of individuals
  • Evaluate each individual's fitness value
  • Select the superior individuals for reproduction
  • Apply crossover and perhaps mutation
  • Evaluate each new individual's fitness value
  • Quit when 1) the maximum number of generations is reached, 2) the time limit is reached, or 3) the population has converged; otherwise return to the selection step
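To make the loop concrete, here is a minimal sketch in Python of a simple GA of this kind; the 12-bit representation, the population size of 40, the objective f(t) = t(1 - t), and the truncation-style selection are illustrative assumptions, not details taken from the slides.

```python
import random

def decode(bits):                        # 12-bit chromosome -> t in [0, 1]
    return int(bits, 2) / (2 ** len(bits) - 1)

def fitness(bits):                       # assumed objective: maximize f(t) = t * (1 - t)
    t = decode(bits)
    return t * (1.0 - t)

def crossover(p1, p2):                   # one-point crossover
    cp = random.randint(1, len(p1) - 1)
    return p1[:cp] + p2[cp:], p2[:cp] + p1[cp:]

def mutate(bits, rate=0.01):             # bit-flip mutation
    return "".join((("1" if b == "0" else "0") if random.random() < rate else b) for b in bits)

def run_ga(pop_size=40, length=12, generations=100):
    pop = ["".join(random.choice("01") for _ in range(length)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)          # evaluate and rank
        parents = pop[: pop_size // 2]               # select the superior half
        children = []
        while len(children) < pop_size - len(parents):
            c1, c2 = crossover(*random.sample(parents, 2))
            children += [mutate(c1), mutate(c2)]
        pop = parents + children[: pop_size - len(parents)]
    return max(pop, key=fitness)

best = run_ga()
print(decode(best), fitness(best))       # near t = 0.5 for the assumed objective
```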

An Example
  • Given a function f(t), find the value of t that maximizes it
Step 1
  • Define a suitable representation
    • Each chromosome has 12 bits
    • e.g.
      t = 0      →  000000000000
      t = 1      →  111111111111
      t = 0.680  →  101011100001

Step 2
  • Create an initial population of N individuals
    • N = population size
    • Assume N = 40
Step 3
  • Define a suitable fitness function f to evaluate the individuals
    • Fitness function: f(t)
    • e.g. evaluate the first six individuals (table not preserved in this transcript)
Step 4
  • Perform the crossover and the mutation operations to generate the possible offsprings
Crossover
  • Offsprings:
    • Inheriting some characteristics of their parents
  • e.g.

Parent 1 : 00011 0000001

Parent 2 : 01001 1001101

Child 1 : 000111001101

Child 2 : 010010000001

Mutation
  • Offspring
    • may possess characteristics different from their parents
    • preserves a reasonable level of population diversity
  • e.g. Bit change
  • e.g. Inversion

0 1 1 1 0 0 0 0 0 1 0 0

1 1 1 1 0 0 0 0 0 1 0 0

1 1 1 0 1 1 0 0 0 1 0 0

1 1 1 1 0 1 0 0 0 1 0 0

New Offspring
  • The new offspring produced by the operators (table not preserved in this transcript)
Step 5
  • Replace the old individuals with the new offspring
  • e.g. the first six individuals (table not preserved in this transcript)
Step 6
  • If the termination criteria are not satisfied, go to Step 4; otherwise, stop the genetic algorithm
    • Termination criteria
      • The maximum number of generations is reached
      • The time limit is reached
      • The population has converged
Fuzzy Sets
  • Traditional computer decisions
    • Everything is either right (1) or wrong (0)
      e.g., if people up to 25 count as young, does turning 26 make someone middle-aged?
      e.g., a score of 60 or above is a pass, so anything below 60 is a fail
  • What is fuzziness?
    • Between right (1) and wrong (0), add several more grades
      • Almost right (0.8)
      • Possibly right (0.6)
      • Possibly wrong (0.4)
      • Almost wrong (0.2)
Fuzzy Sets
  • Dividing into more and more grades approaches a continuous scale
  • Question: is a height of 168 cm considered tall?
  • (Figure: membership degree versus height in cm, with axis ticks at 160, 170, 180)

Example: "Close to 0"
  • Define a membership function, e.g. μA(x) = 1 / (1 + 10x²)
  • e.g.
    • μA(3) = 0.01
    • μA(1) = 0.09
    • μA(0.25) = 0.62
    • μA(0) = 1
Example: "Very Close to 0"
  • "Very close to 0" uses a narrower membership function; a common choice is to square μA(x), the standard concentration hedge (the slide's exact formula is not preserved in this transcript)

Fuzzy Set (Cont.)
  • A membership function maps each element to a value in [0, 1]
  • e.g. sunny : x → [0, 1]; a particular day x may be 0.8 sunny, 0.6 sunny, or only 0.1 sunny
Fuzzy Set
  • Simple
  • Intuitively pleasing
  • A generalization of the crisp set
  • In a crisp set the change from member to non-member is abrupt (0 or 1); in a fuzzy set it is gradual (e.g. "sunny" grades from 1 through 0.8, 0.6, 0.4, 0.2 down to 0 for "not sunny")

Fuzzy Operations
  • Intersection (AND)
    • Take the smaller membership value
      e.g. a student is smart (0.8) AND hard-working (0.6), so the student is a model student (0.6)
  • Union (OR)
    • Take the larger membership value
      e.g. a student is smart (0.8) OR hard-working (0.6), so the student is a model student (0.8)
  • Complement (NOT)
    • Take one minus the membership value
      e.g. if a student is smart with degree 0.8, then "not smart" has degree 0.2
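A minimal sketch of these three operators in Python, reusing the slide's smart/hard-working example (the function and variable names are ours):

```python
def fuzzy_and(a, b):      # intersection: take the smaller membership value
    return min(a, b)

def fuzzy_or(a, b):       # union: take the larger membership value
    return max(a, b)

def fuzzy_not(a):         # complement: one minus the membership value
    return 1.0 - a

smart, hard_working = 0.8, 0.6
print(fuzzy_and(smart, hard_working))       # 0.6 -> model student by AND
print(fuzzy_or(smart, hard_working))        # 0.8 -> model student by OR
print(round(fuzzy_not(smart), 2))           # 0.2 -> "not smart"
```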

Fuzzy Inference Example
  • Feature values (big eyes, small mouth, good figure):
    • 陶晶瑩: 0, 0.8, 0.3
    • 張惠妹: 1, 0.6, 0.8
    • 李玟: 0, 0.3, 0.9
    • 李心潔: 0.7, 0.1, 0.5
    • 蔡依林: 0.8, 0.5, 0.3
  • Prof. Hong's (tongue-in-cheek) selection criterion: (big eyes AND small mouth) OR good figure
  • Question: who is the best choice?

Answer
  • For 陶晶瑩: (0 AND 0.8) OR 0.3 = 0 OR 0.3 = 0.3
  • For 張惠妹: (1 AND 0.6) OR 0.8 = 0.6 OR 0.8 = 0.8
  • For 李玟: (0 AND 0.3) OR 0.9 = 0 OR 0.9 = 0.9
  • For 李心潔: (0.7 AND 0.1) OR 0.5 = 0.1 OR 0.5 = 0.5
  • For 蔡依林: (0.8 AND 0.5) OR 0.3 = 0.5 OR 0.3 = 0.5
  • 李玟 is the best choice!

Thank you!

Fuzzy Decision
  • A = {A1, A2, A3, A4, A5}
    • A set of alternatives
  • C = {C1, C2, C3}
    • A set of criteria
Example (Cont.)
  • Assume the criterion: C1 and C2, or C3
    • E(Ai): evaluation function
      • E(A1) = (0 ∧ 0.8) ∨ 0.3 = 0 ∨ 0.3 = 0.3
      • E(A2) = (1 ∧ 0.6) ∨ 0.8 = 0.6 ∨ 0.8 = 0.8
      • E(A3) = (0 ∧ 0.3) ∨ 0.9 = 0 ∨ 0.9 = 0.9 → the best choice
      • E(A4) = (0.7 ∧ 0.1) ∨ 0.5 = 0.1 ∨ 0.5 = 0.5
      • E(A5) = (0.8 ∧ 0.5) ∨ 0.3 = 0.5 ∨ 0.3 = 0.5
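The same evaluation can be written as a short Python sketch (the criterion degrees are copied from the table above; min/max are the standard fuzzy AND/OR):

```python
# Evaluate E(Ai) = (C1 AND C2) OR C3 with min/max fuzzy operators.
alternatives = {                 # (C1, C2, C3) degrees for A1..A5
    "A1": (0.0, 0.8, 0.3),
    "A2": (1.0, 0.6, 0.8),
    "A3": (0.0, 0.3, 0.9),
    "A4": (0.7, 0.1, 0.5),
    "A5": (0.8, 0.5, 0.3),
}
scores = {a: max(min(c1, c2), c3) for a, (c1, c2, c3) in alternatives.items()}
print(scores)                            # A3 scores 0.9
print(max(scores, key=scores.get))       # 'A3' -> the best choice
```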
Review of Knowledge Integration
  • Knowledge integration approaches
    • Cooperative approach
    • Centralized approach
  • Techniques used in the literature include: blackboard, LPC model, integrity constraints, repertory grid, genetic algorithm, decision table
GA-Based Classifier Systems
  • Michigan approach
    • Each chromosome encodes a single rule (rule 1, rule 2, ..., rule n)
  • Pittsburgh approach
    • Each chromosome encodes an entire rule set (rule set 1, rule set 2, ..., rule set m)
Genetic Knowledge Integration
  • Roadmap of approaches, building on the Michigan and Pittsburgh approaches
    • Crisp knowledge: GKIDSO, TPGKI, and MGKI approaches
    • Vague knowledge: GFKILM, GFKIGM, TPGFKI, and MGFKI approaches
Integration of Classification Rules
  • Four Methods
    • GKIDSO
      • Genetic Knowledge-Integration approach with Domain-Specific Operators
    • TPGKI
      • Two-Phase Genetic Knowledge Integration
    • GFKILM
      • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions
    • GFKIGM
      • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions
Genetic Knowledge-Integration Framework
  • Knowledge sources: expert groups (through knowledge-acquisition tools) and training data sets (through machine-learning methods), each producing a rule set and a dictionary
  • Encoding: using the global feature set and class set, each rule set is encoded into an intermediary representation
  • GA-based knowledge integration: the encoded rule sets are integrated against an integrating case set, yielding the final knowledge base and its dictionary
Knowledge Integration
  • Processing flow
    1. Knowledge input: collect the rule sets
    2. Knowledge encoding
    3. Genetic knowledge integration (driven by an integration data set)
    4. Knowledge verification
    5. Knowledge decoding into the final knowledge base
GKIDSO Approach
  • Genetic Knowledge-Integration approach with Domain-Specific Operators
  • Consists of two parts
    • Knowledge encoding: each rule set RS1, ..., RSm is encoded into a chromosome, and these chromosomes form the initial population (generation 0)
    • Knowledge integration: genetic operators evolve the population from generation 0 to generation k
Knowledge Encoding
  • Each rule in a rule set is first translated into an intermediary rule
  • Each intermediary rule is encoded into a fixed-length rule string
  • The rule strings are concatenated into a variable-length rule-set string
Example: Brain Tumor
  • Two classes: {Adenoma, Meningioma}
  • Three features:
    • {Location, Calcification, Edema}
  • Feature values for Location
    • {brain surface, sellar, brain stem}
  • Feature values for Calcification
    • {no, marginal, vascular-like, lumpy}
  • Feature values for Edema
    • {no, < 2 cm, < 0.5 hemisphere}
Intermediary Rules
  • Two rules
    • R1: IF (Location = sellar) and (Calcification = no) then Adenoma
    • R2: IF (Location = brain surface) and (Edema < 2 cm) then Meningioma
  • A dummy test is added for each feature a rule does not mention
    • R1: IF (Location = sellar) and (Calcification = no) and (Edema = no, or < 2 cm, or < 0.5 hemisphere) then Adenoma
    • R2: IF (Location = brain surface) and (Calcification = no, or marginal, or vascular-like, or lumpy) and (Edema < 2 cm) then Meningioma

Fixed-Length Rule String
  • Bit fields: Location (3 bits), Calcification (4 bits), Edema (3 bits), Class (2 bits)
    • R1: 010 1000 111 10
      IF (Location = sellar) and (Calcification = no) and (Edema = no, or < 2 cm, or < 0.5 hemisphere) then Adenoma
    • R2: 100 1111 010 01
      IF (Location = brain surface) and (Calcification = no, or marginal, or vascular-like, or lumpy) and (Edema < 2 cm) then Meningioma
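A minimal sketch of this one-bit-per-value encoding in Python; the feature orderings follow the brain-tumor example above, while the helper names and dictionary layout are our own assumptions.

```python
# One bit per feature value; a set bit means the rule allows that value.
FEATURES = {
    "Location":      ["brain surface", "sellar", "brain stem"],
    "Calcification": ["no", "marginal", "vascular-like", "lumpy"],
    "Edema":         ["no", "< 2 cm", "< 0.5 hemisphere"],
}
CLASSES = ["Adenoma", "Meningioma"]

def encode_rule(conditions, label):
    """conditions: {feature: allowed values}; an unmentioned feature gets a dummy test (all 1s)."""
    bits = ""
    for feature, values in FEATURES.items():
        allowed = conditions.get(feature, values)
        bits += "".join("1" if v in allowed else "0" for v in values)
    return bits + "".join("1" if c == label else "0" for c in CLASSES)

r1 = encode_rule({"Location": ["sellar"], "Calcification": ["no"]}, "Adenoma")
print(r1)   # "010" + "1000" + "111" + "10" -> 010100011110
```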

Knowledge Integration
  • The initial population consists of the encoded rule sets (rule set 1 to rule set n)
  • Genetic operations: crossover, mutation, fusion, and fission
  • A fitness function evaluates each candidate rule set, producing generation 1 and subsequent generations
Fitness Function
  • Defined formally over each rule-set chromosome; the formula itself is not preserved in this transcript, apart from a control parameter that weights its terms
Crossover
  • Two parent rule-set strings RS1 (rules r11, ..., r1i, ..., r1n) and RS2 (rules r21, ..., r2j, ..., r2m) are each cut at a randomly chosen crossover point; the cut may fall inside a rule string
  • The substrings after the two crossover points are exchanged, producing two offspring rule-set strings O1 and O2, which may contain different numbers of rules than their parents
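A minimal sketch of this variable-length crossover in Python (the parent strings below are arbitrary example bit strings, and the cut points are not forced onto rule boundaries, matching the diagram's mid-rule cuts):

```python
import random

def rule_set_crossover(rs1, rs2):
    """Cut each parent rule-set string at its own random point and exchange the tails,
    so the two offspring may contain different numbers of rules than their parents."""
    cp1 = random.randint(1, len(rs1) - 1)
    cp2 = random.randint(1, len(rs2) - 1)
    return rs1[:cp1] + rs2[cp2:], rs2[:cp2] + rs1[cp1:]

parent1 = "010100011110" * 3        # a 3-rule set (12 bits per rule)
parent2 = "100111101001" * 2        # a 2-rule set
child1, child2 = rule_set_crossover(parent1, parent2)
print(len(child1), len(child2))     # offspring lengths generally differ from the parents'
```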

Fusion
  • Eliminate redundancy and subsumption
    • Redundancy
      • R1: if A then B
      • R2: if A then B
    • Subsumption
      • R1: if A and C then B
      • R2: if A then B
Fusion (Cont.)
  • Eliminate redundancy
  • Eliminate subsumption
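As a small illustration, here is a Python sketch of how redundancy and subsumption could be detected on the bit-string encoding above; the rule strings reuse the brain-tumor example, and the exact operator details are our assumption.

```python
def subsumes(general, specific):
    """True if every value (and class) allowed by `specific` is also allowed by `general`."""
    return all(g == "1" or s == "0" for g, s in zip(general, specific))

r1 = "010100011110"   # IF Location=sellar and Calcification=no THEN Adenoma (dummy Edema test)
r2 = "010100001010"   # the same rule, but with Edema restricted to "< 2 cm"
print(r1 == r2)            # False -> not redundant
print(subsumes(r1, r2))    # True  -> r2 is subsumed and can be removed by fusion
```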
Fission
  • Eliminate misclassification and contradiction
    • Misclassification
      • A training example e: (A, C) is wrongly covered by rule R: if A then B
    • Contradiction
      • R: if A then B or C is split into R1: if A then B and R2: if A then C
Fission (Cont.)
  • Eliminate misclassification
    • Select the "closest" near-miss rule in the rule-set string RSk (rules rk1, ..., rki, ..., rkn) to the wrongly classified test instance and specialize it; the specialized rule is inserted into the offspring Ok
Fission (Cont.)
  • Eliminate contradiction
    • A contradictory rule rki in the rule-set string RSk is split into two rules (rki1 and rki2), one for each conflicting class, in the offspring Ok
Experiments: Breast Cancer Diagnosis
  • Six knowledge sources are integrated
  • 699 cases used in the experiment
    • 524 cases for integrating
    • 175 cases for testing
  • 9 attributes and 2 classes
    • Benign: 458 cases
    • Malignant: 241 cases
  • Each rule is encoded into a bit string 92 bits long
Result

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.7720   | 0.7495
  1          | 00:00:01 | 0.8117   | 0.7856
  6          | 00:00:05 | 0.8824   | 0.8650
  23         | 00:00:22 | 0.9068   | 0.8719
  39         | 00:00:28 | 0.9228   | 0.8959
  32         | 00:00:31 | 0.9422   | 0.9237
  56         | 00:00:55 | 0.9487   | 0.9301
  59         | 00:00:58 | 0.9533   | 0.9346
  80         | 00:01:19 | 0.9556   | 0.9368
  97         | 00:01:36 | 0.9568   | 0.9473
  100        | 00:01:40 | 0.9619   | 0.9523
Result (Cont.)
  • Test cases: 175
    • Benign: 132 cases, 128 correctly classified, 2 misclassified, 2 unknown
    • Malignant: 43 cases, 41 correctly classified, the remaining 2 misclassified or unknown
Application: Brain Tumor Diagnosis
  • Ten knowledge sources are integrated
  • 504 actual cases used in the application
    • 378 cases for integrating
    • 126 cases for testing
  • 12 attributes and 6 classes
    • Glioblastoma: 54, Pituitary Adenoma: 85, Astrocytoma: 122, Medulloblastoma: 68, Meningioma: 119, Protoplasmic Astrocytoma: 56
  • Each rule is encoded into a bit string 105 bits long

Application: Brain Tumor Diagnosis (Cont.)

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.7981   | 0.5330
  150        | 00:19:05 | 0.8117   | 0.5701
  300        | 00:38:24 | 0.8264   | 0.6070
  450        | 00:57:22 | 0.8321   | 0.6125
  600        | 01:16:28 | 0.8523   | 0.6230
  750        | 01:35:31 | 0.8601   | 0.6337
  900        | 01:54:55 | 0.8703   | 0.6370
  1200       | 02:32:58 | 0.8791   | 0.6601
  1350       | 02:51:19 | 0.8798   | 0.6673
  1500       | 03:10:36 | 0.8830   | 0.6710
  1650       | 03:29:40 | 0.8877   | 0.7022
  1800       | 03:49:31 | 0.8907   | 0.7373
  2000       | 04:14:24 | 0.9142   | 0.7590
TPGKI Approach
  • TPGKI
    • Two-Phase Genetic Knowledge Integration
  • Consists of two phases
    • Knowledge integration
    • Knowledge refinement
  • Integrates multiple rule sets by pure genetic operators
  • No domain-specific genetic operators are needed during the integration
Two Phases
  • Integration phase: the rule sets RS1, RS2, ..., RSm are integrated by genetic operators, and the best resulting rule set is selected
  • Refinement phase: the rules of the selected rule set are further refined
Knowledge-Integration Phase
  • The initial population consists of the rule sets (rule set 1 to rule set n)
  • Genetic operations: crossover and mutation
  • A fitness function evaluates each rule set, producing generation 1 and subsequent generations
Knowledge-Refinement Phase
  • The rules (rule 1 to rule m) of the best rule set chosen in the integration phase form the initial population
  • Genetic operations: crossover and mutation
  • A fitness function evaluates each rule; redundancy, subsumption, and contradiction among rules are removed
Evaluation Process
  • Let U be the object set
  • Fitness = Accuracy × Necessity × Coverage
  • Sort the rules by Accuracy × Necessity; repeatedly remove the objects covered by the best remaining rule (U = U − covered objects) until U is empty, then stop
Experiments: Breast Cancer Diagnosis

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.7720   | 0.7495
  1          | 00:00:02 | 0.8191   | 0.7875
  4          | 00:00:10 | 0.8581   | 0.8250
  5          | 00:00:13 | 0.9206   | 0.9112
  8          | 00:00:20 | 0.9477   | 0.9119
  26         | 00:01:04 | 0.9483   | 0.9247
  34         | 00:01:25 | 0.9525   | 0.9281
  44         | 00:01:50 | 0.9560   | 0.9375
  45         | 00:01:52 | 0.9657   | 0.9428
  93         | 00:03:51 | 0.9659   | 0.9469
  95         | 00:03:57 | 0.9674   | 0.9484
  100        | 00:04:12 | 0.9793   | 0.9502
Application: Brain Tumor Diagnosis

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.7981   | 0.5744
  150        | 00:31:05 | 0.8191   | 0.5801
  300        | 01:02:13 | 0.8296   | 0.6070
  450        | 01:33:42 | 0.8472   | 0.7245
  600        | 02:04:33 | 0.8583   | 0.8015
  750        | 02:35:31 | 0.8753   | 0.8178
  900        | 03:07:55 | 0.8903   | 0.8327
  1200       | 04:09:38 | 0.8989   | 0.8501
  1350       | 04:41:03 | 0.9012   | 0.8523
  1500       | 05:12:39 | 0.9057   | 0.8541
  1650       | 05:43:40 | 0.9107   | 0.8583
  1800       | 06:25:31 | 0.9162   | 0.8621
  2000       | 06:59:05 | 0.9257   | 0.8700
Comparison of GKIDSO and TPGKI
  • Experiment: Breast Cancer Diagnosis (100 generations)

  Approach | CPU Time | Accuracy | Rule No.
  GKIDSO   | 100      | 96.10%   | 10
  TPGKI    | 252      | 97.93%   | 7

  • Application: Brain Tumor Diagnosis (2000 generations)

  Approach | CPU Time | Accuracy | Rule No.
  GKIDSO   | 15264    | 91.42%   | 92
  TPGKI    | 25145    | 92.57%   | 86
Genetic-Fuzzy Knowledge-Integration
  • GFKILM
    • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions
    • Associated with several sets of local membership functions
  • GFKIGM Approach
    • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions
    • Associated with a set of global membership functions
Genetic-Fuzzy Knowledge-Integration Framework
  • Knowledge sources: expert groups (through knowledge-acquisition tools) and training sets (through machine-learning methods), each producing a fuzzy rule set and its membership functions
  • Encoding: each fuzzy rule set, together with its membership functions, is encoded into an intermediary representation
  • Genetic-fuzzy knowledge integration: the encoded representations are integrated against test objects/instances, yielding a final fuzzy rule set plus membership functions
GFKILM Approach
  • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions
  • Consists of two parts
    • Knowledge encoding: each fuzzy rule set RS1, ..., RSm together with its own membership-function set (MFS) is encoded into a chromosome, and these chromosomes form the initial population (generation 0)
    • Knowledge integration: genetic operators evolve the population from generation 0 to generation k
Knowledge Encoding
  • Each fuzzy rule set with its membership-function set (MFS) is translated into intermediary rules plus the MFS
  • Each intermediary rule is encoded into a fixed-length rule string associated with the MFS
  • The rule strings are concatenated into a variable-length rule-set string associated with the MFS
Examples: IRIS Flowers
  • (Figure: fuzzy membership functions for the four features)
    • Sepal length (S.L.): Short / Medium / Long, over roughly 4.3 to 7.9
    • Sepal width (S.W.): Narrow / Medium / Wide, over roughly 2.0 to 4.4
    • Petal length (P.L.): Short / Medium / Long, over roughly 1.0 to 6.9
    • Petal width (P.W.): Narrow / Medium / Wide, over roughly 0.1 to 2.5
  • Classes: Setosa = 1, Versicolor = 2, Virginica = 3

Examples
  • Original fuzzy rule: IF P.L. = Short Then Setosa
  • Intermediary representation: IF S.L. = (Short or Medium or Long) and S.W. = (Narrow or Medium or Wide) and P.L. = Short and P.W. = (Narrow or Medium or Wide) Then Setosa
  • The chromosome carries the fuzzy rules together with their membership functions

Knowledge Integration
  • The initial population consists of the encoded rule sets with their membership functions (RS1+MFS to RSn+MFS)
  • Genetic operations: crossover, mutation, and fusion
  • A fitness function evaluates each chromosome, producing generation 1 and subsequent generations
Fusion
  • Redundancy: two identical rules
    • IF (P.L. = Short) Then Class is Setosa
    • IF (P.L. = Short) Then Class is Setosa
  • Subsumption: one rule subsumes a more specific one
    • IF (P.L. = Short) Then Class is Setosa
    • IF (P.L. = Short) and (P.W. = Narrow) Then Class is Setosa
Fusion (Subsumption)
  • Within a rule-set string RSk, two rule strings r~ki and r~kj (bit fields for S.L., S.W., P.L., P.W., and Class) where one subsumes the other are merged by the fusion operator, keeping the more general rule
  • The membership functions associated with the rule set are encoded as real-valued (center, spread) parameters (e.g. 5.1, 0.8, 6.0, 0.8, ...) appended to the rule-set string
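For intuition, here is a small Python sketch of evaluating an isosceles-triangular membership function from an encoded (center, spread) pair; the numeric values below are illustrative, not taken from the slides.

```python
def tri_membership(x, center, spread):
    """Isosceles triangular membership function encoded by a (center, spread) pair."""
    return max(0.0, 1.0 - abs(x - center) / spread) if spread > 0 else 0.0

# e.g. an assumed "Medium" sepal-length term centered at 6.0 with spread 0.9
for sl in (5.2, 6.0, 6.5, 7.2):
    print(sl, round(tri_membership(sl, center=6.0, spread=0.9), 2))
```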

Experiments: Hepatitis Diagnosis
  • Ten knowledge sources are integrated
  • 155 cases used in the experiment
  • 19 attributes and 2 classes
Experiments: Hepatitis Diagnosis

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.7688   | 0.7537
  13         | 00:00:04 | 0.7844   | 0.7690
  69         | 00:00:19 | 0.8132   | 0.7972
  124        | 00:00:35 | 0.8328   | 0.8164
  181        | 00:00:52 | 0.8432   | 0.8266
  261        | 00:01:15 | 0.8525   | 0.8357
  414        | 00:01:55 | 0.8688   | 0.8517
  1401       | 00:06:07 | 0.8876   | 0.8701
  2110       | 00:09:20 | 0.8949   | 0.8773
  3110       | 00:13:42 | 0.8965   | 0.8789
  3550       | 00:15:35 | 0.8977   | 0.8800
  3817       | 00:16:45 | 0.9183   | 0.9002
  4000       | 00:17:36 | 0.9290   | 0.9107
Application: Sugar-Cane Breeding Prediction
  • Four knowledge sources are integrated
  • 699 actual cases used in the application
  • 36 attributes and 2 classes
Application: Sugar-Cane Breeding Prediction

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.5674   | 0.5562
  2          | 00:00:02 | 0.6780   | 0.6647
  9          | 00:00:08 | 0.6803   | 0.6669
  17         | 00:00:17 | 0.6868   | 0.6733
  29         | 00:00:29 | 0.6871   | 0.6742
  37         | 00:00:37 | 0.6877   | 0.6748
  76         | 00:01:16 | 0.6903   | 0.6766
  230        | 00:03:53 | 0.6904   | 0.6768
  290        | 00:04:53 | 0.6952   | 0.6815
  392        | 00:06:36 | 0.6954   | 0.6817
  498        | 00:08:22 | 0.7174   | 0.7033
  990        | 00:16:36 | 0.7352   | 0.7207
  1734       | 00:29:06 | 0.7378   | 0.7233
  2052       | 00:34:26 | 0.7414   | 0.7268
  2108       | 00:35:22 | 0.7416   | 0.7270
  3341       | 00:55:58 | 0.7449   | 0.7302
  5000       | 01:23:46 | 0.7602   | 0.7452

  • Each rule is encoded into a string 362 units long
GFKIGM Approach
  • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions
  • Consisting of two parts
    • Knowledge encoding
    • Knowledge integration
  • Generating a fuzzy rule-set associated with a global collection of membership functions for all fuzzy rules
Knowledge Encoding
  • Each fuzzy rule set with its membership functions is translated into intermediary rules plus an MFS string
  • Each intermediary rule is encoded into a fixed-length rule string
  • The rule strings are concatenated into a variable-length rule-set string, followed by the MFS string
Examples: IRIS Flowers
  • IF P.L. = Short Then Setosa
  • IF P.L. = Long Then Virginica
  • IF P.W. = Medium Then Versicolor
  • IF P.W. = Wide Then Virginica
Examples: IRIS Flowers
  • (Figure: the same fuzzy membership functions for S.L., S.W., P.L., and P.W. shown earlier)
  • Classes: Setosa = 1, Versicolor = 2, Virginica = 3

Examples: IRIS Flowers
  • Original fuzzy rule: IF P.L. = Short Then Setosa
  • Intermediary representation: IF S.L. = (Short or Medium or Long) and S.W. = (Narrow or Medium or Wide) and P.L. = Short and P.W. = (Narrow or Medium or Wide) Then Setosa
  • The intermediary rule is then encoded as a rule string

Examples: IRIS Flowers (Cont.)
  • IF P.L. = Short Then Setosa
  • IF P.L. = Long Then Virginica
  • IF P.W. = Medium Then Versicolor
  • IF P.W. = Wide Then Virginica
Knowledge Integration
  • The initial population consists of the encoded rule sets together with the global membership-function string (RS1+MFS to RSn+MFS)
  • Genetic operations: crossover, mutation, and fusion
  • A fitness function evaluates each chromosome, producing generation 1 and subsequent generations
Fusion
  • Redundancy: two identical rules
    • IF (P.L. = Short) Then Class is Setosa
    • IF (P.L. = Short) Then Class is Setosa
  • Subsumption: one rule subsumes a more specific one
    • IF (P.L. = Short) Then Class is Setosa
    • IF (P.L. = Short) and (P.W. = Narrow) Then Class is Setosa
Experiments: Hepatitis Diagnosis

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.7688   | 0.7573
  4          | 00:00:02 | 0.7867   | 0.7712
  34         | 00:00:10 | 0.8228   | 0.8066
  160        | 00:00:45 | 0.8450   | 0.8284
  473        | 00:02:14 | 0.8542   | 0.8374
  570        | 00:02:40 | 0.8554   | 0.8386
  1057       | 00:04:51 | 0.8578   | 0.8409
  1495       | 00:06:46 | 0.8633   | 0.8463
  1791       | 00:08:03 | 0.8656   | 0.8486
  2251       | 00:10:02 | 0.8721   | 0.8550
  2580       | 00:11:27 | 0.8837   | 0.8663
  2710       | 00:12:02 | 0.8895   | 0.8720
  3062       | 00:13:13 | 0.8910   | 0.8735
  3342       | 00:14:49 | 0.9049   | 0.8871
  3756       | 00:16:44 | 0.9056   | 0.8878
  3847       | 00:17:09 | 0.9068   | 0.8890
  4000       | 00:17:51 | 0.9161   | 0.8981
Application: Sugar-Cane Breeding Prediction

  Generation | CPU Time | Accuracy | Fitness
  0          | 00:00:00 | 0.5506   | 0.5345
  2          | 00:00:02 | 0.6726   | 0.6530
  3          | 00:00:03 | 0.6802   | 0.6603
  9          | 00:00:08 | 0.6807   | 0.6608
  13         | 00:00:12 | 0.6864   | 0.6664
  16         | 00:00:15 | 0.6868   | 0.6665
  17         | 00:01:16 | 0.6922   | 0.6720
  227        | 00:03:47 | 0.6928   | 0.6726
  308        | 00:03:53 | 0.6944   | 0.6741
  493        | 00:08:15 | 0.7153   | 0.6944
  1386       | 00:23:14 | 0.7201   | 0.6991
  1637       | 00:27:27 | 0.7253   | 0.7041
  2924       | 00:49:04 | 0.7267   | 0.7055
  3151       | 00:52:52 | 0.7295   | 0.7082
  3300       | 00:56:22 | 0.7362   | 0.7147
  5000       | 01:24:37 | 0.7485   | 0.7266

  • Each knowledge source is encoded into a string 542 units long
Comparison of GFKILM and GFKIGM
  • Experiment: Hepatitis Diagnosis (4000 generations)

  Approach | CPU Time | Accuracy | Rule No.
  GFKILM   | 1056     | 92.90%   | 4
  GFKIGM   | 1071     | 91.61%   | 4

  • Application: Sugar-Cane Breeding Prediction (5000 generations)

  Approach | CPU Time | Accuracy | Rule No.
  GFKILM   | 5026     | 76.02%   | 2
  GFKIGM   | 5077     | 74.85%   | 2
ROADMAP
  • Building on the Michigan and Pittsburgh approaches
    • Crisp knowledge: GKIDSO, TPGKI, and MGKI approaches
    • Vague knowledge: GFKILM, GFKIGM, TPGFKI, and MGFKI approaches
Why Data Mining?
  • Supermarket commodities example: if one customer buys milk, then he is likely to also buy bread, so the store can act on that pattern

Mining Association Rules
  • e.g. milk and bread: IF bread is bought THEN milk is bought

The Role of Data Mining
  • Preprocess the data, mine useful patterns, and turn them into knowledge and strategy
Mining Steps
  • Step 1: Define minsup and minconf
    • e.g. minsup = 50%, minconf = 50%
  • Step 2: Find the large itemsets
  • Step 3: Generate the association rules
Example
  • Scan the database to find the large itemsets
  • L1 (large 1-itemsets):

  Itemset | Sup.
  {A}     | 2
  {B}     | 3
  {C}     | 3
  {E}     | 3

  • The database is scanned again for each subsequent pass (L2, L3, ...)
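A minimal Python sketch of these three steps; the four transactions are assumed for illustration (chosen so the large 1-itemsets match the table above), and the itemset enumeration is brute force rather than the level-wise Apriori pruning.

```python
from itertools import combinations

transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
minsup, minconf = 0.5, 0.5                       # Step 1: thresholds
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# Step 2: find the large itemsets (brute force over all candidate itemsets).
items = sorted(set().union(*transactions))
large = [frozenset(c) for k in range(1, len(items) + 1)
         for c in combinations(items, k) if support(frozenset(c)) >= minsup]

# Step 3: generate association rules X -> Y with confidence >= minconf.
for itemset in (s for s in large if len(s) > 1):
    for k in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, k)):
            conf = support(itemset) / support(lhs)
            if conf >= minconf:
                print(set(lhs), "->", set(itemset - lhs), "conf=%.2f" % conf)
```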

Integrating Mined Knowledge
  • Association rules mined at different branches
    • Branch 1: A → B (if customers buy A, then they will buy B)
    • Branch 2: B, C → D (if customers buy B and C, then they will buy D)
    • Branch 3: A, C → E (if customers buy A and C, then they will buy E)
  • The headquarters integrates the branches' rules into a global rule base (GRB)
Integration of Association Rules
  • Rule sets RD1, RD2, ..., RDn mined from databases DB1, DB2, ..., DBn (e.g. AB→C, A→D, B→E) are synthesized into a global rule base (GRB)
  • Synthesizing high-frequency rules
    • Weighting
    • Ranking
  • Xindong Wu and Shichao Zhang (2003)
    • Synthesizing High-Frequency Rules from Different Data Sources
      • Known data sources
Integration of Association Rules (Cont.)
  • The same rule X→Y may come with different confidences from different sources such as the Internet, journals, books, and the Web (e.g. conf = 0.7, 0.72, 0.68); what confidence should the synthesized rule X→Y have?
  • Synthesizing by a clustering method
  • Xindong Wu and Shichao Zhang (2003)
    • Synthesizing High-Frequency Rules from Different Data Sources
      • Unknown data sources
Integration of Association Rules
  • Framework
    • Each transaction database (1 to n) is mined by a data mining method into a fuzzy rule set and its membership functions
    • Encoding: each fuzzy rule set with its membership functions is encoded into an intermediary representation
    • Genetic-fuzzy knowledge integration: the representations are integrated using sample data, producing a final fuzzy rule set plus membership functions
Data Mining Method
  • Mining fuzzy association rules and membership functions together
    • Inputs: the transaction database, the linguistic terms, the minimum support, and the minimum confidence
    • A population of chromosomes (membership function set 1 to set q) is evolved by a genetic-fuzzy membership-function acquisition process, using fuzzy mining for large 1-itemsets in the evaluation
    • The final membership function set is then used by fuzzy mining to derive the fuzzy association rules
Mining Membership Functions
  • Example: each item has three linguistic terms (Low, Middle, High) over its purchased quantity, encoded as (center, spread) pairs
    • Milk: (5, 5), (10, 5), (15, 5), quantity axis ticks 0, 5, 10, 15
    • Bread: (6, 6), (12, 6), (18, 6), quantity axis ticks 0, 6, 12, 18
    • Beverage: (4, 4), (8, 4), (12, 4), quantity axis ticks 0, 4, 8, 12
    • Cookies: (3, 3), (6, 3), (9, 3), quantity axis ticks 0, 3, 6, 9
  • Each membership function set (MF1 to MF4) lists the Low/Middle/High parameters of every item; region j of item i is denoted Rij
Fitness Function
  • Formally defined (the formula is not preserved in this transcript)
  • It penalizes the two bad kinds of membership functions illustrated on the slide: (a) terms spread too far apart, leaving gaps along the quantity axis (ticks 0, 5, 20, 25), and (b) terms squeezed too close together, overlapping heavily (ticks 0, 5, 8, 9)
Mining Fuzzy Association Rules
  • Our fuzzy mining algorithm (2001)
    • Trade-off between time complexity and number of rules for fuzzy mining from quantitative data
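As a rough illustration of the fuzzy-mining idea, here is a Python sketch of a min-based fuzzy support computation over quantitative transactions; the transaction quantities, the (center, spread) parameters, and the averaging over transactions are simplified assumptions, not the exact details of the 2001 algorithm.

```python
def tri(x, center, spread):                    # triangular membership degree
    return max(0.0, 1.0 - abs(x - center) / spread)

# assumed quantitative transactions: purchased quantities of milk and bread
transactions = [{"milk": 4, "bread": 7}, {"milk": 11, "bread": 13}, {"milk": 9, "bread": 5}]
terms = {("milk", "Middle"): (10, 5), ("bread", "Middle"): (12, 6)}   # (center, spread)

def fuzzy_support(itemset):
    """Average over transactions of the minimum membership among the itemset's fuzzy terms."""
    degrees = [min(tri(t[item], *terms[(item, term)]) for item, term in itemset)
               for t in transactions]
    return sum(degrees) / len(transactions)

print(round(fuzzy_support([("milk", "Middle"), ("bread", "Middle")]), 3))   # e.g. 0.267
```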
Conclusions
  • Classification Rules
    • A genetic knowledge-integration framework and four knowledge integration methodologies are proposed
      • GKIDSO Approach
      • TPGKI Approach
      • GFKILM Approach
      • GFKIGM Approach
    • Two real-world applications have been developed by our approaches
      • A self-integrating knowledge-based brain tumor diagnostic system
      • A sugar-cane breeding prediction system
Conclusions (Cont.)
  • Advantages
    • Only a little computation time is needed
    • A large number of rule sets can be effectively integrated
    • It is objective
    • It may find new knowledge
    • Domain experts need not intervene when conflict occurs
Conclusions (Cont.)
  • Disadvantages
    • All knowledge sources must be preprocessed into rule-string representations
    • A set of data must be collected to evaluate the resulting knowledge
    • If there are too few derived knowledge sources, some dummy knowledge sources are inserted into the initial population
Conclusions (Cont.)
  • Fuzzy Association Rules
    • Fuzzy mining + GA-based evolution
Future Work
  • Heterogeneous knowledge representation
  • Vocabulary