Knowledge integration by genetic algorithms


Knowledge Integration by Genetic Algorithms. Prof. Tzung-Pei Hong Department of Electrical Engineering National University of kaohsiung. Outline. Introduction Review GAs Fuzzy Sets Related Studies Knowledge Integration Strategies Classification Rules Association Rules

Knowledge Integration by Genetic Algorithms

Prof. Tzung-Pei Hong

Department of Electrical Engineering, National University of Kaohsiung


Outline

  • Introduction

  • Review

    • GAs

    • Fuzzy Sets

    • Related Studies

  • Knowledge Integration Strategies

    • Classification Rules

    • Association Rules

  • Conclusions


Why Knowledge Integration

  • Four Reasons

    • 1. Knowledge is distributed among sources

    • 2. It increases the reliability of knowledge-based systems

    • 3. Knowledge can be reused

    • 4. It reduces the effort of developing an expert system or decision support system

  • [Figure: rule bases RB1, ..., RBi, ..., RBn are combined by integration into GRB, which serves the expert system and its user interface]


Why Use GAs?

  • Integration of RB1, ..., RBi, ..., RBn must satisfy

    • 1. Completeness

    • 2. Correctness

    • 3. Consistency

    • 4. Conciseness

  • Integration → a multi-objective optimization problem

  • GAs → finding optimal or nearly optimal solutions


Vague Knowledge

  • In Real-World Applications

    • Knowledge sources or data (RB1, ..., RBi, ..., RBn) → linguistic or ambiguous information

    • Vagueness greatly influences the resulting knowledge base


Benefits

  • Medsker [95]

    • Knowledge integrated from different sources has good validity

    • Integrated knowledge can deal with more complex problems

    • Knowledge integration may improve the performance of the knowledge base

    • Integrating would facilitate building bigger and better systems cheaply


Traditional Knowledge Integration

  • Problems

    • When conflict occurs

      • Domain experts must intervene in the integration process

    • Subjective

    • Time consuming

    • Limited Integration

      • A small number of knowledge sources

    • More knowledge sources

      • More difficult and complex


Our Goals

  • Resolve potential conflicts and contradictions

  • Integrate knowledge without the intervention of human experts

  • Improve the integration speed

  • Scale up the number of knowledge sources that can be integrated


History of GAs

  • GA: Genetic Algorithm

  • History

John Holland

1975

K. A. De Jong

D. E. Goldberg


Idea of GA

  • Survival of the fittest

  • Iterative Procedure

  • Genetic operators

    • Reproduction

    • Crossover

    • Mutation

  • Near optimal solution


Simple Genetic Algorithms

  • Start

  • Initialize a population of individuals

  • Evaluate each individual's fitness value

  • Quit?

    • Quit if: 1) maximum generations are reached, 2) time limit is reached, or 3) population has converged

    • Yes → Stop

    • No → Continue

  • Select the superior individuals for reproduction

  • Apply crossover and perhaps mutation

  • Evaluate the new individuals' fitness values, then return to the Quit? test


An Example

  • A Function

    • Find the max


Step 1

  • Define a suitable representation

    • Each Chromosome

      • 12 bits

    • e.g.

      t = 0 → 000000000000

      t = 1 → 111111111111

      t = 0.680 → 101011100001
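The three example chromosomes are consistent with reading the 12 bits as an unsigned integer and dividing by 4095 (the value of twelve 1-bits). The sketch below assumes that mapping; the function names `decode` and `encode` are illustrative and do not come from the slides.

```python
BITS = 12                    # chromosome length used on the slide
MAX_INT = 2 ** BITS - 1      # 4095, the integer value of 111111111111


def decode(chromosome: str) -> float:
    """Map a 12-bit string to a real value t in [0, 1] (assumed mapping)."""
    return int(chromosome, 2) / MAX_INT


def encode(t: float) -> str:
    """Map t in [0, 1] to the nearest 12-bit string."""
    return format(round(t * MAX_INT), f"0{BITS}b")


print(decode("000000000000"))            # 0.0
print(decode("111111111111"))            # 1.0
print(round(decode("101011100001"), 3))  # 0.68
print(encode(0.680))                     # 101011100001
```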


Step 2

  • Create an initial population of N individuals

    • N: population size

    • Assume N = 40


Step 3

  • Define a suitable fitness function f to evaluate the individuals

    • Fitness function: f(t)

    • e.g. The first six individuals


Step 4

  • Perform the crossover and mutation operations to generate the possible offspring


Crossover

  • Offspring:

    • Inherit some characteristics of their parents

  • e.g.

Parent 1 : 00011 0000001

Parent 2 : 01001 1001101

Child 1 : 00011 1001101

Child 2 : 01001 0000001
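The example above is a one-point crossover with the cut after the fifth bit. A minimal sketch, with the function name `one_point_crossover` chosen here for illustration:

```python
def one_point_crossover(parent1: str, parent2: str, point: int) -> tuple[str, str]:
    """Swap the tails of two equal-length bit strings after position `point`."""
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2


child1, child2 = one_point_crossover("000110000001", "010011001101", 5)
print(child1)  # 000111001101
print(child2)  # 010010000001
```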


Mutation

  • Offspring

    • Possess characteristics different from those of their ancestors

    • Preserve a reasonable level of population diversity

  • e.g. Bit change

    0 1 1 1 0 0 0 0 0 1 0 0 → 1 1 1 1 0 0 0 0 0 1 0 0

  • e.g. Inversion

    1 1 1 0 1 1 0 0 0 1 0 0 → 1 1 1 1 0 1 0 0 0 1 0 0
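The four bit strings on this slide pair up as one bit change (the first bit flips) and one inversion (the fourth and fifth bits are reversed). The sketch below assumes that pairing; the helper names are illustrative.

```python
def flip_bit(chromosome: str, index: int) -> str:
    """Bit-change mutation: flip the bit at `index`."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


def invert_segment(chromosome: str, start: int, end: int) -> str:
    """Inversion mutation: reverse the order of bits in chromosome[start:end]."""
    return chromosome[:start] + chromosome[start:end][::-1] + chromosome[end:]


print(flip_bit("011100000100", 0))           # 111100000100
print(invert_segment("111011000100", 3, 5))  # 111101000100
```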


New Offspring

  • The new offspring produced by the operators


Step 5

  • Replace the individual

  • e.g. The first six individuals

NEW


Step 6

  • If the termination criteria are not satisfied, go to Step 4; otherwise, stop the genetic algorithm

    • The termination criteria

      • The maximum number of generations

      • The time limit

      • The population has converged
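Steps 1 to 6 can be put together as a small program. This is only a sketch under stated assumptions: the objective function on the slides is not recoverable from the text, so `fitness` below uses a hypothetical function; the crossover and mutation rates, tournament selection, and keeping the best individual are also choices made here, not taken from the slides. Only the 12-bit chromosomes and N = 40 come from the example.

```python
import math
import random

BITS, POP_SIZE, MAX_GENERATIONS = 12, 40, 50   # 12 bits and N = 40 follow the slides
CROSSOVER_RATE, MUTATION_RATE = 0.8, 0.01      # assumed rates, not from the slides


def decode(chromosome):
    """Interpret the bit string as a value t in [0, 1]."""
    return int(chromosome, 2) / (2 ** BITS - 1)


def fitness(chromosome):
    """Hypothetical objective f(t); the function used on the slides is unknown."""
    t = decode(chromosome)
    return t * math.sin(10 * math.pi * t) + 1.0


def tournament(population, rng, k=2):
    """Select the fitter of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def run_ga(seed=0):
    rng = random.Random(seed)
    # Step 2: create an initial population of N random chromosomes.
    population = ["".join(rng.choice("01") for _ in range(BITS))
                  for _ in range(POP_SIZE)]
    initial_best = max(map(fitness, population))
    # Step 6: stop when the maximum number of generations is reached.
    for _ in range(MAX_GENERATIONS):
        children = [max(population, key=fitness)]   # keep the current best
        while len(children) < POP_SIZE:
            a, b = tournament(population, rng), tournament(population, rng)
            # Step 4: one-point crossover, then bit-change mutation.
            if rng.random() < CROSSOVER_RATE:
                cut = rng.randint(1, BITS - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):
                children.append("".join(
                    ("1" if bit == "0" else "0")
                    if rng.random() < MUTATION_RATE else bit
                    for bit in child))
        # Step 5: replace the old individuals with the new ones.
        population = children[:POP_SIZE]
    return max(population, key=fitness), initial_best


best, initial_best = run_ga()
print(best, round(decode(best), 3), round(fitness(best), 3))
```

Because the best individual is copied into every new generation, the best fitness never decreases from one generation to the next.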



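Fuzzy Sets

  • Traditional computer decisions

    • Either true (1) or false (0)

      e.g. If people aged 25 or under are young, is a 26-year-old already middle-aged?

      A score of 60 or above is a pass, so anything below 60 is a fail

  • What is fuzziness

    • Add several more levels between true (1) and false (0)

      • Almost true (0.8)

      • Possibly true (0.6)

      • Possibly false (0.4)

      • Almost false (0.2)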


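Fuzzy Sets

  • Divide into more and more levels → continuous

  • Question: Is 168 cm tall or not?

  • [Figure: degree of membership plotted against height (cm), with axis ticks at 160, 170, and 180]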


Example:“Close to 0”

  • e.g.

    • μA(3) = 0.01

    • μA(1) = 0.09

    • μA(0.25) = 0.62

    • μA(0) = 1

  • Define a Membership Function:

    μA(x) = 1 / (1 + 10x²)
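The four listed values are consistent with the membership function μA(x) = 1 / (1 + 10x²). The formula itself is missing from the transcript, so the sketch below treats that form as an assumption and checks it against the listed values.

```python
def close_to_zero(x: float) -> float:
    """Assumed membership function for 'close to 0': 1 / (1 + 10 x^2)."""
    return 1.0 / (1.0 + 10.0 * x * x)


for x in (3, 1, 0.25, 0):
    print(x, round(close_to_zero(x), 2))
# 3 0.01
# 1 0.09
# 0.25 0.62
# 0 1.0
```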


Example:“Close to 0”

  • Very Close to 0:

    μA(x) =


Fuzzy Set (Cont.)

  • Membership function

    • Takes values in [0, 1]

  • e.g.

    • sunny : x → [0, 1]

    • Different days x can be 0.1 sunny, 0.6 sunny, or 0.8 sunny


Fuzzy Set

  • Simple

  • Intuitively pleasing

  • A generalization of the crisp set

    • Crisp: non-member or member (0 or 1)

    • Vague: gradual transition from member to non-member, e.g. from sunny to not sunny through 1, 0.8, 0.6, 0.4, 0.2, 0


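Fuzzy Operations

  • Intersection (AND)

    • Take the smaller degree

      EX: A student is smart (0.8) and hardworking (0.6), so the student is a model student (0.6)

  • Union (OR)

    • Take the larger degree

      EX: A student is smart (0.8) or hardworking (0.6), so the student is a model student (0.8)

  • Complement (NOT)

    • Take the difference from 1

      EX: If the degree of "smart" is 0.8, the degree of "not smart" is 0.2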
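The three operations above map directly onto `min`, `max`, and subtraction from 1. A minimal sketch using the slide's student example (the function names are illustrative):

```python
def fuzzy_and(a: float, b: float) -> float:
    """Intersection: take the smaller membership degree."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Union: take the larger membership degree."""
    return max(a, b)


def fuzzy_not(a: float) -> float:
    """Complement: take the difference from 1."""
    return 1.0 - a


smart, hardworking = 0.8, 0.6
print(fuzzy_and(smart, hardworking))  # 0.6
print(fuzzy_or(smart, hardworking))   # 0.8
print(round(fuzzy_not(smart), 2))     # 0.2
```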


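Fuzzy Inference Example

                          Big eyes   Small mouth   Good figure
Matilda Tao (陶晶瑩)      0          0.8           0.3
A-Mei (張惠妹)            1          0.6           0.8
Coco Lee (李玟)           0          0.3           0.9
Angelica Lee (李心潔)     0.7        0.1           0.5
Jolin Tsai (蔡依林)       0.8        0.5           0.3

  • Prof. Hong's criteria for choosing a second wife

    • (Big eyes and small mouth) or a good figure

      Question: Who is the best leading lady?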


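Answer

  • Matilda Tao (陶晶瑩) = (0 AND 0.8) OR 0.3 = 0 OR 0.3 = 0.3

  • A-Mei (張惠妹) = (1 AND 0.6) OR 0.8 = 0.6 OR 0.8 = 0.8

  • Coco Lee (李玟) = (0 AND 0.3) OR 0.9 = 0 OR 0.9 = 0.9

  • Angelica Lee (李心潔) = (0.7 AND 0.1) OR 0.5 = 0.1 OR 0.5 = 0.5

  • Jolin Tsai (蔡依林) = (0.8 AND 0.5) OR 0.3 = 0.5 OR 0.3 = 0.5

  • Coco Lee (李玟) is the best choice!

  • Thank you!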


    Fuzzy Decision

    • A = {A1, A2, A3, A4, A5}

      • A set of alternatives

    • C = {C1, C2, C3}

      • A set of criteria


    Example (Cont.)

    • Assume : (C1 and C2) or C3

      • E (Ai) : evaluation function

        • E (A1) = (0 ∧ 0.8) ∨ 0.3 = 0 ∨ 0.3 = 0.3

        • E (A2) = (1 ∧ 0.6) ∨ 0.8 = 0.6 ∨ 0.8 = 0.8

        • E (A3) = (0 ∧ 0.3) ∨ 0.9 = 0 ∨ 0.9 = 0.9 → the best choice

        • E (A4) = (0.7 ∧ 0.1) ∨ 0.5 = 0.1 ∨ 0.5 = 0.5

        • E (A5) = (0.8 ∧ 0.5) ∨ 0.3 = 0.5 ∨ 0.3 = 0.5
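The evaluation above can be checked mechanically. The sketch below encodes the five alternatives and three criteria as a dictionary (a layout chosen here for illustration) and applies min for "and" and max for "or":

```python
# Degrees to which each alternative satisfies criteria C1, C2, C3 (from the slides).
scores = {
    "A1": (0.0, 0.8, 0.3),
    "A2": (1.0, 0.6, 0.8),
    "A3": (0.0, 0.3, 0.9),
    "A4": (0.7, 0.1, 0.5),
    "A5": (0.8, 0.5, 0.3),
}


def evaluate(c1: float, c2: float, c3: float) -> float:
    """E(Ai) for the decision rule (C1 and C2) or C3."""
    return max(min(c1, c2), c3)


results = {name: evaluate(*degrees) for name, degrees in scores.items()}
best = max(results, key=results.get)
print(results)  # {'A1': 0.3, 'A2': 0.8, 'A3': 0.9, 'A4': 0.5, 'A5': 0.5}
print(best)     # A3
```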


    Review of Knowledge Integration

    [Figure: taxonomy of knowledge-integration approaches. Branches shown: Cooperative Approach and Centralized Approach. Methods shown: Blackboard, LPC Model, Integrity Constraints, Repertory Grid, Genetic Algorithm, Decision Table.]


    GA-Based Classifier Systems

    [Figure: GA-based classifier systems divide into the Michigan approach, in which each individual is a single rule (rule 1, rule 2, ..., rule n), and the Pittsburgh approach, in which each individual is a whole rule set (rule set 1, rule set 2, ..., rule set m).]


    Genetic Knowledge Integration

    [Figure: genetic knowledge-integration approaches, organized under the Michigan approach and the Pittsburgh approach. Approaches shown: GKIDSO, TPGKI, MGKI. For vague knowledge: GFKILM, GFKIGM, TPGFKI, MGFKI.]


    Integration of Classification Rules

    • Four Methods

      • GKIDSO

        • Genetic Knowledge-Integration approach with Domain-Specific Operators

      • TPGKI

        • Two-Phase Genetic Knowledge Integration

      • GFKILM

        • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions

      • GFKIGM

        • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions


    Genetic Knowledge-Integration Framework

    [Figure: expert groups 1 to n use knowledge-acquisition (K.A.) tools 1 to n, and training data sets 1 to m are processed by machine-learning (M.L.) methods 1 to m. Each source yields a rule set with its own dictionary. Using a global feature set and class set, the rule sets are encoded into intermediary representations, which GA-based knowledge integration combines, with the help of an integrating case set, into a knowledge base and dictionary.]

    [Figure: knowledge-integration stages. Labels shown: knowledge input (rule sets), knowledge encoding, genetic knowledge integration (with an integration data set), knowledge decoding, knowledge verification, knowledge base.]


    GKIDSO Approach

    [Figure: in knowledge encoding, rule sets RS1, RS2, RS3, ..., RSm are encoded as chromosomes 1, 2, 3, ..., m, which form the initial population (generation 0). In knowledge integration, genetic operators transform the chromosomes from generation to generation up to generation k.]

    • Genetic Knowledge-Integration approach with Domain-Specific Operators

    • Consists of two parts

      • Encoding

      • Integration


    Knowledge Encoding

    • Rule Set → Intermediary Rules → Fixed-Length Rule Strings → Variable-Length Rule-Set String


    Example: Brain Tumor

    • Two classes: {Adenoma, Meningioma}

    • Three features:

      • {Location, Calcification, Edema}

    • Feature values for Location

      • {brain surface, sellar, brain stem}

    • Feature values for Calcification

      • {no, marginal, vascular-like, lumpy}

    • Feature values for Edema

      • {no, < 2 cm, < 0.5 hemisphere}


    Intermediary Rules

    • Two Rules

      • R1: IF (Location = sellar) and (Calcification = no) then Adenoma

      • R2: IF (Location = brain surface) and (Edema < 2 cm) then Meningioma

    • With a dummy test added for each missing feature

      • R1: IF (Location = sellar) and (Calcification = no) and (Edema = no, or < 2 cm, or < 0.5 hemisphere) then Adenoma

      • R2: IF (Location = brain surface) and (Calcification = no or marginal or vascular-like or lumpy) and (Edema < 2 cm) then Meningioma


    Fixed-Length Rule String

           Location   Calcification   Edema   Classes
    R1 :   010        1000            111     10
    R2 :   100        1111            010     01

    • R1: IF (Location = sellar) and (Calcification = no) and (Edema = no, or < 2 cm, or < 0.5 hemisphere) then Adenoma

    • R2: IF (Location = brain surface) and (Calcification = no or marginal or vascular-like or lumpy) and (Edema < 2 cm) then Meningioma
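The encoding can be read as one bit per feature value and one bit per class, with a dummy test setting every bit of an unmentioned feature to 1. The sketch below reproduces the two strings under that reading; the dictionary layout and the name `encode_rule` are illustrative, not from the slides.

```python
# Feature values and classes in the order given on the brain-tumor slide.
FEATURES = {
    "Location": ["brain surface", "sellar", "brain stem"],
    "Calcification": ["no", "marginal", "vascular-like", "lumpy"],
    "Edema": ["no", "< 2 cm", "< 0.5 hemisphere"],
}
CLASSES = ["Adenoma", "Meningioma"]


def encode_rule(conditions: dict[str, set[str]], label: str) -> str:
    """Encode a rule as a fixed-length bit string, one bit per feature value."""
    parts = []
    for feature, values in FEATURES.items():
        # Dummy test: a feature the rule does not mention allows every value.
        allowed = conditions.get(feature, set(values))
        parts.append("".join("1" if v in allowed else "0" for v in values))
    parts.append("".join("1" if c == label else "0" for c in CLASSES))
    return " ".join(parts)


r1 = encode_rule({"Location": {"sellar"}, "Calcification": {"no"}}, "Adenoma")
r2 = encode_rule({"Location": {"brain surface"}, "Edema": {"< 2 cm"}}, "Meningioma")
print(r1)  # 010 1000 111 10
print(r2)  # 100 1111 010 01
```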


    Knowledge Integration

    [Figure: the initial population (Rule Set 1, Rule Set 2, ..., Rule Set n) undergoes the genetic operations crossover, mutation, fusion, and fission to produce generation 1 (Rule Set 1, Rule Set 2, ..., Rule Set n); the fitness function evaluates the rule sets.]


    Fitness Function

    -

    -

    • Formally

    • where

    -  is a control parameter


    Crossover

    [Figure: crossover of two variable-length rule-set strings. Parent RS1 consists of rules r11, ..., r1i, ..., r1n and parent RS2 of rules r21, ..., r2j, ..., r2m. Crossover points cp1 and cp2 (each labeled "7 bits") fall inside a rule of each parent, and the segments after the crossover points are exchanged to give offspring O1 and O2.]
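One way to cross two rule sets of different lengths while keeping every rule the same width is to cut both parents at the same offset inside a rule. The sketch below assumes that scheme and a hypothetical rule width of 12 bits; it is an illustration, not necessarily the operator used in GKIDSO.

```python
import random

RULE_LEN = 12  # hypothetical fixed rule width in bits


def rule_set_crossover(rs1: str, rs2: str, rng: random.Random) -> tuple[str, str]:
    """Cross two concatenated rule-set strings at the same in-rule offset."""
    offset = rng.randrange(RULE_LEN)                          # position inside a rule
    cut1 = rng.randrange(len(rs1) // RULE_LEN) * RULE_LEN + offset
    cut2 = rng.randrange(len(rs2) // RULE_LEN) * RULE_LEN + offset
    return rs1[:cut1] + rs2[cut2:], rs2[:cut2] + rs1[cut1:]


rng = random.Random(1)
parent1 = "0" * (3 * RULE_LEN)   # a rule set with three rules
parent2 = "1" * (5 * RULE_LEN)   # a rule set with five rules
child1, child2 = rule_set_crossover(parent1, parent2, rng)
print(len(child1) // RULE_LEN, len(child2) // RULE_LEN)
```

Because both cuts share the same offset, each offspring is again a whole number of rules, although the number of rules per offspring can differ from that of its parents.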


    Fusion

    • Eliminate redundancy and subsumption

      • Redundancy

        • R1: if A then B

        • R2: if A then B

      • Subsumption

        • R1: if A and C then B

        • R2: if A then B
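With bit-string rules, redundancy and subsumption can be detected by comparing bits. The sketch below assumes rules are stored as (condition bits, class bits) pairs and that a rule is subsumed when another rule with the same class has a 1 wherever it does; the data layout and function names are illustrative, not the slides' operator.

```python
def subsumes(general: str, specific: str) -> bool:
    """True if every 1-bit of `specific` is also set in `general`."""
    return all(g == "1" for g, s in zip(general, specific) if s == "1")


def fuse(rules: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Remove duplicate rules, then rules subsumed by a more general rule."""
    unique = list(dict.fromkeys(rules))   # drops redundancy, keeps order
    kept = []
    for i, (cond, cls) in enumerate(unique):
        covered = any(
            j != i and other_cls == cls and subsumes(other_cond, cond)
            for j, (other_cond, other_cls) in enumerate(unique)
        )
        if not covered:
            kept.append((cond, cls))
    return kept


rules = [
    ("0101000111", "10"),   # if A then B
    ("0101000111", "10"),   # redundant copy
    ("0101000010", "10"),   # if A and C then B (more specific, subsumed)
    ("1001111010", "01"),   # unrelated rule
]
print(fuse(rules))  # [('0101000111', '10'), ('1001111010', '01')]
```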


    Fusion (Cont.)

    • Eliminate redundancy

    • Eliminate subsumption


    Fission

    • Eliminate misclassification and contradiction

      • Misclassification

        • e: (A, C)

          R: if A then B

      • Contradiction

        • R: if A then B or C

          R1: if A then B

          R2: if A then C
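In a bit-string encoding, a contradictory rule such as "if A then B or C" has more than one class bit set. A minimal sketch of splitting it into one rule per class follows, assuming (condition bits, class bits) pairs; the bit strings are made up for illustration.

```python
def split_contradiction(cond: str, class_bits: str) -> list[tuple[str, str]]:
    """Split a rule with several class bits set into one rule per class."""
    rules = []
    for i, bit in enumerate(class_bits):
        if bit == "1":
            single = "0" * i + "1" + "0" * (len(class_bits) - i - 1)
            rules.append((cond, single))
    return rules


# R: if A then B or C  becomes  R1: if A then B  and  R2: if A then C
print(split_contradiction("100100110", "110"))
# [('100100110', '100'), ('100100110', '010')]
```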


    Fission (Cont.)

    [Figure: fission applied to rule set RSk = (rk1, ..., rki, ..., rkn). A new rule derived from rki is inserted next to it, giving offspring Ok with one more rule than RSk.]

    • Eliminate misclassification

      • Select the "closest" near-miss rule to the wrongly classified test instance for specialization


    Fission (Cont.)

    [Figure: fission applied to rule set RSk = (rk1, ..., rki, ..., rkn). Rule rki, whose class part 110 names two classes, is split into rki1 with class part 100 and rki2 with class part 010, giving offspring Ok.]

    • Eliminate contradiction


    Experiments- Breast Cancer Diagnosis

    • Six knowledge sources are integrated

    • 699 cases used in the experiment

      • 524 cases for integrating

      • 175 cases for testing

    • 9 attributes and 2 classes

      • Benign : 458 cases

      • Malignant : 241 cases

    • Each rule is encoded into a bit string 92 bits long


    Result

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.7720     0.7495
    1            00:00:01   0.8117     0.7856
    6            00:00:05   0.8824     0.8650
    23           00:00:22   0.9068     0.8719
    39           00:00:28   0.9228     0.8959
    32           00:00:31   0.9422     0.9237
    56           00:00:55   0.9487     0.9301
    59           00:00:58   0.9533     0.9346
    80           00:01:19   0.9556     0.9368
    97           00:01:36   0.9568     0.9473
    100          00:01:40   0.9619     0.9523


    Result (Cont.)

    • Test cases: 175

    Classes     Case no.   Correctly classified   Misclassified   Unknown
    Benign      132        128                    2               2
    Malignant   43         41                     0               2



    Application- Brain Tumor Diagnosis

    • Ten knowledge sources are integrated

    • 504 actual cases used in the application

      • 378 cases for integrating

      • 126 cases for testing

    • 12 attributes and 6 classes

      • Glioblastoma: 54

      • Pituitary Adenoma: 85

      • Astrocytoma: 122

      • Medulloblastoma: 68

      • Meningioma: 119

      • Protoplasmic Astrocytoma: 56

    • Each rule is encoded into a bit string 105 bits long


    Application - Brain Tumor Diagnosis (Cont.)

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.7981     0.5330
    150          00:19:05   0.8117     0.5701
    300          00:38:24   0.8264     0.6070
    450          00:57:22   0.8321     0.6125
    600          01:16:28   0.8523     0.6230
    750          01:35:31   0.8601     0.6337
    900          01:54:55   0.8703     0.6370
    1200         02:32:58   0.8791     0.6601
    1350         02:51:19   0.8798     0.6673
    1500         03:10:36   0.8830     0.6710
    1650         03:29:40   0.8877     0.7022
    1800         03:49:31   0.8907     0.7373
    2000         04:14:24   0.9142     0.7590



    TPGKI Approach

    • TPGKI

      • Two-Phase Genetic Knowledge Integration

    • Consisting of two phases

      • Knowledge integration

      • Knowledge refinement

    • Integrating multiple rule sets by pure genetic operators

    • Domain-specific genetic operators need not intervene in the integration


    Two Phases

    • Integration phase and refinement phase

    [Figure: in the integration phase, the rule sets RS1, RS2, RS3, ..., RSm evolve and the best rule set is selected; in the refinement phase, the rules of the selected rule set are refined.]


    Knowledge-Integration Phase

    [Figure: the initial population (Rule Set 1, Rule Set 2, ..., Rule Set n) undergoes the genetic operations crossover and mutation to produce generation 1 (Rule Set 1, Rule Set 2, ..., Rule Set n); the fitness function evaluates the rule sets.]


    Knowledge-Refinement Phase

    [Figure: rules drawn from the rule sets (Rule Set 1, ..., Rule Set i, ..., Rule Set n) form the initial population (Rule 1, Rule 2, ..., Rule m); the genetic operations crossover and mutation produce generation 1; the fitness function takes redundancy, subsumption, and contradiction into account.]



    Evaluation Process

    • Let U be the object set

    • Sort rules by Accuracy × Necessity

    • Fitness = Accuracy × Necessity × Coverage

    • Remove

    • U = U −

    • Empty → STOP


    Experiments- Breast Cancer Diagnosis

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.7720     0.7495
    1            00:00:02   0.8191     0.7875
    4            00:00:10   0.8581     0.8250
    5            00:00:13   0.9206     0.9112
    8            00:00:20   0.9477     0.9119
    26           00:01:04   0.9483     0.9247
    34           00:01:25   0.9525     0.9281
    44           00:01:50   0.9560     0.9375
    45           00:01:52   0.9657     0.9428
    93           00:03:51   0.9659     0.9469
    95           00:03:57   0.9674     0.9484
    100          00:04:12   0.9793     0.9502



    Application- Brain Tumor Diagnosis

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.7981     0.5744
    150          00:31:05   0.8191     0.5801
    300          01:02:13   0.8296     0.6070
    450          01:33:42   0.8472     0.7245
    600          02:04:33   0.8583     0.8015
    750          02:35:31   0.8753     0.8178
    900          03:07:55   0.8903     0.8327
    1200         04:09:38   0.8989     0.8501
    1350         04:41:03   0.9012     0.8523
    1500         05:12:39   0.9057     0.8541
    1650         05:43:40   0.9107     0.8583
    1800         06:25:31   0.9162     0.8621
    2000         06:59:05   0.9257     0.8700



    Comparison of GKIDSO and TPGKI

    • Experiment: Breast Cancer Diagnosis (100 generations)

    Approach   CPU Time (s)   Accuracy   Rule No.
    GKIDSO     100            96.10%     10
    TPGKI      252            97.93%     7

    • Application: Brain Tumor Diagnosis (2000 generations)

    Approach   CPU Time (s)   Accuracy   Rule No.
    GKIDSO     15264          91.42%     92
    TPGKI      25145          92.57%     86


    Genetic-Fuzzy Knowledge-Integration

    • GFKILM

      • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions

      • Associated with several sets of local membership functions

    • GFKIGM Approach

      • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions

      • Associated with a set of global membership functions


    Genetic-Fuzzy Knowledge-Integration Framework

    [Figure: expert groups 1 to n use knowledge-acquisition (K.A.) tools 1 to n, and training sets 1 to m are processed by machine-learning (M.L.) methods 1 to m. Each source yields a fuzzy rule set with its membership functions. These are encoded into intermediary representations, which genetic-fuzzy knowledge integration combines, using integrating instances, into a fuzzy rule set plus membership functions; test records (objects) are also shown.]


    GFKILM Approach

    [Figure: in knowledge encoding, fuzzy rule sets with their membership-function sets (RS1 + MFS, RS2 + MFS, RS3 + MFS, ..., RSm + MFS) are encoded as chromosomes 1, 2, 3, ..., m, which form the initial population (generation 0). In knowledge integration, genetic operators transform the chromosomes from generation to generation up to generation k.]

    • GFKILM approach consists of two parts

      • Encoding

      • Integration


    Knowledge Encoding

    • Rule Set + MFS → Intermediary Rules + MFS → Fixed-Length Rule Strings associated with MFS → Variable-Length Rule-Set String associated with MFS


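    Examples: IRIS Flowers

    [Figure: membership functions μ for the four attributes, each with three fuzzy regions]

    • Sepal length (S.L.): Short, Medium, Long; axis ticks 4.3, 5.2, 6.1, 7.0, 7.9

    • Sepal width (S.W.): Narrow, Medium, Wide; axis ticks 2.0, 2.6, 3.2, 3.8, 4.4

    • Petal length (P.L.): Short, Medium, Long; axis ticks 1.0, 2.4, 3.9, 5.4, 6.9

    • Petal width (P.W.): Narrow, Medium, Wide; axis ticks 0.1, 0.7, 1.3, 1.9, 2.5

    • Classes: Setosa = 1, Versicolor = 2, Virginica = 3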


    Examples

    • Membership functions + Fuzzy Rules

      • IF P.L.=Short Then Setosa

    • Intermediary Representation

      • IF S.L.=(Short or Medium or Long) and S.W.=(Narrow or Medium or Wide) and P.L.=Short and P.W.=(Narrow or Medium or Wide) Then Setosa


    Knowledge Integration

    [Figure: the initial population (RS1 + MFS, RS2 + MFS, ..., RSn + MFS) undergoes the genetic operations crossover, mutation, and fusion to produce generation 1 (RS1 + MFS, RS2 + MFS, ..., RSn + MFS); the fitness function evaluates the rule sets.]




    Fusion

    : IF (P.L.=Short) Then Class is Setosa

    : IF (P.L.=Short) Then Class is Setosa



    Fusion (Subsumption)

    : IF (P.L.=Short) Then Class is Setosa

    : IF (P.L.=Short) and (P.W.=Narrow) Then Class is Setosa


    Fusion (Subsumption)

    [Figure: fusion by subsumption within rule set RSk. Rules r̃ki and r̃kj are each encoded over the fields S.L., S.W., P.L., P.W., and Class, and both carry the class string 100. Fusion removes the subsumed rule, leaving a single rule r̃ki together with its associated membership-function parameters.]


    Experiments- Hepatitis Diagnosis

    • Ten knowledge sources are integrated

    • 155 cases used in the experiment

    • 19 attributes and 2 classes


    Experiments- Hepatitis Diagnosis

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.7688     0.7537
    13           00:00:04   0.7844     0.7690
    69           00:00:19   0.8132     0.7972
    124          00:00:35   0.8328     0.8164
    181          00:00:52   0.8432     0.8266
    261          00:01:15   0.8525     0.8357
    414          00:01:55   0.8688     0.8517
    1401         00:06:07   0.8876     0.8701
    2110         00:09:20   0.8949     0.8773
    3110         00:13:42   0.8965     0.8789
    3550         00:15:35   0.8977     0.8800
    3817         00:16:45   0.9183     0.9002
    4000         00:17:36   0.9290     0.9107



    Application : Sugar-Cane Breeding Prediction

    • Four knowledge sources are integrated

    • 699 actual cases used in the application

    • 36 attributes and 2 classes


    Application sugar cane breeding prediction1
    Application: Sugar-Cane Breeding Prediction

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.5674     0.5562
    2            00:00:02   0.6780     0.6647
    9            00:00:08   0.6803     0.6669
    17           00:00:17   0.6868     0.6733
    29           00:00:29   0.6871     0.6742
    37           00:00:37   0.6877     0.6748
    76           00:01:16   0.6903     0.6766
    230          00:03:53   0.6904     0.6768
    290          00:04:53   0.6952     0.6815
    392          00:06:36   0.6954     0.6817
    498          00:08:22   0.7174     0.7033
    990          00:16:36   0.7352     0.7207
    1734         00:29:06   0.7378     0.7233
    2052         00:34:26   0.7414     0.7268
    2108         00:35:22   0.7416     0.7270
    3341         00:55:58   0.7449     0.7302
    5000         01:23:46   0.7602     0.7452

    • Each rule is encoded as a string 362 units long



    Gfkigm approach
    GFKIGM Approach

    • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions

    • Consisting of two parts

      • Knowledge encoding

      • Knowledge integration

    • Generating a fuzzy rule-set associated with a global collection of membership functions for all fuzzy rules


    Knowledge encoding2
    Knowledge Encoding

    [Diagram labels: Rule Set + MFS; Intermediary Rule; MFS String; Fixed-Length Rule String; Variable-Length Rule-Set String + MFS String]


    Examples iris flowers1
    Examples: IRIS Flowers

    : IF P.L.=Short Then Setosa

    : IF P.L.=Long Then Virginica

    : IF P.W.=Medium Then Versicolor

    : IF P.W.=Wide Then Virginica


    Examples iris flowers2
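    Examples: IRIS Flowers

    [Figure: membership functions u(S.L.), u(S.W.), u(P.L.) and u(P.W.), each with three linguistic terms]

    • S.L. (sepal length): Short, Medium, Long; axis ticks 4.3, 5.2, 6.1, 7.0, 7.9

    • S.W. (sepal width): Narrow, Medium, Wide; axis ticks 2.0, 2.6, 3.2, 3.8, 4.4

    • P.L. (petal length): Short, Medium, Long; axis ticks 1.0, 2.4, 3.9, 5.4, 6.9

    • P.W. (petal width): Narrow, Medium, Wide; axis ticks 0.1, 0.7, 1.3, 1.9, 2.5

    Class encoding: Setosa = 1, Versicolor = 2, Virginica = 3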


    Examples iris flowers3
    Examples: IRIS Flowers

    IF P.L.=Short Then Setosa

    Intermediary Representation

    IF S.L.=(Short or Medium or Long) and S.W.=(Narrow or Medium or Wide) and P.L.=Short and P.W.=(Narrow or Medium or Wide) Then Setosa

    Rule String
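The expansion from a short rule to its intermediary representation can be sketched in Python. The slide does not show the actual rule string, so the bit layout below (one bit per linguistic term, then one bit per class) is only an assumption for illustration, and the names `ATTRIBUTES`, `to_intermediary` and `to_rule_string` are hypothetical.

```python
# Linguistic terms per attribute, in the order used on the slides.
ATTRIBUTES = {
    "S.L.": ["Short", "Medium", "Long"],
    "S.W.": ["Narrow", "Medium", "Wide"],
    "P.L.": ["Short", "Medium", "Long"],
    "P.W.": ["Narrow", "Medium", "Wide"],
}
CLASSES = ["Setosa", "Versicolor", "Virginica"]


def to_intermediary(conditions, label):
    """Fill every attribute the rule does not test with all of its terms."""
    full = {attr: list(conditions.get(attr, terms))
            for attr, terms in ATTRIBUTES.items()}
    return full, label


def to_rule_string(conditions, label):
    """Assumed encoding: one bit per linguistic term, then one bit per class."""
    full, label = to_intermediary(conditions, label)
    bits = []
    for attr, terms in ATTRIBUTES.items():
        bits.extend("1" if term in full[attr] else "0" for term in terms)
    bits.extend("1" if cls == label else "0" for cls in CLASSES)
    return "".join(bits)


# IF P.L.=Short Then Setosa
print(to_rule_string({"P.L.": ["Short"]}, "Setosa"))
```

Because every rule is padded to cover all four attributes, each rule string has the same length under this layout, which is what lets a rule set be stored as a variable-length sequence of fixed-length rule strings.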


    Examples iris flowers cont
    Examples: IRIS Flowers (Cont.)

    : IF P.L.=Short Then Setosa

    : IF P.L.=Long Then Virginica

    : IF P.W.=Medium Then Versicolor

    : IF P.W.=Wide Then Virginica


    Knowledge integration2
    Knowledge Integration

    [Diagram: the initial population (RS1+MFS, RS2+MFS, ..., RSn+MFS) goes through the genetic operations (crossover, mutation, fusion), guided by the fitness function, to give Generation 1 (RS1+MFS, RS2+MFS, ..., RSn+MFS)]
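The loop in the diagram can be sketched as follows. This is a minimal illustration, not the authors' implementation: a rule set is assumed to be a list of fixed-length bit strings, the membership-function string (MFS) is left out, fusion is reduced to removing duplicate rules, and the fitness function is supplied by the caller. All function names are hypothetical.

```python
import random


def crossover(parent_a, parent_b, rng):
    """One-point crossover at rule boundaries; rule sets may change length."""
    cut_a = rng.randint(0, len(parent_a))
    cut_b = rng.randint(0, len(parent_b))
    child_1 = parent_a[:cut_a] + parent_b[cut_b:]
    child_2 = parent_b[:cut_b] + parent_a[cut_a:]
    return child_1, child_2


def mutate(rule_set, rng):
    """Flip one randomly chosen bit in one randomly chosen rule."""
    if not rule_set:
        return rule_set
    i = rng.randrange(len(rule_set))
    j = rng.randrange(len(rule_set[i]))
    flipped = "1" if rule_set[i][j] == "0" else "0"
    mutated = list(rule_set)
    mutated[i] = rule_set[i][:j] + flipped + rule_set[i][j + 1:]
    return mutated


def fuse(rule_set):
    """Fusion, reduced here to dropping duplicate rules while keeping order."""
    return list(dict.fromkeys(rule_set))


def evolve(population, fitness, generations, rng):
    """Keep the population size fixed; the fittest rule sets survive."""
    size = len(population)
    for _ in range(generations):
        a, b = rng.sample(population, 2)
        offspring = [fuse(mutate(c, rng)) for c in crossover(a, b, rng)]
        offspring = [c for c in offspring if c]  # discard empty rule sets
        population = sorted(population + offspring, key=fitness, reverse=True)[:size]
    return population
```

Cutting only at rule boundaries keeps every rule string intact, so the offspring are always valid rule sets even though their lengths differ from the parents'.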




    Fusion3
    Fusion

    : IF (P.L.=Short) Then Class is Setosa

    : IF (P.L.=Short) Then Class is Setosa


    Fusion subsumption2
    Fusion (Subsumption)

    : IF (P.L.=Short) Then Class is Setosa

    : IF (P.L.=Short) and (P.W.=Narrow) Then Class is Setosa
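The two fusion cases (a repeated rule, and a rule subsumed by a more general one) can be sketched as below. The rule representation, a `(conditions, class)` pair with one term per tested attribute, and the function names are assumptions made for illustration.

```python
def subsumes(general, specific):
    """True if `general` fires whenever `specific` fires and both give the same class."""
    cond_g, class_g = general
    cond_s, class_s = specific
    return class_g == class_s and all(cond_s.get(a) == t for a, t in cond_g.items())


def fuse(rules):
    """Drop duplicate rules and rules subsumed by a more general rule."""
    kept = []
    for rule in rules:
        if any(subsumes(k, rule) for k in kept):
            continue  # redundant or subsumed by a rule already kept
        kept = [k for k in kept if not subsumes(rule, k)]
        kept.append(rule)
    return kept


rules = [
    ({"P.L.": "Short"}, "Setosa"),
    ({"P.L.": "Short"}, "Setosa"),                    # redundancy
    ({"P.L.": "Short", "P.W.": "Narrow"}, "Setosa"),  # subsumption
]
print(fuse(rules))
```

Only the general rule `IF (P.L.=Short) Then Class is Setosa` remains, since it already covers every case matched by the other two.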


    Experiments hepatitis diagnosis2
    Experiments: Hepatitis Diagnosis

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.7688     0.7573
    4            00:00:02   0.7867     0.7712
    34           00:00:10   0.8228     0.8066
    160          00:00:45   0.8450     0.8284
    473          00:02:14   0.8542     0.8374
    570          00:02:40   0.8554     0.8386
    1057         00:04:51   0.8578     0.8409
    1495         00:06:46   0.8633     0.8463
    1791         00:08:03   0.8656     0.8486
    2251         00:10:02   0.8721     0.8550
    2580         00:11:27   0.8837     0.8663
    2710         00:12:02   0.8895     0.8720
    3062         00:13:13   0.8910     0.8735
    3342         00:14:49   0.9049     0.8871
    3756         00:16:44   0.9056     0.8878
    3847         00:17:09   0.9068     0.8890
    4000         00:17:51   0.9161     0.8981



    Application sugar cane breeding prediction2
    Application: Sugar-Cane Breeding Prediction

    Generation   CPU Time   Accuracy   Fitness
    0            00:00:00   0.5506     0.5345
    2            00:00:02   0.6726     0.6530
    3            00:00:03   0.6802     0.6603
    9            00:00:08   0.6807     0.6608
    13           00:00:12   0.6864     0.6664
    16           00:00:15   0.6868     0.6665
    17           00:01:16   0.6922     0.6720
    227          00:03:47   0.6928     0.6726
    308          00:03:53   0.6944     0.6741
    493          00:08:15   0.7153     0.6944
    1386         00:23:14   0.7201     0.6991
    1637         00:27:27   0.7253     0.7041
    2924         00:49:04   0.7267     0.7055
    3151         00:52:52   0.7295     0.7082
    3300         00:56:22   0.7362     0.7147
    5000         01:24:37   0.7485     0.7266

    • Each knowledge source is encoded as a string 542 units long



    Comparison of gfkilm and gfkigm
    Comparison of GFKILM and GFKIGM

    • Experiment: Hepatitis Diagnosis (4000 generations)

    Approach   CPU Time (s)   Accuracy   Rule No.
    GFKILM     1056           92.90%     4
    GFKIGM     1071           91.61%     4

    • Application: Sugar-Cane Breeding Prediction (5000 generations)

    Approach   CPU Time (s)   Accuracy   Rule No.
    GFKILM     5026           76.02%     2
    GFKIGM     5077           74.85%     2


    ROADMAP

    Michigan Approach

    Pittsburgh Approach

    GKIDSO Approach

    TPGKI Approach

    MGKI Approach

    Vague Knowledge

    GFKILM Approach

    GFKIGM Approach

    TPGFKI Approach

    MGFKI Approach


    Why data mining
    Why Data Mining?

    Supermarket

    Commodities

    Simon

    If a customer buys milk, then he is likely to buy bread, so...


    Mining association rules
    Mining Association Rules

    Milk

    Bread

    IF bread is bought then milk is bought


    The role of data mining
    The Role of Data Mining

    Useful patterns

    Knowledge and strategy

    Preprocess data


    Mining steps
    Mining steps

    • Step 1: Define minsup and minconf

      e.g., minsup = 50%, minconf = 50%

    • Step 2: Find large itemsets

    • Step 3: Generate association rules
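The three steps can be sketched in Python as below. The transactions are hypothetical: they are chosen only so that the single items A, B, C and E have supports 2, 3, 3 and 3 out of four transactions. The function names are illustrative, and the level-wise search is a plain Apriori-style sketch rather than any particular published implementation.

```python
from itertools import combinations

# Hypothetical transactions (assumed for illustration).
TRANSACTIONS = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def large_itemsets(transactions, minsup):
    """Step 2: level-wise search; k-itemsets are built from large (k-1)-itemsets."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support({i}, transactions) >= minsup]
    large = []
    while level:
        large.extend(level)
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c, transactions) >= minsup]
    return large


def association_rules(transactions, minsup, minconf):
    """Step 3: split each large itemset into antecedent -> consequent."""
    rules = []
    for itemset in large_itemsets(transactions, minsup):
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(sorted(itemset), r)):
                conf = support(itemset, transactions) / support(antecedent, transactions)
                if conf >= minconf:
                    rules.append((set(antecedent), set(itemset - antecedent), conf))
    return rules


# Step 1: thresholds from the slide.
for antecedent, consequent, conf in association_rules(TRANSACTIONS, 0.5, 0.5):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

Confidence is computed as support(antecedent ∪ consequent) / support(antecedent), so a rule is kept only when both the itemset is large and the rule is reliable enough.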


    Example
    Example

    Large itemsets

    [Figure: the database is scanned once per level to find the large itemsets]

    L1:

    Itemset   Sup.
    {A}       2
    {B}       3
    {C}       3
    {E}       3



    Integrating mined knowledge
    Integrating Mined Knowledge

    • Association Rules

    [Diagram: Branch 1, Branch 2 and Branch 3 each send their mined rules to the headquarters]

    A → B: If customers buy A, then they will buy B.

    B, C → D: If customers buy B and C, then they will buy D.

    A, C → E: If customers buy A and C, then they will buy E.

    ...


    Integration of association rules
    Integration of Association Rules

    [Diagram: databases DB1, DB2, ..., DBn are each mined into a rule set RD1, RD2, ..., RDn (e.g., AB→C, A→D, B→E); the rule sets are then integrated into the GRB]

    • Synthesizing High-Frequency Rules

      • Weighting

      • Ranking

    • Xindong Wu and Shichao Zhang (2003)

      • Synthesizing High-Frequency Rules from Different Data Sources

        • Known data sources


    Integration of association rules cont
    Integration of Association Rules (Cont.)

    [Diagram: the same rule X→Y is collected from different sources (Internet, journals, books, Web) with different confidences]

    X→Y, conf=0.7

    X→Y, conf=0.72

    X→Y, conf=0.68

    • Synthesizing

      • clustering method

    X→Y, conf=?

    • Xindong Wu and Shichao Zhang (2003)

      • Synthesizing High-Frequency Rules from Different Data Sources

        • Unknown data sources
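To make the question "conf=?" concrete, the sketch below combines the three reported confidences with a plain weighted average. This is only an illustrative baseline under assumed weights; it is not the clustering method named on the slide, nor the exact procedure of Wu and Zhang. The function name and the weights are hypothetical.

```python
def synthesize_confidence(confidences, weights=None):
    """Weighted average of the confidences reported for the same rule."""
    if weights is None:
        weights = [1.0] * len(confidences)  # assume equally trusted sources
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, confidences)) / total


# X -> Y as reported by three sources on the slide.
reported = [0.7, 0.72, 0.68]
print(round(synthesize_confidence(reported), 4))
print(round(synthesize_confidence(reported, weights=[1, 3, 0]), 4))
```

With equal weights the synthesized confidence is the simple mean of the reported values; unequal weights shift it toward the sources that are trusted more.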


    Integration of association rules1
    Integration of Association Rules

    • Framework

    [Diagram: each transaction database (1, ..., i, ..., n) is processed by a data mining method into a fuzzy rule set with its membership functions; each result is encoded into an intermediary representation; genetic fuzzy knowledge integration, evaluated on sample data, then produces a single fuzzy rule set + membership functions]


    Data mining method
    Data Mining Method

    • Mining Fuzzy Association Rules and Membership Functions

    [Diagram: the inputs are the transaction database, the linguistic terms, the minimum support and the minimum confidence. Mining Membership Functions: a population of chromosomes 1 to q, each encoding one membership function set, evolves in a genetic fuzzy MF acquisition process that uses fuzzy mining for large 1-itemsets and yields the final membership function set. Mining Fuzzy Association Rules: fuzzy mining with the final membership function set yields the fuzzy association rules]


    Mining membership functions
    Mining Membership Functions

    • Example

    [Figure: membership functions (Low, Middle, High) of membership value over Quantity for four items: milk, bread, beverage and cookies. Axis ticks: 0, 5, 10, 15 (milk); 0, 6, 12, 18 (bread); 0, 4, 8, 12 and 0, 3, 6, 9 (the other two plots)]

    Chromosome: four membership function sets (MF1 to MF4), each holding three regions Rj1, Rj2, Rj3 given as pairs:

    MF1: 5, 5, 10, 5, 15, 5   (R11, R12, R13)
    MF2: 6, 6, 12, 6, 18, 6   (R21, R22, R23)
    MF3: 3, 3, 6, 3, 9, 3     (R31, R32, R33)
    MF4: 4, 4, 8, 4, 12, 4    (R41, R42, R43)
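A chromosome segment such as 5, 5, 10, 5, 15, 5 can be decoded and evaluated as sketched below. The sketch assumes that each pair is the center and half-span of an isosceles triangular membership function, which is consistent with the axis ticks 0, 5, 10, 15 for milk but is not stated on the slide. The function names are hypothetical.

```python
def triangular(x, center, span):
    """Membership of x in an isosceles triangle peaking at `center`
    and reaching zero at center - span and center + span."""
    return max(0.0, 1.0 - abs(x - center) / span)


def decode(genes, terms=("Low", "Middle", "High")):
    """Turn a flat gene list (c1, w1, c2, w2, ...) into {term: (center, span)}."""
    pairs = zip(genes[0::2], genes[1::2])
    return dict(zip(terms, pairs))


def fuzzify(x, genes):
    """Membership value of quantity x in each linguistic term."""
    return {term: triangular(x, c, w) for term, (c, w) in decode(genes).items()}


milk = [5, 5, 10, 5, 15, 5]
print(fuzzify(8, milk))
```

A purchased quantity thus belongs partly to several linguistic terms at once, and these membership values are what fuzzy mining accumulates into fuzzy supports.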


    Fitness function2
    Fitness Function

    [Figure: panels (a) and (b), each showing Low, Middle and High membership functions over Quantity; axis ticks 0, 5, 20, 25 and 0, 5, 8, 9]

    • Formally

    • The two bad kinds of membership functions


    Mining fuzzy association rules
    Mining Fuzzy Association Rules

    • Our fuzzy mining algorithm (2001)

      • Trade-off between time complexity and number of rules for fuzzy mining from quantitative data


    Conclusions
    Conclusions

    • Classification Rules

      • A genetic knowledge-integration framework and four knowledge integration methodologies are proposed

        • GKIDSO Approach

        • TPGKI Approach

        • GFKILM Approach

        • GFKIGM Approach

      • Two real-world applications have been developed with our approaches

        • A self-integrating knowledge-based brain tumor diagnostic system

        • A sugar-cane breeding prediction system


    Conclusions cont
    Conclusions (Cont.)

    • Advantages

      • Only a little computation time is needed

      • A large number of rule sets can be effectively integrated

      • It is objective

      • It may find new knowledge

      • Domain experts need not intervene when conflict occurs


    Conclusions cont1
    Conclusions (Cont.)

    • Disadvantages

      • All knowledge sources must be pre-processed so that they can be represented as rule strings

      • A set of data must be collected to evaluate the resulting knowledge

      • If too few knowledge sources are available, dummy knowledge sources are inserted into the initial population


    Conclusions cont2
    Conclusions (Cont.)

    • Fuzzy Association Rules

      • Fuzzy mining + GA-based evolution


    Future work
    Future Work

    • Heterogeneous knowledge representation

    • Vocabulary


