Knowledge integration by genetic algorithms

Knowledge Integration by Genetic Algorithms. Prof. Tzung-Pei Hong, Department of Electrical Engineering, National University of Kaohsiung.

Knowledge Integration by Genetic Algorithms

Prof. Tzung-Pei Hong

Department of Electrical Engineering, National University of Kaohsiung


Outline

  • Introduction

  • Review

    • GAs

    • Fuzzy Sets

    • Related Studies

  • Knowledge Integration Strategies

    • Classification Rules

    • Association Rules

  • Conclusions


Why Knowledge Integration

  • Four Reasons

    1. Knowledge is distributed among multiple sources

    2. Integration increases the reliability of knowledge-based systems

    3. Knowledge can be reused

    4. Integration reduces the effort of developing an expert system or decision support system

[Figure: rule bases RB1, …, RBi, …, RBn are integrated into GRB, which an expert system uses through a user interface]


Why Using GAs ?

  • Integration of RB1, …, RBi, …, RBn must satisfy

    1. Completeness 2. Correctness 3. Consistency 4. Conciseness

  • Integration is therefore a multi-objective optimization problem

  • GAs can find optimal or nearly optimal solutions to such problems


Vague Knowledge

  • In real-world applications, knowledge sources (RB1, …, RBi, …, RBn) or data often contain linguistic or ambiguous information

  • This vagueness greatly influences the resulting knowledge base


Benefits

  • Medsker [95]

    • Knowledge integrated from different sources has good validity

    • Integrated knowledge can deal with more complex problems

    • Knowledge integration may improve the performance of the knowledge base

    • Integrating would facilitate building bigger and better systems cheaply


Traditional Knowledge Integration

  • Problems

    • When conflict occurs

      • Domain experts must intervene in the integration process

    • Subjective

    • Time consuming

    • Limited Integration

      • A small number of knowledge sources

    • With more knowledge sources

      • Integration becomes more difficult and complex


Our Goals

  • Solve potential conflicts and contradictions

  • Integrate knowledge without human expert’s intervention

  • Improve the integration speed

  • Scale up to a large number of knowledge sources


History of GAs

  • GA: Genetic Algorithm

  • History

    • John Holland (1975)

    • K. A. De Jong

    • D. E. Goldberg


Idea of GA

  • Survival of the fittest

  • Iterative Procedure

  • Genetic operators

    • Reproduction

    • Crossover

    • Mutation

  • Near optimal solution


Simple Genetic Algorithms

Start

  → Initialize a population of individuals

  → Evaluate each individual's fitness value

  → Quit? Quit if: 1) the maximum number of generations is reached, 2) the time limit is reached, or 3) the population has converged

      • Yes → Stop

      • No → Select the superior individuals for reproduction

  → Apply crossover and perhaps mutation

  → Evaluate the new individuals' fitness values, then return to the Quit? test


An Example

  • Find the maximum of a function f(t)


Step1

  • Define a suitable representation

    • Each chromosome has 12 bits

    • e.g.

      t = 0      →  000000000000

      t = 1      →  111111111111

      t = 0.680  →  101011100001
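The three sample encodings are consistent with a linear mapping from the 12-bit integer value to t in [0, 1]: 101011100001 is 2785, and 2785 / (2^12 − 1) = 2785 / 4095 ≈ 0.680. A minimal sketch of that decoding and its inverse, assuming this linear mapping (the names `decode` and `encode` are illustrative):

```python
def decode(bits):
    """Map a binary chromosome to t in [0, 1] (assumed linear mapping)."""
    max_value = 2 ** len(bits) - 1          # 4095 for 12 bits
    return int(bits, 2) / max_value


def encode(t, n_bits=12):
    """Inverse mapping: round t * (2**n_bits - 1) to the nearest integer."""
    return format(round(t * (2 ** n_bits - 1)), f"0{n_bits}b")


for s in ["000000000000", "111111111111", "101011100001"]:
    print(s, round(decode(s), 3))
```

Because the mapping is linear, 12 bits give a resolution of 1/4095 ≈ 0.00024 in t.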


Step2

  • Create an initial population of N individuals

    • N: population size

    • Assume N = 40


Step3

  • Define a suitable fitness function f to evaluate the individuals

    • Fitness function: f(t)

    • e.g., the first six individuals


Step 4

  • Perform crossover and mutation to generate possible offspring


Crossover

  • Offspring

    • inherit some characteristics of their parents

  • e.g.

Parent 1 : 00011 0000001

Parent 2 : 01001 1001101

Child 1 : 000111001101

Child 2 : 010010000001
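The example is a one-point crossover with the cut after the fifth bit: each child keeps one parent's first five bits and takes the other parent's last seven. A minimal sketch (function names are illustrative) that reproduces the two children above:

```python
import random


def one_point_crossover(parent1, parent2, cut):
    """Swap the tails of two equal-length bit strings after position `cut`."""
    assert len(parent1) == len(parent2) and 0 < cut < len(parent1)
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


def random_crossover(parent1, parent2, rng=random):
    """The same operator with a uniformly random cut point, as a GA would use it."""
    return one_point_crossover(parent1, parent2, rng.randint(1, len(parent1) - 1))


c1, c2 = one_point_crossover("000110000001", "010011001101", cut=5)
print(c1)  # 000111001101
print(c2)  # 010010000001
```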


Mutation

  • Offspring

    • possess characteristics different from their ancestors

    • mutation preserves a reasonable level of population diversity

  • e.g. Bit change

      0 1 1 1 0 0 0 0 0 1 0 0  →  1 1 1 1 0 0 0 0 0 1 0 0

  • e.g. Inversion

      1 1 1 0 1 1 0 0 0 1 0 0  →  1 1 1 1 0 1 0 0 0 1 0 0
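Read as before/after pairs, the bit-change example flips the leftmost bit, and the inversion example reverses a short segment (the two bits at 0-based positions 3 and 4). A minimal sketch of both operators; the row pairing and the segment bounds are an interpretation of the original figure, and the function names are illustrative:

```python
import random


def bit_change(bits, pos):
    """Flip the bit at 0-based index `pos`."""
    flipped = "1" if bits[pos] == "0" else "0"
    return bits[:pos] + flipped + bits[pos + 1:]


def inversion(bits, start, end):
    """Reverse the order of the bits in the half-open slice [start, end)."""
    return bits[:start] + bits[start:end][::-1] + bits[end:]


def random_bit_change(bits, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return "".join(("1" if b == "0" else "0") if rng.random() < rate else b for b in bits)


print(bit_change("011100000100", 0))    # 111100000100
print(inversion("111011000100", 3, 5))  # 111101000100
```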


New Offspring

  • The new offspring produced by the operators


Step 5

  • Replace individuals in the population with the new offspring

  • e.g., the first six individuals of the new population


Step 6

  • If the termination criteria are not satisfied, go to Step 4; otherwise, stop the genetic algorithm

    • The termination criteria

      • The maximum number of generations

      • The time limit

      • The population converged
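Steps 1 to 6 can be combined into a small runnable GA. This is a sketch under stated assumptions: the slide's objective function is not available, so `fitness` uses a hypothetical f(t) = sin(πt), which peaks at t = 0.5; binary tournament selection, elitism, the crossover and mutation rates, and the 60-generation limit are also illustrative choices. Only the maximum-generation criterion from Step 6 is implemented.

```python
import math
import random

BITS = 12        # chromosome length (Step 1)
POP_SIZE = 40    # N (Step 2)
MAX_GEN = 60     # termination: maximum number of generations (Step 6)
P_CROSS = 0.8    # assumed crossover probability
P_MUT = 0.01     # assumed per-bit mutation probability


def decode(bits):
    """Map a 12-bit chromosome to t in [0, 1]."""
    return int(bits, 2) / (2 ** BITS - 1)


def fitness(bits):
    """Hypothetical objective f(t) = sin(pi * t), maximal at t = 0.5."""
    return math.sin(math.pi * decode(bits))


def select(pop, rng):
    """Binary tournament: keep the fitter of two randomly drawn individuals."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def crossover(p1, p2, rng):
    """One-point crossover, applied with probability P_CROSS (Step 4)."""
    if rng.random() >= P_CROSS:
        return p1, p2
    cut = rng.randint(1, BITS - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(bits, rng):
    """Bit-change mutation: each bit flips independently with probability P_MUT."""
    return "".join(("1" if b == "0" else "0") if rng.random() < P_MUT else b for b in bits)


def run_ga(seed=0):
    rng = random.Random(seed)
    # Step 2: random initial population of POP_SIZE chromosomes
    pop = ["".join(rng.choice("01") for _ in range(BITS)) for _ in range(POP_SIZE)]
    for _ in range(MAX_GEN):                       # Step 6: stop after MAX_GEN generations
        children = [max(pop, key=fitness)]         # elitism: carry over the best individual
        while len(children) < POP_SIZE:
            c1, c2 = crossover(select(pop, rng), select(pop, rng), rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pop = children[:POP_SIZE]                  # Step 5: replace the population
    best = max(pop, key=fitness)
    return decode(best), fitness(best)


t, f = run_ga()
print(f"best t = {t:.3f}, f(t) = {f:.4f}")
```

Carrying the best individual into each new generation means the best fitness never decreases; a time limit or a convergence test could be added to the same loop.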


Experiment


Fuzzy Sets

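  • Traditional computer decision-making

    • Everything is either true (1) or false (0)

      e.g., if age 25 is the cut-off for "young", is a 26-year-old then "middle-aged"?

      A score of 60 or above is a pass, so anything below 60 is a fail

  • What is fuzziness?

    • Add several more grades between true (1) and false (0)

      • almost true (0.8)

      • possibly true (0.6)

      • possibly false (0.4)

      • almost false (0.2)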


Fuzzy Sets

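Dividing into ever more grades → a continuous scale

  • Question: is 168 cm tall or not?

[Figure: membership degree plotted against height (cm), with ticks at 160, 170, and 180]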
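A hypothetical piecewise-linear membership function makes the 168 cm question concrete. The breakpoints (degree 0 at 160 cm, degree 1 at 180 cm) are assumptions suggested by the axis ticks, not values read from the slide:

```python
def tall(height_cm, low=160.0, high=180.0):
    """Hypothetical membership degree of 'tall': 0 at or below `low`,
    1 at or above `high`, and linear in between."""
    if height_cm <= low:
        return 0.0
    if height_cm >= high:
        return 1.0
    return (height_cm - low) / (high - low)


for h in (160, 168, 170, 180):
    print(h, tall(h))   # 168 cm is "tall" to degree 0.4 under these assumptions
```

Other shapes (sigmoid, trapezoid) are equally valid; the point is that 168 cm gets a partial degree instead of a yes/no answer.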


Example:“Close to 0”

  • e.g.

    • μA(3) = 0.01

    • μA(1) = 0.09

    • μA(0.25) = 0.62

    • μA(0) = 1

  • Define a membership function:

    μA(x) = 1 / (1 + 10x²)


Example:“Close to 0”

  • Very Close to 0:

    μA(x) =
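The four sample values for "close to 0" (0.01, 0.09, 0.62, 1) are all reproduced, to two decimals, by μA(x) = 1 / (1 + 10x²). For "very close to 0", this sketch assumes the common concentration hedge, which squares the membership degree; that formula is an assumption, not taken from the slide.

```python
def close_to_zero(x):
    """μA(x) = 1 / (1 + 10x²): 1 at x = 0, falling toward 0 as |x| grows."""
    return 1.0 / (1.0 + 10.0 * x * x)


def very(mu):
    """Assumed 'very' hedge (concentration): square the membership degree."""
    return mu ** 2


for x in (3, 1, 0.25, 0):
    mu = close_to_zero(x)
    print(f"x = {x}: close to 0 = {mu:.2f}, very close to 0 = {very(mu):.2f}")
```

Squaring keeps degrees 0 and 1 unchanged but lowers every intermediate degree, so "very close to 0" is a stricter set than "close to 0".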


Fuzzy Set (Cont.)

  • Membership function

    • maps each element x to a degree in [0, 1]

  • e.g.

    • sunny : x → [0, 1]

    • different x may be 0.8 sunny, 0.6 sunny, or 0.1 sunny


Fuzzy Set

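[Figure: a scale from Sunny (1) through 0.8, 0.6, 0.4, 0.2 to Not sunny (0)]

  • Simple

  • Intuitively pleasing

  • A generalization of crisp sets

    • Crisp set: membership is 0 or 1 (non-member or member)

    • Fuzzy set: vague; the change from member to non-member is gradual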


Fuzzy Operations

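  • Intersection (AND)

    • Take the smaller degree

      e.g., a student who is smart (0.8) AND hardworking (0.6) is a model student to degree 0.6

  • Union (OR)

    • Take the larger degree

      e.g., a student who is smart (0.8) OR hardworking (0.6) is a model student to degree 0.8

  • Complement (NOT)

    • Take the difference from 1

      e.g., if a student is smart to degree 0.8, the student is not smart to degree 0.2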
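The three operations are the standard min, max, and complement operators. A minimal sketch that reproduces the model-student numbers (function names are illustrative):

```python
def fuzzy_and(a, b):
    """Intersection: take the smaller degree."""
    return min(a, b)


def fuzzy_or(a, b):
    """Union: take the larger degree."""
    return max(a, b)


def fuzzy_not(a):
    """Complement: the difference from 1."""
    return 1.0 - a


smart, hardworking = 0.8, 0.6
print(fuzzy_and(smart, hardworking))  # 0.6
print(fuzzy_or(smart, hardworking))   # 0.8
print(round(fuzzy_not(smart), 2))     # 0.2
```

The `round` on the last line only hides floating-point noise (1.0 − 0.8 is not exactly 0.2 in binary).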


Fuzzy Inference Example

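             Big eyes   Small mouth   Good figure
    陶晶瑩      0          0.8           0.3
    張惠妹      1          0.6           0.8
    李玟        0          0.3           0.9
    李心潔      0.7        0.1           0.5
    蔡依林      0.8        0.5           0.3

  • Prof. Hong's criteria for choosing a concubine

    • (big eyes AND small mouth) OR good figure

      Question: who is the best leading lady?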


Answer

  • 陶晶瑩: (0 AND 0.8) OR 0.3 = 0 OR 0.3 = 0.3

  • 張惠妹: (1 AND 0.6) OR 0.8 = 0.6 OR 0.8 = 0.8

  • 李玟: (0 AND 0.3) OR 0.9 = 0 OR 0.9 = 0.9

  • 李心潔: (0.7 AND 0.1) OR 0.5 = 0.1 OR 0.5 = 0.5

  • 蔡依林: (0.8 AND 0.5) OR 0.3 = 0.5 OR 0.3 = 0.5

  • 李玟 is the best choice!

  • Thank you!


    Fuzzy Decision

    • A = {A1, A2, A3, A4, A5}

      • A set of alternatives

    • C = {C1, C2, C3}

      • A set of criteria


    Example (Cont.)

    • Assume: (C1 and C2) or C3

      • E(Ai): evaluation function

        • E(A1) = (0 ∧ 0.8) ∨ 0.3 = 0 ∨ 0.3 = 0.3

        • E(A2) = (1 ∧ 0.6) ∨ 0.8 = 0.6 ∨ 0.8 = 0.8

        • E(A3) = (0 ∧ 0.3) ∨ 0.9 = 0 ∨ 0.9 = 0.9 ← the best choice

        • E(A4) = (0.7 ∧ 0.1) ∨ 0.5 = 0.1 ∨ 0.5 = 0.5

        • E(A5) = (0.8 ∧ 0.5) ∨ 0.3 = 0.5 ∨ 0.3 = 0.5
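The evaluation E(Ai) = (C1 ∧ C2) ∨ C3 is a max of a min, so picking the best alternative is an argmax over the scores. A minimal sketch using the degrees from this example (they match the five rows of the earlier table, with C1, C2, C3 as its three criteria); the names are illustrative:

```python
ALTERNATIVES = {   # degrees for criteria (C1, C2, C3)
    "A1": (0.0, 0.8, 0.3),
    "A2": (1.0, 0.6, 0.8),
    "A3": (0.0, 0.3, 0.9),
    "A4": (0.7, 0.1, 0.5),
    "A5": (0.8, 0.5, 0.3),
}


def evaluate(c1, c2, c3):
    """E(Ai) = (C1 AND C2) OR C3, with AND = min and OR = max."""
    return max(min(c1, c2), c3)


scores = {name: evaluate(*degrees) for name, degrees in ALTERNATIVES.items()}
best = max(scores, key=scores.get)
print(scores)  # {'A1': 0.3, 'A2': 0.8, 'A3': 0.9, 'A4': 0.5, 'A5': 0.5}
print(best)    # A3
```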


    Review of Knowledge Integration

    [Figure: taxonomy of knowledge integration into a cooperative approach and a centralized approach, with the methods Blackboard, LPC Model, Integrity Constraints, Repertory Grid, Genetic Algorithm, and Decision Table]


    GA-Based Classifier Systems

    • Michigan approach: each individual in the population is a single rule

      rule 1: xxxxxxx....

      rule 2: yyyyyyy....

      rule n: nnnnnn....

    • Pittsburgh approach: each individual is a whole rule set

      rule set 1: rrrrrrrrr....

      rule set 2: zzzzzzzzzzzz....

      rule set m: mmmm.......


    Genetic Knowledge Integration

    [Figure: genetic knowledge-integration approaches derived from the Michigan and Pittsburgh approaches: GKIDSO, TPGKI, and MGKI; and, for vague knowledge, GFKILM, GFKIGM, TPGFKI, and MGFKI]


    Integration of Classification Rules

    • Four Methods

      • GKIDSO

        • Genetic Knowledge-Integration approach with Domain-Specific Operators

      • TPGKI

        • Two-Phase Genetic Knowledge Integration

      • GFKILM

        • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions

      • GFKIGM

        • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions


    Genetic Knowledge-Integration Framework

    [Figure: expert groups 1…n, through knowledge-acquisition (K.A.) tools 1…n, and training data sets 1…m, through machine-learning (M.L.) methods 1…m, each produce a rule set with its own dictionary; encoding, based on a global feature set and class set, turns every rule set into an intermediary representation; GA-based knowledge integration, using an integrating case set, then produces a knowledge base and its dictionary]


    Knowledge Integration

    [Figure: rule sets enter as knowledge input and pass through knowledge encoding, genetic knowledge integration, knowledge verification, and knowledge decoding to form the knowledge base; an integration data set supports the integration]


    GKIDSO Approach

    [Figure: knowledge encoding turns rule sets RS1, RS2, …, RSm into chromosomes 1, 2, …, m, which form the initial population (generation 0); knowledge integration applies genetic operators to evolve generation k]

    • Genetic Knowledge-Integration approach with Domain-Specific Operators

    • Consists of two parts

      • Encoding

      • Integration


    Knowledge Encoding

    Rule set → intermediary rules → fixed-length rule strings → variable-length rule-set string


    Example: Brain Tumor

    • Two classes: {Adenoma, Meningioma}

    • Three features:

      • {Location, Calcification, Edema}

    • Feature values for Location

      • {brain surface, sellar, brain stem}

    • Feature values for Calcification

      • {no, marginal, vascular-like, lumpy}

    • Feature values for Edema

      • {no, < 2 cm, < 0.5 hemisphere}


    Intermediary Rules

    • Two Rules

      • R1: IF (Location = sellar) and (Calcification = no)

        then Adenoma

      • R2: IF (Location = brain surface) and (Edema < 2 cm)

        then Meningioma

    Adding dummy tests for the unused features:

    R1: IF (Location = sellar) and (Calcification = no) and

    (Edema = no, or < 2 cm, or < 0.5 hemisphere)

    then Adenoma

    R2: IF (Location = brain surface) and

    (Calcification = no or marginal or vascular-like or lumpy) and

    (Edema < 2 cm)

    then Meningioma


    Fixed-Length Rule String

              Location   Calcification   Edema   Classes
    R1 :      010        1000            111     10
    R2 :      100        1111            010     01

    R1: IF (Location = sellar) and (Calcification = no) and

    (Edema = no, or < 2 cm, or < 0.5 hemisphere)

    then Adenoma

    R2: IF (Location = brain surface) and

    (Calcification = no or marginal or vascular-like or lumpy) and

    (Edema < 2 cm)

    then Meningioma
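The fixed-length string concatenates one bit per possible feature value (1 = allowed) and one bit per class; a dummy test simply sets all of a feature's bits to 1. A minimal sketch of that encoding, with feature values in the order listed in the brain-tumor example (the function name is illustrative):

```python
FEATURES = {   # feature values in the order listed in the example
    "Location": ["brain surface", "sellar", "brain stem"],
    "Calcification": ["no", "marginal", "vascular-like", "lumpy"],
    "Edema": ["no", "< 2 cm", "< 0.5 hemisphere"],
}
CLASSES = ["Adenoma", "Meningioma"]


def encode_rule(conditions, rule_class):
    """Encode an intermediary rule as space-separated bit fields.

    `conditions` maps a feature to its allowed values; a feature that is
    absent gets a dummy test (all values allowed, so all bits are 1)."""
    fields = []
    for feature, values in FEATURES.items():
        allowed = conditions.get(feature, values)
        fields.append("".join("1" if v in allowed else "0" for v in values))
    fields.append("".join("1" if c == rule_class else "0" for c in CLASSES))
    return " ".join(fields)


r1 = encode_rule({"Location": ["sellar"], "Calcification": ["no"]}, "Adenoma")
r2 = encode_rule({"Location": ["brain surface"], "Edema": ["< 2 cm"]}, "Meningioma")
print("R1 :", r1)   # R1 : 010 1000 111 10
print("R2 :", r2)   # R2 : 100 1111 010 01
```

Because every rule has the same length (12 bits here), a rule set can be stored as one variable-length string whose length is a multiple of 12.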


    Knowledge Integration

    [Figure: the initial population of rule sets 1…n is evaluated by the fitness function, and the genetic operations (crossover, mutation, fusion, fission) produce generation 1]


    Fitness Function

    • Formally

    • where

    -  is a control parameter


    Crossover

    [Figure: crossover between two variable-length rule-set strings RS1 = r11 … r1i … r1n and RS2 = r21 … r2j … r2m; crossover points cp1 and cp2, each shown 7 bits into a rule, split the parents, and exchanging the tails yields offspring O1 and O2]
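One way to keep offspring well-formed when parents hold different numbers of rules is to cut both parents at the same bit offset inside a rule, which is what the figure's "7 bits" labels suggest. A minimal sketch under that assumption, using 12-bit rules as in the brain-tumor encoding; the second parent's rules are hypothetical:

```python
import random

RULE_LEN = 12  # bits per rule: Location(3) + Calcification(4) + Edema(3) + Class(2)


def ruleset_crossover(rs1, rs2, rng=random):
    """Exchange the tails of two variable-length rule-set strings.

    Both cut points sit at the same bit offset inside a rule (assumed scheme),
    so every offspring is still a whole number of RULE_LEN-bit rules."""
    n1, n2 = len(rs1) // RULE_LEN, len(rs2) // RULE_LEN
    offset = rng.randrange(1, RULE_LEN)          # shared offset inside a rule
    cp1 = rng.randrange(n1) * RULE_LEN + offset  # cut point in parent 1
    cp2 = rng.randrange(n2) * RULE_LEN + offset  # cut point in parent 2
    return rs1[:cp1] + rs2[cp2:], rs2[:cp2] + rs1[cp1:]


rs1 = "010100011110" + "100111101001"                   # R1 + R2 from the example
rs2 = "001100111110" + "010011110010" + "100100001001"  # hypothetical rule set
o1, o2 = ruleset_crossover(rs1, rs2, random.Random(1))
print(len(o1) // RULE_LEN, "and", len(o2) // RULE_LEN, "rules")  # counts vary with the seed
```

The offspring may hold different numbers of rules than their parents, which is how a Pittsburgh-style GA lets rule-set size evolve; each offspring always keeps at least one rule.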


    Fusion

    • Eliminate redundancy and subsumption

      • Redundancy

        • R1: if A then B

        • R2: if A then B

      • Subsumption

        • R1: if A and C then B

        • R2: if A then B


    Fusion (Cont.)

    • Eliminate redundancy

    • Eliminate subsumption
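On the fixed-length encoding, redundancy means two identical strings, and a rule subsumes another when it predicts the same class and allows every feature value the other allows. A minimal sketch of fusion under that reading; the names are illustrative, and the rules reuse the brain-tumor encoding:

```python
def subsumes(general, specific):
    """True if `general` predicts the same class as `specific` and allows every
    feature value that `specific` allows (so it covers all of its cases)."""
    g_cond, g_cls = general
    s_cond, s_cls = specific
    return g_cls == s_cls and all(g == "1" for g, s in zip(g_cond, s_cond) if s == "1")


def fusion(rule_set):
    """Remove duplicate rules (redundancy) and rules subsumed by another rule."""
    unique = list(dict.fromkeys(rule_set))   # redundancy: keep the first copy, preserve order
    return [r for r in unique
            if not any(other != r and subsumes(other, r) for other in unique)]


# Rules are (condition bits, class bits): Location(3) Calcification(4) Edema(3) | Class(2)
general = ("010" "1111" "111", "10")   # IF Location = sellar THEN Adenoma
specific = ("010" "1000" "111", "10")  # ... AND Calcification = no THEN Adenoma
r2 = ("100" "1111" "010", "01")        # R2 from the example
print(fusion([specific, general, specific, r2]))
# [('0101111111', '10'), ('1001111010', '01')]
```

This matches the slide's subsumption case: "if A then B" is kept and the more specific "if A and C then B" is dropped.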


    Fission

    • Eliminate misclassification and contradiction

      • Misclassification

        • e: (A, C)

          R: if A then B

      • Contradiction

        • R: if A then B or C

          R1: if A then B

          R2: if A then C


    Fission (Cont.)

    [Figure: fission on rule set RSk = rk1 … rki … rkn: a near-miss rule is specialized and the new rule is inserted, giving offspring O′k]

    • Eliminate misclassification

        • Select the "closest" near-miss rule to the wrongly classified test instance and specialize it


    Fission (Cont.)

    [Figure: in rule set RSk, rule rki predicts two classes at once (class part 110); fission splits it into two rules rki with class parts 100 and 010, giving offspring Ok]

    • Eliminate contradiction
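When a rule's class part has several bits set, it asserts several classes at once, as in "if A then B or C". A minimal sketch of contradiction-removing fission that splits such a rule into one rule per class; the bit layout is assumed to match the fixed-length encoding, and the example rule is hypothetical:

```python
def split_contradiction(rule):
    """Split a rule whose class part has several 1s into one rule per class.

    `rule` is a (condition_bits, class_bits) pair; rules that name at most
    one class are returned unchanged."""
    cond, cls = rule
    if cls.count("1") <= 1:
        return [rule]

    def one_hot(i):
        return "".join("1" if j == i else "0" for j in range(len(cls)))

    return [(cond, one_hot(i)) for i, bit in enumerate(cls) if bit == "1"]


def remove_contradictions(rule_set):
    """Apply the split to every rule in a rule set."""
    return [new for rule in rule_set for new in split_contradiction(rule)]


rule = ("1001001", "110")   # hypothetical rule: IF ... THEN class 1 OR class 2
print(remove_contradictions([rule]))
# [('1001001', '100'), ('1001001', '010')]
```

The two resulting rules share a condition but disagree on the class; later generations (through fitness and further fission) decide which one survives.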


    Experiments- Breast Cancer Diagnosis

    • Six knowledge sources are integrated

    • 699 cases used in the experiment

      • 524 cases for integrating

      • 175 cases for testing

    • 9 attributes and 2 classes

      • Benign : 458 cases

      • Malignant : 241 cases

    • Each rule is encoded as a 92-bit string


    Result

    Generation   CPU Time (hh:mm:ss)   Accuracy   Fitness
    0            00:00:00              0.7720     0.7495
    1            00:00:01              0.8117     0.7856
    6            00:00:05              0.8824     0.8650
    23           00:00:22              0.9068     0.8719
    39           00:00:28              0.9228     0.8959
    32           00:00:31              0.9422     0.9237
    56           00:00:55              0.9487     0.9301
    59           00:00:58              0.9533     0.9346
    80           00:01:19              0.9556     0.9368
    97           00:01:36              0.9568     0.9473
    100          00:01:40              0.9619     0.9523


    Result (Cont.)

    Class       Cases   Correctly classified   Misclassified   Unknown
    Benign      132     128                    2               2
    Malignant   43      41                     0               2

    • Test cases: 175


    Experiments- Breast Cancer Diagnosis


    Application- Brain Tumor Diagnosis

    • Ten knowledge sources are integrated

    • 504 actual cases used in the application

      • 378 cases for integrating

      • 126 cases for testing

    • 12 attributes and 6 classes (cases per class):

      • Glioblastoma: 54

      • Pituitary Adenoma: 85

      • Astrocytoma: 122

      • Medulloblastoma: 68

      • Meningioma: 119

      • Protoplasmic Astrocytoma: 56

    • Each rule is encoded as a 105-bit string


    Application - Brain Tumor Diagnosis (Cont.)

    Generation   CPU Time (hh:mm:ss)   Accuracy   Fitness
    0            00:00:00              0.7981     0.5330
    150          00:19:05              0.8117     0.5701
    300          00:38:24              0.8264     0.6070
    450          00:57:22              0.8321     0.6125
    600          01:16:28              0.8523     0.6230
    750          01:35:31              0.8601     0.6337
    900          01:54:55              0.8703     0.6370
    1200         02:32:58              0.8791     0.6601
    1350         02:51:19              0.8798     0.6673
    1500         03:10:36              0.8830     0.6710
    1650         03:29:40              0.8877     0.7022
    1800         03:49:31              0.8907     0.7373
    2000         04:14:24              0.9142     0.7590


    Application - Brain Tumor Diagnosis


    TPGKI Approach

    • TPGKI

      • Two-Phase Genetic Knowledge Integration

    • Consisting of two phases

      • Knowledge integration

      • Knowledge refinement

    • Integrating multiple rule sets by pure genetic operators

    • Domain-specific genetic operators need not intervene in the integration


    Two Phases

    [Figure: in the integration phase, rule sets RS1, RS2, RS3, …, RSm are integrated and the best rule set is selected; in the refinement phase, its rules r1, …, rm are refined]

    • Integration phase & Refinement phases


    Knowledge-Integration Phase

    [Figure: the initial population of rule sets 1…n is evaluated by the fitness function, and crossover and mutation produce generation 1]


    Knowledge-Refinement Phase

    [Figure: the rules 1…m of the selected rule set form the initial population; crossover and mutation produce generation 1, and the fitness function takes redundancy, subsumption, and contradiction into account]


    Fitness Function


    Evaluation Process

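    1. Let U be the object set

    2. Sort the rules by Accuracy × Necessity

    3. For the next rule, compute Fitness = Accuracy × Necessity × Coverage

    4. Remove the objects covered by that rule: U = U − (covered objects)

    5. If U is empty, STOP; otherwise return to step 3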


    Experiments- Breast Cancer Diagnosis

    Generation   CPU Time (hh:mm:ss)   Accuracy   Fitness
    0            00:00:00              0.7720     0.7495
    1            00:00:02              0.8191     0.7875
    4            00:00:10              0.8581     0.8250
    5            00:00:13              0.9206     0.9112
    8            00:00:20              0.9477     0.9119
    26           00:01:04              0.9483     0.9247
    34           00:01:25              0.9525     0.9281
    44           00:01:50              0.9560     0.9375
    45           00:01:52              0.9657     0.9428
    93           00:03:51              0.9659     0.9469
    95           00:03:57              0.9674     0.9484
    100          00:04:12              0.9793     0.9502


    Experiments- Breast Cancer Diagnosis


    Application- Brain Tumor Diagnosis

    Generation   CPU Time (hh:mm:ss)   Accuracy   Fitness
    0            00:00:00              0.7981     0.5744
    150          00:31:05              0.8191     0.5801
    300          01:02:13              0.8296     0.6070
    450          01:33:42              0.8472     0.7245
    600          02:04:33              0.8583     0.8015
    750          02:35:31              0.8753     0.8178
    900          03:07:55              0.8903     0.8327
    1200         04:09:38              0.8989     0.8501
    1350         04:41:03              0.9012     0.8523
    1500         05:12:39              0.9057     0.8541
    1650         05:43:40              0.9107     0.8583
    1800         06:25:31              0.9162     0.8621
    2000         06:59:05              0.9257     0.8700


    Application - Brain Tumor Diagnosis


    Comparison of GKIDSO and TPGKI

    • Experiment: Breast Cancer Diagnosis (100 generations)

      Approach   CPU Time (s)   Accuracy   Rule No.
      GKIDSO     100            96.10%     10
      TPGKI      252            97.93%     7

    • Application: Brain Tumor Diagnosis (2000 generations)

      Approach   CPU Time (s)   Accuracy   Rule No.
      GKIDSO     15264          91.42%     92
      TPGKI      25145          92.57%     86


    Genetic-Fuzzy Knowledge-Integration

    • GFKILM

      • Genetic-Fuzzy Knowledge-Integration with several sets of Local Membership functions

      • Associated with several sets of local membership functions

    • GFKIGM Approach

      • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions

      • Associated with a set of global membership functions


    Genetic-Fuzzy Knowledge-Integration Framework

    [Figure: expert groups 1…n, through K.A. tools 1…n, and training sets 1…m, through M.L. methods 1…m, each yield a fuzzy rule set with its own membership functions; encoding turns each into an intermediary representation; genetic-fuzzy knowledge integration, using integrating instances and test records, produces one fuzzy rule set plus membership functions]


    GFKILM Approach

    [Figure: knowledge encoding turns each fuzzy rule set with its membership functions (RS1+MFS1, …, RSm+MFSm) into chromosomes 1…m, forming the initial population (generation 0); knowledge integration applies genetic operators to evolve generation k]

    • GFKILM approach consists of two parts

      • Encoding

      • Integration


    Knowledge Encoding

    Rule set + MFS → intermediary rules + MFS → fixed-length rule strings associated with MFS → variable-length rule-set string associated with MFS


    Examples: IRIS Flowers

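    [Figure: membership functions for the four IRIS attributes, each with three linguistic terms]

    • Sepal length (S.L.): Short, Medium, Long; axis ticks 4.3, 5.2, 6.1, 7.0, 7.9

    • Sepal width (S.W.): Narrow, Medium, Wide; axis ticks 2.0, 2.6, 3.2, 3.8, 4.4

    • Petal length (P.L.): Short, Medium, Long; axis ticks 1.0, 2.4, 3.9, 5.4, 6.9

    • Petal width (P.W.): Narrow, Medium, Wide; axis ticks 0.1, 0.7, 1.3, 1.9, 2.5

    Setosa = 1, Versicolor = 2, Virginica = 3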


    Examples

    IF P.L.=Short Then Setosa

    Intermediary Representation

    IF S.L. = (Short or Medium or Long) and S.W. = (Narrow or

    Medium or Wide) and P.L. = Short and P.W. = (Narrow or

    Medium or Wide) Then Setosa

    Membership functions + Fuzzy Rules


    Knowledge Integration

    [Figure: the initial population RS1+MFS, RS2+MFS, …, RSn+MFS is evaluated by the fitness function, and crossover, mutation, and fusion produce generation 1]


    Crossover


    Mutation


    Fusion

    rki: IF (P.L. = Short) Then Class is Setosa

    rkj: IF (P.L. = Short) Then Class is Setosa


    Fusion


    Fusion (Subsumption)

    rki: IF (P.L. = Short) Then Class is Setosa

    rkj: IF (P.L. = Short) and (P.W. = Narrow) Then Class is Setosa


    Fusion (Subsumption)

    [Figure: fusion on fuzzy rule set RSk: rules rki and rkj, encoded over S.L., S.W., P.L., P.W., and Class, both predict Setosa (class part 100); rkj's condition part is more specific than rki's, so rkj is removed and rki is kept together with its membership-function parameters]


    Experiments- Hepatitis Diagnosis

    • Ten knowledge sources are integrated

    • 155 cases used in the experiment

    • 19 attributes and 2 classes


    Experiments- Hepatitis Diagnosis

    Generation   CPU Time (hh:mm:ss)   Accuracy   Fitness
    0            00:00:00              0.7688     0.7537
    13           00:00:04              0.7844     0.7690
    69           00:00:19              0.8132     0.7972
    124          00:00:35              0.8328     0.8164
    181          00:00:52              0.8432     0.8266
    261          00:01:15              0.8525     0.8357
    414          00:01:55              0.8688     0.8517
    1401         00:06:07              0.8876     0.8701
    2110         00:09:20              0.8949     0.8773
    3110         00:13:42              0.8965     0.8789
    3550         00:15:35              0.8977     0.8800
    3817         00:16:45              0.9183     0.9002
    4000         00:17:36              0.9290     0.9107


    Experiments: Hepatitis Diagnosis


    Application: Sugar-Cane Breeding Prediction

    • Four knowledge sources are integrated

    • 699 actual cases used in the application

    • 36 attributes and 2 classes


    Application: Sugar-Cane Breeding Prediction

    Generation  CPU Time (hh:mm:ss)  Accuracy  Fitness
    0           00:00:00             0.5674    0.5562
    2           00:00:02             0.6780    0.6647
    9           00:00:08             0.6803    0.6669
    17          00:00:17             0.6868    0.6733
    29          00:00:29             0.6871    0.6742
    37          00:00:37             0.6877    0.6748
    76          00:01:16             0.6903    0.6766
    230         00:03:53             0.6904    0.6768
    290         00:04:53             0.6952    0.6815
    392         00:06:36             0.6954    0.6817
    498         00:08:22             0.7174    0.7033
    990         00:16:36             0.7352    0.7207
    1734        00:29:06             0.7378    0.7233
    2052        00:34:26             0.7414    0.7268
    2108        00:35:22             0.7416    0.7270
    3341        00:55:58             0.7449    0.7302
    5000        01:23:46             0.7602    0.7452

    • Each rule is encoded into a string 362 units long


    Application: Sugar-Cane Breeding Prediction


    GFKIGM Approach

    • Genetic-Fuzzy Knowledge-Integration with a set of Global Membership functions

    • Consists of two parts

      • Knowledge encoding

      • Knowledge integration

    • Generates a fuzzy rule set together with one global collection of membership functions shared by all fuzzy rules


    Knowledge Encoding

    Rule Set + MFS → Intermediary Rules + MFS String → Fixed-Length Rule Strings + MFS String → Variable-Length Rule-Set String + MFS String


    Examples: IRIS Flowers

    IF P.L.=Short Then Setosa

    IF P.L.=Long Then Virginica

    IF P.W.=Medium Then Versicolor

    IF P.W.=Wide Then Virginica


    Examples: IRIS Flowers

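    [Figure: membership functions for the four attributes. Sepal length (S.L.): Short, Medium, Long over 4.3 to 7.9 (points 5.2, 6.1, 7.0). Sepal width (S.W.): Narrow, Medium, Wide over 2.0 to 4.4 (points 2.6, 3.2, 3.8). Petal length (P.L.): Short, Medium, Long over 1.0 to 6.9 (points 2.4, 3.9, 5.4). Petal width (P.W.): Narrow, Medium, Wide over 0.1 to 2.5 (points 0.7, 1.3, 1.9).]

    Setosa = 1, Versicolor = 2, Virginica = 3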


    Examples: IRIS Flowers

    IF P.L.=Short Then Setosa

    Intermediary Representation

    IF S.L.=(Short or Medium or Long) and S.W.=(Narrow or Medium or Wide) and P.L.=Short and P.W.=(Narrow or Medium or Wide) Then Setosa

    Rule String


    Examples: IRIS Flowers (Cont.)

    IF P.L.=Short Then Setosa

    IF P.L.=Long Then Virginica

    IF P.W.=Medium Then Versicolor

    IF P.W.=Wide Then Virginica


    Knowledge Integration

    Initial Population (RS1+MFS, RS2+MFS, ..., RSn+MFS) → Genetic Operations (Crossover, Mutation, Fusion) → Fitness Function → Generation 1 (RS1+MFS, RS2+MFS, ..., RSn+MFS)


    Crossover


    Mutation
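    The crossover and mutation figures did not survive extraction. The sketch below shows one common way to apply these operators to variable-length rule-set strings, assuming a rule set is a Python list of fixed-length bit-string rules. The independent cut points and the `rate` parameter are illustrative assumptions, not necessarily the exact operators used by GFKILM or GFKIGM.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Cut each parent (a list of rule strings) at an independent rule
    boundary and swap the tails, so children can differ in rule count."""
    cut_a = rng.randint(0, len(parent_a))  # randint is inclusive at both ends
    cut_b = rng.randint(0, len(parent_b))
    child_a = parent_a[:cut_a] + parent_b[cut_b:]
    child_b = parent_b[:cut_b] + parent_a[cut_a:]
    return child_a, child_b


def mutate(rule_set, rate, rng):
    """Flip each bit of each rule string with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return ["".join(flipped[b] if rng.random() < rate else b for b in rule)
            for rule in rule_set]


rng = random.Random(42)
rs1 = ["111111100111100", "111111111001001"]  # a rule set with two rules
rs2 = ["111111111010010"]                     # a rule set with one rule
print(crossover(rs1, rs2, rng))
print(mutate(rs1, 0.05, rng))
```

    Cutting only at rule boundaries keeps every child rule a valid fixed-length string, while the independent cut points let the number of rules per set grow or shrink across generations.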


    Fusion

    IF (P.L.=Short) Then Class is Setosa

    IF (P.L.=Short) Then Class is Setosa


    Fusion (Subsumption)

    IF (P.L.=Short) Then Class is Setosa

    IF (P.L.=Short) and (P.W.=Narrow) Then Class is Setosa
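    Fusion can be sketched as a clean-up pass over one rule set. The sketch assumes each rule is a bit string with one bit per linguistic term (1 = allowed) followed by three one-hot class bits; under that assumption, rule A subsumes rule B when both conclude the same class and every term allowed by B is also allowed by A. `subsumes` and `fuse` are hypothetical names.

```python
N_CLASS_BITS = 3  # assumed: one-hot class bits at the end of each rule string


def subsumes(general, specific):
    """True if both rules conclude the same class and `general` allows
    every linguistic term that `specific` allows."""
    cond_g, cls_g = general[:-N_CLASS_BITS], general[-N_CLASS_BITS:]
    cond_s, cls_s = specific[:-N_CLASS_BITS], specific[-N_CLASS_BITS:]
    return cls_g == cls_s and all(g == "1" or s == "0"
                                  for g, s in zip(cond_g, cond_s))


def fuse(rule_set):
    """Remove exact duplicates, then drop any rule subsumed by another."""
    unique = list(dict.fromkeys(rule_set))  # order-preserving de-duplication
    return [r for r in unique
            if not any(o != r and subsumes(o, r) for o in unique)]


general = "111111100111100"   # IF (P.L.=Short) Then Setosa
specific = "111111100100100"  # IF (P.L.=Short) and (P.W.=Narrow) Then Setosa
print(fuse([general, general, specific]))  # ['111111100111100']
```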


    Experiments: Hepatitis Diagnosis

    Generation  CPU Time (hh:mm:ss)  Accuracy  Fitness
    0           00:00:00             0.7688    0.7573
    4           00:00:02             0.7867    0.7712
    34          00:00:10             0.8228    0.8066
    160         00:00:45             0.8450    0.8284
    473         00:02:14             0.8542    0.8374
    570         00:02:40             0.8554    0.8386
    1057        00:04:51             0.8578    0.8409
    1495        00:06:46             0.8633    0.8463
    1791        00:08:03             0.8656    0.8486
    2251        00:10:02             0.8721    0.8550
    2580        00:11:27             0.8837    0.8663
    2710        00:12:02             0.8895    0.8720
    3062        00:13:13             0.8910    0.8735
    3342        00:14:49             0.9049    0.8871
    3756        00:16:44             0.9056    0.8878
    3847        00:17:09             0.9068    0.8890
    4000        00:17:51             0.9161    0.8981


    Experiments: Hepatitis Diagnosis


    Application: Sugar-Cane Breeding Prediction

    Generation  CPU Time (hh:mm:ss)  Accuracy  Fitness
    0           00:00:00             0.5506    0.5345
    2           00:00:02             0.6726    0.6530
    3           00:00:03             0.6802    0.6603
    9           00:00:08             0.6807    0.6608
    13          00:00:12             0.6864    0.6664
    16          00:00:15             0.6868    0.6665
    17          00:01:16             0.6922    0.6720
    227         00:03:47             0.6928    0.6726
    308         00:03:53             0.6944    0.6741
    493         00:08:15             0.7153    0.6944
    1386        00:23:14             0.7201    0.6991
    1637        00:27:27             0.7253    0.7041
    2924        00:49:04             0.7267    0.7055
    3151        00:52:52             0.7295    0.7082
    3300        00:56:22             0.7362    0.7147
    5000        01:24:37             0.7485    0.7266

    • Each knowledge source is encoded into a string 542 units long


    Application: Sugar-Cane Breeding Prediction


    Comparison of GFKILM and GFKIGM

    • Experiment: Hepatitis Diagnosis

    Approach                     CPU Time (s)  Accuracy  Rule No.
    GFKILM (4000 generations)    1056          92.90%    4
    GFKIGM (4000 generations)    1071          91.61%    4

    • Application: Sugar-Cane Breeding Prediction

    Approach                     CPU Time (s)  Accuracy  Rule No.
    GFKILM (5000 generations)    5026          76.02%    2
    GFKIGM (5000 generations)    5077          74.85%    2


    ROADMAP

    Michigan Approach, Pittsburgh Approach

    GKIDSO Approach, TPGKI Approach, MGKI Approach

    Vague Knowledge: GFKILM Approach, GFKIGM Approach, TPGFKI Approach, MGFKI Approach


    Why Data Mining?

    Supermarket commodities

    Simon: "If a customer buys milk, then he is likely to buy bread, so..."


    Mining Association Rules

    Milk

    Bread

    IF bread is bought then milk is bought


    The Role of Data Mining

    Preprocess data → Useful patterns → Knowledge and strategy


    Mining Steps

    • Step 1: Define minsup and minconf

      e.g., minsup = 50% and minconf = 50%

    • Step 2: Find large itemsets

    • Step 3: Generate association rules
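    Steps 1 and 3 can be sketched with Python sets. The four-transaction database `DB` is hypothetical; the thresholds minsup = minconf = 50% follow the slide, and `support` and `rules_from` are illustrative helpers. A rule X → Y built from a large itemset I = X ∪ Y is kept when sup(I) / sup(X) reaches minconf.

```python
from itertools import combinations

# Hypothetical four-transaction database, for illustration only.
DB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
MINSUP, MINCONF = 0.5, 0.5  # Step 1: thresholds from the slide


def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def rules_from(large_itemset, db, minconf):
    """Step 3: emit X -> (I - X) for every non-empty proper subset X of I
    whose confidence sup(I) / sup(X) reaches `minconf`."""
    rules = []
    items = sorted(large_itemset)
    for size in range(1, len(items)):
        for combo in combinations(items, size):
            lhs = set(combo)
            conf = support(large_itemset, db) / support(lhs, db)
            if conf >= minconf:
                rhs = tuple(sorted(large_itemset - lhs))
                rules.append((tuple(sorted(lhs)), rhs, conf))
    return rules


itemset = {"B", "C", "E"}
assert support(itemset, DB) >= MINSUP  # {B, C, E} is large (support 0.5)
print(rules_from(itemset, DB, 0.7))
# [(('B', 'C'), ('E',), 1.0), (('C', 'E'), ('B',), 1.0)]
```

    With the slide's minconf of 50%, all six rules from {B, C, E} pass; raising minconf to 70% keeps only the two with confidence 1.0.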


    Example

    Large itemsets (found by repeated database scans)

    L1:

    Itemset   Sup.
    {A}       2
    {B}       3
    {C}       3
    {E}       3
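    Step 2 is commonly done level by level, with one database scan per level, as in Apriori. The transcript does not include the example database, so this sketch assumes a hypothetical four-transaction database chosen so that it reproduces the L1 counts in the table (minsup = 50%, i.e., a count of 2). The helper names are illustrative.

```python
from itertools import combinations

# Hypothetical database that yields the L1 counts shown above.
DB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
MIN_COUNT = 2  # minsup = 50% of 4 transactions


def count(candidates, db):
    """One scan of the database: support count of every candidate."""
    return {c: sum(1 for t in db if c <= t) for c in candidates}


def apriori(db, min_count):
    """Return {frozenset(itemset): count} for all large itemsets."""
    items = sorted({i for t in db for i in t})
    level = [frozenset([i]) for i in items]
    large, k = {}, 1
    while level:
        current = {c: n for c, n in count(level, db).items() if n >= min_count}
        large.update(current)
        k += 1
        # Join: merge large (k-1)-itemsets into k-itemsets.
        prev = list(current)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be large.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return large


L = apriori(DB, MIN_COUNT)
print({"".join(sorted(s)): n for s, n in L.items() if len(s) == 1})
# {'A': 2, 'B': 3, 'C': 3, 'E': 3}
```

    On this assumed database the loop stops after L3 = {B, C, E}, since no 4-itemset candidate can be formed.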


    Example


    Integrating Mined Knowledge

    • Association Rules

    Branch 1, Branch 2, Branch 3 → Headquarters

    If customers buy A, then they will buy B.  (A → B)

    If customers buy B and C, then they will buy D.  (B, C → D)

    If customers buy A and C, then they will buy E.  (A, C → E)

    ...


    Integration of Association Rules

    DB1, DB2, ..., DBn → rule sets RD1, RD2, ..., RDn (e.g., AB→C, A→D, B→E) → GRB

    • Synthesizing High-Frequency Rules

      • Weighting

      • Ranking

    • Xindong Wu and Shichao Zhang (2003)

      • Synthesizing High-Frequency Rules from Different Data Sources

        • Known data sources
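    Weighting can be sketched as a vote across data sources. This is a simplified illustration in the spirit of synthesizing high-frequency rules, not Wu and Zhang's published formulas: here a rule's vote count is the number of sources reporting it, each source's weight is proportional to the total votes of its rules, and a rule survives only if at least a fraction `min_freq` of the sources report it. The support values below are hypothetical.

```python
def synthesize(sources, min_freq):
    """`sources` is a list of dicts, one per data source, mapping a rule
    (e.g. "A->D") to its support there. Returns {rule: synthesized support}
    for rules reported by at least a fraction `min_freq` of the sources."""
    n = len(sources)
    votes = {}
    for rules in sources:
        for rule in rules:
            votes[rule] = votes.get(rule, 0) + 1
    # Assumed weighting: a source counts more when its rules are widely shared.
    raw = [sum(votes[rule] for rule in rules) for rules in sources]
    weights = [w / sum(raw) for w in raw]
    return {
        rule: sum(w * rules.get(rule, 0.0) for w, rules in zip(weights, sources))
        for rule, v in votes.items()
        if v / n >= min_freq
    }


sources = [
    {"AB->C": 0.4, "A->D": 0.3},  # DB1
    {"AB->C": 0.5, "B->E": 0.2},  # DB2
    {"AB->C": 0.6, "A->D": 0.3},  # DB3
]
# B->E is reported by only 1 of 3 sources, so it is dropped.
print(synthesize(sources, 0.5))
```

    Ranking could then simply sort the surviving rules by their synthesized support.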


    Integration of Association Rules (Cont.)

    Sources on the Internet (journals, books, Web) report X→Y with conf = 0.7, 0.72, and 0.68

    • Synthesizing

      • clustering method

    Synthesized rule: X→Y, conf = ?

    • Xindong Wu and Shichao Zhang (2003)

      • Synthesizing High-Frequency Rules from Different Data Sources

        • Unknown data sources
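    The slide only names a clustering method for unknown data sources. As a simple stand-in, not the authors' exact procedure, this sketch groups the reported confidences into clusters of nearby values and returns the mean of the largest cluster; the `gap` threshold and the one-dimensional grouping are assumptions.

```python
def synthesize_conf(confs, gap=0.05):
    """Group sorted confidence values whose neighbours differ by at most
    `gap`, then return the mean of the largest group (input must be non-empty)."""
    values = sorted(confs)
    clusters = [[values[0]]]
    for v in values[1:]:
        if v - clusters[-1][-1] <= gap:
            clusters[-1].append(v)  # close to the previous value: same cluster
        else:
            clusters.append([v])    # too far away: start a new cluster
    largest = max(clusters, key=len)
    return sum(largest) / len(largest)


print(synthesize_conf([0.7, 0.72, 0.68]))        # about 0.70
print(synthesize_conf([0.7, 0.72, 0.68, 0.20]))  # the outlier 0.20 is ignored
```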


    Integration of Association Rules

    • Framework

    Transaction database i (i = 1, ..., n) → Data Mining Method → Fuzzy Rule Set i + Membership Functions i

    Encoding: each fuzzy rule set and its membership functions → intermediary representation

    Integration: Genetic Fuzzy Knowledge Integration, evaluated on sample data → Fuzzy Rule Set + Membership Functions


    Data Mining Method

    • Mining Fuzzy Association Rules and Membership Functions

    Input: a transaction database, the linguistic terms, the minimum support, and the minimum confidence

    MF acquisition process (genetic fuzzy mining): a population of chromosomes (Chromosome1, ..., Chromosomeq), each encoding one membership function set (Membership Function Set1, ..., Setq), is evaluated by fuzzy mining for large 1-itemsets

    Final Membership Function Set → Fuzzy Mining → Fuzzy Association Rules


    Mining Membership Functions

    • Example

    [Figure: an example chromosome. Each item has three membership functions (Low, Middle, High) over purchased quantity, each encoded as a (center, spread) pair R_jk: milk = (5, 5), (10, 5), (15, 5); bread = (6, 6), (12, 6), (18, 6); the two remaining items, beverage and cookies, use (3, 3), (6, 3), (9, 3) and (4, 4), (8, 4), (12, 4).]
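    Decoding a chromosome can be sketched as follows, assuming each (center, spread) pair describes an isosceles triangle that peaks at the center and reaches zero at center ± spread. Only the milk and bread pairs from the example are used; `membership` and `fuzzify` are hypothetical names.

```python
# Hypothetical chromosome: one list of (center, spread) pairs per item,
# one pair per linguistic term, in the order Low, Middle, High.
CHROMOSOME = {
    "milk": [(5, 5), (10, 5), (15, 5)],
    "bread": [(6, 6), (12, 6), (18, 6)],
}
TERMS = ("Low", "Middle", "High")


def membership(x, center, spread):
    """Isosceles triangle: 1 at the center, 0 at or beyond center +/- spread."""
    return max(0.0, 1.0 - abs(x - center) / spread)


def fuzzify(item, quantity, chromosome=CHROMOSOME):
    """Membership of a purchased quantity in each linguistic term of `item`."""
    return {term: membership(quantity, c, w)
            for term, (c, w) in zip(TERMS, chromosome[item])}


print(fuzzify("milk", 7))    # {'Low': 0.6, 'Middle': 0.4, 'High': 0.0}
print(fuzzify("bread", 12))  # {'Low': 0.0, 'Middle': 1.0, 'High': 0.0}
```

    Encoding only two numbers per term keeps each chromosome short, and the GA can shift centers and widen or narrow spreads independently.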


    Fitness Function

    [Figure: panels (a) and (b), two kinds of poorly shaped membership functions (Low, Middle, High) plotted against quantity.]

    • Formally

    • The two bad kinds of membership functions
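    The slide's formula did not survive extraction. As an illustrative assumption only, the sketch below reads the two bad kinds as too much overlap and too little coverage, and scores a chromosome as the number of large 1-itemsets divided by a suitability penalty built from an overlap factor and a coverage factor. The count of 3 large 1-itemsets and the `max_quantities` values are hypothetical inputs.

```python
def overlap_factor(mfs):
    """Penalty for adjacent triangles that overlap by more than the smaller
    spread. `mfs` is a list of (center, spread) pairs sorted by center."""
    penalty = 0.0
    for (c1, w1), (c2, w2) in zip(mfs, mfs[1:]):
        # Length of the intersection of the two triangles' supports.
        overlap = max(0.0, min(c1 + w1, c2 + w2) - max(c1 - w1, c2 - w2))
        penalty += max(overlap / min(w1, w2), 1.0) - 1.0
    return penalty


def coverage_factor(mfs, max_quantity):
    """Ratio of the item's maximum quantity to the span covered by the
    triangles (clipped at 0); it grows as coverage shrinks."""
    low = max(0.0, mfs[0][0] - mfs[0][1])
    high = mfs[-1][0] + mfs[-1][1]
    return max_quantity / (high - low)


def fitness(num_large_1_itemsets, chromosome, max_quantities):
    """Number of large 1-itemsets divided by the total suitability penalty."""
    suitability = sum(overlap_factor(mfs) + coverage_factor(mfs, max_quantities[item])
                      for item, mfs in chromosome.items())
    return num_large_1_itemsets / suitability


good = {"milk": [(5, 5), (10, 5), (15, 5)]}
crowded = {"milk": [(5, 10), (6, 10), (7, 10)]}
print(fitness(3, good, {"milk": 15}))     # 4.0
print(fitness(3, crowded, {"milk": 15}))  # lower: heavy overlap is penalized
```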


    Mining Fuzzy Association Rules

    • Our fuzzy mining algorithm (2001)

      • Trade-off between time complexity and number of rules for fuzzy mining from quantitative data


    Conclusions

    • Classification Rules

      • A genetic knowledge-integration framework and four knowledge integration methodologies are proposed

        • GKIDSO Approach

        • TPGKI Approach

        • GFKILM Approach

        • GFKIGM Approach

      • Two real-world applications have been developed with our approaches

        • A self-integrating knowledge-based brain tumor diagnostic system

        • A sugar-cane breeding prediction system


    Conclusions (Cont.)

    • Advantages

      • Relatively little computation time is needed

      • A large number of rule sets can be effectively integrated

      • The integration is objective

      • It may discover new knowledge

      • Domain experts need not intervene when conflicts occur


    Conclusions (Cont.)

    • Disadvantages

      • All knowledge sources must be preprocessed into rule strings

      • A set of data must be collected to evaluate the resulting knowledge

      • If too few knowledge sources are available, some dummy knowledge sources must be inserted into the initial population


    Conclusions (Cont.)

    • Fuzzy Association Rules

      • Fuzzy mining + GA-based evolution


    Future Work

    • Heterogeneous knowledge representation

    • Vocabulary


    Thank You

