

Fuzzy-rough data mining

Richard Jensen

Advanced Reasoning Group

University of Aberystwyth

rkj@aber.ac.uk

http://users.aber.ac.uk/rkj


Outline

  • Knowledge discovery process

  • Fuzzy-rough methods

    • Feature selection and extensions

    • Instance selection

    • Classification/prediction

    • Semi-supervised learning


Knowledge discovery

  • The process

  • The problem of too much data

    • Requires storage

    • Intractable for data mining algorithms

    • Noisy or irrelevant data is misleading/confounding


Feature Selection


Feature selection

  • Why dimensionality reduction/feature selection?

  • Growth of information - need to manage this effectively

  • Curse of dimensionality - a problem for machine learning and data mining

  • Data visualisation - graphing data

[Diagram: high-dimensional data is intractable for the processing system; dimensionality reduction produces low-dimensional data it can handle]


Why do it?

  • Case 1: We’re interested in features

    • We want to know which are relevant

    • If we fit a model, it should be interpretable

  • Case 2: We’re interested in prediction

    • Features are not interesting in themselves

    • We just want to build a good classifier (or other kind of predictor)


Feature selection process

  • Feature selection (FS) preserves data semantics by selecting rather than transforming

  • Subset generation: forwards, backwards, random…

  • Evaluation function: determines ‘goodness’ of subsets

  • Stopping criterion: decide when to stop subset search

[Flowchart: feature set → subset generation → subset evaluation (subset suitability) → stopping criterion → continue (generate again) or stop → validation]


Fuzzy-rough feature selection


Fuzzy-rough set theory

  • Problems:

    • Rough set methods (usually) require data discretization beforehand

    • Extensions, e.g. tolerance rough sets, require thresholds

    • Also no flexibility in approximations

      • E.g. objects either belong fully to the lower (or upper) approximation, or not at all


Fuzzy-rough sets

The crisp rough set approximations are fuzzified using a fuzzy relation R: the lower approximation is built from an implicator I, the upper approximation from a t-norm T:

lower: μ(x) = inf_y I(R(x, y), X(y))

upper: μ(x) = sup_y T(R(x, y), X(y))
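A minimal sketch of these definitions in Python (assuming the Łukasiewicz implicator and t-norm, one common choice; `R` is a precomputed fuzzy similarity matrix and `X` a fuzzy concept membership vector):

```python
import numpy as np

def lower_approximation(R, X):
    """mu(x) = inf_y I(R(x,y), X(y)) with the Lukasiewicz implicator
    I(a,b) = min(1, 1 - a + b)."""
    return np.minimum(1.0, 1.0 - R + X[None, :]).min(axis=1)

def upper_approximation(R, X):
    """mu(x) = sup_y T(R(x,y), X(y)) with the Lukasiewicz t-norm
    T(a,b) = max(0, a + b - 1)."""
    return np.maximum(0.0, R + X[None, :] - 1.0).max(axis=1)
```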


Fuzzy-rough feature selection

  • Based on fuzzy similarity between objects, defined per attribute and combined over the current attribute subset

  • Fuzzy lower/upper approximations are computed with respect to this similarity relation (e.g. the relation sketched below)
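One common per-attribute similarity scales the value difference by the attribute's range and combines attributes with the minimum t-norm. A sketch (the exact relation is a design choice, as discussed later; `data` is assumed to be a 2D numpy array and `attributes` a list of column indices):

```python
import numpy as np

def attribute_similarity(values):
    """mu_Ra(x,y) = 1 - |a(x) - a(y)| / range(a): one common choice."""
    rng = float(values.max() - values.min()) or 1.0
    return 1.0 - np.abs(values[:, None] - values[None, :]) / rng

def similarity_relation(data, attributes):
    """Combine per-attribute similarities over a subset with the min t-norm."""
    R = np.ones((len(data), len(data)))
    for a in attributes:
        R = np.minimum(R, attribute_similarity(data[:, a]))
    return R
```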


FRFS: evaluation function

  • Fuzzy positive region #1

  • Fuzzy positive region #2 (weak)

  • Dependency function
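A sketch of the fuzzy positive region and dependency degree in their standard forms (reusing `lower_approximation` from the earlier sketch; `decision_classes` is assumed to be a list of 0/1 membership vectors, one per decision concept):

```python
import numpy as np

def positive_region(R, decision_classes):
    """mu_POS(x) = sup over decision concepts X of lower-approximation
    membership of x."""
    return np.maximum.reduce(
        [lower_approximation(R, X) for X in decision_classes])

def dependency_degree(R, decision_classes):
    """gamma'(D) = sum_x mu_POS(x) / |U|: the FRFS evaluation function."""
    pos = positive_region(R, decision_classes)
    return float(pos.sum()) / len(pos)
```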


FRFS: finding reducts

  • Fuzzy-rough QuickReduct

    • Evaluation: use the dependency function (or other fuzzy-rough measure)

    • Generation: greedy hill-climbing

    • Stopping criterion: when the maximal evaluation function value is reached (or to degree α)
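A sketch of the greedy loop, reusing `similarity_relation` and `dependency_degree` from the earlier sketches (`alpha` allows stopping short of the maximal value):

```python
def quickreduct(data, attributes, decision_classes, alpha=1.0):
    """Greedy hill-climbing: repeatedly add the attribute that most improves
    the fuzzy-rough dependency degree, stopping when nothing improves it
    (or the degree reaches alpha)."""
    subset, best = [], 0.0
    while best < alpha:
        candidate, cand_score = None, best
        for a in attributes:
            if a in subset:
                continue
            R = similarity_relation(data, subset + [a])
            score = dependency_degree(R, decision_classes)
            if score > cand_score:
                candidate, cand_score = a, score
        if candidate is None:   # no attribute improves the measure
            break
        subset.append(candidate)
        best = cand_score
    return subset
```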


FRFS

  • Other search methods

    • GAs, PSO, EDAs, Harmony Search, etc.

    • Backward elimination, plus-L minus-R, floating search, SAT, etc.

  • Other subset evaluations

    • Fuzzy boundary region

    • Fuzzy entropy

    • Fuzzy discernibility function


Ant-based FS


Boundary region

[Diagram: a set X approximated from below by its lower approximation and from above by its upper approximation; the difference between the two is the boundary region, built from equivalence classes [x]B]


FRFS: boundary region

  • Fuzzy lower and upper approximation define fuzzy boundary region

  • For each concept, minimise the boundary region

    • (also applicable to crisp RSFS)

  • Results seem to show this is a more informed heuristic (but more computationally complex)
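A sketch of the measure, under the same Łukasiewicz connectives as the earlier sketches: the fuzzy boundary is the gap between upper and lower approximation membership, summed over all decision concepts, and subset search minimises it rather than maximising the dependency degree:

```python
import numpy as np

def boundary_measure(R, decision_classes):
    """Total fuzzy boundary region: sum over concepts of (upper - lower).
    Smaller is better, so the search minimises this quantity."""
    total = 0.0
    for X in decision_classes:
        lower = np.minimum(1.0, 1.0 - R + X[None, :]).min(axis=1)
        upper = np.maximum(0.0, R + X[None, :] - 1.0).max(axis=1)
        total += float((upper - lower).sum())
    return total
```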


Finding smallest reducts

  • Usually too expensive to search exhaustively for reducts with minimal cardinality

  • Reducts found via discernibility matrices through, e.g.:

    • Converting from CNF to DNF (expensive)

    • Hill-climbing search using clauses (non-optimal)

    • Other search methods - GAs etc (non-optimal)

  • SAT approach

    • Solve directly in SAT formulation

    • DPLL approach ensures optimal reducts
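The specification behind these approaches: a reduct is a hitting set of the discernibility clauses, and a minimal reduct is a smallest one. A smallest-first brute-force sketch of that specification (a DPLL solver prunes this same search space rather than enumerating it):

```python
from itertools import combinations

def minimal_reduct(clauses, attributes):
    """clauses: list of sets of attributes, one per discernibility-matrix
    entry. Returns a smallest attribute subset intersecting every clause;
    smallest-first enumeration guarantees minimal cardinality."""
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            chosen = set(subset)
            if all(clause & chosen for clause in clauses):
                return chosen
    return set(attributes)
```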


Fuzzy discernibility matrices

  • Extension of crisp approach

    • Previously, attributes had {0,1} membership to clauses

    • Now have membership in [0,1]

  • Fuzzy DMs can be used to find fuzzy-rough reducts


Formulation

  • Fuzzy satisfiability

  • In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true

  • For the fuzzy case, clauses may be satisfied to a certain degree depending on which variables have been assigned the value true
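A minimal sketch of that satisfaction degree (assuming max as the s-norm over a clause; `clause` maps each variable to its membership degree in the clause, `assignment` maps variables to True/False):

```python
def clause_satisfaction(clause, assignment):
    """Degree to which a fuzzy clause is satisfied: the highest membership
    among the variables currently assigned true (s-norm = max)."""
    return max((m for v, m in clause.items() if assignment.get(v)),
               default=0.0)
```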


Example


DPLL algorithm


Experimentation: results


FRFS: issues

  • Problem – noise tolerance!


Vaguely quantified rough sets

Pawlak rough set:

  • y belongs to the lower approximation of A iff all elements of Ry belong to A

  • y belongs to the upper approximation of A iff at least one element of Ry belongs to A

VQRS:

  • y belongs to the lower approximation of A iff most elements of Ry belong to A

  • y belongs to the upper approximation of A iff at least some elements of Ry belong to A
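A sketch of the VQRS lower approximation using a smooth S-shaped quantifier for "most" (the (a, b) parameters are illustrative; `R` is the fuzzy relation and `X` the concept membership vector, as in the earlier sketches):

```python
import numpy as np

def quantifier(p, a, b):
    """S-shaped fuzzy quantifier Q_(a,b): 0 below a, 1 above b,
    smooth in between."""
    p = np.clip((p - a) / (b - a), 0.0, 1.0)
    return np.where(p < 0.5, 2.0 * p**2, 1.0 - 2.0 * (1.0 - p)**2)

def vqrs_lower(R, X, a=0.2, b=1.0):
    """mu(y) = Q_most(|Ry intersect X| / |Ry|): y belongs to the lower
    approximation to the degree that 'most' of its neighbourhood is in X."""
    proportion = np.minimum(R, X[None, :]).sum(axis=1) / R.sum(axis=1)
    return quantifier(proportion, a, b)
```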


VQRS-based feature selection

  • Use the quantified lower approximation, positive region and dependency degree

    • Evaluation: the quantified dependency (can be crisp or fuzzy)

    • Generation: greedy hill-climbing

    • Stopping criterion: when the quantified positive region is maximal (or to degree α)

  • Should be more noise-tolerant, but is non-monotonic


Progress

  • Qualitative data: rough set theory

  • Quantitative data: fuzzy-rough set theory

  • Noisy data: VQRS, fuzzy VPRS

  • Noisy data, monotonic: OWA-FRFS


More issues...

  • Problem #1: how to choose fuzzy similarity?

  • Problem #2: how to handle missing values?


Interval-valued FRFS

  • Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity

[Formulas shown: interval-valued fuzzy-rough set, interval-valued fuzzy similarity]


Interval-valued FRFS

  • When comparing two object values for a given attribute – what to do if at least one is missing?

  • Answer #2: Model missing values via the unit interval
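A minimal sketch of the idea: the similarity for an attribute becomes an interval, degenerate when both values are present and maximally wide when one is missing (`None` standing in for a missing value):

```python
def iv_similarity(x, y, attr_range):
    """Interval-valued similarity for one attribute: a known similarity is a
    degenerate interval [s, s]; a missing value yields [0, 1], i.e. total
    uncertainty about how similar the two objects are."""
    if x is None or y is None:
        return (0.0, 1.0)
    s = 1.0 - abs(x - y) / attr_range
    return (s, s)
```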


Other measures

  • Boundary region

  • Discernibility function


Initial experimentation

[Flowchart: original dataset → data corruption → cross-validation folds → reduction by Type-1 FRFS and by the IV-FRFS methods → reduced folds → JRip classification]


Initial experimentation


Initial results: lower approximation


Instance Selection


Instance selection: basic ideas

  • Not needed: remove objects whose removal leaves the underlying approximations unchanged


Instance selection: basic ideas

  • Noisy objects: remove objects whose positive region membership is < 1
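A sketch in the spirit of FRIS-I, described on the following slides (reusing `positive_region` from the feature selection sketches; `tau` is the membership threshold, 1 by default):

```python
import numpy as np

def fris_select(R, decision_classes, tau=1.0):
    """Keep the indices of instances whose fuzzy positive region membership
    reaches tau; instances below it are treated as noisy and removed."""
    pos = positive_region(R, decision_classes)
    return np.where(pos >= tau)[0]
```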


FRIS-I


FRIS-II


FRIS-III


Fuzzy rough instance selection

  • Time complexity is a problem for FRIS-II and FRIS-III

  • Less complex: Fuzzy rough prototype selection

    • More on this later...


Fuzzy-rough classification and prediction


FRNN/VQNN


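A sketch in the spirit of FRNN (a simplification that uses all training objects rather than only the K nearest neighbours; `similarities` holds the test object's fuzzy similarity to each training object, `class_memberships` maps each class to a 0/1 vector over the training objects):

```python
import numpy as np

def frnn_classify(similarities, class_memberships):
    """Assign the class for which the test object's fuzzy lower/upper
    approximation memberships (averaged) are highest, using Lukasiewicz
    connectives as in the earlier sketches."""
    best_class, best_score = None, -1.0
    for c, X in class_memberships.items():
        lower = np.minimum(1.0, 1.0 - similarities + X).min()
        upper = np.maximum(0.0, similarities + X - 1.0).max()
        score = (lower + upper) / 2.0
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```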


Further developments

  • FRNN and VQNN have limitations (for classification problems)

    • FRNN only uses one neighbour

    • VQNN is equivalent to FNN if the same similarity relation is used

  • POSNN uses the positive region to also consider the quality of neighbours

    • E.g. instances in overlapping class regions are less interesting

    • More on this later...


Discovering rules via RST

  • Equivalence classes

    • Form the antecedent part of a rule

    • The lower approximation tells us if this is predictive of a given concept (certain rules)

  • Typically done in one of two ways:

    • Overlaying reducts

    • Building rules by considering individual equivalence classes (e.g. LEM2)


QuickRules framework

  • The fuzzy tolerance classes used during this process can be used to create fuzzy rules

  • When a reduct is found, the resulting rules cover all instances

[Flowchart: feature set → subset evaluation and generation, with rule induction at each step → subset suitability → stopping criterion → continue or stop → validation]


Harmony search approach

  • R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction. Proceedings of the 21st International Conference on Fuzzy Systems, 2012.


Harmony search approach

[Diagram: musicians improvise notes; candidate harmonies are scored by a fitness function and stored in the harmony memory]

Example objective: minimise (a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3
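A plain harmony search sketch applied to the slide's toy objective (parameter values are illustrative):

```python
import random

def harmony_search(f, n_vars, iters=2000, hms=10, hmcr=0.9, par=0.3,
                   low=-10.0, high=10.0, bw=0.5):
    """Each note of a new harmony is improvised either from the harmony
    memory (rate hmcr, optionally pitch-adjusted at rate par) or at random;
    the new harmony replaces the worst in memory if its fitness is better."""
    memory = [[random.uniform(low, high) for _ in range(n_vars)]
              for _ in range(hms)]
    for _ in range(iters):
        new = []
        for i in range(n_vars):
            if random.random() < hmcr:
                note = random.choice(memory)[i]
                if random.random() < par:
                    note += random.uniform(-bw, bw)   # pitch adjustment
            else:
                note = random.uniform(low, high)
            new.append(note)
        worst = max(memory, key=f)
        if f(new) < f(worst):
            memory[memory.index(worst)] = new
    return min(memory, key=f)

# the slide's toy objective: optimum value 3 at a=2, b=3, c=1
best = harmony_search(
    lambda v: (v[0]-2)**2 + (v[1]-3)**4 + (v[2]-1)**2 + 3, 3)
```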


Key notion mapping

Harmony Search | Hybrid Rule Induction | Numerical Optimisation

Musician | Fuzzy rule r_x | Variable

Note | Feature subset | Value

Harmony | Rule set | Solution

Fitness | Combined evaluation | Evaluation


Comparison vs QuickRules

HarmonyRules: 56.33 ± 10.00

QuickRules: 63.1 ± 11.89

Rule cardinality distribution for the 'web' dataset (2556 features)


Fuzzy-rough semi-supervised learning


Semi-supervised learning (SSL)

  • Lies somewhere between supervised and unsupervised learning

  • Why use it?

    • Data is expensive to label/classify

    • Labels can also be difficult to obtain

    • Large amounts of unlabelled data available

  • When is SSL useful?

    • Small number of labelled objects but large number of unlabelled objects


Semi-supervised learning

  • A number of methods exist for SSL – self-learning, generative models, etc.

    • Labelled data objects – usually small in number

    • Unlabelled data objects – usually large in number

    • A set of features describe the objects

    • The class label tells us only which class the labelled objects belong to

  • SSL therefore attempts to learn labels (or structure) for data which has no labels

    • Labelled data provides ‘clues’ for the unlabelled data


Co-training

[Diagram: the labelled dataset is split into feature subsets 1 and 2; learner 1 and learner 2 are trained on each subset and exchange predictions over the unlabelled data]


Self-learning

[Diagram: a learner is trained on the labelled dataset, predicts labels for the unlabelled data, and the newly labelled objects are added back to the labelled dataset]


Fuzzy-rough self learning (FRSL)

  • Basic idea is to propagate labels using the upper and lower approximations

    • Label only those objects which belong to the lower approximation of a class to a high degree

    • Can use upper approximation to decide on ties

  • Attempts to minimise mis-labelling and subsequent reinforcement

  • Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465-2471, 2011.
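A sketch of the propagation loop (assuming a hypothetical helper `lower_memberships(labelled, labels, obj)` that returns, for an unlabelled object, its best class and the corresponding fuzzy-rough lower approximation membership):

```python
def frsl(labelled, labels, unlabelled, threshold=1.0):
    """Self-learning: repeatedly move across the unlabelled objects whose
    lower approximation membership for some class reaches the threshold;
    stop when no object can be labelled with enough confidence."""
    while unlabelled:
        confident = []
        for obj in unlabelled:
            cls, degree = lower_memberships(labelled, labels, obj)  # hypothetical helper
            if degree >= threshold:
                confident.append((obj, cls))
        if not confident:
            break   # nothing can be labelled safely; avoid mis-labelling
        for obj, cls in confident:
            labelled.append(obj)
            labels.append(cls)
            unlabelled.remove(obj)
    return labelled, labels
```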


FRSL

[Flowchart: a fuzzy-rough learner trained on the labelled dataset makes predictions for the unlabelled data; an object is added to the labelled dataset only if its lower approximation membership = 1, otherwise it remains unlabelled]


Experimentation (Problem 1)

[Result plots: SS-FCM, FNN, FRSL]


Experimentation (Problem 2)

[Result plots: SS-FCM, FNN, FRSL]


Conclusion

  • Looked at fuzzy-rough methods for data mining

    • Feature selection, finding optimal reducts

    • Handling missing values and other problems

    • Classification/prediction

    • Instance selection

    • Semi-supervised learning

  • Future work

    • Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting


FR methods in Weka

  • Weka implementations of all fuzzy-rough methods can be downloaded from: http://users.aber.ac.uk/rkj/book/wekafull.jar

  • KEEL version available soon (hopefully!)

