
Fuzzy-rough data mining


Richard Jensen

Advanced Reasoning Group

University of Aberystwyth

rkj@aber.ac.uk

http://users.aber.ac.uk/rkj

- Knowledge discovery process
- Fuzzy-rough methods
- Feature selection and extensions
- Instance selection
- Classification/prediction
- Semi-supervised learning

- The process
- The problem of too much data
- Requires storage
- Intractable for data mining algorithms
- Noisy or irrelevant data is misleading/confounding

Feature Selection

- Why dimensionality reduction/feature selection?
- Growth of information - need to manage this effectively
- Curse of dimensionality - a problem for machine learning and data mining
- Data visualisation - graphing data

[Figure: high-dimensional data is intractable for the processing system; dimensionality reduction produces low-dimensional data that the processing system can handle]

- Case 1: We’re interested in features
- We want to know which are relevant
- If we fit a model, it should be interpretable

- Case 2: We’re interested in prediction
- Features are not interesting in themselves
- We just want to build a good classifier (or other kind of predictor)

- Feature selection (FS) preserves data semantics by selecting rather than transforming
- Subset generation: forwards, backwards, random…
- Evaluation function: determines ‘goodness’ of subsets
- Stopping criterion: decide when to stop subset search

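[Figure: the feature selection process: subset generation draws a candidate subset from the feature set, subset evaluation measures its suitability, the stopping criterion decides whether to continue or stop, and the selected subset is then validated]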

Fuzzy-rough feature selection

- Problems:
- Rough set methods (usually) require data discretization beforehand
- Extensions, e.g. tolerance rough sets, require thresholds
- Also no flexibility in approximations
- E.g. objects either belong fully to the lower (or upper) approximation, or not at all

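[Figure: from rough sets to fuzzy-rough sets: the lower approximation is generalised using an implicator and the upper approximation using a t-norm]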

- Based on fuzzy similarity
- Lower/upper approximations

(e.g.)

- Fuzzy positive region #1
- Fuzzy positive region #2 (weak)
- Dependency function

- Fuzzy-rough QuickReduct
- Evaluation: use the dependency function (or other fuzzy-rough measure)
- Generation: greedy hill-climbing
- Stopping criterion: when the evaluation function reaches its maximal value (or degree α)
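The three components above can be sketched as a greedy hill-climbing loop. This is a hypothetical, minimal version assuming the same kind of fuzzy-rough dependency as a plain Python function (similarity 1 - |a(x) - a(y)| / range(a), min t-norm, Łukasiewicz implicator); it interprets "degree α" as a fraction of the dependency of the full attribute set, which is an assumption rather than something the slide fixes.

```python
def fuzzy_dependency(data, subset, labels):
    """Fuzzy-rough dependency degree (assumed formulation: per-attribute
    similarity 1 - |a(x) - a(y)| / range(a), min t-norm across attributes,
    Lukasiewicz implicator, crisp decision classes)."""
    n = len(data)
    ranges = {a: (max(r[a] for r in data) - min(r[a] for r in data)) or 1.0
              for a in subset}

    def sim(i, j):
        return min((1.0 - abs(data[i][a] - data[j][a]) / ranges[a] for a in subset),
                   default=1.0)

    total = 0.0
    for i in range(n):
        # positive region membership of x_i: lower approximation of its own class
        total += min(min(1.0, 1.0 - sim(i, j) + (1.0 if labels[j] == labels[i] else 0.0))
                     for j in range(n))
    return total / n


def quickreduct(data, labels, alpha=1.0):
    """Greedy hill-climbing: repeatedly add the attribute that most increases
    the dependency until it reaches alpha times the full-set dependency
    (alpha as a fraction of the full-set value is an assumption)."""
    attributes = list(range(len(data[0])))
    target = alpha * fuzzy_dependency(data, attributes, labels)
    reduct, best = [], fuzzy_dependency(data, [], labels)
    while best < target:
        choice, choice_score = None, best
        for a in attributes:
            if a in reduct:
                continue
            score = fuzzy_dependency(data, reduct + [a], labels)
            if score > choice_score:
                choice, choice_score = a, score
        if choice is None:  # no single attribute improves the dependency
            break
        reduct.append(choice)
        best = choice_score
    return reduct, best


data = [[0.0, 1.0], [0.1, 0.0], [1.0, 1.0], [0.9, 0.0]]
labels = ["a", "a", "b", "b"]
print(quickreduct(data, labels))             # attributes [0, 1], dependency about 0.9
print(quickreduct(data, labels, alpha=0.9))  # attributes [0], dependency about 0.85
```

Being greedy, the loop is cheap but offers no guarantee that the returned subset has minimal cardinality.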

- Other search methods
- GAs, PSO, EDAs, Harmony Search, etc
- Backward elimination, plus-L minus-R, floating search, SAT, etc

- Other subset evaluations
- Fuzzy boundary region
- Fuzzy entropy
- Fuzzy discernibility function

[Figure: rough set approximations of a set X: the lower approximation, the upper approximation, and the equivalence classes [x]B that make them up]

- The fuzzy lower and upper approximations define the fuzzy boundary region
- For each concept, minimise the boundary region
- (also applicable to crisp RSFS)

- Results seem to show this is a more informed heuristic (but more computationally complex)
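As an illustration of the boundary-region idea, the sketch below computes, for each concept, the boundary membership upper(x) - lower(x) of every object and averages it over all objects and concepts; lower values mean a better subset. Averaging in this way, and the similarity/implicator choices (1 - |a(x) - a(y)| / range(a), min t-norm, Łukasiewicz implicator), are assumptions for the example, not the exact measure from the slides.

```python
def boundary_measure(data, subset, labels):
    """Average fuzzy boundary region size over all concepts (decision classes).
    Lower values mean the subset approximates the concepts more precisely."""
    n = len(data)
    ranges = {a: (max(r[a] for r in data) - min(r[a] for r in data)) or 1.0
              for a in subset}

    def sim(i, j):
        return min((1.0 - abs(data[i][a] - data[j][a]) / ranges[a] for a in subset),
                   default=1.0)

    classes = sorted(set(labels))
    total = 0.0
    for c in classes:
        concept = [1.0 if label == c else 0.0 for label in labels]
        for i in range(n):
            low = min(min(1.0, 1.0 - sim(i, j) + concept[j]) for j in range(n))
            up = max(min(sim(i, j), concept[j]) for j in range(n))
            total += up - low  # boundary membership = upper - lower
    return total / (n * len(classes))


data = [[0.0, 0.3], [0.0, 0.7], [1.0, 0.3], [1.0, 0.7]]
labels = ["a", "a", "b", "b"]
print(boundary_measure(data, [0], labels))  # 0.0: attribute 0 leaves no boundary
print(boundary_measure(data, [1], labels))  # 1.0: attribute 1 leaves everything in the boundary
```

Unlike the dependency degree, this uses both approximations for every concept, which is where the extra computational cost comes from.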

- Usually too expensive to search exhaustively for reducts with minimal cardinality
- Reducts found via discernibility matrices through, e.g.:
- Converting from CNF to DNF (expensive)
- Hill-climbing search using clauses (non-optimal)
- Other search methods - GAs etc (non-optimal)

- SAT approach
- Solve directly in SAT formulation
- DPLL approach ensures optimal reducts
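To make the crisp SAT formulation concrete, the sketch below builds one clause per pair of objects with different decisions (the attributes on which they differ) and then finds a smallest attribute set that satisfies every clause. It is a small exhaustive branch-and-bound search in the spirit of DPLL (it branches on the attributes of the shortest unsatisfied clause and prunes branches that cannot beat the best reduct found so far), not a full DPLL solver; the function names are hypothetical.

```python
def discernibility_clauses(data, labels):
    """One clause per pair of objects with different decisions: the set of
    attributes on which they differ. Supersets are removed (absorption)."""
    n, m = len(data), len(data[0])
    clauses = set()
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                diff = frozenset(a for a in range(m) if data[i][a] != data[j][a])
                if diff:  # an empty clause means the pair cannot be discerned at all
                    clauses.add(diff)
    return [c for c in clauses if not any(other < c for other in clauses)]


def minimal_reduct(clauses):
    """Smallest attribute set containing at least one attribute of every clause,
    found by exhaustive branch-and-bound (so the result is optimal)."""
    best = None

    def search(chosen, remaining):
        nonlocal best
        if best is not None and len(chosen) >= len(best):
            return  # bound: cannot improve on the best reduct found so far
        if not remaining:
            best = set(chosen)
            return
        clause = min(remaining, key=len)  # shortest clause first; unit clauses force a choice
        for a in sorted(clause):
            search(chosen | {a}, [c for c in remaining if a not in c])

    search(frozenset(), list(clauses))
    return best


# XOR-style data: both attributes are needed to discern the classes
print(minimal_reduct(discernibility_clauses([[0, 0], [0, 1], [1, 0], [1, 1]], [0, 1, 1, 0])))
```

Exhaustive search is what makes the result optimal, and also why it becomes expensive as the number of attributes grows.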

- Extension of crisp approach
- Previously, attributes had {0,1} membership to clauses
- Now have membership in [0,1]

- Fuzzy DMs can be used to find fuzzy-rough reducts

- Fuzzy satisfiability
- In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
- For the fuzzy case, clauses may be satisfied to a certain degree depending on which variables have been assigned the value true
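Fuzzy satisfiability can be sketched by letting each clause map attributes to membership degrees. In this hypothetical version, an attribute discerns two objects to degree 1 - R_a(x, y), a clause is satisfied to the maximum membership among the attributes set to true (the max s-norm), and the whole function is aggregated with min. These are illustrative assumptions; other s-norms and aggregations are possible.

```python
def fuzzy_clause(x, y, ranges):
    """Fuzzy discernibility clause for objects x and y (assumption: attribute a
    discerns them to degree 1 - R_a(x, y) = |a(x) - a(y)| / range(a))."""
    return {a: abs(x[a] - y[a]) / r for a, r in enumerate(ranges)}


def clause_satisfaction(clause, chosen):
    """Degree to which `clause` is satisfied when the attributes in `chosen` are
    set to true: the max s-norm over their memberships. With {0, 1} memberships
    this is 1 iff some chosen attribute appears in the clause, as in crisp SAT."""
    return max((clause.get(a, 0.0) for a in chosen), default=0.0)


def function_satisfaction(clauses, chosen):
    """Assumed conjunctive aggregation over all clauses using the min t-norm."""
    return min((clause_satisfaction(c, chosen) for c in clauses), default=1.0)


clause = fuzzy_clause([0.0, 0.5], [0.9, 0.6], [1.0, 1.0])  # about {0: 0.9, 1: 0.1}
print(clause_satisfaction(clause, {1}))     # about 0.1: weakly satisfied
print(clause_satisfaction(clause, {0, 1}))  # about 0.9
```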

- Problem – noise tolerance!

y belongs to the lower approximation of A iff all elements of Ry belong to A

y belongs to the upper approximation of A iff at least one element of Ry belongs to A

Pawlak rough set

y belongs to the lower approximation of A iff most elements of Ry belong to A

y belongs to the upper approximation of A iff at least some elements of Ry belong to A

VQRS
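The "most" and "some" readings can be sketched with fuzzy quantifiers applied to the degree of inclusion |Ry ∩ A| / |Ry|. The smooth quantifier shape below is a standard one; the parameters (0.2, 1) for "most" and (0.1, 0.6) for "some", min as intersection and the sigma-count cardinality are assumptions for the example. `sim_row` holds the similarities R(x, y) of all objects x to y.

```python
def quantifier(alpha, beta):
    """Smooth fuzzy quantifier Q_(alpha, beta) on [0, 1]."""
    def q(x):
        if x <= alpha:
            return 0.0
        if x >= beta:
            return 1.0
        if x <= (alpha + beta) / 2:
            return 2 * (x - alpha) ** 2 / (beta - alpha) ** 2
        return 1 - 2 * (x - beta) ** 2 / (beta - alpha) ** 2
    return q


Q_MOST = quantifier(0.2, 1.0)  # assumed parameters for "most"
Q_SOME = quantifier(0.1, 0.6)  # assumed parameters for "at least some"


def inclusion(sim_row, concept):
    """|Ry ∩ A| / |Ry| with min as intersection and sigma-count cardinality."""
    return sum(min(r, c) for r, c in zip(sim_row, concept)) / sum(sim_row)


def vqrs_lower(sim_row, concept):
    return Q_MOST(inclusion(sim_row, concept))


def vqrs_upper(sim_row, concept):
    return Q_SOME(inclusion(sim_row, concept))


# y has nine similar neighbours in A and one very similar (possibly noisy) neighbour outside A
sim_row = [1.0] + [0.9] * 8 + [0.95]
concept = [1.0] * 9 + [0.0]
print(vqrs_lower(sim_row, concept))  # about 0.97
print(vqrs_upper(sim_row, concept))  # 1.0
```

For comparison, a lower approximation built on the Łukasiewicz implicator would give y only min(1, 1 - 0.95 + 0) = 0.05 here, because a single neighbour outside A dominates the infimum.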

- Use the quantified lower approximation, positive region and dependency degree
- Evaluation: the quantified dependency (can be crisp or fuzzy)
- Generation: greedy hill-climbing
- Stopping criterion: when the quantified positive region is maximal (or to degree α)

- Should be more noise-tolerant, but is non-monotonic

[Figure: overview of approaches: rough set theory for qualitative data; fuzzy-rough set theory for quantitative data; VQRS and fuzzy VPRS for noisy data; OWA-FRFS as a monotonic alternative]

- Problem #1: how to choose fuzzy similarity?
- Problem #2: how to handle missing values?

IV fuzzy rough set

- Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity

IV fuzzy similarity

- When comparing two object values for a given attribute – what to do if at least one is missing?
- Answer #2: Model missing values via the unit interval
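A hypothetical sketch of the missing-value idea: similarity becomes an interval, and a comparison involving a missing value yields the whole unit interval [0, 1]. Because the Łukasiewicz implicator (an assumption here) is decreasing in its first argument, the lower bound of the interval-valued lower approximation comes from the upper similarities and vice versa. `None` marks a missing value; the function names are illustrative.

```python
def iv_similarity(u, v, value_range):
    """Interval [lower, upper] similarity of two attribute values; None marks missing."""
    if u is None or v is None:
        return (0.0, 1.0)  # missing: the similarity could be anything in [0, 1]
    s = 1.0 - abs(u - v) / value_range
    return (s, s)


def iv_lower(sims, concept):
    """Interval-valued lower approximation with I(x, y) = min(1, 1 - x + y).
    I decreases in its first argument, so upper similarities give the lower bound."""
    lo = min(min(1.0, 1.0 - s_hi + c) for (s_lo, s_hi), c in zip(sims, concept))
    hi = min(min(1.0, 1.0 - s_lo + c) for (s_lo, s_hi), c in zip(sims, concept))
    return (lo, hi)


# the second object has a missing value, so membership is completely uncertain
print(iv_lower([(1.0, 1.0), iv_similarity(None, 3.0, 10.0)], [1.0, 0.0]))  # (0.0, 1.0)
```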

- Boundary region
- Discernibility function

[Figure: experimental set-up: original dataset, cross-validation folds, data corruption, type-1 FRFS and IV-FRFS methods producing reduced folds, each evaluated with JRip]

Instance Selection

- Objects that are not needed: remove objects while keeping the underlying approximations unchanged
- Noisy objects: remove objects whose positive region membership is < 1
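The noisy-object criterion can be sketched in a few lines: compute each object's positive region membership (its lower approximation membership in its own class) and keep only objects reaching the threshold tau, by default 1. The similarity 1 - |a(x) - a(y)| / range(a) with the min t-norm and the Łukasiewicz implicator are assumptions for the example, and `select_instances` is a hypothetical name.

```python
def select_instances(data, labels, tau=1.0):
    """Indices of the objects whose positive region membership is >= tau."""
    n, m = len(data), len(data[0])
    ranges = [(max(r[a] for r in data) - min(r[a] for r in data)) or 1.0 for a in range(m)]

    def sim(i, j):
        return min(1.0 - abs(data[i][a] - data[j][a]) / ranges[a] for a in range(m))

    keep = []
    for i in range(n):
        pos = min(min(1.0, 1.0 - sim(i, j) + (1.0 if labels[j] == labels[i] else 0.0))
                  for j in range(n))
        if pos >= tau:
            keep.append(i)
    return keep


data = [[0.0], [0.0], [1.0], [1.0], [0.0]]
labels = ["a", "a", "b", "b", "b"]  # object 4 is a mislabelled copy of class "a"
print(select_instances(data, labels))  # [2, 3]
```

With the strict threshold of 1, the clean objects that overlap the noisy one also lose full positive region membership, so they are removed along with it.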

- Time complexity is a problem for FRIS-II and FRIS-III
- Less complex: Fuzzy rough prototype selection
- More on this later...

Fuzzy-rough classification and prediction

- FRNN and VQNN have limitations (for classification problems)
- FRNN only uses one neighbour
- VQNN is equivalent to FNN if the same similarity relation is used
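A minimal sketch of the FRNN idea: for a test object y and each class c, compute the lower and upper approximation memberships of y using the training objects, and predict the class with the highest average of the two. This version uses all training objects rather than only the k nearest, and assumes a clipped similarity 1 - |a(x) - a(y)| / range(a), the min t-norm and the Łukasiewicz implicator; `frnn_predict` is a hypothetical name. With crisp classes, the lower approximation for c depends only on the most similar training object outside c and the upper approximation only on the most similar object inside c, which illustrates the single-neighbour limitation noted above.

```python
def frnn_predict(train, labels, y):
    """Class maximising (lower + upper) / 2 for test object y."""
    m = len(y)
    ranges = [(max(r[a] for r in train) - min(r[a] for r in train)) or 1.0 for a in range(m)]

    def sim(x):
        # clipped at 0 because y may lie outside the training range
        return min(max(0.0, 1.0 - abs(x[a] - y[a]) / ranges[a]) for a in range(m))

    sims = [sim(x) for x in train]
    best_class, best_score = None, -1.0
    for c in sorted(set(labels)):
        member = [1.0 if label == c else 0.0 for label in labels]
        lower = min(min(1.0, 1.0 - s + mu) for s, mu in zip(sims, member))
        upper = max(min(s, mu) for s, mu in zip(sims, member))
        score = (lower + upper) / 2
        if score > best_score:
            best_class, best_score = c, score
    return best_class


train = [[0.0], [0.1], [0.9], [1.0]]
labels = ["a", "a", "b", "b"]
print(frnn_predict(train, labels, [0.05]))  # a
print(frnn_predict(train, labels, [0.95]))  # b
```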

- POSNN uses the positive region to also consider the quality of neighbours
- E.g. instances in overlapping class regions are less interesting
- More on this later...

- Equivalence classes
- Form the antecedent part of a rule
- The lower approximation tells us if this is predictive of a given concept (certain rules)

- Typically done in one of two ways:
- Overlaying reducts
- Building rules by considering individual equivalence classes (e.g. LEM2)

- The fuzzy tolerance classes used during this process can be used to create fuzzy rules
- When a reduct is found the resulting rules cover all instances

[Figure: hybrid feature selection and rule induction: subset generation from the feature set, combined subset evaluation and rule induction, stopping criterion (continue/stop), then validation]

- R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction, Proceedings of the 21st International Conference on Fuzzy Systems, 2012.

[Figure: harmony search concepts: musicians, notes, harmony, fitness and harmony memory]

Example numerical optimisation problem: minimise $(a-2)^2 + (b-3)^4 + (c-1)^2 + 3$
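The example objective can be minimised with a basic harmony search, sketched below: each variable plays the role of a musician, a value is a note, a full assignment is a harmony, and the harmony memory holds the best harmonies found so far. This is the generic algorithm, not the HarmonyRules method, and the parameter values (memory size, HMCR, PAR, bandwidth, bounds, iterations) are illustrative assumptions.

```python
import random


def objective(x):
    a, b, c = x
    return (a - 2) ** 2 + (b - 3) ** 4 + (c - 1) ** 2 + 3


def harmony_search(f, dims, low, high, hms=10, hmcr=0.9, par=0.3, bw=0.1,
                   iters=10000, seed=0):
    """Minimise f over [low, high]^dims with a basic harmony search."""
    rng = random.Random(seed)
    memory = [[rng.uniform(low, high) for _ in range(dims)] for _ in range(hms)]
    scores = [f(h) for h in memory]
    for _ in range(iters):
        new = []
        for d in range(dims):
            if rng.random() < hmcr:
                note = memory[rng.randrange(hms)][d]  # memory consideration
                if rng.random() < par:
                    note += rng.uniform(-bw, bw)      # pitch adjustment
            else:
                note = rng.uniform(low, high)         # random selection
            new.append(min(high, max(low, note)))
        worst = max(range(hms), key=scores.__getitem__)
        score = f(new)
        if score < scores[worst]:  # replace the worst harmony in memory
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


best, value = harmony_search(objective, dims=3, low=-10.0, high=10.0)
print(best, value)  # close to [2, 3, 1] and 3
```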

| Harmony Search | Hybrid Rule Induction | Numerical Optimisation |
|---|---|---|
| Musician | Fuzzy rule $r_x$ | Variable |
| Note | Feature subset | Value |
| Harmony | Rule set | Solution |
| Fitness | Combined evaluation | Evaluation |

- HarmonyRules: 56.33 ± 10.00
- QuickRules: 63.1 ± 11.89

[Figure: rule cardinality distribution for the web dataset (2556 features)]

Fuzzy-rough semi-supervised learning

- Lies somewhere between supervised and unsupervised learning
- Why use it?
- Data is expensive to label/classify
- Labels can also be difficult to obtain
- Large amounts of unlabelled data available

- When is SSL useful?
- Small number of labelled objects but large number of unlabelled objects

- A number of methods for SSL – self-learning, generative models etc.
- Labelled data objects – usually small in number
- Unlabelled data objects – usually large in number
- A set of features describe the objects
- Class labels are available only for the labelled objects

- SSL therefore attempts to learn labels (or structure) for data which has no labels
- Labelled data provides ‘clues’ for the unlabelled data

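[Figure: two learners (Learner 1 and Learner 2), trained on subset 1 and subset 2 of the labelled dataset, make predictions on the unlabelled data]

[Figure: a single learner trained on the labelled dataset makes predictions on the unlabelled data, which are added to the labelled data objects]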

- Basic idea is to propagate labels using the upper and lower approximations
- Label only those objects which belong to the lower approximation of a class to a high degree
- Can use upper approximation to decide on ties

- Attempts to minimise mis-labelling and subsequent reinforcement
- Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465-2471, 2011.

[Figure: fuzzy-rough self-learning: a fuzzy-rough learner trained on the labelled dataset makes predictions on the unlabelled data; objects whose lower approximation membership = 1 are added to the labelled data objects]
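The labelling loop can be sketched as follows: in each round, every still-unlabelled object receives a class only if its lower approximation membership for that class reaches the threshold (1 by default), ties are broken by the upper approximation, newly labelled objects join the training data, and the loop stops when a round labels nothing. The similarity and connectives (1 - |a(x) - a(y)| / range(a), min t-norm, Łukasiewicz implicator), the round-based update and leaving remaining objects as `None` are assumptions of this hypothetical sketch.

```python
def frssl(labelled, labels, unlabelled, threshold=1.0):
    """Return a label (or None) for each unlabelled object."""
    everything = labelled + unlabelled
    m = len(everything[0])
    ranges = [(max(r[a] for r in everything) - min(r[a] for r in everything)) or 1.0
              for a in range(m)]

    def sim(x, y):
        return min(1.0 - abs(x[a] - y[a]) / ranges[a] for a in range(m))

    train, train_labels = list(labelled), list(labels)
    result = [None] * len(unlabelled)
    classes = sorted(set(labels))
    changed = True
    while changed:
        changed = False
        newly = []
        for k, y in enumerate(unlabelled):
            if result[k] is not None:
                continue
            sims = [sim(x, y) for x in train]
            scores = []
            for c in classes:
                member = [1.0 if label == c else 0.0 for label in train_labels]
                low = min(min(1.0, 1.0 - s + mu) for s, mu in zip(sims, member))
                up = max(min(s, mu) for s, mu in zip(sims, member))
                scores.append((low, up, c))
            confident = [t for t in scores if t[0] >= threshold]
            if confident:
                # ties on the lower approximation are broken by the upper approximation
                newly.append((k, max(confident, key=lambda t: t[1])[2]))
        for k, c in newly:
            result[k] = c
            train.append(unlabelled[k])
            train_labels.append(c)
            changed = True
    return result


print(frssl([[0.0], [1.0]], ["a", "b"], [[0.0], [1.0], [0.5]]))  # ['a', 'b', None]
```

Only labelling objects that are certain to this degree is what limits mis-labelling and its subsequent reinforcement.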

- Looked at fuzzy-rough methods for data mining
- Feature selection, finding optimal reducts
- Handling missing values and other problems
- Classification/prediction
- Instance selection
- Semi-supervised learning

- Future work
- Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting

- Weka implementations of all fuzzy-rough methods can be downloaded from:
- http://users.aber.ac.uk/rkj/book/wekafull.jar
- KEEL version available soon (hopefully!)