### Fuzzy-rough data mining

Richard Jensen

Advanced Reasoning Group

University of Aberystwyth

rkj@aber.ac.uk

http://users.aber.ac.uk/rkj

Outline

- Knowledge discovery process
- Fuzzy-rough methods
- Feature selection and extensions
- Instance selection
- Classification/prediction
- Semi-supervised learning

Knowledge discovery

- The process
- The problem of too much data
- Requires storage
- Intractable for data mining algorithms
- Noisy or irrelevant data is misleading/confounding

Feature selection

- Why dimensionality reduction/feature selection?
- Growth of information - need to manage this effectively
- Curse of dimensionality - a problem for machine learning and data mining
- Data visualisation - graphing data

[Figure: high-dimensional data is intractable for the processing system; dimensionality reduction turns it into low-dimensional data that the processing system can handle.]

Why do it?

- Case 1: We’re interested in features
- We want to know which are relevant
- If we fit a model, it should be interpretable
- Case 2: We’re interested in prediction
- Features are not interesting in themselves
- We just want to build a good classifier (or other kind of predictor)

Feature selection process

- Feature selection (FS) preserves data semantics by selecting rather than transforming
- Subset generation: forwards, backwards, random…
- Evaluation function: determines ‘goodness’ of subsets
- Stopping criterion: decide when to stop subset search

[Figure: feature selection loop. Feature set → subset generation → subset evaluation (subset suitability) → stopping criterion; on "continue" return to generation, on "stop" proceed to validation.]

Fuzzy-rough set theory

- Problems:
- Rough set methods (usually) require data discretization beforehand
- Extensions, e.g. tolerance rough sets, require thresholds
- Also no flexibility in approximations
- E.g. objects either belong fully to the lower (or upper) approximation, or not at all

FRFS: evaluation function

- Fuzzy positive region #1
- Fuzzy positive region #2 (weak)
- Dependency function
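The formulas for this slide did not survive in the transcript. As a minimal sketch, assuming the usual fuzzy-rough definitions (a per-feature fuzzy similarity combined with the min t-norm, the Łukasiewicz implicator for the lower approximation, and crisp decision classes), a positive region and dependency degree could be computed roughly as follows. The function names, the choice of similarity and implicator, and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fuzzy_similarity(X, features):
    """Fuzzy similarity relation over the chosen feature columns (min t-norm)."""
    n = X.shape[0]
    sim = np.ones((n, n))
    for a in features:
        col = X[:, a]
        value_range = col.max() - col.min()
        if value_range == 0:
            continue  # constant feature: every pair is fully similar
        per_feature = 1.0 - np.abs(col[:, None] - col[None, :]) / value_range
        sim = np.minimum(sim, per_feature)
    return sim

def dependency(X, y, features):
    """Fuzzy-rough dependency degree of the decision y on a feature subset."""
    sim = fuzzy_similarity(X, features)
    positive = np.zeros(len(y))
    for c in np.unique(y):
        member = (y == c).astype(float)  # crisp membership to class c
        # Lukasiewicz implicator I(a, b) = min(1, 1 - a + b)
        implication = np.minimum(1.0, 1.0 - sim + member[None, :])
        lower = implication.min(axis=1)        # infimum over all objects
        positive = np.maximum(positive, lower)  # supremum over classes
    return positive.sum() / len(y)

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = np.array([[0.0, 0.1], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
print(dependency(X, y, [0]), dependency(X, y, [1]), dependency(X, y, [0, 1]))
```

With the min t-norm, adding a feature can only lower the similarities, so under these assumptions the dependency never decreases as the subset grows.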

FRFS: finding reducts

- Fuzzy-rough QuickReduct
- Evaluation: use the dependency function (or other fuzzy-rough measure)
- Generation: greedy hill-climbing
- Stopping criterion: when maximal evaluation function is reached (or to degree α)
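The greedy search described above can be sketched generically. This is a simplified illustration rather than the author's implementation: `evaluate` stands for any subset measure (for example a fuzzy-rough dependency function), the α handling (stop at `alpha` times the score of the full feature set) is an assumption, and the score table is made up purely to keep the example self-contained.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

def quickreduct(features: Iterable[str],
                evaluate: Callable[[FrozenSet[str]], float],
                alpha: float = 1.0) -> Tuple[FrozenSet[str], float]:
    """Greedy hill-climbing: repeatedly add the feature that most improves
    the evaluation until the target score is reached or nothing improves."""
    remaining = set(features)
    target = alpha * evaluate(frozenset(remaining))  # assumed reading of "degree alpha"
    reduct: FrozenSet[str] = frozenset()
    best = evaluate(reduct)
    while best < target and remaining:
        scored = [(evaluate(reduct | {f}), f) for f in sorted(remaining)]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no single feature improves the measure
        reduct = reduct | {top_feature}
        remaining.discard(top_feature)
        best = top_score
    return reduct, best

# Hypothetical evaluation scores for every subset of {a, b, c}.
scores = {
    frozenset(): 0.0,
    frozenset("a"): 0.4, frozenset("b"): 0.6, frozenset("c"): 0.3,
    frozenset("ab"): 0.7, frozenset("ac"): 0.5, frozenset("bc"): 1.0,
    frozenset("abc"): 1.0,
}
reduct, score = quickreduct("abc", scores.__getitem__)
print(sorted(reduct), score)
```

Because the search never backtracks, the subset it returns is not guaranteed to be of minimal size.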

FRFS

- Other search methods
- GAs, PSO, EDAs, Harmony Search, etc
- Backward elimination, plus-L minus-R, floating search, SAT, etc
- Other subset evaluations
- Fuzzy boundary region
- Fuzzy entropy
- Fuzzy discernibility function

FRFS: boundary region

- Fuzzy lower and upper approximation define fuzzy boundary region
- For each concept, minimise the boundary region
- (also applicable to crisp RSFS)
- Results seem to show this is a more informed heuristic (but more computationally complex)

Finding smallest reducts

- Usually too expensive to search exhaustively for reducts with minimal cardinality
- Reducts found via discernibility matrices through, e.g.:
- Converting from CNF to DNF (expensive)
- Hill-climbing search using clauses (non-optimal)
- Other search methods - GAs etc (non-optimal)
- SAT approach
- Solve directly in SAT formulation
- DPLL approach ensures optimal reducts
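To make the discernibility idea concrete, here is a small sketch for the crisp case. It builds one clause per pair of objects with different decisions and then searches subsets in order of increasing size. Note the swap: this is plain exhaustive search, not the DPLL-based SAT procedure mentioned above, and it is only feasible for very few features. The decision table is hypothetical.

```python
from itertools import combinations

def discernibility_clauses(rows, decisions):
    """One clause per pair of objects with different decisions: the set of
    attribute indices on which the two objects differ."""
    clauses = set()
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            clause = frozenset(a for a in range(len(rows[i]))
                               if rows[i][a] != rows[j][a])
            if clause:  # skip inconsistent pairs that no attribute can separate
                clauses.add(clause)
    return clauses

def smallest_reducts(rows, decisions):
    """All attribute subsets of minimal size that hit every clause."""
    clauses = discernibility_clauses(rows, decisions)
    n_attributes = len(rows[0])
    for size in range(n_attributes + 1):
        hits = [set(subset) for subset in combinations(range(n_attributes), size)
                if all(clause & set(subset) for clause in clauses)]
        if hits:
            return hits
    return []

# Hypothetical decision table: three attributes, binary decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 1, 0, 1]
print(smallest_reducts(rows, decisions))
```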

Fuzzy discernibility matrices

- Extension of crisp approach
- Previously, attributes had {0,1} membership to clauses
- Now have membership in [0,1]
- Fuzzy DMs can be used to find fuzzy-rough reducts

Formulation

- Fuzzy satisfiability
- In crisp SAT, a clause is fully satisfied if at least one variable in the clause has been set to true
- For the fuzzy case, clauses may be satisfied to a certain degree depending on which variables have been assigned the value true
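A minimal sketch of the difference, assuming a fuzzy clause is stored as a mapping from attribute to membership degree and that the degrees of the selected attributes are combined with the max t-conorm. Both the representation and the choice of t-conorm are assumptions made for illustration.

```python
def clause_satisfaction(clause, selected):
    """Degree to which a (fuzzy) clause is satisfied by the selected attributes.

    clause: dict mapping attribute name to its membership in the clause.
    selected: attributes that have been assigned the value true.
    """
    return max((clause.get(a, 0.0) for a in selected), default=0.0)

# Crisp clause: memberships are 0 or 1, so satisfaction is all-or-nothing.
crisp_clause = {"a": 1.0, "b": 0.0, "c": 1.0}
# Fuzzy clause: selecting different attributes satisfies it to different degrees.
fuzzy_clause = {"a": 0.4, "b": 0.8, "c": 0.0}

print(clause_satisfaction(crisp_clause, {"b"}))
print(clause_satisfaction(crisp_clause, {"a"}))
print(clause_satisfaction(fuzzy_clause, {"a"}))
print(clause_satisfaction(fuzzy_clause, {"a", "b"}))
```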

FRFS: issues

- Problem – noise tolerance!

Vaguely quantified rough sets

Pawlak rough set:

- y belongs to the lower approximation of A iff all elements of Ry belong to A
- y belongs to the upper approximation of A iff at least one element of Ry belongs to A

VQRS:

- y belongs to the lower approximation of A iff most elements of Ry belong to A
- y belongs to the upper approximation of A iff at least some elements of Ry belong to A
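The contrast between "all" and "most" can be sketched with smooth fuzzy quantifiers applied to the proportion of Ry that lies in A. The quantifier shape and the parameter pairs (0.2, 1) for "most" and (0.1, 0.6) for "some" are assumptions here (they are common choices in the VQRS literature, but the slides do not state them), and the sets are crisp for simplicity.

```python
def quantifier(alpha, beta):
    """Smooth, non-decreasing fuzzy quantifier rising from 0 at alpha to 1 at beta."""
    def q(x):
        if x <= alpha:
            return 0.0
        if x >= beta:
            return 1.0
        midpoint = (alpha + beta) / 2
        width_sq = (beta - alpha) ** 2
        if x <= midpoint:
            return 2 * (x - alpha) ** 2 / width_sq
        return 1 - 2 * (x - beta) ** 2 / width_sq
    return q

most = quantifier(0.2, 1.0)   # assumed parameters for "most"
some = quantifier(0.1, 0.6)   # assumed parameters for "some"

def vqrs_approximations(related, concept):
    """Lower and upper VQRS membership of y, given Ry (related) and A (concept)."""
    proportion = len(related & concept) / len(related)
    return most(proportion), some(proportion)

# Nine of the ten objects related to y belong to A (one is noise).
related = set(range(10))
concept = set(range(9))
lower, upper = vqrs_approximations(related, concept)
print(lower, upper)
```

In this example the Pawlak lower approximation would exclude y outright, because one related object falls outside A, whereas the quantified version still assigns it a high membership.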

VQRS-based feature selection

- Use the quantified lower approximation, positive region and dependency degree
- Evaluation: the quantified dependency (can be crisp or fuzzy)
- Generation: greedy hill-climbing
- Stopping criterion: when the quantified positive region is maximal (or to degree α)
- Should be more noise-tolerant, but is non-monotonic

Progress

- Qualitative data: rough set theory
- Quantitative data: fuzzy rough set theory
- ...
- Noisy data: VQRS, fuzzy VPRS
- Monotonic: OWA-FRFS

More issues...

- Problem #1: how to choose fuzzy similarity?
- Problem #2: how to handle missing values?

Interval-valued FRFS

IV fuzzy rough set

- Answer #1: Model uncertainty in fuzzy similarity by interval-valued similarity

IV fuzzy similarity

Interval-valued FRFS

- When comparing two object values for a given attribute – what to do if at least one is missing?
- Answer #2: Model missing values via the unit interval
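A minimal sketch of this idea, assuming numeric attributes, `None` as the marker for a missing value, and the min t-norm applied to both interval bounds when combining attributes. For simplicity, known values yield a degenerate interval [s, s] here; the slides' interval-valued similarity for known values is not shown in the transcript, so that part is a placeholder assumption.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]

def iv_similarity(x: Optional[float], y: Optional[float],
                  value_range: float) -> Interval:
    """Interval-valued similarity of two attribute values."""
    if x is None or y is None:
        # A missing value could be anything: similarity lies somewhere in [0, 1].
        return (0.0, 1.0)
    s = max(0.0, 1.0 - abs(x - y) / value_range)
    return (s, s)  # placeholder: no uncertainty modelled for known values

def combine(intervals: Sequence[Interval]) -> Interval:
    """Combine per-attribute intervals with the min t-norm on each bound."""
    return (min(lo for lo, _ in intervals), min(hi for _, hi in intervals))

# Two objects compared on two attributes; the second attribute is missing for one.
per_attribute = [iv_similarity(0.25, 0.5, 1.0), iv_similarity(None, 0.3, 1.0)]
print(combine(per_attribute))
```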

Other measures

- Boundary region
- Discernibility function

Initial experimentation

[Figure: experimental setup, with stages labelled Original Dataset, Cross-validation folds, Data corruption, Type-1 FRFS, IV-FRFS methods, Reduced folds, and JRip.]

Instance selection: basic ideas

- Remove objects that are not needed while keeping the underlying approximations unchanged

Fuzzy rough instance selection

- Time complexity is a problem for FRIS-II and FRIS-III
- Less complex: Fuzzy rough prototype selection
- More on this later...

Further developments

- FRNN and VQNN have limitations (for classification problems)
- FRNN only uses one neighbour
- VQNN equivalent to FNN if the same similarity relation is used
- POSNN uses the positive region to also consider the quality of neighbours
- E.g. instances in overlapping class regions are less interesting
- More on this later...

Discovering rules via RST

- Equivalence classes
- Form the antecedent part of a rule
- The lower approximation tells us if this is predictive of a given concept (certain rules)
- Typically done in one of two ways:
- Overlaying reducts
- Building rules by considering individual equivalence classes (e.g. LEM2)

QuickRules framework

- The fuzzy tolerance classes used during this process can be used to create fuzzy rules
- When a reduct is found the resulting rules cover all instances

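[Figure: the feature selection loop extended with rule induction. Feature set → subset generation → subset evaluation and rule induction (subset suitability) → stopping criterion; on "continue" return to generation, on "stop" proceed to validation.]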

Harmony search approach

- R. Diao and Q. Shen. A harmony search based approach to hybrid fuzzy-rough rule induction, Proceedings of the 21st International Conference on Fuzzy Systems, 2012.

[Figure: harmony search, with elements labelled Musicians, Notes, Harmony, Harmony Memory, and Fitness.]

Example objective: minimise $(a - 2)^2 + (b - 3)^4 + (c - 1)^2 + 3$
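As a rough sketch of how harmony search would attack the example objective: each variable plays the role of a musician, a value is a note, and a full assignment is a harmony held in the harmony memory. The parameter names and values (`hmcr`, `par`, `bandwidth`, memory size, bounds) are illustrative assumptions, and this is a basic textbook-style variant, not the hybrid rule induction method from the cited paper.

```python
import random

def objective(a, b, c):
    """Example objective from the slide; its minimum value is 3 at (2, 3, 1)."""
    return (a - 2) ** 2 + (b - 3) ** 4 + (c - 1) ** 2 + 3

def harmony_search(fitness, n_vars, bounds, memory_size=10, iterations=2000,
                   hmcr=0.9, par=0.3, bandwidth=0.1, seed=0):
    """Minimise `fitness` with a basic harmony search."""
    rng = random.Random(seed)
    low, high = bounds
    # Harmony memory: a pool of candidate solutions (harmonies).
    memory = [[rng.uniform(low, high) for _ in range(n_vars)]
              for _ in range(memory_size)]
    for _ in range(iterations):
        new_harmony = []
        for i in range(n_vars):
            if rng.random() < hmcr:
                # Reuse a note that this musician has played before...
                note = rng.choice(memory)[i]
                if rng.random() < par:
                    # ...and occasionally adjust its pitch slightly.
                    note += rng.uniform(-bandwidth, bandwidth)
                    note = min(high, max(low, note))
            else:
                note = rng.uniform(low, high)  # improvise a new note
            new_harmony.append(note)
        worst = max(memory, key=lambda h: fitness(*h))
        if fitness(*new_harmony) < fitness(*worst):
            memory[memory.index(worst)] = new_harmony
    best = min(memory, key=lambda h: fitness(*h))
    return best, fitness(*best)

best, best_fitness = harmony_search(objective, n_vars=3, bounds=(-10.0, 10.0))
print(best, best_fitness)
```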

Key notion mapping

| Harmony Search | Hybrid Rule Induction | Numerical Optimisation |
| --- | --- | --- |
| Musician | Fuzzy rule rx | Variable |
| Note | Feature subset | Value |
| Harmony | Rule set | Solution |
| Fitness | Combined evaluation | Evaluation |

Comparison vs QuickRules

- HarmonyRules: 56.33 ± 10.00
- QuickRules: 63.1 ± 11.89

[Figure: rule cardinality distribution for the web dataset (2556 features).]

Semi-supervised learning (SSL)

- Lies somewhere between supervised and unsupervised learning
- Why use it?
- Data is expensive to label/classify
- Labels can also be difficult to obtain
- Large amounts of unlabelled data available
- When is SSL useful?
- Small number of labelled objects but large number of unlabelled objects

Semi-supervised learning

- A number of methods for SSL – self-learning, generative models etc.
- Labelled data objects – usually small in number
- Unlabelled data objects – usually large in number
- A set of features describe the objects
- Class labels tell us only which class each labelled object belongs to
- SSL therefore attempts to learn labels (or structure) for data which has no labels
- Labelled data provides ‘clues’ for the unlabelled data

Fuzzy-rough self learning (FRSL)

- Basic idea is to propagate labels using the upper and lower approximations
- Label only those objects which belong to the lower approximation of a class to a high degree
- Can use upper approximation to decide on ties
- Attempts to minimise mis-labelling and subsequent reinforcement
- Paper: N. Mac Parthalain and R. Jensen. Fuzzy-Rough Set based Semi-Supervised Learning. Proceedings of the 20th International Conference on Fuzzy Systems (FUZZ-IEEE’11), pp. 2465-2471, 2011.

FRSL

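[Figure: FRSL flowchart. A fuzzy-rough learner trained on the labelled dataset makes predictions for the unlabelled data; objects whose lower approximation membership equals 1 ("yes") join the labelled data objects, and the rest ("no") remain unlabelled.]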
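The self-learning loop can be sketched as follows. This is a simplified illustration under stated assumptions, not the method from the cited paper: features are assumed to be scaled to [0, 1], similarity is the minimum over features of 1 - |difference|, the lower approximation uses the Łukasiewicz implicator, and the `threshold` parameter (default 1, matching the flowchart test) is added so that "to a high degree" can be relaxed. Tie-breaking with the upper approximation is omitted.

```python
import numpy as np

def similarity(x, others):
    """Similarity of x to each row of `others`; features assumed scaled to [0, 1]."""
    return np.min(1.0 - np.abs(others - x), axis=1)

def self_learn(X_lab, y_lab, X_unlab, threshold=1.0):
    """Propagate labels to unlabelled objects whose lower approximation
    membership to some class reaches `threshold`; repeat until no change."""
    X_lab = np.asarray(X_lab, dtype=float)
    X_unlab = np.asarray(X_unlab, dtype=float)
    y_lab = list(y_lab)
    classes = sorted(set(y_lab))
    labels = [None] * len(X_unlab)
    changed = True
    while changed:
        changed = False
        y_arr = np.array(y_lab)
        newly_labelled = []
        for i, x in enumerate(X_unlab):
            if labels[i] is not None:
                continue
            sim = similarity(x, X_lab)
            # Lower approximation membership of x to each class, using the
            # Lukasiewicz implicator I(a, b) = min(1, 1 - a + b).
            lower = {c: float(np.min(np.minimum(1.0, 1.0 - sim + (y_arr == c))))
                     for c in classes}
            best = max(lower, key=lower.get)
            if lower[best] >= threshold:
                newly_labelled.append((i, best))
        for i, c in newly_labelled:
            labels[i] = c
            X_lab = np.vstack([X_lab, X_unlab[i]])  # newly labelled objects reinforce
            y_lab.append(c)
            changed = True
    return labels

# Hypothetical one-feature example: two labelled objects, three unlabelled.
labels = self_learn([[0.0], [1.0]], [0, 1], [[0.05], [0.95], [0.5]], threshold=0.9)
print(labels)
```

Objects that sit between the classes stay unlabelled, which is the behaviour intended to limit mislabelling and its subsequent reinforcement.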

Conclusion

- Looked at fuzzy-rough methods for data mining
- Feature selection, finding optimal reducts
- Handling missing values and other problems
- Classification/prediction
- Instance selection
- Semi-supervised learning
- Future work
- Imputation, better rule induction and instance selection methods, more semi-supervised methods, optimizations, instance/feature weighting

FR methods in Weka

- Weka implementations of all fuzzy-rough methods can be downloaded from:
- http://users.aber.ac.uk/rkj/book/wekafull.jar
- KEEL version available soon (hopefully!)
