Feature Selection

Benjamin Biesinger - Manuel Maly - Patrick Zwickl

Agenda

  • Introduction: What is feature selection? What is our contribution?

  • Phases: What is the sequence of actions in our solution?

  • Solution: How does it work in particular?

  • Results: What is returned?

  • Analysis: What to do with it? What can we conclude from it?

Introduction

  • Not all features of a dataset are useful for classification

  • A large number of attributes increases the computation time

  • Only the most essential features should be used for classification

  • Feature selection addresses this issue

  • Different search strategies and evaluation methods are available, but which one is best?

  • Automatic feature selection: Several algorithms are run, compared, and analyzed for trends → Implemented by us

Phases

  • Phases: (I) Meta-classification - (II) Classification

  • Before: File loading & preparation

  • Afterwards: Comparison + output generation

Solution

  • Java command-line application using the WEKA toolkit

  • Command-line arguments: dataset filename, classifier algorithm name, and split (the percentage dividing the data between feature selection and classification)

    • Example: "winequality-red.csv M5Rules 20"

  • Results are computed and displayed on the console (standard output)
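The three arguments above could be read and checked roughly as follows. This is a minimal sketch, not the original code: the class FsArgs, its field names, the 1 to 99 range check, and the assumption that the percentage is the share of instances used for feature selection are all illustrative.

```java
import java.util.Locale;

/** Hypothetical holder for the three command-line arguments (not the original class). */
final class FsArgs {
    final String datasetFile;     // e.g. "winequality-red.csv"
    final String classifierName;  // e.g. "M5Rules"
    final int splitPercent;       // assumption: share of instances used for feature selection

    private FsArgs(String datasetFile, String classifierName, int splitPercent) {
        this.datasetFile = datasetFile;
        this.classifierName = classifierName;
        this.splitPercent = splitPercent;
    }

    /** Parses {dataset file, classifier name, split percent}; rejects malformed input. */
    static FsArgs parse(String[] args) {
        if (args.length != 3) {
            throw new IllegalArgumentException(
                    "usage: <dataset file> <classifier name> <split percent>");
        }
        int split;
        try {
            split = Integer.parseInt(args[2].trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("split must be an integer: " + args[2], e);
        }
        // Assumed design choice: both parts must be non-empty, so 0 and 100 are rejected.
        if (split < 1 || split > 99) {
            throw new IllegalArgumentException("split must be between 1 and 99: " + split);
        }
        return new FsArgs(args[0], args[1], split);
    }

    @Override
    public String toString() {
        return String.format(Locale.ROOT, "%s with %s, split %d%%",
                datasetFile, classifierName, splitPercent);
    }
}
```

For the example call, FsArgs.parse(new String[] {"winequality-red.csv", "M5Rules", "20"}) yields an object whose toString() is "winequality-red.csv with M5Rules, split 20%"; a non-numeric split is rejected with an IllegalArgumentException.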

Solution (Flow 1)

  • Parsing of the dataset and creation of a WEKA-specific "Instances" object.

  • Splitting of the Instances object into two parts, according to the percentage entered by the user.

  • Combination of all evaluation and search algorithms listed in the properties files and application of each combination to the first part of the Instances object; the results are stored in dedicated objects (SData).

  • Classification of the second part of the Instances object with the classifier entered by the user, once for each combination from the previous step; these results are also stored in SData objects.
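Stripped of WEKA types, the splitting and combination steps above can be sketched in plain Java. Everything here is an assumption for illustration: the class Flow1Sketch, the property keys "evaluators" and "searches" (comma-separated class names), which part receives the percentage, and the compatibility rule. That rule simplifies the WEKA convention that Ranker works with single-attribute evaluators while the other search methods work with subset evaluators, approximated here by the "SubsetEval" suffix of the evaluator name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical plain-Java sketch of the "split" and "combine" steps (no WEKA types). */
final class Flow1Sketch {

    /** Splits rows into [first part, second part]; the first gets `percent` % (0..100, rounded). */
    static <T> List<List<T>> split(List<T> rows, int percent) {
        int cut = (int) Math.round(rows.size() * percent / 100.0);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(rows.subList(0, cut)));           // assumed: feature selection
        parts.add(new ArrayList<>(rows.subList(cut, rows.size()))); // assumed: classification
        return parts;
    }

    /** Loads properties from text, e.g. the contents of a properties file. */
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Reads a comma-separated list of class names stored under `key` (hypothetical key). */
    static List<String> names(Properties props, String key) {
        List<String> out = new ArrayList<>();
        for (String s : props.getProperty(key, "").split(",")) {
            if (!s.trim().isEmpty()) {
                out.add(s.trim());
            }
        }
        return out;
    }

    /** Simplified rule: Ranker needs a single-attribute evaluator, other searches a subset one. */
    static boolean compatible(String evaluator, String search) {
        boolean subsetEvaluator = evaluator.endsWith("SubsetEval");
        return search.equals("Ranker") ? !subsetEvaluator : subsetEvaluator;
    }

    /** Builds every compatible {evaluator, search} pair in the order listed. */
    static List<String[]> combinations(Properties props) {
        List<String[]> pairs = new ArrayList<>();
        for (String evaluator : names(props, "evaluators")) {
            for (String search : names(props, "searches")) {
                if (compatible(evaluator, search)) {
                    pairs.add(new String[] {evaluator, search});
                }
            }
        }
        return pairs;
    }
}
```

With 10 rows and a percentage of 20, split returns parts of 2 and 8 rows. With evaluators=ConsistencySubsetEval, InfoGainAttributeEval and searches=GreedyStepwise,Ranker, combinations yields only ConsistencySubsetEval + GreedyStepwise and InfoGainAttributeEval + Ranker, which is one way to reflect that not all combinations are possible.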

Solution (Flow 2)

  • Aggregation of information over all results by iterating over the SData objects.

  • Printing of the trend analysis, of information on each combination of evaluation and search algorithm, and of the corresponding classification results (time and mean absolute error).
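The trend analysis (the "@TREND" block in the output excerpt) amounts to counting, for each attribute, how many evaluation/search combinations selected it. A minimal sketch, assuming each combination's result is available as a list of selected attribute names; the class TrendSketch is hypothetical, and only the line format is taken from the excerpt.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the trend count over all feature selection results. */
final class TrendSketch {

    /** Counts how many results selected each attribute (repeats within one result count once). */
    static Map<String, Integer> count(List<List<String>> selectedPerResult) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> selected : selectedPerResult) {
            for (String attribute : new LinkedHashSet<>(selected)) {
                counts.merge(attribute, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Formats the counts, highest first, in the style of the "@TREND" output. */
    static List<String> report(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // ArrayList.sort is stable, so attributes with equal counts keep first-seen order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            lines.add("Attribute: " + e.getKey() + " has Count: " + e.getValue());
        }
        return lines;
    }
}
```

Counting each attribute at most once per result keeps a single combination from inflating an attribute's count.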

Solution (Output Excerpt)

@TREND of selected features

Attribute: bottom-right-square has Count: 8

=============== Evaluation: ConsistencySubsetEval ===============

--- Search: GreedyStepwise ---

# of selected features: 1, selection time: 34, classification time: 36, mean abs. error:47,07%

# of selected features: 2, selection time: 35, classification time: 34, mean abs. error:43,16% …

--- Search: RandomSearch ---

Automatic feature number (no influence by user): 5, selection time: 74, classification time: 118, mean abs. error:44,46%


Analysis

  • Tested on three different datasets:

    • Tic Tac Toe

    • Wine Quality (red)

    • Balance Scale

  • Two comparisons were made per dataset:

    • For each feature selection technique individually

    • Between different feature selection techniques

  • Is there a trend in which features are selected by most techniques?

1st Comparison

  • Influence of the number of selected features on:

    • Runtime

    • Classification accuracy (measured as mean absolute error, MAE)
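For reference, the mean absolute error averages the absolute difference between actual and predicted values: MAE = (1/n) · Σ |actual_i - predicted_i|. A minimal sketch follows; the class name MaeSketch is hypothetical (WEKA's Evaluation class reports the same measure through its meanAbsoluteError() method). The slides give MAE as a percentage; how that value was scaled is not stated, so the sketch returns the raw average.

```java
/** Hypothetical helper: mean absolute error of numeric predictions. */
final class MaeSketch {

    static double meanAbsoluteError(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("need two non-empty arrays of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - predicted[i]); // absolute error of instance i
        }
        return sum / actual.length;
    }
}
```

For actual values {5, 6, 7} and predictions {5.5, 6, 6}, the absolute errors are 0.5, 0 and 1, so the MAE is 1.5 / 3 = 0.5.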

1st Comparison Result

  • Only search algorithms that implement the RankedOutputSearch interface were used

    • These make it possible to set the number of features to select

  • The number of selected features is directly proportional to the MAE and inversely proportional to the runtime
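The first comparison can be pictured as a loop over the number of features k. A minimal sketch, assuming a hypothetical callback that selects k features (in WEKA, the RankedOutputSearch interface offers setNumToSelect for this), runs the classifier, and returns the MAE. The class SweepSketch is illustrative; it times the whole callback, whereas the tool reports selection time and classification time separately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical harness for the 1st comparison: runtime and MAE per number of features. */
final class SweepSketch {

    /** One measurement: k selected features, elapsed milliseconds, resulting MAE. */
    static final class Point {
        final int numFeatures;
        final long millis;
        final double mae;

        Point(int numFeatures, long millis, double mae) {
            this.numFeatures = numFeatures;
            this.millis = millis;
            this.mae = mae;
        }
    }

    /** Runs selectAndClassify(k) for k = 1..maxFeatures and records time and MAE. */
    static List<Point> sweep(int maxFeatures, IntToDoubleFunction selectAndClassify) {
        List<Point> points = new ArrayList<>();
        for (int k = 1; k <= maxFeatures; k++) {
            long start = System.nanoTime();
            double mae = selectAndClassify.applyAsDouble(k); // stand-in for selection + classification
            long millis = (System.nanoTime() - start) / 1_000_000;
            points.add(new Point(k, millis, mae));
        }
        return points;
    }
}
```

With a dummy callback, SweepSketch.sweep(3, k -> 1.0 / k) returns three points for k = 1, 2, 3 with MAE values 1.0, 0.5 and about 0.33; with a real callback, the points show how MAE and runtime change as k grows.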

2nd Comparison

  • A feature selection technique consists of:

    • Search algorithm

    • Evaluation algorithm

    • Not all combinations are possible!

  • Different feature selection techniques were compared with each other with respect to:

    • Runtime

    • Performance (measured in MAE)
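One way to put the two criteria side by side is to sort the technique results. A minimal sketch: the class TechniqueRanking and its Result type are hypothetical, and ordering by MAE first with runtime as the tie-breaker is an assumed choice, since the slides do not state how the two criteria were weighed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the 2nd comparison: ranking evaluation/search combinations. */
final class TechniqueRanking {

    /** Result of one feature selection technique (evaluation + search) on one dataset. */
    static final class Result {
        final String evaluator;
        final String search;
        final long runtimeMillis;
        final double mae;

        Result(String evaluator, String search, long runtimeMillis, double mae) {
            this.evaluator = evaluator;
            this.search = search;
            this.runtimeMillis = runtimeMillis;
            this.mae = mae;
        }

        @Override
        public String toString() {
            return evaluator + " / " + search;
        }
    }

    /** Returns a copy sorted by MAE (lower first), ties broken by runtime (faster first). */
    static List<Result> rank(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort(Comparator.comparingDouble((Result r) -> r.mae)
                .thenComparingLong(r -> r.runtimeMillis));
        return sorted;
    }
}
```

Swapping the two comparator stages ranks the techniques by runtime first instead.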

2nd Comparison Result

  • Different techniques select different numbers of attributes

    • To some extent, different attributes as well

  • Some techniques are slower than others

    • Huge runtime differences between search algorithms

  • Some techniques select too few attributes to give acceptable results


  • In all tested datasets, there was a trend in which features were selected

  • A higher selection count indicates a bigger influence on the output


  • Different feature selection techniques have different characteristics

  • ClassifierSubsetEval / RaceSearch gave very good classification results

  • Fewer attributes: faster classification

    • Algorithms that select fewer features are faster

      • e.g. GeneticSearch

Feature Selection


Benjamin Biesinger - Manuel Maly - Patrick Zwickl

Anything missed?

Any questions?

The essential features ;)