Feature Selection

Presentation Transcript
Feature Selection

Benjamin Biesinger - Manuel Maly - Patrick Zwickl

Agenda

  • Introduction: What is feature selection? What is our contribution?
  • Phases: What is the sequence of actions in our solution?
  • Solution: How does it work in particular?
  • Results: What is returned?
  • Analysis: What to do with it? What can we conclude from it?
Introduction

  • Not all features of a data set are useful for classification
  • A large number of attributes negatively influences the computation time
  • The most essential features should be used for classification
  • Feature selection is an approach to solving this issue
  • Different search strategies and evaluations are available, but which is the best?
  • Automatic feature selection: Several algorithms are run, compared and analyzed for trends → Implemented by us
Phases

  • Two phases: (I) Meta-classification, (II) Classification
  • Before: File loading & preparation
  • Afterwards: Comparison + output generation
  • Java command-line application utilizing the WEKA toolkit
  • Command-line arguments: dataset filename, classifier algorithm name, and split (percentage of the data used for feature selection vs. classification)
    • Example: "winequality-red.csv M5Rules 20"
  • Computation of results and display in system output of console
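The argument handling described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual code: the class and field names are invented, and a real run would pass the parsed values on to WEKA.

```java
// Hypothetical sketch of the command-line entry point; names are illustrative.
public class FeatureSelectionArgs {
    public final String datasetFile;    // e.g. "winequality-red.csv"
    public final String classifierName; // e.g. "M5Rules" (a WEKA classifier name)
    public final int splitPercent;      // share of instances used for feature selection

    public FeatureSelectionArgs(String datasetFile, String classifierName, int splitPercent) {
        if (splitPercent < 1 || splitPercent > 99)
            throw new IllegalArgumentException("split must be in 1..99");
        this.datasetFile = datasetFile;
        this.classifierName = classifierName;
        this.splitPercent = splitPercent;
    }

    // Parses the three positional arguments from the slide's example call.
    public static FeatureSelectionArgs parse(String[] args) {
        if (args.length != 3)
            throw new IllegalArgumentException("usage: <dataset.csv> <classifier> <splitPercent>");
        return new FeatureSelectionArgs(args[0], args[1], Integer.parseInt(args[2]));
    }

    public static void main(String[] args) {
        FeatureSelectionArgs a = parse(new String[]{"winequality-red.csv", "M5Rules", "20"});
        System.out.println(a.datasetFile + " / " + a.classifierName + " / " + a.splitPercent + "%");
    }
}
```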
Solution (Flow 1)
  1. Parse the dataset and create a WEKA-specific "Instances" object.
  2. Split the Instances object into two parts, according to the percentage entered by the user.
  3. Combine all evaluation and search algorithms given in the properties files, apply each combination to the first Instances object, and store the results in dedicated objects (SData).
  4. Classify all combinations from step 3 with the classifier entered by the user on the second Instances object, again storing the results in SData objects.
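The split and the evaluator/search combination steps can be sketched as below. This is a simplified stand-in: in the real application the algorithm names come from properties files and are instantiated as WEKA evaluator and search objects, and not every pairing is valid; plain strings stand in for them here.

```java
import java.util.ArrayList;
import java.util.List;

public class CombinationSketch {
    // Split step: number of instances that go into the feature-selection part,
    // given the total count and the user-entered split percentage.
    public static int selectionCount(int totalInstances, int splitPercent) {
        return totalInstances * splitPercent / 100;
    }

    // Combination step: build every evaluator/search pairing (cross product).
    public static List<String> combine(List<String> evaluators, List<String> searches) {
        List<String> combos = new ArrayList<>();
        for (String eval : evaluators)
            for (String search : searches)
                combos.add(eval + " + " + search);
        return combos;
    }

    public static void main(String[] args) {
        // The red wine-quality dataset has 1599 instances; a 20% split
        // reserves 319 of them for feature selection.
        System.out.println(selectionCount(1599, 20));
        System.out.println(combine(
            List.of("ConsistencySubsetEval", "ClassifierSubsetEval"),
            List.of("GreedyStepwise", "RandomSearch")));
    }
}
```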
Solution (Flow 2)
  • Gain aggregate information on all results by iterating over the SData objects.
  • Print a trend analysis and information on each combined evaluation and search algorithm, plus the corresponding classification results (time + mean absolute error).
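The trend aggregation amounts to counting how often each attribute was selected across all technique runs. A minimal sketch, with plain string arrays standing in for the SData objects, could look like this (the printed format mirrors the "Attribute: ... has Count: ..." lines of the output excerpt):

```java
import java.util.HashMap;
import java.util.Map;

public class TrendSketch {
    // Counts, over all technique runs, how often each attribute was selected.
    public static Map<String, Integer> selectionCounts(String[][] runs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] selected : runs)
            for (String attribute : selected)
                counts.merge(attribute, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Illustrative attribute names from the Tic Tac Toe dataset.
        String[][] runs = {
            {"bottom-right-square", "middle-middle-square"},
            {"bottom-right-square"},
            {"bottom-right-square", "top-left-square"},
        };
        selectionCounts(runs).forEach((attr, n) ->
            System.out.println("Attribute: " + attr + " has Count: " + n));
    }
}
```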
Solution (Output Excerpt)

@TREND of selected features

Attribute: bottom-right-square has Count: 8

=============== Evaluation: ConsistencySubsetEval ===============

--- Search: GreedyStepwise ---

# of selected features: 1, selection time: 34, classification time: 36, mean abs. error: 47,07%

# of selected features: 2, selection time: 35, classification time: 34, mean abs. error: 43,16% …

--- Search: RandomSearch ---

Automatic feature number (no influence by user): 5, selection time: 74, classification time: 118, mean abs. error: 44,46%

  • Tested on 3 different datasets
    • Tic Tac Toe
    • Wine Quality (red)
    • Balance Scale
  • 2 comparisons per dataset were made
    • For each feature selection individually
    • Between different feature selection techniques
  • Is there a trend in which features are selected by most techniques?
1st Comparison
  • Influence of number of selected features on
    • Runtime
    • Classification accuracy (measured in MAE)
1st Comparison Result
  • Only search algorithms that implement the RankedOutputSearch interface were used
    • These make it possible to influence the number of features to select
  • The number of features selected and the MAE are directly proportional to each other, and inversely proportional to the runtime
2nd Comparison
  • A feature selection technique consists of
    • A search algorithm
    • An evaluation algorithm
    • Not all combinations are possible!
  • Different feature selection techniques were compared with respect to:
    • Runtime
    • Performance (measured in MAE)
2nd Comparison Result
  • Different techniques select different numbers of attributes
    • To some extent, different attributes, too
  • Some techniques are slower than others
    • Huge runtime differences between search algorithms
  • Some techniques select too few attributes to give acceptable results
  • In all tested datasets there was a trend in which features were selected
  • A higher selection count implies a bigger influence on the output
  • Different feature selection techniques have different characteristics
  • ClassifierSubsetEval / RaceSearch gave very good classification results
  • Fewer attributes mean faster classification
    • Algorithms that select fewer features are faster
      • e.g. GeneticSearch
Feature Selection


Benjamin Biesinger - Manuel Maly - Patrick Zwickl

Anything missed?

Any questions?

The essential features ;)