
Feature Selection



Presentation Transcript


  1. Feature Selection Alexandros Potamianos School of ECE Natl. Tech. Univ. of Athens Fall 2014-2015

  2. "A blasphemous sect suggested .. that all men should juggle letters and symbols until they constructed by an improbable gift of chance these canonical books ... The sect almost disappeared but I have seen old men who, for long periods of time, would hide in the latrines with some metal disks in a forbidden dice cup, feebly trying to mimic the divine disorder." from the Library of Babel, by Jorge Luis Borges

  3. Feature Selection (Th&K ch5) • Preprocessing • Outlier Removal • variance based, |x – mean| > 2*std or 3*std • Data Normalization • z-normalization: (x-mean)/std • Followed optionally by sigmoid compression: 1/[1+exp(-y)] • Missing Data • Imputation (pseudo-EM) • Multiple imputation (Bayesian) • EM and variants
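
A minimal numpy sketch of the preprocessing chain above (the function name, parameter name, and the 3-sigma default are illustrative, not from the slides):

```python
import numpy as np

def preprocess(X, k=3.0):
    # Variance-based outlier removal: drop samples with any feature
    # farther than k standard deviations from the mean (k = 2 or 3).
    mean, std = X.mean(axis=0), X.std(axis=0)
    X = X[(np.abs(X - mean) <= k * std).all(axis=1)]
    # z-normalization: zero mean, unit variance per feature.
    Y = (X - X.mean(axis=0)) / X.std(axis=0)
    # Optional sigmoid compression into (0, 1).
    return 1.0 / (1.0 + np.exp(-Y))
```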

  4. Feature Selection (Th&K ch5) • How to measure a good feature? • Classification error estimates • Divergence • Expected value of ln(p(x|w1)/p(x|w2)) • Kullback-Leibler distance between pdfs • Equal covariance => Mahalanobis distance • Bounds on classification performance: • Chernoff Bound and Bhattacharyya distance • Scatter Matrices and Fisher discriminant • Classification error proper!
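
For Gaussian class-conditional densities the divergence has a closed form; a sketch (assuming numpy, with an illustrative function name):

```python
import numpy as np

def gaussian_kl(mu1, S1, mu2, S2):
    # Closed-form KL divergence D(N(mu1, S1) || N(mu2, S2)).
    # With equal covariances S1 = S2 = S, the symmetric divergence
    # D12 + D21 reduces to the (squared) Mahalanobis distance
    # between the two class means, as noted on the slide.
    d = len(mu1)
    S2inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2inv @ S1) + diff @ S2inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))
```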

  5. Outline • Feature Selection • Variable Ranking • Variable Subset Selection • Feature Constr. and Dim. Reduction • Methods • Filtering (open loop) • Wrapper (closed loop) • Embedded (min. class. error)

  6. Variable Ranking • Look at one feature at a time! • Criteria • Correlation between feature and corresponding class. labels • Pearson correlation coeff. squared • Non-linear generalizations • Information theoretic criteria • Mutual information between features and corr. class. labels (aka saliency)
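
A sketch of scalar ranking with the squared Pearson criterion (mutual information would slot in the same way; names are illustrative):

```python
import numpy as np

def rank_by_correlation(X, y):
    # Score each feature independently by its squared Pearson
    # correlation with the labels; return indices, best first.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = Xc.T @ yc / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(r ** 2)[::-1]
```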

  7. Some Observations • Features that are iid are not redundant! • Perfect correlation → no new information • High correlation → could still add information • A feature that is useless by itself can improve performance when combined with others • Multiple features, each useless by itself, can be useful together
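
The classic XOR construction illustrates the last two bullets: each feature alone is uncorrelated with the label, yet the pair determines it exactly (a minimal sketch):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])  # y = x1 XOR x2

for j in range(2):
    # Zero correlation with y: each feature is useless on its own.
    print(np.corrcoef(X[:, j], y)[0, 1])             # -> 0.0
# Jointly, the two features reproduce y perfectly.
print(np.logical_xor(X[:, 0], X[:, 1]).astype(int))  # -> [0 1 1 0]
```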

  8. Redundant variables?

  9. Correlated variables?

  10. Useless variable?

  11. Variable Subset Selection • Need to select features jointly! • NP-hard problem • Forward selection • Backward selection • Greedy searches avoid over-fitting • Embedded methods • Finite difference • Quadratic approximation of cost function
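
A sketch of greedy forward selection run as a wrapper; the criterion `score(X_subset, y)` (e.g., cross-validated accuracy of the chosen classifier) is assumed to be supplied by the caller:

```python
def forward_selection(X, y, score, k):
    # Repeatedly add the single feature whose inclusion most
    # improves the wrapper criterion, until k features are chosen.
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        best = max(remaining,
                   key=lambda j: score(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```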

  12. Feature Selection: Computational Complexity • Select a subset of k out of m features • Scalar Feature Selection O(km) • Feature Vector Selection (Wrapper) • Filter (full search): m!/(k!(m-k)!) subsets • Sequential Backward O(m² + k²) • Sequential Forward O(k(m+k)) • Floating search • a combination of forward and backward search • Features can be added back after being rejected • Alternates between inclusion (forward) and exclusion (backward) steps • For monotonic criteria, C(X) ≤ C(X ∪ {x_k+1}), dynamic programming solutions exist
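
A sketch of the floating search described above (SFFS-style); `score` is again an assumed wrapper criterion, and the per-size record of best scores is what keeps inclusion/exclusion from cycling:

```python
import numpy as np

def floating_forward_selection(X, y, score, k):
    selected, best = [], {}  # best score seen at each subset size
    while len(selected) < k:
        # Inclusion step: add the most helpful remaining feature.
        rest = [j for j in range(X.shape[1]) if j not in selected]
        j = max(rest, key=lambda j: score(X[:, selected + [j]], y))
        selected.append(j)
        n = len(selected)
        best[n] = max(best.get(n, -np.inf), score(X[:, selected], y))
        # Exclusion steps: drop a feature only if the reduced subset
        # strictly beats the best one recorded at that size.
        while len(selected) > 2:
            drop = max(selected, key=lambda i: score(
                X[:, [f for f in selected if f != i]], y))
            s = score(X[:, [f for f in selected if f != drop]], y)
            if s > best.get(len(selected) - 1, -np.inf):
                selected.remove(drop)
                best[len(selected)] = s
            else:
                break
    return selected
```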

  13. Variable Subset Selection (cont.) • Direct objective optimization (e.g., minimum description length) • Goodness of fit (maximize) • Number of variables (minimize) • Combination of wrappers/embedded methods + filters • Markov blankets
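
The "direct objective" bullet can be made concrete with an MDL/BIC-flavored cost trading goodness of fit against the number of selected variables; a sketch (this penalty form is one common choice, not necessarily the slides'):

```python
import numpy as np

def description_length(neg_log_lik, n_selected, n_samples):
    # Goodness of fit (negative log-likelihood, to minimize) plus
    # a code-length penalty per selected variable (to minimize).
    return neg_log_lik + 0.5 * n_selected * np.log(n_samples)
```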

  14. Dimensionality Reduction • PCA/SVD, LDA, etc. • Clustering (unsupervised, supervised) • Fisher linear discriminant • Information bottleneck
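
A sketch of PCA via SVD, the first item on the slide (function name illustrative):

```python
import numpy as np

def pca_reduce(X, n_components):
    # Center the data, take the SVD, and project onto the top
    # principal directions (rows of Vt).
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T
```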
