This presentation by Joseph DePasquale at the 2007 Student Activities Conference explores the challenge of analyzing data with missing features, a common problem caused by real-world factors such as equipment failure, human error, or natural phenomena. The discussion focuses on the Learn++.MF missing feature algorithm and how it selects feature subsets for training an ensemble of classifiers. It examines the impact of the algorithm's free parameters, in particular the number of features used for training and the distribution update parameter β, with case studies drawn from several benchmark databases. This work was funded by the National Science Foundation.
Random Subspace Feature Selection for Analysis of Data with Missing Features
Presented by: Joseph DePasquale
Student Activities Conference 2007

This material is based upon work supported by the National Science Foundation under Grant No. ECS-0239090. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Outline
• Motivation
• Missing feature algorithm
  • Selecting features for training
  • Finding usable classifiers for testing
• Impact of free parameters
  • Number of features used for training
  • Distribution update parameter β
Motivation
• Missing data is a real-world issue
  • Failed equipment
  • Human error
  • Natural phenomena
• Matrix multiplication cannot be used if even a single data value is missing (see the sketch below)
(Figure: data matrix with a missing feature)
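To make the last point concrete, here is a minimal sketch (not from the slides, written in NumPy with hypothetical layer weights) showing how a single missing value, stored as NaN, propagates through the matrix multiplication of a network layer and leaves every output unusable:

```python
# Minimal sketch (not from the slides): a NaN in even one feature of x
# propagates through the matrix multiplication of a network layer, so a
# standard classifier cannot score an instance with a missing feature.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))                   # hypothetical layer weights: 3 units, 5 features
x = np.array([0.2, np.nan, 1.3, 0.7, -0.4])   # feature 1 is missing

print(W @ x)                                  # every output unit becomes nan
```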
Training Usable Classifiers
(Figure: feature subsets fi and classifiers Ci; legend: feature used in training, feature not used in training, usable classifier)
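The diagram conveys the key idea that a classifier Ci is "usable" for a test instance only if every feature in its training subset fi is present in that instance. The following sketch, with illustrative names and NaN used to mark missing values, shows one way that check could be implemented; it is an assumption-based illustration, not the authors' code:

```python
# Sketch of the "usable classifier" check: classifier i is usable for a test
# instance x only if none of its training features is missing (NaN) in x.
import numpy as np

def usable_classifiers(feature_subsets, x):
    """Return indices of classifiers whose training feature subset contains
    no missing (NaN) entries of the test instance x."""
    missing = np.isnan(x)
    return [i for i, fi in enumerate(feature_subsets)
            if not missing[list(fi)].any()]

# Example: three classifiers trained on different random feature subsets.
subsets = [(0, 2, 4), (1, 3), (2, 3, 4)]        # hypothetical subsets fi
x = np.array([0.5, np.nan, 1.2, 0.3, -0.7])     # feature 1 is missing
print(usable_classifiers(subsets, x))           # -> [0, 2]
```

In this example classifiers 0 and 2 remain usable because their feature subsets avoid the missing feature, while classifier 1 is set aside for this instance.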
Experimental Setup
• Prior research has addressed static selection of the features used for training
Conclusions
• β does not significantly impact the algorithm's performance, but the number of features used for training does
Learn++.MF
• Training
  • Selecting features from the distribution
  • Training the network
  • Updating the likelihood of selecting features
• Testing
  • Data corruption
  • Identifying usable classifiers
• Simulation
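The sketch below follows this outline only loosely: it maintains a selection distribution over features, draws a random feature subset for each network, reduces the likelihood of re-selecting the used features by the update parameter β, and at test time lets only the usable classifiers vote. The function names, the specific update and voting rules, and the use of scikit-learn's MLPClassifier are illustrative assumptions rather than the published Learn++.MF algorithm verbatim; complete training data and integer class labels 0..n_classes-1 are assumed.

```python
# Rough sketch of the Learn++.MF flow outlined above, with illustrative names
# and simplified update/voting rules; not the authors' exact algorithm.
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_learnpp_mf(X, y, n_classifiers=10, nof=3, beta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    dist = np.ones(n_features) / n_features        # feature-selection distribution
    ensemble = []                                  # list of (feature_subset, classifier)
    for _ in range(n_classifiers):
        # Draw a feature subset (without replacement) from the current distribution.
        subset = rng.choice(n_features, size=nof, replace=False, p=dist)
        clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
        clf.fit(X[:, subset], y)
        ensemble.append((subset, clf))
        # Reduce the likelihood of re-selecting the features just used.
        dist[subset] *= beta
        dist /= dist.sum()
    return ensemble

def predict_learnpp_mf(ensemble, x, n_classes):
    # Majority vote over the classifiers whose features are all present in x.
    votes = np.zeros(n_classes)
    for subset, clf in ensemble:
        if not np.isnan(x[subset]).any():          # "usable classifier" check
            votes[int(clf.predict(x[subset].reshape(1, -1))[0])] += 1
    return int(votes.argmax())
```

A simulation along these lines would train the ensemble on complete data from a benchmark database, then corrupt test instances with NaN values and classify them with the remaining usable classifiers, matching the training / testing / simulation structure listed on the slide.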