This study explores strategies for identifying predictive genes from small-sample, high-dimensional microarray data. A Naïve Bayes classifier keeps the number of parameters to a minimum, and simulated annealing is used for feature selection to improve accuracy. Repeated cross-validation is applied for robustness. The results show that global search outperforms local search and that genes are selected consistently across runs. The findings indicate that simple models are preferable when samples are few, and that the proposed RSN method identifies high-confidence genes, paving the way for further biological analysis.
Making the Most of Small Sample High Dimensional Micro-Array Data
Allan Tucker, Veronica Vinciotti, Xiaohui Liu; Brunel University
Paul Kellam; Windeyer Institute
Microarray Data
• High dimensional
• Small number of samples
• Need to identify predictive genes, e.g. for classification
• Rate confidence in genes based on their predictive ability, i.e. classification performance
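The small-sample problem is easiest to see by counting parameters. As a worked example, assuming Gaussian class-conditional densities (the slides do not state how expression levels are modelled) and illustrative values of p = 1000 genes and K = 2 classes, compare a Naïve Bayes classifier (one mean and one variance per gene per class) with a full-covariance Gaussian classifier; both also need K - 1 free class priors:

```latex
\begin{align*}
\text{Naive Bayes:}\quad & K(2p) + (K-1) = 2(2000) + 1 = 4001 \\
\text{Full-covariance Gaussian:}\quad & K\left(p + \tfrac{p(p+1)}{2}\right) + (K-1) = 2(1000 + 500500) + 1 = 1003001
\end{align*}
```

The Naïve Bayes count grows linearly in p, the full-covariance count quadratically; with far fewer samples than genes, a full covariance matrix cannot even be estimated as non-singular, which is why the model with fewer parameters is preferred here.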
Identifying Predictive Genes
• We use a Naïve Bayes classifier
  • Well established
  • Minimises the number of parameters
• Feature selection using simulated annealing (SA)
  • Repeated 10 times
• Apply cross-validation
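A minimal sketch of this selection step, under several assumptions not stated on the slides: genes are modelled with Gaussian class-conditional densities, the SA objective is k-fold cross-validated accuracy, each move toggles one gene in or out, and the subset size is capped by a `max_genes` parameter. All function names, the cooling schedule and every parameter value are hypothetical choices for illustration, not the authors' settings.

```python
import math
import random
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, genes):
    """Per class: prior plus a (mean, variance) pair for each selected gene."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for g in genes:
            vals = [r[g] for r in rows]
            stats.append((mean(vals), pvariance(vals) + 1e-3))  # floor avoids zero variance
        model[c] = (len(rows) / len(y), stats)
    return model


def predict(model, x, genes):
    """Class with the highest log posterior, treating genes as independent given the class."""
    def log_post(c):
        prior, stats = model[c]
        return math.log(prior) - sum(
            0.5 * math.log(2 * math.pi * var) + (x[g] - mu) ** 2 / (2 * var)
            for g, (mu, var) in zip(genes, stats))
    return max(model, key=log_post)


def cv_accuracy(X, y, genes, k=5, seed=0):
    """k-fold cross-validated accuracy of the classifier restricted to `genes`."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        test = set(fold)
        train = [i for i in idx if i not in test]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train], genes)
        correct += sum(predict(model, X[i], genes) == y[i] for i in fold)
    return correct / len(y)


def anneal_genes(X, y, max_genes=5, steps=300, t0=0.05, cooling=0.99, seed=0):
    """Simulated annealing over gene subsets; each move toggles one gene in or out."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    current = {rng.randrange(n_genes)}
    cur_score = cv_accuracy(X, y, sorted(current))
    best, best_score = set(current), cur_score
    temp = t0
    for _ in range(steps):
        cand = current ^ {rng.randrange(n_genes)}  # toggle one random gene
        if cand and len(cand) <= max_genes:  # keep between 1 and max_genes genes
            score = cv_accuracy(X, y, sorted(cand))
            delta = score - cur_score
            # Always accept improvements; accept worse subsets with Metropolis probability.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, cur_score = cand, score
                if score > best_score:
                    best, best_score = set(cand), score
        temp *= cooling  # geometric cooling schedule
    return sorted(best), best_score


# Hypothetical synthetic data: 40 samples, 20 genes, only genes 0 and 3 informative.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[rng.gauss(2.0 * label if g in (0, 3) else 0.0, 1.0) for g in range(20)] for label in y]
genes, acc = anneal_genes(X, y)
```

Because the search is stochastic, different seeds and different cross-validation splits can return different subsets, which is why the search is repeated rather than run once.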
Identifying Predictive Genes (cont.)
• Identify genes robustly:
  • Data perturbed during CV
  • Repeats of the stochastic SA search
• Assign confidence based on how frequently each gene is selected
• Limit the maximum number of links
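The frequency-based confidence score can be sketched as follows, assuming each run (one SA search on one perturbed cross-validation training set) returns a subset of gene identifiers. The gene names, the run outputs and the 0.7 threshold below are hypothetical, and `selection_confidence` and `high_confidence` are illustrative helper names rather than the authors' code.

```python
from collections import Counter


def selection_confidence(runs):
    """Confidence of a gene = fraction of runs whose selected subset contains it."""
    counts = Counter(g for subset in runs for g in set(subset))
    return {g: n / len(runs) for g, n in counts.items()}


def high_confidence(conf, threshold=0.8):
    """Genes at or above `threshold`, most frequently selected first."""
    return sorted((g for g, c in conf.items() if c >= threshold),
                  key=lambda g: (-conf[g], g))


# Hypothetical output of four repeated SA runs on perturbed CV training folds.
runs = [["g1", "g7"], ["g1", "g7", "g3"], ["g1", "g2"], ["g1", "g7"]]
conf = selection_confidence(runs)
top = high_confidence(conf, threshold=0.7)
```

Here `g1` is selected in every run and `g7` in three of four, so only those two pass the threshold; genes picked up by chance in a single perturbed fold score low.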
Classification Accuracy
• RSN generally performs best
• SA (global search) performs better than local search
• Anomaly with B-Cell?
• Synthetic data support global over local search
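The slides do not describe the local search used for comparison. Assuming it is a greedy hill climber over the same moves, the difference from SA reduces to the acceptance rule, sketched below; `accept_local` and `accept_sa` are hypothetical names.

```python
import math
import random


def accept_local(delta):
    """Greedy local search: take a move only if it improves the objective."""
    return delta > 0


def accept_sa(delta, temp, rng):
    """Metropolis rule: always accept improvements; accept a worse move with prob exp(delta / temp)."""
    if delta >= 0:
        return True
    return rng.random() < math.exp(delta / temp)


rng = random.Random(0)
trials = 10_000
# Empirical acceptance rate of a move that lowers the objective by 1 at temperature 1.
rate = sum(accept_sa(-1.0, 1.0, rng) for _ in range(trials)) / trials
```

`rate` comes out close to exp(-1), roughly 0.37, whereas `accept_local(-1.0)` always rejects. Accepting occasional worse moves lets SA climb out of local optima; as the temperature falls the rule approaches the greedy one.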
Confidence Scores
• Relatively small number of genes identified with high confidence
• Consistent between runs
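The slides do not say how consistency between runs was measured. One plausible measure, used here purely as an assumption, is the mean pairwise Jaccard similarity (intersection size over union size) of the gene subsets returned by different runs; the helper names and example subsets are hypothetical, and at least two runs are required.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection size over union size for two gene subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_jaccard(runs):
    """Average Jaccard similarity over all pairs of runs; 1.0 means identical selections."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical gene subsets from three runs.
runs = [["g1", "g7"], ["g1", "g7", "g3"], ["g1", "g2"]]
consistency = mean_pairwise_jaccard(runs)
```

For these three runs the pairwise similarities are 2/3, 1/3 and 1/4, giving a mean of 5/12 (about 0.42); values near 1 would indicate that repeated runs agree on the same genes.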
Conclusions
• When microarray data have only a small number of samples:
  • Simple models with few parameters work best
  • Global search for parameters performs better than local search
• The proposed RSN successfully identifies genes of interest, paving the way for further biological analysis
• Different parameter settings still need to be explored