Diagnostic Feature Detection of Leukemia Serum Proteins in 2-200 kDa TOF-MS Spectra
Maureen B. Tracy, Dariya I. Malyarenko, Karl W. Kuschner, Eugene R. Tracy, William E. Cooke, and Dennis M. Manos
College of William and Mary, Williamsburg, VA 23187-8795
Overview
Mutual Information
Signal Processing Steps
Bayesian Network Analysis
Summary Bayesian Network Analysis
A subset of TOF-MS spectra from a 2004 leukemia serum protein profiling study conducted by Eastern Virginia Medical School (EVMS) is analyzed. The data include broad-mass-range (2-200 kDa) spectra from two clinical groups, adult T-cell leukemia and normal. Using signal processing parameters optimized for quality control (QC) pooled sera, MS spectra from 67 leukemia and 78 normal patients (2-3 replicates each) are processed with exponential-model baseline removal, integrative down-sampling (IDS), optimal linear filtering (OLF), pedestal removal, peak detection, and alignment. Variable selection is performed on the resulting peak-intensity data matrix using techniques based on mutual information and Bayesian network analysis. Results are compared with previous results [1,4] obtained for the standard low-mass focusing range (3-12 kDa).
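As a rough illustration of mutual-information-based variable selection on a peak-intensity matrix, the following minimal sketch ranks peaks by the mutual information between a discretized intensity and the class label. It is not the procedure used in this work: the function names, the equal-frequency binning, the number of bins, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (count / n) * math.log2(count * n / (px[x] * py[y]))
    return mi


def discretize(values, n_bins=3):
    """Equal-frequency binning by rank (an assumed choice for this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def rank_peaks(peak_matrix, labels, n_bins=3):
    """Return (score, peak_index) pairs sorted by decreasing MI with the label.

    peak_matrix is a list of rows (one per spectrum); each row holds one
    intensity per detected peak.
    """
    n_peaks = len(peak_matrix[0])
    scores = []
    for j in range(n_peaks):
        column = [row[j] for row in peak_matrix]
        score = mutual_information(discretize(column, n_bins), labels)
        scores.append((score, j))
    return sorted(scores, reverse=True)


# Toy data (hypothetical): peak 0 separates the classes, peak 1 does not.
toy_labels = ["leukemia"] * 4 + ["normal"] * 4
toy_matrix = [
    [9.0, 1.0], [8.5, 5.0], [9.5, 2.0], [8.0, 6.0],
    [1.0, 1.5], [1.5, 5.5], [0.5, 2.5], [2.0, 6.5],
]
ranking = rank_peaks(toy_matrix, toy_labels, n_bins=2)
```

In the toy data, peak 0 carries one full bit of information about the class and peak 1 carries none, so peak 0 is ranked first. A ranking like this scores each peak in isolation; it does not account for the peak-to-peak correlations that the Bayesian network analysis is meant to handle.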
Frequency of Links Between All Peaks
Frequency of Peaks Linked to Class
The ability to obtain diagnostic information from high-mass TOF-MS spectra has been limited by low signal-to-noise and peak broadening. To detect features reliably over a broad mass range, signal processing methods must achieve higher sensitivity and selectivity. Further, features detected in these spectra can be highly correlated, which can cause instability in variable selection and classifier behavior. To obtain meaningful identification of potentially diagnostic features, variable selection and classification methods must be robust and able to handle correlations and uncertainties in the data. In our work, the challenges of low signal-to-noise, broad peaks, peak-to-peak correlations, and experimental uncertainty have been met with new signal processing, variable selection, and classification methods.
Alignment of Spectra From Two Mass Ranges
Frequency of Peaks Linked to 11.5 kDa
This work was supported by NIH-National Cancer Institute SBIR Phase II CA101479 and R01 Grant CA126118.
We thank Dr. L. H. Cazares and Prof. O. John Semmes of Eastern Virginia Medical School, Norfolk, for acquiring and providing the data.
We thank INCOGEN, Inc. for maintaining the database containing the data and the VIBE software package used to access the data.
Peaks Linked to Class
1. Semmes, O. J., et al., Leukemia (2005) 19, 1229-1238.
2. Malyarenko, D. I., et al., Rapid Commun. Mass Spectrom. (2006) 20, 1670-1678.
3. Gatlin-Bunai, C. L., et al., J. Proteome Res. (2007) 6, 4517-4524.
4. Kuschner, K. W., PhD Dissertation, College of William and Mary (May 2009).
Peaks Linked to 11.5 kDa
Error rate = 11.7% (10-fold cross-validation, 100 repetitions)
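An error rate of this kind is typically estimated by averaging held-out misclassifications over repeated random fold assignments. The sketch below shows one way to do that, assuming a generic `fit`/`predict` pair; the single-feature midpoint-threshold classifier, the function names, the seed, and the toy data are hypothetical and stand in for whatever classifier was actually evaluated. The sketch also does not keep replicate spectra from one patient in the same fold, which may matter for data with 2-3 replicates per patient.

```python
import random


def repeated_kfold_error(samples, labels, fit, predict,
                         k=10, repetitions=100, seed=0):
    """Mean misclassification rate over repeated k-fold cross-validation."""
    rng = random.Random(seed)
    n = len(samples)
    indices = list(range(n))
    errors = []
    for _ in range(repetitions):
        rng.shuffle(indices)
        # Strided slices give k disjoint folds that cover every index once.
        folds = [indices[i::k] for i in range(k)]
        wrong = 0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in indices if i not in held_out]
            model = fit([samples[i] for i in train],
                        [labels[i] for i in train])
            wrong += sum(predict(model, samples[i]) != labels[i]
                         for i in fold)
        errors.append(wrong / n)
    return sum(errors) / len(errors)


def fit_threshold(xs, ys):
    """Toy two-class classifier: threshold midway between the class means."""
    classes = sorted(set(ys))
    means = {}
    for c in classes:
        members = [x for x, y in zip(xs, ys) if y == c]
        means[c] = sum(members) / len(members)
    low, high = sorted(classes, key=lambda c: means[c])
    return ((means[low] + means[high]) / 2, low, high)


def predict_threshold(model, x):
    threshold, low, high = model
    return high if x > threshold else low


# Toy data (hypothetical): one well-separated intensity per spectrum.
toy_samples = [0.1 * i for i in range(20)] + [10 + 0.1 * i for i in range(20)]
toy_labels = ["normal"] * 20 + ["leukemia"] * 20
error_rate = repeated_kfold_error(toy_samples, toy_labels,
                                  fit_threshold, predict_threshold)
```

Because the two toy classes do not overlap, every held-out spectrum is classified correctly and the estimated error rate is zero; real peak-intensity data would give a nonzero value that varies somewhat with the random fold assignments, which is why the estimate is averaged over many repetitions.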