Presenter: Russell Greiner Bio- and Medical-Informatics
Vision Statement • Helping the world* understand bio- and medical-informatics data … and make informed decisions. • * Potential beneficiaries: • biological and medical researchers, • practicing clinicians, and • the people they serve.
Motivation • High impact on bio-science and society • Local bioinformatics expertise • ML has a key role: • actual patterns (predictors, …) not known • lots of data • Challenging ML problems • data is high dimensional, noisy, … • often structured data • need to obtain training data, labels, … • …
Personnel • PI synergy: • R. Greiner, R. Goebel, C. Szepesvari • 18 Software developers • 4 Postdocs (3 AICML) • 14 UGrad / IIP students • 17 Grad students (11 MSc, 6 PhD)
Partners/Collaborators • 6 UofA CS profs • 5 UofA Bioscientists • Non-UofA collaborators: • Cross Cancer Institute (Alberta Cancer Board) • University of Alberta Hospital • Boston University, Miami University, Dept of Homeland Security
Additional Resources • Grants • $440K PENCE (Proteome Analyst) • $600K ACB (Brain Tumour) • Part of • $3.6M GenomeCanada (Human Metabolome Project) • $5.5M GenomeCanada (Alberta Transplant Institute) • $1.7M ACB (misc PolyomX grants) • In Kind: Data from CCI, ATI • 1970+ MRI scans (260 patients); 270 labeled • 300 (30K – 50K) Microarray chips • 80 (250K) SNP Chips
Highlights • The Human Metabolome is ~completed and annotated • described in Science, Nature, … • Human Metabolome DataBase used by 78,673 Visitors (438,481 pageviews) • Proteome Analyst is world’s best predictor of subcell location • analyzed >1,000,000 proteins, for >1,000 users • Patent filed for Brain Tumor Software • Effective new approach for learning to classify Microarrays • Virus classifier obtained 98.5% accuracy!
(Figure: SNP Analysis • Microarray • Metabolomics • Proteomics)
Projects and Status • Brain Tumour Analysis (ongoing) (poster #5) • Human Metabolome (new) • PolyomX (ongoing) (poster #8) • Proteome Analysis (ongoing) (posters #6, 7) • Whole Genome Analysis (ongoing) • Figure labels: Subcellular Locations; Metabolomics (1500 Chemicals), Proteomics (3000 Enzymes), Genomics (30,000 Genes)
Technical Details Brain Tumour Project
How to Treat Brain Tumours? • Irradiate ONLY the visible tumour? • No! Must also kill “(radiographically) occult” cancer cells surrounding the tumour! • Standard practice: irradiate everything within a 2 cm margin around the tumour. But that … • also includes normal cells • still misses other occult cells
How to Treat Brain Tumours? BETTER: • Predict (from earlier data) location of occult cells • Just irradiate that region! • Minimize number of normal cells zapped, to minimize loss of brain function • Meaningful, as conformal radiotherapy can zap arbitrary shapes!
How to Predict? • Assumption: occult cells lie in the region where tumour cells will grow next • So use prior data (260 patients) • Observe each patient over time: how tumours have grown • Predict patterns, based on properties of tumour, patient, region, …
Technology… • Using Discriminative Random Field • Segmentation • Growth Prediction • Extensions: • Increase Accuracy: Support Vector Random Field • Increase Computational Efficiency: Decoupled SVRF • Exploit Unlabeled Region: Semi-Supervised (D)SVRF
Brain Tumour: Future Work • Incorporate other modalities • Diffusion Tensor Imaging • PET • … • Compute other features: • Textures (BGLAM) • Using alignment • Improve learning algorithms • Use Active Learning techniques to determine • which regions/slices/studies/patients to label • using which human labeler
Technical Details Human Metabolome Project
HMP Overview • Goal: identify & quantify the entire human “metabolome” • all small endogenous and exogenous chemicals that appear in a non-trivial quantity in people… • Figure labels: Metabolomics (2300 Chemicals), Proteomics (3200 Enzymes), Genomics (30,000 Genes) • ``HMDB: The Human Metabolome Database'', Nucleic Acids Research, January 2007.
HMP #1: Fast Profiling • Given an NMR spectrum (blood, urine, CSF), • autonomously find & quantify >100 compounds, • in < 2 minutes • If we know the “NMR signature” of each metabolite … then this is linear least squares • Except … the “signature” is not stable: it shifts with unobservable ions • Think EM… • ML challenge • Acquire “conditional NMR signature” • Active Learning
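The linear least squares step on the slide above can be sketched as follows. This is a minimal illustration, not the project's profiling software: the Gaussian peak shapes, peak positions, noise level, and all names (`gaussian_peak`, `signatures`, `true_conc`) are assumptions made up for the example. It also assumes the signatures are fixed, so it deliberately ignores the peak-shift problem the slide raises.

```python
import numpy as np

def gaussian_peak(axis, centre, width=0.02):
    """Unit-height Gaussian peak sampled on the chemical-shift axis (assumed peak shape)."""
    return np.exp(-0.5 * ((axis - centre) / width) ** 2)

# Chemical-shift axis in ppm (hypothetical range and resolution).
ppm = np.linspace(0.0, 10.0, 2000)

# Hypothetical reference signatures, one column per metabolite.
signatures = np.column_stack([
    gaussian_peak(ppm, 1.3) + 0.5 * gaussian_peak(ppm, 4.1),
    gaussian_peak(ppm, 3.0) + gaussian_peak(ppm, 3.9),
    gaussian_peak(ppm, 2.0) + 0.3 * gaussian_peak(ppm, 8.4),
])

# Simulate an observed spectrum as a noisy mixture of the signatures.
true_conc = np.array([2.0, 0.5, 1.2])
rng = np.random.default_rng(0)
spectrum = signatures @ true_conc + rng.normal(scale=0.01, size=ppm.size)

# Ordinary least squares: find concentrations c minimising ||signatures @ c - spectrum||.
est_conc, *_ = np.linalg.lstsq(signatures, spectrum, rcond=None)
print(np.round(est_conc, 2))
```

With fixed, well-separated signatures the fit is a single matrix solve, which is why the slide treats the easy case as solved. Once peak positions depend on unobserved sample conditions, the columns of `signatures` are no longer known in advance, and plain least squares no longer applies directly.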
HMP #2: Classify Patients • Given: • Metabolic profile of patient • NMR/Mass spec of patient’s urine, blood, CSF • Predict: • Patient’s disease state • Reaction to Rx; Cachexia; Cancer • The role of ML … • Learn Profile → Dx classifier • Figure (pipeline for “Cachexia?”): Collect patient urine → Obtain NMR spectrum → Compute Metabolic Profile → Classify Profile with Classifier → “Cachexia = Yes!”
HMP #3: Chemical Property • Given: • Specific metabolite (chemical) • Predict: • Chemical properties of metabolite • Solubility, Melting point, … • Biological properties of metabolite • which reactions consume it, … • The role of ML … • Learn Metabolite → Property classifier
Technical Details PolyomX Project
PolyomX • Given: • Description of a patient • (SNP, Microarray, Metabolomic Profile, …) • Predict: • Dx: Breast Cancer, Ovarian Cancer, … • Rx: Prostate Cancer Toxicity, Cachexia, … • The role of ML … • Learn Patient → Dx classifier, … • ``Predictive Models for Breast Cancer Susceptibility from Multiple, Single Nucleotide Polymorphisms'', Clinical Cancer Research, April 2004. • ``Association of DNA Repair and Steroid Metabolism Gene Polymorphisms with Clinical Late Toxicity in Patients Treated with Conformal Radiotherapy for Prostate Cancer'', Clinical Cancer Research, April 2006.
PolyomX: Future Work • Better tools for analyzing microarrays • Rank-One Bicluster Classifier (RoBiC) • Scaling up to 250K SNP chips • Incorporating >1 modality • Many other tasks: • Ovarian Cancer (microarray) • Use pathways to understand microarray • Microtubules docking • …
Technical Details Proteome Analyst
Proteome Analysis • Given: • Protein (FASTA format) • Predict: Properties of Protein • General function • Subcellular localization • The role of ML … • Learn Protein → Location classifier
Results so far • Proteome Analyst classifiers • General Function: 80 – 90% • SubCellular Location: ~90% • Best known, by any system! (Bioinformatics, 2004) • “Explain” facility has already helped users to identify problems in dataset… • ``Proteome Analyst: Custom Predictions with Explanations in a Web-based Tool for High-Throughput Proteome Annotations'', Nucleic Acids Research, July 2004 • ``Visual Explanation and Auditing of Evidence with Additive Classifiers'', IAAI06, July 2006 • ``PA-GOSUB: A Searchable Database of Model Organism Protein Sequences With Their Predicted GO Molecular Function and Subcellular Localization'', Nucleic Acids Research, Dec 2005. • ``The Path-A metabolic pathway prediction web server'', Nucleic Acids Research, July 2006.
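To make the idea of an “Explain” facility over an additive classifier concrete, here is a small sketch. It assumes a naive Bayes style model over keyword tokens, which is one common additive classifier; the slides do not say which model Proteome Analyst uses. The tokens, labels, and function names (`train`, `explain`) are all hypothetical. Because the class score is a sum of per-token terms, each term can be shown to the user as a piece of evidence.

```python
import math

def train(examples):
    """examples: list of (set_of_tokens, label) pairs.

    Returns {label: (log_prior, {token: log P(token present | label)})},
    using add-one (Laplace) smoothing.
    """
    labels = sorted({label for _, label in examples})
    vocab = sorted({tok for toks, _ in examples for tok in toks})
    model = {}
    for label in labels:
        docs = [toks for toks, lab in examples if lab == label]
        log_prior = math.log(len(docs) / len(examples))
        weights = {
            tok: math.log((sum(tok in d for d in docs) + 1) / (len(docs) + 2))
            for tok in vocab
        }
        model[label] = (log_prior, weights)
    return model

def explain(model, tokens):
    """Return {label: (score, {token: contribution})}.

    Simplification: only tokens present in the query contribute to the score.
    """
    report = {}
    for label, (log_prior, weights) in model.items():
        contributions = {tok: weights[tok] for tok in tokens if tok in weights}
        report[label] = (log_prior + sum(contributions.values()), contributions)
    return report

# Hypothetical training data: keyword tokens per protein, with a location label.
examples = [
    ({"membrane", "transport"}, "plasma membrane"),
    ({"membrane", "receptor"}, "plasma membrane"),
    ({"dna-binding", "transcription"}, "nucleus"),
    ({"dna-binding", "histone"}, "nucleus"),
]
model = train(examples)
report = explain(model, {"membrane", "receptor"})
predicted = max(report, key=lambda label: report[label][0])

print("predicted:", predicted)
for token, value in sorted(report[predicted][1].items()):
    print(f"  {token}: {value:.3f}")
```

A listing like this lets a user see which tokens drove a prediction, and a surprising token with a large contribution is one way a mislabeled or contaminated training set could be spotted.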
Current Proteome Analyst Tasks • Analyze metabolic pathways • Incorporate hierarchy (GO) • Use other information • Motifs in protein, … • Other applications • Relate to Microarray data • Use GLOBAL properties of complete-proteome … phylogenetic hierarchy • …
Technical Details Whole Genome Analysis
Whole Genome Analysis • heuristic selection of whole genome substrings, to increase efficiency and accuracy of subtype identification in HIV genome • construct Complete Composition Vector (CCV) nucleotide representation, as approximate signature of viral genome • 100% recognition of subtypes in 867 whole genome examples
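The composition-vector idea can be illustrated with a simplified sketch: count every nucleotide substring of length k, for k = 1 up to some maximum, concatenate the relative frequencies into one vector, and assign a query genome to the nearest reference. This is an assumption-laden toy, not the project's method. Published composition-vector approaches typically also correct each frequency against an expected background, and the slide's heuristic substring selection is not reproduced here. The sequences, labels, distance measure, and function names are all made up for the example.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def composition_vector(seq, k):
    """Relative frequency of every length-k string over ACGT, in a fixed order."""
    windows = max(len(seq) - k + 1, 1)
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(kmer)] / windows for kmer in product(ALPHABET, repeat=k)]

def complete_composition_vector(seq, max_k=3):
    """Concatenate the composition vectors for k = 1 .. max_k."""
    vec = []
    for k in range(1, max_k + 1):
        vec.extend(composition_vector(seq, k))
    return vec

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def nearest_subtype(query, references, max_k=3):
    """Label of the reference sequence whose vector is closest to the query's."""
    q = complete_composition_vector(query, max_k)
    return min(
        references,
        key=lambda label: cosine_distance(
            q, complete_composition_vector(references[label], max_k)
        ),
    )

# Toy sequences standing in for whole genomes (hypothetical data).
references = {
    "subtype_A": "ACGTACGTACGGACGTACGT",
    "subtype_B": "TTTTGGGGTTTTGGGGTTTT",
}
query = "ACGTACGTACGTACGAACGT"
print(nearest_subtype(query, references))
```

With `max_k=3` the vector has 4 + 16 + 64 = 84 entries. Real genomes would use larger k, where the vector grows as 4^k, which is presumably where selecting a subset of substrings becomes important for efficiency.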
Other Bioinformatics Tasks • Predict Bull’s Expected Breeding Value • from SNPs • Bovine Haplotype • Predict Tumour Rejection • from Microarray • Other challenges from colleagues at Univ Hospital, Cross Cancer Inst. • …