##### Bayesian Machine learning and its application


**Bayesian Machine learning and its application**
Alan Qi, Feb. 23, 2009

**Motivation**
• Massive data from various sources: web pages, Facebook, high-throughput biological data, high-throughput chemical data, etc.
• Challenging goal: how to model complex systems and extract knowledge from data.

**Bayesian machine learning**
• Bayesian learning method: a principled way to fuse prior knowledge and new evidence in data
• Key issues
  • Model design
  • Computation
• Wide-ranging applications

**Bayesian learning in practice**
• Applications:
  • Recommendation systems (Amazon, Netflix)
  • Text parsing (finding latent topics in documents)
  • Systems biology (where computation meets biology)
  • Computer vision (parsing handwritten diagrams automatically)
  • Wireless communications
  • Computational finance ...

**Learning for biology: understanding gene regulation during organism development**
• Learning functionalities of genes for development
• Inferring high-resolution protein-DNA binding locations from low-resolution measurements
• Learning regulatory cascades during embryonic stem cell development
[Figure labels: Gene A; DNA; protein, product of Gene B]

**Data: gene expression profiles from wild-types & mutants**
[Figure panels: no C lineage; wild-type lineage; extra ‘C’ lineages (Baugh et al., 2005)]

**Bayesian semisupervised classification for finding tissue-specific genes**
BGEN (Bayesian GENeralization from examples; Qi et al., Bioinformatics 2006)
• Graph-based kernels (F. Chung, 1997; Zhu et al., 2003; Zhou et al., 2004)
• Gaussian process classifier that is trained by expectation propagation (EP) and classifies the whole genome efficiently
• Estimating noise and probe quality by approximate leave-one-out error
[Figure labels: gene expression; labeled expression; classifier]

**Biological experiments support our predictions**
Ge’s lab
[Figure panels: K01A2.5 and R11A5.4, each labeled Non C, C, Epidermis, Muscle]

**Consensus Sequences**
• Useful for publication
• IUPAC symbols for degenerate sites
• Not very amenable to computation
Nature Biotechnology 24, 423-425 (2006)

**Probabilistic Model**
• Count frequencies
• Add pseudocounts

Position Frequency Matrix (PFM), with columns M1 … MK for motif positions 1 … K and entries Pk(S|M):

| | 1 | 2 | 3 | 4 | 5 | K |
|---|---|---|---|---|---|---|
| A | .1 | .2 | .1 | .4 | .1 | .1 |
| C | .2 | .2 | .2 | .2 | .5 | .1 |
| G | .4 | .5 | .4 | .2 | .2 | .1 |
| T | .3 | .1 | .2 | .2 | .2 | .7 |

**Bayesian learning: Estimating motif models by Gibbs sampling**
[Figure: likelihood surface P(Sequences|params1,params2) over Parameter1 and Parameter2]
In theory, Gibbs sampling is less likely to get stuck at a local maximum.

**Bayesian learning: Estimating motif models by expectation maximization**
[Figure: likelihood surface P(Sequences|params1,params2) over Parameter1 and Parameter2]
To minimize the effects of local maxima, you should search multiple times from different starting points.

**Scoring a Sequence**
To score a sequence, we compare it to a null model, the background DNA (B). The log likelihood ratio of the PFM against the background gives the Position Weight Matrix (PWM).

PFM:

| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A | .1 | .2 | .1 | .4 | .1 | .1 |
| C | .2 | .2 | .2 | .2 | .5 | .1 |
| G | .4 | .5 | .4 | .2 | .2 | .1 |
| T | .3 | .1 | .2 | .2 | .2 | .7 |

PWM:

| | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| A | -1.3 | -0.3 | -1.3 | 0.6 | -1.3 | -1.3 |
| C | -0.3 | -0.3 | 0.3 | -0.3 | 1 | -1.3 |
| G | 0.6 | 1 | 0.6 | -0.3 | -0.3 | -1.3 |
| T | 0.3 | -1.3 | -0.3 | -0.3 | -0.3 | 1.4 |

**Scoring a Sequence**
Common threshold = 60% of maximum score
MacIsaac & Fraenkel (2006) PLoS Comp Bio

**Visualizing Motifs: Motif Logos**
• Represent both base frequency and conservation at each position
• Height of letter proportional to frequency of base at that position
• Height of stack proportional to conservation at that position

**Software implementation: AlignACE**
• Implements Gibbs sampling for motif discovery
• Several enhancements
• ScanAce: look for motifs in a sequence given a model
• CompareAce: calculate “similarity” between two motifs (e.g., for clustering motifs)
http://atlas.med.harvard.edu/cgi-bin/alignace.pl

**Network Decomposition**
• Infinite Non-negative Matrix Factorization
• Formulate the discovery of network legos as a non-negative factorization problem
• Develop a novel Bayesian model that automatically learns the number of bases.

**Network Decomposition**
• Synthetic Network Decomposition

**Data: Movie rating**
• User-item matrix of ratings
• Recommend: 5
• Not recommend: 1
[Figure: user-item rating matrix X]

**Task: how to predict user preference**
• “Based on the premise that people looking for information should be able to make use of what others have already found and evaluated.” (Maltz & Ehrlich, 1995)
• E.g., suppose you like movies A, B, C, D, and E, and I like A, B, C, and D but have not seen E yet. What would my likely rating of E be?

**Collaborative filtering for recommendation systems**
• Matrix factorization as a collaborative filtering approach: $X \approx Z A$, where $X$ is $N$ by $D$, $Z$ is $N$ by $K$, and $A$ is $K$ by $D$.
  • $x_{i,j}$: user $i$’s rating on movie $j$
  • $z_{i,k}$: user $i$’s interest in movie category $k$ (e.g., action, thriller, comedy, romance, etc.)
  • $A_{k,j}$: how likely movie $j$ is to belong to movie category $k$
• Such that $x_{i,j} \approx z_{i,1} A_{1,j} + z_{i,2} A_{2,j} + \dots + z_{i,K} A_{K,j}$

**Bayesian learning of matrix factorization**
• Training: use probability theory, in particular Bayesian inference, to learn the model parameters Z and A given data X, which contains missing elements, i.e., unknown ratings
• Prediction: use the estimated Z and A to predict unknown ratings in X

**Test results**
• ‘Jester’ dataset:
  • Ratings mapped from [-10, 10] to [0, 20]
  • 10 randomly chosen datasets, each with 1000 users. For each user we randomly hold out 10 ratings for testing
• IMF, INMF, and NMF (K = 2…9)

**Task**
• How to find latent topics and group documents, such as emails, papers, or news, into different clusters?

**Data: text documents**
[Figure: document-word matrix X, with computer science papers and biology papers]

**Assumptions**
• Keywords are shared across different documents on the same topic.
• The more important a keyword is, the more frequently it appears.

**Matrix factorization models (again)**
$X = Z A$
• $x_{i,j}$: the frequency with which word $j$ appears in document $i$
• $z_{i,k}$: how much of the content in document $i$ is related to topic $k$ (e.g., biology, computer science, etc.)
• $A_{k,j}$: how important word $j$ is to topic $k$

**Bayesian Matrix Factorization**
• We will use Bayesian methods again to estimate Z and A.
• Once Z and A are estimated, we can identify hidden topics by examining A and cluster the documents.

**Text Clustering**
• ‘20 newsgroup’ dataset
• A subset of 815 articles and 477 words.

**Summary**
• Bayesian machine learning: a powerful tool that enables computers to learn hidden relations from massive data and make sensible predictions.
• Applications in computational biology, e.g., gene expression analysis and motif discovery, and in information extraction, e.g., text modeling.
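
**Sketch: Gibbs sampling for motif discovery**
The Gibbs sampling slides describe the idea only through a figure, so here is a minimal site-sampler sketch of how such a search might look. It is not AlignACE's implementation. It assumes exactly one motif occurrence per sequence, a uniform background of 0.25 per base, and a fixed motif width. The function name `gibbs_motif_search`, the planted motif, and the random test sequences are all hypothetical.

```python
import numpy as np

BASES = "ACGT"

def gibbs_motif_search(seqs, width, n_sweeps=200, pseudocount=1.0, seed=0):
    """Site sampler: one motif start per sequence; returns (starts, PFM)."""
    rng = np.random.default_rng(seed)
    idx = [np.array([BASES.index(b) for b in s]) for s in seqs]
    # Random initial start position for every sequence.
    starts = np.array([rng.integers(0, len(s) - width + 1) for s in idx])
    cols = np.arange(width)

    def build_pfm(skip=None):
        # Count bases at the current sites (plus pseudocounts), leaving out `skip`.
        counts = np.full((4, width), pseudocount)
        for n, (s, p) in enumerate(zip(idx, starts)):
            if n != skip:
                counts[s[p:p + width], cols] += 1
        return counts / counts.sum(axis=0)

    for _ in range(n_sweeps):
        for n, s in enumerate(idx):
            # Model estimated from all other sequences.
            log_ratio = np.log(build_pfm(skip=n) / 0.25)
            n_windows = len(s) - width + 1
            scores = np.array([log_ratio[s[p:p + width], cols].sum()
                               for p in range(n_windows)])
            # Sample a new start in proportion to the likelihood ratio.
            probs = np.exp(scores - scores.max())
            starts[n] = rng.choice(n_windows, p=probs / probs.sum())
    return starts, build_pfm()

# Hypothetical test data: random background with one planted motif per sequence.
data_rng = np.random.default_rng(1)
motif = "ACGTTGCA"
seqs = []
for _ in range(10):
    bg = "".join(data_rng.choice(list(BASES), size=30))
    pos = int(data_rng.integers(0, 30 - len(motif) + 1))
    seqs.append(bg[:pos] + motif + bg[pos + len(motif):])

starts, pfm = gibbs_motif_search(seqs, width=len(motif))
print(starts)
print(np.round(pfm, 2))
```

Because each start is sampled rather than chosen greedily, the search can move away from a poor configuration, which is the behaviour the slide alludes to. A single run is still stochastic, so in practice one would compare several seeds.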
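
**Sketch: scoring a sequence with a PWM**
The "Scoring a Sequence" slides can be illustrated with a short sketch that turns the printed PFM into a PWM and applies the 60%-of-maximum threshold. The PFM values are copied from the slides as printed. The base-2 logarithm and the uniform 0.25 background are assumptions, chosen because they match most of the printed PWM entries. The function names and the scanned sequence are hypothetical.

```python
import numpy as np

BASES = "ACGT"

# PFM as printed on the slides (rows A, C, G, T; columns are motif positions).
PFM = np.array([
    [0.1, 0.2, 0.1, 0.4, 0.1, 0.1],  # A
    [0.2, 0.2, 0.2, 0.2, 0.5, 0.1],  # C
    [0.4, 0.5, 0.4, 0.2, 0.2, 0.1],  # G
    [0.3, 0.1, 0.2, 0.2, 0.2, 0.7],  # T
])

def pfm_to_pwm(pfm, background=0.25):
    """Log2 likelihood ratio of each base against the background (assumed uniform)."""
    return np.log2(pfm / background)

def score(seq, pwm):
    """Sum the PWM entries picked out by `seq`, which must be one motif width long."""
    rows = [BASES.index(b) for b in seq]
    return float(pwm[rows, np.arange(len(seq))].sum())

def scan(seq, pwm, fraction=0.6):
    """Return (offset, score) for every window scoring >= fraction * maximum score."""
    width = pwm.shape[1]
    threshold = fraction * pwm.max(axis=0).sum()
    hits = []
    for start in range(len(seq) - width + 1):
        s = score(seq[start:start + width], pwm)
        if s >= threshold:
            hits.append((start, s))
    return hits

pwm = pfm_to_pwm(PFM)
# Highest-frequency base at each position.
consensus = "".join(BASES[i] for i in PFM.argmax(axis=0))
print(consensus)
print(scan("TTGGGACTAA", pwm))  # hypothetical sequence containing the consensus
```

Working in log space turns the product of per-position probabilities into a sum, so a window's score is just one table lookup per base.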
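
**Sketch: predicting a missing rating by matrix factorization**
To make the X ≈ Z A model from the collaborative filtering slides concrete, the sketch below fits the factors using only the observed ratings and then predicts a held-out entry. The talk learns Z and A by Bayesian inference. This sketch swaps in plain non-negative matrix factorization with multiplicative updates restricted to observed entries, so it is a stand-in and not the IMF/INMF models. The function `masked_nmf` and the toy ratings are made up for illustration.

```python
import numpy as np

def masked_nmf(X, observed, K, n_iter=2000, seed=0, eps=1e-9):
    """Fit X ≈ Z @ A with non-negative factors, using only observed entries.

    Uses multiplicative updates for the masked squared error (not Bayesian).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Z = rng.random((N, K)) + 0.1   # user-by-category weights
    A = rng.random((K, D)) + 0.1   # category-by-movie weights
    M = observed.astype(float)
    Xm = M * X                     # unobserved entries contribute nothing
    for _ in range(n_iter):
        Z *= (Xm @ A.T) / ((M * (Z @ A)) @ A.T + eps)
        A *= (Z.T @ Xm) / (Z.T @ (M * (Z @ A)) + eps)
    return Z, A

# Toy ratings (made up): rows are users, columns are movies A..E.
X = np.array([
    [5, 5, 1, 1, 1],
    [5, 5, 1, 1, 1],
    [1, 1, 5, 5, 5],
    [1, 1, 5, 5, 5],
    [5, 5, 1, 1, 1],
    [1, 1, 5, 5, 0],  # last user has not rated movie E; 0 is only a placeholder
], dtype=float)
observed = np.ones_like(X, dtype=bool)
observed[5, 4] = False

Z, A = masked_nmf(X, observed, K=2)
predicted = float(Z[5] @ A[:, 4])
print(f"predicted rating of the last user on movie E: {predicted:.2f}")
```

The last user's four observed ratings match the second taste profile, so a good fit should place the held-out rating toward the high end of the scale. A Bayesian treatment would additionally give uncertainty over Z and A, and in the infinite models described in the talk, learn K itself.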