Bayesian machine learning and its application
Alan Qi, Feb. 23, 2009


Motivation

  • Massive data from various sources: web pages, Facebook, high-throughput biological data, high-throughput chemical data, etc.

  • Challenging goal: how to model complex systems and extract knowledge from data.


Bayesian Machine Learning

  • Bayesian learning method

    A principled way to fuse prior knowledge with new evidence from data

  • Key issues

    • Model Design

    • Computation

  • Wide-range applications


Bayesian Learning in Practice

  • Applications:

    • Recommendation systems (Amazon, Netflix)

    • Text parsing (finding latent topics in documents)

    • Systems biology (where computation meets biology)

    • Computer vision (parsing handwritten diagrams automatically)

    • Wireless communications

    • Computational finance, ...


Learning for Biology: Understanding Gene Regulation During Organism Development

  • Learning functionalities of genes for development

  • Inferring high-resolution protein–DNA binding locations from low-resolution measurements

  • Learning regulatory cascades during embryonic stem cell development

[Figure: DNA with Gene A and the protein product of Gene B]


Data: Gene Expression Profiles from Wild Types and Mutants

[Figure: wild-type lineage compared with mutants lacking the C lineage or carrying extra C lineages (Baugh et al., 2005)]


Bayesian Semisupervised Classification for Finding Tissue-Specific Genes

  • Graph-based kernels (F. Chung, 1997; Zhu et al., 2003; Zhou et al., 2004)

  • Gaussian process classifier that is trained by EP and classifies the whole genome efficiently

  • Estimating noise and probe quality by an approximate leave-one-out error

BGEN: Bayesian GENeralization from examples (Qi et al., Bioinformatics 2006)

[Figure: classifier combining labeled expression examples with genome-wide gene expression]
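As a rough illustration of the graph-based-kernel idea cited above (not the BGEN implementation itself), one can build a k-nearest-neighbour graph over expression profiles, form its normalized Laplacian, and use a regularized inverse Laplacian as the covariance of a Gaussian process classifier. The function name and the parameters k and gamma below are illustrative assumptions.

```python
import numpy as np

def graph_kernel(X, k=10, gamma=1.0):
    """Regularized-Laplacian kernel over a k-NN gene graph (sketch, not the BGEN code).

    X: (n_genes, n_features) expression profiles.
    Returns an (n_genes, n_genes) matrix usable as a GP covariance.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between expression profiles.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    # Symmetric k-nearest-neighbour adjacency with Gaussian edge weights.
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(d2[i])[1:k + 1]                      # skip self at index 0
        W[i, nbrs] = np.exp(-d2[i, nbrs] / (d2[i, nbrs].mean() + 1e-12))
    W = np.maximum(W, W.T)
    # Normalized graph Laplacian  L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d + 1e-12))
    L = np.eye(n) - Dinv @ W @ Dinv
    # Regularized inverse Laplacian gives a covariance that is smooth over the graph.
    return np.linalg.inv(L + gamma * np.eye(n))
```

The resulting matrix can serve as the GP covariance; the talk's classifier is then trained with expectation propagation (EP) on the labeled genes and applied genome-wide.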


Biological Experiments Support Our Predictions (Ge's lab)

[Figure: experimental validation of the predicted genes K01A2.5 and R11A5.4 in C and non-C lineages, epidermis, and muscle]




Consensus Sequences

  • Useful for publication

  • IUPAC symbols for degenerate sites

  • Not very amenable to computation

(Nature Biotechnology 24, 423–425, 2006)


Probabilistic Model

  • Count frequencies

  • Add pseudocounts

Position Frequency Matrix (PFM) over motif positions 1 … K, giving Pk(S|M), the probability of base S at position k:

        1    2    3    4    5    6
  A    .1   .2   .1   .4   .1   .1
  C    .2   .2   .2   .2   .5   .1
  G    .4   .5   .4   .2   .2   .1
  T    .3   .1   .2   .2   .2   .7
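A small sketch of the "count frequencies, add pseudocounts" recipe for building a PFM from aligned binding sites; the pseudocount of 1 and the example sites are illustrative assumptions.

```python
import numpy as np

BASES = "ACGT"

def position_frequency_matrix(sites, pseudocount=1.0):
    """Build a PFM from equal-length aligned sites: count, add pseudocounts, normalize."""
    K = len(sites[0])
    counts = np.full((4, K), pseudocount)      # pseudocounts avoid zero probabilities
    for site in sites:
        for k, base in enumerate(site):
            counts[BASES.index(base), k] += 1
    return counts / counts.sum(axis=0)         # each column sums to 1

# Example: a few aligned sites give one column distribution per motif position.
pfm = position_frequency_matrix(["GGTACT", "AGGTCT", "GCTACT", "TGGCCT"])
```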


Bayesian Learning: Estimating Motif Models by Gibbs Sampling

P(Sequences | parameter 1, parameter 2)

[Figure: likelihood surface over the two parameters]

In theory, Gibbs sampling is less likely to get stuck in a local maximum.
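A minimal sketch of the site-sampling flavour of Gibbs sampling for motif discovery: hold one sequence out, build the motif model from the remaining sites, and resample that sequence's site in proportion to its likelihood ratio against a uniform background. The function name, the uniform background, and the parameter defaults are assumptions; practical samplers such as AlignACE add many refinements.

```python
import numpy as np

def gibbs_motif_sampler(seqs, W, n_iters=500, seed=0):
    """Site-sampling Gibbs sampler for a single motif of width W (illustrative sketch)."""
    BASES = "ACGT"
    rng = np.random.default_rng(seed)
    pos = [rng.integers(0, len(s) - W + 1) for s in seqs]   # random initial site per sequence
    for _ in range(n_iters):
        for i, seq in enumerate(seqs):
            # Build the PFM from all sequences except sequence i (with pseudocounts).
            counts = np.ones((4, W))
            for j, s in enumerate(seqs):
                if j != i:
                    for k in range(W):
                        counts[BASES.index(s[pos[j] + k]), k] += 1
            pfm = counts / counts.sum(axis=0)
            # Score every candidate start in sequence i against a uniform background.
            weights = []
            for start in range(len(seq) - W + 1):
                p = 1.0
                for k in range(W):
                    p *= pfm[BASES.index(seq[start + k]), k] / 0.25
                weights.append(p)
            weights = np.array(weights)
            # Resample the site for sequence i in proportion to its likelihood ratio.
            pos[i] = rng.choice(len(weights), p=weights / weights.sum())
    return pos
```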


Bayesian Learning: Estimating Motif Models by Expectation Maximization

P(Sequences | parameter 1, parameter 2)

[Figure: likelihood surface over the two parameters]

To minimize the effects of local maxima, you should search multiple times from different starting points.
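Read as code, the advice above amounts to running EM from several random initializations and keeping the run with the highest likelihood; run_em below is a hypothetical stand-in for any EM-based motif finder.

```python
import numpy as np

def best_of_restarts(run_em, seqs, W, n_restarts=20, seed=0):
    """Run a hypothetical EM motif finder from several starting points and keep the best fit."""
    rng = np.random.default_rng(seed)
    best_loglik, best_model = -np.inf, None
    for _ in range(n_restarts):
        model, loglik = run_em(seqs, W, rng)   # each call starts from a fresh random model
        if loglik > best_loglik:               # keep the run with the highest likelihood
            best_loglik, best_model = loglik, model
    return best_model, best_loglik
```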


Scoring a Sequence

To score a sequence, we compare it to a null model of background DNA (B): each entry of the Position Weight Matrix (PWM) is the log-likelihood ratio of the base under the motif model versus the background.

PFM:

        1    2    3    4    5    6
  A    .1   .2   .1   .4   .1   .1
  C    .2   .2   .2   .2   .5   .1
  G    .4   .5   .4   .2   .2   .1
  T    .3   .1   .2   .2   .2   .7

PWM (log-likelihood ratios against the background):

         1     2     3     4     5     6
  A   -1.3  -0.3  -1.3   0.6  -1.3  -1.3
  C   -0.3  -0.3   0.3  -0.3   1.0  -1.3
  G    0.6   1.0   0.6  -0.3  -0.3  -1.3
  T    0.3  -1.3  -0.3  -0.3  -0.3   1.4
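A small sketch of the step described above: convert the PFM to a PWM of log2 likelihood ratios against a uniform background (0.25 per base) and score a candidate site by summing its per-position weights. The matrices are copied from the slide; the helper names are assumptions.

```python
import numpy as np

BASES = "ACGT"

# PFM from the slide (rows A, C, G, T; columns are motif positions 1..6).
pfm = np.array([
    [.1, .2, .1, .4, .1, .1],
    [.2, .2, .2, .2, .5, .1],
    [.4, .5, .4, .2, .2, .1],
    [.3, .1, .2, .2, .2, .7],
])

# PWM: log2 of motif probability over a uniform background of 0.25 per base.
pwm = np.log2(pfm / 0.25)

def score(site):
    """Sum of log-likelihood-ratio weights for one candidate site of motif width."""
    return sum(pwm[BASES.index(b), k] for k, b in enumerate(site))

print(round(score("GGGACT"), 1))   # the best-scoring site under this PWM
```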


Scoring a Sequence (continued)

  • Common threshold = 60% of the maximum score

(MacIsaac & Fraenkel, 2006, PLoS Computational Biology)


Visualizing Motifs: Motif Logos

Motif logos represent both the base frequency and the conservation at each position:

  • The height of each letter is proportional to the frequency of that base at that position

  • The height of the stack is proportional to the conservation at that position
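Under the usual DNA-logo convention (a standard convention, not something stated on the slide), the stack height at a position is its information content, 2 + Σ_S Pk(S|M) log2 Pk(S|M) bits, and each letter's height is its frequency times the stack height. A short sketch, with small-sample corrections omitted:

```python
import numpy as np

def logo_heights(pfm, eps=1e-12):
    """Per-position stack height (information content) and per-letter heights for a DNA logo."""
    # Information content in bits: 2 minus the entropy of the column distribution.
    ic = 2.0 + np.sum(pfm * np.log2(pfm + eps), axis=0)
    letter_heights = pfm * ic            # each base scaled by its column's conservation
    return ic, letter_heights
```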


Software Implementation: AlignACE

  • Implements Gibbs sampling for motif discovery

    • Several enhancements

  • ScanAce – looks for motifs in a sequence given a model

  • CompareAce – calculates “similarity” between two motifs (e.g., for clustering motifs)

http://atlas.med.harvard.edu/cgi-bin/alignace.pl



Network Decomposition

  • Infinite Non-negative Matrix Factorization

  • Formulate the discovery of network legos as a non-negative matrix factorization problem

  • Develop a novel Bayesian model that automatically learns the number of bases


Network Decomposition (continued)

  • Synthetic Network Decomposition




Data: Movie Ratings

  • User–item matrix of ratings, X

  • Recommend: 5

  • Not recommend: 1


Task: How to Predict User Preference

  • “Based on the premise that people looking for information should be able to make use of what others have already found and evaluated.” (Maltz & Ehrlich, 1995)

  • E.g., suppose you like movies A, B, C, D, and E, while I like A, B, C, and D but have not yet seen E. What is my likely rating for E?


Collaborative Filtering for Recommendation Systems

  • Matrix factorization as a collaborative filtering approach:

    X ≈ Z A,

    where X is N by D, Z is N by K, and A is K by D.

    xi,j: user i's rating of movie j

    zi,k: user i's interest in movie category k (e.g., action, thriller, comedy, romance, etc.)

    Ak,j: how likely movie j belongs to movie category k

    so that xi,j ≈ zi,1 A1,j + zi,2 A2,j + … + zi,K AK,j


Bayesian Learning of Matrix Factorization

  • Training: use probability theory, in particular Bayesian inference, to learn the model parameters Z and A given the data X, which contains missing elements, i.e., unknown ratings

  • Prediction: use the estimated Z and A to predict the unknown ratings in X
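A minimal point-estimate sketch of the factorization step, fitting Z and A by alternating ridge-regularized least squares on the observed entries only; this stands in for the full Bayesian treatment described in the talk, and K, the regularizer lam, and the iteration count are illustrative assumptions.

```python
import numpy as np

def factorize_ratings(X, observed, K=5, lam=0.1, n_iters=30, seed=0):
    """Alternating least squares on observed ratings only (a MAP-style stand-in for Bayesian MF).

    X        : (N, D) rating matrix with arbitrary values where unobserved.
    observed : (N, D) boolean mask of known ratings.
    Returns Z (N, K) and A (K, D) such that Z @ A approximates X on the observed entries.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Z = rng.normal(scale=0.1, size=(N, K))
    A = rng.normal(scale=0.1, size=(K, D))
    I = lam * np.eye(K)
    for _ in range(n_iters):
        for i in range(N):                      # update each user's factor from their ratings
            cols = observed[i]
            Ai = A[:, cols]
            Z[i] = np.linalg.solve(Ai @ Ai.T + I, Ai @ X[i, cols])
        for j in range(D):                      # update each movie's factor from its ratings
            rows = observed[:, j]
            Zj = Z[rows]
            A[:, j] = np.linalg.solve(Zj.T @ Zj + I, Zj.T @ X[rows, j])
    return Z, A

# Unknown ratings are then predicted as (Z @ A)[i, j].
```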


Test Results

  • ‘Jester’ dataset

  • Ratings mapped from [-10, 10] to [0, 20]

  • 10 randomly chosen subsets, each with 1000 users; for each user, 10 ratings are randomly held out for testing

  • Methods compared: IMF, INMF, and NMF (K = 2, …, 9)



Task

  • How to find latent topics and group documents, such as emails, papers, or news articles, into different clusters?


Data: Text Documents

[Figure: document–word matrix X built from computer science papers and biology papers]


Assumptions

  • Keywords are shared across different documents on the same topic.

  • The more important a keyword is, the more frequently it appears.


Matrix Factorization Models (Again)

X ≈ Z A

xi,j: the frequency with which word j appears in document i

zi,k: how much of the content of document i is related to topic k (e.g., biology, computer science, etc.)

Ak,j: how important word j is to topic k


Bayesian Matrix Factorization

  • We will use Bayesian methods again to estimate Z and A.

  • Once we have them, we can identify hidden topics by examining A and cluster the documents.
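As a non-Bayesian stand-in for that estimate, here is a sketch of plain multiplicative-update NMF on a document–word count matrix: the top-weighted words in each row of A suggest a topic, and each document is assigned to the topic with the largest entry in its row of Z. K and the iteration count are assumptions.

```python
import numpy as np

def nmf_topics(X, K=2, n_iters=200, eps=1e-9, seed=0):
    """Multiplicative-update NMF: X (docs x words) ~= Z (docs x topics) @ A (topics x words)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Z = rng.random((N, K)) + eps
    A = rng.random((K, D)) + eps
    for _ in range(n_iters):
        # Classic Lee & Seung updates for the squared-error objective.
        A *= (Z.T @ X) / (Z.T @ Z @ A + eps)
        Z *= (X @ A.T) / (Z @ A @ A.T + eps)
    topics = np.argsort(-A, axis=1)[:, :10]   # top-10 word indices per topic
    clusters = Z.argmax(axis=1)               # hard document-to-topic assignment
    return topics, clusters
```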


Text Clustering

  • ‘20 newsgroups’ dataset

  • A subset of 815 articles and 477 words.



Summary

  • Bayesian machine learning: a powerful tool that enables computers to learn hidden relations from massive data and make sensible predictions.

  • Applications in computational biology (e.g., gene expression analysis and motif discovery) and information extraction (e.g., text modeling).

