Chapter 11: Sampling Methods (Sections 1, 2, 3)

Presentation Transcript


  1. Chapter 11: Sampling Methods (Sections 1, 2, 3)
  Shiqi Zhang
  Machine Learning Class, Instructor: Dr. Sridharan Mohan
  Nov 2, 2009
  Link to Importance Sampling Example: A Paper from Dieter Fox

  2. Overview
  • What is sampling here? To obtain a set of samples z^(l) (where l = 1, ..., L) drawn independently from a distribution p(z).
  • Difference from the previous chapters: before, the goal was to get the distribution from given data points in order to do classification or regression; now, the goal is to get data points from a given distribution.
  • Usually the functional form of the distribution is hidden, but p(z) must be evaluable.

  3. Schematic Illustration: A Preface
  When the expectation E[f] = ∫ f(z) p(z) dz cannot be evaluated analytically, we resort to the sample estimate f̂ = (1/L) Σ_l f(z^(l)).
  As long as the samples z^(l) are drawn from the distribution p(z), the estimator is unbiased: E[f̂] = E[f].
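
A minimal sketch of this estimator; the target p(z) = N(0, 1) and the function f(z) = z² are assumptions chosen for illustration, not taken from the slides:

```python
import numpy as np

# Monte Carlo estimator sketch: approximate E[f] = integral of f(z) p(z) dz
# by the sample mean (1/L) * sum_l f(z^(l)), with z^(l) drawn from p(z).
rng = np.random.default_rng(0)
z = rng.normal(size=100_000)   # z^(l) ~ p(z) = N(0, 1)
print(np.mean(z**2))           # close to the exact value E[z^2] = 1
```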

  4. Outline of Basic Sampling Algorithms
  • Standard distributions
  • Rejection sampling
  • Adaptive rejection sampling
  • Importance sampling
  • Sampling-importance-resampling
  • Sampling and the EM algorithm

  5. Standard Distributions
  The goal is to generate random numbers from simple nonuniform distributions.
  Suppose y = f(z), where z is uniformly distributed over the interval (0, 1) and y is the random number we want.
  If h(y) = ∫_{−∞}^{y} p(ŷ) dŷ is the indefinite integral of the desired distribution p(y), then y = h^{−1}(z) is distributed according to p(y).
  * Note that p(y) must be integrable here.
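
A minimal sketch of this inverse-transform recipe; the exponential target p(y) = λ e^{−λy} is an assumed example whose CDF h(y) = 1 − e^{−λy} inverts in closed form:

```python
import numpy as np

# Inverse-transform sampling sketch for the exponential distribution:
# h(y) = 1 - exp(-lam * y) inverts to y = -ln(1 - z) / lam.
rng = np.random.default_rng(0)
lam = 2.0
z = rng.uniform(0.0, 1.0, size=10_000)   # z ~ Uniform(0, 1)
y = -np.log(1.0 - z) / lam               # y ~ Exponential(lam)
print(y.mean())                          # should be close to 1 / lam = 0.5
```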

  6. Standard Distributions
  Box-Muller: a method for generating Gaussian-distributed random numbers.
  • A uniform distribution is needed first: generate pairs (z1, z2) uniformly over (−1, 1), discarding each pair unless r² = z1² + z2² ≤ 1.
  • Then y1 = z1 (−2 ln r² / r²)^{1/2} and y2 = z2 (−2 ln r² / r²)^{1/2} are independent Gaussian samples with zero mean and unit variance.
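
The polar form of Box-Muller stated above, as a short runnable sketch:

```python
import numpy as np

# Polar Box-Muller sketch: turn pairs of uniforms into independent N(0, 1) samples.
rng = np.random.default_rng(0)

def box_muller(n):
    samples = []
    while len(samples) < n:
        z1, z2 = rng.uniform(-1.0, 1.0, size=2)
        r2 = z1**2 + z2**2
        if 0.0 < r2 <= 1.0:                        # keep only points inside the unit circle
            factor = np.sqrt(-2.0 * np.log(r2) / r2)
            samples += [z1 * factor, z2 * factor]  # y1, y2 are independent N(0, 1)
    return np.array(samples[:n])

y = box_muller(10_000)
print(y.mean(), y.std())  # approximately 0 and 1
```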

  7. Standard Distributions
  These techniques depend heavily on the ability to calculate, and then invert, the indefinite integral of the required distribution, which is feasible only for a limited number of simple distributions.

  8. Rejection Sampling
  • Suppose it is easy to evaluate p̃(z) for any given value of z, where p(z) = (1/Z_p) p̃(z): p̃(z) can readily be evaluated, but the normalizing constant Z_p is unknown.
  • In more general cases, we do not have the functional form of p(z), but we can still evaluate this unnormalized distribution.

  9. Rejection Sampling
  • Here, a proposal distribution q(z) should be selected first; then a constant k is chosen such that k q(z) ≥ p̃(z) for all z.
  Step 1: generate a number z0 from q(z).
  Step 2: generate a number u0 from the uniform distribution over [0, k q(z0)].
  Step 3: compare u0 and p̃(z0), and decide whether to accept z0 (accept if u0 ≤ p̃(z0)).
  Then all the accepted z points are distributed according to p(z). (A runnable sketch of these three steps follows below.)
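
A minimal sketch of the three steps; the unnormalized Gaussian target and the Cauchy proposal are assumptions, chosen because the proposal envelope can be scaled analytically:

```python
import numpy as np

# Rejection sampling sketch: target p~(z) = exp(-z**2 / 2) (unnormalized Gaussian),
# proposal q(z) = standard Cauchy, with k chosen so k * q(z) >= p~(z) for all z.
rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-0.5 * z**2)

def q_pdf(z):
    return 1.0 / (np.pi * (1.0 + z**2))

k = 2.0 * np.pi * np.exp(-0.5)   # smallest valid k; the ratio p~/q peaks at z = +/-1

def rejection_sample(n):
    out = []
    while len(out) < n:
        z0 = rng.standard_cauchy()              # step 1: z0 ~ q(z)
        u0 = rng.uniform(0.0, k * q_pdf(z0))    # step 2: u0 ~ Uniform[0, k q(z0)]
        if u0 <= p_tilde(z0):                   # step 3: accept if u0 falls under p~(z0)
            out.append(z0)
    return np.array(out)

samples = rejection_sample(5_000)
print(samples.mean(), samples.std())  # roughly 0 and 1
```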

  10. An Example of Rejection Sampling
  • In the textbook's example, the target p(z) is a gamma distribution (bell-shaped for a > 1), and the proposal q(z) is a Cauchy distribution scaled so that k q(z) ≥ p̃(z) everywhere.
  • It is hard to find a good proposal q(z) (and constant k) that makes the rejection area, the gap between k q(z) and p̃(z), very small.

  11. Adaptive Rejection Sampling
  • So an artificial envelope q(z) is set up on the fly from the target itself, built from tangent lines to ln p(z) at a set of grid points (and, of course, it is integrable and invertible). The goal is to use q(z) as a substitute for p(z).
  • Problem: in D dimensions the acceptance rate falls exponentially. Even if the scale of q exceeds the scale of p by just one percent, for D = 1,000 the acceptance rate will be ~ 1/20,000.
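
Where that figure comes from: for Gaussian p and q whose standard deviations differ by one percent (σ_q = 1.01 σ_p), the acceptance rate is the ratio of effective volumes,

```latex
\left(\frac{\sigma_p}{\sigma_q}\right)^{D} = (1.01)^{-1000}
  = e^{-1000 \ln 1.01} \approx e^{-9.95} \approx \frac{1}{20{,}000}
```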

  12. Importance Sampling
  • Importance sampling is designed for approximating expectations E[f] = ∫ f(z) p(z) dz directly, not for drawing samples from p(z) itself.
  • A naive approach draws the z^(l) from a uniform grid and evaluates E[f] ≈ Σ_l p(z^(l)) f(z^(l)); this scales badly with dimensionality.
  • Instead, draw the z^(l) from a proposal q(z): E[f] = ∫ f(z) [p(z)/q(z)] q(z) dz ≈ (1/L) Σ_l r_l f(z^(l)), where the ratios r_l = p(z^(l))/q(z^(l)) are known as importance weights.

  13. Importance Sampling
  When p(z) and q(z) can each be evaluated only up to a normalizing constant, write p(z) = p̃(z)/Z_p and q(z) = q̃(z)/Z_q.
  So we get E[f] ≈ Σ_l w_l f(z^(l)), where the normalized weights are w_l = r̃_l / Σ_m r̃_m and r̃_l = p̃(z^(l)) / q̃(z^(l)).
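
A minimal sketch of the normalized-weights estimator; the target p = N(0, 1), proposal q = N(0, 2²), and f(z) = z² are assumed examples:

```python
import numpy as np

# Importance sampling sketch: estimate E_p[f] using samples from a proposal q.
rng = np.random.default_rng(0)
L = 100_000

z = rng.normal(0.0, 2.0, size=L)                  # z^(l) ~ q(z) = N(0, 2**2)
r = np.exp(-0.5 * z**2) / np.exp(-0.125 * z**2)   # r~_l = p~(z_l)/q~(z_l), constants dropped
w = r / r.sum()                                   # normalized weights w_l
print(np.sum(w * z**2))                           # close to E_p[z^2] = 1
```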

  14. Sampling-Importance-Resampling (SIR)
  Three steps:
  1. L samples z^(1), ..., z^(L) are drawn from the proposal q(z).
  2. Weights w_1, ..., w_L are constructed as in importance sampling: w_l = r̃_l / Σ_m r̃_m.
  3. A second set of L samples is drawn from the discrete distribution over (z^(1), ..., z^(L)) with probabilities given by the weights.
  To prove the correctness: when L → ∞, the cumulative distribution of the resampled values reduces to that of p(z), giving Eq. (11.26).
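
The three steps as a runnable sketch, reusing the assumed Gaussian target and wider Gaussian proposal from the previous example:

```python
import numpy as np

# Sampling-importance-resampling (SIR) sketch.
rng = np.random.default_rng(0)
L = 50_000

z = rng.normal(0.0, 2.0, size=L)                       # step 1: z^(l) ~ q = N(0, 2**2)
w = np.exp(-0.5 * z**2) / np.exp(-0.125 * z**2)        # step 2: unnormalized weights p~/q~
w /= w.sum()                                           #         normalize so sum(w) = 1
resampled = rng.choice(z, size=L, replace=True, p=w)   # step 3: resample by weight
print(resampled.mean(), resampled.std())               # roughly 0 and 1, i.e. samples from p
```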

  15. Sampling and the EM Algorithm
  In Section 9.3, Q(θ, θ^old) was introduced as the expectation of the complete-data log likelihood:
  Q(θ, θ^old) = ∫ p(Z | X, θ^old) ln p(Z, X | θ) dZ
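
Sampling enters by approximating this E-step integral with a finite sum over samples Z^(l) drawn from the current posterior p(Z | X, θ^old), giving the Monte Carlo EM approximation:

```latex
Q(\theta, \theta^{\mathrm{old}}) \simeq \frac{1}{L} \sum_{l=1}^{L} \ln p(\mathbf{Z}^{(l)}, \mathbf{X} \mid \theta)
```

The M-step then optimizes this approximate Q-function in the usual way.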

  16. Sampling and the EM Algorithm

  17. Markov Chain Monte Carlo
  • The current state z^(τ) is maintained, and the proposal distribution q(z | z^(τ)) depends on this current state.
  • In this way, the sequence of samples z^(1), z^(2), ... forms a Markov chain.
  • Metropolis algorithm (symmetric proposal): accept the candidate z* with probability A(z*, z^(τ)) = min(1, p̃(z*) / p̃(z^(τ))); if z* is rejected, the next state stays at z^(τ).
  • An example of a random walk illustrates why such chains can explore the state space slowly.
  • Is this really the required distribution? (Proved in the following slides; a runnable sketch follows below.)
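
A minimal random-walk Metropolis sketch; the unnormalized Gaussian target and the Gaussian proposal width are assumptions for illustration:

```python
import numpy as np

# Random-walk Metropolis sketch: target p~(z) = exp(-z**2 / 2), symmetric
# Gaussian proposal q(z* | z) = N(z, step**2).
rng = np.random.default_rng(0)

def p_tilde(z):
    return np.exp(-0.5 * z**2)

def metropolis(n_steps, step=1.0, z0=0.0):
    z = z0
    chain = []
    for _ in range(n_steps):
        z_star = z + step * rng.normal()          # propose from q(z* | z)
        accept_prob = min(1.0, p_tilde(z_star) / p_tilde(z))
        if rng.uniform() < accept_prob:           # accept with probability A(z*, z)
            z = z_star                            # otherwise stay at the current state
        chain.append(z)
    return np.array(chain)

chain = metropolis(50_000)
print(chain[1000:].mean(), chain[1000:].std())    # roughly 0 and 1 after burn-in
```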

  18. Some Definitions on Markov Chains
  • A first-order Markov chain is defined as a series of variables z^(1), ..., z^(M) with the property p(z^(m+1) | z^(1), ..., z^(m)) = p(z^(m+1) | z^(m)).
  • Transition probabilities: T_m(z^(m), z^(m+1)) ≡ p(z^(m+1) | z^(m)).
  • If these are all the same, the Markov chain is called "homogeneous".
  • If p*(z) = Σ_{z′} T(z′, z) p*(z′), the distribution p*(z) is said to be "invariant/stationary".
  • If p*(z) T(z, z′) = p*(z′) T(z′, z), the transition probabilities satisfy "detailed balance", a sufficient condition for invariance (see the check below).
  • Usually, a homogeneous Markov chain will be "ergodic": it converges to its invariant distribution regardless of the initial distribution.
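
Why detailed balance is a sufficient condition for invariance: summing both sides of the detailed-balance equation over z′ gives

```latex
\sum_{\mathbf{z}'} p^{*}(\mathbf{z}')\, T(\mathbf{z}', \mathbf{z})
  = \sum_{\mathbf{z}'} p^{*}(\mathbf{z})\, T(\mathbf{z}, \mathbf{z}')
  = p^{*}(\mathbf{z}) \sum_{\mathbf{z}'} p(\mathbf{z}' \mid \mathbf{z})
  = p^{*}(\mathbf{z})
```

so p*(z) is invariant under T.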

  19. Two Forms of Markov Chains

  20. The Metropolis-Hastings Algorithm
  • The generalization to non-symmetric proposals: a candidate z* drawn from q_k(z | z^(τ)) is accepted with probability A_k(z*, z^(τ)) = min(1, p̃(z*) q_k(z^(τ) | z*) / (p̃(z^(τ)) q_k(z* | z^(τ)))).
  • In this section, it is proved that the Metropolis-Hastings algorithm indeed samples from the required distribution (see the detailed-balance check below).
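
The proof is a detailed-balance check for the MH transition kernel; since min(a, b) is symmetric in its two arguments,

```latex
p(\mathbf{z})\, q_k(\mathbf{z}^{*} \mid \mathbf{z})\, A_k(\mathbf{z}^{*}, \mathbf{z})
  = \min\bigl(p(\mathbf{z})\, q_k(\mathbf{z}^{*} \mid \mathbf{z}),\; p(\mathbf{z}^{*})\, q_k(\mathbf{z} \mid \mathbf{z}^{*})\bigr)
  = p(\mathbf{z}^{*})\, q_k(\mathbf{z} \mid \mathbf{z}^{*})\, A_k(\mathbf{z}, \mathbf{z}^{*})
```

so detailed balance holds and p(z) is invariant under the chain.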

  21. Gibbs Sampling
  Consider the distribution p(z) = p(z_1, ..., z_M), from which we wish to sample.
  Each time, we replace one variable z_i by a value drawn from the conditional distribution p(z_i | z_{\i}), where z_{\i} denotes z_1, ..., z_M with z_i omitted.
  For example, for p(z_1, z_2, z_3): draw z_1 from p(z_1 | z_2, z_3), then z_2 from p(z_2 | z_1, z_3), then z_3 from p(z_3 | z_1, z_2), and cycle. (A bivariate sketch follows below.)
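
A minimal Gibbs sketch; the bivariate Gaussian with correlation ρ = 0.9 is an assumed example in which both conditionals are themselves Gaussian and easy to sample:

```python
import numpy as np

# Gibbs sampling sketch for a standard bivariate Gaussian with correlation rho:
# p(z1 | z2) = N(rho * z2, 1 - rho**2) and symmetrically for z2.
rng = np.random.default_rng(0)
rho = 0.9
cond_std = np.sqrt(1.0 - rho**2)

def gibbs(n_steps, z1=0.0, z2=0.0):
    chain = []
    for _ in range(n_steps):
        z1 = rng.normal(rho * z2, cond_std)   # draw z1 ~ p(z1 | z2)
        z2 = rng.normal(rho * z1, cond_std)   # draw z2 ~ p(z2 | z1)
        chain.append((z1, z2))
    return np.array(chain)

chain = gibbs(50_000)
print(np.corrcoef(chain[1000:].T))  # off-diagonal entries close to rho = 0.9
```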

  22. An Example of Gibbs Sampling
  • The step size is of order l, the width of the conditional distributions.
  • The number of steps needed to obtain independent samples is of order (L / l)², where L is the extent of the distribution along its most elongated direction.
  • To avoid the random walk behaviour, standard Gibbs sampling can be replaced with over-relaxation.

  23. Gibbs Sampling, From the View of Graphs
  • For an undirected graph, the conditional distribution p(z_i | z_{\i}) is a function only of the neighbours of node i.
  • For a directed graph, it is a function of the parents, the children, and the co-parents of node i.

  24. The End. Thank you.
