
Learning Mixtures of Structured Distributions over Discrete Domains 



  1. Learning Mixtures of Structured Distributions over Discrete Domains  Xiaorui Sun, Columbia University. Joint work with Siu-On Chan (UC Berkeley), Ilias Diakonikolas (U Edinburgh), Rocco Servedio (Columbia University)

  2. Density Estimation • PAC-type learning model • Set C of possible target distributions over [n] • Learner • Knows the set C but does not know the target distribution p ∈ C • Independently draws a few samples from p • Outputs (a succinct description of a) distribution h which is ε-close to p • Total variation distance dTV(p, h) = ½ Σᵢ |p(i) − h(i)| is the standard measure of closeness in statistics
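A minimal sketch of the closeness measure in Python/numpy (an illustration added here; the slides themselves contain no code):

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two distributions over [n]:
    dTV(p, q) = (1/2) * sum_i |p(i) - q(i)|."""
    return 0.5 * np.abs(np.asarray(p, float) - np.asarray(q, float)).sum()
```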

  3. Learn a structured distribution • If C = {all distributions over [n]}, Ω(n/ε²) samples are required • Much better sample complexity is possible for structured distributions • Poisson binomial distributions [DDS12a] • Õ(1/ε³) samples • Monotone / k-modal [Bir87, DDS12b] • O(log(n)/ε³) samples / Õ(k·log(n)/ε³) samples

  4. This work: Learn mixture of structured distributions • Learn a mixture of k distributions? • A set C of distributions over [n] • Target distribution p is a mixture of k distributions from C • i.e. p = μ₁p₁ + … + μ_k p_k, where p₁, …, p_k ∈ C, μᵢ ≥ 0 and μ₁ + … + μ_k = 1 • Our result: learn mixtures for several classes of structured distributions • Sample complexity close to optimal • Efficient running time
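To make the sampling model concrete, here is a hedged sketch of drawing from a k-mixture (the helper name sample_mixture is hypothetical):

```python
import numpy as np

def sample_mixture(weights, components, m, rng=None):
    """Draw m i.i.d. samples from p = sum_i weights[i] * components[i];
    each component is a probability vector over {0, ..., n-1}."""
    if rng is None:
        rng = np.random.default_rng()
    components = np.asarray(components, float)            # shape (k, n)
    n = components.shape[1]
    which = rng.choice(len(weights), size=m, p=weights)   # latent component
    return np.array([rng.choice(n, p=components[j]) for j in which])
```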

  5. Our results: learning mixture of log-concave • Log-concave distribution over [n] • p(i)² ≥ p(i−1)·p(i+1) for 1 < i < n (Figure: a log-concave distribution over [1, n].)
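A small checker for this condition (a sketch; it also enforces the standard side condition that the support is a contiguous interval):

```python
def is_log_concave(p):
    """Check p(i)^2 >= p(i-1)*p(i+1) at interior points, plus contiguous support."""
    support = [i for i, pi in enumerate(p) if pi > 0]
    if support != list(range(support[0], support[-1] + 1)):
        return False
    return all(p[i]**2 >= p[i-1] * p[i+1] for i in range(1, len(p) - 1))
```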

  6. Our results: log-concave • Algorithm to learn a mixture of k log-concave distributions • Sample complexity: Õ(k/ε⁴), independent of n • Running time: nearly linear in the number of samples (bit operations) • Lower bound: Ω(k/ε^{5/2}) samples

  7. Our results: mixture of unimodal • Unimodal distribution over [n] • There is a mode m ∈ [n] s.t. p is non-decreasing on [1, m] and non-increasing on [m, n] (Figure: a unimodal distribution over [1, n].)

  8. Our results: mixture of unimodal • A mixture of 2 unimodal distributions may have Ω(n) modes (construction sketched below) • Algorithm to learn a mixture of k unimodal distributions • Sample complexity: O(k·log(n)/ε⁴) samples • Running time: nearly linear in the number of samples (bit operations) • Lower bound: Ω(k·log(n)/ε³) samples
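The first bullet can be seen with two staggered monotone staircases (monotone is a special case of unimodal); a small illustrative construction, with hypothetical variable names:

```python
import numpy as np

m = 6; n = 2 * m
p1 = np.repeat(np.arange(m, 0, -1.0), 2)               # non-increasing staircase
p2 = np.r_[1.0, np.repeat(np.arange(1.0, m), 2), m]    # non-decreasing, staggered steps
p1 /= p1.sum(); p2 /= p2.sum()
mix = 0.5 * p1 + 0.5 * p2
modes = [i for i in range(1, n - 1)
         if mix[i] > mix[i-1] and mix[i] > mix[i+1]]
print(len(modes))   # n/2 - 2 interior local maxima, i.e. Omega(n) modes
```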

  9. Our results: mixture of MHR • Monotone hazard rate (MHR) distribution • Hazard rate of p: H(i) = p(i) / Σ_{j≥i} p(j) • defined if Σ_{j≥i} p(j) > 0 • MHR distribution: H is a non-decreasing function over [n] (Figure: an MHR distribution over [1, n].)
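A direct transcription of the definition (a sketch; numpy assumed, function names hypothetical):

```python
import numpy as np

def hazard_rate(p):
    """H(i) = p(i) / sum_{j >= i} p(j), defined where the tail mass is positive."""
    p = np.asarray(p, float)
    tail = p[::-1].cumsum()[::-1]                 # tail[i] = sum_{j >= i} p(j)
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(tail > 0, p / tail, np.nan)

def is_mhr(p):
    h = hazard_rate(p)
    h = h[~np.isnan(h)]
    return bool(np.all(np.diff(h) >= 0))          # H non-decreasing
```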

  10. Our results: mixture of MHR • Algorithm to learn a mixture of k MHR distributions • Sample complexity: O(k·log(n/ε)/ε⁴) samples • Running time: nearly linear in the number of samples (bit operations) • Lower bound: Ω(k·log(n)/ε³) samples

  11. Compare with parameter estimation • Parameter estimation [KMV10, MV10] • Learn a mixture of k Gaussians • Independently draw a few samples from the mixture • Estimate the parameters of each Gaussian component accurately • The number of samples inherently depends exponentially on k, even for a mixture of k 1-dimensional normal distributions [MV10]

  12. Compare with parameter estimation • Parameter estimation needs at least exp(k) samples to learn a mixture of k binomial distributions • Similar to the lower bound in [MV10] • Density estimation can handle nonparametric distribution classes • E.g. log-concave, unimodal, MHR • Density estimation learns a mixture of k binomial distributions over [n] with Õ(k/ε⁴) samples • Binomial distributions are log-concave, so the log-concave result applies

  13. Outline • Learning algorithm based on decomposition • Structural results for log-concave, unimodal, MHR distributions

  14. Flat decomposition • Key definition: distribution p is (ε, t)-flat if there exists a partition I = {I₁, …, I_t} of [n] into t intervals such that dTV(p, p̄) ≤ ε • I is then an (ε, t)-flat decomposition for p • p̄ is obtained by "flattening" p within each interval • p̄(i) = p(I_j)/|I_j| for i ∈ I_j (flattening sketched in code below)
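The flattening operation itself is mechanical; a sketch (the boundary convention is an assumption made for illustration):

```python
import numpy as np

def flatten(p, boundaries):
    """Flatten p within each interval of a partition of [n].
    boundaries = [0, b_1, ..., n]; interval j is [b_j, b_{j+1})."""
    p = np.asarray(p, float)
    q = np.empty_like(p)
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        q[lo:hi] = p[lo:hi].sum() / (hi - lo)   # spread interval mass uniformly
    return q
```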

  15. Flat decomposition (Figure: a distribution over [1, n] and its flattened version over a few intervals.)

  16. Learn (ε, t)-flat distributions • Main general Thm: Let C = {all (ε, t)-flat distributions}. There is an algorithm which draws O(t/ε³) samples from p ∈ C, and outputs a hypothesis h such that dTV(p, h) ≤ O(ε). • Running time is linear in the number of samples

  17. Easier problem: known decomposition • Given • Samples from an (ε, t)-flat distribution p • An (ε, t)-flat decomposition I for p • Idea: estimate the probability mass of every interval in I (sketch below) • O(t/ε²) samples are enough
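A sketch of this easy case, estimating interval masses by empirical frequencies (names hypothetical):

```python
import numpy as np

def learn_known_decomposition(samples, boundaries, n):
    """Estimate each interval's mass by its empirical frequency, then
    spread that mass uniformly over the interval."""
    samples = np.asarray(samples)
    h = np.zeros(n)
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        mass = np.mean((samples >= lo) & (samples < hi))   # empirical mass
        h[lo:hi] = mass / (hi - lo)
    return h
```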

  18. Real problem: unknown decomposition • Only given samples from an (ε, t)-flat distribution p • Some (ε, t)-flat decomposition I for p exists, but it is unknown • A useful fact [DDS+13]: if I is an (ε, t)-flat decomposition of p, and J is a "refinement" of I, then J is a (2ε, |J|)-flat decomposition of p • So finding any refinement of I is good enough

  19. Unknown flat decomposition (cont) • Idea: partition [n] into O(t/ε) intervals, each with probability mass at most ε/t • Achieved by sampling from p: place interval boundaries at empirical quantiles (Figure: an equal-mass partition of [1, n].)

  20. Unknown flat decomposition (cont) • There exists an (unknown) partition J • Refinement of both I and the sampled partition I′ • At most t + O(t/ε) intervals (Figure: the common refinement over [1, n].)

  21. Unknown flat decomposition (cont) • There exists J • Refinement of both I and the sampled partition I′ • At most t + O(t/ε) intervals • J is an (O(ε), t + O(t/ε))-flat decomposition for p (Figure: as before, over [1, n].)

  22. Unknown flat decomposition (cont) • Compare p̄_J and p̄_{I′}: flattening over the common refinement J versus over the sampled partition I′ (Figures: the two flattened versions over [1, n].)

  23. Unknown flat decomposition (cont) • If the total probability mass of every interval of I′ is at most ε/t, then dTV(p̄_{I′}, p̄_J) ≤ O(ε): the two flattenings differ only on the ≤ t intervals of I′ that contain a boundary of I, each holding mass at most ε/t • So: partition [n] into O(t/ε) intervals, each with probability mass at most ε/t • O(t/ε³) samples are enough (end-to-end sketch below)
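Putting the pieces together, a hedged end-to-end sketch of the unknown-decomposition learner (equal-empirical-mass partition via quantiles; the constants here are assumptions, not the paper's exact choices):

```python
import numpy as np

def learn_flat(samples, n, t, eps):
    """Sketch of the decomposition-based learner.
    samples are ints in {0, ..., n-1} drawn i.i.d. from an (eps, t)-flat p."""
    samples = np.asarray(samples)
    z = int(np.ceil(t / eps))                          # ~t/eps intervals
    # Boundaries at empirical quantiles, so each interval gets ~1/z mass.
    qs = np.linspace(0, 1, z + 1)[1:-1]
    cuts = np.unique(np.clip(np.quantile(samples, qs), 1, n - 1).astype(int))
    boundaries = np.r_[0, cuts, n].astype(int)
    # Flatten the empirical distribution over the resulting partition.
    emp = np.bincount(samples, minlength=n) / len(samples)
    h = np.empty(n)
    for lo, hi in zip(boundaries[:-1], boundaries[1:]):
        h[lo:hi] = emp[lo:hi].sum() / (hi - lo)
    return h
```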

  24. Learn (ε, t)-flat distributions • Main general Thm: Let C = {all (ε, t)-flat distributions}. There is an algorithm which draws O(t/ε³) samples from p ∈ C, and outputs a hypothesis h such that dTV(p, h) ≤ O(ε)

  25. Learn mixture of distributions • Lem: A mixture of k (ε, t)-flat distributions has an (O(ε), kt)-flat decomposition • Tight for interesting distribution classes • Thm (Learn mixture): Let p be a mixture of k (ε, t)-flat distributions. There is an algorithm which draws O(kt/ε³) samples, and outputs a hypothesis h s.t. dTV(p, h) ≤ O(ε)
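By the lemma, the mixture case reduces to the single-distribution learner with t replaced by kt; reusing the hypothetical learn_flat sketch from slide 23:

```python
# p is a mixture of k (eps, t)-flat distributions, hence (O(eps), k*t)-flat,
# so the generic flat learner applies with t -> k*t:
h = learn_flat(samples, n, k * t, eps)
```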

  26. First application: learning mixture of log-concave distributions • Recall definition: p(i)² ≥ p(i−1)·p(i+1) for all i • Lem: Every log-concave distribution is (ε, O(log(1/ε)/ε))-flat • Learn a mixture of k log-concave distributions with Õ(k/ε⁴) samples

  27. Second application: learning mixture of unimodal distributions • Lem: Every unimodal distribution is (ε, O(log(n)/ε))-flat [Bir87, DDS+13] • Learn a mixture of k unimodal distributions with O(k·log(n)/ε⁴) samples

  28. Third application: learning mixture of MHR distributions • Monotone hazard rate distribution • Hazard rate of p: H(i) = p(i) / Σ_{j≥i} p(j) • defined if Σ_{j≥i} p(j) > 0 • H is a non-decreasing function over [n] • Lem: Every MHR distribution is (ε, O(log(n/ε)/ε))-flat • Learn a mixture of k MHR distributions with O(k·log(n/ε)/ε⁴) samples

  29. Conclusion and further directions • Flat decompositions are a useful way to study mixtures of structured distributions • Extend to higher dimensions? • Efficient algorithms with optimal sample complexity?

  30. Thank you!
