
Talk Announcement


Presentation Transcript


1. Talk Announcement
• Michael Jordan (no, not that Michael Jordan)
• Statistical Machine Learning Researcher from Berkeley
• Very relevant topic: “Recent Developments in Nonparametric Hierarchical Bayesian Modeling”
• November 13 (Monday), 2006 at 4:00 p.m.
• 1404 Siebel Center for Computer Science
• We will require attendance for CS 446 – try to arrive early
• Your next assignment! If you cannot attend (or even if you can): http://www.stat.berkeley.edu/~jordan/674.pdf
• Reception after the talk in the 2nd Floor Atrium of Siebel Center – you are invited

2. Written Assignment due Wednesday 11/15
• Due next Wednesday in class
• One paragraph (or so, but no more than one page, double spaced, readable font…)
• “What I learned from Michael Jordan’s research talk/paper”
• Something you didn’t know before
• Something you understand now (at least a little)
• Something IMPORTANT
• REACH!
• Some faculty love to show off technical prowess
• Michael will not be easy to follow
• Be tenacious
• Try to see the forest while he’s describing tree leaves
• Tell me about what you think the forest is / might be / should be

3. Next (future) Programming Assignment
• Not assigned yet
• Compare naïve Bayes and logistic regression as examples of generative and discriminative classifiers (a minimal sketch follows this list)
• A new text chapter available from Mitchell: Generative and Discriminative Classifiers: Naive Bayes and Logistic Regression, http://www.cs.cmu.edu/%7Etom/NewChapters.html, or navigate down from http://www.cs.cmu.edu/~tom/
• A classic paper: A. Y. Ng and M. Jordan. On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes. In Advances in Neural Information Processing Systems 14, 2002.
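Since the assignment itself is not yet assigned, the following is only a minimal sketch of the kind of comparison the Ng & Jordan paper runs: train a generative classifier (Gaussian naive Bayes) and a discriminative one (logistic regression) on increasing amounts of data and compare test accuracy. The scikit-learn calls and the synthetic dataset are assumptions for illustration, not part of the course materials.

```python
# Sketch only: generative (Gaussian naive Bayes) vs. discriminative
# (logistic regression) at growing training-set sizes, in the spirit of
# Ng & Jordan (2002). Dataset and parameters are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for m in (20, 50, 100, 300, 1000):  # growing training-set sizes
    nb = GaussianNB().fit(X_tr[:m], y_tr[:m])
    lr = LogisticRegression(max_iter=1000).fit(X_tr[:m], y_tr[:m])
    print(f"m={m:4d}  naive Bayes {nb.score(X_te, y_te):.3f}  "
          f"logistic regression {lr.score(X_te, y_te):.3f}")
```

The paper’s headline observation is that the generative model often approaches its (typically higher) asymptotic error faster, while the discriminative model tends to win once enough training data is available.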

4. Jordan’s Abstract

Much research in statistics and machine learning is concerned with controlling some form of tradeoff between flexibility and variability. In the Bayesian approach, such control is often exerted via hierarchies---stochastic relationships among prior distributions. Nonparametric Bayesian statisticians work with priors that are general stochastic processes (e.g., distributions on spaces of continuous functions, spaces of monotone functions, or general measure spaces). Thus flexibility is emphasized and it is of particular importance to exert hierarchical control. In this talk I discuss Bayesian hierarchical modeling in the setting of two particularly interesting stochastic processes: the Dirichlet process and the beta process. These processes are discrete with probability one, and have interesting relationships to various random combinatorial objects. They yield models with open-ended numbers of "clusters" and models with open-ended numbers of "features," respectively. I discuss Bayesian modeling based on hierarchical Dirichlet process priors and hierarchical beta process priors, and present applications of these models to problems in bioinformatics, information retrieval and computational vision.

5. Jordan’s Abstract (annotated)
• Much research in statistics and machine learning is concerned with controlling some form of tradeoff between flexibility and variability. – Modeling, Bias, Variance
• In the Bayesian approach, such control is often exerted via hierarchies---stochastic relationships among prior distributions. – Hierarchical Bayes, Hyper-Parameters
• Nonparametric Bayesian statisticians work with priors that are general stochastic processes (e.g., distributions on spaces of continuous functions, spaces of monotone functions, or general measure spaces). – Non-parametric Models, Order Statistics, Weaker but More Robust Prior Assumptions; e.g., samples from an increasing function? (linear regression, goodness of fit)
• Thus flexibility is emphasized and it is of particular importance to exert hierarchical control.
• In this talk I discuss Bayesian hierarchical modeling in the setting of two particularly interesting stochastic processes: the Dirichlet process and the beta process. – Stochastic processes as characterizing transitions among states, where a state is an assignment to a set of random variables (recall MDPs)
• These processes are discrete with probability one, and have interesting relationships to various random combinatorial objects. – Dirichlet process example: the Chinese Restaurant Process (see the sketch after this list)
• They yield models with open-ended numbers of "clusters" and models with open-ended numbers of "features," respectively. – Example: the Chinese Restaurant Process
• I discuss Bayesian modeling based on hierarchical Dirichlet process priors and hierarchical beta process priors, and present applications of these models to problems in bioinformatics, information retrieval and computational vision.
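The "discrete with probability one" claim can be made concrete with the standard stick-breaking construction of the Dirichlet process (Sethuraman, 1994). This is not from the slides; the notation (θ for the concentration parameter, D for the base distribution) is chosen to match the Chinese Restaurant Process slide that follows.

```latex
% Stick-breaking construction of a draw G ~ DP(theta, D).
% Not from the slides; notation matches the CRP slide below.
\[
  \nu_k \sim \mathrm{Beta}(1,\theta), \qquad
  \beta_k = \nu_k \prod_{j<k} (1-\nu_j), \qquad
  \phi_k \sim D \ \text{i.i.d.},
\]
\[
  G = \sum_{k=1}^{\infty} \beta_k \, \delta_{\phi_k},
\]
% G is an infinite sum of point masses, hence discrete with
% probability one, exactly as the abstract states.
```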

6. Chinese Restaurant Process
• A Chinese restaurant serves an infinite number of alternative dishes and has an infinite number of tables, each with infinite capacity.
• Each new customer either sits at an already occupied table, with probability proportional to the number of customers already sitting there, or sits alone at a new table, with probability θ / (n + θ), where n is the number of customers already in the restaurant.
• Customers who sit at an occupied table order the dish already being served at that table, but customers starting a new table are served a dish drawn at random from D.
• As n increases, the induced distribution over the different dishes is a draw from DP(θ, D).
• Note the extreme flexibility afforded over the dishes.
• Applications: clustering microarray gene expression data, natural language modeling, visual scene classification.
• It "invents" clusters to best fit the data (see the simulation sketch below).
• These clusters can be semantically interpreted: images of shots in basketball games, outdoor scenes on gray days, beach scenes.
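A minimal simulation sketch of the seating rule described above, assuming Python. The θ / (n + θ) new-table probability and the occupancy-proportional rule come straight from the slide; the function name, parameters, and the choice to label dishes by table index are illustrative assumptions.

```python
# Chinese Restaurant Process: seat customers one at a time.
# theta is the concentration parameter from the slide; the base
# distribution D is represented implicitly by giving each new table a
# fresh dish label (an assumption; any D without repeated atoms
# behaves the same way for counting clusters).
import random

def crp(n_customers: int, theta: float, seed: int = 0) -> list[int]:
    """Return each customer's table (= cluster) index."""
    rng = random.Random(seed)
    counts: list[int] = []        # counts[k] = customers at table k
    seating: list[int] = []
    for n in range(n_customers):  # n customers are already seated
        # table k with prob counts[k]/(n+theta); new table with prob theta/(n+theta)
        weights = counts + [theta]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # first customer opens a new table
        else:
            counts[k] += 1
        seating.append(k)
    return seating

seating = crp(1000, theta=2.0)
print(f"1000 customers occupy {len(set(seating))} tables (clusters)")
```

The number of occupied tables grows only logarithmically in n (about θ log n in expectation), which is exactly the "open-ended number of clusters" behavior the abstract describes: new clusters keep appearing, but ever more slowly.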
