Bayesian sets

Bayesian Sets. Zoubin Ghahramani and Katherine A. Heller, NIPS 2005. Presented by Qi An, Mar. 17th, 2006. Outline: Introduction; Bayesian Sets; Implementation (binary data, exponential families); Experimental results; Conclusions. Introduction: inspired by “Google™ Sets”.

Bayesian Sets

Zoubin Ghahramani and Katherine A. Heller

NIPS 2005

Presented by Qi An

Mar. 17th, 2006


Outline

  • Introduction

  • Bayesian Sets

  • Implementation

    • Binary data

    • Exponential families

  • Experimental results

  • Conclusions


Introduction

  • Inspired by “Google™ Sets”

  • What do Jesus and Darwin have in common?

    • Two different views on the origin of man

    • There are colleges at Cambridge University named after them

  • The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster


Introduction

  • Consider a universe of items $\mathcal{D}$, which can be a set of web pages, movies, people or any other subjects, depending on the application.

  • Make a query consisting of a small subset of items $\mathcal{D}_c \subset \mathcal{D}$, which are assumed to be examples of some cluster in the data.

  • The algorithm provides a completion of the query set, $\mathcal{D}'_c$. It presumably includes all the elements in $\mathcal{D}_c$ and other elements in $\mathcal{D}$ that are also in this cluster.


Introduction

  • View the problem from two perspectives:

    • Clustering on demand

      • Unlike completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster.

    • Information retrieval

      • Retrieve the items that are relevant to the query and rank the output by relevance to the query


Bayesian Sets

  • Very simple algorithm

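  • Given $\mathcal{D}$ and $\mathcal{D}_c$, we aim to rank the elements of $\mathcal{D}$ by how well they would “fit into” a set which includes $\mathcal{D}_c$.

  • Define a score for each $x \in \mathcal{D}$:

    $$\mathrm{score}(x) = \frac{p(x \mid \mathcal{D}_c)}{p(x)}$$

  • From Bayes' rule, the score can be rewritten as:

    $$\mathrm{score}(x) = \frac{p(x, \mathcal{D}_c)}{p(x)\, p(\mathcal{D}_c)}$$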
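
A minimal sketch of the ranking step, assuming the score is the ratio p(x, D_c) / (p(x) p(D_c)) as in the Bayesian Sets paper. The toy model (a coin whose bias takes one of three values with equal prior weight) and the names `THETAS`, `PRIOR`, `marginal`, `score` and `rank` are illustrative assumptions, not part of the slides.

```python
# Hypothetical toy model: each item is a single bit drawn from a coin whose
# bias theta is one of a few candidate values with equal prior weight.
THETAS = (0.1, 0.5, 0.9)
PRIOR = (1 / 3, 1 / 3, 1 / 3)


def marginal(bits):
    """p(bits) = sum over theta of p(theta) * prod_i p(bit_i | theta)."""
    total = 0.0
    for theta, weight in zip(THETAS, PRIOR):
        likelihood = 1.0
        for b in bits:
            likelihood *= theta if b == 1 else 1.0 - theta
        total += weight * likelihood
    return total


def score(x, query):
    """score(x) = p(x, D_c) / (p(x) * p(D_c)), with parameters summed out."""
    return marginal([x] + list(query)) / (marginal([x]) * marginal(query))


def rank(items, query):
    """Return the items sorted by decreasing score."""
    return sorted(items, key=lambda x: score(x, query), reverse=True)
```

With the query `[1, 1]`, the item `1` scores above 1 and the item `0` scores below 1, so `rank([0, 1], [1, 1])` places `1` first. A score above 1 means the item is more probable given the query than it is a priori.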


Bayesian Sets

  • Intuitively, the score compares the probability that x and $\mathcal{D}_c$ were generated by the same model with the same unknown parameters θ, to the probability that x and $\mathcal{D}_c$ came from models with different parameters θ and θ′.
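
Written out with the parameters marginalized, this comparison takes the following form. It is a sketch in the notation of the Ghahramani and Heller paper, with $\mathcal{D}_c$ denoting the query set and $p(\theta)$ the prior on the model parameters; the numerator shares a single θ across x and the query, while the denominator integrates over θ and θ′ separately.

```latex
\mathrm{score}(x)
  = \frac{p(x, \mathcal{D}_c)}{p(x)\, p(\mathcal{D}_c)}
  = \frac{\displaystyle \int p(x \mid \theta)
          \Big[ \prod_{x_i \in \mathcal{D}_c} p(x_i \mid \theta) \Big]
          p(\theta)\, d\theta}
         {\displaystyle \int p(x \mid \theta)\, p(\theta)\, d\theta
          \;\int \Big[ \prod_{x_i \in \mathcal{D}_c} p(x_i \mid \theta') \Big]
          p(\theta')\, d\theta'}
```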


Sparse Binary Data

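  • Assume each item $x_i \in \mathcal{D}$ is a binary vector $x_i = (x_{i1}, \ldots, x_{iJ})$ where each component is a binary variable from an independent Bernoulli distribution:

    $$p(x_i \mid \theta) = \prod_{j=1}^{J} \theta_j^{x_{ij}} (1 - \theta_j)^{1 - x_{ij}}$$

  • The conjugate prior for a Bernoulli distribution is a Beta distribution:

    $$p(\theta \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\, \Gamma(\beta_j)}\, \theta_j^{\alpha_j - 1} (1 - \theta_j)^{\beta_j - 1}$$

  • For a query $\mathcal{D}_c = \{x_i\}$ consisting of N vectors:

    $$p(\mathcal{D}_c \mid \alpha, \beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j + \beta_j)}{\Gamma(\alpha_j)\, \Gamma(\beta_j)} \cdot \frac{\Gamma(\tilde{\alpha}_j)\, \Gamma(\tilde{\beta}_j)}{\Gamma(\tilde{\alpha}_j + \tilde{\beta}_j)}$$

    where $\tilde{\alpha}_j = \alpha_j + \sum_{i=1}^{N} x_{ij}$ and $\tilde{\beta}_j = \beta_j + N - \sum_{i=1}^{N} x_{ij}$.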
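
As a rough illustration of the Beta-Bernoulli marginal likelihood of a query, the sketch below updates the Beta hyperparameters with the per-feature counts and evaluates the log marginal with log-gamma functions. The function names and the list-of-lists data layout are assumptions made for this example.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def posterior_hyperparameters(query, alpha, beta):
    """Return (alpha_tilde, beta_tilde) after observing the query vectors."""
    n = len(query)
    # counts[j] = number of query vectors with feature j switched on
    counts = [sum(row[j] for row in query) for j in range(len(alpha))]
    alpha_tilde = [a + k for a, k in zip(alpha, counts)]
    beta_tilde = [b + n - k for b, k in zip(beta, counts)]
    return alpha_tilde, beta_tilde


def log_marginal(query, alpha, beta):
    """log p(D_c | alpha, beta): sum over features of log B(a~, b~) - log B(a, b)."""
    alpha_tilde, beta_tilde = posterior_hyperparameters(query, alpha, beta)
    return sum(
        log_beta(at, bt) - log_beta(a, b)
        for at, bt, a, b in zip(alpha_tilde, beta_tilde, alpha, beta)
    )
```

For a single feature with a uniform Beta(1, 1) prior and a query of two vectors that both have the feature on, the marginal is the integral of θ² over [0, 1], which is 1/3, and `math.exp(log_marginal([[1], [1]], [1.0], [1.0]))` agrees with that value.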


Sparse Binary Data

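  • The score can be computed as:

    $$\mathrm{score}(x) = \prod_{j=1}^{J} \frac{\alpha_j + \beta_j}{\alpha_j + \beta_j + N} \left( \frac{\tilde{\alpha}_j}{\alpha_j} \right)^{x_j} \left( \frac{\tilde{\beta}_j}{\beta_j} \right)^{1 - x_j}$$

  • If we take the log of the score and put the entire data set into one large matrix X with J columns, we can compute a vector s of log scores for all points using a single matrix-vector multiplication:

    $$s = c + Xq$$

    where

    $$c = \sum_{j=1}^{J} \left[ \log(\alpha_j + \beta_j) - \log(\alpha_j + \beta_j + N) + \log \tilde{\beta}_j - \log \beta_j \right]$$

    and

    $$q_j = \log \tilde{\alpha}_j - \log \alpha_j - \log \tilde{\beta}_j + \log \beta_j$$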
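
The log-score computation can be sketched in plain Python as follows, assuming the Beta-Bernoulli model with hyperparameters α and β and a query of N binary vectors. The constant `c` and the weight vector `q` follow the formulas in the Bayesian Sets paper; the function names and the dense list-of-lists layout are assumptions for this example, and a real implementation would more likely use a sparse matrix library.

```python
import math


def log_score_parameters(query, alpha, beta):
    """Return the constant c and weight vector q such that s = c + X q."""
    n = len(query)
    c = 0.0
    q = []
    for j, (a, b) in enumerate(zip(alpha, beta)):
        k = sum(row[j] for row in query)  # query vectors with feature j on
        a_t = a + k                       # alpha tilde
        b_t = b + n - k                   # beta tilde
        c += math.log(a + b) - math.log(a + b + n) + math.log(b_t) - math.log(b)
        q.append(math.log(a_t) - math.log(a) - math.log(b_t) + math.log(b))
    return c, q


def log_scores(X, query, alpha, beta):
    """Log score of every row of X: one dot product with q per row, plus c."""
    c, q = log_score_parameters(query, alpha, beta)
    return [c + sum(x_j * q_j for x_j, q_j in zip(row, q)) for row in X]
```

Because `c` and `q` depend only on the query, they are computed once, and each item then costs one dot product. With sparse binary rows, only the features that are switched on contribute to that dot product.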


Exponential Families

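  • If the distribution for the model is not a Bernoulli distribution, but a member of the exponential family:

    $$p(x \mid \theta) = f(x)\, g(\theta)\, \exp\{\theta^\top u(x)\}$$

    we can use the conjugate prior:

    $$p(\theta \mid \eta, \nu) = h(\eta, \nu)\, g(\theta)^{\eta} \exp\{\theta^\top \nu\}$$

    so that the score is:

    $$\mathrm{score}(x) = \frac{h(\eta + 1,\, \nu + u(x))\; h(\eta + N,\, \nu + \sum_{i} u(x_i))}{h(\eta, \nu)\; h(\eta + N + 1,\, \nu + u(x) + \sum_{i} u(x_i))}$$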
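
A short derivation of why the score reduces to a ratio of normalizers, using the exponential-family notation of the Bayesian Sets paper: $u(x)$ is the sufficient statistic, $h(\eta, \nu)$ the normalizer of the conjugate prior, and N the number of items in the query $\mathcal{D}_c$. Each marginal likelihood is an integral of an unnormalized conjugate density, so it evaluates to a ratio of two values of h:

```latex
\begin{aligned}
p(x) &= \int f(x)\, g(\theta)\, e^{\theta^\top u(x)}\;
        h(\eta, \nu)\, g(\theta)^{\eta}\, e^{\theta^\top \nu}\, d\theta
      = f(x)\, \frac{h(\eta, \nu)}{h(\eta + 1,\; \nu + u(x))} \\[4pt]
p(\mathcal{D}_c) &= \Big[ \prod_{i=1}^{N} f(x_i) \Big]
      \frac{h(\eta, \nu)}{h\big(\eta + N,\; \nu + \sum_{i} u(x_i)\big)} \\[4pt]
p(x, \mathcal{D}_c) &= f(x) \Big[ \prod_{i=1}^{N} f(x_i) \Big]
      \frac{h(\eta, \nu)}{h\big(\eta + N + 1,\; \nu + u(x) + \sum_{i} u(x_i)\big)}
\end{aligned}
```

Dividing $p(x, \mathcal{D}_c)$ by $p(x)\, p(\mathcal{D}_c)$ cancels every $f$ term and leaves only the four normalizer values.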


Experimental results

  • The experiments are performed on three different datasets: the Grolier Encyclopedia dataset, the EachMovie dataset and the NIPS authors dataset.

  • The algorithm runs very fast on all three datasets.


Conclusions

  • A simple algorithm which takes a query consisting of a small set of items and returns additional items from $\mathcal{D}$ belonging to this set.

  • The score is computed with respect to a statistical model, and all unknown model parameters are marginalized out.

  • With conjugate priors, the score can be computed exactly and efficiently.

  • The method does well when compared to Google Sets in terms of set completions.

  • The algorithm is very flexible in that it can be combined with a wide variety of types of data and probabilistic models.

