## PowerPoint Slideshow about 'Bayesian Sets' - ethan-patton


Outline

- Introduction
- Bayesian Sets
- Implementation
- Binary data
- Exponential families
- Experimental results
- Conclusions

Introduction

- Inspired by “Google™ Sets”
- What do Jesus and Darwin have in common?
  - Two different views on the origin of man
  - There are colleges at Cambridge University named after them
- The objective is to retrieve items from a concept or cluster, given a query consisting of a few items from that cluster

Introduction

- Consider a universe of items $\mathcal{D}$, which can be a set of web pages, movies, people or any other objects depending on the application
- Make a query consisting of a small subset of items $\mathcal{D}_c \subset \mathcal{D}$, which are assumed to be examples of some cluster in the data
- The algorithm provides a completion of the query set, $\mathcal{D}_c' \subseteq \mathcal{D}$. It presumably includes all the elements in $\mathcal{D}_c$ and other elements in $\mathcal{D}$ that are also in this cluster.

Introduction

- View the problem from two perspectives:
  - Clustering on demand
    - Unlike completely unsupervised clustering algorithms, here the query provides supervised hints or constraints as to the membership of a particular cluster.
  - Information retrieval
    - Retrieve the items that are relevant to the query and rank the output by relevance to the query

Bayesian Sets

- Very simple algorithm
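- Given $\mathcal{D}$ and $\mathcal{D}_c$, we aim to rank the elements of $\mathcal{D}$ by how well they would “fit into” a set which includes $\mathcal{D}_c$
- Define a score for each $\mathbf{x} \in \mathcal{D}$:
$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x} \mid \mathcal{D}_c)}{p(\mathbf{x})}$$
- From Bayes rule, the score can be rewritten as:
$$\mathrm{score}(\mathbf{x}) = \frac{p(\mathbf{x}, \mathcal{D}_c)}{p(\mathbf{x})\, p(\mathcal{D}_c)}$$
- Each term marginalizes out the unknown model parameters $\boldsymbol\theta$, e.g. $p(\mathcal{D}_c) = \int \prod_{\mathbf{x}_i \in \mathcal{D}_c} p(\mathbf{x}_i \mid \boldsymbol\theta)\, p(\boldsymbol\theta)\, d\boldsymbol\theta$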

Bayesian Sets

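- Intuitively, the score compares the probability that $\mathbf{x}$ and $\mathcal{D}_c$ were generated by the same model with the same unknown parameters $\boldsymbol\theta$ to the probability that $\mathbf{x}$ and $\mathcal{D}_c$ came from models with different parameters $\boldsymbol\theta$ and $\boldsymbol\theta'$.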
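Both forms of the score, and the same-versus-different-parameters reading of it, can be checked numerically on a tiny model. The following is a minimal sketch under our own assumptions, not the authors' code: a single Bernoulli feature whose parameter θ takes only three values under a discrete prior (`THETAS` and `PRIOR` are hypothetical), so every marginal likelihood becomes a finite sum instead of an integral. In `score_bayes`, the numerator lets x and the query share one θ, while the denominator gives them independent parameters θ and θ′.

```python
# Hypothetical toy model (not from the slides): one Bernoulli feature whose
# parameter theta takes one of three values under a discrete prior, so each
# marginal likelihood is a finite sum instead of an integral.
THETAS = [0.1, 0.5, 0.9]
PRIOR = [0.25, 0.5, 0.25]


def likelihood(items, theta):
    """p(items | theta) for binary items drawn i.i.d. from Bernoulli(theta)."""
    p = 1.0
    for x in items:
        p *= theta if x == 1 else 1.0 - theta
    return p


def marginal(items):
    """p(items): the likelihood averaged over the prior on theta."""
    return sum(w * likelihood(items, t) for t, w in zip(THETAS, PRIOR))


def score_conditional(x, query):
    """score(x) = p(x | D_c) / p(x), using the posterior over theta given D_c."""
    evidence = marginal(query)
    posterior = [w * likelihood(query, t) / evidence for t, w in zip(THETAS, PRIOR)]
    p_x_given_query = sum(pw * likelihood([x], t) for t, pw in zip(THETAS, posterior))
    return p_x_given_query / marginal([x])


def score_bayes(x, query):
    """score(x) = p(x, D_c) / (p(x) p(D_c)).

    Numerator: x and D_c share one theta. Denominator: x and D_c get
    independent parameters theta and theta'.
    """
    return marginal([x] + query) / (marginal([x]) * marginal(query))


query = [1, 1, 0, 1]
for x in (0, 1):
    print(x, score_conditional(x, query), score_bayes(x, query))
```

With a query that is mostly 1s, the item x = 1 gets a score above 1 (it "fits" the set) and x = 0 gets a score below 1, and the two formulas agree.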

Sparse Binary Data

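- Assume each item $\mathbf{x}_i \in \mathcal{D}$ is a binary vector $\mathbf{x}_i = (x_{i1}, \ldots, x_{iJ})$ where each component $x_{ij}$ is a binary variable from an independent Bernoulli distribution:
$$p(\mathbf{x}_i \mid \boldsymbol\theta) = \prod_{j=1}^{J} \theta_j^{x_{ij}} (1-\theta_j)^{1-x_{ij}}$$
- The conjugate prior for a Bernoulli distribution is a Beta distribution:
$$p(\boldsymbol\theta \mid \boldsymbol\alpha, \boldsymbol\beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \theta_j^{\alpha_j-1} (1-\theta_j)^{\beta_j-1}$$
- For a query $\mathcal{D}_c = \{\mathbf{x}_i\}_{i=1}^{N}$ of $N$ items:
$$p(\mathcal{D}_c \mid \boldsymbol\alpha, \boldsymbol\beta) = \prod_{j=1}^{J} \frac{\Gamma(\alpha_j+\beta_j)}{\Gamma(\alpha_j)\,\Gamma(\beta_j)}\, \frac{\Gamma(\tilde\alpha_j)\,\Gamma(\tilde\beta_j)}{\Gamma(\tilde\alpha_j+\tilde\beta_j)}$$

where

$$\tilde\alpha_j = \alpha_j + \sum_{i=1}^{N} x_{ij}, \qquad \tilde\beta_j = \beta_j + N - \sum_{i=1}^{N} x_{ij}$$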
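The Beta-Bernoulli marginal likelihood of the query can be evaluated in log space with `math.lgamma`, which avoids overflowing the Gamma functions. A minimal sketch, assuming items are Python lists of 0/1 values and the hyperparameters α, β are given as per-feature lists (the function name `log_marginal_binary` is our own):

```python
import math


def log_marginal_binary(query, alpha, beta):
    """log p(D_c | alpha, beta) under independent Beta-Bernoulli features.

    query: list of N binary vectors (lists of 0/1), each of length J.
    alpha, beta: lists of J positive Beta hyperparameters.
    """
    n = len(query)
    total = 0.0
    for j, (a, b) in enumerate(zip(alpha, beta)):
        ones = sum(x[j] for x in query)   # sum_i x_ij
        a_tilde = a + ones                # alpha~_j
        b_tilde = b + n - ones            # beta~_j
        # log of Gamma(a+b)/(Gamma(a)Gamma(b)) * Gamma(a~)Gamma(b~)/Gamma(a~+b~)
        total += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + math.lgamma(a_tilde) + math.lgamma(b_tilde)
                  - math.lgamma(a_tilde + b_tilde))
    return total


print(math.exp(log_marginal_binary([[1]], [2.0], [3.0])))
```

For a single item `[1]` with α = 2 and β = 3 this recovers α/(α+β) = 0.4, the prior probability of observing a 1; an empty query gives a log marginal of (numerically) zero.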

Sparse Binary Data

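- The score can be computed as:
$$\mathrm{score}(\mathbf{x}) = \prod_{j=1}^{J} \frac{\alpha_j+\beta_j}{\alpha_j+\beta_j+N} \left(\frac{\tilde\alpha_j}{\alpha_j}\right)^{x_j} \left(\frac{\tilde\beta_j}{\beta_j}\right)^{1-x_j}$$
- If we take the log of the score and put the entire data set into one large matrix $X$ with $J$ columns, we can compute a vector $\mathbf{s}$ of log scores for all points using a single matrix-vector multiplication (for sparse $X$, its cost is linear in the number of nonzero entries):
$$\mathbf{s} = c + X\mathbf{q}$$

where

$$c = \sum_{j=1}^{J} \left[\log(\alpha_j+\beta_j) - \log(\alpha_j+\beta_j+N) + \log\tilde\beta_j - \log\beta_j\right]$$

and

$$q_j = \log\tilde\alpha_j - \log\alpha_j - \log\tilde\beta_j + \log\beta_j$$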
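The log score and the $\mathbf{s} = c + X\mathbf{q}$ form can be sketched in plain Python as follows. This is an illustrative dense sketch under our own assumptions (rows of $X$ as lists of 0/1, user-supplied α and β, hypothetical helper names `log_score_params`, `log_scores`, `rank`); a practical version would store $X$ as a sparse matrix so that computing $X\mathbf{q}$ touches only the nonzero entries.

```python
import math


def log_score_params(query, alpha, beta):
    """Return (c, q) such that log score(x) = c + sum_j q_j * x_j.

    query: list of N binary vectors (lists of 0/1), each of length J.
    alpha, beta: per-feature Beta hyperparameters (lists of J positive floats).
    """
    n = len(query)
    c = 0.0
    q = []
    for j, (a, b) in enumerate(zip(alpha, beta)):
        ones = sum(x[j] for x in query)   # sum_i x_ij
        a_tilde = a + ones
        b_tilde = b + n - ones
        c += math.log(a + b) - math.log(a + b + n) + math.log(b_tilde) - math.log(b)
        q.append(math.log(a_tilde) - math.log(a) - math.log(b_tilde) + math.log(b))
    return c, q


def log_scores(X, query, alpha, beta):
    """s = c + X q: one log score per row of the data matrix X (dense here)."""
    c, q = log_score_params(query, alpha, beta)
    return [c + sum(x_j * q_j for x_j, q_j in zip(row, q)) for row in X]


def rank(X, query, alpha, beta):
    """Row indices of X sorted by decreasing log score (ties keep input order)."""
    s = log_scores(X, query, alpha, beta)
    return sorted(range(len(X)), key=lambda i: s[i], reverse=True)


# Toy usage with made-up data and uniform hyperparameters alpha_j = beta_j = 1.
X = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [1, 1, 1]]
query = [X[0], X[3]]
alpha = [1.0, 1.0, 1.0]
beta = [1.0, 1.0, 1.0]
print(rank(X, query, alpha, beta))  # → [0, 3, 1, 2]
```

Features 0 and 1 are on in every query item, so rows containing both rank first; feature 2 is on in only half the query, so its weight $q_j$ is zero and it does not affect the ranking.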

Exponential Families

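- If the distribution for the model is not a Bernoulli distribution but belongs to an exponential family with sufficient statistic $\mathbf{u}(\mathbf{x})$:
$$p(\mathbf{x} \mid \boldsymbol\theta) = f(\mathbf{x})\, g(\boldsymbol\theta) \exp\{\boldsymbol\theta^\top \mathbf{u}(\mathbf{x})\}$$

we can use the conjugate prior, where $h(\eta, \boldsymbol\nu)$ is its normalizing constant:

$$p(\boldsymbol\theta \mid \eta, \boldsymbol\nu) = h(\eta, \boldsymbol\nu)\, g(\boldsymbol\theta)^{\eta} \exp\{\boldsymbol\theta^\top \boldsymbol\nu\}$$

so that the score is:

$$\mathrm{score}(\mathbf{x}) = \frac{h(\eta+1, \boldsymbol\nu+\mathbf{u}(\mathbf{x}))\; h\!\left(\eta+N, \boldsymbol\nu+\sum_{i=1}^{N}\mathbf{u}(\mathbf{x}_i)\right)}{h(\eta, \boldsymbol\nu)\; h\!\left(\eta+N+1, \boldsymbol\nu+\sum_{i=1}^{N}\mathbf{u}(\mathbf{x}_i)+\mathbf{u}(\mathbf{x})\right)}$$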
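The exponential-family score only needs the prior's log normalizer, so it can be written once against a user-supplied `log_h`. A sketch under our own simplifying assumption of a scalar sufficient statistic $u(x)$; the names are hypothetical. As an illustrative instance (our choice, not from the slides) we use Poisson counts with natural parameter $\theta = \log\lambda$ and $u(x) = x$, whose conjugate prior is a Gamma distribution on $\lambda$ with shape $\nu$ and rate $\eta$, giving $h(\eta, \nu) = \eta^{\nu} / \Gamma(\nu)$. The $f(\mathbf{x})$ factors cancel in the score, so they never appear.

```python
import math


def log_score_exp_family(log_h, eta, nu, u_x, u_query):
    """log score(x) for a conjugate exponential-family model.

    Simplified to a scalar sufficient statistic u(x).
    log_h(eta, nu): log of the prior normalizer h(eta, nu).
    u_x: u(x) for the candidate item.
    u_query: list of u(x_i) for the N query items.
    """
    n = len(u_query)
    u_sum = sum(u_query)
    return (log_h(eta + 1, nu + u_x)
            + log_h(eta + n, nu + u_sum)
            - log_h(eta, nu)
            - log_h(eta + n + 1, nu + u_sum + u_x))


def log_h_poisson_gamma(eta, nu):
    """Illustrative choice: Poisson likelihood, theta = log(lambda), u(x) = x.

    The conjugate prior is Gamma(shape=nu, rate=eta) on lambda, whose
    normalizer in theta-space is h(eta, nu) = eta**nu / Gamma(nu).
    """
    return nu * math.log(eta) - math.lgamma(nu)


# Toy usage: a query of made-up counts, hypothetical hyperparameters eta = nu = 1.
query_counts = [3, 4, 5]
for x in (0, 4):
    print(x, log_score_exp_family(log_h_poisson_gamma, 1.0, 1.0, x, query_counts))
```

A count of 4, close to the query's counts, gets a positive log score, while a count of 0 gets a negative one. Swapping in a different `log_h` changes the model without touching the scoring function.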

Experimental results

- The experiments are performed on three datasets: the Grolier Encyclopedia dataset, the EachMovie dataset and the NIPS authors dataset.
- The algorithm runs very fast on all three datasets.

Conclusions

- A simple algorithm which takes a query consisting of a small set of items and returns additional items from $\mathcal{D}$ belonging to this set.
- The score is computed w.r.t. a statistical model, and the unknown model parameters are all marginalized out.
- With conjugate priors, the score can be computed exactly and efficiently.
- The method does well when compared to Google Sets in terms of set completions.
- The algorithm is very flexible in that it can be combined with a wide variety of data types and probabilistic models.
