580.691 Learning Theory Reza Shadmehr Maximum likelihood Integration of sensory modalities - PowerPoint PPT Presentation


580.691 Learning Theory

Reza Shadmehr

Maximum likelihood

Integration of sensory modalities



We showed that linear regression, the steepest descent algorithm, and LMS all minimize the cost function:

$$J(w) = \sum_{i=1}^{n}\left(y^{(i)} - w^T x^{(i)}\right)^2$$

This is just one possible cost function. What is the justification for it? Today we will see that minimizing this cost function gives the maximum likelihood estimate of w when the noise in the data is normally distributed.



Expected value and variance of scalar random variables
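For a discrete random variable, the expected value is E[x] = Σ p(x) x and the variance is var[x] = E[(x - E[x])²]. A minimal sketch of both definitions, using a fair six-sided die as an illustrative example of our own (it is not from the slides) and exact fractions so the results are not blurred by rounding:

```python
from fractions import Fraction

def expected_value(values, probs):
    """E[x] = sum over outcomes of p(x) * x."""
    return sum(p * x for x, p in zip(values, probs))

def variance(values, probs):
    """var[x] = E[(x - E[x])^2]."""
    mu = expected_value(values, probs)
    return sum(p * (x - mu) ** 2 for x, p in zip(values, probs))

# Illustrative example: a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
p = [Fraction(1, 6)] * 6
print(expected_value(faces, p))  # 7/2
print(variance(faces, p))        # 35/12
```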



  • Statistical view of regression

  • Suppose the outputs y were actually produced by the process:

The “true” underlying process: $y^{*(i)} = w^{*T} x^{(i)}$

What we measured: $y^{(i)} = y^{*(i)} + \varepsilon^{(i)}$

Our model of the process: $\hat{y}^{(i)} = w^T x^{(i)}$

Even with the inputs X held constant, the underlying process would give us a different y every time we observe it. For each “batch” of observations, we fit our parameters. What is the probability of observing the particular y(i) in trial i?



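  • Probabilistic view of linear regression

  • Linear regression expresses the random variable y(n) as its mean w^T x(n) plus an input-independent variation ε around that mean:

$$y^{(n)} = w^T x^{(n)} + \varepsilon$$

Let us assume that ε is normally distributed with mean zero and variance σ²:

$$\varepsilon \sim N\left(0, \sigma^2\right)$$

Then the outputs y given x are:

$$p\left(y \mid x\right) = N\left(w^T x, \sigma^2\right) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(y - w^T x\right)^2}{2\sigma^2}\right)$$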
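A minimal sketch of the conditional density p(y | x) = N(w^T x, σ²) under the Gaussian-noise assumption. The weights, the input, and the helper names `normal_pdf` and `p_y_given_x` are hypothetical, chosen only for illustration:

```python
import math

def normal_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def p_y_given_x(y, x, w, sigma):
    """p(y | x) under the model y = w^T x + eps, eps ~ N(0, sigma^2)."""
    mean = sum(wi * xi for wi, xi in zip(w, x))  # w^T x
    return normal_pdf(y, mean, sigma)

# Hypothetical parameters; the first input entry is a constant 1 for the intercept.
w = [0.5, 2.0]
x = [1.0, 1.0]  # so w^T x = 2.5
print(p_y_given_x(2.5, x, w, sigma=1.0))  # the peak, about 0.3989
print(p_y_given_x(3.5, x, w, sigma=1.0))  # one sigma away, about 0.2420
```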



  • Probabilistic view of linear regression

  • As the variance (i.e., the spread) of the residual ε increases, our confidence in our model’s guess decreases.
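One way to make "confidence" concrete is the probability that an observation lands within a fixed window around the model's guess w^T x. For Gaussian noise, P(|ε| < a) = erf(a / (σ√2)). The window half-width a = 0.5 is an arbitrary illustrative choice:

```python
import math

def prob_within(a, sigma):
    """P(|eps| < a) for eps ~ N(0, sigma^2): the chance that y lands
    within +/- a of the model's guess w^T x."""
    return math.erf(a / (sigma * math.sqrt(2.0)))

for sigma in (0.5, 1.0, 2.0):
    print(sigma, round(prob_within(0.5, sigma), 3))
```

With a = 0.5 this gives roughly 0.68, 0.38, and 0.20 for σ = 0.5, 1, and 2: the wider the spread, the less probability the model places near its own guess.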



  • Probabilistic view of linear regression

  • Example: suppose the underlying process was:

  • Given some data points, we estimate w and also guess the variance of the noise; we can then compute the probability of each y that we observed.

  • In this example,


We want to find the set of parameters that maximizes P over all the data.
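Under the i.i.d. assumption, P for a data set is the product of the individual densities. The sketch below computes it on simulated data; the straight-line process (intercept 1, slope 2, σ = 1) is an assumption for illustration, since the slide's example process is not preserved. It also shows why one usually works with the sum of log-densities: a product of many small numbers underflows to zero in floating point.

```python
import math
import random

def log_normal_pdf(y, mean, sigma):
    """log N(y; mean, sigma^2)."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mean) ** 2 / (2.0 * sigma ** 2)

# Simulated data from an assumed process: y = 1 + 2x + eps, eps ~ N(0, 1).
random.seed(0)
xs = [random.uniform(-1.5, 1.5) for _ in range(50)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]

# Joint probability P as a direct product of the 50 densities.
product = 1.0
for x, y in zip(xs, ys):
    product *= math.exp(log_normal_pdf(y, 1.0 + 2.0 * x, 1.0))

# The same quantity through its logarithm: log P = sum of log densities.
log_p = sum(log_normal_pdf(y, 1.0 + 2.0 * x, 1.0) for x, y in zip(xs, ys))
print(product, math.exp(log_p))  # equal up to rounding

# With enough data the direct product underflows, while log P stays finite.
tiny = 1.0
for _ in range(1000):
    tiny *= 0.24  # 1000 densities of about 0.24 each
print(tiny, 1000 * math.log(0.24))  # 0.0 versus roughly -1427
```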



  • Maximum likelihood estimation

  • We view the outputs y(n) as random variables that were generated by a probabilistic process with some distribution whose parameters θ (e.g., mean and variance) are unknown. The “best” guess for θ is the one that maximizes the joint probability that the observed data came from that distribution.
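Written as a formula, and assuming the observations are independent so that their joint probability factors into a product, the maximum likelihood estimate is the following. Taking the log does not move the maximum, because the log is monotonically increasing:

```latex
\hat{\theta}_{ML}
  = \arg\max_{\theta}\, p\left(y^{(1)}, \dots, y^{(n)} \mid \theta\right)
  = \arg\max_{\theta} \prod_{i=1}^{n} p\left(y^{(i)} \mid \theta\right)
  = \arg\max_{\theta} \sum_{i=1}^{n} \log p\left(y^{(i)} \mid \theta\right)
```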



  • Maximum likelihood estimation: uniform distribution

  • Suppose that n numbers y(i) were drawn from a distribution and we need to estimate the parameters of that distribution.

  • Suppose that the distribution was uniform.

Likelihood that the data came from a model with our specific parameter value
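A sketch of this case, assuming the uniform distribution is U(0, θ) with unknown upper limit θ (the slide's exact parameterization is not preserved). The likelihood is θ^(-n) when every sample lies in [0, θ] and zero otherwise, so it is maximized by the smallest admissible θ, namely max y(i). A grid search over θ on made-up data confirms this:

```python
def uniform_likelihood(theta, ys):
    """Likelihood that ys came from U(0, theta): theta^-n if every sample fits, else 0."""
    if theta <= 0 or max(ys) > theta or min(ys) < 0:
        return 0.0
    return theta ** (-len(ys))

ys = [0.9, 2.3, 1.7, 0.4, 2.8]           # made-up samples
grid = [i / 100 for i in range(1, 501)]  # candidate theta values 0.01 .. 5.00
best = max(grid, key=lambda t: uniform_likelihood(t, ys))
print(best, max(ys))  # the grid search lands on max(ys)
```

This likelihood is not differentiable at its maximum, so the estimate comes from the shape of the likelihood rather than from setting a derivative to zero.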



Maximum likelihood estimation: exponential distribution

Log-likelihood
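A sketch for the exponential case, assuming the rate form p(y) = λ exp(-λy) for y ≥ 0 (the slide's parameterization is not preserved). The log-likelihood is n log λ - λ Σ y(i); setting its derivative n/λ - Σ y(i) to zero gives λ̂ = n / Σ y(i), the inverse of the sample mean. A grid search on made-up data agrees:

```python
import math

def exp_log_likelihood(lam, ys):
    """log L(lambda) = n log(lambda) - lambda * sum(y) for p(y) = lambda exp(-lambda y)."""
    return len(ys) * math.log(lam) - lam * sum(ys)

ys = [0.5, 1.2, 0.3, 2.0, 1.0]            # made-up samples, mean = 1.0
lam_hat = len(ys) / sum(ys)               # closed form: 1 / sample mean
grid = [i / 1000 for i in range(1, 5001)]
lam_grid = max(grid, key=lambda lam: exp_log_likelihood(lam, ys))
print(lam_hat, lam_grid)
```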



Maximum likelihood estimation: Normal distribution

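Now we see that if σ is a constant, the log-likelihood equals a constant minus the sum of squared errors divided by 2σ², so maximizing the likelihood is the same as minimizing our cost function (the sum of squared errors!).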
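A numerical check of this claim on made-up data, with σ held fixed at 1 (an arbitrary choice): the Gaussian log-likelihood of a candidate mean μ equals -(n/2) log(2πσ²) - SSE/(2σ²), so the μ that maximizes the likelihood is the μ that minimizes the sum of squared errors.

```python
import math

def normal_log_likelihood(ys, mu, sigma):
    """Sum over samples of log N(y; mu, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)
               for y in ys)

def sse(ys, mu):
    """Sum of squared errors between the samples and the guess mu."""
    return sum((y - mu) ** 2 for y in ys)

ys = [1.8, 2.4, 2.1, 1.6, 2.6]  # made-up data
sigma = 1.0                     # held constant
n = len(ys)

# The log-likelihood is a constant minus SSE / (2 sigma^2) for every mu.
for mu in (1.5, 2.0, 2.5):
    rebuilt = -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sse(ys, mu) / (2 * sigma ** 2)
    assert math.isclose(normal_log_likelihood(ys, mu, sigma), rebuilt)

# Hence the maximizer of the likelihood and the minimizer of SSE coincide.
mus = [i / 100 for i in range(100, 301)]
best_ll = max(mus, key=lambda m: normal_log_likelihood(ys, m, sigma))
best_sse = min(mus, key=lambda m: sse(ys, m))
print(best_ll, best_sse)  # both 2.1, the sample mean
```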



Maximum likelihood estimation: Normal distribution
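When both the mean and the variance are unknown, setting the derivatives of the log-likelihood to zero gives the standard closed forms μ̂ = (1/n) Σ y(i) and σ̂² = (1/n) Σ (y(i) - μ̂)². Note the 1/n: the ML variance is the population-style formula, not the unbiased 1/(n - 1) version. Python's standard `statistics` module provides both (`pvariance` and `variance`), which makes a convenient cross-check on made-up data:

```python
import statistics

ys = [1.8, 2.4, 2.1, 1.6, 2.6]  # made-up samples
n = len(ys)
mu_ml = sum(ys) / n                             # ML mean: the sample average
var_ml = sum((y - mu_ml) ** 2 for y in ys) / n  # ML variance divides by n, not n - 1
print(mu_ml, var_ml)
print(statistics.pvariance(ys), statistics.variance(ys))  # 1/n and 1/(n - 1) versions
```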



  • Probabilistic view of linear regression

  • If we assume that y(i) are independently and identically distributed (I.I.D.), conditional on x(i), then the joint conditional distribution of the data y is obtained by taking the product of the individual conditional probabilities:

Given our model, we can assign a probability to our observation.

We want to find parameters that maximize the probability that we will observe data like the one that we were given.
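With the Gaussian noise model y = w^T x + ε, ε ~ N(0, σ²), this product takes the explicit form:

```latex
p\left(y^{(1)}, \dots, y^{(n)} \mid x^{(1)}, \dots, x^{(n)}, w, \sigma\right)
  = \prod_{i=1}^{n} p\left(y^{(i)} \mid x^{(i)}, w, \sigma\right)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\left(-\frac{\left(y^{(i)} - w^T x^{(i)}\right)^2}{2\sigma^2}\right)
```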



  • Probabilistic view of linear regression

  • Given some data D and two models, (w1, σ) and (w2, σ), the better model is the one that assigns the larger joint probability to the data actually observed.
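A minimal sketch of this comparison. The data are simulated from an assumed straight-line process (intercept 1, slope 2, σ = 1); model A uses the true weights and model B a deliberately poor guess, both with the same σ. Comparing log-likelihoods rather than raw products avoids underflow and ranks the models identically:

```python
import math
import random

def log_likelihood(xs, ys, intercept, slope, sigma):
    """sum_i log N(y_i; intercept + slope * x_i, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (y - (intercept + slope * x)) ** 2 / (2 * sigma ** 2)
               for x, y in zip(xs, ys))

random.seed(1)
xs = [random.uniform(-1.5, 1.5) for _ in range(40)]
ys = [1.0 + 2.0 * x + random.gauss(0.0, 1.0) for x in xs]  # assumed true process

sigma = 1.0                                     # shared by both models
ll_a = log_likelihood(xs, ys, 1.0, 2.0, sigma)  # model A: the true weights
ll_b = log_likelihood(xs, ys, 0.0, 0.5, sigma)  # model B: a poor guess
print(ll_a, ll_b)  # the larger value marks the better model for this D
```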




  • Probabilistic view of linear regression

  • Given some data D and two models, (w, σ1) and (w, σ2), the better model is the one that assigns the larger joint probability to the data actually observed.

[Figure: joint probability of the observed data, on the order of 10^-38, plotted against σ from 0.6 to 1.4.]

The underlying process here was generated with σ = 1, and our model was second order; on this data set, the joint probability happened to peak near σ = 1.
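A sketch of the sweep over σ behind this kind of plot. To keep it short, the weights are taken as fixed and we work directly with simulated residuals r(i) = y(i) - w^T x(i) drawn with σ = 1 (an assumption standing in for the slide's data). The grid maximum agrees with the closed-form ML estimate σ̂ = sqrt((1/n) Σ r(i)²):

```python
import math
import random

def log_likelihood_sigma(residuals, sigma):
    """Gaussian log-likelihood of fixed residuals, as a function of sigma."""
    n = len(residuals)
    ss = sum(r * r for r in residuals)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - ss / (2 * sigma ** 2)

random.seed(2)
residuals = [random.gauss(0.0, 1.0) for _ in range(200)]  # assumed true sigma = 1

sigmas = [0.6 + 0.001 * k for k in range(801)]  # sweep sigma over [0.6, 1.4]
best = max(sigmas, key=lambda s: log_likelihood_sigma(residuals, s))
closed_form = math.sqrt(sum(r * r for r in residuals) / len(residuals))
print(round(best, 3), round(closed_form, 3))  # close to each other, and near 1
```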



[Figure: several data sets generated by the same underlying process, each shown with its fitted model and its joint probability plotted against σ from 0.6 to 1.4.]

The same underlying process will generate a different data set D on each run, resulting in different estimates of w and σ even though the underlying process did not change.
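A small simulation of this point, assuming a one-parameter process y = w x + ε with w = 2 and σ = 1 and the same inputs on every run (all illustrative choices). Each run draws a fresh data set D and computes the ML estimates ŵ = Σ x y / Σ x² and σ̂ = sqrt((1/n) Σ (y - ŵx)²); the estimates scatter around the true values from run to run:

```python
import random
import statistics

def fit_slope(xs, ys):
    """ML (least-squares) slope for the no-intercept model y = w x + eps."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

random.seed(3)
xs = [-1.5 + 0.1 * k for k in range(31)]  # the same inputs on every run
w_true, sigma_true = 2.0, 1.0             # assumed underlying process

w_hats, s_hats = [], []
for _ in range(200):  # 200 independent data sets D
    ys = [w_true * x + random.gauss(0.0, sigma_true) for x in xs]
    w_hat = fit_slope(xs, ys)
    s_hat = (sum((y - w_hat * x) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5
    w_hats.append(w_hat)
    s_hats.append(s_hat)

print("w_hat: mean %.3f, s.d. %.3f" % (statistics.mean(w_hats), statistics.stdev(w_hats)))
print("s_hat: mean %.3f, s.d. %.3f" % (statistics.mean(s_hats), statistics.stdev(s_hats)))
```

For this choice of inputs, Σ x² = 24.8, so the spread of ŵ across runs should be close to σ / sqrt(24.8), about 0.2.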



  • Likelihood of our model

  • Given some observed data:

  • and model structure:

  • Try to find the parameters w and σ that maximize the joint probability over the observed data:

Likelihood that the data came from a model with our specific parameter values



  • Maximizing the likelihood

  • It’s easier to maximize the log of the likelihood function.

Log-likelihood

Finding the w that maximizes the likelihood is equivalent to finding the w that minimizes the loss function:

$$J(w) = \sum_{i=1}^{n}\left(y^{(i)} - w^T x^{(i)}\right)^2$$



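Finding weights that maximize the likelihood

Log-likelihood, with the n inputs stacked as the rows of X (n×m), the outputs stacked in y (n×1), and w (m×1):

$$\log L(w, \sigma) = -\frac{n}{2}\log\left(2\pi\sigma^2\right) - \frac{1}{2\sigma^2}\left(y - Xw\right)^T\left(y - Xw\right)$$

Expanding the quadratic term (all remaining terms are scalars, so $y^T X w = w^T X^T y$):

$$\left(y - Xw\right)^T\left(y - Xw\right) = y^T y - 2 w^T X^T y + w^T X^T X w$$

Setting the derivative with respect to w to zero:

$$\frac{\partial \log L}{\partial w} = \frac{1}{\sigma^2}\left(X^T y - X^T X w\right) = 0 \quad\Rightarrow\quad \hat{w}_{ML} = \left(X^T X\right)^{-1} X^T y$$

Above is the ML estimate of w, given the model $y = Xw + \varepsilon$, $\varepsilon \sim N\left(0, \sigma^2 I\right)$.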
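A minimal pure-Python sketch of the estimate ŵ = (X^T X)^(-1) X^T y. Rather than forming the inverse explicitly, it solves the linear system (X^T X) ŵ = X^T y by Gaussian elimination, which gives the same answer and is the usual numerical practice. The function name `normal_equations` and the example data are our own; in practice a linear-algebra library would do this step:

```python
def normal_equations(X, y):
    """Return w solving (X^T X) w = X^T y, i.e. w = (X^T X)^-1 X^T y.
    X is a list of n rows with m entries each; y is a list of n numbers."""
    m = len(X[0])
    # Build A = X^T X (m x m) and b = X^T y (m x 1).
    A = [[sum(row[i] * row[j] for row in X) for j in range(m)] for i in range(m)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting on the augmented matrix [A | b].
    M = [A[i] + [b[i]] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (M[i][m] - sum(M[i][j] * w[j] for j in range(i + 1, m))) / M[i][i]
    return w

# Noise-free data from y = 1 + 2x; the first column of X is the constant 1.
X = [[1.0, x] for x in (-1.0, 0.0, 1.0, 2.0)]
y = [1.0 + 2.0 * row[1] for row in X]
print(normal_equations(X, y))  # approximately [1.0, 2.0]
```

Because the example data contain no noise, the estimate recovers the generating weights; with noisy data it returns the least-squares fit, which under Gaussian noise is also the maximum likelihood fit.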



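Finding the noise variance that maximizes the likelihood

Setting the derivative of the log-likelihood with respect to σ² to zero:

$$\frac{\partial \log L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\left(y - X\hat{w}\right)^T\left(y - X\hat{w}\right) = 0 \quad\Rightarrow\quad \hat{\sigma}^2_{ML} = \frac{1}{n}\left(y - X\hat{w}\right)^T\left(y - X\hat{w}\right)$$

Above is the ML estimate of σ², given the model $y = Xw + \varepsilon$, $\varepsilon \sim N\left(0, \sigma^2 I\right)$.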



The hiking in the woods problem: combining information from various sources

We have gone on a hiking trip and taken with us two GPS devices, one from a European manufacturer, and the other from a US manufacturer. These devices use different satellites for positioning. Our objective is to figure out how to combine the information from the two sensors.

(a 4x1 vector)

Likelihood function

We want to find the position x that maximizes this likelihood.
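One way to write the model and likelihood, under assumptions of ours since the slide's equations are not preserved: each device reports a 2-D position, so stacking the two readings y_a and y_b gives the 4x1 vector y; x is our true 2x1 position; and the two devices have independent Gaussian noise with covariances R_a and R_b. Maximizing the likelihood then gives an inverse-covariance-weighted combination:

```latex
\begin{aligned}
y &= \begin{bmatrix} y_a \\ y_b \end{bmatrix} = C x + \varepsilon, \quad
  C = \begin{bmatrix} I_2 \\ I_2 \end{bmatrix}, \quad
  \varepsilon \sim N(0, R), \quad
  R = \begin{bmatrix} R_a & 0 \\ 0 & R_b \end{bmatrix} \\
p(y \mid x) &= \frac{1}{(2\pi)^{2}\, \lvert R \rvert^{1/2}}
  \exp\left(-\tfrac{1}{2}(y - Cx)^T R^{-1} (y - Cx)\right) \\
\hat{x}_{ML} &= \left(C^T R^{-1} C\right)^{-1} C^T R^{-1} y
  = \left(R_a^{-1} + R_b^{-1}\right)^{-1}\left(R_a^{-1} y_a + R_b^{-1} y_b\right) \\
\operatorname{cov}\left(\hat{x}_{ML}\right) &= \left(R_a^{-1} + R_b^{-1}\right)^{-1}
\end{aligned}
```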



Our most likely location is one that weights the reading from each device by the inverse of that device’s noise covariance. In other words, we should weight each device’s reading by the inverse of its uncertainty: the noisier the device, the less its reading counts.

If we stay still and do not move, the variance in our readings is due only to noise in the devices.

By combining the information from the two devices, we obtain an estimate whose variance is smaller than the variance of either device alone.
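A pure-Python sketch of inverse-covariance weighting for two 2-D readings, assuming independent Gaussian noise with 2x2 covariances Ra and Rb: x_hat = (Ra^-1 + Rb^-1)^-1 (Ra^-1 ya + Rb^-1 yb), with covariance (Ra^-1 + Rb^-1)^-1. The readings, covariances, and helper names are hypothetical:

```python
def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add2(p, q):
    """Elementwise sum of two 2x2 matrices."""
    return [[p[i][j] + q[i][j] for j in range(2)] for i in range(2)]

def matvec(m, v):
    """2x2 matrix times 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def fuse(ya, Ra, yb, Rb):
    """ML combination of two 2-D readings with independent Gaussian noise."""
    Ia, Ib = inv2(Ra), inv2(Rb)
    cov = inv2(add2(Ia, Ib))  # covariance of the fused estimate
    a, b = matvec(Ia, ya), matvec(Ib, yb)
    x_hat = matvec(cov, [a[0] + b[0], a[1] + b[1]])
    return x_hat, cov

# Hypothetical readings (meters) and noise covariances for the two devices.
ya, Ra = [10.0, 4.0], [[4.0, 0.0], [0.0, 1.0]]
yb, Rb = [12.0, 5.0], [[1.0, 0.0], [0.0, 4.0]]
x_hat, cov = fuse(ya, Ra, yb, Rb)
print(x_hat, cov)
```

Here device A is noisy along the first axis and device B along the second, so the fused position (about [11.6, 4.2]) sits closer to B on the first axis and closer to A on the second, and its variance on each axis (0.8) is below that of the better device on that axis (1.0).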



ML estimate



Marc Ernst and Marty Banks (2002) were the first to demonstrate that when our brain makes a decision about a physical property of an object, it combines the various sensory information about that object in a way that is consistent with maximum likelihood state estimation.

Ernst and Banks began by considering a hypothetical situation in which one has to estimate the height of an object. Suppose that you hold an object between your index finger and thumb. Your haptic system and your visual system each report its height.

If the noise in the two sensors is equal, then the weights that you apply to the sensors are equal as well. This case is illustrated in the left column of the next figure. If, on the other hand, the noise is larger for proprioception, your uncertainty is greater for that sensor, and so you apply a smaller weight to its reading (right column of the next figure).
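A one-dimensional sketch of this weighting rule, assuming each sense gives an unbiased Gaussian estimate: each cue is weighted by its inverse variance, w_v = (1/σv²) / (1/σv² + 1/σh²) and w_h = 1 - w_v, and the combined estimate has variance 1 / (1/σv² + 1/σh²), smaller than either cue alone. The heights and variances are hypothetical:

```python
def combine(s_v, var_v, s_h, var_h):
    """ML combination of a visual and a haptic estimate with independent Gaussian noise.
    Returns (estimate, variance, w_v, w_h)."""
    w_v = (1 / var_v) / (1 / var_v + 1 / var_h)  # inverse-variance weight for vision
    w_h = 1 - w_v
    estimate = w_v * s_v + w_h * s_h
    variance = 1 / (1 / var_v + 1 / var_h)
    return estimate, variance, w_v, w_h

# Equal noise: equal weights.
print(combine(5.0, 1.0, 6.0, 1.0))  # (5.5, 0.5, 0.5, 0.5)
# Noisier proprioception (haptic): a smaller weight on its reading.
print(combine(5.0, 1.0, 6.0, 3.0))
```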



Equal uncertainty in vision and proprioception

More uncertain of proprioception



Measuring the noise in a biological sensor

If someone were to ask you to report the height of the object, you would of course not report your belief as a probability distribution. To estimate this distribution, Ernst and Banks acquired a psychometric function, shown in the lower part of the graph. To acquire this function, they presented their subjects with a standard object of height 5.5 cm. They then presented a second object of variable height and asked whether it was taller than the first. If the subject represented the height of the standard object with a maximum likelihood estimate, then the probability of classifying the second object as taller is simply the cumulative probability distribution of that estimate, evaluated at the second object’s height. This is called a psychometric function. The point of subjective equality (PSE) is the height at which this probability is 0.5.
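A sketch of such a psychometric function, assuming the subject's estimate of the standard is Gaussian with mean equal to the PSE and standard deviation σ, so that the probability of answering "taller" is the normal cumulative distribution Φ((h - PSE)/σ). The value σ = 0.4 cm is hypothetical. The steepness of the curve is what reveals the noise: the comparison height at which the probability reaches about 0.84 lies one σ above the PSE.

```python
import math

def psychometric(height, pse, sigma):
    """P(judge the comparison taller than the standard) = Phi((height - pse) / sigma)."""
    return 0.5 * (1.0 + math.erf((height - pse) / (sigma * math.sqrt(2.0))))

pse, sigma = 5.5, 0.4  # hypothetical: unbiased estimate of the 5.5 cm standard
for h in (4.7, 5.1, 5.5, 5.9, 6.3):
    print(h, round(psychometric(h, pse, sigma), 3))
```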



The authors estimated that the noise variance in the haptic sense was four times larger than that in the visual sense.

This implies that in integrating visual and haptic information about an object, the brain should weight the visual information 4 times as much as the haptic information.

To test this, subjects were presented with a standard object for which the haptic information and the visual information indicated different heights.

Subjects would assign a weight of around 0.8 to the visual information and around 0.2 to the haptic information. To estimate these weights, the authors presented a second object (for which the haptic and visual information agreed) and asked which one was taller.
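The weights 0.8 and 0.2 follow from inverse-variance weighting if "four times larger" refers to the noise variance, σh² = 4σv² (had it referred to standard deviations, the visual weight would instead be 16/17, about 0.94). Writing s_v and s_h for the visual and haptic height readings (our notation):

```latex
w_v = \frac{1/\sigma_v^2}{1/\sigma_v^2 + 1/\sigma_h^2}
    = \frac{\sigma_h^2}{\sigma_v^2 + \sigma_h^2}
    = \frac{4\sigma_v^2}{\sigma_v^2 + 4\sigma_v^2} = \frac{4}{5} = 0.8,
\qquad w_h = 1 - w_v = 0.2

\hat{s} = w_v\, s_v + w_h\, s_h = 0.8\, s_v + 0.2\, s_h
```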



Summary

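The “true” underlying process: $y^{*(i)} = w^{*T} x^{(i)}$

What we measured: $y^{(i)} = y^{*(i)} + \varepsilon^{(i)}$, with $\varepsilon^{(i)} \sim N\left(0, \sigma^2\right)$

Our model of the process: $\hat{y}^{(i)} = w^T x^{(i)}$

ML estimate of model parameters, given X:

$$\hat{w}_{ML} = \left(X^T X\right)^{-1} X^T y, \qquad \hat{\sigma}^2_{ML} = \frac{1}{n}\left(y - X\hat{w}_{ML}\right)^T\left(y - X\hat{w}_{ML}\right)$$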

