A Survey on Distance Metric Learning (Part 2)

Gerry Tesauro

IBM T.J.Watson Research Center

Acknowledgement
  • Lecture material shamelessly adapted from the following sources:
    • Kilian Weinberger:
      • “Survey on Distance Metric Learning” slides
      • IBM summer intern talk slides (Aug. 2006)
    • Sam Roweis slides (NIPS 2006 workshop on “Learning to Compare Examples”)
    • Yann LeCun talk slides (CVPR 2005, 2006)
Outline – Part 2
  • Neighborhood Components Analysis (Goldberger et al.), Metric Learning by Collapsing Classes (Globerson & Roweis)
  • Metric Learning for Kernel Regression (Weinberger & Tesauro)
  • Metric learning for RL basis function construction (Keller et al.)
  • Similarity learning for image processing (LeCun et al.)
Neighborhood Component Analysis

Distance metric for visualization and kNN

(Goldberger et al., 2004)

Killing three birds with one stone:

We construct a method for linear dimensionality reduction that generates a meaningful distance metric, optimally tuned for distance-based kernel regression.

Kernel Regression
  • Given training set {(xj , yj), j=1,…,N} where x is a d-dimensional vector and y is real-valued, estimate the value of a test point xi by a weighted average of the samples:

ŷi = Σj≠i kij yj / Σj≠i kij

where kij = kD (xi, xj) is a distance-based kernel function using distance metric D

Choice of Kernel
  • Many functional forms for kij can be used in MLKR; our empirical work uses the Gaussian kernel

kij = exp( −D(xi, xj) / σ² )

where σ is a kernel width parameter (can set σ = 1 w.l.o.g. since we learn D)

The softmax regression estimate is similar to Roweis’ softmax classifier.
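The estimate and kernel above can be combined into a short sketch (the data, metric matrix, and query point below are illustrative stand-ins, not from the talk):

```python
import numpy as np

def kernel_regression(X, y, x_query, M):
    """Nadaraya-Watson estimate with a Gaussian kernel over the
    Mahalanobis distance D(x, x') = (x - x')^T M (x - x'), sigma = 1."""
    diffs = X - x_query                              # (N, d) differences
    D = np.einsum('nd,de,ne->n', diffs, M, diffs)    # squared distances
    k = np.exp(-D)                                   # Gaussian kernel weights
    return k @ y / k.sum()                           # weighted average of samples

# toy usage: with the identity metric this is plain Euclidean kernel regression
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 2.0])
print(kernel_regression(X, y, np.array([1.0]), np.eye(1)))   # ≈ 1.0 (symmetric neighbors)
```

At training time the sum excludes j = i (leave-one-out); for a fresh test point that exclusion is automatic.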


Distance Metric for Nearest Neighbor Regression

Learn a linear transformation that allows us to estimate the value of a test point from its nearest neighbors

Mahalanobis Metric

Distance function is a pseudo-Mahalanobis metric, D(xi, xj) = (xi − xj)ᵀ M (xi − xj) with M = AᵀA. (Generalizes Euclidean distance; M is only positive semidefinite, so distinct points can be distance zero apart.)
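A minimal sketch of this distance (the vectors here are illustrative); the "pseudo" qualifier shows up when A is rank-deficient, since distinct points can then be distance zero apart:

```python
import numpy as np

def mahalanobis_sq(xi, xj, A):
    # D(xi, xj) = (xi - xj)^T (A^T A) (xi - xj) = ||A (xi - xj)||^2
    diff = A @ (xi - xj)
    return float(diff @ diff)

xi, xj = np.array([1.0, 0.0]), np.array([0.0, 0.0])
print(mahalanobis_sq(xi, xj, np.eye(2)))               # identity A: squared Euclidean, 1.0
print(mahalanobis_sq(xi, xj, np.array([[0.0, 1.0]])))  # rank-1 A drops this direction: 0.0
```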

General Metric Learning Objective
  • Find a parameterized distance function Dθ that minimizes the total leave-one-out cross-validation loss L = Σi (ŷi − yi)²
    • e.g. params θ = elements Aij of the A matrix
  • Since we’re solving for A, not M, the optimization is non-convex → use gradient descent
Gradient Computation

∂L/∂A = 4A Σi (ŷi − yi) Σj (ŷi − yj) kij xij xijᵀ / Σl kil

where xij = xi – xj

  • For fast implementation:
    • Don’t sum over all i-j pairs, only go up to ~1000 nearest neighbors for each sample i
    • Maintain nearest neighbors in a heap-tree structure, update heap tree every 15 gradient steps
    • Ignore sufficiently small values of kij (kij < e^−34)
    • Even better data structures: cover trees, k-d trees
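Putting the pieces together, a plain O(N²) NumPy sketch of the loss and gradient, without the speed tricks above (the gradient expression is the one derived from the leave-one-out loss with σ = 1; the data in any usage would be the caller's):

```python
import numpy as np

def mlkr_loss_grad(A, X, y):
    """Leave-one-out loss L = sum_i (yhat_i - y_i)^2 for Gaussian-kernel
    regression under the metric induced by A, plus its gradient w.r.t. A."""
    Z = X @ A.T                                            # transformed inputs
    d = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)     # pairwise ||A x_ij||^2
    K = np.exp(-d)
    np.fill_diagonal(K, 0.0)                               # leave sample i out
    yhat = K @ y / K.sum(1)
    err = yhat - y
    L = float(err @ err)
    # S_ij = err_i * (yhat_i - y_j) * k_ij / sum_l k_il
    S = (err / K.sum(1))[:, None] * (yhat[:, None] - y[None, :]) * K
    # sum_ij S_ij x_ij x_ij^T  =  X^T (diag(row+col sums) - S - S^T) X
    M = np.diag(S.sum(1) + S.sum(0)) - S - S.T
    G = 4.0 * A @ X.T @ M @ X
    return L, G
```

A training loop is then just repeated `A -= lr * G`; the tricks listed above only change how K is assembled, not the math.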
Learned Distance Metric example

[Figure: neighborhoods of a test point, comparing points with orig. Euclidean D < 1 vs. points with learned D < 1]

“Twin Peaks” test

Training: n = 8000
  • we added 3 dimensions with 1000% noise
  • we rotated 5 dimensions randomly

Input Variance

[Figure: per-input-dimension variance, split into Noise vs. Signal dimensions]

Output Variance

[Figure: per-output-dimension variance, split into Signal vs. Noise dimensions]

dimreduction with mlkr
DimReduction with MLKR
  • FG-NET face data: 82 persons, 984 face images w/age
DimReduction with MLKR

PowerManagement data (d=21)

  • Force A to be rectangular
  • Project onto eigenvectors of A
  • Allows visualization of data
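One way to read the projection step as a sketch (the data and learned transform below are random stand-ins for the d = 21 PowerManagement case): take the metric M = AᵀA induced by the learned A and project onto its leading eigenvectors to get 2-D coordinates for plotting:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 21))     # stand-in for the d=21 PowerManagement data
A = rng.normal(size=(21, 21))      # stand-in for a learned transform

M = A.T @ A                        # induced (pseudo-)metric
evals, evecs = np.linalg.eigh(M)   # eigenvalues in ascending order
proj = evecs[:, -2:]               # the two most heavily weighted directions
Z = X @ proj                       # 2-D embedding for visualization
print(Z.shape)                     # (100, 2)
```

Forcing A itself to be rectangular (2 × 21) achieves the same effect directly, with Z = X @ A.T.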
Unity Data Center Prototype

[Diagram: a Resource Arbiter allocating a pool of servers among per-application Application Managers]
  • Objective: Learn long-range resource value estimates for each application manager
  • State Variables (~48):
    • Arrival rate
    • ResponseTime
    • QueueLength
    • iatVariance
    • rtVariance
  • Action: # of servers allocated by Arbiter
  • Reward: SLA(Resp. Time)

Maximize Total SLA Revenue

[Diagram: testbed of 8 xSeries servers running Trade3 on WebSphere 5.1 with DB2 (two instances) plus a Batch application; each application converts demand (HTTP req/sec) and SLA(Resp. Time) into Value(#srvrs) estimates every 5 sec]

(Tesauro, AAAI 2005; Tesauro et al., ICAC 2006)

Power & Performance Management
  • Objective: manage systems to objectives spanning multiple disciplines: minimize Resp. Time and minimize Power Usage
  • State Variables (21):
    • Power Cap
    • Power Usage
    • CPU Utilization
    • Temperature
    • # of requests arrived
    • Workload intensity (# Clients)
    • Response Time
  • Action: Power Cap
  • Reward: SLA(Resp. Time) – Power Usage

(Kephart et al., ICAC 2007)

Metric Learning for RL basis function construction (Keller et al. ICML 2006)
  • RL Dataset of state-action-reward tuples {(si, ai, ri), i=1,…,N}
Value Iteration
  • Define an iterative “bootstrap” calculation:

Vt+1(s) = maxa [ R(s, a) + γ Σs′ P(s′ | s, a) Vt(s′) ]
  • Each round of VI must iterate over all states in the state space
  • Try to speed this up using state aggregation (Bertsekas & Castanon, 1989)
  • Idea: Use NCA to aggregate states:
    • project states into lower-dim rep; keep states with similar Bellman error close together
    • use projected states to define a set of basis functions {φi}
    • learn a linear value function over the basis functions: V = Σi θi φi
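The bootstrap update above can be sketched for a toy tabular MDP (the transition tensor and rewards below are made up for illustration):

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, iters=100):
    """V_{t+1}(s) = max_a [ R(s,a) + gamma * sum_s' P(s'|s,a) V_t(s') ].
    P has shape (A, S, S) with P[a, s, s'] = P(s'|s,a); R has shape (S, A)."""
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        Q = R + gamma * np.einsum('ast,t->sa', P, V)   # backup over successors
        V = Q.max(axis=1)                              # greedy over actions
    return V

# toy 2-state MDP: action 0 = stay, action 1 = switch states
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 1.0],    # state 0: switching pays 1
              [1.0, 0.0]])   # state 1: staying pays 1
print(value_iteration(P, R))  # ≈ [10. 10.]  (geometric sum 1/(1-0.9))
```

Each sweep touches every state, which is exactly the cost that state aggregation via NCA is meant to reduce.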
Chopra et al. 2005

Similarity metric for image verification.

Problem: Given a pair of face images, decide if they are from the same person.


Too difficult for linear mapping!
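Their remedy is a nonlinear (convolutional) siamese embedding trained on image pairs. As a hedged sketch of the pair-based objective only, here is a contrastive-style loss on already-embedded points; this simplified margin form follows later formulations, Chopra et al.'s exact loss differs in detail, and the margin m = 1.0 is an arbitrary illustrative choice:

```python
import numpy as np

def contrastive_loss(z1, z2, same, m=1.0):
    """Pull embeddings of same-person pairs together; push different-person
    pairs at least margin m apart (simplified contrastive form)."""
    d = float(np.linalg.norm(z1 - z2))
    return d ** 2 if same else max(0.0, m - d) ** 2

z_a, z_b = np.array([0.0, 0.0]), np.array([3.0, 4.0])   # hypothetical embedded faces
print(contrastive_loss(z_a, z_b, same=True))    # 25.0: far apart yet same person, big loss
print(contrastive_loss(z_a, z_b, same=False))   # 0.0: already farther than the margin
```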