
Edit Distances

William W. Cohen

Midterm progress reports
  • Talk for 5 min per team
    • You probably want to have one person speak
  • Talk about
    • The problem & dataset
    • The baseline results
    • What you plan to do next
  • Send Brendan 3-4 slides in PDF by Monday night
Plan for this week
  • Edit distances
    • Distance(s,t) = cost of the best edit sequence that transforms s → t
    • Found via…
  • Learning edit distances
    • Probabilistic generative model: pair HMMs
    • Learning now requires EM
      • Detour: EM for plain ol’ HMMs
    • EM for pair HMMs
  • Why EM works
  • Discriminative learning for pair HMMs
Motivation
  • Common problem: classify a pair of strings (s,t) as “these denote the same entity [or similar entities]”
    • Examples:
      • (“Carnegie-Mellon University”, “Carnegie Mellon Univ.”)
      • (“Noah Smith, CMU”, “Noah A. Smith, Carnegie Mellon”)
  • Applications:
    • Co-reference in NLP
    • Linking entities in two databases
    • Removing duplicates in a database
    • Finding related genes
    • “Distant learning”: training NER from dictionaries
Levenshtein distance - example
  • distance(“William Cohen”, “Willliam Cohon”) = 2, e.g. via this alignment:

    s:    W  i  l  l  -  i  a  m     C  o  h  e  n
    t:    W  i  l  l  l  i  a  m     C  o  h  o  n
    op:   c  c  c  c  ins c  c  c  c c  c  c  sub c
    cost: 0  0  0  0  1  0  0  0  0  0  0  0  1  0

  (“-” marks a gap; “c” = copy, “ins” = insert, “sub” = substitute)
Computing Levenshtein distance - 2

D(i,j) = score of the best alignment from s1..si to t1..tj

D(i,j) = min of:
  • D(i-1, j-1) + d(si, tj)   // subst/copy
  • D(i-1, j) + 1             // insert
  • D(i, j-1) + 1             // delete

(simplify by letting d(c,d) = 0 if c = d, and 1 otherwise)

also let D(i,0) = i (for i inserts) and D(0,j) = j

Computing Levenshtein distance – 4

D(i,j) = min of:
  • D(i-1, j-1) + d(si, tj)   // subst/copy
  • D(i-1, j) + 1             // insert
  • D(i, j-1) + 1             // delete

A trace indicates where the min value came from, and can be used to recover the edit operations and/or a best alignment (there may be more than one).
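The recurrence and trace translate directly into a short dynamic program. A minimal sketch in Python (the function name and the copy/subst/insert/delete labels are just illustrative):

    def levenshtein(s, t):
        """Return D(len(s), len(t)) and one best edit sequence via a trace."""
        n, m = len(s), len(t)
        D = [[0] * (m + 1) for _ in range(n + 1)]   # D[i][j]: best cost for s[:i], t[:j]
        for i in range(1, n + 1): D[i][0] = i       # boundary: i inserts
        for j in range(1, m + 1): D[0][j] = j       # boundary: j deletes
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = 0 if s[i - 1] == t[j - 1] else 1        # d(si, tj)
                D[i][j] = min(D[i - 1][j - 1] + d,          # subst/copy
                              D[i - 1][j] + 1,              # insert
                              D[i][j - 1] + 1)              # delete
        ops, i, j = [], n, m                        # trace back from (n, m)
        while i > 0 or j > 0:
            d = 0 if i and j and s[i - 1] == t[j - 1] else 1
            if i and j and D[i][j] == D[i - 1][j - 1] + d:
                ops.append("copy" if d == 0 else "subst"); i -= 1; j -= 1
            elif i and D[i][j] == D[i - 1][j] + 1:
                ops.append("insert"); i -= 1
            else:
                ops.append("delete"); j -= 1
        return D[n][m], ops[::-1]

For example, levenshtein("William Cohen", "Willliam Cohon") returns cost 2 with an edit sequence containing one insert and one subst.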

Extensions
  • Add parameters for differential costs for the delete, substitute, … operations
    • E.g., a “gap cost” G and substitution costs d(x,y)
  • Allow s to match a substring of t (Smith-Waterman)
  • Model the cost of a length-n insertion as A + Bn instead of Gn
    • “Affine distance”
    • Need to remember whether a gap is open in s, t, or neither
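A hedged sketch of the affine-gap variant (a Gotoh-style three-matrix program; the parameter names A and B, and the simplification that a gap in one string never immediately follows a gap in the other, are assumptions of this sketch, not from the slides):

    from math import inf

    def affine_distance(s, t, d, A=1.0, B=0.5):
        """Edit distance with length-n gap cost A + B*n.
        d(x, y) is a substitution-cost function; M/Gs/Gt remember
        whether a gap is open in neither string, in s, or in t."""
        n, m = len(s), len(t)
        M  = [[inf] * (m + 1) for _ in range(n + 1)]  # last op: subst/copy
        Gs = [[inf] * (m + 1) for _ in range(n + 1)]  # gap open in s
        Gt = [[inf] * (m + 1) for _ in range(n + 1)]  # gap open in t
        M[0][0] = 0.0
        for i in range(1, n + 1): Gt[i][0] = A + B * i
        for j in range(1, m + 1): Gs[0][j] = A + B * j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                M[i][j] = d(s[i - 1], t[j - 1]) + min(
                    M[i - 1][j - 1], Gs[i - 1][j - 1], Gt[i - 1][j - 1])
                # extend an existing gap for B, or open a new one for A + B
                Gt[i][j] = min(M[i - 1][j] + A + B, Gt[i - 1][j] + B)
                Gs[i][j] = min(M[i][j - 1] + A + B, Gs[i][j - 1] + B)
        return min(M[n][m], Gs[n][m], Gt[n][m])

E.g., with d = lambda a, b: 0 if a == b else 1 this reduces to Levenshtein-style costs but with affine gaps.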
Forward-backward for HMMs

  • α(t,i): all paths to st = i, and all emissions up to and including t
  • β(t,i): all paths after st = i, and all emissions after t
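A concrete sketch of the two quantities (NumPy; the table names pi, A, B for the start, transition, and emission probabilities are this sketch's assumptions, not the slides'):

    import numpy as np

    def forward(pi, A, B, obs):
        """alpha[t, i]: all paths to st = i, with all emissions up to and including t."""
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha

    def backward(A, B, obs):
        """beta[t, i]: all paths after st = i, with all emissions after t."""
        T, N = len(obs), A.shape[0]
        beta = np.zeros((T, N))
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta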

EM for HMMs

The E-step needs two expected counts, both computable from α and β:

  • Pr(pass thru state i at t and emit a at t), from α(t,i)·β(t,i)
  • Pr(pass thru states i, j at t, t+1 and continue to the end), from α(t,i), the i→j transition and its emission, and β(t+1,j)
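Continuing the sketch above (the forward/backward functions are assumed from the previous block), one EM (Baum-Welch) iteration re-estimates the tables from exactly these two counts:

    import numpy as np

    def baum_welch_step(pi, A, B, obs):
        """One EM iteration; forward/backward come from the sketch above."""
        obs = np.asarray(obs)
        alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
        Z = alpha[-1].sum()                    # Pr(obs | current model)
        gamma = alpha * beta / Z               # gamma[t, i]: pass thru state i at t
        # xi[t, i, j]: pass thru states i, j at t, t+1 (and continue to the end)
        xi = (alpha[:-1, :, None] * A[None, :, :] *
              (B[:, obs[1:]].T * beta[1:])[:, None, :]) / Z
        new_pi = gamma[0]
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for o in range(B.shape[1]):            # expected counts of emitting each symbol
            new_B[:, o] = gamma[obs == o].sum(axis=0)
        new_B /= gamma.sum(axis=0)[:, None]
        return new_pi, new_A, new_B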

Pair HMM Example

Sample run: zT = <h,t>, <e,e>, <e,e>, <h,h>, <e,->, <e,e>

Strings x, y produced by zT: x = heehee, y = teehe

Notice that (x,y) is also produced by z4 + <e,e>, <e,-> and by many other edit strings.

Pair HMM Inference

[Figure: DP lattice for pair-HMM inference, with the characters of the two strings along the axes; α(3,2) marks the forward probability of generating the first 3 characters of one string and the first 2 of the other.]

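A minimal sketch of that forward pass for a one-state pair HMM (a memoryless stochastic transducer in the style of Ristad & Yianilos; the probability tables p_sub, p_del, p_ins and the stop probability p_stop are assumed inputs, with p_sub keyed by character pairs, and all probabilities summing to 1):

    def pair_forward(x, y, p_sub, p_del, p_ins, p_stop):
        """alpha[i][j]: total probability of ALL edit sequences generating
        x[:i] and y[:j] (contrast with Levenshtein's min over sequences)."""
        n, m = len(x), len(y)
        alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
        alpha[0][0] = 1.0
        for i in range(n + 1):
            for j in range(m + 1):
                if i > 0 and j > 0:
                    alpha[i][j] += alpha[i - 1][j - 1] * p_sub[x[i - 1], y[j - 1]]
                if i > 0:
                    alpha[i][j] += alpha[i - 1][j] * p_del[x[i - 1]]
                if j > 0:
                    alpha[i][j] += alpha[i][j - 1] * p_ins[y[j - 1]]
        return alpha[n][m] * p_stop            # Pr(x, y), summed over all alignments

Note that this sums over every edit string that yields (x,y), which is exactly why the resulting similarity behaves differently from edit distance, as the next slide points out.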
EM to learn edit distances
  • Is this really like edit distance? Not really:
    • sim(x,x) ≠ 1
    • Generally sim(x,x) gets smaller as x gets longer
    • Edit distance is based on the single best sequence; Pr(x,y) is based on the weighted cost of all successful edit sequences
  • Will learning work?
    • Unlike linear models, there is no guarantee of global convergence: you might not find a good model even if one exists
Back to R&Y paper...
  • They consider “coarse” and “detailed” models, as well as mixtures of both.
  • The coarse model is like a back-off model: merge edit operations into equivalence classes (e.g., based on equivalence classes for chars).
  • Test by learning a distance for K-NN with an additional latent variable
K-NN with latent prototypes

[Figure: a test example y (a string of phonemes) is compared, via the learned phonetic distance, to possible prototypes x1, x2, x3, …, xm (known word pronunciations), each linked to a word w1, w2, …, wK from the dictionary.]

K-NN with latent prototypes

The method needs (x,y) pairs to train a distance. To handle this, an additional level of E/M is used to pick the “latent prototype” x to pair with each y.

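A hypothetical sketch of the resulting classification rule (the function and container names are illustrative, and the E/M step that picks prototypes during training is omitted):

    def classify(y, prototypes_by_word, dist):
        """Label a test pronunciation y with the dictionary word owning
        the nearest prototype under the learned distance `dist`."""
        # prototypes_by_word: {word: [prototype phoneme strings]}
        return min(prototypes_by_word,
                   key=lambda w: min(dist(x, y) for x in prototypes_by_word[w]))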

Plan for this week
  • Edit distances
    • Distance(s,t) = cost of the best edit sequence that transforms s → t
    • Found via…
  • Learning edit distances: Ristad and Yianilos
    • Probabilistic generative model: pair HMMs
    • Learning now requires EM
      • Detour: EM for plain ol’ HMMs
    • EM for pair HMMs
  • Why EM works
  • Discriminative learning for pair HMMs

EM:

  • X = data
  • θ = model
  • z = something you can’t observe

Problem: maximize Pr(X | θ) = Σz Pr(X, z | θ), where Pr(X, z | θ) is the “complete data likelihood”.

Algorithm: iteratively improve θ1, θ2, … by setting

  θn+1 = argmaxθ E z ~ Pr(z | X, θn) [ log Pr(X, z | θ) ]

Examples of the hidden z:
  • Mixtures: z is the hidden mixture component
  • HMMs: z is the hidden state sequence
  • Pair HMMs: z is the hidden sequence of pairs (x1,y1), … given (x,y)
  • Latent-variable topic models (e.g., LDA): z is the assignment of words to topics
  • …
Jensen’s inequality

If λ1, …, λn ≥ 0 with Σi λi = 1, and f is convex, then

  f(Σi λi xi) ≤ Σi λi f(xi)

(equivalently, f(E[x]) ≤ E[f(x)]; EM applies this with the concave log, which flips the inequality).


X = data, θ = model, z = something you can’t observe.

Let’s think about moving from θn (our current parameter vector) to some new θ (the next one, hopefully better).

We want to improve L(θ) − L(θn), where L(θ) = log Pr(X | θ). Using something like Jensen’s inequality:

  L(θ) − L(θn) = log Σz Pr(z | X, θn) · Pr(X, z | θ) / [ Pr(z | X, θn) Pr(X | θn) ]
               ≥ Σz Pr(z | X, θn) · log { Pr(X, z | θ) / [ Pr(z | X, θn) Pr(X | θn) ] }

The bound equals 0 at θ = θn, so any θ that increases the right-hand side also increases L.
Comments
  • Nice because we often know how to
    • Do learning in the model (if the hidden variables are known)
    • Do inference in the model (to get the hidden variables)
    • And that’s all we need to do…
  • Convergence: local, not global
  • Generalized EM: do the E-step, but rather than fully maximizing in the M-step, just improve θ
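As a toy instance of the recipe, EM for a mixture of two biased coins (every name and number here is illustrative; uniform mixing weights are assumed, and the binomial coefficient cancels in the E-step):

    import numpy as np

    def em_two_coins(counts, flips, iters=50, p=(0.6, 0.5)):
        """counts[k]: heads observed in sequence k of `flips` tosses;
        z (which coin produced each sequence) is the hidden variable."""
        counts = np.asarray(counts, dtype=float)
        p = np.array(p, dtype=float)                 # initial head probabilities
        for _ in range(iters):
            # E-step: posterior responsibility of each coin for each sequence
            like = np.array([q ** counts * (1 - q) ** (flips - counts) for q in p])
            resp = like / like.sum(axis=0)
            # M-step: re-estimate each coin's bias from expected counts
            p = (resp @ counts) / (resp.sum(axis=1) * flips)
        return p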
Key ideas
  • A pair of strings (x,y) is associated with a label: {match, nonmatch}
  • Classification is done by a pair HMM with two non-initial states, {match, nonmatch}, with no transitions between them
  • The model scores alignments (emission sequences) as match or nonmatch
Key ideas

Score the alignment sequence: the edit sequence a is featurized, and its score is log-linear:

  score(a, m; x, y) = exp( Σk λk fk(a, m, x, y) )

Marginalize over all alignments to score match vs. nonmatch:

  Pr(m | x, y) ∝ Σa score(a, m; x, y)

Key ideas
  • To learn, combine EM and CRF learning:
    • compute expectations over the (hidden) alignments
    • use L-BFGS to maximize (or at least improve) the parameters λ
    • repeat…
  • Initialize the model with a “reasonable” set of parameters:
    • hand-tuned parameters for matching strings
    • copy the match parameters to the nonmatch state and shrink them toward zero
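A schematic of that training loop (heavily hedged: expected_features and neg_expected_loglik are hypothetical helpers standing in for the pair-HMM forward-backward computations; only scipy's L-BFGS call is real):

    import numpy as np
    from scipy.optimize import minimize

    def train(pairs, labels, lam0, outer_iters=10):
        lam = np.asarray(lam0, dtype=float)      # "reasonable" hand-tuned init
        for _ in range(outer_iters):
            # E-step: expected feature counts over the hidden alignments
            # (expected_features is a hypothetical helper)
            E = [expected_features(x, y, m, lam)
                 for (x, y), m in zip(pairs, labels)]
            # M-step: improve lambda with L-BFGS on the expected log-likelihood
            # (neg_expected_loglik is likewise hypothetical)
            lam = minimize(neg_expected_loglik, lam,
                           args=(pairs, labels, E), method="L-BFGS-B").x
        return lam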
Results

We will come back to this family of methods in a couple of weeks (discriminatively trained latent-variable models).
