


FST Morphology

Miriam Butt

October 2003

Based on Beesley and Karttunen 2003


Recap

Last Time: Finite State Automata can model almost anything that involves a finite number of states.

We modeled a Coke Machine and saw that it could also be thought of as defining a language.

We will now look at the extension to natural language more closely.



A Three-Word Language

[Figure: a finite-state network whose arcs are labeled with single letters; its paths spell the three words canto, mesa, and tigre.]


Analysis: A Successful Match

[Figure: the three-word network (canto, mesa, tigre), with the input matched letter by letter along a path to a final state.]

Input: mesa


Rejects

The analysis of libro, tigra, cant, mesas will fail.

Why?

Use: for example for a spell checker
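
As a rough illustration of accept and reject, the three-word language can be sketched in Python. This is only a sketch: it builds a plain letter trie rather than the structure-sharing network on the slides, and the names `build_network` and `accepts` are invented for this example.

```python
def build_network(words):
    """Build a letter trie: each state maps a symbol to the next state."""
    arcs = {0: {}}   # state -> {symbol: next_state}; state 0 is the start state
    finals = set()   # states in which a complete word ends
    for word in words:
        state = 0
        for symbol in word:
            if symbol not in arcs[state]:
                new_state = len(arcs)
                arcs[new_state] = {}
                arcs[state][symbol] = new_state
            state = arcs[state][symbol]
        finals.add(state)
    return arcs, finals


def accepts(network, word):
    """Return True only if the whole word leads from the start to a final state."""
    arcs, finals = network
    state = 0
    for symbol in word:
        if symbol not in arcs[state]:
            return False          # no arc for this symbol: reject
        state = arcs[state][symbol]
    return state in finals        # input used up in a non-final state: reject


NETWORK = build_network(["canto", "mesa", "tigre"])
for candidate in ["mesa", "libro", "tigra", "cant", "mesas"]:
    print(candidate, accepts(NETWORK, candidate))
```

A spell checker built this way would flag every string for which `accepts` returns `False`.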


Transducers: Beyond Accept and Reject

[Figure: the three-word network redrawn as a transducer. Each arc carries two symbols, an upper-side letter and a lower-side letter, for the words tigre, canto, and mesa.]


Transducers: Beyond Accept and Reject

Analysis Process:

  • Start at the Start State

  • Match the symbols of the input string against the lower-side symbols on the arcs, consuming the input symbols and finding a path to a final state.

  • If successful, return the string of upper-side symbols on the path as the result.

  • If unsuccessful, return nothing.
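
The analysis steps above can be sketched in a few lines of Python. The sketch assumes a deterministic network without empty symbols, in which every arc pairs a letter with itself. The arc layout `(source, upper, lower, target)` and the name `lookup` are choices made for this example, not notation from Beesley and Karttunen.

```python
# Each arc is (source_state, upper_symbol, lower_symbol, target_state).
ARCS = []
FINAL_STATES = set()
next_state = 1
for word in ["canto", "mesa", "tigre"]:
    state = 0                       # every word starts at the start state
    for letter in word:
        ARCS.append((state, letter, letter, next_state))
        state = next_state
        next_state += 1
    FINAL_STATES.add(state)


def lookup(surface):
    """Match the input against lower-side symbols; return the upper-side string."""
    state = 0
    upper = []
    for symbol in surface:
        for source, up, low, target in ARCS:
            if source == state and low == symbol:
                upper.append(up)    # collect the upper-side symbol on the path
                state = target
                break
        else:
            return None             # no arc matches: analysis fails
    return "".join(upper) if state in FINAL_STATES else None


print(lookup("mesa"))
print(lookup("mesas"))
```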


A Two-Level Transducer

[Figure: the two-level network for tigre, canto, and mesa, with an upper-side and a lower-side symbol on each arc.]

Input: mesa

Output: mesa


A Lexical Transducer

[Figure: one path through the network. Upper side: c a n t a r +Verb +PresInd +1P +Sg. Lower side: c a n t 0 0 0 0 o 0.]

Input: canto

Output: cantar+Verb+PresInd+1P+Sg

One Possible Path through the Network


A Lexical Transducer

[Figure: a second path through the network. Upper side: c a n t o +Noun +Masc +Sg. Lower side: c a n t o 0 0 0.]

Input: canto

Output: canto+Noun+Masc+Sg

Another Possible Path through the Network

(found through backtracking)
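
How backtracking finds both analyses of canto can be sketched as a recursive search. This is a minimal sketch under stated assumptions: the two paths are stored as separate branches from the start state rather than as one compressed network, "0" is treated as the empty symbol that consumes no input, and the names `build_arcs` and `lookup` are invented for this example.

```python
EPSILON = "0"  # treated here as the empty symbol: it matches no input

# Each path is (upper-side symbols, lower-side symbols), aligned arc by arc.
PATHS = [
    ("c a n t a r +Verb +PresInd +1P +Sg".split(), "c a n t 0 0 0 0 o 0".split()),
    ("c a n t o +Noun +Masc +Sg".split(), "c a n t o 0 0 0".split()),
]


def build_arcs(paths):
    """Give each path its own branch starting at state 0."""
    arcs, finals, next_state = {}, set(), 1
    for upper, lower in paths:
        state = 0
        for up, low in zip(upper, lower):
            arcs.setdefault(state, []).append((up, low, next_state))
            state = next_state
            next_state += 1
        finals.add(state)
    return arcs, finals


def lookup(arcs, finals, surface):
    """Return every upper-side string whose lower side spells the input."""
    results = []

    def walk(state, position, upper_so_far):
        if state in finals and position == len(surface):
            results.append("".join(upper_so_far))
        for up, low, target in arcs.get(state, []):
            emitted = [] if up == EPSILON else [up]
            if low == EPSILON:                      # consume no input
                walk(target, position, upper_so_far + emitted)
            elif position < len(surface) and surface[position] == low:
                walk(target, position + 1, upper_so_far + emitted)
        # returning from walk is the backtracking step

    walk(0, 0, [])
    return results


ARCS, FINALS = build_arcs(PATHS)
print(lookup(ARCS, FINALS, "canto"))
```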


Why Transducer?

General Definition: Device that converts energy from one form to another.

In this Context: Device that converts one string of symbols into another string of symbols.


The Tags

Tags or Symbols like +Noun or +Verb are arbitrary: the naming convention is determined by the (computational) linguist and depends on the larger picture (type of theory/type of application).

One very successful tagging/naming convention is the Penn Treebank Tag Set


The Tags

What kind of Tags might be useful?


Generation vs. Analysis

The same finite state transducers we have been using for the analysis of a given surface string can also be used in reverse: for generation.

The XRCE people think of analysis as lookup and of generation as lookdown.


Generation --- Lookdown

[Figure: the same path used in reverse. Upper side: c a n t a r +Verb +PresInd +1P +Sg. Lower side: c a n t 0 0 0 0 o 0.]

Input: cantar+Verb+PresInd+1P+Sg

Output: canto


Generation --- Lookdown

Generation Process:

  • Start at the Start State and the beginning of the input string

  • Match the symbols of the input string against the upper-side symbols on the arcs, consuming the input symbols and finding a path to a final state.

  • If successful, return the string of lower-side symbols on the path as the result.

  • If generation is unsuccessful, return nothing.
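
The generation steps above mirror lookup with the two sides swapped, as the following sketch suggests. It assumes the input has already been split into symbols (so that a tag such as +Verb counts as one symbol), that "0" stands for the empty symbol, and that each path is stored as its own branch. The names `build_arcs` and `lookdown` are invented for this example.

```python
EPSILON = "0"  # treated here as the empty symbol

# Each path is (upper-side symbols, lower-side symbols), aligned arc by arc.
PATHS = [
    ("c a n t a r +Verb +PresInd +1P +Sg".split(), "c a n t 0 0 0 0 o 0".split()),
    ("c a n t o +Noun +Masc +Sg".split(), "c a n t o 0 0 0".split()),
]


def build_arcs(paths):
    """Give each path its own branch starting at state 0."""
    arcs, finals, next_state = {}, set(), 1
    for upper, lower in paths:
        state = 0
        for up, low in zip(upper, lower):
            arcs.setdefault(state, []).append((up, low, next_state))
            state = next_state
            next_state += 1
        finals.add(state)
    return arcs, finals


def lookdown(arcs, finals, symbols):
    """Match the input against upper-side symbols; collect lower-side symbols."""
    results = []

    def walk(state, position, lower_so_far):
        if state in finals and position == len(symbols):
            results.append("".join(lower_so_far))
        for up, low, target in arcs.get(state, []):
            emitted = [] if low == EPSILON else [low]   # empty symbols add nothing
            if up == EPSILON:
                walk(target, position, lower_so_far + emitted)
            elif position < len(symbols) and symbols[position] == up:
                walk(target, position + 1, lower_so_far + emitted)

    walk(0, 0, [])
    return results


ARCS, FINALS = build_arcs(PATHS)
print(lookdown(ARCS, FINALS, "c a n t a r +Verb +PresInd +1P +Sg".split()))
print(lookdown(ARCS, FINALS, "c a n t o +Noun +Masc +Sg".split()))
```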


Tokenization

General String Conversion: Tokenizer

Task: Divide up running text into individual tokens

  • couldn’t -> could not

  • to and fro -> to-and-fro

  • ,,, -> ,

  • The farmer walked -> The^TBfarmer^TBwalked
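
The conversions listed above can be imitated with ordinary string rewriting. This is only a sketch: a real tokenizer in this framework would be a transducer, whereas the code below uses Python string operations as a stand-in. The two small tables are hypothetical, and the multiword entry assumes the intended phrase is "to and fro".

```python
import re

TOKEN_BOUNDARY = "^TB"
CONTRACTIONS = {"couldn't": "could not"}   # hypothetical one-entry table
MULTIWORDS = {"to and fro": "to-and-fro"}  # hypothetical one-entry table


def tokenize(text):
    """Rewrite running text into tokens separated by the ^TB marker."""
    text = text.replace("\u2019", "'")          # normalize curly apostrophes
    for contraction, expansion in CONTRACTIONS.items():
        text = text.replace(contraction, expansion)
    for phrase, token in MULTIWORDS.items():
        text = text.replace(phrase, token)      # keep multiword units together
    text = re.sub(r",+", ",", text)             # collapse runs of commas
    return TOKEN_BOUNDARY.join(text.split())    # whitespace becomes ^TB


print(tokenize("The farmer walked"))
print(tokenize("couldn't"))
print(tokenize(",,,"))
```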


Sharing Structure and Sets

Networks can be compressed quite cleverly (p. 16-17).

Sets:

FST Networks are based on formal language theory. This includes basics of set theory:

membership, union, intersection, subtraction, complementation


Some Basic Concepts/Representations

Empty Language

Empty-String Language


Some Basic Concepts/Representations

A Network for an Infinite Language [Figure: network with an arc labeled a]

A Network for the Universal Language [Figure: network with an arc labeled ?]


Relations

Ordered Set: members are ordered.

Ordered Pair: <A,B> vs. <B,A>

  • Relation: set whose members are ordered pairs

    • Family Trees (p. 21)

    • { <“cantar+Verb+PresInd+1P+Sg”, “canto”>, <“canto+Noun+Masc+Sg”, “canto”>, ... }


Relations

An Infinite Relation: [Figure: network whose arcs pair each uppercase letter with its lowercase counterpart: A/a, B/b, C/c, ..., Z/z.]

Identity Relation: <“canto”, “canto”>


Basic Set Operations

  • union (p. 24)

  • intersection (p. 25)

  • subtraction (p. 25)
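
For finite languages, these operations behave like ordinary set operations, which Python's built-in `set` type can illustrate. The two word lists are made up for this example. Complementation is only shown as a membership test, because the complement of a finite language is infinite and cannot be listed.

```python
nouns = {"canto", "mesa", "tigre"}      # hypothetical noun language
verb_forms = {"canto", "cantar"}        # hypothetical verb-form language

union = nouns | verb_forms              # strings in either language
intersection = nouns & verb_forms       # strings in both languages
subtraction = nouns - verb_forms        # strings in the first but not the second

empty_language = set()                  # contains no strings at all
empty_string_language = {""}            # contains exactly one string: the empty string


def in_complement(language, string):
    """Membership in the complement: any string not in the language."""
    return string not in language


print(sorted(union))
print(sorted(intersection))
print(sorted(subtraction))
print(len(empty_language), len(empty_string_language))
print(in_complement(nouns, "libro"))
```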


Concatenation

One can also concatenate two existing languages (finite state networks) with one another to build up new words productively/dynamically.

This works nicely, but one has to write extra rules to avoid things like: *trys, *tryed, though trying is okay.


Concatenation

[Figure: Network for the Language {“work”}, a single path with the arcs w, o, r, k.]

[Figure: Network for the Language {“s”, “ed”, “ing”}, three branches with the arcs s, e-d, and i-n-g.]


Concatenation

[Figure: Concatenation of the two networks: the path w, o, r, k followed by the branches s, e-d, and i-n-g.]

What strings/language does this result in?
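
One way to check the answer is to treat each language as a Python set and pair every string of the first with every string of the second. This is a sketch on finite sets, not on networks, and the function name `concatenate` is chosen for this example.

```python
def concatenate(first, second):
    """Every string of the first language followed by every string of the second."""
    return {x + y for x in first for y in second}


suffixes = {"s", "ed", "ing"}
print(sorted(concatenate({"work"}, suffixes)))

# The same operation overgenerates for a stem like "try",
# which is why extra rules are needed.
print(sorted(concatenate({"try"}, suffixes)))
```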


Composition

Composition is an operation on two relations.

Composition of the two relations <x,y> and <y,z> yields <x, z>

Example: <“cat”, “chat”> with <“chat”, “Katze”> gives <“cat”, “Katze”>
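
On finite relations, composition can be written directly from this definition. The sketch below stores a relation as a Python set of pairs; the name `compose` is chosen for this example.

```python
def compose(first, second):
    """Pair x with z whenever <x, y> is in the first relation and <y, z> in the second."""
    return {(x, z) for (x, y1) in first for (y2, z) in second if y1 == y2}


english_french = {("cat", "chat")}
french_german = {("chat", "Katze")}
print(compose(english_french, french_german))
```

Pairs whose middle strings do not match contribute nothing to the result.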


Composition

[Figure: two networks, one relating “cat” to “chat” and one relating “chat” to “Katze”.]


Composition

[Figure: Merging the two networks: “cat”, the shared intermediate string “chat”, and “Katze”.]


Composition

[Figure: The Composition of the Networks: a single network relating “cat” directly to “Katze”.]

What is this reminiscent of?


Composition + Rule Application

Composition can be used to encode phonology-style rules.

E.g., Lexical Phonology assumes several iterations/levels at which different kinds of rules apply.

This can be modeled in terms of cascades of finite state transducers (p. 35).

These can then be composed together into one single transducer.
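
The idea of a cascade collapsed into one mapping can be sketched with ordinary functions. Everything here is an assumption made for illustration: the three toy spelling rules are not taken from Beesley and Karttunen, "+" is used as a hypothetical morpheme boundary, and plain function composition stands in for transducer composition.

```python
import re


def y_to_ie_before_s(form):
    """try+s -> trie+s (only when a consonant precedes the y)."""
    return re.sub(r"(?<=[^aeiou])y\+s$", "ie+s", form)


def y_to_i_before_ed(form):
    """try+ed -> tri+ed (only when a consonant precedes the y)."""
    return re.sub(r"(?<=[^aeiou])y\+ed$", "i+ed", form)


def drop_boundary(form):
    """Remove the morpheme boundary marker."""
    return form.replace("+", "")


def compose(*rules):
    """Collapse a cascade of rules into one function, applied in the given order."""
    def cascade(form):
        for rule in rules:
            form = rule(form)
        return form
    return cascade


spell_out = compose(y_to_ie_before_s, y_to_i_before_ed, drop_boundary)
for lexical in ["try+s", "try+ed", "try+ing", "work+s"]:
    print(lexical, "->", spell_out(lexical))
```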


Next Time

  • Begin working with xfst

  • Now: Exercises 1.10.1, 1.10.2, 1.10.4

  • Optional: 1.10.3

