Finite-State Transducers

Shallow Processing Techniques for NLP, Ling570, October 10, 2011.


Presentation Transcript


  1. Finite-State Transducers Shallow Processing Techniques for NLP Ling570 October 10, 2011

  2. Announcements • Wednesday online • GP meeting scheduling • Seminar on Friday: Luke Zettlemoyer (CSE) • Automatic grammar induction • Treehouse Friday: Classifiers – Memory Lane

  3. Roadmap • Motivation: • FST applications • FST perspectives • FSTs and Regular Relations • FST Operations

  4. FSTs • Finite automaton that maps between two strings • Automaton with two labels per arc • input:output

  5. FST Applications • Tokenization • Segmentation • Morphological analysis • Transliteration • Parsing • Translation • Speech recognition • Spoken language understanding….

  9. Approaches to FSTs • FST as recognizer: • Takes a pair of input:output strings • Accepts if the pair is in the language, otherwise rejects • FST as generator: • Outputs pairs of strings in the language • FST as translator: • Reads an input string and prints an output string • FST as set relator: • Computes relations between sets
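
The four perspectives can be sketched on one toy transducer. The following is only an illustration: the one-state machine with a single a:b arc, and the names `ARCS`, `translate`, `accepts`, `generate` and `image`, are assumptions made for this example and do not come from the slides.

```python
from itertools import count

# Hypothetical toy transducer: state 0 is both start and final,
# with one looping arc labeled a:b.
ARCS = {(0, "a"): ("b", 0)}  # (state, input symbol) -> (output symbol, next state)
START, FINALS = 0, {0}

def translate(x):
    """Translator view: read x and return the output string, or None if x is rejected."""
    state, out = START, []
    for sym in x:
        if (state, sym) not in ARCS:
            return None
        o, state = ARCS[(state, sym)]
        out.append(o)
    return "".join(out) if state in FINALS else None

def accepts(x, y):
    """Recognizer view: is the pair (x, y) in the relation?"""
    return translate(x) == y

def generate():
    """Generator view: enumerate pairs, shortest first (inputs of this toy are a^n)."""
    for n in count():
        x = "a" * n
        yield x, translate(x)

def image(strings):
    """Set-relator view: map a set of inputs to the set of related outputs."""
    return {translate(x) for x in strings if translate(x) is not None}
```

Because this toy machine is deterministic, the recognizer can be written in terms of the translator; a nondeterministic transducer would need a search over paths instead.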

  12. FSTs & Regular Relations • FSAs: equivalent to regular languages • FSTs: equivalent to regular relations • Sets of pairs of strings • Regular relations: • For all (x,y) in Σ1 × Σ2, {(x,y)} is a regular relation • The empty set is a regular relation • If R1, R2 are regular relations, • R1R2, R1 ∪ R2 and R1* are regular relations
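
The inductive definition can be mimicked on finite sets of string pairs. This is a sketch under assumptions: relations are plain Python sets, the function names are invented for illustration, and the Kleene star is truncated to a fixed number of repetitions because the true R* is infinite.

```python
def concat(r1, r2):
    """R1R2: concatenate inputs with inputs and outputs with outputs."""
    return {(x1 + x2, y1 + y2) for x1, y1 in r1 for x2, y2 in r2}

def union(r1, r2):
    """R1 ∪ R2."""
    return r1 | r2

def star(r, max_reps):
    """Kleene star, truncated to at most max_reps repetitions of r."""
    result = {("", "")}
    layer = {("", "")}
    for _ in range(max_reps):
        layer = concat(layer, r)
        result |= layer
    return result

# Base cases: singleton relations {(x, y)} over symbol pairs.
r1 = {("a", "b")}
r2 = {("c", "d")}
```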

  17. Regular Relation Closures • By definition, regular relations are closed under: • Concatenation: R1R2 • Union: R1 ∪ R2 • Kleene *: R1* • Like regular languages • Unlike regular languages, they are NOT closed under: • Intersection: R1 = {(a^n b*, c^n)} & R2 = {(a* b^n, c^n)}, intersection is {(a^n b^n, c^n)} => not regular • Difference • Complementation
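
The intersection counterexample can be checked on bounded fragments of R1 and R2. This is a sanity check rather than a proof; the bound `N` and the helper names are assumptions for illustration. The result is not regular because its input side, {a^n b^n}, is not a regular language.

```python
def r1(max_n, max_k):
    """Fragment of R1 = {(a^n b^k, c^n)}: as many c's as a's, any number of b's."""
    return {("a" * n + "b" * k, "c" * n)
            for n in range(max_n + 1) for k in range(max_k + 1)}

def r2(max_k, max_n):
    """Fragment of R2 = {(a^k b^n, c^n)}: as many c's as b's, any number of a's."""
    return {("a" * k + "b" * n, "c" * n)
            for k in range(max_k + 1) for n in range(max_n + 1)}

N = 4  # arbitrary bound on exponents
both = r1(N, N) & r2(N, N)

# Only pairs with equal numbers of a's, b's and c's survive the intersection.
expected = {("a" * n + "b" * n, "c" * n) for n in range(N + 1)}
```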

  21. Regular Relation Closures • Regular relations are also closed under: • Composition: • Inversion: • Operations: • Projection: • Identity & cross-product of regular languages
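
The formulas for these operations did not survive in the transcript. As a hedged sketch using their standard set-theoretic meanings on finite relations (function names are illustrative assumptions), projection, identity and cross-product look like this:

```python
def project_input(r):
    """Projection onto the input side: the set of all x with (x, y) in r."""
    return {x for x, _ in r}

def project_output(r):
    """Projection onto the output side: the set of all y with (x, y) in r."""
    return {y for _, y in r}

def identity(language):
    """Identity relation of a language: every string paired with itself."""
    return {(w, w) for w in language}

def cross_product(l1, l2):
    """Cross-product of two languages: every string of l1 paired with every string of l2."""
    return {(x, y) for x in l1 for y in l2}

r = {("cats", "cat s"), ("dogs", "dog s")}
```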

  26. FST Formal Definition • A Finite-State Transducer is a 6-tuple: • A finite set of states: Q • A finite set of input symbols: Σ • A finite set of output symbols: Γ • A finite set of initial states: I • A finite set of final states: F • A set of transition relations between states: • δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q • FSAs are a special case of FSTs
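
The tuple above maps directly onto a small data structure. The sketch below is one possible encoding, not the course's implementation: the class and field names are invented, the empty string is assumed to stand for ε, and `fsa_as_fst` shows one common reading of "FSAs are a special case" (an FSA viewed as an FST whose arcs are labeled x:x).

```python
from dataclasses import dataclass

EPS = ""  # assumption: the empty string encodes ε

@dataclass(frozen=True)
class FST:
    states: frozenset           # Q
    input_alphabet: frozenset   # Σ
    output_alphabet: frozenset  # Γ
    initial: frozenset          # I
    final: frozenset            # F
    transitions: frozenset      # δ: tuples (source, input, output, target)

    def is_well_formed(self):
        """Check I ⊆ Q, F ⊆ Q and δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q."""
        if not (self.initial <= self.states and self.final <= self.states):
            return False
        return all(
            p in self.states and q in self.states
            and (a == EPS or a in self.input_alphabet)
            and (b == EPS or b in self.output_alphabet)
            for p, a, b, q in self.transitions
        )

def fsa_as_fst(states, alphabet, initial, final, arcs):
    """View an FSA with arcs (source, symbol, target) as an FST with x:x labels."""
    return FST(frozenset(states), frozenset(alphabet), frozenset(alphabet),
               frozenset(initial), frozenset(final),
               frozenset((p, a, a, q) for p, a, q in arcs))
```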

  28. FST Operations • Union: • Concatenation:

  30. FST Operations • Inversion: Switching input and output labels • If T maps from I to O, T⁻¹ maps from O to I • Composition: • If T1 is a transducer from I1 to O2 and T2 is a transducer from O2 to O3, then T1 ∘ T2 is a transducer from I1 to O3
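
At the level of relations, both operations are short. The following sketch works on finite sets of pairs; the example relations `fold` and `tag` are made up for illustration.

```python
def invert(r):
    """Inversion: swap input and output in every pair."""
    return {(y, x) for x, y in r}

def compose(r1, r2):
    """Composition: relate x to z whenever some y has (x, y) in r1 and (y, z) in r2."""
    return {(x, z) for x, y in r1 for y2, z in r2 if y == y2}

# Hypothetical toy relations: case folding, then a word-to-tag lookup.
fold = {("He", "he"), ("SAID", "said")}
tag = {("he", "PRO"), ("said", "VERB")}
```

Composing `fold` with `tag` maps the original spellings straight to tags, which is the sense in which a cascade of transducers can be collapsed into one.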

  35. FST Examples • R(T) = {(ε,ε), (a,b), (aa,bb), (aaa,bbb), …} • R(T) = {(a,x), (ab,xy), (abb,xyy), …}
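
The transducer diagrams are not part of the transcript, so the machines below are guesses that happen to produce the two relations listed: a single state with an a:b loop, and a two-state machine with an a:x arc followed by a b:y loop. The function `pairs` (a name chosen for this sketch) enumerates the relation along paths of bounded length.

```python
def pairs(arcs, start, finals, max_arcs):
    """Enumerate (input, output) pairs accepted along paths of at most max_arcs arcs."""
    found = set()
    frontier = {(start, "", "")}  # (state, input so far, output so far)
    for _ in range(max_arcs + 1):
        found |= {(x, y) for state, x, y in frontier if state in finals}
        frontier = {(q, x + a, y + b)
                    for state, x, y in frontier
                    for p, a, b, q in arcs if p == state}
    return found

# Assumed machine for the first relation: state 0 is start and final, loop a:b.
t1 = [(0, "a", "b", 0)]

# Assumed machine for the second relation: 0 --a:x--> 1, loop b:y on final state 1.
t2 = [(0, "a", "x", 1), (1, "b", "y", 1)]
```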

  38. FST Application Examples • Case folding: • He said → he said • Tokenization: • “He ran.” → “ He ran . ” • POS tagging: • They can fish → PRO VERB NOUN
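
Case folding is the simplest of these to write as a transducer: one state, with a looping arc for every symbol. The sketch below assumes a small ASCII alphabet and invented names (`ALPHABET`, `ARCS`, `case_fold`); it illustrates the idea and is not a production tokenizer.

```python
import string

# One-state transducer: every arc loops on the single state and is labeled
# input:output. Uppercase letters map to lowercase, other symbols to themselves.
ALPHABET = string.ascii_letters + " .'"
ARCS = {ch: ch.lower() for ch in ALPHABET}

def case_fold(text):
    """Run the one-state transducer over text, rejecting unknown symbols."""
    out = []
    for ch in text:
        if ch not in ARCS:
            raise ValueError(f"symbol {ch!r} is outside the toy alphabet")
        out.append(ARCS[ch])
    return "".join(out)
```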

  39. FST Application Examples • Pronunciation: • B AH T EH R → B AH DX EH R • Morphological generation: • Fox s → Foxes • Morphological analysis: • cats → cat s

  45. FST Algorithms • Recognition: • Is a given string pair (x,y) accepted by the FST? • (x,y) → yes/no • Composition: • Given a pair of transducers T1 and T2, create a new transducer T1 ∘ T2. • Transduction: • Given an input string and an FST, compute the output string. • x → y
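
Recognition and transduction can both be written as searches over configurations of a possibly nondeterministic transducer. This is a minimal sketch under stated assumptions: arcs are tuples (source, input, output, target), symbols are single characters, the empty string encodes ε, and `transduce` assumes there is no cycle made only of ε-input arcs (otherwise it may not terminate). The function names are illustrative.

```python
from collections import deque

EPS = ""  # assumption: the empty string encodes ε

def recognize(arcs, initial, final, x, y):
    """Return True if some path from an initial to a final state reads x and writes y."""
    start = {(q, 0, 0) for q in initial}  # (state, position in x, position in y)
    seen, queue = set(start), deque(start)
    while queue:
        state, i, j = queue.popleft()
        if state in final and i == len(x) and j == len(y):
            return True
        for p, a, b, q in arcs:
            if p != state:
                continue
            if a != EPS and x[i:i + 1] != a:
                continue
            if b != EPS and y[j:j + 1] != b:
                continue
            nxt = (q, i + len(a), j + len(b))
            if nxt not in seen:  # visited set keeps ε:ε cycles from looping forever
                seen.add(nxt)
                queue.append(nxt)
    return False

def transduce(arcs, initial, final, x):
    """Return the set of all outputs the transducer can produce for input x."""
    outputs = set()
    stack = [(q, 0, "") for q in initial]
    while stack:
        state, i, out = stack.pop()
        if state in final and i == len(x):
            outputs.add(out)
        for p, a, b, q in arcs:
            if p == state and (a == EPS or x[i:i + 1] == a):
                stack.append((q, i + len(a), out + b))
    return outputs
```

Transduction returns a set because a nondeterministic FST may relate one input to several outputs, or to none.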

  49. WFST Definition • A Probabilistic Finite-State Automaton is a 9-tuple: • A finite set of states: Q • A finite set of input symbols: Σ • A finite set of output symbols: Γ • A finite set of initial states: I • A finite set of final states: F • A set of transitions: δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q • Initial state probabilities: Q → R+ • Transition probabilities: δ → R+ • Final state probabilities: Q → R+
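
Given these three weight functions, a natural score for a string pair is the sum, over all accepting paths, of the initial weight times the transition weights times the final weight. The slides do not spell out this formula, so the sketch below is an assumption in the usual probabilistic style; the name `pair_weight`, the dictionary encoding, and the restriction to machines without ε:ε arcs are all choices made for this example.

```python
from functools import lru_cache

EPS = ""  # assumption: the empty string encodes ε

def pair_weight(arcs, init_w, final_w, x, y):
    """Sum over accepting paths for (x, y) of initial * transition * final weights.

    arcs:    dict mapping (source, input, output, target) -> weight
    init_w:  dict mapping state -> initial weight
    final_w: dict mapping state -> final weight
    """
    if any(a == EPS and b == EPS for (_, a, b, _) in arcs):
        raise ValueError("this sketch does not handle ε:ε arcs")

    @lru_cache(maxsize=None)
    def suffix(state, i, j):
        # Total weight of completing (x[i:], y[j:]) starting from this state.
        total = final_w.get(state, 0.0) if (i == len(x) and j == len(y)) else 0.0
        for (p, a, b, q), w in arcs.items():
            if p != state:
                continue
            if a != EPS and x[i:i + 1] != a:
                continue
            if b != EPS and y[j:j + 1] != b:
                continue
            total += w * suffix(q, i + len(a), j + len(b))
        return total

    return sum(w0 * suffix(q, 0, 0) for q, w0 in init_w.items())

# Hypothetical one-state machine: "a" becomes "b" with weight 0.5 or "c" with 0.25.
arcs = {(0, "a", "b", 0): 0.5, (0, "a", "c", 0): 0.25}
init_w = {0: 1.0}
final_w = {0: 0.25}
```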

  50. Summary • FSTs • Equivalent to regular relations • Transduce strings to strings • Useful for range of applications
