
Finite state transducer (FST)

Finite state transducer (FST). LING 570 Fei Xia Week 3: 10/10/2007. Applications of FSTs. ASR Tokenization Stemmer Text normalization Parsing …. Outline. Regular relation Finite-state transducer (FST) Hw3 Carmel: an FST package. Regular relation. Definition of regular relation.



  1. Finite state transducer (FST) LING 570 Fei Xia Week 3: 10/10/2007

  2. Applications of FSTs • ASR • Tokenization • Stemmer • Text normalization • Parsing • …

  3. Outline • Regular relation • Finite-state transducer (FST) • Hw3 • Carmel: an FST package

  4. Regular relation

  5. Definition of regular relation • The set of regular relations is defined as follows: • For all (x, y) ∈ (Σ ∪ {ε}) × (Γ ∪ {ε}), where Σ and Γ are the input and output alphabets, {(x, y)} is a regular relation • The empty set is a regular relation • If R1, R2 are regular relations, so are R1 · R2 = {(x1 x2, y1 y2) | (x1, y1) ∈ R1, (x2, y2) ∈ R2}, R1 ∪ R2, and R1*. • Nothing else is a regular relation.

  6. Closure properties • Like regular languages, regular relations are closed under • union • concatenation • Kleene closure • Unlike regular languages, regular relations are not closed under • intersection • difference • complementation

  7. Closure properties (cont) • New operations for regular relations: • Composition: R1 ∘ R2 = {(x, z) | (x, y) ∈ R1, (y, z) ∈ R2} • Projection: extract the x or the y from each (x, y) pair • Inversion: switch the x and y in (x, y) pairs • Identity: take a regular language and create the identity regular relation • Cross product: take two regular languages and create the cross product relation
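
The operations on this slide can be tried out on small finite relations. The following is only a sketch: it represents a relation as a Python set of (x, y) string pairs, which covers finite relations only, and all function names are made up for illustration.

```python
# Finite relations as Python sets of (input, output) string pairs.
# This only illustrates the definitions; real regular relations may be infinite.

def concat(r1, r2):
    """R1 . R2 = {(x1 x2, y1 y2) | (x1, y1) in R1, (x2, y2) in R2}."""
    return {(x1 + x2, y1 + y2) for (x1, y1) in r1 for (x2, y2) in r2}

def compose(r1, r2):
    """{(x, z) | (x, y) in R1, (y, z) in R2}."""
    return {(x, z) for (x, y) in r1 for (y2, z) in r2 if y == y2}

def project(r, side):
    """side=0 extracts the x's, side=1 extracts the y's."""
    return {pair[side] for pair in r}

def invert(r):
    """Switch x and y in every pair."""
    return {(y, x) for (x, y) in r}

def identity(lang):
    """Identity relation of a (finite) language."""
    return {(w, w) for w in lang}

def cross_product(l1, l2):
    """Every string of l1 paired with every string of l2."""
    return {(x, y) for x in l1 for y in l2}

r1 = {("a", "x"), ("ab", "xy")}
r2 = {("x", "1"), ("xy", "12")}
print(compose(r1, r2))   # the pairs ('a', '1') and ('ab', '12')
print(project(r1, 1))    # the strings 'x' and 'xy'
```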

  8. Finite state transducer

  9. Finite-state transducers • x:y is a notation for a mapping between two alphabets: x is an input symbol and y is an output symbol. • An FST processes an input string and produces another string as its output. • Finite-state automata correspond to regular languages, and FSTs correspond to regular relations. • Ex: R = { (a^n, b^n) | n >= 0 } is a regular relation. It maps a string of a’s into an equal-length string of b’s

  10. An FST T (figure not reproduced: states q0 and q1, arcs labeled a:x and b:y) R(T) = { (a, x), (ab, xy), (abb, xyy), … }

  11. Definition of FST An FST is (Q, Σ, Γ, I, F, δ) • Q: a finite set of states • Σ: a finite set of input symbols • Γ: a finite set of output symbols • I: the set of initial states • F: the set of final states • δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q: the transition relation between states. ⇒ An FSA can be seen as a special case of an FST

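  12. Definition of transduction • The extended transition relation δ* is the smallest set such that (q, ε, ε, q) ∈ δ* for every state q, and if (q, x, y, q') ∈ δ* and (q', a, b, q'') is in the transition relation, then (q, xa, yb, q'') ∈ δ* • T transduces a string x into a string y if there exists a path from an initial state to a final state whose input is x and whose output is y: (i, x, y, f) ∈ δ* for some i ∈ I and f ∈ F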
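
As a rough illustration of this definition, the sketch below searches for all paths whose input side spells x and collects their outputs. The class name, the arc tuple layout, and the use of the empty string for ε are assumptions made for this example; it also assumes single-character symbols and no cycle of ε-input arcs that emits output, since such a cycle would make the search run forever. The example machine is one reading of the FST on slide 10 (q0 initial, q1 final, a:x from q0 to q1, b:y looping on q1), chosen because it yields the relation R(T) given there.

```python
from collections import deque

EPS = ""  # assumption: epsilon is represented by the empty string

class FST:
    def __init__(self, initial, final, arcs):
        self.initial = set(initial)
        self.final = set(final)
        self.arcs = list(arcs)  # each arc is (src, in_symbol, out_symbol, dst)

    def transduce(self, x):
        """Return the set of all outputs y such that (x, y) is in R(T)."""
        results = set()
        seen = set()
        queue = deque((q, 0, "") for q in self.initial)
        while queue:
            item = queue.popleft()
            if item in seen:
                continue
            seen.add(item)
            state, pos, out = item
            # A path counts only if it consumed all of x and ends in a final state.
            if pos == len(x) and state in self.final:
                results.add(out)
            for src, a, b, dst in self.arcs:
                if src != state:
                    continue
                if a == EPS:                        # consume no input
                    queue.append((dst, pos, out + b))
                elif pos < len(x) and x[pos] == a:  # consume one input symbol
                    queue.append((dst, pos + 1, out + b))
        return results

# One reading of the slide-10 FST (topology assumed from R(T)).
t = FST(initial={"q0"}, final={"q1"},
        arcs=[("q0", "a", "x", "q1"), ("q1", "b", "y", "q1")])
print(t.transduce("abb"))  # {'xyy'}
print(t.transduce("ba"))   # set()
```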

  13. More FST examples • Lowercase a string of any length • Tokenize a string • Ex: he said:“Go away.” → he said : “ Go away . ” • Convert a word to its morpheme sequence • Ex: cats → cat s • POS tagging: • Ex: He called Mary → PN V N • Map Arabic numerals to words • Ex: 123 → one hundred and twenty three

  14. Operations on FSTs • Union: T1 ∪ T2 • Concatenation: T1 · T2 • Composition: T1 ∘ T2

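  15. An example of the composition operation (figures not reproduced) • First FST: states q0 and q1, arcs labeled a:x and b:y • Second FST: state q0, arcs labeled x:ε and y:z • Composed FST: states q0 and q1, arcs labeled a:ε and b:z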
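
The composition example can be reproduced with a simple product construction for unweighted FSTs. This is a sketch, not the algorithm any particular toolkit uses: the dictionary layout and the function name are invented for the example, ε is represented by the empty string, and the arc topology and final states of the two input machines are assumptions, since only the arc labels of the figure survive.

```python
EPS = ""  # assumption: epsilon is represented by the empty string

def compose_fst(t1, t2):
    """Product construction for unweighted FSTs.

    Each FST is a dict with keys "initial", "final" (sets of states)
    and "arcs" (iterable of (src, in_symbol, out_symbol, dst)).
    """
    init = {(p, q) for p in t1["initial"] for q in t2["initial"]}
    arcs, seen, todo = set(), set(init), list(init)
    while todo:
        p, q = todo.pop()
        steps = []
        for s1, a, b, d1 in t1["arcs"]:
            if s1 != p:
                continue
            if b == EPS:
                # t1 emits nothing, so t2 stays where it is.
                steps.append((a, EPS, (d1, q)))
            else:
                # t1's output symbol must match t2's input symbol.
                for s2, b2, c, d2 in t2["arcs"]:
                    if s2 == q and b2 == b:
                        steps.append((a, c, (d1, d2)))
        for s2, b2, c, d2 in t2["arcs"]:
            if s2 == q and b2 == EPS:
                # t2 reads nothing, so t1 stays where it is.
                steps.append((EPS, c, (p, d2)))
        for a, c, dst in steps:
            arcs.add(((p, q), a, c, dst))
            if dst not in seen:
                seen.add(dst)
                todo.append(dst)
    final = {(p, q) for (p, q) in seen
             if p in t1["final"] and q in t2["final"]}
    return {"initial": init, "final": final, "arcs": arcs}

# Assumed topology: a:x goes q0 -> q1, b:y loops on q1; x:eps and y:z loop on q0.
t1 = {"initial": {"q0"}, "final": {"q1"},
      "arcs": [("q0", "a", "x", "q1"), ("q1", "b", "y", "q1")]}
t2 = {"initial": {"q0"}, "final": {"q0"},
      "arcs": [("q0", "x", EPS, "q0"), ("q0", "y", "z", "q0")]}
composed = compose_fst(t1, t2)
for arc in sorted(composed["arcs"]):
    print(arc)
```

Under these assumptions the result has two arcs, a:ε and b:z, matching the labels of the composed FST on the slide. With weighted FSTs the ε handling above would count some paths more than once, so a weighted implementation needs more care.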

  16. FST Algorithms • Recognition: Is a given pair of strings accepted by an FST? • (x, y) → yes/no • Composition: Given two FSTs T1 and T2 defining regular relations R1 and R2, create the FST that computes the composition of R1 and R2. • R1 = {(x, y)}, R2 = {(y, z)} → {(x, z) | (x, y) ∈ R1, (y, z) ∈ R2} • Transduction: Given an input string and an FST, provide the output as defined by the regular relation. • x → y
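
Recognition can be sketched as a search over triples (state, position in x, position in y). Because there are only finitely many such triples, the search terminates even when the FST has ε arcs. The function name and the arc layout are assumptions for this example, and ε is again written as the empty string.

```python
def recognize(initial, final, arcs, x, y):
    """Return True if the FST accepts the pair (x, y).

    arcs is an iterable of (src, in_symbol, out_symbol, dst);
    the empty string stands for epsilon on either side.
    """
    stack = [(q, 0, 0) for q in initial]
    seen = set(stack)
    while stack:
        state, i, j = stack.pop()
        if i == len(x) and j == len(y) and state in final:
            return True
        for src, a, b, dst in arcs:
            if src != state:
                continue
            # The arc must match the next input and output symbols
            # (an empty symbol always matches and consumes nothing).
            if not x.startswith(a, i) or not y.startswith(b, j):
                continue
            nxt = (dst, i + len(a), j + len(b))
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False

# Same assumed machine as the slide-10 example.
arcs = [("q0", "a", "x", "q1"), ("q1", "b", "y", "q1")]
print(recognize({"q0"}, {"q1"}, arcs, "abb", "xyy"))  # True
print(recognize({"q0"}, {"q1"}, arcs, "ab", "x"))     # False
```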

  17. Weighted FSTs A weighted FST is (Q, Σ, Γ, I, F, δ, P) • Q: a finite set of states • Σ: a finite set of input symbols • Γ: a finite set of output symbols • I: Q → R+ (initial-state probabilities) • F: Q → R+ (final-state probabilities) • δ ⊆ Q × (Σ ∪ {ε}) × (Γ ∪ {ε}) × Q: the transition relation between states. • P: δ → R+ (transition probabilities)

  18. An example: build a unigram tagger P(t1 … tn | w1 … wn) ≈ P(t1 | w1) * … * P(tn | wn) Training time: Collect (word, tag) counts, and store P(t | w) in an FST. Test time: in order to choose the best tag sequence, • create an FSA for the input sentence • compose it with the FST • choose the best path in the new FST
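
Because the approximation factorizes over positions, the best path through the composed FST picks the most probable tag for each word independently. The sketch below computes that result directly in Python without building any FST; the function names, the toy training data, and the "UNK" fallback for unseen words are assumptions for illustration and are not part of the slides.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Collect (word, tag) counts and return P(tag | word) as nested dicts."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    probs = {}
    for word, tag_counts in counts.items():
        total = sum(tag_counts.values())
        probs[word] = {t: c / total for t, c in tag_counts.items()}
    return probs

def tag(probs, words, unknown_tag="UNK"):
    """Pick argmax_t P(t | w) for each word; unknown_tag is a made-up fallback."""
    return [max(probs[w], key=probs[w].get) if w in probs else unknown_tag
            for w in words]

# Toy training data (invented for this example).
training = [
    [("they", "PRO"), ("can", "AUX"), ("fish", "VERB")],
    [("they", "PRO"), ("can", "AUX"), ("fish", "NOUN")],
    [("fish", "VERB"), ("swim", "VERB")],
]
model = train(training)
print(tag(model, ["they", "can", "fish"]))  # ['PRO', 'AUX', 'VERB']
```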

  19. Summary • Finite state transducers specify regular relations • FST closure properties: union, concatenation, composition • FST special operations: • creating regular relations from regular languages (Id, crossproduct); • creating regular languages from regular relations (projection) • FST algorithms • Recognition • Transduction • Composition • … • Not all FSTs can be determinized. • Weighted FSTs are used often in NLP.

  20. Hw3

  21. Part III: Creating a unigram POS tagger using FSTs • Input: w1 w2 … wn • Output: w1/t1 w2/t2 … wn/tn • Training data: w1/t1 w2/t2 … wn/tn

  22. Major steps • Training time: create an FST from the training data: • calc_unigram_prob.sh: create “word tag prob cnt” • create_fst.sh: create an FST from the unigram_voc • Test time: • Preprocessing: preproc.sh • Decoding (finding the best path): run carmel with some options • Postprocessing: postproc.sh • Calculate tagging accuracy • Write a wrapper

  23. Carmel

  24. The format of FSA / FST
      final_state
      (from_state1 (to_state1 "input_symbol" "output_symbol"? weight?)* )
      (from_state2 (to_state2 "input_symbol" "output_symbol"? weight?)* )
      …
      • A state can be a number or a string. • The from_state in the first edge-line is the start state. • ε is represented as *e* • output_symbol and weight are optional.
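
To make the format concrete, the sketch below writes a table of P(tag | word) values as a one-state WFST following the layout described on this slide. It has not been checked against Carmel itself; the function name, the state name "S", and the choice of one edge-line per arc are assumptions for illustration.

```python
def to_fst_text(probs, state="S"):
    """Render P(tag | word) as text in the slide's FST format.

    probs maps word -> {tag: probability}. The single state is both the
    start state (from_state of the first edge-line) and the final state.
    """
    lines = [state]  # first line: the final state
    for word in sorted(probs):
        for tag in sorted(probs[word]):
            weight = probs[word][tag]
            # (from_state (to_state "input_symbol" "output_symbol" weight))
            lines.append(f'({state} ({state} "{word}" "{tag}" {weight}))')
    return "\n".join(lines)

print(to_fst_text({"fish": {"NOUN": 0.25, "VERB": 0.75}}))
# S
# (S (S "fish" "NOUN" 0.25))
# (S (S "fish" "VERB" 0.75))
```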

  25. An FSA example: fsa1 (figure not reproduced; states 0, 1, 2, 3, 4, 5)

  26. A WFSA example: wfsa1

  27. A WFST example: wfst1

  28. To use Carmel • carmel fst1 fst2 => return a new fst, which composes fst1 and fst2. • carmel -k N wfst1 => return the N most probable paths • carmel -Ok N wfst1 => return the N most probable output strings

  29. To use Carmel (cont) • cat input_file | carmel -sli fst1 • create a foo_fst that corresponds to the first line in input_file • carmel foo_fst fst1 • Ex: input_file is "they" "can" "fish" • cat input_file | carmel -sri fst1 • create a foo_fst that corresponds to the first line in input_file • carmel fst1 foo_fst • Ex: input_file is "PRO" "AUX" "VERB" • cat input_file | carmel -b -sli fst1
