
Finite-state automata 2 Day 13

Finite-state automata 2 Day 13. LING 681.02 Computational Linguistics Harry Howard Tulane University. Course organization. http://www.tulane.edu/~ling/NLP/ NLTK is installed on the computers in this room! How would you like to use the Provost's $150? SLP §2.2 Finite-state automata.


Presentation Transcript


  1. Finite-state automata 2 Day 13 LING 681.02 Computational Linguistics Harry Howard Tulane University

  2. Course organization • http://www.tulane.edu/~ling/NLP/ • NLTK is installed on the computers in this room! • How would you like to use the Provost's $150? LING 681.02, Prof. Howard, Tulane University

  3. SLP §2.2 Finite-state automata 2.2.1 Sheeptalk

  4. Find your files >>> import sys >>> sys.path.append("/Users/harryhow/Documents/Work/Research/Sims/NLTK")

  5. Run program >>> import fsaproc >>> test = 'baaa!' >>> test = 'baaa!$' >>> fsaproc.machine(test)

  6. Go over print-out

  7. Key points • D-recognize is a simple table-driven interpreter. • The algorithm is universal for all unambiguous regular languages. • To change the machine, you simply change the table. • Crudely, therefore, matching strings with regular expressions (à la Perl, grep, etc.) is a matter of: • translating the regular expression into a machine (a table) and • passing the table and the string to an interpreter.
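The table-driven idea can be sketched in a few lines of Python. This is a minimal illustration, not the course's fsaproc module: the state numbering, the dictionary layout of the table, and the name d_recognize are assumptions made for the example. The machine encoded is sheeptalk, /baa+!/.

```python
# Transition table for sheeptalk /baa+!/ (state numbers are an assumption).
# A missing entry means "no legal move from this state on this symbol".
SHEEPTALK = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
ACCEPT = {4}


def d_recognize(tape, table, accept, start=0):
    """Return True if the deterministic machine in `table` accepts `tape`."""
    state = start
    for symbol in tape:
        # No table entry for this symbol: the machine is stuck, so reject.
        if symbol not in table[state]:
            return False
        state = table[state][symbol]
    # End of tape: accept only if the machine stopped in a final state.
    return state in accept


print(d_recognize("baaa!", SHEEPTALK, ACCEPT))   # accepted
print(d_recognize("baaa!$", SHEEPTALK, ACCEPT))  # rejected: no move on "$"
```

Recognizing a different language means passing a different table; the function itself stays the same.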

  8. Recognition as search • You can view this algorithm as a kind of state-space search. • States are pairings of tape positions and state numbers. • The goal state is a pairing of the end-of-tape position with a final accept state.

  9. SLP §2.2 Finite-state automata 2.2.2 Formal languages

  10. Generative Formalisms • Formal languages are sets of strings composed of symbols from a finite set of symbols. • Finite-state automata define formal languages (without having to enumerate all the strings in the language). • The term "generative" is based on the view that you can run the machine as a generator to get strings from the language.

  11. Generative Formalisms • An FSA can be viewed from two perspectives, as: • an acceptor that can tell you if a string is in the language. • a generator that produces all and only the strings in the language.
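To make the generator perspective concrete, the same kind of table can be run "forwards" to enumerate strings. The sketch below assumes the dictionary table layout used here for illustration and a hypothetical function name, generate. Because sheeptalk is an infinite language, the enumeration is cut off at a maximum length.

```python
from collections import deque

# Sheeptalk /baa+!/ as a transition table (layout is an assumption).
SHEEPTALK = {
    0: {"b": 1},
    1: {"a": 2},
    2: {"a": 3},
    3: {"a": 3, "!": 4},
    4: {},
}
ACCEPT = {4}


def generate(table, accept, max_len, start=0):
    """Yield every accepted string of at most `max_len` symbols, shortest first."""
    queue = deque([(start, "")])
    while queue:
        state, prefix = queue.popleft()
        if state in accept:
            yield prefix
        if len(prefix) < max_len:
            # Follow every arc out of this state, extending the string so far.
            for symbol, target in sorted(table[state].items()):
                queue.append((target, prefix + symbol))


print(list(generate(SHEEPTALK, ACCEPT, 6)))
# ['baa!', 'baaa!', 'baaaa!']
```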

  12. SLP §2.2 Finite-state automata 2.2.4 Determinism

  13. Determinism • A deterministic FSA has one unique thing to do at each point in processing. • I.e., there are no choices.

  14. Non-determinism

  15. Non-determinism cont. • Epsilon transitions • An arc has no symbol on it, represented as ε. • Such a transition does not examine or advance the tape during recognition.
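One way to see what an ε-arc does is to compute the set of states reachable without reading any input. In the sketch below, the empty string as the ε marker, the set-valued table layout, and the ε-arc from state 3 back to state 2 are all assumptions chosen for illustration.

```python
EPSILON = ""  # assumed marker for an arc that carries no symbol

# Hypothetical sheeptalk variant: state 3 may jump back to state 2
# without examining or advancing the tape.
NFA = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {3}},
    3: {"!": {4}, EPSILON: {2}},
    4: {},
}


def epsilon_closure(states, table):
    """Return all states reachable from `states` using only epsilon arcs."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in table[state].get(EPSILON, set()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return closure


print(epsilon_closure({3}, NFA))  # states 2 and 3: being in 3 also means being in 2
```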

  16. SLP §2.2 Finite-state automata 2.2.5 Use of a nFSA to accept strings

  17. Read on your own • pp. 33-5

  18. SLP §2.2 Finite-state automata 2.2.6 Recognition as search

  19. Non-deterministic recognition: Search • In an ND FSA there is at least one path through the machine for a string that is in the language defined by the machine. • But not all paths through the machine for an accepted string lead to an accept state. • No path through the machine leads to an accept state for a string that is not in the language.

  20. Non-deterministic recognition • So success in non-deterministic recognition occurs when a path is found through the machine that ends in an accept state. • Failure occurs when all of the possible paths for a given string lead to failure.
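The success and failure conditions can be sketched as a search over (state, tape position) pairs. This is an illustration under assumptions, not the textbook's ND-recognize verbatim: the table layout and names are hypothetical, the machine has no ε-arcs, and the non-determinism is placed at state 2, which has two arcs labelled "a".

```python
# Non-deterministic sheeptalk: from state 2, "a" may loop or move on to state 3.
NFA = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},
    3: {"!": {4}},
    4: {},
}
ACCEPT = {4}


def nd_recognize(tape, table, accept, start=0):
    """Return True if at least one path through the machine accepts `tape`."""
    agenda = [(start, 0)]  # unexplored (state, tape position) pairs
    seen = set()
    while agenda:
        state, pos = agenda.pop()
        if (state, pos) in seen:
            continue
        seen.add((state, pos))
        if pos == len(tape):
            if state in accept:
                return True  # success: one accepting path is enough
            continue
        for target in table[state].get(tape[pos], ()):
            agenda.append((target, pos + 1))
    return False  # failure: every possible path has been exhausted


print(nd_recognize("baaa!", NFA, ACCEPT))
print(nd_recognize("ba!", NFA, ACCEPT))
```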

  21. Example [Figure: input tape b a a a ! with the machine states q0 q1 q2 q2 q3 q4 traced along it]

  22. Example

  23. Example

  24. Example

  25. Example

  26. Example

  27. Example

  28. Example

  29. Example

  30. Key points • States in the search space are pairings of tape positions and states in the machine. • By keeping track of as yet unexplored states, a recognizer can systematically explore all the paths through the machine given an input.

  31. Ordering of states • But how do you keep track? • Depth-first/last in first out (LIFO)/stack • Unexplored states are added to the front of the agenda, so the most recently added state is explored next. • Breadth-first/first in first out (FIFO)/queue • Unexplored states are added to the back of the agenda, so the oldest unexplored state is explored next.
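The only difference between the two strategies is where new search states are put on the agenda. The sketch below records the order in which (state, tape position) pairs are visited under each policy. The machine, the table layout, and the name explore are assumptions made for illustration.

```python
from collections import deque

# Non-deterministic sheeptalk: state 2 has two arcs labelled "a".
NFA = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},
    3: {"!": {4}},
    4: {},
}
ACCEPT = {4}


def explore(tape, table, accept, start=0, depth_first=True):
    """Return (accepted, order), where order lists search states as visited."""
    agenda = deque([(start, 0)])
    order = []
    while agenda:
        state, pos = agenda.popleft()  # always explore the front of the agenda
        order.append((state, pos))
        if pos == len(tape):
            if state in accept:
                return True, order
            continue
        for target in sorted(table[state].get(tape[pos], ())):
            if depth_first:
                agenda.appendleft((target, pos + 1))  # stack: newest first
            else:
                agenda.append((target, pos + 1))      # queue: oldest first
    return False, order


print(explore("baaa!", NFA, ACCEPT, depth_first=True)[1])
print(explore("baaa!", NFA, ACCEPT, depth_first=False)[1])
```

On this input both strategies accept, but the depth-first run follows one path until it dead-ends before backing up, while the breadth-first run advances all paths one tape position at a time.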

  32. SLP §2.2 Finite-state automata 2.2.7 Comparison

  33. Equivalence • Non-deterministic machines can be converted to deterministic ones with a fairly simple construction. • That means that they have the same power: • non-deterministic machines are not more powerful than deterministic ones in terms of the languages they can accept.
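A standard way to do this conversion is the subset construction, in which each state of the deterministic machine stands for a set of states of the non-deterministic one. The slides do not say which construction they have in mind, so the sketch below is one possibility. It assumes a machine without ε-arcs, and the table layout and function names are hypothetical.

```python
# Non-deterministic sheeptalk: state 2 has two arcs labelled "a".
NFA = {
    0: {"b": {1}},
    1: {"a": {2}},
    2: {"a": {2, 3}},
    3: {"!": {4}},
    4: {},
}
ACCEPT = {4}


def determinize(table, accept, start=0):
    """Subset construction for an NFA without epsilon arcs."""
    start_set = frozenset([start])
    dfa = {}
    pending = [start_set]
    while pending:
        current = pending.pop()
        if current in dfa:
            continue
        dfa[current] = {}
        # Every symbol that some member state can read.
        symbols = {s for state in current for s in table[state]}
        for symbol in symbols:
            target = frozenset(
                t for state in current for t in table[state].get(symbol, ())
            )
            dfa[current][symbol] = target
            pending.append(target)
    # A subset is accepting if it contains at least one NFA accept state.
    dfa_accept = {subset for subset in dfa if subset & accept}
    return dfa, dfa_accept, start_set


def accepts(tape, dfa, dfa_accept, start_set):
    """Run the deterministic machine produced by determinize()."""
    state = start_set
    for symbol in tape:
        if symbol not in dfa[state]:
            return False
        state = dfa[state][symbol]
    return state in dfa_accept


dfa, dfa_accept, start_set = determinize(NFA, ACCEPT)
print(len(dfa))  # number of deterministic states
print(accepts("baaa!", dfa, dfa_accept, start_set))
```

The choice between looping and moving on at state 2 disappears: the deterministic machine has a single state standing for "in state 2 or state 3".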

  34. Why bother? • Non-determinism doesn’t get us more formal power and it causes headaches, so why bother? • More natural (understandable) solutions.

  35. Next time • SLP §2.3 briefly • SLP §3
