Digital State Machines: Finite Automata & Regular Languages
Chapter Outline
• Introduction
• Finite-State Automata
• Regular Languages and Finite-State Automata
• Summary
Veton Këpuska
Introduction: Finite State Automata
• The finite-state automaton (FSA) is one of the most significant tools of computational linguistics. Its variations:
  • Finite-state transducers,
  • Hidden Markov Models, and
  • N-gram grammars
  are important components of speech recognition and synthesis, spell-checking, and information-extraction applications.
• FSA theory was developed at the beginning of computer science as a model of abstract computing machines, a line of work pioneered by Alan Turing.
• FSAs are devices that accept (recognize) or reject an input stream of characters.
• FSAs are very efficient in terms of speed and memory.
• The most frequent use of finite-state automata is searching for words or phrases.
• Additional application areas include:
  • Morphological parsing,
  • Part-of-speech annotation, and
  • Speech processing and recognition.
Example of Finite State Automata
This FSA accepts (recognizes) or generates strings such as ac, abc, abbc, abbbc, abbbbbbbbbbc, etc.: an a, followed by any number of b's (including none), followed by a c.
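As a minimal sketch, the example automaton can be written as a transition dictionary, assuming it reads an a, any number of b's, then a c (the pattern implied by the listed strings). The state names `start`, `after_a`, and `accept` are hypothetical; the original figure may label its states differently. A missing dictionary entry plays the role of a missing arc, so the string is rejected.

```python
# Hypothetical state names; a missing (state, symbol) key means "no arc".
TRANSITIONS = {
    ("start", "a"): "after_a",
    ("after_a", "b"): "after_a",  # self-loop: any number of b's
    ("after_a", "c"): "accept",
}
ACCEPTING = {"accept"}


def accepts(word: str) -> bool:
    """Return True if reading `word` from "start" ends in an accepting state."""
    state = "start"
    for symbol in word:
        state = TRANSITIONS.get((state, symbol))
        if state is None:  # no arc labeled `symbol`: reject
            return False
    return state in ACCEPTING


for w in ["ac", "abc", "abbbc", "abbbbbbbbbbc", "a", "abcb", "bc"]:
    print(w, accepts(w))
```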
Introduction: D-FSA vs. ND-FSA
• Adding non-determinism to FSAs does not allow us to define any language that cannot be defined by deterministic FSAs.
• Why, then, bother with ND-FSAs?
  • Describing an application with ND-FSAs can be substantially more efficient.
  • ND-FSAs allow us to program solutions to problems in a higher-level language.
  • This program is then compiled, by an algorithm covered in this chapter, into a deterministic FSA that can be executed on a conventional computer.
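The compilation step mentioned above is usually done with the subset construction. The sketch below is the textbook version of that technique, assuming an ND-FSA without ε-moves (the chapter's algorithm may handle those as well); each DFA state is a set of ND-FSA states. The NFA used in the example (strings over {0, 1} ending in 01) is a hypothetical illustration.

```python
from itertools import chain


def subset_construction(nfa, start, accepting, alphabet):
    """Convert an NFA (no epsilon-moves) to an equivalent DFA.

    nfa maps (state, symbol) -> iterable of next states; missing keys mean no move.
    Each DFA state is a frozenset of NFA states; only reachable subsets are built.
    """
    dfa_start = frozenset([start])
    dfa, seen, todo = {}, {dfa_start}, [dfa_start]
    while todo:
        current = todo.pop()
        for symbol in alphabet:
            # Union of all NFA moves from every state in `current` on `symbol`.
            target = frozenset(
                chain.from_iterable(nfa.get((q, symbol), ()) for q in current)
            )
            dfa[(current, symbol)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    # A subset is accepting if it contains at least one accepting NFA state.
    dfa_accepting = {s for s in seen if s & accepting}
    return dfa, dfa_start, dfa_accepting


def run_dfa(dfa, start, accepting, word):
    state = start
    for symbol in word:
        state = dfa[(state, symbol)]
    return state in accepting


# Hypothetical NFA for binary strings ending in 01: it "guesses" where 01 starts.
NFA = {("p", "0"): {"p", "q"}, ("p", "1"): {"p"}, ("q", "1"): {"r"}}
DFA, START, FINAL = subset_construction(NFA, "p", {"r"}, "01")
print(len({s for s, _ in DFA}))  # number of reachable DFA states
```

Here only three of the eight possible subsets of {p, q, r} are reachable, which is typical: the construction builds subsets on demand rather than all of them.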
Finite State Automata: An Informal Description of Finite State Automata
Finite Automata
• We study an extended example of a real-world problem whose solution uses finite automata.
• We investigate protocols that support "electronic money": files that
  • a customer can use to pay for goods on the Internet, and
  • a seller can receive with assurance that the "money" is real. The seller must know that the file has not been forged, nor copied and sent to the seller while the customer retains a copy of the same file to spend again.
• Nonforgeability of the file must be guaranteed by a third party (a bank) and by a cryptographic policy.
  • Encryption of the money files ensures that forgery is not a problem.
• The bank must also keep a database of all the valid money it has issued, so that:
  • It can verify to a store that the file it has received represents real money and can be credited to the store's account.
• Encryption is not addressed here, as it is beyond the scope of this class.
Finite Automata
• Nevertheless, to use electronic money, protocols must be devised that allow the money to be manipulated in the variety of ways users want.
• Monetary systems always invite fraud, and the protocol must verify whatever policy is adopted regarding how money is used.
• The solution must ensure that the only things that can happen are the things we intend to happen: an unscrupulous user must not be able to steal from others or to "manufacture" money.
The Ground Rules
• The participants:
  • The customer
  • The store
  • The bank
• For simplicity, only one money file is in existence.
• The customer may:
  • Pay, which initiates transfer of "his" money file to the store, or
  • Cancel the transfer, effectively asking the bank to place the money back in the customer's account.
• The store may:
  • Ship goods to the customer, or
  • Redeem the money, effectively asking the bank to transfer the money to the store's account.
• The bank may:
  • Transfer the money by creating a new, suitably encrypted money file and sending it to the store.
The Protocol
• The customer:
  • Assume that the customer cannot be relied upon to act responsibly.
  • The customer may try to copy the money file, use the same money file to pay several times, or both.
• The bank:
  • Assume that the bank behaves responsibly, or it could not be a bank.
  • It must ensure that two stores cannot both redeem the same money file.
  • It will not allow a money file to be both canceled and redeemed.
• The store:
  • Will not ship goods until it is sure it has been given valid money.
The Protocol
• FSAs can represent protocols such as the one being discussed.
• States represent each possible situation ("state") that each participant could be in.
  • A state remembers which important events have happened,
  • and also which ones have not yet happened.
• Transitions occur between states when one of the five events described previously (pay, cancel, ship, redeem, transfer) occurs.
FSAs for Money Transfer Example
Bank:
• The start state is state 1:
  • The bank has issued a money file, and
  • no requests have been made to either redeem it or cancel it.
• Cancel request:
  • The bank restores the money and enters state 2.
  • The bank cannot leave state 2, since it cannot allow the same money to be canceled again or to be spent by the customer.
• Redeem request:
  • The bank enters state 3, and
  • initiates the transfer; upon completion it enters state 4.
  • In state 4 it will no longer accept cancel or redeem requests, nor will it perform any other transactions involving this particular money file.
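A minimal sketch of the bank automaton as described on this slide, encoded as a dictionary from (state, action) to next state. Only the three transitions named above are included; the action names are strings chosen for illustration, and any pair not listed has no arc, so the automaton "dies" on it (the next slides discuss why that is a problem).

```python
# Only the transitions described on the slide; any other (state, action) pair has
# no arc, so the automaton "dies" on it.
BANK = {
    (1, "cancel"): 2,    # money restored to the customer's account
    (1, "redeem"): 3,    # store asks the bank to redeem the money file
    (3, "transfer"): 4,  # bank finishes transferring the money to the store
}


def run(delta, start, actions):
    """Return the final state, or None if the automaton dies on some action."""
    state = start
    for action in actions:
        state = delta.get((state, action))
        if state is None:
            return None
    return state


print(run(BANK, 1, ["redeem", "transfer"]))  # 4
print(run(BANK, 1, ["cancel", "redeem"]))    # None: no redeem arc out of state 2
```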
FSAs for Money Transfer Example
Store: the store's procedures are assumed to be imperfect.
• The start state is state a.
• Pay request:
  • The customer orders the goods by performing the pay action.
  • The store enters state b and initiates both the shipping and the redemption processes.
• Ship and redeem requests:
  • The store may ship and redeem in either order, entering state c or state d,
  • then performs the remaining action (redeem/transfer or ship) and enters state e or f.
Customer:
• Pay and cancel requests: the customer can perform them any number of times and in any order.
Enabling Automata to Ignore Actions
Missing transitions:
• The store is not affected by a "cancel" action.
• According to the formal definition of an FSA (next section), whenever an input X is received, the automaton must follow an arc labeled X from the state it is in to a new state.
• The store FSA must therefore be augmented with transitions that correspond to "cancel" actions.
Effects of unexpected actions:
• Suppose the customer executes the "pay" action a second time while the store is in state e.
• Since the store automaton has no arc for the pay action in that state, this causes the FSA to "die".
The two kinds of actions that must be ignored by the FSAs:
• Actions that are irrelevant to the participant involved:
  • For the store FSA: "cancel".
  • For the bank FSA: "pay" and "ship".
  • For the customer FSA: "ship", "redeem", and "transfer".
• Actions that must not be allowed to kill an automaton:
  • For the store FSA: the customer's second "pay" or "cancel" actions.
  • For the bank FSA: the store's multiple "redeem" actions.
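"Ignoring" an action can be modeled as adding a self-loop, an arc from a state back to itself. The sketch below applies that idea to the bank transitions from the earlier slide, assuming only the two cases listed here: pay and ship are irrelevant in every state, and repeated redeem requests are ignored once the bank is in state 3 or 4. The helper name `ignore_actions` is hypothetical.

```python
def ignore_actions(delta, states, actions):
    """Return a copy of `delta` with a self-loop (q, a) -> q added for each listed
    state q and action a that does not already have a transition."""
    completed = dict(delta)
    for q in states:
        for a in actions:
            completed.setdefault((q, a), q)
    return completed


def run(delta, start, actions):
    """Final state, or None if the automaton dies (no arc) on some action."""
    state = start
    for action in actions:
        state = delta.get((state, action))
        if state is None:
            return None
    return state


BANK = {(1, "cancel"): 2, (1, "redeem"): 3, (3, "transfer"): 4}
bank = ignore_actions(BANK, [1, 2, 3, 4], ["pay", "ship"])  # irrelevant to the bank
bank = ignore_actions(bank, [3, 4], ["redeem"])            # repeated redeems ignored

print(run(bank, 1, ["pay", "redeem", "ship", "redeem", "transfer", "redeem"]))  # 4
```

Using `setdefault` means existing arcs (such as redeem from state 1 to state 3) are never overwritten; a redeem in state 2 is still deliberately left without an arc.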
Completed FSAs
Complete System as FSA
• The previous models accounted for the actions of each participant independently.
• The customer's FSA is simple: no matter what actions are taken, it remains in the same state.
• The bank's and store's FSAs are complex, and it is not immediately obvious in what combinations of states these two automata can be.
• Product automaton:
  • The normal way to explore the interaction of automata is to construct their product automaton.
  • The states of the product FSA are pairs of states, one from each original FSA: the state (3, d) denotes the situation where the bank is in state 3 and the store is in state d.
  • Bank = 4 states, Store = 7 states, so the product FSA has 4 × 7 = 28 states.
Product Automaton for the Store and Bank
Product Automaton
• Each of the two components of the product automaton independently makes transitions on the various inputs.
• If an input action is received and one of the two automata has no state to go to on that input, then the product automaton "dies": it has no state to go to.
• Formal rule:
  • Assume the (bank, store) product automaton is in state (i, x).
  • Let Z be one of the input actions.
  • Check whether there is a transition from state i under input Z; suppose it goes to state j.
  • Similarly, check whether there is a transition from state x under the same input Z; suppose it goes to state y.
  • Then there is a transition from (i, x) to (j, y) under input Z. If either j or y does not exist, then there is no transition arc labeled Z from (i, x).
Example:
• Consider the input redeem. If the bank receives a redeem message in state 1, it goes to state 3. If it is in state 3 or 4, it stays there. If it is in state 2, the bank automaton dies.
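The formal rule above translates almost line for line into a function. This is a sketch, assuming each component is a dictionary from (state, action) to next state; the bank's redeem arcs come from the example on this slide, while the store fragment (redeem taking state b to state d) is a hypothetical stand-in for the store automaton's figure.

```python
def product_step(delta1, delta2, state, action):
    """(i, x) --Z--> (j, y) exactly when i --Z--> j and x --Z--> y.
    Returns None (the product automaton dies) if either component has no arc."""
    i, x = state
    j = delta1.get((i, action))
    y = delta2.get((x, action))
    if j is None or y is None:
        return None
    return (j, y)


# Bank's redeem behaviour from the example: 1 -> 3, 3 and 4 stay, 2 dies.
BANK_REDEEM = {(1, "redeem"): 3, (3, "redeem"): 3, (4, "redeem"): 4}
# Hypothetical store fragment: redeeming from state b leads to state d.
STORE_FRAGMENT = {("b", "redeem"): "d"}

print(product_step(BANK_REDEEM, STORE_FRAGMENT, (1, "b"), "redeem"))  # (3, 'd')
print(product_step(BANK_REDEEM, STORE_FRAGMENT, (2, "b"), "redeem"))  # None
```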
Using the Product Automaton to Validate the Protocol
• Only 10 of the 28 states are accessible from the start state.
• Examples of states that are not accessible.
• The real purpose of analyzing a protocol such as this one with automata is to ask and answer questions of the form "Can the following type of error occur?"
• Example: "Is it possible that the store can ship goods and never get paid?" That is, can the product automaton reach a state in which the store is in state c, e, or g without a transition on input T (transfer) ever having been made?
[Figure: accessible states of the product automaton, numbered 1 to 10; problem state (2, c)]
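Finding the accessible states, and asking whether a "bad" state can be reached, is a graph search. The breadth-first search below is a sketch that works for any automaton given as a `step(state, symbol)` function (for instance, the product rule from the previous slide); it also records one shortest input sequence reaching each state, which serves as a witness when an error state turns out to be reachable. The toy automaton is hypothetical, not the slide's product automaton. To ask the "ship but never paid" question directly, one would also record in each search state whether a transfer has occurred; that extension is not shown.

```python
from collections import deque


def accessible(start, alphabet, step):
    """Breadth-first search over the states reachable from `start`.

    step(state, symbol) returns the next state, or None if there is no arc.
    Returns {state: shortest input sequence that reaches it}.
    """
    paths = {start: []}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in alphabet:
            nxt = step(state, symbol)
            if nxt is not None and nxt not in paths:
                paths[nxt] = paths[state] + [symbol]
                queue.append(nxt)
    return paths


# Hypothetical toy automaton: state "D" exists but is not accessible from "A".
TOY = {("A", "go"): "B", ("B", "go"): "C", ("C", "back"): "A", ("D", "go"): "A"}
reach = accessible("A", ["go", "back"], lambda q, a: TOY.get((q, a)))

# "Can the error occur?" becomes "is some bad state among the reachable ones?"
print(sorted(reach))   # ['A', 'B', 'C']
print(reach.get("C"))  # ['go', 'go']: an input sequence that reaches C
print("D" in reach)    # False
```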
Deterministic Finite State Automaton: Formalism of a Deterministic Finite State Automaton
Deterministic Finite State Automaton
• "Deterministic" refers to the fact that on each input there is one and only one state to which the automaton can transition from its current state.
• A non-deterministic automaton can transition from its present state to more than one state on the same input.
Definition of D-FSA
• A deterministic finite state automaton consists of:
  • A finite set of states, $Q$.
  • A finite set of input symbols, $\Sigma$.
  • A transition function, $\delta$, that takes as arguments a state and an input symbol and returns a state: $\delta: Q \times \Sigma \to Q$.
  • A start state, $q_0 \in Q$.
  • A set of final, or accepting, states, $F \subseteq Q$.
• Five-tuple notation of a D-FSA named A: $A = (Q, \Sigma, \delta, q_0, F)$
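The five-tuple maps naturally onto a small record type. This sketch assumes δ is stored as a dictionary keyed by (state, symbol) and checks the conditions in the definition: $q_0 \in Q$, $F \subseteq Q$, and δ returning a state of $Q$ for every pair in $Q \times \Sigma$. The class name `DFA` is illustrative; the example instance is the "contains 01" automaton developed later in the chapter.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    """A = (Q, Sigma, delta, q0, F); delta is a dict {(state, symbol): state}."""
    states: frozenset
    alphabet: frozenset
    delta: dict
    start: object
    accepting: frozenset

    def __post_init__(self):
        if self.start not in self.states:
            raise ValueError("q0 must be one of the states in Q")
        if not self.accepting <= self.states:
            raise ValueError("F must be a subset of Q")
        # delta: Q x Sigma -> Q must return a state for every pair.
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"delta({q!r}, {a!r}) is not a state in Q")


# The "contains 01" automaton from the later example.
A = DFA(
    states=frozenset({"q0", "q1", "q2"}),
    alphabet=frozenset({"0", "1"}),
    delta={("q0", "0"): "q2", ("q0", "1"): "q0",
           ("q1", "0"): "q1", ("q1", "1"): "q1",
           ("q2", "0"): "q2", ("q2", "1"): "q1"},
    start="q0",
    accepting=frozenset({"q1"}),
)
```

Because the check requires δ to be total, a table with empty (Ø) cells would be rejected here; adding a fail state, as discussed later, is one way to make such a table total.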
Formal Definition of Automaton
String Processing with D-FSA
• Suppose $a_1 a_2 \ldots a_n$ is a sequence of input symbols.
• The D-FSA starts in its start state $q_0$; then
  $q_1 = \delta(q_0, a_1)$
  $q_2 = \delta(q_1, a_2)$
  …
  $q_i = \delta(q_{i-1}, a_i)$
  …
  $q_n = \delta(q_{n-1}, a_n)$
• If $q_n \in F$, the input sequence $a_1 a_2 \ldots a_n$ is "accepted"; otherwise it is "rejected".
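The sequence $q_0, q_1, \ldots, q_n$ can be computed directly by applying δ once per input symbol. A minimal sketch, assuming δ is a dictionary; it reuses the "contains 01" automaton from the example later in the chapter, with $F = \{q_1\}$.

```python
def state_sequence(delta, q0, word):
    """Return [q0, q1, ..., qn], where q_i = delta(q_{i-1}, a_i)."""
    states = [q0]
    for a in word:
        states.append(delta[(states[-1], a)])
    return states


# The "contains 01" D-FSA from the example later in the chapter, F = {q1}.
DELTA = {("q0", "0"): "q2", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q1",
         ("q2", "0"): "q2", ("q2", "1"): "q1"}
F = {"q1"}

seq = state_sequence(DELTA, "q0", "1001")
print(seq)           # ['q0', 'q0', 'q2', 'q2', 'q1']
print(seq[-1] in F)  # True: the input is accepted
```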
D-FSA Example • Using FSA to Recognize Sheeptalk "baa…!"
FSA Use
• The FSA can be used for recognizing (we also say accepting) strings in the following way. First, think of the input as being written on a long tape broken up into cells, with one symbol written in each cell of the tape, as in the figure below:
Recognition Process
• The machine starts in the start state (q0) and iterates the following process:
  • Check the next letter of the input.
  • If it matches the symbol on an arc leaving the current state, then
    • cross that arc,
    • move to the next state, and
    • advance one symbol in the input.
• If we are in the accepting state (q4) when we run out of input, the machine has successfully recognized an instance of sheeptalk.
• If the machine never gets to the final state,
  • either because it runs out of input, or
  • because it gets some input that doesn't match an arc (as in the figure on the previous slide), or
  • because it just happens to get stuck in some non-final state,
  we say the machine rejects or fails to accept the input.
FSA for "Sheeptalk" Example
• $Q = \{q_0, q_1, q_2, q_3, q_4\}$
• $\Sigma = \{a, b, !\}$ (the sheep-language alphabet)
• $F = \{q_4\}$
• $\delta(q, i)$ is defined by the state transition table on the next slide.
State Transition Table
We've marked state 4 with a * to indicate that it's a final/accepting state (you can have as many final states as you want), and the Ø indicates an illegal or missing transition. We can read the first row as "if we're in state 0 and we see the input b we must go to state 1. If we're in state 0 and we see the input a or !, we fail".
Deterministic Algorithm for Recognizing a String
function D-RECOGNIZE(tape, machine) returns accept or reject
    index ← Beginning of tape
    current-state ← Initial state of machine
    loop
        if End of input has been reached then
            if current-state is an accept state then
                return accept
            else
                return reject
        elsif transition-table[current-state, tape[index]] is empty then
            return reject
        else
            current-state ← transition-table[current-state, tape[index]]
            index ← index + 1
    end
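A direct Python rendering of D-RECOGNIZE, as a sketch. The transition table is reconstructed from the description of the first row and from the execution trace on the next slide (b takes q0 to q1, a takes q1 to q2 and q2 to q3, a loops on q3, ! takes q3 to q4); absent dictionary keys stand for the empty (Ø) cells.

```python
# Sheeptalk transition table (states 0-4); absent keys are the empty (Ø) cells.
TRANSITION_TABLE = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
ACCEPT_STATES = {4}


def d_recognize(tape, table=TRANSITION_TABLE, start=0, accept_states=ACCEPT_STATES):
    """Return True (accept) or False (reject), following D-RECOGNIZE step by step."""
    index = 0
    current_state = start
    while True:
        if index == len(tape):  # end of input reached
            return current_state in accept_states
        next_state = table.get((current_state, tape[index]))
        if next_state is None:  # empty table entry
            return False
        current_state = next_state
        index += 1


for s in ["baa!", "baaa!", "ba!", "baaa", "abc"]:
    print(s, d_recognize(s))
```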
Tracing Execution for Some Sheep Talk
Before examining the beginning of the tape, the machine is in state q0. Finding a b on the input tape, it changes to state q1, as indicated by the contents of transition-table[q0, b] in the figure. It then finds an a and switches to state q2; another a puts it in state q3; a third a leaves it in state q3, where it reads the "!" and switches to state q4. Since there is no more input, the End of input condition at the beginning of the loop is satisfied for the first time, and the machine halts in q4. State q4 is an accepting state, and so the machine has accepted the string baaa! as a sentence in the sheep language.
Fail State
• The algorithm fails whenever there is no legal transition for a given combination of state and input. The input abc will fail to be recognized, since there is no legal transition out of state q0 on the input a (i.e., this entry of the transition table is Ø).
• Even if the automaton had allowed an initial a, it would certainly have failed on c, since c isn't even in the sheeptalk alphabet! We can think of these "empty" elements in the table as if they all pointed at one "empty" state, which we might call the fail state or sink state.
• In a sense, then, we could view any machine with empty transitions as if we had augmented it with a fail state and drawn in all the extra arcs, so that we always have somewhere to go from any state on any possible input. For completeness, the next figure shows the FSA from the previous figure with the fail state qF filled in.
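Filling in the fail state is a mechanical completion of the table. A sketch, assuming the sheeptalk table as reconstructed from the earlier trace and a fail-state name `qF`: every empty cell, and every move out of qF itself, now points to qF, so the recognizer no longer needs an early "reject" branch. As an additional assumption, symbols outside the alphabet (such as c) are also sent to qF.

```python
SHEEP = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
STATES = [0, 1, 2, 3, 4]
ALPHABET = ["a", "b", "!"]
FAIL = "qF"


def add_fail_state(table, states, alphabet, fail=FAIL):
    """Make the table total: every empty (state, symbol) cell, including every
    move out of the fail state itself, now points to the fail state."""
    total = dict(table)
    for q in [*states, fail]:
        for a in alphabet:
            total.setdefault((q, a), fail)
    return total


def recognize(total, word, start=0, accept_states=frozenset({4}), fail=FAIL):
    state = start
    for symbol in word:
        # A symbol outside the alphabet (such as "c") also leads to the fail state.
        state = total.get((state, symbol), fail)
    return state in accept_states


TOTAL = add_fail_state(SHEEP, STATES, ALPHABET)
print(len(TOTAL))                                          # 18 = 6 states x 3 symbols
print(recognize(TOTAL, "baaa!"), recognize(TOTAL, "abc"))  # True False
```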
Adding a Fail State to FSA
Example
• Suppose we have a D-FSA that accepts all and only the strings of 0's and 1's that have the sequence 01 somewhere in the string. We can write this language L as follows: {w | w is of the form x01y for some strings x and y consisting of 0's and 1's}
• An equivalent description is: {x01y | x and y are any strings of 0's and 1's}
• Example strings in L include 01, 110110, and 100011.
• Example strings not in L are ε, 0, and 111000.
Example
• What can be said about the D-FSA A that accepts this language L?
  • Its input alphabet is $\Sigma = \{0, 1\}$.
  • It has a (yet unknown) set of states, one of which, say q0, is the start state.
  • It has to remember some important facts about the inputs it has seen so far, since it must decide whether 01 is a substring of the input.
• A needs to remember which of these conditions holds:
  (1) It has already seen 01. If so, it will be in an accepting state from now on.
  (2) It has not seen 01, but its most recent input was 0; so if it now sees a 1, it will have seen 01 and can accept everything it sees from here on.
  (3) It has not seen 01, and its last input was either nonexistent (it has just started) or a 1. In this case A cannot accept until it first sees a 0 and then sees a 1 immediately after.
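Example
• Each condition on the previous slide can be represented by a state.
• Condition (3) is represented by the start state q0:
  • If we are in state q0 and the next input is 0, we are then governed by condition (2), represented by state q2.
  • If the next input is 1, we are still governed by condition (3), so we stay in q0.
[Figure: partial transition diagram: q0 loops on 1; q0 goes to q2 on 0]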
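Example
• If we are in state q2 (condition (2)) and receive input 1, the FSA should move to the accepting state, which we choose to name q1. If instead it receives another 0, the most recent input is still 0, so it stays in q2.
• Finally, in the accepting state q1, any combination of 0's and 1's should not change the state.
• Thus $Q = \{q_0, q_1, q_2\}$ and $F = \{q_1\}$:
  $A = (\{q_0, q_1, q_2\}, \{0, 1\}, \delta, q_0, \{q_1\})$
[Figure: transition diagram: q0 loops on 1; q0 goes to q2 on 0; q2 loops on 0; q2 goes to q1 on 1; q1 loops on 0,1]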
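A quick way to gain confidence in a hand-designed D-FSA is to compare it against a direct definition of the language on every short string. This sketch encodes the three states (comments map each state to its condition) and checks `accepts(w)` against Python's substring test `"01" in w` for all binary strings up to length 10; the length bound is an arbitrary choice.

```python
from itertools import product

# q0: condition (3), q2: condition (2), q1: condition (1), the accepting state.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q0",
    ("q2", "0"): "q2", ("q2", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
}
START, FINAL = "q0", {"q1"}


def accepts(word):
    state = START
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state in FINAL


def agrees_with_substring_test(max_len):
    """Check accepts(w) == ('01' in w) for every binary string w up to max_len."""
    return all(
        accepts("".join(bits)) == ("01" in "".join(bits))
        for n in range(max_len + 1)
        for bits in product("01", repeat=n)
    )


print([w for w in ["01", "110110", "100011", "", "0", "111000"] if accepts(w)])
print(agrees_with_substring_test(10))
```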
Simpler Notations for D-FSA
• A five-tuple with a detailed description of the δ transitions is both tedious and hard to read. There are two preferred notations:
  • A transition diagram, which is a graph such as the ones we have seen previously.
  • A transition table, which is a tabular listing of the δ function; by implication, it also gives the set of states and the input alphabet.
Transition Diagrams
• A transition diagram for an FSA $A = (Q, \Sigma, \delta, q_0, F)$ is a graph defined as follows:
  • For each state in Q there is a node.
  • For each state q in Q and each input symbol a in Σ, let $\delta(q, a) = p$. Then the transition diagram has an arc from node q to node p, labeled a. If several input symbols cause transitions from q to p, the transition diagram can have one arc, labeled by the list of these symbols.
  • There is an arrow into the start state q0, labeled Start.
  • Nodes corresponding to accepting states (set F) are marked with a double circle.
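The four rules above are precise enough to generate a diagram automatically. This sketch emits Graphviz DOT text: one node per state, a double circle for accepting states, a "Start" arrow from an invisible point node, and one arc per (q, p) pair labeled with the list of symbols. The choice of Graphviz as the drawing tool is an assumption, not part of the original material; the output could be rendered with, for example, `dot -Tpng`.

```python
from collections import defaultdict


def to_dot(delta, start, accepting):
    """Return Graphviz DOT text for the transition diagram of a D-FSA.

    delta maps (q, a) -> p; all symbols taking q to the same p share one arc.
    """
    symbols_on_arc = defaultdict(list)
    for (q, a), p in sorted(delta.items()):
        symbols_on_arc[(q, p)].append(a)
    states = sorted({q for q, _ in delta} | set(delta.values()))

    lines = ["digraph A {", "  rankdir=LR;", "  start [shape=point];"]
    for s in states:
        shape = "doublecircle" if s in accepting else "circle"
        lines.append(f"  {s} [shape={shape}];")
    lines.append(f'  start -> {start} [label="Start"];')
    for (q, p), symbols in symbols_on_arc.items():
        label = ",".join(symbols)
        lines.append(f'  {q} -> {p} [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


DELTA = {("q0", "0"): "q2", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q1",
         ("q2", "0"): "q2", ("q2", "1"): "q1"}
print(to_dot(DELTA, "q0", {"q1"}))
```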
Example
$A = (Q, \Sigma, \delta, q_0, F)$
$A = (\{q_0, q_1, q_2\}, \{0, 1\}, \delta, q_0, \{q_1\})$
Transition Tables
• A transition table is a conventional, tabular representation of a function like δ that takes two arguments and returns a value.
  • Rows correspond to states.
  • Columns correspond to inputs.
Transition table for the D-FSA of the previous example (→ marks the start state, * marks the accepting state):

           0    1
    → q0   q2   q0
    * q1   q1   q1
      q2   q2   q1
Extending the Transition Function to Strings
• A D-FSA defines a language:
  • the set of all strings that result in a sequence of state transitions from the start state to an accepting state, or alternatively,
  • in terms of the transition diagram, the set of labels along all the paths that lead from the start state to any accepting state.
• To formulate precisely the notion of the language defined by a D-FSA:
  • Define the extended transition function of δ, written $\hat{\delta}$.
  • It describes what happens when we start in any state and follow any sequence of inputs.
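Definition of Extended Transition Function
BASIS:
• If we are in state q and read no inputs, then we are still in state q: $\hat{\delta}(q, \epsilon) = q$.
INDUCTION:
• Suppose w is a string of the form xa, where a is the last symbol of w and x is the string of all the symbols before it.
  • For example, w = 1101 breaks into x = 110 and a = 1.
• Then $\hat{\delta}(q, w) = \delta(\hat{\delta}(q, x), a)$.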
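The basis and induction translate directly into a recursive function. A sketch, assuming δ is a dictionary and strings are Python `str` values; the example uses the "contains 01" D-FSA from the earlier example and the slide's own string w = 1101.

```python
def delta_hat(delta, q, w):
    """Extended transition function, following the inductive definition.

    Basis:     delta_hat(q, "") = q
    Induction: if w = x a, then delta_hat(q, w) = delta(delta_hat(q, x), a)
    """
    if w == "":
        return q
    x, a = w[:-1], w[-1]  # split off the last symbol, e.g. "1101" -> "110", "1"
    return delta[(delta_hat(delta, q, x), a)]


# The "contains 01" D-FSA from the earlier example.
DELTA = {("q0", "0"): "q2", ("q0", "1"): "q0",
         ("q1", "0"): "q1", ("q1", "1"): "q1",
         ("q2", "0"): "q2", ("q2", "1"): "q1"}

print(delta_hat(DELTA, "q0", "1101"))  # q1
```

Recursion mirrors the definition closely; an iterative loop over the symbols computes the same value without Python's recursion-depth limit on long inputs.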
Example
• Design a D-FSA to accept the language L = {w | w has both an even number of 0's and an even number of 1's}.
• Solution:
  • Use states to track whether the numbers of 0's and 1's seen so far are even or odd. Since this is counting modulo 2 for each of the two symbols, we need 2 × 2 = 4 states.
  • $\Sigma = \{0, 1\}$
  • $Q = \{q_0, q_1, q_2, q_3\}$
  • q0: the numbers of 0's and 1's seen so far are both even. This is the accepting state, $F = \{q_0\}$.
  • q1: the number of 0's is even and the number of 1's seen so far is odd.
  • q2: the number of 0's is odd and the number of 1's seen so far is even.
  • q3: the numbers of 0's and 1's seen so far are both odd.
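A sketch of this D-FSA, with transitions inferred from the state meanings: reading a 0 flips the parity of the 0 count, and reading a 1 flips the parity of the 1 count. Treating q0 as the start state is an assumption consistent with the empty string having zero 0's and zero 1's. The helper `matches_counting` compares the automaton with direct counting on every binary string up to a chosen length.

```python
from itertools import product

# State meanings from the slide; q0 is assumed to be the start state.
# Reading 0 flips the parity of the 0 count; reading 1 flips the parity of the 1 count.
DELTA = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q3", ("q1", "1"): "q0",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q1", ("q3", "1"): "q2",
}


def accepts(word):
    state = "q0"
    for symbol in word:
        state = DELTA[(state, symbol)]
    return state == "q0"  # F = {q0}


def matches_counting(max_len):
    """Compare the D-FSA with direct counting on all binary strings up to max_len."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            w = "".join(bits)
            expected = w.count("0") % 2 == 0 and w.count("1") % 2 == 0
            if accepts(w) != expected:
                return False
    return True


print(accepts("110101"), matches_counting(10))  # True True
```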
Transition Diagram of D-FSA
Transition Table
(→ marks the start state, * marks the accepting state)

             0    1
    → * q0   q2   q1
        q1   q3   q0
        q2   q0   q3
        q3   q1   q2
Test
• The check involves computing $\hat{\delta}(q_0, w)$ for each prefix of an input, say w = 110101, starting from ε.
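Worked out prefix by prefix with the inductive definition of $\hat{\delta}$, using the transitions implied by the state meanings on the earlier slide (a 0 flips the parity of the 0 count, a 1 flips the parity of the 1 count), the check runs as follows:

```latex
\begin{aligned}
\hat{\delta}(q_0, \epsilon) &= q_0 \\
\hat{\delta}(q_0, 1) &= \delta(\hat{\delta}(q_0, \epsilon), 1) = \delta(q_0, 1) = q_1 \\
\hat{\delta}(q_0, 11) &= \delta(\hat{\delta}(q_0, 1), 1) = \delta(q_1, 1) = q_0 \\
\hat{\delta}(q_0, 110) &= \delta(\hat{\delta}(q_0, 11), 0) = \delta(q_0, 0) = q_2 \\
\hat{\delta}(q_0, 1101) &= \delta(\hat{\delta}(q_0, 110), 1) = \delta(q_2, 1) = q_3 \\
\hat{\delta}(q_0, 11010) &= \delta(\hat{\delta}(q_0, 1101), 0) = \delta(q_3, 0) = q_1 \\
\hat{\delta}(q_0, 110101) &= \delta(\hat{\delta}(q_0, 11010), 1) = \delta(q_1, 1) = q_0 \in F
\end{aligned}
```

Since $\hat{\delta}(q_0, 110101) = q_0 \in F$, the string is accepted, consistent with it containing two 0's and four 1's.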
Formal Languages
• Key Concept #1. Formal language:
  • A model that can both generate and recognize all and only the strings of a formal language acts as a definition of that formal language.
  • A formal language is a set of strings, each composed of symbols from a finite symbol set called an alphabet (the same alphabet used above for defining an automaton!).
  • The alphabet for the "sheep" language is the set $\Sigma = \{a, b, !\}$.
  • Given a model m (such as an FSA), we can use L(m) to mean "the formal language characterized by m":
  • L(m) = {baa!, baaa!, baaaa!, baaaaa!, …}
The Formal Language Defined by D-FSA
• The language defined by a D-FSA $A = (Q, \Sigma, \delta, q_0, F)$, denoted L(A), is:
  $L(A) = \{ w \mid \hat{\delta}(q_0, w) \in F \}$
• That is, the language of A is the set of strings w that take the start state q0 to one of the accepting states of the D-FSA.
• If L is L(A) for some D-FSA A, then we say L is a regular language.
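L(A) is usually infinite, but the definition can be checked on a finite slice by brute force: enumerate every string over Σ up to some length and keep those w with $\hat{\delta}(q_0, w) \in F$. This sketch does that for the sheeptalk automaton (table reconstructed from the earlier execution trace); a missing table entry is treated as "no arc", which is equivalent to going to a fail state outside F.

```python
from itertools import product

# Sheeptalk D-FSA; absent (state, symbol) keys are empty cells (no arc).
DELTA = {(0, "b"): 1, (1, "a"): 2, (2, "a"): 3, (3, "a"): 3, (3, "!"): 4}
START, FINAL = 0, frozenset({4})


def delta_hat(q, w):
    """Extended transition function; None if some step has no arc."""
    for a in w:
        q = DELTA.get((q, a))
        if q is None:
            return None
    return q


def language_up_to(max_len, alphabet="ab!"):
    """All strings w with |w| <= max_len and delta_hat(q0, w) in F, shortest first."""
    words = ("".join(t) for n in range(max_len + 1) for t in product(alphabet, repeat=n))
    return [w for w in words if delta_hat(START, w) in FINAL]


print(language_up_to(6))  # ['baa!', 'baaa!', 'baaaa!']
```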
Homework:
• 2.2.1, 2.2.2, 2.2.3, 2.2.4, 2.2.5, 2.2.6, 2.2.7, 2.2.8, 2.2.9, 2.2.10