Uses of DFAs

aradia
Presentation Transcript


  1. Uses of DFAs • DFAs are a way of handling certain specialized issues involving character strings • For example, one might want to determine whether a given input string • is a legal identifier in some language • is a legal literal of a particular type in some particular programming language • is a possible word in a given natural language • contains a particular substring

  2. Recognizing legal identifiers/literals • Typical sets of legal identifiers (or literals) include • strings beginning with a letter followed by some digits • strings consisting only of letters • strings containing at most one digit • strings that begin with the underscore character • Note that all of these sets are languages

  3. What’s a DFA (roughly)? • A DFA is an abstraction of a machine • It corresponds more to a program than to a (programmable) computer • It answers questions about membership in certain languages • That is, it answers yes/no questions of the form: is a string in a particular set of strings? • Note that the examples given above all have this form

  4. An abstract machine for a simple switch • An abstraction of a light switch might have two states • One would correspond to the “on” position, the other to the “off” position • There would be transitions possible between the states • It would need to be clear which state was the initial state • This information could be summarized as in the diagram below

  5. An abstract machine diagram for a simple switch

  6. Labeled transitions • This machine is not yet a DFA, since it has nothing to do with strings or languages. • By labeling the transitions with symbols from an alphabet, we’d get a DFA (shown below) • If both transitions were labeled with a 0, the resulting DFA would correspond to the set of strings of odd length over the alphabet {0}. • That is, sequences of transitions that took us from the initial state to the “on” state would correspond to sequences of 0’s of odd length

  7. A DFA for a simple switch • This DFA was constructed with JFLAP

  8. Specifying a DFA • Just as in our example, specifying a DFA requires specifying • An input alphabet ({0} in our example) • A set of states • An initial state (the “off” state in our example) • A set of final states (consisting only of the “on” state in our example) • A set of transitions from state to state, each labeled with a symbol from the input alphabet

  9. DFA Pragmatics • Realistic examples can have large alphabets and thus ugly diagrams • so we’ll use simple alphabets in most examples • States from which no final states are reachable can be omitted from diagrams • We call these dead states (or trap states)

  10. Generalizing the alphabet • Suppose we generalized our example to a question about a larger alphabet – say {0,1} • Let’s (arbitrarily) say that we want to recognize those strings with an odd number of both 0’s and 1’s. • As you might guess, we need more than two states to do this. • A DFA for this language is given below • Note the conventions for initial and final states, and for state names

  11. Recognizing strings with an odd number of both 0's and 1's

  12. What’s a DFA – precisely? • A DFA (deterministic finite acceptor – or automaton) consists of • A finite (input) alphabet Σ • A finite set Q of states • An initial (or start) state q0 (from Q) • A set F of final states (a subset of Q) • A transition function δ : Q × Σ → Q • Note that both the alphabet and set of states are finite -- a DFA has a finite description

  13. Tabular representation of DFAs • Tabular representation of the DFA above:
        δ  | 0   1
        q0 | q1  q2
        q1 | q0  q3
        q2 | q3  q0
      F q3 | q2  q1
      • Note that the table implicitly gives the states, the start state, and the input alphabet • The final states have to be explicitly labeled

  14. States and strings • Note that states in the DFA above correspond to strings as follows: • q0: even number of 0's; even number of 1's • q1: odd number of 0's; even number of 1's • q2: even number of 0's; odd number of 1's • q3: odd number of 0's; odd number of 1's • DFA states often have a (fairly) simple representation in terms of the strings that are taken there
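
The correspondence between states and strings can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: it encodes the table from slide 13 as a dictionary and compares the state actually reached with the state predicted by the parity description. The names `DELTA`, `run`, and `parity_state` are made up for this illustration.

```python
from itertools import product

# Transition table of the DFA for "odd number of 0's and odd number of 1's".
DELTA = {
    "q0": {"0": "q1", "1": "q2"},
    "q1": {"0": "q0", "1": "q3"},
    "q2": {"0": "q3", "1": "q0"},
    "q3": {"0": "q2", "1": "q1"},
}
START = "q0"
FINAL = {"q3"}

def run(w):
    """Return the state reached from START after reading the string w."""
    state = START
    for symbol in w:
        state = DELTA[state][symbol]
    return state

def parity_state(w):
    """State predicted by the parity description of q0..q3."""
    odd0 = w.count("0") % 2   # 1 if the number of 0's is odd
    odd1 = w.count("1") % 2   # 1 if the number of 1's is odd
    return "q" + str(odd0 + 2 * odd1)

# Compare the two on every string over {0,1} of length at most 6.
for n in range(7):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert run(w) == parity_state(w)
```

The exhaustive loop only covers short strings, so it is a sanity check rather than a proof; the general claim follows by induction on the length of the string.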

  15. The extended transition function • δ may be extended to handle strings • This gives a function δ*: Q × Σ* → Q • δ* is defined for all states q, strings w, and symbols a by • δ*(q,λ) = q • δ*(q,wa) = δ(δ*(q,w), a) • Note that δ* extends δ in the sense that δ*(q,a) = δ(q,a) • to see this, let w = λ above

  16. Notational conventions • Lower-case letters near q in the English alphabet are used for states, perhaps with subscripts • We use Σ, Q, q0, F, δ, and δ* without comment if only one DFA is under discussion

  17. Acceptance of a string • A string x is accepted by a DFA M iff δ*(q0,x) ∈ F, • where δ* is the extended transition function • Fact: For this function δ*, δ*(q,xy) = δ*(δ*(q,x), y) for any q, x, and y
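
The two-case definition of the extended transition function translates directly into a recursive function. The sketch below is an illustration only, using the four-state DFA for an odd number of both 0's and 1's as the example; `delta_star` and `accepts` are names chosen here, and the empty Python string stands in for the empty string of the definition.

```python
# Transition function of the example DFA, keyed by (state, symbol).
DELTA = {
    ("q0", "0"): "q1", ("q0", "1"): "q2",
    ("q1", "0"): "q0", ("q1", "1"): "q3",
    ("q2", "0"): "q3", ("q2", "1"): "q0",
    ("q3", "0"): "q2", ("q3", "1"): "q1",
}
START, FINAL = "q0", {"q3"}

def delta_star(q, w):
    """Extended transition function, following the two-case definition."""
    if w == "":                       # base case: the empty string leaves q unchanged
        return q
    prefix, a = w[:-1], w[-1]         # split w into a prefix and its last symbol a
    return DELTA[(delta_star(q, prefix), a)]

def accepts(x):
    """A string is accepted iff the extended function lands in a final state."""
    return delta_star(START, x) in FINAL

# Spot-check the concatenation fact on a few strings and every state.
for q in ("q0", "q1", "q2", "q3"):
    for x in ("", "0", "01", "110"):
        for y in ("", "1", "10", "001"):
            assert delta_star(q, x + y) == delta_star(delta_star(q, x), y)
```

The recursion mirrors the definition rather than aiming for efficiency; a loop over the symbols of the string computes the same state.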

  18. Acceptance of a language • A language L is accepted (or recognized) by M iff every string in L is accepted by M, and no string outside of L is accepted by M • A DFA M accepts exactly one language. • This language is denoted L(M) • A language is regular iff it is accepted by some DFA. • Fact: All finite languages are regular • Two DFAs are equivalent iff they accept the same language.

  19. More examples of DFAs • strings alternating 0's and 1's • State q3 is a dead state

  20. Still another example of a DFA • For binary representations of integers (without leading 0’s) • all strings over {0,1} beginning with 1 • dead state omitted

  21. Yet another example of a DFA • Strings over {0,1,.} that contain at most one period • This DFA is not minimal

  22. A DFA for binary multiples of 3 • Here we identify λ with 0 – a good exercise is to redo this example without this assumption
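
Since the diagram is not reproduced in this transcript, the following Python sketch uses the usual remainder-tracking construction, which is an assumption and may differ in detail from the DFA on the slide. The three states are the remainders 0, 1, and 2; reading a bit b from remainder r leads to (2r + b) mod 3. Identifying the empty string with 0 makes the start state final. The names `step` and `is_multiple_of_3` are made up for this example.

```python
def step(r, bit):
    """Transition: appending a bit doubles the value and adds the bit."""
    return (2 * r + int(bit)) % 3

def is_multiple_of_3(w):
    """Run the three-state DFA on the binary string w."""
    r = 0                      # start state: remainder 0 (empty string treated as 0)
    for bit in w:
        r = step(r, bit)
    return r == 0              # remainder 0 is the only final state

# Compare against ordinary arithmetic for the first few integers.
for n in range(64):
    assert is_multiple_of_3(format(n, "b")) == (n % 3 == 0)
```

Note that this version also accepts strings with leading 0's, such as 0011.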

  23. Regular expressions • Regular expressions are used to represent languages. • It turns out that every regular expression represents a regular language, • Also, every regular language may be represented by a regular expression. • The language represented by the regular expression r is denoted L(r).

  24. Regular expressions defined • An expression is a regular expression iff it has one of the following forms: • ∅, λ, or a, where a is a member of some Σ • r+s, rs, r*, or (r), where r and s are regular expressions. • The expressions ∅, λ, and a respectively represent ∅, {λ}, and {a} • The expressions r+s, rs, r*, and (r) respectively represent L(r) ∪ L(s), L(r)L(s), L(r)*, and L(r)

  25. About regular expressions • Don’t confuse the empty language ∅ with the nonempty language {λ} • The precedence of the * operator is higher than for juxtaposition; for +, it’s lower. • These pairs of regular expressions are equivalent (represent the same languages) • rλ and r; and r and λr • (rs)t and r(st) • (r+s)t and rt + st; and r(s+t) and rs + rt • r* and (r*)*; and (r+s)* and (r*s*)*

  26. Examples of regular expressions • (0+1+2)*: all strings over {0,1,2} • a(a+b+c)*: all strings over {a,b,c} starting with a • (0+1)*010(0+1)*: all strings over {0,1} having 010 as a substring • 000+001+010+011+100+101+110+111: all bit strings of length 3 • (0+1)(0+1)(0+1)(0+1): all bit strings of length 4
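
These examples can be tried out with Python's standard `re` module, with one notational caveat: the slides write union as +, whereas Python writes it as | and uses + for "one or more". The sketch below is an illustration under that assumption; the helper names `to_python` and `in_language` are made up here, and the naive translation only works for expressions like the ones above, which contain no λ or ∅.

```python
import re

def to_python(regex):
    """Translate the slides' notation to Python's: union '+' becomes '|'."""
    return regex.replace("+", "|")

def in_language(regex, w):
    """True iff the whole string w matches the (translated) expression."""
    return re.fullmatch(to_python(regex), w) is not None

# Membership checks for the example expressions.
assert in_language("(0+1+2)*", "2010")
assert in_language("a(a+b+c)*", "abca")
assert not in_language("a(a+b+c)*", "bca")
assert in_language("(0+1)*010(0+1)*", "110101")
assert not in_language("(0+1)*010(0+1)*", "0110")
```

`re.fullmatch` is used rather than `re.search` because membership in L(r) concerns the entire string, not a substring of it.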

  27. Regular expressions and regular languages • For every regular expression r, L(r) is a regular language. • If L is a regular language, then L = L(r) for some regular expression r. • We will soon see constructive proofs of these claims. • The proof of the first claim requires a new concept -- nondeterminism

  28. Problems with constructing DFAs • For some languages, the FAs that are most natural are incorrect. For example, • for (a+b)*ab(a+b)*, the natural FA below isn't deterministic (δ is not a function) • for 0*1*, the natural FA recognizes (0+1)* • for 10+(12)*, the natural FA isn't deterministic, and accepts too many strings • for (10)*12, the natural FA isn't deterministic

  29. An attempt at a DFA for L( (a+b)*ab(a+b)* )

  30. Nondeterminism • One way of dealing with this issue is to allow finite automata a choice of moves for a given state and symbol. • This approach is called nondeterminism. • In some cases, it is also useful to allow λ-moves that don't consume an input symbol. • It turns out that adding nondeterminism (even with λ-moves) does not allow us to accept any new languages.

  31. Modeling nondeterminism • Ways of thinking of nondeterminism: • omniscience • suppose that you could always guess correctly • branching processes • allocate a new processor for each choice • simultaneous processes • allow being in several states at once • backtracking • try each choice one after another

  32. Sets of states • Our treatment of nondeterminism is based on the third approach above. • We will allow the computation to be in several states at once (or no state at all). • This requires a new transition function: • δ: Q × (Σ ∪ {λ}) → 2^Q • Note that λ is a legal 2nd argument, and that the value is a (possibly empty) set of states.

  33. NFAs • If our definition of a DFA is modified to use the new transition function, we get an NFA • Note that the “N” stands for “nondeterministic” rather than “not”. • The FA shown above for L((a+b)*ab(a+b)*) is a legal NFA for that language, with table:
        δ  | a        b     λ
        q0 | {q0,q1}  {q0}  {}
        q1 | {}       {q2}  {}
      F q2 | {q2}     {q2}  {}

  34. Acceptance by NFAs • An NFA accepts a string x iff δ*(q0, x) contains a state of F, where δ* is defined informally in Linz (p. 51). More precisely, • δ*(q, λ) is the set of states accessible from q by zero or more λ moves • δ*(q, a) is the set of states accessible from q by zero or more λ moves, then one a move, and then zero or more λ moves • δ*(q, wa) is the union over all p in δ*(q, w) of δ*(p, a) • δ*(S, x) is the union over all q in S of δ*(q, x)

  35. An NFA for L((10)*12) • A table for an NFA for (10)*12:
        δ  | 0     1        2     λ
        q0 | {}    {q1,q2}  {}    {}
        q1 | {q0}  {}       {}    {}
        q2 | {}    {}       {q3}  {}
      F q3 | {}    {}       {}    {}
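
Acceptance by an NFA can be simulated by tracking the set of states the computation may be in, as in the "simultaneous processes" view. The sketch below is a minimal Python illustration using the table for (10)*12; the dictionary layout, the use of the empty string as the key for λ-moves, and the names `closure`, `delta_star`, and `accepts` are choices made for this example.

```python
# NFA for (10)*12: missing entries stand for the empty set {}.
NFA = {
    "q0": {"1": {"q1", "q2"}},
    "q1": {"0": {"q0"}},
    "q2": {"2": {"q3"}},
    "q3": {},
}
START, FINAL = "q0", {"q3"}
LAMBDA = ""   # key used for lambda-moves (this particular NFA has none)

def closure(states):
    """All states reachable from `states` by zero or more lambda-moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in NFA[q].get(LAMBDA, set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def delta_star(states, w):
    """Set of states reachable from any state in `states` on input w."""
    current = closure(states)
    for a in w:
        moved = set()
        for q in current:
            moved |= NFA[q].get(a, set())   # one a-move from each current state
        current = closure(moved)            # followed by any lambda-moves
    return current

def accepts(w):
    """Accept iff some reachable state is final."""
    return bool(delta_star({START}, w) & FINAL)
```

For instance, `accepts("101012")` is true while `accepts("10")` is not, since after reading 10 the only possible state is q0.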

  36. The use of λ moves in NFAs • A bad NFA for 0*1*: • A good NFA for 0*1*:

  37. Constructing DFAs • We’ll soon have algorithms to find equivalent DFAs for NFAs or regular expressions. • In other cases, try starting with a start state. • For every new state, determine destination states from this state on every symbol • Tricky part: must these destination states be new states, or can old states be reused? • Useful observation: if x and y go to the same state, so do xz and yz for all z

  38. Equivalence of NFAs and DFAs • Claim: Any language accepted by an NFA is regular • To show the claim, we need to find, for any NFA M_N, a DFA M_D with L(M_D) = L(M_N) • Idea: let states of the DFA correspond to sets of states of the NFA • we’ll use brackets to denote states of M_D • Although there will be 2^m states (if M_N has m states), many will be dead or inaccessible

  39. Constructing equivalent DFAs • [q0] will be the start state of M_D • [S] is a final state iff S contains a final state of M_N, or if [S] = [q0] and M_N accepts λ • The transition function δ_D is defined by letting δ_D([S],a) correspond to the union over all q in S of δ_N*(q,a), where δ_N is M_N’s δ function • We use δ_N* rather than δ_N to allow λ moves • By induction, δ_D*([S],x) = [δ_N*(S,x)] for all x with |x| > 0, so that δ_D*([q0],x) = [δ_N*(q0,x)]

  40. Example 1 of an equivalent DFA • The NFA below accepts L(0*1*)
        δ  | 0     1     λ
        q0 | {q0}  {}    {q1}
      F q1 | {}    {q1}  {}
      • The equivalent DFA is given below
        δ       | 0        1
      F [q0]    | [q0,q1]  [q1]
      F [q0,q1] | [q0,q1]  [q1]
      F [q1]    | []       [q1]
        []      | []       []
      • Note: the 4th state is a trap (dead) state
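
The construction of slide 39 can be traced on this example with a short program. The following Python sketch is one possible rendering, not the algorithm as stated in Linz: it generates only the accessible subset states, represents a bracketed state [S] as a `frozenset`, and uses the empty string as the key for λ-moves. All function and variable names are made up for this illustration.

```python
from collections import deque

# NFA for 0*1* from the table above; "" is the key for lambda-moves.
NFA = {
    "q0": {"0": {"q0"}, "": {"q1"}},
    "q1": {"1": {"q1"}},
}
ALPHABET = ["0", "1"]
START, FINAL = "q0", {"q1"}

def closure(states):
    """All states reachable from `states` by zero or more lambda-moves."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in NFA[q].get("", set()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def step(S, a):
    """DFA transition: lambda-moves, then one a-move, then lambda-moves."""
    moved = set()
    for q in closure(S):
        moved |= NFA[q].get(a, set())
    return frozenset(closure(moved))

def subset_construction():
    """Build the accessible part of the equivalent DFA."""
    start = frozenset({START})
    table, queue = {}, deque([start])
    while queue:
        S = queue.popleft()
        if S in table:
            continue
        table[S] = {a: step(S, a) for a in ALPHABET}
        queue.extend(table[S].values())
    # Final iff S contains a final NFA state, or S is the start state
    # and the NFA accepts the empty string.
    finals = {S for S in table
              if S & FINAL or (S == start and closure(S) & FINAL)}
    return start, table, finals
```

Running `subset_construction()` on this NFA yields four subset states, three of them final, matching the table on the slide; the empty set plays the role of the trap state.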

  41. Example 2 of an equivalent DFA • The DFA equivalent to the NFA given above for L((10)*12) is:
        δ       | 0     1        2
        [q0]    | []    [q1,q2]  []
        [q1,q2] | [q0]  []       [q3]
      F [q3]    | []    []       []
      • Here the dead state and inaccessible states have been omitted

  42. Another example with λ moves • The NFA below accepts L(10+(12)*)

  43. The equivalent DFA • The equivalent DFA is:
        δ       | 0     1        2
      F [q0]    | []    [q2,q5]  []
        [q2,q5] | [q3]  []       [q4]
      F [q3]    | []    []       []
      F [q4]    | []    [q5]     []
        [q5]    | []    []       [q4]

  44. Breadth-first search (BFS) • Several steps in the algorithms above require finding all states accessible from some state q. • These steps can be implemented by a breadth-first search (BFS) algorithm • this algorithm is a specialization of the algorithm of Linz, p. 9 • it can be used in step 2 of Linz’s nfa-to-dfa algorithm of p. 59

  45. The BFS algorithm • The algorithm maintains a queue of generated states whose successors have not been generated. • Until the queue becomes empty, the next state is dequeued and marked, and its successors found. • Those successors that are not marked are enqueued. • The queue will eventually empty, since there are only finitely many states

  46. Using the BFS algorithm • BFS can find all λ-paths leaving a state q • a “successor” is the destination of a λ-transition • the queue is initialized with q only • Or all accessible states of a DFA • a “successor” is the destination of any transition • the queue is initialized with q0 only • Or all nondead states of a DFA • q’s “successor” is the source of any transition to q • the queue is initialized with all states of F
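
The three uses above differ only in the initial queue and in what counts as a successor, so a single BFS routine parameterized by both covers them. The sketch below is a Python illustration, not the algorithm text from Linz; the small DFA (with one dead state `qd` and one inaccessible state `qx`) and the λ-move map are hypothetical examples invented for this purpose.

```python
from collections import deque

def bfs(initial, successors):
    """Return every state reachable from `initial` via `successors`."""
    marked = set()
    queue = deque(initial)
    while queue:
        q = queue.popleft()
        if q in marked:            # a state may have been enqueued twice
            continue
        marked.add(q)
        for p in successors(q):
            if p not in marked:
                queue.append(p)
    return marked

# Hypothetical DFA for "strings over {0,1} beginning with 1".
DELTA = {
    "q0": {"0": "qd", "1": "q1"},
    "q1": {"0": "q1", "1": "q1"},
    "qd": {"0": "qd", "1": "qd"},   # dead state
    "qx": {"0": "q1", "1": "q1"},   # inaccessible state
}
START, FINAL = "q0", {"q1"}

# Accessible states: follow transitions forward from the start state.
accessible = bfs([START], lambda q: DELTA[q].values())

# Nondead states: follow transitions backward from the final states.
def predecessors(q):
    return [p for p in DELTA if q in DELTA[p].values()]

nondead = bfs(FINAL, predecessors)
dead = set(DELTA) - nondead

# Lambda-paths in a hypothetical NFA fragment: p0 -> p1 -> p2.
LAMBDA_MOVES = {"p0": {"p1"}, "p1": {"p2"}, "p2": set()}
lambda_reachable = bfs(["p0"], lambda q: LAMBDA_MOVES[q])
```

Passing the successor relation as a function keeps the search itself identical across the three cases, which is the point of the slide.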

  47. Two useful notions • A useful notion in both DFA minimization and in finding regular expressions for DFAs is identifying states with the strings taken there • That is, we identify q with {x | δ*(q0,x) = q} • Another useful notion for minimization is distinguishability • For a language L, two strings x and y in Σ* are distinguishable iff for some z in Σ*, exactly one of {xz, yz} is in L
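
The definition of distinguishability suggests a brute-force search for a distinguishing suffix z. The sketch below is a small Python illustration for the language of strings over {0,1} containing 010 (one of the regular expression examples earlier); `in_L` and `distinguishing_suffix` are names invented here. Because the search is bounded by `max_len`, it can confirm that two strings are distinguishable but a result of `None` only means no short witness was found.

```python
from itertools import product

def in_L(w):
    """Membership test for L = strings over {0,1} with 010 as a substring."""
    return "010" in w

def distinguishing_suffix(x, y, max_len=4, alphabet="01"):
    """Return a shortest z with exactly one of xz, yz in L, or None."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            z = "".join(letters)
            if in_L(x + z) != in_L(y + z):
                return z
    return None
```

For example, 01 and 0 are distinguished by the suffix 0, since 010 is in L and 00 is not. The strings 1 and the empty string have no distinguishing suffix, because a leading 1 cannot begin an occurrence of 010.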

  48. Minimal DFAs and distinguishability • A DFA separates x and y iff δ*(q0,x) ≠ δ*(q0,y) • DFAs must separate distinguishable strings • DFAs may separate indistinguishable strings • One might expect minimal DFAs to separate only distinguishable strings • Key to seeing that this is both correct and feasible is the notion of equivalence relations

  49. Equivalence relations and minimization • Note first that indistinguishability is an equivalence relation on Σ*. • We’ll use the infix operator ~ for this relation • Also, being taken to the same state by a DFA M is an equivalence relation on Σ*. • The equivalence classes are the states of M • We’ll use the infix operator ~M for this relation • We’ll use the convention that [x] is the (equivalence) class containing x

  50. The minimal DFA for L • Note that for a given L, ~M refines ~ • That is, if x ~M y, then x ~ y • So every class of ~M is in a single class of ~ • And ~ has no more classes than any ~M • If the classes of ~ can represent the states of some DFA for L, this DFA would be minimal. • And they can – we define a DFA M^ to have this set of states, and alphabet Σ, and …
