
Digital State Machines

Digital State Machines. Regular Expressions & Languages. Chapter Outline. Regular Expressions Basic Regular Expression Patterns Disjunction, Grouping and Precedence Examples Advanced Operators Regular Expression Substitution, Memory and ELIZA Summary. Regular Expressions (RE).

canthony


  1. Digital State Machines Regular Expressions & Languages

  2. Chapter Outline • Regular Expressions • Basic Regular Expression Patterns • Disjunction, Grouping and Precedence • Examples • Advanced Operators • Regular Expression Substitution, Memory and ELIZA • Summary Veton Këpuska

  3. Regular Expressions (RE) • An algebraic description of finite-state automata (FSA). • Regular expressions define exactly the languages that the various forms of automata describe: the regular languages. • Regular expressions offer a declarative way to express the strings we want to accept; FSA do not. • REs serve as the input language for many systems that process strings: • Search commands such as UNIX grep (egrep, etc.) for finding strings, and the search facilities of WWW browsers, text-formatting systems, etc. Search systems convert REs into FSAs (D-FSA or N-FSA). • Lexical-analyzer generators, such as LEX or FLEX. • Compilers. • The language-modeling system in a speech recognizer. • Grammar and spell checkers.

  4. FSA, RE and Regular Languages [Figure: diagram relating regular expressions, regular languages, and finite automata.]

  5. The Operators of Regular Expressions • Regular expressions denote languages. • 01* + 10* denotes the language consisting of all strings that are either a single 0 followed by any number of 1's, {0, 01, 011, 0111, 01111, …}, or a single 1 followed by any number of 0's, {1, 10, 100, 1000, 10000, …}. • Operations on regular languages that regular expressions represent. Let L, L1 and L2 be regular languages, with L = {0, 1}, L1 = {10, 001, 111} and L2 = {ε, 001}. Then: • The union (or disjunction) of L1 and L2: L1 ∪ L2 = {ε, 10, 001, 111}. • The concatenation: L1L2 = {xy | x ∈ L1, y ∈ L2} = {10, 001, 111, 10001, 001001, 111001}. • The closure (or star, or Kleene closure) of L: L* = L^0 ∪ L^1 ∪ L^2 ∪ … ∪ L^i ∪ …, where L^i is the set of concatenations of i strings from L.
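The union and concatenation on this slide can be checked mechanically for finite languages. The following is a minimal Python sketch, assuming a language is represented as a set of strings with the empty string "" standing for ε; the helper name `concat` is illustrative and not part of the slides.

```python
# Finite languages as Python sets of strings; "" plays the role of ε.
L1 = {"10", "001", "111"}
L2 = {"", "001"}

def concat(a, b):
    """Concatenation of two finite languages: {xy | x in a, y in b}."""
    return {x + y for x in a for y in b}

union = L1 | L2            # L1 ∪ L2
product = concat(L1, L2)   # L1L2

# Print shortest strings first so the sets are easy to compare by eye.
print(sorted(union, key=lambda w: (len(w), w)))
print(sorted(product, key=lambda w: (len(w), w)))
```

Every string of L1 appears unchanged in the product because L2 contains ε.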

  6. Example • L = {0, 11}. • L^0 = {ε}, independent of what language L is. • L^1 = L, the choice of one string from L. • L^0 ∪ L^1 = {ε, 0, 11}. • L^2 = {00, 011, 110, 1111}. • L^3 = {000, 0011, 0110, 01111, 1100, 11011, 11110, 111111}. • To compute L* we must compute L^i for each i ≥ 0 and take the union. • Here L^i has 2^i members. • The union of the infinite number of terms L^i is generally an infinite language, as it is in this example.
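The powers L^i in this example can be computed by repeated concatenation. A small Python sketch under the same set-of-strings assumption; `power` and `star_up_to` are hypothetical helper names, and `star_up_to` is only a finite approximation of L*, since the full closure is infinite here.

```python
def concat(a, b):
    """Concatenation of two finite languages."""
    return {x + y for x in a for y in b}

def power(lang, i):
    """L^i: all concatenations of i strings from lang; L^0 is {""}."""
    result = {""}
    for _ in range(i):
        result = concat(result, lang)
    return result

def star_up_to(lang, k):
    """Finite approximation of L*: the union L^0 ∪ L^1 ∪ ... ∪ L^k."""
    out = set()
    for i in range(k + 1):
        out |= power(lang, i)
    return out

L = {"0", "11"}
for i in range(4):
    print(i, sorted(power(L, i), key=lambda w: (len(w), w)))
```

Calling `power(set(), 0)` returns `{""}` while `power(set(), 1)` returns the empty set, which matches the special behaviour of ∅ under closure.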

  7. Example • Let L = {ε, 0, 00, 000, …}, the set of all strings consisting only of zeros. L is an infinite language. • L^0 = {ε}, independent of what language L is. • L^1 = L, the choice of one string from L. • L^0 ∪ L^1 = {ε, 0, 00, 000, 0000, …}. • L^2 = {ε, 0, 00, 000, 0000, …} = L. • L^3 = L. • L* = L^0 ∪ L^1 ∪ L^2 ∪ … = L. • ∅, the empty set, is one of only two languages whose closure is not infinite (the other is {ε}). • ∅^0 = {ε}. • ∅^i = ∅ for every i ≥ 1. • ∅* = {ε}.

  8. Distinction of Star (*) and Closure (*) Operator • Star: Σ* is the set of all strings whose symbols are chosen from the alphabet Σ. • The closure operator * is essentially the same, with a subtle difference. Let L be a language containing only strings of length 1, such that for each symbol a in Σ there is a string a in L. Thus: • Σ is a set of symbols, while • L is a set of strings. • Σ* and L* denote the same language.

  9. Building Regular Expressions • The algebra of regular expressions follows the pattern of classical algebra. • Constants and variables denote languages. • Operators: union, product (concatenation), and star/closure. • Define a regular expression E recursively (the language it represents is denoted L(E)). BASIS: • The constants ε and ∅ are regular expressions, denoting the languages L(ε) = {ε} and L(∅) = ∅, respectively. • If a is any symbol, then a is a regular expression, with L(a) = {a}. • Any variable, e.g. L, typically capitalized and italic, represents any language.

  10. Building Regular Expressions INDUCTION: If E and F are regular expressions, then • E+F is a regular expression denoting their union: L(E+F) = L(E) ∪ L(F). • EF is a regular expression denoting their concatenation: L(EF) = L(E)L(F). • A dot can optionally be used to denote the concatenation operator, on languages or in a regular expression: the regular expression 0.1 is the same as 01 and represents the language {01}. • E* is a regular expression denoting the closure of L(E): L(E*) = (L(E))*. • (E) is also a regular expression, denoting the same language as E: L((E)) = L(E).
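The basis and induction rules translate directly into a recursive data type. The following Python sketch is one possible encoding, not the slides' own: the class names (`Empty`, `Eps`, `Sym`, `Union`, `Concat`, `Star`) and the function `lang` are assumptions made for illustration. Because L(E*) is usually infinite, `lang` only enumerates the strings of L(E) up to a given length n.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Empty:            # the constant ∅
    pass

@dataclass(frozen=True)
class Eps:              # the constant ε
    pass

@dataclass(frozen=True)
class Sym:              # a single symbol a
    a: str

@dataclass(frozen=True)
class Union:            # E + F
    e: object
    f: object

@dataclass(frozen=True)
class Concat:           # EF
    e: object
    f: object

@dataclass(frozen=True)
class Star:             # E*
    e: object

def lang(r, n):
    """Strings of L(r) whose length is at most n."""
    if isinstance(r, Empty):
        return set()
    if isinstance(r, Eps):
        return {""}
    if isinstance(r, Sym):
        return {r.a} if len(r.a) <= n else set()
    if isinstance(r, Union):
        return lang(r.e, n) | lang(r.f, n)
    if isinstance(r, Concat):
        return {x + y for x in lang(r.e, n) for y in lang(r.f, n)
                if len(x + y) <= n}
    if isinstance(r, Star):
        base = lang(r.e, n)
        result, frontier = {""}, {""}
        while frontier:  # keep appending strings of L(E) until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise TypeError(f"not a regular expression: {r!r}")

zero_one = Concat(Sym("0"), Sym("1"))
print(sorted(lang(Star(zero_one), 4), key=len))
```

Each branch of `lang` mirrors one clause of the definition, so the structure of the function is the structure of the induction.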

  11. Example • Develop a regular expression for the language consisting of the single string 01. • 0 and 1 are expressions denoting the languages {0} and {1}. • Concatenating the two expressions gives the regular expression 01 for the language {01}. • As a general rule, if we want a regular expression for the language consisting of only the string w, we use w itself as the regular expression. • Write a regular expression for the set of strings that consist of alternating 0's and 1's. A first attempt from the above is (01)*. • Note 1: 01* ≠ (01)*. • Note 2: L((01)*) is not exactly what we want: it misses the strings that begin with 1 and/or end with 0. • Complete solution: (01)* + (10)* + 1(01)* + 0(10)*. • The “+” operator indicates the union of the corresponding languages.

  12. Example • Alternate solution: (ε+1)(01)*(ε+0) • Note: L(ε+1) = L(ε) ∪ L(1) = {ε} ∪ {1} = {ε, 1}.
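The two solutions for alternating 0's and 1's can be compared by brute force. The sketch below uses Python's `re` module, where union is written `|` and ε+1 can be written `1?`; it checks both expressions against a direct definition of "alternating" on every binary string up to length 8. This is a bounded check rather than a proof, and the helper names are illustrative.

```python
import re
from itertools import product

four_cases = re.compile(r"(01)*|(10)*|1(01)*|0(10)*")
compact = re.compile(r"1?(01)*0?")          # (ε+1)(01)*(ε+0)

def alternates(w):
    """True when no two neighbouring symbols of w are equal."""
    return all(a != b for a, b in zip(w, w[1:]))

def binary_strings(max_len):
    for n in range(max_len + 1):
        for symbols in product("01", repeat=n):
            yield "".join(symbols)

mismatches = [
    w for w in binary_strings(8)
    if not (bool(four_cases.fullmatch(w))
            == bool(compact.fullmatch(w))
            == alternates(w))
]
print(mismatches)   # an empty list means all three definitions agree
```

`fullmatch` is used because a regular expression in the formal sense must describe the whole string, not a substring of it.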

  13. Precedence of Regular Expression Operators • The * operator has the highest precedence. • Concatenation (the dot operator) comes next. • Union (+) has the lowest precedence. • The order of operations can be controlled with the grouping operator “()”. • Example: 01*+1 is grouped by default as (0(1*))+1. • Parentheses are needed to obtain the different expressions (01)*+1 or 0(1*+1).
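Python's `re` module follows the same precedence order (star, then concatenation, then union written as `|`), so the three groupings can be told apart by the strings they accept. A short sketch; the sample strings are chosen here for illustration.

```python
import re

patterns = {
    "01*|1": r"01*|1",       # default grouping: (0(1*)) + 1
    "(01)*|1": r"(01)*|1",   # star applied to the whole of 01
    "0(1*|1)": r"0(1*|1)",   # union applied inside the concatenation
}
samples = ["", "0", "1", "011", "0101"]

# For each pattern, list the sample strings that it matches completely.
accepted = {
    name: [w for w in samples if re.fullmatch(p, w)]
    for name, p in patterns.items()
}
for name, words in accepted.items():
    print(f"{name:10} accepts {words}")
```

No two of the three patterns accept the same subset of the samples, which shows that the parentheses change the language and not just the notation.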

  14. Exercise Examples Exercise 3.1.1: Write regular expressions for the following languages: • The set of strings over alphabet {a, b, c} containing at least one a and at least one b. • A first attempt, aba*b*c*, fixes the order of the symbols; what about other combinations? • An expression that covers both orders, with arbitrary symbols in between: (a+b+c)*a(a+b+c)*b(a+b+c)* + (a+b+c)*b(a+b+c)*a(a+b+c)*. • The set of strings of 0’s and 1’s whose tenth symbol from the right end is 1. • (0+1)*1(0+1)(0+1)…(0+1)(0+1), with nine copies of (0+1) after the 1. • The set of strings of 0’s and 1’s with at most one pair of consecutive 1’s. • (0+10)*(ε+1+11)(0+01)*
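Candidate answers to exercises like these are easy to get subtly wrong, so it helps to test them against a direct description of the language. The Python sketch below checks one possible set of answers (written in `re` syntax, with `|` for union) against plain predicates on all strings up to a small length. The expressions, predicates, and length bounds are choices made for this illustration, and passing a bounded check is evidence rather than a proof.

```python
import re
from itertools import product

def strings(alphabet, max_len):
    """All strings over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)

def mismatches(regex, alphabet, predicate, max_len):
    """Strings on which the regex and the reference predicate disagree."""
    pattern = re.compile(regex)
    return [w for w in strings(alphabet, max_len)
            if bool(pattern.fullmatch(w)) != predicate(w)]

def pairs_of_ones(w):
    """Number of positions where two consecutive 1's start (111 counts twice)."""
    return sum(w[i:i + 2] == "11" for i in range(len(w) - 1))

checks = {
    "at least one a and one b": (
        r"[abc]*a[abc]*b[abc]*|[abc]*b[abc]*a[abc]*", "abc",
        lambda w: "a" in w and "b" in w, 6),
    "tenth symbol from the right is 1": (
        r"[01]*1[01]{9}", "01",
        lambda w: len(w) >= 10 and w[-10] == "1", 12),
    "at most one pair of consecutive 1's": (
        r"(0|10)*(11|1)?(0|01)*", "01",
        lambda w: pairs_of_ones(w) <= 1, 12),
}

for name, args in checks.items():
    print(name, "->", len(mismatches(*args)), "mismatches")

# The order-fixing attempt aba*b*c* misses strings such as "ba".
print(mismatches(r"aba*b*c*", "abc", lambda w: "a" in w and "b" in w, 2))
```

The last line lists short strings on which the first attempt disagrees with the specification, which is a quick way to see why a candidate fails.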

  15. Finite Automata and Regular Expressions • Regular expressions describe languages in a fundamentally different way from finite automata. • However, both describe the same set of languages, the regular languages. To show this one must prove: • Every language defined by one of these automata is also defined by a regular expression. For this direction it suffices to assume the language is accepted by some D-FSA. • Every language defined by a regular expression is defined by one of these automata. For this direction one shows that there is an N-FSA with ε-transitions accepting the same language.

  16. Finite Automata and Regular Expressions • Plan for showing the equivalence of four different notations for regular languages: NFSA, ε-NFSA, RE, and DFSA. [Figure not reproduced.]

  17. Converting Regular Expressions to Automata • We can show that every language L that is L(R) for some regular expression R is also L(E) for some ε-NFSA E. • Start by showing how to construct automata for the basis expressions: single symbols, ε, and ∅. • Then show how to combine these automata into larger automata that accept the union, concatenation, or closure of the languages accepted by the smaller automata.

  18. Converting Regular Expressions to Automata • Theorem: Every language defined by a regular expression is also defined by a finite automaton. • Proof: Suppose L = L(R) for a regular expression R. We will show that L = L(E) for some ε-NFSA E with: • Exactly one accepting state. • No arcs into the initial state. • No arcs out of the accepting state. • The proof is by structural induction on R, following the recursive definition of regular expressions.

  19. Converting Regular Expressions to Automata BASIS (three automata; figures not reproduced): • For ε: the language of the automaton is {ε}. • For ∅: there is no path from the start state to the accepting state, so ∅ is the language of the automaton. • For a symbol a: the language of the automaton is L(a), which consists of the one string a.

  20. Converting Regular Expressions to Automata INDUCTION: It is assumed that the statement of the theorem is true for the immediate sub-expressions of a given regular expression. • R+S: the automaton accepts L(R) ∪ L(S). • RS: the automaton accepts L(R)L(S). • R*: the automaton accepts L(R*).

  21. Example • Convert (0+1)*1(0+1) to an ε-NFSA, building it in stages: • (0+1) • (0+1)* • (0+1)*1(0+1)
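The staged construction for (0+1)*1(0+1) can be sketched in code. The Python below is a minimal illustration of the basis and induction steps, assuming a fragment is a triple (start, accept, table) and that the label `None` stands for an ε-arc; all function names are hypothetical. Each combinator keeps the three properties from the proof: one accepting state, no arcs into the start state, and no arcs out of the accepting state.

```python
from itertools import count

_ids = count()  # source of fresh state numbers

def _arc(table, src, label, dst):
    table.setdefault((src, label), set()).add(dst)

def _merged(*tables):
    out = {}
    for t in tables:
        for key, dsts in t.items():
            out.setdefault(key, set()).update(dsts)
    return out

def symbol(a):
    """Basis: automaton for the single symbol a."""
    s, f = next(_ids), next(_ids)
    return s, f, {(s, a): {f}}

def union(r, s):
    """R+S: new start and accept states joined to both fragments by ε-arcs."""
    s0, f0 = next(_ids), next(_ids)
    t = _merged(r[2], s[2])
    for frag in (r, s):
        _arc(t, s0, None, frag[0])
        _arc(t, frag[1], None, f0)
    return s0, f0, t

def concat(r, s):
    """RS: ε-arc from the accept state of R to the start state of S."""
    t = _merged(r[2], s[2])
    _arc(t, r[1], None, s[0])
    return r[0], s[1], t

def star(r):
    """R*: allow skipping R entirely or looping back through it."""
    s0, f0 = next(_ids), next(_ids)
    t = _merged(r[2])
    _arc(t, s0, None, r[0])
    _arc(t, s0, None, f0)
    _arc(t, r[1], None, r[0])
    _arc(t, r[1], None, f0)
    return s0, f0, t

def eclose(states, t):
    """All states reachable from the given states by ε-arcs alone."""
    seen, stack = set(states), list(states)
    while stack:
        for p in t.get((stack.pop(), None), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def accepts(nfa, w):
    start, accept, t = nfa
    current = eclose({start}, t)
    for ch in w:
        moved = set()
        for q in current:
            moved |= t.get((q, ch), set())
        current = eclose(moved, t)
    return accept in current

def bit():
    """A fresh fragment for (0+1); each use needs its own states."""
    return union(symbol("0"), symbol("1"))

nfa = concat(concat(star(bit()), symbol("1")), bit())
print([w for w in ("10", "0110", "101", "1") if accepts(nfa, w)])
```

`bit()` is a function rather than a shared value because reusing one fragment in two places would merge states that the construction requires to be distinct.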

  22. Converting D-FSA’s to Regular Expressions by Eliminating States • When a state s is eliminated from a D-FSA, all the paths that went through s no longer exist in the automaton. If the language of the automaton is not to change, we must include, on an arc that goes directly from state q to state p, the labels of the paths that went from q to p through the eliminated state s.

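  23. Converting D-FSA’s to Regular Expressions by Eliminating States • After state s is eliminated, the arc from a predecessor q1 to a successor p1 is labeled R11 + Q1S*P1. The figure is not reproduced; in the usual textbook labeling, assumed here, R11 labels the original arc from q1 to p1, Q1 the arc from q1 to s, S the loop on s, and P1 the arc from s to p1.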

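  24. Strategy from D-FSA to RE • For each accepting state q of the D-FSA, apply the reduction process to produce an automaton with regular-expression labels on the arcs. Eliminate all states except q and the start state q0. • If q ≠ q0, then we are left with a two-state automaton whose language is denoted by (R+SU*T)*SU*. The figure is not reproduced; in the usual textbook labeling, assumed here, R is the loop on q0, S the arc from q0 to q, U the loop on q, and T the arc from q back to q0.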

  25. Strategy from D-FSA to RE • If the start state is also an accepting state, then we must also perform a state elimination from the original automaton that gets rid of every state but the start state. What is left is a one-state automaton with a single loop labeled R, whose language is denoted by R*. • The desired regular expression is the sum (union) of all the expressions derived from the reduced automata for each accepting state, by rules (2) and (3).

  26. Example for: D-FSA to RE • Consider an N-FSA (figure not reproduced) that accepts all strings of 0’s and 1’s such that either the second or third position from the end has a 1. Derive an equivalent regular expression for the language of this N-FSA. • Solution: • Replace the labels with regular expressions.

  27. Example for: D-FSA to RE • Eliminate State B: • Predecessor states: A • Successor states: C • Equivalent Expression A → C: 1(0+1)

  28. Example for: D-FSA to RE • Branch: eliminate states C and D in separate reductions. Elimination of state C: • Predecessor states: A • Successor states: D • Equivalent expression A → D: 1(0+1)(0+1)

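  29. Example for: D-FSA to RE • Applying the generic two-state rule to the automaton for accepting state D gives the RE (0+1)*1(0+1)(0+1). • In the other branch, eliminating D (from the automaton in which B has already been eliminated) leaves a two-state automaton for accepting state C. • Corresponding RE: (0+1)*1(0+1)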

  30. Example for: D-FSA to RE • Combining two expressions for the entire automaton by summing each RE: • ((0+1)*1(0+1)(0+1)) + ((0+1)*1(0+1))
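The elimination steps of this example can be replayed in code. The sketch below is an illustration under several assumptions: the automaton is reconstructed from the labels quoted in the slides (A loops on 0+1, A to B on 1, B to C on 0+1, C to D on 0+1, with C and D accepting), arc labels are stored as Python `re` strings with `|` for union, and the function names `eliminate` and `two_state_regex` are hypothetical. Absent arcs are treated as ∅, so their starred form contributes ε.

```python
def eliminate(arcs, s):
    """Remove state s; arcs maps (q, p) to a regex label in Python syntax."""
    loop = arcs.get((s, s))
    s_star = f"(?:{loop})*" if loop else ""                 # S*
    preds = [(q, lab) for (q, p), lab in arcs.items() if p == s and q != s]
    succs = [(p, lab) for (q, p), lab in arcs.items() if q == s and p != s]
    new = {k: v for k, v in arcs.items() if s not in k}     # drop arcs touching s
    for q, q_label in preds:
        for p, p_label in succs:
            path = f"(?:{q_label}){s_star}(?:{p_label})"    # Q S* P
            # R + Q S* P when a direct arc q -> p already exists
            new[(q, p)] = f"{new[(q, p)]}|{path}" if (q, p) in new else path
    return new

def two_state_regex(arcs, start, accept):
    """(R + S U* T)* S U* for start != accept; assumes the arc start -> accept exists."""
    r = arcs.get((start, start))
    s = arcs[(start, accept)]
    u = arcs.get((accept, accept))
    t = arcs.get((accept, start))
    u_star = f"(?:{u})*" if u else ""
    back = f"(?:{s}){u_star}(?:{t})" if t else None
    inner = "|".join(x for x in (r, back) if x)
    prefix = f"(?:{inner})*" if inner else ""
    return f"{prefix}(?:{s}){u_star}"

# Automaton reconstructed from the slide labels (an assumption, see lead-in).
nfa = {("A", "A"): "0|1", ("A", "B"): "1", ("B", "C"): "0|1", ("C", "D"): "0|1"}

no_b = eliminate(nfa, "B")                                   # A -> C becomes 1(0+1)
re_d = two_state_regex(eliminate(no_b, "C"), "A", "D")       # accepting state D
re_c = two_state_regex(eliminate(no_b, "D"), "A", "C")       # accepting state C
combined = f"(?:{re_d})|(?:{re_c})"
print(combined)
```

The generated expression is heavily parenthesized because every label is wrapped before it is combined, which keeps operator precedence from changing the meaning; it denotes the same language as the hand-derived sum.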

  31. Algebraic Laws for Regular Expressions 7 October 2008 Veton Këpuska

  32. Algebraic Laws for Regular Expressions • A collection of laws that state when two regular expressions are equivalent. • Regular expressions have a number of laws similar to the laws of arithmetic. • Arithmetic: • Commutativity: x+y = y+x. Switching the order of the operands does not change the result. • Associativity: (xy)z = x(yz). The operands can be regrouped when the operator is applied twice.

  33. Associativity and Commutativity For L, M and N Languages (defined by Regular Expressions or equivalently by FSA) • Commutative Law for Union: • L+M=M+L • Associative Law for Union: • (L+M)+N=L+(M+N) • Associative Law for Concatenation: • (LM)N=L(MN)

  34. Identities and Annihilators Arithmetic • Identity: • 0 is identity for addition: 0+x = x+0 = x • 1 is identity for multiplication: 1x = x1 = x • Annihilator: • 0 is annihilator for multiplication: 0x = x0 = 0 Regular Expressions • Identity for Union and Concatenation: • ∅+L = L+∅ = L • εL = Lε = L • Annihilator for Concatenation: • ∅L = L∅ = ∅ • Important in simplification of regular expressions.

  35. Distributive Laws Arithmetic • A distributive law involves two operators. Distributive law of multiplication over addition (most common): • x (y+z) = xy+ xz Regular Expressions • Left Distributive Law of Concatenation over union: L(M+N) = LM + LN • Right Distributive Law of Concatenation over union: (M+N)L = ML + NL

  36. Distributive Laws Theorem: • If L, M, and N are any languages, then L(M ∪ N) = LM ∪ LN. Proof: • Show that a string w is in L(M ∪ N) if and only if it is in LM ∪ LN. • (Only-if) If w is in L(M ∪ N), then w = xy, where x is in L and y is in M ∪ N, so y is in M or in N. • If y is in M, then w = xy is in LM, and hence in LM ∪ LN. • If y is in N, then w = xy is in LN, and hence in LM ∪ LN. • (If) If w is in LM ∪ LN, then w is either in LM or in LN. • If w is in LM, then w = xy with x in L and y in M. Since y ∈ M implies y ∈ M ∪ N, w is in L(M ∪ N). • If w is in LN, then w = xy with x in L and y in N. Since y ∈ N implies y ∈ M ∪ N, w is in L(M ∪ N).
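The distributive laws can be spot-checked on small finite languages. The Python sketch below generates random finite languages over {0, 1} and compares both sides of the left and right laws; the generator, its size limits, and the fixed seed are arbitrary choices for illustration. Agreement on random samples supports the theorem but does not replace the proof.

```python
import random

def concat(a, b):
    """Concatenation of two finite languages."""
    return {x + y for x in a for y in b}

def random_language(rng, max_words=4, max_len=3):
    """A small random finite language over {0, 1}; may be empty or contain ""."""
    words = set()
    for _ in range(rng.randint(0, max_words)):
        n = rng.randint(0, max_len)
        words.add("".join(rng.choice("01") for _ in range(n)))
    return words

rng = random.Random(0)   # fixed seed so the run is repeatable
trials = [(random_language(rng), random_language(rng), random_language(rng))
          for _ in range(200)]

left_ok = all(concat(L, M | N) == concat(L, M) | concat(L, N)
              for L, M, N in trials)
right_ok = all(concat(M | N, L) == concat(M, L) | concat(N, L)
               for L, M, N in trials)
print(left_ok, right_ok)
```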

  37. The Idempotent Law Arithmetic: • Common arithmetic operators are not idempotent: • x+x ≠ x and • xx ≠ x Regular Expressions: • Idempotent law • L+L=L

  38. Laws Involving Closures • (L*)* = L*: closing an expression that is already closed does not change the language. • ∅* = ε: the closure of ∅ contains only the string ε. • ε* = ε • L⁺ = LL* = L*L • L⁺ = L + LL + LLL + … • L* = ε + L + LL + LLL + … = ε + L⁺ • LL* = Lε + LL + LLL + … = L + LL + LLL + LLLL + … • Lε = εL = L • L* = L⁺ + ε • L? = ε + L

  39. Discovering Laws for Regular Expressions • There is an infinite variety of laws about regular expressions that might be proposed. • Is there a general methodology that will make proofs of the correct laws easy? • The truth of a law reduces to a question of the equality of two specific languages. • Technique is closely tied to the regular-expression operators • It can not be extended to expressions involving some other operators (e.g., intersection)

  40. Discovering Laws for Regular Expressions • Consider a proposed law: (L+M)* = (L*M*)*. Given two languages L and M: • The closure of the union of the languages, (L+M)*, is identical to the closure of the concatenation of the individually closed languages, (L*M*)*. • Proof: • Suppose w is in the language of (L+M)*. Then we can write w = w1 w2 w3 … wk for some k, where each wi is in either L or M. • If wi is in L, it is also in L*. Picking ε from M* gives wi = wi ε, so wi is in L*M*. • Similarly, if wi is in M, picking ε from L* shows that wi is in L*M*. • Since each wi of w = w1 w2 w3 … wk is in L*M*, w is in (L*M*)*. • To complete the proof one must also show that every string in (L*M*)* is in (L+M)*. • Exercise problem.
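A proposed law of this kind reduces to comparing two specific languages: the standard textbook test replaces each language variable by a distinct symbol (here L by a and M by b) and asks whether the two resulting concrete expressions denote the same language. The Python sketch below approximates that comparison by testing every string up to a fixed length with the `re` module. The function name, the extra symbol c, and the length bound are assumptions of this illustration, and a bounded comparison can refute a law but cannot fully prove one.

```python
import re
from itertools import product

def same_language(p1, p2, alphabet, max_len):
    """Compare two regexes on every string up to max_len (a bounded check only)."""
    r1, r2 = re.compile(p1), re.compile(p2)
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            w = "".join(symbols)
            if bool(r1.fullmatch(w)) != bool(r2.fullmatch(w)):
                return False
    return True

# (L+M)* versus (L*M*)* with L -> a and M -> b; "c" adds strings outside both.
law_holds = same_language(r"(a|b)*", r"(a*b*)*", "abc", 6)

# A non-law for contrast: (L+M)* versus L*M* differ on the string "ba".
non_law = same_language(r"(a|b)*", r"a*b*", "abc", 6)

print(law_holds, non_law)
```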

  41. Regular Expressions Details

  42. Regular Expressions • Formally, a regular expression is an algebraic notation for characterizing a set of strings. • Thus regular expressions can be used to specify search strings as well as to define a language in a formal way. • A regular expression search requires: • A pattern that we want to search for, and • A corpus of text to search through. • When we give a search pattern, we will assume that the search engine returns the line of the document in which the pattern was found. This is what the UNIX grep command does. • We will underline the exact part of the text that matches the regular expression. • A search can be designed to return all matches to a regular expression or only the first match. We will show only the first match.
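The search convention described on this slide (return each line that contains a match and mark only the first match on it) can be sketched with Python's `re` module. The function name `grep_first` and the three-line corpus are made up for illustration; carets on the following line stand in for underlining.

```python
import re

def grep_first(pattern, lines):
    """Return (line, start, end) for each line with a match; first match per line only."""
    rx = re.compile(pattern)
    hits = []
    for line in lines:
        m = rx.search(line)          # search finds the leftmost match in the line
        if m:
            hits.append((line, m.start(), m.end()))
    return hits

# Illustrative corpus, not taken from the slides.
corpus = [
    "How much wood would a woodchuck chuck",
    "if a woodchuck could chuck wood?",
    "Buttercup said nothing.",
]

for line, start, end in grep_first(r"woodchuck", corpus):
    print(line)
    print(" " * start + "^" * (end - start))   # marks the matched span
```

Using `re.finditer` instead of `re.search` would return all matches on a line rather than only the first.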

  43. Basic Regular Expression Patterns • The simplest kind of regular expression is a sequence of simple characters: • /woodchuck/ • /Buttercup/ • /!/

  44. Basic Regular Expression Patterns • Regular expressions are case sensitive: /s/ is distinct from /S/. • /woodchucks/ will not match “Woodchucks”. • Disjunction of characters: “[” and “]”.

  45. Basic Regular Expression Patterns • Specifying range in Regular Expressions: “-”

  46. Basic Regular Expression Patterns • Negative specification, stating what a pattern cannot be: “^” • If the first symbol after the open square bracket “[” is “^”, the resulting pattern is negated. • Example: /[^a]/ matches any single character (including special characters) except a.
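Case sensitivity, bracket disjunction, ranges, and negation can each be tried out in a line of Python. The patterns below follow the slides' notation without the surrounding slashes; the sample texts are made up for illustration, and `re.search` returns the first match only, in line with the convention used in these slides.

```python
import re

examples = [
    (r"woodchucks", "Woodchucks are rodents"),       # case sensitive: no match
    (r"[wW]oodchuck", "Woodchuck"),                  # disjunction: w or W
    (r"[0-9]", "Chapter 1: Down the Rabbit Hole"),   # range: any single digit
    (r"[^a]", "aardvark"),                           # negation: first non-'a' character
]

first_matches = []
for pattern, text in examples:
    m = re.search(pattern, text)
    first_matches.append(m.group(0) if m else None)
    print(f"{pattern!r:16} in {text!r}: {first_matches[-1]!r}")
```

Note that "^" has this negating meaning only as the first character inside the brackets.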

  47. Basic Regular Expression Patterns • How do we specify both woodchuck and woodchucks? • Optional character specification: /?/ • /?/ means “the preceding character or nothing”, so /woodchucks?/ matches both.

  48. Basic Regular Expression Patterns • The question mark “?” can be thought of as “zero or one instances of the previous character”. • It is a way to specify how many of something we want. • Sometimes we need to specify regular expressions that allow repetitions of things. • For example, consider the language of (certain) sheep, which consists of strings that look like the following: • baa! • baaa! • baaaa! • baaaaa! • baaaaaa! • …

  49. Basic Regular Expression Patterns • Any number of repetitions is specified by “*” which means “any string of 0 or more”. • Examples: • /aa*/ - a followed by zero or more a’s • /[ab]*/ - zero or more a’s or b’s. This will match aaaa or abababa or bbbb

  50. Basic Regular Expression Patterns • We know enough to specify part of our regular expression for prices: multiple digits. • Regular expression for an individual digit: /[0-9]/ • Regular expression for an integer: /[0-9][0-9]*/ • Why is it not just /[0-9]*/? Because that pattern also matches the empty string. • Because it is annoying to write the same pattern twice to say “at least once”, there is a special character for “one or more”: “+”. • The regular expression for an integer then becomes /[0-9]+/ • Regular expressions for the sheep language, whose shortest string is baa!: /baaa*!/, or /baa+!/
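The counters "?", "*" and "+" can be compared directly in Python. The sketch below uses `re.fullmatch` so that a pattern must describe the entire string; the helper name `full` is illustrative, and the sheep patterns assume, as in the list of sheep strings earlier, that the shortest utterance is baa!.

```python
import re

def full(pattern, text):
    """True when the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

results = {
    "woodchucks? / woodchuck": full(r"woodchucks?", "woodchuck"),
    "woodchucks? / woodchucks": full(r"woodchucks?", "woodchucks"),
    "[0-9]* / empty string": full(r"[0-9]*", ""),    # star allows zero digits
    "[0-9]+ / empty string": full(r"[0-9]+", ""),    # plus needs at least one
    "[0-9]+ / 2008": full(r"[0-9]+", "2008"),
    "baa+! / baa!": full(r"baa+!", "baa!"),
    "baa+! / ba!": full(r"baa+!", "ba!"),            # only one a: rejected
    "ba+! / ba!": full(r"ba+!", "ba!"),              # a looser pattern accepts it
}
for name, value in results.items():
    print(f"{name:28} {value}")
```

The last two rows show why the number of literal a's before the counter matters: one a fewer in the pattern admits a string that is not in the sheep language.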
