CS 3240 – Chapter 3 Regular Languages and Grammars
Directory Operations • How would you delete all C++ files from a directory from the command line? • How about all PowerPoint files that start with the letter a? • PowerPoint file names that contain the string 3240? CS 3240 - Regular Languages and Grammars
Patterns for Strings • *.cpp • a*.ppt • *3240*.ppt • These are wildcard expressions • Not bona fide regular expressions
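The three wildcard patterns can be tried out without touching a real directory. The sketch below uses Python's standard `fnmatch` module; the file names in `FILES` are invented for illustration and are not from the slides.

```python
import fnmatch

# Hypothetical directory listing, made up for this example.
FILES = ["main.cpp", "util.cpp", "notes.txt",
         "agenda.ppt", "cs3240-ch3.ppt", "a3240.ppt"]

def select(pattern, names=FILES):
    """Return the names matching a shell-style wildcard pattern (case-sensitive)."""
    return [n for n in names if fnmatch.fnmatchcase(n, pattern)]

for pattern in ["*.cpp", "a*.ppt", "*3240*.ppt"]:
    print(pattern, "->", select(pattern))
```

Here `*` stands for any run of characters, which is why these are wildcard expressions and not regular expressions in the sense of this chapter.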
Where Are We? CS 3240 - Introduction
Regular Expressions • Text patterns that represent regular languages • We’ll show shortly that for every regular expression there is a finite automaton that accepts that language • And vice-versa • The operators are: • ( ) (Grouping) • * (Kleene Star) • + (Union) • xy (Concatenation)
Recursive Definitions of Sets • 1) Specify base case(s) • 2) Show how to generate other elements • Rules that use what’s in the set already • Example: Non-negative multiples of 5, F • 1) 0 and 5 are in F • 2) If x and y are in F, then x + y is in F • Alternate definition: • 1) 0 is in F • 2) If x is in F, so is x + 5
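Both definitions of F can be run as closure computations. This is a minimal sketch that stops at an assumed bound `limit`, since F itself is infinite; the function names are illustrative only.

```python
def via_sums(limit):
    """Definition 1: 0 and 5 are in F; if x and y are in F, so is x + y."""
    members = {0, 5}                      # base cases
    changed = True
    while changed:                        # apply the rule until nothing new appears
        changed = False
        for x in list(members):
            for y in list(members):
                s = x + y
                if s <= limit and s not in members:
                    members.add(s)
                    changed = True
    return members

def via_add_five(limit):
    """Definition 2: 0 is in F; if x is in F, so is x + 5."""
    members = {0}                         # base case
    x = 0
    while x + 5 <= limit:                 # recursive rule
        x += 5
        members.add(x)
    return members

print(sorted(via_sums(30)))
print(via_sums(30) == via_add_five(30))
```

Up to the bound, the two definitions produce the same set, which is what "alternate definition" means here.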
Regular Expressions: Recursive Definition • Base cases: • The empty set: ∅ or ( ) • The empty string: λ • Any letter in Σ • Recursive rules: Given regular expressions r, r1, r2: • (r) (Grouping) • r* (Kleene Star) • r1 + r2 (Union) • r1r2 (Concatenation)
Regular Expressions: Examples • All strings beginning with a: • a(a + b)* • All strings containing aba: • (a + b)*aba(a + b)* • All strings of even length: • ((a + b)(a + b))* = (aa + ba + ab + bb)* = ((a + b)^2)* • All strings of odd length: • (a + b)((a + b)^2)* • Valid decimal integers in C: • (1+2+3+4+5+6+7+8+9)(0+1+2+3+4+5+6+7+8+9)*
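These examples can be checked with Python's `re` module. Python writes union as `|` where the slides write `+`, and `re.fullmatch` requires the whole string to match. The dictionary keys are labels chosen for this sketch.

```python
import re

# The slide's expressions, transcribed with | for union.
PATTERNS = {
    "begins with a":   r"a(a|b)*",
    "contains aba":    r"(a|b)*aba(a|b)*",
    "even length":     r"((a|b)(a|b))*",
    "odd length":      r"(a|b)((a|b)(a|b))*",
    "decimal integer": r"(1|2|3|4|5|6|7|8|9)(0|1|2|3|4|5|6|7|8|9)*",
}

def matches(name, w):
    """True if the whole string w is in the language of the named pattern."""
    return re.fullmatch(PATTERNS[name], w) is not None

print(matches("begins with a", "abba"))      # string starts with a
print(matches("contains aba", "abba"))       # no aba substring
print(matches("decimal integer", "007"))     # leading zeros are excluded
```

Note that the integer pattern, as written on the slide, rejects a lone `0`, because the first symbol must be a nonzero digit.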
Taking Liberties with Transition Graphs • Put anything you want on an edge • Use an “else” branch as well • [0-9] (if-branch) • ~[0-9] or [^(0-9)] or else (Decimal integers)
What Language? • (b*ab*ab*ab* + b)* • = b*(ab*ab*ab*)* • = b* + (b*ab*ab*ab*)* • (a(a+bb)*)* • ((a + b)a)*
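One way to gain confidence in the equalities claimed for the first expression is to test all short strings. The sketch below compares the three forms on every string over {a, b} up to an assumed length bound, and also against the description "the number of a's is a multiple of 3". A bounded check like this is evidence, not a proof.

```python
import re
from itertools import product

# The three forms from the slide, with | for union.
EXPRS = [
    r"(b*ab*ab*ab*|b)*",
    r"b*(ab*ab*ab*)*",
    r"b*|(b*ab*ab*ab*)*",
]

def strings(alphabet, max_len):
    """Yield every string over the alphabet with length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)

def agree(max_len=7):
    """True if all three expressions and the a-count test agree on every short string."""
    for w in strings("ab", max_len):
        verdicts = {re.fullmatch(e, w) is not None for e in EXPRS}
        verdicts.add(w.count("a") % 3 == 0)
        if len(verdicts) != 1:
            return False
    return True

print(agree())
```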
Language Associated with a Regular Expression (Stating the Obvious) • L(∅) = ∅ • L(λ) = {λ} • L(c) = {c}, for c ∊ Σ • L((r)) = L(r) • L(r*) = L(r)* • L(r1 + r2) = L(r1) ∪ L(r2) • L(r1r2) = L(r1)L(r2)
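The rules for L translate almost line for line into a recursive function. In this sketch a regular expression is a nested tuple such as `("cat", r1, r2)`; that encoding and the length bound `n` are assumptions made here so that the (usually infinite) language can be listed.

```python
def lang(r, n):
    """Return the strings of length <= n in L(r), for r given as a nested tuple."""
    kind = r[0]
    if kind == "empty":                       # L(∅) = ∅
        return set()
    if kind == "lambda":                      # L(λ) = {λ}
        return {""}
    if kind == "sym":                         # L(c) = {c}
        return {r[1]} if n >= 1 else set()
    if kind == "union":                       # L(r1 + r2) = L(r1) ∪ L(r2)
        return lang(r[1], n) | lang(r[2], n)
    if kind == "cat":                         # L(r1r2) = L(r1)L(r2)
        return {x + y for x in lang(r[1], n) for y in lang(r[2], n)
                if len(x + y) <= n}
    if kind == "star":                        # L(r*) = L(r)*
        base = lang(r[1], n)
        result, frontier = {""}, {""}
        while frontier:                       # keep appending until nothing new fits
            frontier = {x + y for x in frontier for y in base
                        if len(x + y) <= n} - result
            result |= frontier
        return result
    raise ValueError(f"unknown node: {kind}")

A, B = ("sym", "a"), ("sym", "b")
begins_with_a = ("cat", A, ("star", ("union", A, B)))   # a(a + b)*
print(sorted(lang(begins_with_a, 2)))
```

Grouping needs no case of its own: the tuple nesting already plays the role of parentheses, matching L((r)) = L(r).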
“Algebra” of Regular Expressions • r+s = s+r • (r+s)+t = r+(s+t) • r+r = r • r + ∅ = r • (rs)t = r(st) • rλ = λr = r • r∅ = ∅r = ∅ • r(s+t) = rs+rt • (r+s)t = rt+st
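Each identity is a statement about languages, so it can be spot-checked on small finite sets of strings, with set union for + and pairwise concatenation for juxtaposition. The sample sets below are arbitrary choices for this sketch.

```python
def cat(X, Y):
    """Concatenation of two languages given as finite sets of strings."""
    return {x + y for x in X for y in Y}

EMPTY = set()       # the language ∅
LAMBDA = {""}       # the language {λ}

def identities_hold(R, S, T):
    """Check the slide's identities on three concrete finite languages."""
    checks = [
        R | S == S | R,
        (R | S) | T == R | (S | T),
        R | R == R,
        R | EMPTY == R,
        cat(cat(R, S), T) == cat(R, cat(S, T)),
        cat(R, LAMBDA) == R == cat(LAMBDA, R),
        cat(R, EMPTY) == EMPTY == cat(EMPTY, R),
        cat(R, S | T) == cat(R, S) | cat(R, T),
        cat(R | S, T) == cat(R, T) | cat(S, T),
    ]
    return all(checks)

print(identities_hold({"a", "ab"}, {"b", ""}, {"ba"}))
```

Concatenation is missing from the commutative laws for a reason: `cat({"a"}, {"b"})` and `cat({"b"}, {"a"})` are different sets.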
Regular Expressions and Finite Automata (Section 3.2) • For every regular expression there is an associated NFA that accepts the same language • And therefore a DFA, by conversion • For every FA (either NFA or DFA) there is a regular expression that represents the same language
Regular Expression => NFA • We will show how to convert each element of the definition of regular expressions to an NFA • This is sufficient! • And shows the convenience of recursive definitions (review slide 7 now) • because if we can give a machine for every case in the definition of REs, we are done!
Mapping Primitive REs • Empty Language • Empty String • Single Character
Mapping Union of REs
Mapping Union of REs: A Simplification • Just draw the lambdas from a new start state to the start state of each machine • Remove the start notation from the original start states • (No need to have a new final state)
Mapping Concatenation of REs
Mapping Concatenation of REs: A Simplification • 1) Just draw a lambda from each final state of the first machine to the start state of the second machine • 2) Remove the acceptability of those final states of the first machine
Mapping Kleene Star of a RE
Mapping Kleene Star of a RE: A Simplification • We need to do two things: • 1) Add the empty string, if needed • 2) Loop from each final state back to the start state • Procedure: • 1) If the empty string is not accepted, create a new start state which accepts, and connect to the original start state with λ • 2) Add a λ-edge from each final state to the original (or the new) start state
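The case-by-case mapping can be sketched as a recursive builder. This version uses the unsimplified style, where every fragment gets a fresh start state and a fresh single final state joined by λ-edges; it is an illustration of the idea and not necessarily the exact diagrams from the slides. The tuple encoding of expressions and all names are assumptions of this sketch.

```python
from itertools import count

class NFA:
    def __init__(self):
        self.edges = {}                 # state -> list of (label, target); None means λ
        self._ids = count()

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def add(self, src, label, dst):
        self.edges[src].append((label, dst))

def build(r, nfa):
    """Add a fragment for r to nfa and return its (start, final) states."""
    start, final = nfa.new_state(), nfa.new_state()
    kind = r[0]
    if kind == "empty":
        pass                            # no path from start to final
    elif kind == "lambda":
        nfa.add(start, None, final)
    elif kind == "sym":
        nfa.add(start, r[1], final)
    elif kind == "union":
        for sub in r[1:]:               # λ into each machine, λ out to the shared final
            s, f = build(sub, nfa)
            nfa.add(start, None, s)
            nfa.add(f, None, final)
    elif kind == "cat":
        s1, f1 = build(r[1], nfa)
        s2, f2 = build(r[2], nfa)
        nfa.add(start, None, s1)
        nfa.add(f1, None, s2)           # final of the first feeds the second
        nfa.add(f2, None, final)
    elif kind == "star":
        s, f = build(r[1], nfa)
        nfa.add(start, None, final)     # accept the empty string
        nfa.add(start, None, s)
        nfa.add(f, None, s)             # loop back for another repetition
        nfa.add(f, None, final)
    else:
        raise ValueError(f"unknown node: {kind}")
    return start, final

def closure(nfa, states):
    """All states reachable from `states` using λ-edges only."""
    stack, seen = list(states), set(states)
    while stack:
        q = stack.pop()
        for label, t in nfa.edges[q]:
            if label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def accepts(r, w):
    """Build the NFA for r and simulate it on w."""
    nfa = NFA()
    start, final = build(r, nfa)
    current = closure(nfa, {start})
    for ch in w:
        moved = {t for q in current for label, t in nfa.edges[q] if label == ch}
        current = closure(nfa, moved)
    return final in current

A, B = ("sym", "a"), ("sym", "b")
r = ("cat", A, ("star", ("union", A, B)))    # a(a + b)*
print(accepts(r, "abba"), accepts(r, "ba"))
```

The simplifications on the slides remove some of these extra states and λ-edges; the language accepted stays the same.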
Practice • Draw NFAs for the REs on slides 8 and 9
FA => Regular Expression • First remove all jails • Then, if needed, convert the DFA to an equivalent NFA with • A start state with no incoming edges • A single final state with no outgoing edges • Will need lambda transitions for this • Then “eliminate” all but the start and final states • Without changing the language accepted • Using GTGs…
Generalized Transition Graphs (GTGs) • Allow regular expressions on the edges • The example graph accepts a* + a*(a+b)c* [Note: (c*)* = c*]
FA => RE: Step 1 • If the start state has an incoming edge (even if it’s a loop), create a new start state with a lambda transition to the old start state
FA => RE: Step 2 • If there is more than one final state, or if the single final state has an outgoing edge (even if it’s a loop), create a new final state and link to it with a lambda transition from each final state
FA => RE: Step 3 • “Remove” each intermediate state, one at a time: • Combine each incoming path with each outgoing path (only “through” paths; not loops) • Determine the regular expression equivalent to the combined path through the current state • Add an edge with that RE between the incoming state and the outgoing state • Repeat until all intermediate states vanish
FA => RE: Example
FA => RE: Example, Steps 1 and 2 • To eliminate 2: • 1-2-1: af*b • 1-2-3: af*c • 3-2-1: df*b • 3-2-3: df*c
FA => RE: Example, Step 3a (State 2 removed) • To eliminate 1: • 0-1-3: (e+af*b)*(h+af*c) • 3-1-3: (i+df*b)(e+af*b)*(h+af*c)
FA => RE: Example, Step 3b (State 1 removed) • Eliminate 3 (Final Result): (e+af*b)*(h+af*c)(g+df*c+(i+df*b)(e+af*b)*(h+af*c))*
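The elimination procedure can be sketched on a GTG stored as a dictionary from (source, target) pairs to edge expressions. The edge list below is an assumption: the original diagram is not available, so it is reconstructed from the path expressions on the slides, with 0 as the new start state and 4 as a new final state reached from 3 by λ.

```python
LAMBDA = "λ"

def is_union(x):
    """True if x has a + outside all parentheses."""
    depth = 0
    for ch in x:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "+" and depth == 0:
            return True
    return False

def cat(*parts):
    """Concatenate expressions, dropping λ and parenthesizing unions."""
    kept = [f"({p})" if is_union(p) else p for p in parts if p != LAMBDA]
    return "".join(kept) if kept else LAMBDA

def star(x):
    if x is None or x == LAMBDA:        # no loop: nothing to repeat
        return LAMBDA
    return x + "*" if len(x) == 1 else f"({x})*"

def union(x, y):
    if x is None:
        return y
    return f"{x}+{y}"

def eliminate(edges, state):
    """Return a new edge dictionary with `state` bypassed."""
    loop = star(edges.get((state, state)))
    ins = [(p, r) for (p, q), r in edges.items() if q == state and p != state]
    outs = [(q, r) for (p, q), r in edges.items() if p == state and q != state]
    new = {k: r for k, r in edges.items() if state not in k}
    for p, r_in in ins:                 # combine every incoming with every outgoing edge
        for q, r_out in outs:
            new[(p, q)] = union(new.get((p, q)), cat(r_in, loop, r_out))
    return new

# Reconstructed example graph (an assumption, see the note above).
EDGES = {
    (0, 1): LAMBDA, (1, 1): "e", (1, 2): "a", (1, 3): "h",
    (2, 2): "f", (2, 1): "b", (2, 3): "c",
    (3, 2): "d", (3, 3): "g", (3, 1): "i", (3, 4): LAMBDA,
}

def to_regex(edges, order, start, final):
    for state in order:
        edges = eliminate(edges, state)
    return edges[(start, final)]

result = to_regex(EDGES, [2, 1, 3], 0, 4)
print(result)
```

Eliminating the states in a different order generally yields a different but equivalent expression.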
FA => RE: EVEN-EVEN
Exercise • Find a regular expression for the language containing all strings that do not contain the substring aa
FA => RE: Online Document • See bypass.doc • Shows different possibilities by eliminating states in different orders • But the REs obtained are equivalent • Meaning they represent the same language
Where Are We?
Regular Grammars (Section 3.3) • There is a natural correspondence between FAs and grammars • Right-linear Grammars • “Linear” means there is at most one variable on the right-hand side of the rule • “Right-linear” means the variable occurs as the last entry in the rule: • A → abC
Equivalence of FAs and Grammars • The variables represent states • The right-hand side contains the character(s) on the edge, optionally followed by the target state • The accepting states have a lambda rule • A → aB | bC | λ • B → aA | bD • C → aD | bA • D → aC | bB
Rules Without a Variable • Go to an accepting state with no out-edges • A → b
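Reading a right-linear grammar as an automaton can be sketched directly: variables are states, a rule `cX` is an edge labeled `c` to state `X`, a λ rule marks an accepting state, and a rule with no variable leads to an extra accepting state with no out-edges. The sketch assumes every rule is λ, one terminal, or one terminal followed by one variable; the marker `FINAL` and the function names are invented here.

```python
from itertools import product

FINAL = "$"     # extra accepting state with no out-edges, for rules like A → b

# The four-variable grammar from the slide; "" stands for λ.
EVEN_EVEN = {
    "A": ["aB", "bC", ""],
    "B": ["aA", "bD"],
    "C": ["aD", "bA"],
    "D": ["aC", "bB"],
}

def step(grammar, states, ch):
    """Follow every edge labeled ch out of the current set of states."""
    nxt = set()
    for var in states:
        for rhs in grammar.get(var, []):
            if rhs and rhs[0] == ch:
                nxt.add(rhs[1:] or FINAL)
    return nxt

def generates(grammar, start, w):
    """True if the grammar, read as an NFA, accepts w."""
    states = {start}
    for ch in w:
        states = step(grammar, states, ch)
    return any(s == FINAL or "" in grammar.get(s, []) for s in states)

def matches_parity(max_len):
    """Compare the grammar with 'even number of a's and even number of b's'."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            w = "".join(letters)
            expected = w.count("a") % 2 == 0 and w.count("b") % 2 == 0
            if generates(EVEN_EVEN, "A", w) != expected:
                return False
    return True

print(matches_parity(8))
print(generates({"S": ["aS", "b"]}, "S", "aab"))   # grammar for a*b, uses a rule without a variable
```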
Another Grammar for EVEN-EVEN • S → aaS | bbS | abA | baA | λ • A → aaA | bbA | abS | baS • (Figure: a GTG)
Exercise • Construct a regular grammar for the language denoted by aab*a • First build a GTG • Then map to a right-linear grammar
A Left-Linear Grammar for aab*a • S → Xa • X → Xb | aa • How did I come up with this?
Left-linear = Right-linear • If you have the single variable only at the left ends, you have a left-linear grammar • This is also a regular grammar • We will show how to convert between right-linear and left-linear grammars • We will use two facts to establish the process: • If L is regular, so is L^R (Section 2.3, exercise 12) • L(G^R) = L(G)^R (obvious, but on next slide…)
L(G^R) = L(G)^R • G^R means you reverse the right-hand side of each rule in a grammar G • The language generated is L(G)^R (the reverse of L(G)) • G: S → abS | X • X → bX | λ • generates (ab)*b* • G^R: S → Sba | X • X → Xb | λ • generates b*(ba)*
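The claim can be checked on the slide's example by listing short strings from both grammars. In this sketch a grammar is a dictionary from a variable to its right-hand sides, uppercase letters are variables, and `""` stands for λ; the enumerator relies on the grammar being linear (at most one variable per sentential form), so the bounded search terminates.

```python
def reverse_grammar(g):
    """G^R: reverse the right-hand side of every rule."""
    return {var: [rhs[::-1] for rhs in rules] for var, rules in g.items()}

def language(g, start, n):
    """Terminal strings of length <= n derivable from start (linear grammars only)."""
    done, seen, todo = set(), {start}, [start]
    while todo:
        form = todo.pop()
        i = next((k for k, ch in enumerate(form) if ch.isupper()), None)
        if i is None:                   # no variable left: a finished string
            done.add(form)
            continue
        for rhs in g[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            terminals = sum(not ch.isupper() for ch in new)
            if terminals <= n and new not in seen:
                seen.add(new)
                todo.append(new)
    return done

G = {"S": ["abS", "X"], "X": ["bX", ""]}        # generates (ab)*b*
GR = reverse_grammar(G)                         # S → Sba | X, X → Xb | λ

forward = language(G, "S", 4)
backward = language(GR, "S", 4)
print(sorted(forward))
print(backward == {w[::-1] for w in forward})
```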
Convert Right-linear to Left-linear: Using 2 Reversals • Convert the right-linear grammar to a GTG • “Reverse” the GTG (a la Section 2.3, #12) • Ensure a single final state (use λ if needed) • Interchange the role of the start and final states • Reverse all arrows • Convert the reversed GTG to a right-linear grammar • Reverse the right-hand sides of each rule to obtain the left-linear grammar
Example: Converting Right-linear to Left-linear, (aab)*ab • Right-linear grammar for (aab)*ab: • A → aB • B → abA | b • Right-linear grammar read from the reversed GTG, for ba(baa)*: • C → bB • B → aA • A → baB | λ • Reversing each right-hand side gives the left-linear grammar for (aab)*ab: • C → Bb • B → Aa • A → Bab | λ
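The two-reversal procedure can be sketched without drawing the GTG, because each rule corresponds to one edge. A rule V → wX is an edge from V to X labeled w; reversing the graph turns it into the rule X → w^R V. A rule V → w is an edge into the single final state, which becomes the new start variable. The function name, the default new variable `C`, and the single-uppercase-letter convention are assumptions of this sketch.

```python
def right_to_left_linear(grammar, start, new_start="C"):
    """Return (right-linear grammar of the reversed GTG, left-linear grammar).

    Assumes variables are single uppercase letters, "" stands for λ,
    and new_start is not already a variable of the grammar.
    """
    reversed_rl = {var: [] for var in grammar}
    reversed_rl[new_start] = []         # the single final state becomes the start
    for var, rules in grammar.items():
        for rhs in rules:
            if rhs and rhs[-1].isupper():
                label, target = rhs[:-1], rhs[-1]
            else:
                label, target = rhs, new_start      # edge into the final state
            # Reversed arrow: from target back to var, with the label reversed.
            reversed_rl[target].append(label[::-1] + var)
    reversed_rl[start].append("")       # the old start state is now accepting
    left = {var: [rhs[::-1] for rhs in rules] for var, rules in reversed_rl.items()}
    return reversed_rl, left

G = {"A": ["aB"], "B": ["abA", "b"]}    # right-linear grammar for (aab)*ab
middle, left = right_to_left_linear(G, "A")
print(middle)
print(left)
```

On the slide's grammar this reproduces both intermediate results, with C as the start variable of the reversed and the left-linear grammars.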
Convert Left-linear to Right-linear: Reverse the Steps on Previous Slide • Reverse the grammar, G, obtaining right-linear grammar, G^R, for L(G)^R • Convert to GTG • Reverse the GTG • Convert to right-linear
Summary