Regular expressions (RE) are powerful tools for describing patterns in strings (lexemes) through constructs like empty string, concatenation, union, and closure. This description elaborates on how REs can be treated as a programming language, with expressions as programs that can be compiled. It discusses how to represent REs using data types and the algorithms for constructing finite state automata (FSA) from REs. Additionally, it provides examples and case rules governing REs, alongside practical exercises to implement these concepts using functional programming techniques.
Regular Expressions
• Regular languages and regular expressions describe the patterns that define lexemes.
• Regular expressions are composed of the empty string, concatenation, union, and closure.
• Examples:
  A(A | D)*   where A is alphabetic and D is a digit
  (+ | - | ε) D D*
• In these examples * is closure, | is union, ε is the empty string, and concatenation is implicit.
Meaning of Regular Expressions
Let A, B be sets of strings.
• The empty string: ε = { "" } (sometimes written <empty>)
• Concatenation by juxtaposition: AB = { a^b | a in A and b in B }, where ^ joins two strings
  A = {"x", "qw"} and B = {"v", "A"}, then AB = {"xv", "xA", "qwv", "qwA"}
Meaning of Regular Expressions (cont.)
• Union by | (or other symbols such as U)
  A = {"x", "qw"} and B = {"v", "A"}, then A|B = {"x", "qw", "v", "A"}
• Closure by *
  A* = {""} | A | AA | AAA | ... = A⁰ | A¹ | A² | A³ | ...
  A = {"x", "qw"}, then A* = {""} | {"x", "qw"} | {"xx", "xqw", "qwx", "qwqw"} | ...
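The set definitions above can be tried out directly. The following is a minimal sketch, assuming sets of strings are modelled as SML lists without duplicates; the names member, nub, concatSets, unionSets, power, and closureUpTo are illustrative and do not come from the slides. Since A* is infinite whenever A contains a non-empty string, closureUpTo only computes the finite prefix A⁰ | A¹ | ... | Aⁿ.

```sml
(* Sets of strings are modelled as lists; nub removes duplicate entries. *)
fun member (x : string, ys) = List.exists (fn y => y = x) ys;

fun nub [] = []
  | nub (x :: xs) = if member (x, xs) then nub xs else x :: nub xs;

(* AB = { a^b | a in A and b in B } *)
fun concatSets (a, b) =
  nub (List.concat (map (fn x => map (fn y => x ^ y) b) a));

(* A|B *)
fun unionSets (a, b) = nub (a @ b);

(* power (a, n) is A concatenated with itself n times; A^0 = {""}.
   Assumes n >= 0. *)
fun power (a, 0) = [""]
  | power (a, n) = concatSets (a, power (a, n - 1));

(* closureUpTo (a, n) = A^0 | A^1 | ... | A^n, a finite prefix of A*.
   Assumes n >= 0. *)
fun closureUpTo (a, 0) = power (a, 0)
  | closureUpTo (a, n) = unionSets (closureUpTo (a, n - 1), power (a, n));

val a = ["x", "qw"];
val b = ["v", "A"];
val ab = concatSets (a, b);      (* ["xv", "xA", "qwv", "qwA"] *)
val aOrB = unionSets (a, b);     (* ["x", "qw", "v", "A"] *)
val a2 = power (a, 2);           (* ["xx", "xqw", "qwx", "qwqw"] *)
val upTo2 = closureUpTo (a, 2);  (* "", then A, then AA: 7 strings *)
```

Lists keep the sketch short; a real implementation would more likely use a proper set structure.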
Regular Expressions as a language
• We can treat regular expressions as a programming language.
• Each expression is a new program.
• Programs can be compiled.
• How do we represent the regular expression language? By using a datatype.
  datatype RE
    = Empty
    | Union of RE * RE
    | Concat of RE * RE
    | Star of RE
    | C of char;
Example RE program
(+ | - | ε) D D*
  val re1 = Concat(Union(C #"+", Union(C #"-", Empty)),
                   Concat(C #"D", Star (C #"D")))
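In re1 the constructor C #"D" matches the literal character D. If D is meant to stand for any digit, as in the earlier slide, the digit class has to be spelled out as a union of ten characters. The sketch below shows one way to build such RE programs with ordinary SML code; oneOf, digit, sign, and number are hypothetical names, not part of the slides.

```sml
datatype RE
  = Empty
  | Union of RE * RE
  | Concat of RE * RE
  | Star of RE
  | C of char;

(* oneOf s: the union of C c for every character c in s.
   The empty string yields Empty (an arbitrary choice for this sketch). *)
fun oneOf s =
  case explode s of
      [] => Empty
    | c :: cs => foldl (fn (d, acc) => Union (acc, C d)) (C c) cs;

val digit = oneOf "0123456789";
val sign = Union (C #"+", Union (C #"-", Empty));

(* (+ | - | ε) D D*  with D expanded to a real digit class *)
val number = Concat (sign, Concat (digit, Star digit));
```

Because RE values are plain data, helpers like this can assemble large expressions without any new constructors.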
R.E.'s and FSA's
• Algorithm that constructs an FSA from a regular expression.
• FSA
  • alphabet, A
  • set of states, S
  • a transition function, A x S -> S
  • a start state, S0
  • a set of accepting states, SF subset of S
• Defined by cases over the structure of regular expressions
• Let A, B be R.E.'s and "x" a symbol of the alphabet, then
  • ε is a R.E.
  • "x" is a R.E.
  • AB is a R.E.
  • A|B is a R.E.
  • A* is a R.E.
• 1 rule for each case
Rules
[Diagrams: one NFA fragment per rule]
• ε
• "x"
• AB
• A|B
• A*
Example: (a|b)*abb
[Diagram: NFA for (a|b)*abb with states 0 through 10 and edges labelled a, b, and ε]
• Note the many ε transitions
• Loops caused by the *
• Non-determinism: many paths out of a state on "a"
Building an NFA from a RE
  datatype Label = Epsilon | Char of char;
  type Start = int;
  type Finish = int;
  datatype Edge = Edge of Start * Label * Finish;
  (* Nfa is the result type of nfa: start state, final state, edges *)
  type Nfa = Start * Finish * Edge list;
  val next = ref 0;
  fun new () = let val ref n = next in (next := n+1; n) end;
• ref makes a mutable variable
• Semicolon separates commands (inside parentheses)
[Diagrams: NFA fragments for ε, "x", and A|B]
  fun nfa Empty =
        let val s = new()
            val f = new()
        in (s,f,[Edge(s,Epsilon,f)]):Nfa end
    | nfa (C x) =
        let val s = new()
            val f = new()
        in (s,f,[Edge(s,Char x,f)]) end
    | nfa (Union(x,y)) =
        let val (sx,fx,xes) = nfa x
            val (sy,fy,yes) = nfa y
            val s = new()
            val f = new()
            val newes = [Edge(s,Epsilon,sx)
                        ,Edge(s,Epsilon,sy)
                        ,Edge(fx,Epsilon,f)
                        ,Edge(fy,Epsilon,f)]
        in (s,f,newes @ xes @ yes) end
[Diagrams: NFA fragments for AB and A*]
    | nfa (Concat(x,y)) =
        let val (sx,fx,xes) = nfa x
            val (sy,fy,yes) = nfa y
        in (sx,fy,(Edge(fx,Epsilon,sy)) :: (xes @ yes)) end
    | nfa (Star r) =
        let val (sr,fr,res) = nfa r
            val s = new()
            val f = new()
            val newes = [Edge(s,Epsilon,sr)
                        ,Edge(fr,Epsilon,f)
                        ,Edge(s,Epsilon,f)
                        ,Edge(f,Epsilon,s)]
        in (s,f,newes @ res) end
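To see what the constructed NFA accepts, it can be simulated by tracking the set of states reachable after each input character. This is a minimal sketch, not part of the slides: it repeats the definitions from these slides so that it loads on its own, and adds the hypothetical helpers member, closure, move, and accepts. State sets are plain lists, which is slow but adequate for small examples.

```sml
datatype RE = Empty | Union of RE * RE | Concat of RE * RE
            | Star of RE | C of char;
datatype Label = Epsilon | Char of char;
datatype Edge = Edge of int * Label * int;
type Nfa = int * int * Edge list;

val next = ref 0;
fun new () = let val n = !next in (next := n + 1; n) end;

(* Same construction as on the slides, written compactly. *)
fun nfa Empty =
      let val s = new () val f = new ()
      in (s, f, [Edge (s, Epsilon, f)]) : Nfa end
  | nfa (C x) =
      let val s = new () val f = new ()
      in (s, f, [Edge (s, Char x, f)]) end
  | nfa (Union (x, y)) =
      let val (sx, fx, xes) = nfa x
          val (sy, fy, yes) = nfa y
          val s = new () val f = new ()
      in (s, f, [Edge (s, Epsilon, sx), Edge (s, Epsilon, sy),
                 Edge (fx, Epsilon, f), Edge (fy, Epsilon, f)] @ xes @ yes) end
  | nfa (Concat (x, y)) =
      let val (sx, fx, xes) = nfa x
          val (sy, fy, yes) = nfa y
      in (sx, fy, Edge (fx, Epsilon, sy) :: (xes @ yes)) end
  | nfa (Star r) =
      let val (sr, fr, res) = nfa r
          val s = new () val f = new ()
      in (s, f, [Edge (s, Epsilon, sr), Edge (fr, Epsilon, f),
                 Edge (s, Epsilon, f), Edge (f, Epsilon, s)] @ res) end;

fun member (x : int, ys) = List.exists (fn y => y = x) ys;

(* closure edges states: every state reachable from states using only
   Epsilon edges; repeats a pass over the edges until nothing is added. *)
fun closure edges states =
  let
    fun step (Edge (a, Epsilon, b), seen) =
          if member (a, seen) andalso not (member (b, seen))
          then b :: seen else seen
      | step (_, seen) = seen
    val states' = foldl step states edges
  in
    if length states' = length states then states
    else closure edges states'
  end;

(* move edges c states: targets of edges labelled c that leave states *)
fun move edges c states =
  List.mapPartial
    (fn Edge (a, Char d, b) =>
          if d = c andalso member (a, states) then SOME b else NONE
      | _ => NONE)
    edges;

(* accepts n str: true when the final state is reachable after reading str *)
fun accepts ((s, f, edges) : Nfa) (str : string) =
  let
    val start = closure edges [s]
    val final = foldl (fn (c, cur) => closure edges (move edges c cur))
                      start (explode str)
  in
    member (f, final)
  end;

val re1 = Concat (Union (C #"+", Union (C #"-", Empty)),
                  Concat (C #"D", Star (C #"D")));
val abb = Concat (Star (Union (C #"a", C #"b")),
                  Concat (C #"a", Concat (C #"b", C #"b")));
val ok1 = accepts (nfa re1) "+DD";
val ok2 = accepts (nfa abb) "aabb";
```

Here re1 matches the literal character D, exactly as written on the slides, so "+DD" is in its language while "+12" is not.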
Example use
  val re1 = Concat(Union(C #"+", Union(C #"-", Empty)),
                   Concat(C #"D", Star (C #"D")))
  val ex6 = nfa re1;
The system responds with:
  val ex6 = (8,15,
    [Edge (9,Epsilon,10),Edge (8,Epsilon,0)
    ,Edge (8,Epsilon,6),Edge (1,Epsilon,9)
    ,Edge (7,Epsilon,9),Edge (0,Char #"+",1)
    ,Edge (6,Epsilon,2),Edge (6,Epsilon,4)
    ,Edge (3,Epsilon,7),Edge (5,Epsilon,7)
    ,Edge (2,Char #"-",3),Edge (4,Epsilon,5),...]) : Nfa
Assignment #3
CS321 Prog Lang & Compilers
Assigned: Jan 22, 2007
Due: Wed. Jan 24, 2007
Turn in a listing and a transcript that shows you have tested your code. A minimum of 3 tests is necessary. Some functions may require more than 3 tests to receive full credit.
1) Write the following functions over lists. You must use pattern matching and recursion.
   A. Reverse a list so that its elements appear in the opposite order.
      reverse [1,2,3,4] ----> [4,3,2,1]
   B. Count the number of occurrences of an element in a list.
      count 4 [1,2,3,4,5,4] ---> 2
      count 4 [1,2,3,2,1] ---> 0
   C. Concatenate together a list of lists.
      concat [[1,2],[],[5,6]] ----> [1,2,5,6]
2) Using the datatype for Regular Expressions we defined in class
      datatype RE = Empty | Union of RE * RE | Concat of RE * RE | Star of RE | C of char;
   write a function that turns an RE into a string, so that it can be printed. Minimize the number of parentheses, but keep the string unambiguous by using the following rules.
   1) Star has highest precedence, so ab* means a(b*)
   2) Concat has the next highest precedence, so a+bc means a+(bc)
   3) Union has lowest precedence, so a+bc+c* means a+(bc)+(c*)
   4) Use the hash mark (#) as the empty string.
   5) Special characters *+()\ should be escaped by using a preceding backslash. So Concat(C #"+", C #"a") should be "\+a"
Hints:
   1) The string concatenation operator is useful: "abc" ^ "zx" -----> "abczx"
   2) Write this in two steps. First, fully parenthesize every RE. Second, change the function to not add the parentheses which the rules don't require.
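The first step of hint 2 can be sketched as follows. This is only one possible starting point, with hypothetical names escape and showFull; it wraps every compound RE in parentheses and leaves the second step (dropping the parentheses the precedence rules make unnecessary) as the actual exercise. Note that a backslash inside an SML string literal is itself written as "\\".

```sml
datatype RE = Empty | Union of RE * RE | Concat of RE * RE
            | Star of RE | C of char;

(* escape c: put a backslash in front of the special characters of rule 5 *)
fun escape c =
  if Char.contains "*+()\\" c then "\\" ^ str c else str c;

(* showFull: fully parenthesized form, step one of the hint *)
fun showFull Empty = "#"
  | showFull (C c) = escape c
  | showFull (Union (x, y)) = "(" ^ showFull x ^ "+" ^ showFull y ^ ")"
  | showFull (Concat (x, y)) = "(" ^ showFull x ^ showFull y ^ ")"
  | showFull (Star x) = "(" ^ showFull x ^ "*)";

val s1 = showFull (Concat (C #"+", C #"a"));
val s2 = showFull (Union (C #"a", Star (C #"b")));
```

With this version s1 still carries an outer pair of parentheses around the escaped text, which is what the second step is meant to remove.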