Regular Expressions

  1. Regular Expressions
     • Regular languages and regular expressions are used to describe the patterns that lexemes must match.
     • Regular expressions are built from the empty string (ε), concatenation, union (|), and closure (*).
     • Examples, where A is any alphabetic character and D is any digit:
         A(A | D)*           (a letter followed by any number of letters or digits)
         (+ | - | ε) D D*    (an optionally signed integer)
     • Concatenation is implicit: it is written by juxtaposition, with no operator.
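As a hedged illustration of what the first pattern accepts, here is a hand-written recognizer for A(A | D)* in Standard ML. It assumes "alphabetic" means Char.isAlpha and "digit" means Char.isDigit; the name isIdent is hypothetical.

```sml
(* Recognizer for A(A | D)*: one letter, then any mix of letters and digits.
   Assumes Char.isAlpha / Char.isDigit define the classes A and D. *)
fun isIdent (s : string) : bool =
  case String.explode s of
      [] => false                       (* A(A | D)* needs at least one letter *)
    | c :: cs =>
        Char.isAlpha c andalso
        List.all (fn d => Char.isAlpha d orelse Char.isDigit d) cs

val ok = isIdent "x1"      (* true *)
val bad = isIdent "1x"     (* false: must start with a letter *)
```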

  2. Meaning of Regular Expressions
     Let A, B be sets of strings.
     • The empty string: ε = { "" } (sometimes written <empty>)
     • Concatenation, by juxtaposition: AB = { a ^ b : a in A, b in B }, where ^ is string concatenation.
       If A = {"x", "qw"} and B = {"v", "A"}, then AB = {"xv", "xA", "qwv", "qwA"}.
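The concatenation rule can be checked mechanically. This sketch assumes sets of strings are represented as SML lists (duplicates are not removed); concatSets is a hypothetical name.

```sml
(* AB = { a ^ b : a in A, b in B }, with sets represented as lists *)
fun concatSets (xs : string list, ys : string list) : string list =
  List.concat (map (fn a => map (fn b => a ^ b) ys) xs)

val ab = concatSets (["x", "qw"], ["v", "A"])
(* ab = ["xv", "xA", "qwv", "qwA"] *)
```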

  3. Meaning of Regular Expressions (cont.)
     • Union, written | (or other symbols such as U):
       If A = {"x", "qw"} and B = {"v", "A"}, then A|B = {"x", "qw", "v", "A"}.
     • Closure, written *:
       $A^* = \varepsilon \mid A \mid AA \mid AAA \mid \cdots = A^0 \mid A^1 \mid A^2 \mid A^3 \mid \cdots$, where $A^0 = \varepsilon$ and $A^{n+1} = A\,A^n$.
       If A = {"x", "qw"}, then A* = {""} | {"x", "qw"} | {"xqw", "qwx", "xx", "qwqw"} | ...
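Since A* is infinite whenever A contains a non-empty string, a program can only list a finite part of it. The sketch below, again assuming lists stand in for sets, computes A^n and the finite approximation A^0 | A^1 | ... | A^n; all function names are hypothetical.

```sml
fun concatSets (xs : string list, ys : string list) : string list =
  List.concat (map (fn a => map (fn b => a ^ b) ys) xs)

fun member (s : string, set : string list) : bool = List.exists (fn t => t = s) set

(* A | B on lists-as-sets, keeping the first occurrence of each string *)
fun unionSets (xs : string list, ys : string list) : string list =
  foldl (fn (s, acc) => if member (s, acc) then acc else acc @ [s]) [] (xs @ ys)

(* power (a, n) = A^n, with A^0 = {""} and A^(n+1) = A A^n; assumes n >= 0 *)
fun power (_, 0) = [""]
  | power (a, n) = concatSets (a, power (a, n - 1))

(* Finite approximation of A*: A^0 | A^1 | ... | A^n *)
fun closureUpTo (a, n) =
  if n < 0 then [] else unionSets (closureUpTo (a, n - 1), power (a, n))

val aOrB = unionSets (["x", "qw"], ["v", "A"])
(* aOrB = ["x", "qw", "v", "A"] *)
val upTo2 = closureUpTo (["x", "qw"], 2)
(* upTo2 = ["", "x", "qw", "xx", "xqw", "qwx", "qwqw"] *)
```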

  4. Regular Expressions as a language
     • We can treat regular expressions as a programming language.
     • Each expression is a new program.
     • Programs can be compiled.
     • How do we represent the regular expression language? By using a datatype:

         datatype RE = Empty              (* ε: matches only the empty string *)
                     | Union of RE * RE   (* A | B *)
                     | Concat of RE * RE  (* AB *)
                     | Star of RE         (* closure A* *)
                     | C of char;         (* a single character *)

  5. Example RE program: (+ | - | ε) D D*

         val re1 = Concat (Union (C #"+", Union (C #"-", Empty)),
                           Concat (C #"D", Star (C #"D")));

     Here the literal character #"D" stands in for a digit.
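To make "each expression is a program" concrete, here is a hedged sketch of an interpreter that runs an RE directly against a string. It is a backtracking matcher with continuations, a different technique from the NFA compilation the slides go on to build; the names m and matches are hypothetical.

```sml
datatype RE = Empty | Union of RE * RE | Concat of RE * RE | Star of RE | C of char;

(* m re cs k: match some prefix of cs with re, then hand the rest to k *)
fun m Empty cs k = k cs
  | m (C c) (x :: rest) k = c = x andalso k rest
  | m (C _) [] _ = false
  | m (Union (a, b)) cs k = m a cs k orelse m b cs k
  | m (Concat (a, b)) cs k = m a cs (fn rest => m b rest k)
  | m (Star a) cs k =
      (* zero iterations, or one iteration that consumes input, then repeat;
         the rest <> cs guard stops loops when a can match the empty string *)
      k cs orelse m a cs (fn rest => rest <> cs andalso m (Star a) rest k)

(* The whole string must be consumed, so the final continuation is null *)
fun matches re s = m re (String.explode s) null

val re1 = Concat (Union (C #"+", Union (C #"-", Empty)),
                  Concat (C #"D", Star (C #"D")));
```

For example, matches re1 "+DDD" and matches re1 "D" hold, while matches re1 "-" does not, since at least one D is required.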

  6. R.E.'s and FSA's
     • There is an algorithm that constructs an FSA from a regular expression.
     • An FSA has
         • an alphabet, A
         • a set of states, S
         • a transition function, A x S -> S
         • a start state, S0
         • a set of accepting states, SF, a subset of S
     • The construction is defined by cases over the structure of regular expressions.
       Let A, B be R.E.'s and let "x" be a character of the alphabet. Then
         • ε is a R.E.
         • "x" is a R.E.
         • AB is a R.E.
         • A|B is a R.E.
         • A* is a R.E.
     • There is one construction rule for each case.
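The five FSA components above map naturally onto an SML record. This is a sketch under stated assumptions: the alphabet is SML's char type, states are ints, and the transition function is partial (NONE means "no move, reject"), which is a small departure from the total A x S -> S on the slide. The names dfa, run, and signedInt are hypothetical; signedInt hand-codes (+ | - | ε) D D*.

```sml
(* start = S0, delta = transition function, accepting = SF *)
type dfa = { start : int,
             delta : char * int -> int option,
             accepting : int list }

fun run ({start, delta, accepting} : dfa) (s : string) : bool =
  let
    fun go (q, []) = List.exists (fn p => p = q) accepting
      | go (q, c :: cs) =
          (case delta (c, q) of
               NONE => false               (* no transition: reject *)
             | SOME q' => go (q', cs))
  in
    go (start, String.explode s)
  end

(* States: 0 = start, 1 = after a sign, 2 = after at least one digit *)
val signedInt : dfa =
  { start = 0,
    accepting = [2],
    delta = fn (#"+", 0) => SOME 1
             | (#"-", 0) => SOME 1
             | (c, _) => if Char.isDigit c then SOME 2 else NONE }
```

With this table, run signedInt "-42" accepts and run signedInt "4-2" rejects, because a sign is only allowed from the start state.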

  7. Rules
     (figure: one NFA fragment for each case: ε, "x", AB, A|B, A*)

  8. Example: (a|b)*abb
     (figure: the NFA built from (a|b)*abb, with states numbered 0 to 10)
     • Note the many ε transitions.
     • Loops are caused by the *.
     • Non-determinism: there are many paths out of a state on "a".
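One way to cope with the non-determinism is to track the set of every state the NFA could be in, closing that set under ε transitions after each step. The sketch below assumes an edge list that follows the common textbook numbering of the (a|b)*abb NFA (states 0 to 10); it is not read off the slide's figure and may differ from it. All names are hypothetical.

```sml
(* Assumed edges (from, label, to); NONE marks an ε transition *)
val edges : (int * char option * int) list =
  [(0, NONE, 1), (0, NONE, 7), (1, NONE, 2), (1, NONE, 4),
   (2, SOME #"a", 3), (4, SOME #"b", 5), (3, NONE, 6), (5, NONE, 6),
   (6, NONE, 1), (6, NONE, 7),
   (7, SOME #"a", 8), (8, SOME #"b", 9), (9, SOME #"b", 10)]

val startState = 0
val acceptState = 10

fun mem (x : int, xs : int list) = List.exists (fn y => y = x) xs

(* ε-closure: every state reachable from 'states' by ε edges alone *)
fun closure (states : int list) : int list =
  let
    fun go ([], seen) = seen
      | go (q :: work, seen) =
          let
            val found =
              List.mapPartial
                (fn (p, NONE, r) =>
                      if p = q andalso not (mem (r, seen)) then SOME r else NONE
                  | _ => NONE)
                edges
          in
            go (found @ work, found @ seen)
          end
  in
    go (states, states)
  end

(* All states reachable from 'states' on one edge labelled c *)
fun move (states : int list, c : char) : int list =
  List.mapPartial
    (fn (p, SOME d, r) => if d = c andalso mem (p, states) then SOME r else NONE
      | _ => NONE)
    edges

(* Follow every path at once; accept if the final set holds state 10 *)
fun accepts (s : string) : bool =
  let
    val final = foldl (fn (c, states) => closure (move (states, c)))
                      (closure [startState]) (String.explode s)
  in
    mem (acceptState, final)
  end
```

On the first "a" from the start, both state 2 (inside the loop) and state 7 (the start of "abb") have an "a" edge, so the tracked set becomes {3, 8} plus its closure: that is exactly the choice the slide points out.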

  9. Building an NFA from a RE

         datatype Label = Epsilon | Char of char;
         type Start = int;
         type Finish = int;
         datatype Edge = Edge of Start * Label * Finish;
         type Nfa = Start * Finish * Edge list;   (* start state, final state, edges *)

         val next = ref 0;
         (* read the counter, bump it, and return the old value *)
         fun new () = let val ref n = next in (next := n+1; n) end;

     • ref makes a mutable variable.
     • The semicolon separates commands (inside parentheses).
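A quick check of how ref and sequencing behave here: each call to new returns the counter's current value and then increments it, so every NFA state gets a fresh number. The reset helper is hypothetical and not part of the slides.

```sml
val next = ref 0;
fun new () = let val ref n = next in (next := n + 1; n) end;

val a = new ();   (* a = 0 *)
val b = new ();   (* b = 1 *)
val c = new ();   (* c = 2; the counter now holds 3 *)

(* Hypothetical helper: restart numbering at 0 *)
fun reset () = next := 0;
```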

  10. (figure: NFA fragments for ε, "x", and A|B)

         fun nfa Empty =
               let val s = new ()
                   val f = new ()
               in (s, f, [Edge (s, Epsilon, f)]) : Nfa end
           | nfa (C x) =
               let val s = new ()
                   val f = new ()
               in (s, f, [Edge (s, Char x, f)]) end
           | nfa (Union (x, y)) =
               let val (sx, fx, xes) = nfa x
                   val (sy, fy, yes) = nfa y
                   val s = new ()
                   val f = new ()
                   val newes = [Edge (s, Epsilon, sx),
                                Edge (s, Epsilon, sy),
                                Edge (fx, Epsilon, f),
                                Edge (fy, Epsilon, f)]
               in (s, f, newes @ xes @ yes) end

  11. (figure: NFA fragments for AB and A*)

           | nfa (Concat (x, y)) =
               let val (sx, fx, xes) = nfa x
                   val (sy, fy, yes) = nfa y
               in (sx, fy, Edge (fx, Epsilon, sy) :: (xes @ yes)) end
           | nfa (Star r) =
               let val (sr, fr, res) = nfa r
                   val s = new ()
                   val f = new ()
                   val newes = [Edge (s, Epsilon, sr),
                                Edge (fr, Epsilon, f),
                                Edge (s, Epsilon, f),
                                Edge (f, Epsilon, s)]
               in (s, f, newes @ res) end;

  12. Example use

         val re1 = Concat (Union (C #"+", Union (C #"-", Empty)),
                           Concat (C #"D", Star (C #"D")));
         val ex6 = nfa re1;

         val ex6 = (8,15,
           [Edge (9,Epsilon,10),Edge (8,Epsilon,0),Edge (8,Epsilon,6),Edge (1,Epsilon,9),
            Edge (7,Epsilon,9),Edge (0,Char #"+",1),Edge (6,Epsilon,2),Edge (6,Epsilon,4),
            Edge (3,Epsilon,7),Edge (5,Epsilon,7),Edge (2,Char #"-",3),Edge (4,Epsilon,5),...]) : Nfa

     The state numbers depend on the global counter next, which started at 0 here.
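Because next is global, calling nfa a second time continues numbering where the last call stopped. The sketch below repeats the definitions from slides 9 to 11 (condensed, and using !next in place of the ref pattern) so it runs on its own, resets the counter, and inspects ex6. The Nfa abbreviation is inferred from the ": Nfa" in the output above; charEdges and numEdges are hypothetical names.

```sml
datatype RE = Empty | Union of RE * RE | Concat of RE * RE | Star of RE | C of char;
datatype Label = Epsilon | Char of char;
type Start = int;
type Finish = int;
datatype Edge = Edge of Start * Label * Finish;
type Nfa = Start * Finish * Edge list;

val next = ref 0;
fun new () = let val n = !next in (next := n + 1; n) end;

(* The construction from the slides, condensed *)
fun nfa Empty = let val s = new () val f = new () in (s, f, [Edge (s, Epsilon, f)]) : Nfa end
  | nfa (C x) = let val s = new () val f = new () in (s, f, [Edge (s, Char x, f)]) end
  | nfa (Union (x, y)) =
      let val (sx, fx, xes) = nfa x
          val (sy, fy, yes) = nfa y
          val s = new ()
          val f = new ()
      in (s, f, [Edge (s, Epsilon, sx), Edge (s, Epsilon, sy),
                 Edge (fx, Epsilon, f), Edge (fy, Epsilon, f)] @ xes @ yes) end
  | nfa (Concat (x, y)) =
      let val (sx, fx, xes) = nfa x
          val (sy, fy, yes) = nfa y
      in (sx, fy, Edge (fx, Epsilon, sy) :: (xes @ yes)) end
  | nfa (Star r) =
      let val (sr, fr, res) = nfa r
          val s = new ()
          val f = new ()
      in (s, f, [Edge (s, Epsilon, sr), Edge (fr, Epsilon, f),
                 Edge (s, Epsilon, f), Edge (f, Epsilon, s)] @ res) end;

val re1 = Concat (Union (C #"+", Union (C #"-", Empty)),
                  Concat (C #"D", Star (C #"D")));

(* Reset the counter first so the numbering matches the transcript *)
val ex6 = (next := 0; nfa re1);
val (start, finish, edges) = ex6;

(* Keep only the character-labelled edges *)
val charEdges =
  List.mapPartial (fn Edge (p, Char c, q) => SOME (p, c, q) | _ => NONE) edges;
(* charEdges = [(0,#"+",1), (2,#"-",3), (10,#"D",11), (12,#"D",13)] *)
val numEdges = length edges;   (* 19 *)
```

Only four of the 19 edges consume a character; the rest are the ε transitions that slide 8 points out.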

  13. Assignment #3
      CS321 Prog Lang & Compilers, Assignment #3
      Assigned: Jan 22, 2007   Due: Wed. Jan 24, 2007

      Turn in a listing, and a transcript that shows you have tested your code. A minimum of 3 tests is necessary. Some functions may require more than 3 tests to receive full credit.

      1) Write the following functions over lists. You must use pattern matching and recursion.
         A. Reverse a list so that its elements appear in the opposite order.
              reverse [1,2,3,4] ----> [4,3,2,1]
         B. Count the number of occurrences of an element in a list.
              count 4 [1,2,3,4,5,4] ---> 2
              count 4 [1,2,3,2,1]   ---> 0
         C. Concatenate together a list of lists.
              concat [[1,2],[],[5,6]] ----> [1,2,5,6]

      2) Using the datatype for Regular Expressions we defined in class

              datatype RE = Empty | Union of RE * RE | Concat of RE * RE | Star of RE | C of char;

         write a function that turns a RE into a string, so that it can be printed. Minimize the number of parentheses, but keep the string unambiguous by using the following rules.
           1) Star has the highest precedence, so ab* means a(b*).
           2) Concat has the next highest precedence, so a+bc means a+(bc).
           3) Union has the lowest precedence, so a+bc+c* means a+(bc)+(c*).
           4) Use the hash mark (#) as the empty string.
           5) Special characters *+()\ should be escaped with a preceding backslash. So Concat (C #"+", C #"a") should be "\+a".

         Hints:
           1) The string concatenation operator is useful: "abc" ^ "zx" -----> "abczx"
           2) Write this in two steps. First, fully parenthesize every RE. Second, change the function so that it does not add the parentheses the rules don't require.
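A sketch of hint 2's first step only, assuming union is printed as + (as in the rules' examples). It wraps every compound RE in parentheses and applies rules 4 and 5; removing the unneeded parentheses is left as the exercise intends. The names escape and toStringFull are hypothetical.

```sml
datatype RE = Empty | Union of RE * RE | Concat of RE * RE | Star of RE | C of char;

(* Rule 5: prefix the special characters with a backslash *)
fun escape (c : char) : string =
  if Char.contains "*+()\\" c then "\\" ^ String.str c else String.str c

(* Step 1: fully parenthesized; # is the empty string, + is union *)
fun toStringFull Empty = "#"
  | toStringFull (C c) = escape c
  | toStringFull (Union (a, b)) = "(" ^ toStringFull a ^ "+" ^ toStringFull b ^ ")"
  | toStringFull (Concat (a, b)) = "(" ^ toStringFull a ^ toStringFull b ^ ")"
  | toStringFull (Star a) = "(" ^ toStringFull a ^ "*)"
```

For instance, Union (C #"a", Star (C #"b")) prints as (a+(b*)), which the second step would reduce to a+b*.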