Regular Expressions

• Regular languages and regular expressions describe the patterns that define lexemes.
• Regular expressions are built from the empty string, concatenation, union, and closure.
• Examples:

A(A | D)* where A is alphabetic and D is a digit

(+ | - | ε) D D*

In these examples * is closure, | is union, ε is the empty string, and concatenation is implicit.

Meaning of Regular Expressions

Let A,B be sets of strings:

The empty string: ""

ε = { "" } (sometimes written <empty>)

Concatenation by juxtaposition:

AB = { a^b | a in A and b in B }

If A = {"x", "qw"} and B = {"v", "A"}

then AB = {"xv", "xA", "qwv", "qwA"}
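The concatenation of two string sets can be sketched in Standard ML by representing a set as a list. The name concatSets is hypothetical, and the sketch makes no attempt to remove duplicates that concatenation may produce.

```sml
(* A set of strings is represented as a list (an assumption of this sketch).
   concatSets a b pairs every string of a with every string of b. *)
fun concatSets (xs : string list) (ys : string list) : string list =
  List.concat (List.map (fn a => List.map (fn b => a ^ b) ys) xs);

val ab = concatSets ["x", "qw"] ["v", "A"];
(* ab = ["xv", "xA", "qwv", "qwA"] *)
```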

Meaning of Regular Expressions (cont.)

Union by | (or other symbols, such as ∪)

If A = {"x", "qw"} and B = {"v", "A"}

then A|B = {"x", "qw", "v", "A"}

Closure by *

A* = {""} | A | AA | AAA | ...

   = A^0 | A^1 | A^2 | A^3 | ...

If A = {"x", "qw"}

then A* = {""} | {"x", "qw"} | {"xx", "xqw", "qwx", "qwqw"} | ...
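Because A* is infinite whenever A contains a non-empty string, code can only enumerate a finite part of it. The sketch below, with sets as lists and the hypothetical names union, power, and starUpTo, computes A^0 | A^1 | ... | A^n.

```sml
fun member (x : string) xs = List.exists (fn y => y = x) xs;

(* union of two sets, keeping the first occurrence of each string *)
fun union xs ys = xs @ List.filter (fn y => not (member y xs)) ys;

fun concatSets xs ys =
  List.concat (List.map (fn a => List.map (fn b => a ^ b) ys) xs);

(* power a n is A^n, the n-fold concatenation, with A^0 = {""} *)
fun power a 0 = [""]
  | power a n = concatSets a (power a (n - 1));

(* starUpTo a n is A^0 | A^1 | ... | A^n, a finite approximation of A* *)
fun starUpTo a 0 = power a 0
  | starUpTo a n = union (starUpTo a (n - 1)) (power a n);

val u = union ["x", "qw"] ["v", "A"];
(* u = ["x", "qw", "v", "A"] *)
val s2 = starUpTo ["x", "qw"] 2;
(* s2 = ["", "x", "qw", "xx", "xqw", "qwx", "qwqw"] *)
```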

Regular Expressions as a language
• We can treat regular expressions as a programming language.
• Each expression is a new program.
• Programs can be compiled.
• How do we represent the regular expression language? By using a datatype.

datatype RE
  = Empty
  | Union of RE * RE
  | Concat of RE * RE
  | Star of RE
  | C of char;

Example RE program

(+ | - | ε ) D D*

(* #"D" is a literal character standing in for the digit class D *)
val re1 =
  Concat(Union(C #"+", Union(C #"-", Empty)),
         Concat(C #"D", Star (C #"D")));
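One way to see an RE value as a program is to write an interpreter for it. The sketch below is a simple backtracking matcher in continuation-passing style; it is not the NFA construction these slides develop, and the names accept and matches are hypothetical.

```sml
datatype RE
  = Empty
  | Union of RE * RE
  | Concat of RE * RE
  | Star of RE
  | C of char;

(* accept r cs k: can r match a prefix of cs so that k accepts the rest? *)
fun accept Empty cs k = k cs
  | accept (C _) [] _ = false
  | accept (C x) (c :: cs) k = (x = c) andalso k cs
  | accept (Union (a, b)) cs k = accept a cs k orelse accept b cs k
  | accept (Concat (a, b)) cs k = accept a cs (fn rest => accept b rest k)
  | accept (Star r) cs k =
      k cs orelse
      (* insist that r consumes input, so Star cannot loop forever *)
      accept r cs (fn rest => length rest < length cs
                              andalso accept (Star r) rest k);

(* the whole string must be consumed, so the final continuation is null *)
fun matches r s = accept r (String.explode s) null;

val re1 =
  Concat (Union (C #"+", Union (C #"-", Empty)),
          Concat (C #"D", Star (C #"D")));

val t1 = matches re1 "+DD";   (* true *)
val t2 = matches re1 "D";     (* true *)
val t3 = matches re1 "+";     (* false *)
```

Backtracking can take exponential time on some expressions, which is one reason to compile to an automaton instead.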

R.E.’s and FSA’s
• There is an algorithm that constructs an FSA from a regular expression.
• An FSA consists of
  • an alphabet, A
  • a set of states, S
  • a transition function, A x S -> S
  • a start state, S0
  • a set of accepting states, SF, a subset of S
• The construction is defined by cases over the structure of regular expressions.
• Let A, B be R.E.’s and “x” a symbol of the alphabet; then
  • ε is a R.E.
  • “x” is a R.E.
  • AB is a R.E.
  • A|B is a R.E.
  • A* is a R.E.
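The components of an FSA listed above map directly onto a record. The following is a minimal sketch of a deterministic machine, assuming states are integers and the alphabet is char; the names fsa, run, delta, and endsInAbb are hypothetical. The example machine accepts strings of a's and b's that end in abb, and uses state 4 as a dead state for any other character.

```sml
type fsa = { delta : char * int -> int,   (* transition function A x S -> S *)
             start : int,                 (* start state S0 *)
             accepting : int list };      (* accepting states SF *)

(* feed the characters through delta, then test the final state *)
fun run ({delta, start, accepting} : fsa) (s : string) : bool =
  let val final = List.foldl delta start (String.explode s)
  in List.exists (fn q => q = final) accepting end;

(* state n (0 to 3) means the last n characters read match a prefix of abb *)
fun delta (_, 4) = 4
  | delta (#"a", _) = 1
  | delta (#"b", 1) = 2
  | delta (#"b", 2) = 3
  | delta (#"b", _) = 0
  | delta _ = 4;

val endsInAbb : fsa = { delta = delta, start = 0, accepting = [3] };

val r1 = run endsInAbb "aabb";   (* true *)
val r2 = run endsInAbb "abba";   (* false *)
```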

Rules

One rule for each case:
• ε
• “x”
• AB
• A|B
• A*

[Figure: the NFA fragment constructed for each case]
Example: (a|b)*abb

[Figure: NFA for (a|b)*abb, with states 0 to 10, ε-transitions, and edges labeled a and b]
• Note the many ε transitions
• Loops caused by the *
• Non-Determinism, many paths out of a state on “a”
Building an NFA from a RE

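datatype Label
  = Epsilon
  | Char of char;

type Start = int;
type Finish = int;

datatype Edge
  = Edge of Start * Label * Finish;

(* The results below are annotated with Nfa: a start state,
   a finish state, and a list of edges. *)
type Nfa = Start * Finish * Edge list;

val next = ref 0;

(* new () returns a fresh state number *)
fun new () = let val ref n = next
             in (next := n+1; n) end;

ref makes a mutable variable.

A semicolon separates commands (inside parentheses).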
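As a small illustration of ref cells and command sequencing, here is a sketch of the same counter idiom written with the ! dereference operator instead of a ref pattern. The names counter and fresh are hypothetical.

```sml
val counter = ref 0;            (* a mutable cell that starts at 0 *)

fun fresh () =
  let val n = !counter          (* ! reads the current contents *)
  in (counter := n + 1; n)      (* update the cell, then return the old value *)
  end;

val a = fresh ();   (* 0 *)
val b = fresh ();   (* 1 *)
val c = fresh ();   (* 2 *)
```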

[Figure: NFA fragments for ε, x, and A|B]

fun nfa Empty =
      let val s = new()
          val f = new()
      in (s,f,[Edge(s,Epsilon,f)]):Nfa end
  | nfa (C x) =
      let val s = new()
          val f = new()
      in (s,f,[Edge(s,Char x,f)]) end
  | nfa (Union(x,y)) =
      let val (sx,fx,xes) = nfa x
          val (sy,fy,yes) = nfa y
          val s = new()
          val f = new()
          val newes =
            [Edge(s,Epsilon,sx)
            ,Edge(s,Epsilon,sy)
            ,Edge(fx,Epsilon,f)
            ,Edge(fy,Epsilon,f)]
      in (s,f,newes @ xes @ yes) end

[Figure: NFA fragments for AB and A*]

  | nfa (Concat(x,y)) =
      let val (sx,fx,xes) = nfa x
          val (sy,fy,yes) = nfa y
      in (sx,fy,(Edge(fx,Epsilon,sy))::(xes @ yes))
      end
  | nfa (Star r) =
      let val (sr,fr,res) = nfa r
          val s = new()
          val f = new()
          val newes = [Edge(s,Epsilon,sr)
                      ,Edge(fr,Epsilon,f)
                      ,Edge(s,Epsilon,f)
                      ,Edge(f,Epsilon,s)]
      in (s,f,newes @ res) end;

Example use

val re1 =
  Concat(Union(C #"+", Union(C #"-", Empty)),
         Concat(C #"D", Star (C #"D")));

val ex6 = nfa re1;

val ex6 =
  (8,15,
   [Edge (9,Epsilon,10),Edge (8,Epsilon,0)
   ,Edge (8,Epsilon,6),Edge (1,Epsilon,9)
   ,Edge (7,Epsilon,9),Edge (0,Char #"+",1)
   ,Edge (6,Epsilon,2),Edge (6,Epsilon,4)
   ,Edge (3,Epsilon,7),Edge (5,Epsilon,7)
   ,Edge (2,Char #"-",3),Edge (4,Epsilon,5),...]) : Nfa
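A (start, finish, edges) triple of this shape can be run against an input by tracking the set of states the machine could be in. The sketch below is one possible simulation, not something shown in the slides; the names member, closure, step, and accepts are hypothetical. The sample machine abb is a hand-written NFA for (a|b)*abb with states 0 to 10, and its edge list is an assumption of this sketch.

```sml
datatype Label = Epsilon | Char of char;
datatype Edge = Edge of int * Label * int;
type Nfa = int * int * Edge list;

fun member (x : int) xs = List.exists (fn y => y = x) xs;

(* every state reachable from the given states using only Epsilon edges *)
fun closure (edges : Edge list) (states : int list) : int list =
  let
    fun visit (q, seen) =
      if member q seen then seen
      else
        let val next =
              List.mapPartial
                (fn Edge (s, Epsilon, f) => if s = q then SOME f else NONE
                  | _ => NONE)
                edges
        in List.foldl visit (q :: seen) next end
  in List.foldl visit [] states end;

(* states reachable from the set by one edge labeled c *)
fun step (edges : Edge list) (states : int list) (c : char) : int list =
  List.mapPartial
    (fn Edge (s, Char x, f) =>
          if x = c andalso member s states then SOME f else NONE
      | _ => NONE)
    edges;

(* accept when the finish state is among the states reached at end of input *)
fun accepts ((start, finish, edges) : Nfa) (input : string) : bool =
  let
    fun go states [] = member finish states
      | go states (c :: cs) = go (closure edges (step edges states c)) cs
  in go (closure edges [start]) (String.explode input) end;

val abb : Nfa =
  (0, 10,
   [Edge (0, Epsilon, 1), Edge (0, Epsilon, 7),
    Edge (1, Epsilon, 2), Edge (1, Epsilon, 4),
    Edge (2, Char #"a", 3), Edge (4, Char #"b", 5),
    Edge (3, Epsilon, 6), Edge (5, Epsilon, 6),
    Edge (6, Epsilon, 1), Edge (6, Epsilon, 7),
    Edge (7, Char #"a", 8), Edge (8, Char #"b", 9),
    Edge (9, Char #"b", 10)]);

val ok = accepts abb "babb";    (* true *)
val no = accepts abb "ab";      (* false *)
```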

Assignment #3

CS321 Prog Lang & Compilers, Assignment #3. Assigned: Jan 22, 2007. Due: Wed. Jan 24, 2007.

Turn in a listing, and a transcript that shows you have tested your code. A minimum of 3 tests is necessary.

Some functions may require more than 3 tests to receive full credit.

1) Write the following functions over lists. You must use pattern matching and recursion.

A. Reverse a list so that its elements appear in the opposite order.

reverse [1,2,3,4] ----> [4,3,2,1]

B. Count the number of occurrences of an element in a list.

count 4 [1,2,3,4,5,4] ----> 2
count 4 [1,2,3,2,1] ----> 0

C. Concatenate together a list of lists.

concat [[1,2],[],[5,6]] ----> [1,2,5,6]

2) Using the datatype for regular expressions we defined in class

datatype RE
  = Empty
  | Union of RE * RE
  | Concat of RE * RE
  | Star of RE
  | C of char;

write a function that turns a RE into a string, so that it can be printed. Minimize the number of parentheses, but keep the string unambiguous by using the following rules.

1) Star has highest precedence, so: ab* means a(b*)

2) Concat has the next highest precedence, so: a+bc means a+(bc)

3) Union has lowest precedence, so: a+bc+c* means a+(bc)+(c*)

4) Use the hash mark (#) as the empty string.

5) Special characters *+()\ should be escaped by using a preceding backslash.

So Concat(C #"+", C #"a") should be "\+a"

Hints:

1) The string concatenation operator is useful:

"abc" ^ "zx" -----> "abczx"

2) Write this in two steps.

First, fully parenthesize every RE.

Second, change the function so that it does not add the parentheses which the rules don't require.