Regular Expressions (RE's) – Review

emery
Presentation Transcript


  1. Regular Expressions (RE's) – Review
     • A means of describing a possibly infinite language in finite terms.
     • We aim to turn a RE into a Deterministic Finite State Automaton (DFA).
     • Steps:
         1: RE -> Non-Deterministic Finite State Automaton (NFA)
         2: NFA -> DFA
         3: DFA -> minDFA
     • The aim is to create a mechanism to recognise valid words in a language.
     • In our course it means recognising words like int, float, public etc.
     • These are called Tokens.
     • NB: It also classifies the Tokens!
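
To make the end product of those steps concrete, here is a minimal sketch of a hand-written DFA that both recognises a word and classifies it as a Token. It is not from the slides: the class name `TokenDfa`, the `Kind` categories, and the rule that keywords are identifiers found in a fixed set are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration: a tiny DFA that recognises and classifies one word.
class TokenDfa {
    enum Kind { KEYWORD, IDENTIFIER, INT_LITERAL, INVALID }

    // DFA states: START (nothing read yet), IN_ID, IN_INT, DEAD (no way back).
    private enum State { START, IN_ID, IN_INT, DEAD }

    // Assumed keyword set, taken from the words mentioned on the slide.
    private static final Set<String> KEYWORDS =
            new HashSet<>(Arrays.asList("int", "float", "public"));

    // Transition function: exactly one next state per (state, character) pair.
    private static State step(State s, char c) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        boolean digit = c >= '0' && c <= '9';
        switch (s) {
            case START:  return letter ? State.IN_ID : digit ? State.IN_INT : State.DEAD;
            case IN_ID:  return (letter || digit) ? State.IN_ID : State.DEAD;
            case IN_INT: return digit ? State.IN_INT : State.DEAD;
            default:     return State.DEAD;
        }
    }

    // Run the DFA over the whole word; the final state decides the Token kind.
    static Kind classify(String word) {
        State s = State.START;
        for (int i = 0; i < word.length(); i++) {
            s = step(s, word.charAt(i));
        }
        switch (s) {
            case IN_ID:  return KEYWORDS.contains(word) ? Kind.KEYWORD : Kind.IDENTIFIER;
            case IN_INT: return Kind.INT_LITERAL;
            default:     return Kind.INVALID;
        }
    }

    public static void main(String[] args) {
        for (String w : new String[] {"int", "count", "42", "4x"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

The accepting states carry the classification: ending in `IN_INT` means an integer literal, ending in `IN_ID` means an identifier or keyword. A generated lexer works on the same principle, with a much larger transition table.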

  2. JLex
     • A Java version of Lex.
     • Input is a .lex file containing RE's and JLex macros.
     • We run JLex over this .lex file and a .java file is produced.
     • We then call next_token() on the generated lexer to produce the next Token.
     • No need to code the DFA ourselves; it is generated automatically, which saves time.

  3. Limitations of RE's
     • Say we define the following RE's:
         digits = [0-9]+
         sum = (digits "+")* digits
     • With these we can define sums like 3+78+9 etc.
     • If we have:
         digits = [0-9]+
         sum = expr "+" expr
         expr = "(" sum ")" | digits
     • we can define (1+(5+8)) etc.
     • These definitions are recursive (sum refers to expr and expr refers to sum), so they cannot be expanded into a single RE.
     • It is impossible for a RE to recognise balanced parentheses.
     • A machine with only N states can only recognise up to N levels of parenthesis nesting.
     • Therefore we need a new notation to represent the language above.
     • We move on to Context Free Grammars.
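
The contrast between the two definitions can be sketched in Java. The flat `sum` is handed to `java.util.regex.Pattern` as an ordinary regular expression, while the nested version needs a recogniser whose methods call each other, so the call stack keeps track of how many parentheses are still open. The class name `NestedSum` and its helper methods are hypothetical, and this is only one possible way to recognise the second language, not the method the course prescribes.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of what a RE can and cannot recognise.
class NestedSum {
    // sum = (digits "+")* digits, written as a Java regular expression.
    private static final Pattern FLAT_SUM = Pattern.compile("([0-9]+\\+)*[0-9]+");

    static boolean isFlatSum(String s) {
        return FLAT_SUM.matcher(s).matches();
    }

    private final String in;
    private int pos = 0;

    private NestedSum(String in) { this.in = in; }

    private boolean peek(char c) {
        return pos < in.length() && in.charAt(pos) == c;
    }

    // sum = expr "+" expr
    private boolean sum() {
        if (!expr()) return false;
        if (!peek('+')) return false;
        pos++;
        return expr();
    }

    // expr = "(" sum ")" | digits
    private boolean expr() {
        if (peek('(')) {
            pos++;
            if (!sum()) return false;   // recursion tracks the nesting depth
            if (!peek(')')) return false;
            pos++;
            return true;
        }
        int start = pos;
        while (pos < in.length() && in.charAt(pos) >= '0' && in.charAt(pos) <= '9') pos++;
        return pos > start;             // at least one digit is required
    }

    // True if the whole string is an expr according to the recursive definitions.
    static boolean isExpr(String s) {
        NestedSum p = new NestedSum(s);
        return p.expr() && p.pos == s.length();
    }

    public static void main(String[] args) {
        System.out.println(isFlatSum("3+78+9"));
        System.out.println(isExpr("(1+(5+8))"));
        System.out.println(isExpr("((1+2)"));
    }
}
```

The recursive recogniser has no fixed bound on nesting depth because each open parenthesis adds a stack frame, which is exactly the unbounded memory a finite automaton lacks.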

  4. Context Free Grammars (CFG's)
     • RE's define lexical structure declaratively.
     • Similarly, CFG's define syntactic structure declaratively.
     • Definitions:
     • A language is a set of strings.
     • Each string is a finite sequence of symbols.
     • Symbols come from a finite alphabet.
     • A CFG describes a language and is made up of productions.
     • E.g. symbol -> sym1 sym2 sym3 ... sym(N)
     • Symbols are either:
         1: Terminal <--> Token
         2: Non-Terminal: a variable denoting a set of strings.
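
As a rough sketch of these definitions, a CFG can be stored as a list of productions, with the non-terminals being the symbols that appear on a left-hand side and the terminals being everything else. The class `Cfg`, its methods, and the choice to encode symbols as plain strings are assumptions made for this example; the grammar used is the sum/expr grammar from the previous slide, with `digits` treated as a Token supplied by the lexer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration: a CFG as a collection of productions.
class Cfg {
    // One production: lhs -> sym1 sym2 ... sym(N)
    static final class Production {
        final String lhs;
        final List<String> rhs;

        Production(String lhs, List<String> rhs) {
            this.lhs = lhs;
            this.rhs = rhs;
        }

        @Override
        public String toString() {
            return lhs + " -> " + String.join(" ", rhs);
        }
    }

    final List<Production> productions = new ArrayList<>();
    private final Set<String> nonTerminals = new LinkedHashSet<>();

    // Adding a production makes its left-hand side a non-terminal.
    void add(String lhs, String... rhs) {
        productions.add(new Production(lhs, Arrays.asList(rhs)));
        nonTerminals.add(lhs);
    }

    boolean isNonTerminal(String symbol) {
        return nonTerminals.contains(symbol);
    }

    // Terminals are the right-hand-side symbols that never appear as a left-hand side.
    Set<String> terminals() {
        Set<String> result = new LinkedHashSet<>();
        for (Production p : productions) {
            for (String s : p.rhs) {
                if (!isNonTerminal(s)) result.add(s);
            }
        }
        return result;
    }

    // The sum/expr grammar from the slides; "digits" is a Token from the lexer.
    static Cfg sumGrammar() {
        Cfg g = new Cfg();
        g.add("sum", "expr", "+", "expr");
        g.add("expr", "(", "sum", ")");
        g.add("expr", "digits");
        return g;
    }

    public static void main(String[] args) {
        Cfg g = sumGrammar();
        for (Production p : g.productions) System.out.println(p);
        System.out.println("terminals: " + g.terminals());
    }
}
```

Alternatives such as `expr = "(" sum ")" | digits` become two separate productions with the same left-hand side, which is how the `|` notation is usually read in a CFG.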
