
Languages and Machines



Presentation Transcript


  1. Languages and Machines Unit two: Regular languages and Finite State Automata

  2. Review of week one • A language is a set of strings (the set of different things you can say). It may be infinite. • A string is a finite sequence of symbols. Its minimum length is zero; there is no fixed maximum, but every individual string has finite length. • A symbol is just some mark on the page or screen. A language has a finite alphabet of symbols.

  3. Review of week one • In a context-dependent language, the meaning of a phrase depends on the context • In a context-sensitive language, the structure of a phrase depends on the context • Most natural languages are context-dependent but not context-sensitive • A context-free language is one where the structure of a phrase is always the same, independent of context • A regular language is a context-free language which has simple rules for forming valid strings (e.g. "94", "getWidth()")

  4. Classes of formal language [Diagram: nested classes of language, from the largest to the smallest: phrase structure, context-sensitive, context-free, regular]

  5. Regular languages • Here are examples of strings from a regular language with alphabet {a,b}: • a • b • ab • aaaaa • ababab

  6. Regular languages • the empty set is a regular language • the set consisting of the empty string (ε) is a regular language • the set consisting of a one-symbol string is a regular language • a new regular language can be made by concatenating each string from one regular language with each string from another regular language • a new regular language can be made by taking the union of two regular languages
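
The building rules above can be tried out on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings; the names `concat`, `union`, `EMPTY` and `EPSILON` are illustrative and do not come from the slides.

```python
def concat(lang1, lang2):
    """Every string from lang1 followed by every string from lang2."""
    return {s + t for s in lang1 for t in lang2}


def union(lang1, lang2):
    """Every string that is in lang1 or in lang2."""
    return lang1 | lang2


EMPTY = set()      # the empty language: no strings at all
EPSILON = {""}     # the language containing only the empty string
A = {"a"}          # a one-symbol language
B = {"b"}          # another one-symbol language

ab = concat(A, B)                       # the single string "ab"
a_or_b = union(A, B)                    # the strings "a" and "b"
two_symbols = concat(a_or_b, a_or_b)    # all two-symbol strings over {a, b}
```

Note the difference between the two base cases: concatenating with `EPSILON` leaves a language unchanged, while concatenating with `EMPTY` gives the empty language.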

  7. Recognizing regular languages • regular languages can be recognized and interpreted by a finite-state machine • for example, here is a machine to recognize a two-bit string: [Diagram: finite-state machine with transitions labelled 0 and 1 and marked acceptor states]
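
A machine of this kind can be simulated with a transition table. The sketch below is one possible recognizer for two-bit strings; the state numbering and the single accepting state are assumptions made for illustration and need not match the slide's diagram.

```python
# Hypothetical state numbering:
#   0 = nothing read yet, 1 = one bit read, 2 = two bits read (accepting).
# Any (state, symbol) pair missing from the table is a failure path.
TRANSITIONS = {
    (0, "0"): 1, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 2,
}
START = 0
ACCEPTING = {2}


def accepts(string):
    """Return True if the machine ends in an accepting state."""
    state = START
    for symbol in string:
        state = TRANSITIONS.get((state, symbol))
        if state is None:        # no transition: reject immediately
            return False
    return state in ACCEPTING
```

For example, `accepts("01")` is true, while `accepts("0")` and `accepts("011")` are false because the machine stops short of, or runs past, the accepting state.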

  8. Regular expressions Wouldn’t it be nice if we had a compact way of specifying a regular language? • we have! • it’s a special notation called a regular expression

  9. Examples of regular languages • the set of all two-symbol strings containing the letters a and b: (a|b)² • the set of all two-bit strings: (0|1)² • the set of all possible words: (a|..|z)+ • the set of all decimal integers: (0|(1|..|9)(0|..|9)*) • the set of Java identifiers: JavaLetter JavaLetterOrDigit*

  10. More examples of regular languages • all the possible three-bit strings: (0|1)³ • all the single-digit decimal numbers: (0|1|2|3|4|5|6|7|8|9), abbreviated (0|..|9) • all the possible repetitions of the traffic-light sequence (red, amber, green, amber): (red amber green amber)*
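
Several of these examples can be checked with Python's `re` module. This is a rough translation rather than the slides' own notation: a repetition count such as "two copies" is written `{2}` in Python, a range such as (a|..|z) becomes `[a-z]`, and `re.fullmatch` is used so that the whole string must belong to the language. The dictionary keys are made-up labels.

```python
import re

# Slide notation translated into Python regex syntax (labels are illustrative).
PATTERNS = {
    "two_symbols": r"(a|b){2}",          # two-symbol strings over a and b
    "three_bits": r"(0|1){3}",           # three-bit strings
    "word": r"[a-z]+",                   # one or more lowercase letters
    "decimal_integer": r"0|[1-9][0-9]*", # no leading zeros
}


def in_language(name, string):
    """True if the whole string matches the named pattern."""
    return re.fullmatch(PATTERNS[name], string) is not None
```

With these definitions, `in_language("decimal_integer", "120")` holds but `in_language("decimal_integer", "007")` does not, because the expression forbids leading zeros.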

  11. Activity Write down the regular expressions denoting the following regular languages: • The language with two strings “the cat” and “the mat” • Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4. The allowed operators are: +, -, ×, ÷. The allowed operands are: single-digit decimal numbers • The language consisting of all possible binary strings • The language of HTML tags such as <HEAD>

  12. Suggested Answers • The language with two strings “the cat” and “the mat”: the (cat | mat) or (the (c|m)at) • Arithmetic expressions with two operands, e.g. 1 + 2, 3 × 4: (0|..|9) (+|-|×|÷) (0|..|9) • The language consisting of all possible binary strings: (0|1)* • The language of HTML tags such as <HEAD>: < (A|..|Z)+ >
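
The suggested answers can be tested in Python's `re` syntax. This sketch makes two assumptions: the spaces between parts of the slide expressions are treated as layout only (so the arithmetic pattern matches "3×4" with no spaces), and the operators are placed in a character class so that `+` and `-` are read as literal symbols. The dictionary keys are illustrative names.

```python
import re

# Suggested answers rewritten in Python regex syntax (labels are illustrative).
ANSWERS = {
    "cat_or_mat": r"the (cat|mat)",
    "arithmetic": r"[0-9][+\-×÷][0-9]",   # digit, operator, digit
    "binary": r"(0|1)*",                  # includes the empty string
    "html_tag": r"<[A-Z]+>",
}


def matches(name, text):
    """True if the whole text belongs to the named language."""
    return re.fullmatch(ANSWERS[name], text) is not None
```

For instance, `matches("arithmetic", "3×4")` holds, while `matches("arithmetic", "12+3")` fails because operands are restricted to a single digit.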

  13. A cautionary note • You have been using a metalanguage! • The regular expression strings form a language having terminal symbols ( ) + * | plus literal symbols e.g. a stands for the letter a • this can cause problems when the metalanguage and the language get confused e.g. the language consisting of strings of one to three vertical bars: | | || | |||

  14. A cautionary note • we can fix this by some ghastly escape convention, e.g. convert the above to "|" | "||" | "|||" • now we have problems with the quote symbol! • the best idea is to choose metalanguage symbols which are rarely encountered in the language being described, and use bold-face or color to distinguish
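
Python's `re` module runs into the same clash between metalanguage and language, and handles it with a backslash escape convention. The sketch below shows the vertical-bar language from the previous slide; `re.escape` is a real library function, while the variable names are illustrative.

```python
import re

# Unescaped, every "|" is the alternation operator, so this pattern is a
# choice between four empty alternatives and matches only the empty string.
unescaped = r"|||"

# re.escape turns the bar into a literal symbol of the described language.
bar = re.escape("|")
one_to_three_bars = f"{bar}|{bar}{bar}|{bar}{bar}{bar}"


def is_bar_string(text):
    """True if text is one, two or three vertical bars."""
    return re.fullmatch(one_to_three_bars, text) is not None
```

As the slide warns, the escape symbol then becomes a metalanguage symbol itself, so a literal backslash needs its own escape.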

  15. Regular languages and regular expressions Each regular expression denotes a regular language: • ∅ denotes the empty set • ε denotes the set consisting of the empty string • a denotes the set consisting of a one-symbol string (e.g. "a") • a b denotes the language made by concatenating each string from one regular language with each string from another • a | b denotes the language made by taking the union of two regular languages

  16. Regular languages and regular expressions The other ways of forming regular expressions are just shorthand: • a⁰ = ε • a¹ = a • a² = aa • a* = ε | a | aa | aaa | ... • a+ = a | aa | aaa | ...
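
These shorthands can be expanded mechanically. The sketch below again assumes languages are Python sets of strings; because a* and a+ are infinite, the helper functions only build a finite approximation up to a chosen number of repetitions. All function names are illustrative.

```python
def power(lang, n):
    """lang to the power n: n strings from lang concatenated together.
    The zeroth power is the language containing only the empty string."""
    result = {""}
    for _ in range(n):
        result = {s + t for s in result for t in lang}
    return result


def star_up_to(lang, max_repeats):
    """Finite approximation of lang*: powers 0 .. max_repeats."""
    result = set()
    for n in range(max_repeats + 1):
        result |= power(lang, n)
    return result


def plus_up_to(lang, max_repeats):
    """Finite approximation of lang+: powers 1 .. max_repeats."""
    result = set()
    for n in range(1, max_repeats + 1):
        result |= power(lang, n)
    return result
```

The only difference between the two approximations is the empty string: `star_up_to({"a"}, 3)` contains it and `plus_up_to({"a"}, 3)` does not.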

  17. Regular languages and regular expressions • Brackets are used to show precedence of the operations: compare (a | b)* with a | b* • default precedence, from highest to lowest, is: first * or + or a repetition count ⁿ, then concatenation, then |
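
Python's `re` module follows the same precedence order, so the effect of brackets can be observed directly. In this sketch a repetition count is written `{2}`, and the helper name `in_language` is illustrative.

```python
import re


def in_language(pattern, string):
    """True if the whole string matches the pattern."""
    return re.fullmatch(pattern, string) is not None


grouped = r"(a|b)*"     # any mixture of a's and b's
ungrouped = r"a|b*"     # either a single a, or any number of b's

count_on_b = r"ab{2}"       # the count binds to b only: abb
count_on_ab = r"(ab){2}"    # the count binds to the bracketed pair: abab
```

The string "abba" belongs to the grouped language but not to the ungrouped one, since without brackets the star applies to b alone.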

  18. Activity Give examples of the following languages: • (x | y | z)³ • x | y | z* • a b² • (a b)²

  19. Suggested Answers Give examples of the following languages: • (x | y | z)³: xzy • x | y | z* • a b²: abb • (a b)²: abab

  20. From Regular Expressions to Finite State Automata • It is an amazing fact that any regular expression has an equivalent finite state automaton which recognizes its language • and every finite state automaton recognizes the language of some regular expression • we will prove these propositions later

  21. Finite State Machines • an FSM to add two binary numbers [Diagram: FSM with states A to F; callouts label a transition, the start state, the end state, an input symbol and an output symbol]

  22. Finite state automata • These are simple machines with no output symbols • they can only recognize strings of input symbols • acceptance is shown by a special state

  23. NFAs • The kind of finite state automata we shall be using are called nondeterministic finite automata • "nondeterministic" means we can do naughty things like: • have a transition without a symbol • label two exit transitions with the same symbol • not show the paths which lead to failure

  24. Example of an NFA • what regular language does this NFA represent? a b | a b c | a+ [Diagram: NFA with transitions labelled a, b and c]
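
An NFA can be simulated by tracking the set of states it might be in. The sketch below is one hypothetical NFA for a b | a b c | a+; its states and transitions are assumptions for illustration, not a copy of the slide's diagram. It uses a transition without a symbol (written `EPS`) and leaves out the paths that lead to failure.

```python
EPS = None  # label for a transition without a symbol

# Hypothetical NFA: states 0-1-2-3 spell out "ab" and "abc",
# while states 4-5 (reached without a symbol) handle a+.
NFA = {
    (0, "a"): {1},
    (1, "b"): {2},
    (2, "c"): {3},
    (0, EPS): {4},
    (4, "a"): {5},
    (5, "a"): {5},
}
START = 0
ACCEPTING = {2, 3, 5}


def epsilon_closure(states):
    """All states reachable from `states` using only EPS transitions."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in NFA.get((state, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def accepts(string):
    """True if some path through the NFA ends in an accepting state."""
    current = epsilon_closure({START})
    for symbol in string:
        moved = set()
        for state in current:
            moved |= NFA.get((state, symbol), set())
        current = epsilon_closure(moved)
    return bool(current & ACCEPTING)
```

After reading a single a, the simulation is in states 1 and 5 at once, which is how it keeps both the "ab" branch and the "a+" branch alive.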

  25. Examples of conversion from REs to NFAs • (a b)² • a b² • (a | b)² • (a | b)* [Diagrams: one NFA for each expression, with transitions labelled a and b]
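
One well-known way to perform such conversions is a Thompson-style construction, which builds a small NFA fragment for each operator and joins fragments with symbol-free transitions. The sketch below is an illustration of that general technique, not necessarily the method behind the slide's diagrams; every function name is hypothetical.

```python
import itertools

_ids = itertools.count()  # supplies fresh state numbers


def symbol(ch):
    """Fragment accepting the one-symbol string ch."""
    s, e = next(_ids), next(_ids)
    return {"start": s, "end": e, "edges": [(s, ch, e)]}


def concat(f, g):
    """Fragment for f followed by g (f and g must be distinct fragments)."""
    edges = f["edges"] + g["edges"] + [(f["end"], None, g["start"])]
    return {"start": f["start"], "end": g["end"], "edges": edges}


def union(f, g):
    """Fragment for f | g."""
    s, e = next(_ids), next(_ids)
    edges = f["edges"] + g["edges"] + [
        (s, None, f["start"]), (s, None, g["start"]),
        (f["end"], None, e), (g["end"], None, e),
    ]
    return {"start": s, "end": e, "edges": edges}


def star(f):
    """Fragment for f*."""
    s, e = next(_ids), next(_ids)
    edges = f["edges"] + [
        (s, None, f["start"]), (s, None, e),
        (f["end"], None, f["start"]), (f["end"], None, e),
    ]
    return {"start": s, "end": e, "edges": edges}


def closure(nfa, states):
    """States reachable through symbol-free (None) edges."""
    seen, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in nfa["edges"]:
            if src == q and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(nfa, string):
    """True if the NFA can reach its end state after reading string."""
    current = closure(nfa, {nfa["start"]})
    for ch in string:
        moved = {dst for src, label, dst in nfa["edges"]
                 if src in current and label == ch}
        current = closure(nfa, moved)
    return nfa["end"] in current


# (a | b)* and (a b) repeated twice, each built from fresh fragments.
a_or_b_star = star(union(symbol("a"), symbol("b")))
ab_twice = concat(concat(symbol("a"), symbol("b")),
                  concat(symbol("a"), symbol("b")))
```

The resulting automata usually contain more states than a hand-drawn NFA, because every operator adds its own symbol-free transitions.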

  26. Activity Convert the following regular expressions to NFAs: • JavaLetter JavaLetterOrDigit* • (red amber green amber)* Convert the following NFAs to REs: [Diagrams: one NFA with transitions labelled a and b; one NFA with transitions labelled a, b, c and d]

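  27. Suggested answer • NFAs to REs: (ab)* and (ac|bd)+ • REs to NFAs: [Diagrams: an NFA with transitions labelled javaLetter and javaLetterOrDigit; an NFA cycling through red, amber, green, amber]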

  28. Summary • regular expressions give us a neat notation for describing regular languages • nondeterministic finite automata (NFAs) provide a diagrammatic version of regular expressions • these notations are equivalent • finite automata theory is crucial in generating lexical analyzers from regular expressions
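
To make the last point concrete, here is a minimal sketch of a lexical analyzer driven by regular expressions, written with Python's `re` module. The token names, the choice of operators, and the ASCII-only approximation of JavaLetter and JavaLetterOrDigit are all assumptions made for illustration.

```python
import re

# Each token class is described by one regular expression (illustrative set).
TOKEN_SPEC = [
    ("INTEGER", r"0|[1-9][0-9]*"),
    ("IDENTIFIER", r"[A-Za-z_$][A-Za-z0-9_$]*"),  # ASCII approximation
    ("OPERATOR", r"[+\-*/()]"),
    ("SKIP", r"\s+"),                             # whitespace is discarded
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    """Split text into (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("getWidth() + 94")` splits the input into an identifier, two bracket operators, a plus operator and an integer, which covers both example strings from the week-one review.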
