CS 208: Computing Theory

CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics

Context-Free Languages

So far … • Methods for describing regular languages • Finite Automata • Deterministic • Non-deterministic • Regular Expressions • They are all equivalent, and limited • They cannot describe some simple languages, such as {0^n1^n | n is positive} • Now, we introduce a more powerful method for describing languages • Context-free Grammars (CFG)

Are CFGs useful? • Extremely useful! • Artificial Intelligence • Natural Language Processing • Programming Languages • specification • compilation

Example • This is a CFG which we call G1 • A → 0A1 • A → B • B → #

Example: production rules • This is a CFG which we call G1 • A → 0A1 • A → B • B → # • Each line is a substitution rule, also called a production rule

Example: variables • This is a CFG which we call G1 • A → 0A1 • A → B • B → # • A and B are called variables or non-terminals

Example: terminals • This is a CFG which we call G1 • A → 0A1 • A → B • B → # • 0, 1, and # are called terminals

Example: start variable • This is a CFG which we call G1 • A → 0A1 • A → B • B → # • A is the start variable

Rules • We use a CFG to describe a language by generating each string of that language • Write down the start variable • Pick a variable that is written down and a production rule whose left-hand side is that variable • Replace that variable with the right-hand side of the production rule • Repeat until no variables remain

Derivations • This is a CFG which we call G1 • A → 0A1 • A → B • B → # • Derivations with G1 • A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1 • A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11 • A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111
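
The derivations above can be reproduced mechanically. The following is a minimal Python sketch, assuming G1 is stored as a dictionary from each variable to its right-hand sides; the names G1 and derive are illustrative and do not come from the slides.

```python
# G1 from the slides: A -> 0A1 | B, B -> #
# Assumed representation: each variable maps to a list of right-hand sides,
# and every symbol that is not a dictionary key is a terminal.
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def derive(grammar, start, choices):
    """Rewrite the leftmost variable once per entry of choices.

    choices[i] is the index of the rule used at step i. The list must not be
    longer than the number of steps needed to eliminate all variables.
    Returns every sentential form, beginning with the start variable.
    """
    forms = [start]
    current = start
    for choice in choices:
        # Locate the leftmost variable in the current sentential form.
        pos = next(i for i, sym in enumerate(current) if sym in grammar)
        var = current[pos]
        current = current[:pos] + grammar[var][choice] + current[pos + 1:]
        forms.append(current)
    return forms

# Use A -> 0A1 twice, then A -> B, then B -> #.
steps = derive(G1, "A", [0, 0, 1, 0])
print(" => ".join(steps))
```

Each entry of choices corresponds to one "pick a rule and replace" step of the generation procedure.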

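Parse tree • Parse tree for 0#1 in G1 • A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1 • [Figure: parse tree. The root A has children 0, A, 1; the inner A has the single child B; B has the single child #]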

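Parse tree • Parse tree for 00#11 in G1 • A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11 • [Figure: parse tree. The root A has children 0, A, 1; the second A again has children 0, A, 1; the innermost A has the single child B; B has the single child #]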

Context-free languages • All strings generated by a CFG constitute the language of the grammar • Example: L(G1) = {0^n#1^n | n ≥ 0}; the case n = 0 is the string #, obtained by A ⇒ B ⇒ # • Any language generated by a context-free grammar is a context-free language
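
To see which strings a grammar generates, one can enumerate them up to a length bound. The sketch below is one possible way to do this for G1, using a breadth-first search over sentential forms; the dictionary representation and the name language_up_to are assumptions made for illustration.

```python
from collections import deque

# G1 from the slides: A -> 0A1 | B, B -> #
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def language_up_to(grammar, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Pruning by length is safe here only because no rule of G1 shortens a
    sentential form (G1 has no epsilon-rules).
    """
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        form = queue.popleft()
        pos = next((i for i, s in enumerate(form) if s in grammar), None)
        if pos is None:
            result.add(form)  # no variables left: a string of the language
            continue
        for rhs in grammar[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(result, key=len)

print(language_up_to(G1, "A", 5))  # ['#', '0#1', '00#11']
```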

A useful abbreviation • Production rules • A → 0A1 • A → B • B → # • Can be written as • A → 0A1 | B • B → #

Another example • CFG G2 describing a fragment of English • <SENTENCE> → <NOUN-PHRASE><VERB-PHRASE> • <NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN><PREP-PHRASE> • <VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB><PREP-PHRASE> • <PREP-PHRASE> → <PREP><CMPLX-NOUN> • <CMPLX-NOUN> → <ARTICLE><NOUN> • <CMPLX-VERB> → <VERB> | <VERB><NOUN-PHRASE> • <ARTICLE> → a | the • <NOUN> → boy | girl | flower • <VERB> → touches | likes | sees • <PREP> → with

Another example • Examples of strings belonging to L(G2) • a boy sees • the boy sees a flower • a girl with a flower likes the boy with a flower

Another example • Derivation of "a boy sees" • <SENTENCE> ⇒ <NOUN-PHRASE><VERB-PHRASE> ⇒ <CMPLX-NOUN><VERB-PHRASE> ⇒ <ARTICLE><NOUN><VERB-PHRASE> ⇒ a <NOUN><VERB-PHRASE> ⇒ a boy <VERB-PHRASE> ⇒ a boy <CMPLX-VERB> ⇒ a boy <VERB> ⇒ a boy sees

Formal definitions • A context-free grammar is a 4-tuple (V, Σ, R, S) where • V is a finite set of variables • Σ is a finite set of terminals, disjoint from V • R is a finite set of rules: each rule consists of a variable and a finite string of variables and terminals • S ∈ V is the start variable

Formal definitions • If • u, v, and w are strings of variables and terminals, and • A → w is a rule of the grammar, • Then uAv yields uwv, written uAv ⇒ uwv • We write u ⇒* v if • u = v or • u ⇒ u1 ⇒ … ⇒ uk ⇒ v for some sequence u1, …, uk with k ≥ 0

Formal definitions • The language of grammar G is • L(G) = {w ∈ Σ* | S ⇒* w}
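
The formal definitions translate almost directly into code. The sketch below is one hypothetical encoding: the class CFG and the functions yields and derives are names invented here, symbols are assumed to be single characters, and derives performs only a bounded search, so a False answer means "not within max_steps steps".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # Sigma
    rules: tuple           # R, as (variable, right-hand side) pairs
    start: str             # S

def yields(g, u):
    """Return every v with u => v: one rule applied to one variable of u."""
    out = set()
    for i, sym in enumerate(u):
        for head, body in g.rules:
            if sym == head:
                out.add(u[:i] + body + u[i + 1:])
    return out

def derives(g, u, v, max_steps):
    """Check u =>* v using at most max_steps rule applications."""
    frontier = {u}
    for _ in range(max_steps + 1):
        if v in frontier:
            return True
        frontier = {w for form in frontier for w in yields(g, form)}
    return False

# G1 from the slides as a 4-tuple.
G1 = CFG(frozenset("AB"), frozenset("01#"),
         (("A", "0A1"), ("A", "B"), ("B", "#")), "A")
print(yields(G1, "0A1"))
print(derives(G1, G1.start, "00#11", 4))
```

A string w is in L(G1) exactly when derives(G1, G1.start, w, n) holds for some bound n.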

Example • Consider G3 = ({S}, {(, )}, R, S) where R is • S → (S) | SS | ε • What is the language of G3? • Examples: (), (()((()))), …

Example • Consider G3 = ({S}, {(, )}, R, S) where R is • S → (S) | SS | ε • What is the language of G3? • L(G3) is the set of strings of properly nested parentheses
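
Membership in the language of properly nested parentheses can be tested without the grammar, using the standard characterization that the nesting depth never becomes negative and ends at zero. The function name is_nested is an assumption of this sketch.

```python
def is_nested(s):
    """Return True if s is a string of properly nested parentheses."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:      # a ")" with no matching "(" before it
                return False
        else:
            return False       # symbol outside the alphabet {(, )}
    return depth == 0          # every "(" must have been closed

print(is_nested("(()((())))"))  # True
print(is_nested(")("))          # False
```

The empty string is accepted, matching the rule S → ε.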

Example • Consider G4 = ({E, T, F}, {a, +, x, (, )}, R, E) where R is • E → E + T | T • T → T x F | F • F → (E) | a • What is the language of G4? • Examples: a+a+a, (a+a) x a

Example • Consider G4 = ({E, T, F}, {a, +, x, (, )}, R, E) where R is • E → E + T | T • T → T x F | F • F → (E) | a • What is the language of G4? • E stands for Expression, T for Term, and F for Factor, so this grammar describes some arithmetic expressions
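
A recognizer for this expression grammar can be sketched by recursive descent. Because the rules E → E + T and T → T x F are left-recursive, the sketch assumes the equivalent iterative reading "E is a T followed by zero or more + T" and "T is an F followed by zero or more x F"; this rewriting and the name accepts are choices made here, not part of the slides. Input strings are written without spaces.

```python
def accepts(s):
    """Recognize strings of the expression grammar over {a, +, x, (, )}."""
    pos = 0

    def peek():
        return s[pos] if pos < len(s) else None

    def expr():            # E: a T followed by zero or more "+ T"
        nonlocal pos
        if not term():
            return False
        while peek() == "+":
            pos += 1
            if not term():
                return False
        return True

    def term():            # T: an F followed by zero or more "x F"
        nonlocal pos
        if not factor():
            return False
        while peek() == "x":
            pos += 1
            if not factor():
                return False
        return True

    def factor():          # F -> (E) | a
        nonlocal pos
        if peek() == "a":
            pos += 1
            return True
        if peek() == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    # The whole input must be consumed by a single expression.
    return expr() and pos == len(s)

print(accepts("a+a+a"), accepts("(a+a)xa"), accepts("a+"))
```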

Ambiguity • Sometimes a grammar can generate the same string in several different ways! • Such a string will have several parse trees • This is a serious problem whenever the parse tree determines the meaning of the string • Imagine if a C program could have multiple interpretations • If a grammar has this problem, we say that it is ambiguous

Example • Consider G5: • <EXPR> → <EXPR>+<EXPR> | <EXPR>x<EXPR> | (<EXPR>) | a • G5 is ambiguous because a+axa has two parse trees!

Example • Consider G5: • <EXPR> → <EXPR>+<EXPR> | <EXPR>x<EXPR> | (<EXPR>) | a • G5 is ambiguous because a+axa has two parse trees! • [Figure: first parse tree for a+axa]

Example • Consider G5: • <EXPR> → <EXPR>+<EXPR> | <EXPR>x<EXPR> | (<EXPR>) | a • G5 is ambiguous because a+axa has two parse trees! • [Figure: the two parse trees for a+axa; one groups the string as a+(axa), the other as (a+a)xa]

Formal definition: ambiguity • A string w is generated ambiguously in CFG G if it has two or more different leftmost derivations! • A derivation is leftmost if at every step the variable being replaced is the leftmost one • Grammar G is ambiguous if it generates some string ambiguously
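
Every parse tree corresponds to exactly one leftmost derivation, so ambiguity of a string can be checked by counting its parse trees. The sketch below does this for the grammar G5 (<EXPR> → <EXPR>+<EXPR> | <EXPR>x<EXPR> | (<EXPR>) | a) with a memoized recursion over substrings; the function name count_parse_trees is an assumption, and the code is hard-wired to G5 rather than a general algorithm.

```python
from functools import lru_cache

def count_parse_trees(s):
    """Count the parse trees of s in G5 (input written without spaces)."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of parse trees that derive s[i:j] from <EXPR>.
        if j - i == 1:
            return 1 if s[i] == "a" else 0
        total = 0
        # Rule <EXPR> -> ( <EXPR> )
        if j - i >= 3 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)
        # Rules <EXPR> -> <EXPR> + <EXPR> and <EXPR> -> <EXPR> x <EXPR>:
        # try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if s[k] in "+x":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s)) if s else 0

print(count_parse_trees("a+axa"))    # 2: generated ambiguously
print(count_parse_trees("(a+a)xa"))  # 1
```

A count of two or more means the string is generated ambiguously, which is enough to show that the grammar is ambiguous.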

Chomsky Normal Form (CNF) • Every rule has one of the forms • A → BC • A → a • S → ε • where a is any terminal, S is the start variable, and A, B, and C are any variables, except that B and C may not be the start variable
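
The three permitted rule forms can be checked mechanically. In the sketch below a grammar is assumed to be a dictionary from each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for ε; any symbol that is not a dictionary key is treated as a terminal. The name is_cnf and this encoding are illustrative only.

```python
def is_cnf(rules, start):
    """Return True if every rule has the form A -> BC, A -> a, or S -> epsilon."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                ok = head == start                 # only the start variable may yield epsilon
            elif len(body) == 1:
                ok = body[0] not in variables      # A -> a, a single terminal
            elif len(body) == 2:
                # A -> BC with B and C variables other than the start variable
                ok = all(s in variables and s != start for s in body)
            else:
                ok = False
            if not ok:
                return False
    return True

in_cnf = {"S": [("A", "B"), ()], "A": [("a",)], "B": [("b",)]}
g1 = {"A": [("0", "A", "1"), ("B",)], "B": [("#",)]}   # G1 from the slides
print(is_cnf(in_cnf, "S"))  # True
print(is_cnf(g1, "A"))      # False: A -> 0A1 is too long and A -> B is a unit rule
```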

Theorem • Theorem: Any context-free language is generated by a context-free grammar in Chomsky normal form • How? • Add a new start variable S0 • Eliminate all ε-rules of the form A → ε • Eliminate all "unit" rules of the form A → B • After each elimination, patch up the rules so that the grammar still generates the same language • Convert the remaining rules to the proper form

Steps to convert any grammar into CNF • Step 1 • Add a new start variable S0 • Add the rule S0 → S

Steps to convert any grammar into CNF • Step 2: eliminate ε-rules • Repeat • Remove some rule of the form A → ε where A is not the start variable • Then, for each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence deleted • E.g., if R → uAvAw, where u, v, and w are strings of variables and terminals • We add the rules R → uvAw, R → uAvw, and R → uvw • For R → A, add R → ε, unless R → ε has already been removed • Until all ε-rules not involving the start variable have been removed
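
The core of Step 2 is generating every variant of a right-hand side with some occurrences of the removed variable deleted. A minimal sketch, assuming right-hand sides are tuples of symbols and using the invented name delete_occurrences; it covers only this sub-step, not the bookkeeping of which ε-rules were already removed.

```python
from itertools import product

def delete_occurrences(body, var):
    """Return all right-hand sides obtained from body by deleting any subset
    of the occurrences of var (the unchanged body is included)."""
    # Each occurrence of var can be kept or dropped; other symbols are always kept.
    options = [((sym,), ()) if sym == var else ((sym,),) for sym in body]
    return {sum(choice, ()) for choice in product(*options)}

# Removing A -> epsilon affects a rule S -> ASA as follows:
print(delete_occurrences(("A", "S", "A"), "A"))
```

For the rule S → ASA this produces ASA, SA, AS, and S, and for a rule R → A it produces the empty tuple, which corresponds to adding R → ε.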

Steps to convert any grammar into CNF • Step 3: eliminate unit rules • Repeat • Remove some rule of the form A → B • For each rule B → u, add A → u, unless A → u is a unit rule that has already been removed • Until all unit rules have been removed

Steps to convert any grammar into CNF • Step 4: convert remaining rules • Replace each rule A → u_1 u_2 … u_k, where k ≥ 3 and each u_i is a terminal or a variable, with the rules • A → u_1 A_1 • A_1 → u_2 A_2 • A_2 → u_3 A_3 • … • A_{k-2} → u_{k-1} u_k • where A_1, …, A_{k-2} are new variables • Then, in every rule whose right-hand side has two symbols (including the original rules with k = 2), replace each terminal u_i with a new variable U_i and add the rule U_i → u_i
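
The first part of Step 4, splitting a long rule into a chain of two-symbol rules, can be sketched as follows. The function name binarize and the naming scheme for the new variables (the head followed by a counter) are assumptions; the new names are assumed not to clash with existing variables, and terminals are not yet replaced by new variables here.

```python
def binarize(head, body):
    """Split head -> u_1 u_2 ... u_k (k >= 3) into rules with two symbols each.

    Returns a list of (variable, right-hand side) pairs.
    """
    rules = []
    current = head
    for i, sym in enumerate(body[:-2], start=1):
        fresh = f"{head}{i}"               # new variable, e.g. A1, A2, ...
        rules.append((current, (sym, fresh)))
        current = fresh
    rules.append((current, tuple(body[-2:])))   # last rule keeps the final two symbols
    return rules

print(binarize("A", ("A", "S", "A")))
# [('A', ('A', 'A1')), ('A1', ('S', 'A'))]
```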

Example • Start with • S → ASA | aB • A → B | S • B → b | ε

Example • Step 1: add new start variable and new rule • S0 → S • S → ASA | aB • A → B | S • B → b | ε

Example • Step 2: remove ε-rule B → ε • S0 → S • S → ASA | aB | a • A → B | S | ε • B → b

Example • Step 2: remove ε-rule A → ε • S0 → S • S → ASA | aB | a | SA | AS | S • A → B | S • B → b

Example • Step 3: remove unit rule S S • S0  S • S  ASA | aB | a | SA | AS | S • A  B | S • B  b

Example • Step 3: remove unit rule S0 S • S0  S | ASA | aB | a | SA | AS • S ASA | aB | a | SA | AS • A  B | S • B  b

Example • Step 3: remove unit rule A B • S0  ASA | aB | a | SA | AS • S ASA | aB | a | SA | AS • A  B | S | b • B  b

Example • Step 3: remove unit rule A S • S0  ASA | aB | a | SA | AS • S ASA | aB | a | SA | AS • A  S | b | ASA | aB | a | SA | AS • B  b

Example • Step 3: remove unit rule A S • S0  ASA | aB | a | SA | AS • S ASA | aB | a | SA | AS • A  b | ASA | aB | a | SA | AS • B  b

Example • Step 4: convert remaining rules • S0 → AA1 | UB | a | SA | AS • S → AA1 | UB | a | SA | AS • A → b | AA1 | UB | a | SA | AS • B → b • U → a • A1 → SA
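
One practical payoff of Chomsky normal form is that membership can be decided by the CYK algorithm, a standard dynamic-programming technique that is not covered in these slides. The sketch below applies it to the grammar obtained in Step 4; the dictionary encoding (right-hand sides as tuples of symbols) and the name cyk are assumptions.

```python
# The CNF grammar produced by Step 4 of the example.
CNF = {
    "S0": [("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")],
    "S":  [("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")],
    "A":  [("b",), ("A", "A1"), ("U", "B"), ("a",), ("S", "A"), ("A", "S")],
    "B":  [("b",)],
    "U":  [("a",)],
    "A1": [("S", "A")],
}

def cyk(rules, start, w):
    """Decide whether the CNF grammar generates w."""
    n = len(w)
    if n == 0:
        return () in rules[start]          # only S -> epsilon can produce ""
    # table[i][l] holds the variables that derive the substring w[i:i+l].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(w):
        for head, bodies in rules.items():
            if (ch,) in bodies:
                table[i][1].add(head)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for head, bodies in rules.items():
                    for body in bodies:
                        if len(body) == 2 and body[0] in left and body[1] in right:
                            table[i][length].add(head)
    return start in table[0][n]

print(cyk(CNF, "S0", "bab"))  # True:  S => ASA with A => b, S => a, A => b
print(cyk(CNF, "S0", "bb"))   # False: every string of the language contains an a
```

The answers agree with the original grammar S → ASA | aB, A → B | S, B → b | ε, as expected if the conversion preserved the language.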

Pushdown Automata

Pushdown automata • Pushdown automata (PDA) are like nondeterministic finite automata but have an extra component called a stack • Can push symbols onto the stack • Can pop them (read them back) later • Stack is potentially unbounded

[Figure: schematic of a PDA. A state control reads the input a a b a and has access to a stack holding x y z]

Formal Definition • A pushdown automaton is a 6-tuple (Q, Σ, S, ξ, q0, F), where • Q is a finite set of states • Σ is a finite set of symbols called the input alphabet • S is a finite set of symbols called the stack alphabet • ξ : Q × Σε × Sε → P(Q × Sε) is the transition function, where Σε = Σ ∪ {ε}, Sε = S ∪ {ε}, and P denotes the power set • q0 ∈ Q is the start state • F ⊆ Q is the set of accept states or final states
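
The 6-tuple can be simulated directly by exploring all reachable configurations (state, input position, stack). The sketch below is one possible simulation; the example automaton, a standard textbook PDA for {0^n1^n | n ≥ 0}, is not taken from these slides, and the names DELTA, ACCEPT, and pda_accepts are assumptions. The search terminates for this automaton, but a PDA with ε-moves that push forever would need an extra bound.

```python
from collections import deque

EPS = ""  # stands for epsilon

# Transition function as a dictionary:
# (state, input symbol or EPS, popped symbol or EPS) -> {(new state, pushed symbol or EPS)}
DELTA = {
    ("q1", EPS, EPS): {("q2", "$")},   # mark the bottom of the stack
    ("q2", "0", EPS): {("q2", "0")},   # push each 0 that is read
    ("q2", "1", "0"): {("q3", EPS)},   # first 1: start popping
    ("q3", "1", "0"): {("q3", EPS)},   # pop one 0 per 1
    ("q3", EPS, "$"): {("q4", EPS)},   # stack back to the marker: accept
}
ACCEPT = {"q1", "q4"}

def pda_accepts(delta, start, accept, w):
    """Breadth-first search over configurations of a nondeterministic PDA."""
    seen = set()
    queue = deque([(start, 0, "")])    # stack as a string; its top is the last character
    while queue:
        config = queue.popleft()
        if config in seen:
            continue
        seen.add(config)
        state, pos, stack = config
        if pos == len(w) and state in accept:
            return True
        for (q, a, pop), targets in delta.items():
            if q != state:
                continue
            if a != EPS and (pos >= len(w) or w[pos] != a):
                continue               # input symbol does not match
            if pop != EPS and not stack.endswith(pop):
                continue               # required stack top is missing
            rest = stack[:-1] if pop != EPS else stack
            for new_state, push in targets:
                queue.append((new_state, pos + (a != EPS), rest + push))
    return False

print(pda_accepts(DELTA, "q1", ACCEPT, "0011"))  # True
print(pda_accepts(DELTA, "q1", ACCEPT, "001"))   # False
```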
