CS 208: Computing Theory

Assoc. Prof. Dr. Brahim Hnich

Faculty of Computer Sciences

Izmir University of Economics

Context-Free Languages

- Methods for describing regular languages
- Finite automata
- Deterministic
- Non-deterministic

- Regular expressions
- They are all equivalent, and limited
- They cannot describe some simple languages, such as {0^n1^n | n is positive}

- Now, we introduce a more powerful method for describing languages
- Context-free grammars (CFG)

- Extremely useful!
- Artificial Intelligence: natural language processing
- Programming Languages: specification, compilation

- This is a CFG which we call G1
- A → 0A1
- A → B
- B → #

- Each line is a substitution rule, also called a production rule
- A and B are called variables or non-terminals
- 0, 1, and # are called terminals
- A is the start variable

- We use a CFG to describe a language by generating each string of that language
- Write down the start variable
- Pick a variable that is written down and a production rule whose left-hand side is that variable
- Replace that variable with the right-hand side of the production rule
- Repeat until no variables remain
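
The generation procedure above can be sketched in a few lines of Python. This is only an illustration under some assumptions: every symbol is a single character, the grammar is stored as a dictionary from variable to right-hand sides, and the helper name `derive` is hypothetical. The caller supplies the sequence of (variable, right-hand side) choices, and the leftmost occurrence of the chosen variable is replaced at each step.

```python
# Grammar G1: A -> 0A1 | B, B -> #   (single-character symbols assumed)
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def derive(rules, start, choices):
    """Apply the chosen rules one by one and return every intermediate string."""
    current = start                      # write down the start variable
    steps = [current]
    for var, rhs in choices:             # pick a variable and one of its rules
        if rhs not in rules.get(var, []):
            raise ValueError(f"{var} -> {rhs} is not a rule of the grammar")
        if var not in current:
            raise ValueError(f"{var} does not occur in {current}")
        # replace the leftmost occurrence of the variable with the right-hand side
        current = current.replace(var, rhs, 1)
        steps.append(current)
    if any(sym in rules for sym in current):
        raise ValueError("derivation is not finished: variables remain")
    return steps

steps = derive(G1, "A", [("A", "0A1"), ("A", "B"), ("B", "#")])
print(" => ".join(steps))
```

Choosing A → 0A1 twice before A → B would produce 00#11 in the same way.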

- Derivations with G1
- A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1
- A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
- A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000#111

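- Parse tree for 0#1 in G1
- A ⇒ 0A1 ⇒ 0B1 ⇒ 0#1
- The root A has children 0, A, 1; the inner A has the single child B; B has the single child #

- Parse tree for 00#11 in G1
- A ⇒ 0A1 ⇒ 00A11 ⇒ 00B11 ⇒ 00#11
- The root A has children 0, A, 1; the second A also has children 0, A, 1; the innermost A has the single child B; B has the single child #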

- All strings generated by a CFG constitute the language of the grammar
- Example: L(G1) = {0^n#1^n | n ≥ 0} (n = 0 gives the string #, via A ⇒ B ⇒ #)

- Any language generated by a context-free grammar is a context-free language

- Production rules
- A → 0A1
- A → B
- B → #

- Can be written as
- A → 0A1 | B
- B → #

- CFG G2 describing a fragment of English
<SENTENCE> → <NOUN-PHRASE><VERB-PHRASE>

<NOUN-PHRASE> → <CMPLX-NOUN> | <CMPLX-NOUN><PREP-PHRASE>

<VERB-PHRASE> → <CMPLX-VERB> | <CMPLX-VERB><PREP-PHRASE>

<PREP-PHRASE> → <PREP><CMPLX-NOUN>

<CMPLX-NOUN> → <ARTICLE><NOUN>

<CMPLX-VERB> → <VERB> | <VERB><NOUN-PHRASE>

<ARTICLE> → a | the

<NOUN> → boy | girl | flower

<VERB> → touches | likes | sees

<PREP> → with

- Examples of strings belonging to L(G2)
a boy sees

the boy sees a flower

a girl with a flower likes the boy with a flower

- Derivation of a boy sees
<SENTENCE>

⇒ <NOUN-PHRASE><VERB-PHRASE>

⇒ <CMPLX-NOUN><VERB-PHRASE>

⇒ <ARTICLE><NOUN><VERB-PHRASE>

⇒ a <NOUN><VERB-PHRASE>

⇒ a boy <VERB-PHRASE>

⇒ a boy <CMPLX-VERB>

⇒ a boy <VERB>

⇒ a boy sees

- A context-free grammar is a 4-tuple <V, ∑, R, S> where
- V is a finite set of variables
- ∑ is a finite set of terminals, disjoint from V
- R is a finite set of rules: each rule consists of a variable and a finite string of variables and terminals
- S ∈ V is the start variable

- If
- u, v, and w are strings of variables and terminals, and
- A → w is a rule of the grammar,
- Then uAv yields uwv, written uAv ⇒ uwv

- We write u ⇒* v if
- u = v or
- u ⇒ u1 ⇒ … ⇒ uk ⇒ v for some sequence u1, …, uk with k ≥ 0

- The language of grammar G is
- L(G) = {w ∈ ∑* | S ⇒* w}
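
To make the definition of L(G) concrete, the following sketch enumerates every string of L(G1) up to a length bound by exploring the "yields" relation breadth-first. It assumes single-character symbols and a grammar without ε-rules, so that a string of variables and terminals never gets shorter and anything longer than the bound can be discarded. The function name `language_up_to` is hypothetical.

```python
from collections import deque

# Grammar G1: A -> 0A1 | B, B -> #   (no ε-rules, single-character symbols)
G1 = {"A": ["0A1", "B"], "B": ["#"]}

def language_up_to(rules, start, max_len):
    """Return all terminal strings w with start =>* w and len(w) <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # position of the leftmost variable, or None if only terminals remain
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add(form)
            continue
        for rhs in rules[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]   # one "yields" step
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(words, key=lambda w: (len(w), w))

print(language_up_to(G1, "A", 7))
```

Only the leftmost variable is expanded at each step; this is enough because every derivable string also has a leftmost derivation. Note that the string # itself is produced, via A ⇒ B ⇒ #.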

- Consider G3 = <{S},{(,)},R,S> where R is
- S → (S) | SS | ε
- What is the language of G3?
- Examples: (), (()((()))), …
- L(G3) is the set of strings of properly nested parentheses
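
As a sanity check on this claim, the sketch below tests membership directly from the three rules S → (S) | SS | ε and compares the result with a simple depth counter on every string of parentheses up to length 8. Both function names are hypothetical, and the rule S → SS is only tried with two non-empty halves, which loses nothing because an empty half just gives back S itself.

```python
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def derivable(w):
    """True if S =>* w using the rules S -> (S) | SS | ε."""
    if w == "":                                               # S -> ε
        return True
    if w[0] == "(" and w[-1] == ")" and derivable(w[1:-1]):   # S -> (S)
        return True
    # S -> SS, split into two non-empty parts
    return any(derivable(w[:k]) and derivable(w[k:]) for k in range(1, len(w)))

def properly_nested(w):
    """Depth-counter check: never close more than was opened, end at depth 0."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

# Compare the two definitions on all strings over {(, )} of length 0..8.
agree = all(
    derivable("".join(chars)) == properly_nested("".join(chars))
    for n in range(9)
    for chars in product("()", repeat=n)
)
print(agree)
```

An exhaustive check up to a fixed length is evidence, not a proof; the proof is by induction on the length of the string.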

- Consider G4 = <{E,T,F},{a,+,x,(,)},R,E> where R is
- E → E + T | T
- T → T x F | F
- F → (E) | a
- What is the language of G4?
- Examples: a+a+a, (a+a) x a
- E stands for Expression, T for Term, and F for Factor: so this grammar describes some arithmetic expressions

- Sometimes a grammar can generate the same string in several different ways!
- This string will have several parse trees

- This is a very serious problem
- Imagine a C program that has multiple interpretations!

- If a grammar has this problem, we say that it is ambiguous

- Consider G5:
<EXPR> → <EXPR>+<EXPR> | <EXPR>x<EXPR> | (<EXPR>) | a

- G5 is ambiguous because a+axa has two parse trees!
- In one tree the root applies <EXPR> → <EXPR>+<EXPR>, which groups the string as a+(axa)
- In the other tree the root applies <EXPR> → <EXPR>x<EXPR>, which groups the string as (a+a)xa

- A string w is generated ambiguously in CFG G if it has two or more different leftmost derivations!
- A derivation is leftmost if at every step the variable being replaced is the leftmost one

- Grammar G is ambiguous if it generates some string ambiguously
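
The number of parse trees of a string (equivalently, of its leftmost derivations) can be counted mechanically. The sketch below does this for G5 and, for contrast, for the E/T/F grammar G4. It is an illustration under stated assumptions: symbols are single characters (E stands for <EXPR>), the grammar has no ε-rules and no cycles of unit rules, and the name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache

# G5: <EXPR> -> <EXPR>+<EXPR> | <EXPR>x<EXPR> | (<EXPR>) | a, with E for <EXPR>
G5 = {"E": [("E", "+", "E"), ("E", "x", "E"), ("(", "E", ")"), ("a",)]}
# G4: E -> E+T | T, T -> TxF | F, F -> (E) | a
G4 = {
    "E": [("E", "+", "T"), ("T",)],
    "T": [("T", "x", "F"), ("F",)],
    "F": [("(", "E", ")"), ("a",)],
}

def count_parse_trees(rules, start, w):
    """Count the parse trees of w rooted at start (no ε-rules, no unit cycles)."""

    @lru_cache(maxsize=None)
    def count_symbol(sym, i, j):
        # number of ways sym derives w[i:j]
        if sym not in rules:                       # terminal symbol
            return 1 if j - i == 1 and w[i] == sym else 0
        return sum(count_seq(rhs, i, j) for rhs in rules[sym])

    @lru_cache(maxsize=None)
    def count_seq(rhs, i, j):
        # number of ways the symbol sequence rhs derives w[i:j]
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        total = 0
        # every symbol derives at least one character, so leave room for rest
        for k in range(i + 1, j - len(rest) + 1):
            left = count_symbol(first, i, k)
            if left:
                total += left * count_seq(rest, k, j)
        return total

    return count_symbol(start, 0, len(w))

print(count_parse_trees(G5, "E", "a+axa"))   # ambiguous grammar
print(count_parse_trees(G4, "E", "a+axa"))   # same string, E/T/F grammar
```

A count greater than 1 for any string shows that the grammar is ambiguous. A count of 1 for a particular string shows only that this string is generated unambiguously.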

- A context-free grammar is in Chomsky normal form if every rule has the form
- A → BC
- A → a
- S → ε

- Where a is any terminal, S is the start symbol, and A, B, and C are any variables, except that B and C may not be the start symbol

- Theorem: Any context-free language is generated by a context-free grammar in Chomsky normal form
- How?
- Add a new start symbol S0
- Eliminate all rules of the form A → ε
- Eliminate all "unit" rules of the form A → B
- Patch up the rules so that the grammar still generates the same language
- Convert the remaining rules to the proper form

- Step1
- Add a new start symbol S0
- Add the rule S0 → S

- Step2: Repeat
- Remove some rule of the form A → ε where A is not the start symbol
- Then, for each occurrence of A on the right-hand side of a rule, add a new rule with that occurrence deleted
- E.g., if R → uAvAu where u and v are strings of variables and terminals
- We add the rules R → uvAu, R → uAvu, and R → uvu

- For R → A, add R → ε, unless R → ε has already been removed

- Until all ε-rules not involving the start symbol have been removed

- Step3: eliminate unit rules
- Repeat
- Remove some rule of the form A → B
- For each rule B → u, add A → u, unless A → u is a unit rule that has already been removed

- Until all unit rules have been removed

- Step4: convert remaining rules
- Replace each rule A → u1u2…uk, where k > 2 and each ui is a terminal or a variable, with the rules
- A → u1A1
- A1 → u2A2
- A2 → u3A3
- …
- A_{k-2} → u_{k-1}u_k
- where A1, …, A_{k-2} are new variables

- In every rule whose right-hand side now has length 2, replace any terminal ui with a new variable Ui and add the rule Ui → ui
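
Step 4 is mechanical enough to sketch in code. The version below is one possible reading, not a definitive implementation: rules are stored as a dictionary from variable to a list of right-hand sides (tuples of symbols), the new variable names `A1`, `A2`, … and `U_a` are assumptions and are taken not to clash with existing variables, and identical tails share one new variable (a small economy that the textbook description does not require). The input grammar is assumed to be the result of steps 1 to 3 applied to S → ASA | aB, A → B | S, B → b | ε.

```python
def convert_long_rules(rules, terminals):
    """Step 4 sketch: shorten long right-hand sides, then replace terminals in pairs."""
    result = {var: [] for var in rules}
    tail_var = {}   # tail of a long right-hand side -> new variable A1, A2, ...
    term_var = {}   # terminal -> new variable U_<terminal>

    def var_for_terminal(t):
        if t not in term_var:
            term_var[t] = f"U_{t}"
            result[term_var[t]] = [(t,)]          # add the rule U_t -> t
        return term_var[t]

    def var_for_tail(tail):
        if tail not in tail_var:
            name = f"A{len(tail_var) + 1}"
            tail_var[tail] = name
            result[name] = [binarize(tail)]       # add the rule A_i -> tail
        return tail_var[tail]

    def binarize(rhs):
        if len(rhs) <= 1:                         # A -> a (or ε) stays as it is
            return rhs
        if len(rhs) > 2:                          # A -> u1 u2 ... uk, k > 2
            rhs = (rhs[0], var_for_tail(rhs[1:]))
        # right-hand side of length 2: terminals become new variables
        return tuple(var_for_terminal(s) if s in terminals else s for s in rhs)

    for var, bodies in rules.items():
        for rhs in bodies:
            new_rhs = binarize(rhs)
            if new_rhs not in result[var]:
                result[var].append(new_rhs)
    return result

common = [("A", "S", "A"), ("a", "B"), ("a",), ("S", "A"), ("A", "S")]
after_step3 = {"S0": list(common), "S": list(common),
               "A": [("b",)] + common, "B": [("b",)]}
cnf = convert_long_rules(after_step3, {"a", "b"})
for var, bodies in cnf.items():
    print(var, "->", " | ".join("".join(rhs) for rhs in bodies))
```

With these names, the output has the same shape as a hand conversion: ASA becomes A A1 with A1 → SA, and aB becomes U_a B with U_a → a.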

- Start with
- S → ASA | aB
- A → B | S
- B → b | ε

- Step 1: add new start symbol and new rule
- S0 → S
- S → ASA | aB
- A → B | S
- B → b | ε

- Step 2: remove ε-rule B → ε
- S0 → S
- S → ASA | aB | a
- A → B | S | ε
- B → b

- Step 2: remove ε-rule A → ε
- S0 → S
- S → ASA | aB | a | SA | AS | S
- A → B | S
- B → b

- Step 3: remove unit rule S → S
- S0 → S
- S → ASA | aB | a | SA | AS
- A → B | S
- B → b

- Step 3: remove unit rule S0 → S
- S0 → ASA | aB | a | SA | AS
- S → ASA | aB | a | SA | AS
- A → B | S
- B → b

- Step 3: remove unit rule A → B
- S0 → ASA | aB | a | SA | AS
- S → ASA | aB | a | SA | AS
- A → S | b
- B → b

- Step 3: remove unit rule A → S
- S0 → ASA | aB | a | SA | AS
- S → ASA | aB | a | SA | AS
- A → b | ASA | aB | a | SA | AS
- B → b

- Step 4: convert remaining rules
- S0 → AA1 | UB | a | SA | AS
- S → AA1 | UB | a | SA | AS
- A → b | AA1 | UB | a | SA | AS
- B → b
- U → a
- A1 → SA

Pushdown Automata

- Pushdown automata (PDA) are like nondeterministic finite automata but have an extra component called a stack
- Can push symbols onto the stack
- Can pop them (read them back) later
- The stack is potentially unbounded

- Figure: schematic of a PDA, with a state control that reads the input (a a b a) and has access to a stack (x y z)

- A pushdown automaton is a 6-tuple (Q, ∑, S, ξ, q0, F), where
- Q is a finite set of states
- ∑ is a finite set of symbols called the input alphabet
- S is the stack alphabet
- ξ : Q x ∑ε x Sε → P(Q x Sε) is the transition function, where ∑ε = ∑ ∪ {ε} and Sε = S ∪ {ε}
- q0 ∈ Q is the start state
- F ⊆ Q is the set of accept states or final states

- Question: when is the stack empty?
- Start by pushing a $ onto the stack
- If you see it again, the stack is empty

- Question: when is the input string exhausted?
- It doesn't matter: the PDA does not need to test for the end of the input
- Accept states take effect only when the input is exhausted

- Transition a,b → c means
- Read a from the input
- Pop b from the stack
- Push c onto the stack

- Meaning of ε in a transition
- If a = ε, don't read any input
- If b = ε, don't pop any symbol
- If c = ε, don't push any symbol

- Recall {0^n1^n | n is positive}, which is not regular
- Consider the following PDA
- Read input symbols
- For each 0, push it onto the stack
- As soon as a 1 is seen, pop a 0 for each 1 read
- Accept if the stack is empty when the last symbol has been read
- Reject if the stack is non-empty at the end of the input, if input symbols remain once the stack is empty, or if a 0 is read after a 1, etc.
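
The informal description above translates almost line by line into code. This is a minimal sketch using a Python list as the stack; the function name `accepts_0n1n` is hypothetical, and the empty string is rejected because n is required to be positive.

```python
def accepts_0n1n(s):
    """Stack-based check for strings of the form 0^n 1^n with n >= 1."""
    stack = []
    seen_one = False
    for ch in s:
        if ch == "0":
            if seen_one:          # a 0 read after a 1: reject
                return False
            stack.append("0")     # for each 0, push it onto the stack
        elif ch == "1":
            seen_one = True
            if not stack:         # a 1 with no matching 0: reject
                return False
            stack.pop()           # pop a 0 for each 1 read
        else:
            return False          # symbol outside the alphabet {0, 1}
    # accept if at least one 1 was read and the stack is empty at the end
    return seen_one and not stack

for word in ["01", "0011", "001", "011", "0101", ""]:
    print(repr(word), accepts_0n1n(word))
```

Unlike a finite automaton, this procedure needs unbounded memory: the stack grows with the number of 0s read.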

- State diagram of a PDA for {0^n1^n | n is positive}, with transition labels
- ε,ε → $
- 0,ε → 0
- 1,0 → ε
- 1,0 → ε
- ε,$ → ε

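- State diagram of a PDA for {a^i b^j c^k | i = j or i = k}, with transition labels
- ε,ε → $
- a,ε → a
- ε,ε → ε
- b,a → ε
- ε,$ → ε
- c,ε → ε
- ε,ε → ε
- b,ε → ε
- ε,ε → ε
- c,a → ε
- ε,$ → ε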
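
Both diagrams can be tried out with a small nondeterministic PDA simulator. The sketch below searches breadth-first over configurations (state, input position, stack). Several things are assumptions rather than part of the slides: the state names q1 to q7, the wiring of the transition labels between those states, the choice of accept states, and the function name `accepts`. The search terminates for these two machines because neither has a loop of ε-moves that keeps pushing; that is not true of every PDA.

```python
from collections import deque

EPS = ""   # stands for ε in transitions

def accepts(delta, start, accept_states, w):
    """delta maps (state, input symbol or EPS, popped symbol or EPS)
    to a set of (next state, pushed symbol or EPS)."""
    start_cfg = (start, 0, ())
    seen = {start_cfg}
    queue = deque([start_cfg])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in accept_states:
            return True                      # input exhausted in an accept state
        for (q, a, b), targets in delta.items():
            if q != state:
                continue
            if a != EPS and (pos >= len(w) or w[pos] != a):
                continue                     # cannot read a
            if b != EPS and (not stack or stack[-1] != b):
                continue                     # cannot pop b
            new_pos = pos + (1 if a != EPS else 0)
            rest = stack[:-1] if b != EPS else stack
            for r, c in targets:
                cfg = (r, new_pos, rest + ((c,) if c != EPS else ()))
                if cfg not in seen:
                    seen.add(cfg)
                    queue.append(cfg)
    return False

# Assumed wiring for {0^n 1^n | n positive}; only q4 accepts.
pda_01 = {
    ("q1", EPS, EPS): {("q2", "$")},
    ("q2", "0", EPS): {("q2", "0")},
    ("q2", "1", "0"): {("q3", EPS)},
    ("q3", "1", "0"): {("q3", EPS)},
    ("q3", EPS, "$"): {("q4", EPS)},
}

# Assumed wiring for {a^i b^j c^k | i = j or i = k}; q4 and q7 accept.
pda_abc = {
    ("q1", EPS, EPS): {("q2", "$")},
    ("q2", "a", EPS): {("q2", "a")},
    ("q2", EPS, EPS): {("q3", EPS), ("q5", EPS)},   # guess: match b's or match c's
    ("q3", "b", "a"): {("q3", EPS)},
    ("q3", EPS, "$"): {("q4", EPS)},
    ("q4", "c", EPS): {("q4", EPS)},
    ("q5", "b", EPS): {("q5", EPS)},
    ("q5", EPS, EPS): {("q6", EPS)},
    ("q6", "c", "a"): {("q6", EPS)},
    ("q6", EPS, "$"): {("q7", EPS)},
}

print(accepts(pda_01, "q1", {"q4"}, "0011"))
print(accepts(pda_abc, "q1", {"q4", "q7"}, "aabcc"))
```

The second machine shows why nondeterminism matters: after reading the a's it guesses whether to match them against the b's or against the c's, and the simulator simply explores both branches.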

- Theorem: A language is context-free if and only if some pushdown automaton accepts it
- Proof: we will skip it! (Those interested may read the book)
- Corollary: Every regular language is a context-free language

- Figure: the regular languages form a subset of the context-free languages

- Context-free grammars: definition, ambiguity, Chomsky normal form
- Pushdown automata: definition
- Next: Part C, Computability Theory