Learn about context-free languages, context-free grammars, derivations, parse trees, and ambiguity in the context of syntax analysis. Explore the concept of push-down automata and their role in recognizing context-free languages.
Cairo University FCI Compilers CS419 Lecture 11: Syntax Analysis: Context Free Languages - Context Free Grammars - Derivations - Parse Trees - Ambiguity - Push-Down Automata (PDA) Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University Welcome to a journey to
Today • Grammar • Definition as 4-tuple • Regular Grammars (RGs) … left-linear vs. right-linear • Context Free Grammars (CFGs) • Context Sensitive Grammars (CSGs) • Context Free Languages (CFLs) … examples • Parse Trees • Derivations … leftmost vs. rightmost • Ambiguity and Disambiguation • Grammar Simplification FCI-CU-EG
Context-Free Languages • [figure labels: Context-Free Languages, Regular Languages]
Context-Free Languages • Context-Free Grammars • Pushdown Automata (stack + automaton)
Pushdown Automaton (PDA) • [figure labels: tape, tape head, stack, stack head, finite control]
Pushdown Automaton (PDA) • [figure labels: input string, stack, states] Costas Busch - RPI
What is a Grammar • A grammar is a precise description of a formal language. • It describes which sequences of symbols constitute valid words or sentences in that language. • Natural languages: Arabic, English, French, Spanish, etc. • Programming and markup languages: C, C++, Java, C#, HTML, XML, …
What is a Grammar • A grammar G = <N, Σ, P, S> consists of the following components: • A finite set N of non-terminal symbols (variables). • A finite set Σ of terminal symbols, disjoint from N. • A finite set P of production rules of the form (Σ ∪ N)* N (Σ ∪ N)* → (Σ ∪ N)*, where * is the Kleene star operator and ∪ denotes set union. Each production rule maps one string of symbols to another, and the left-hand side contains at least one non-terminal symbol. • A distinguished start symbol S ∈ N.
Regular languages • A language is said to be a regular language if it is generated by a regular grammar. • A grammar is said to be regular if it is either right-linear or left-linear. • Specifically, a grammar G = <N, Σ, P, S> is said to be: • right-linear if each of its production rules has the form A → xB or A → x, • left-linear if each of its production rules has the form A → Bx or A → x, • where: • A and B are non-terminal symbols in N, and • x is a string of terminal symbols in Σ*.
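The two definitions above can be checked mechanically. The following is a minimal sketch, assuming each symbol is a single character and a grammar is given as a list of (left side, right side) pairs; the function names `is_right_linear`, `is_left_linear` and `is_regular` are hypothetical and chosen only for illustration.

```python
def is_right_linear(productions, nonterminals):
    """True if every rule has the form A -> xB or A -> x (x a terminal string)."""
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False
        # Drop one trailing non-terminal, if present; the rest must be terminals.
        body = rhs[:-1] if rhs and rhs[-1] in nonterminals else rhs
        if any(sym in nonterminals for sym in body):
            return False
    return True


def is_left_linear(productions, nonterminals):
    """True if every rule has the form A -> Bx or A -> x (x a terminal string)."""
    for lhs, rhs in productions:
        if lhs not in nonterminals:
            return False
        # Drop one leading non-terminal, if present; the rest must be terminals.
        body = rhs[1:] if rhs and rhs[0] in nonterminals else rhs
        if any(sym in nonterminals for sym in body):
            return False
    return True


def is_regular(productions, nonterminals):
    """A grammar is regular if it is right-linear or left-linear as a whole."""
    return (is_right_linear(productions, nonterminals)
            or is_left_linear(productions, nonterminals))


# "" stands for the empty right-hand side (the empty string).
star_grammar = [("S", ""), ("S", "aS"), ("S", "bS"), ("S", "cS")]
print(is_regular(star_grammar, {"S"}))
```

Note that a grammar mixing right-linear and left-linear rules is rejected by both checks, which matches the "either ... or" wording of the definition.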
Example • Let A = {a, b, c}. The language A* is generated by the grammar with the following production rules: S → Λ, S → aS, S → bS, S → cS. • How do we know that this grammar describes the language A*? We must be able to derive every string of the language using the grammar rules. • Exercise: show that the string aacb is in A* by deriving it from S.
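One way to answer the exercise is to write the derivation out: S ⇒ aS ⇒ aaS ⇒ aacS ⇒ aacbS ⇒ aacb. The sketch below produces such a derivation for any string over {a, b, c}, assuming the grammar above with Λ as the empty string; the function name `derive` is hypothetical.

```python
def derive(word, alphabet="abc"):
    """Return the sentential forms of a derivation of `word` from S,
    using S -> xS for each symbol x and finally S -> Λ."""
    steps = ["S"]
    prefix = ""
    for ch in word:
        if ch not in alphabet:
            raise ValueError(f"{ch!r} is not in the alphabet")
        prefix += ch
        steps.append(prefix + "S")  # applied S -> chS
    steps.append(prefix)            # applied S -> Λ
    return steps


print(" => ".join(derive("aacb")))
# S => aS => aaS => aacS => aacbS => aacb
```

Because every string over the alphabet can be processed this way, the same argument suggests why the grammar generates all of A*.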
Example • Let the alphabet be {a, b, c} and let the grammar be G = <N, T, S, P> ≡ <{S, A, B}, {a, b, c}, S, P>, where P consists of: S → AB, A → Λ | aA, B → Λ | bB. • Let us derive the string aab: S ⇒ AB ⇒ aAB ⇒ aaAB ⇒ aaB ⇒ aabB ⇒ aab. • Note that a language can have more than one grammar, so we should not be surprised when two people come up with two different grammars for the same language.
Combining grammars • Suppose M and N are languages whose grammars have disjoint sets of non-terminals, and suppose the start symbols of the grammars for M and N are A and B respectively. With a new start symbol S, we can obtain grammars for the following new languages: • Union rule: the grammar for M ∪ N starts with the production S → A | B. • Product rule: the grammar for M ∙ N starts with the production S → AB. • Closure rule: the grammar for M* starts with the production S → AS | Λ.
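The three rules can be sketched as small functions. This is only an illustration under assumed conventions: a grammar is a pair (start symbol, dictionary from non-terminal to a list of right-hand sides), each right-hand side is a list of symbols, and the empty list stands for Λ. The names `union`, `product` and `closure` are hypothetical.

```python
def _check_fresh(new_start, *rule_sets):
    # The new start symbol must not clash with existing non-terminals.
    for rules in rule_sets:
        if new_start in rules:
            raise ValueError(f"{new_start!r} is already a non-terminal")


def union(g1, g2, new_start="S"):
    """Grammar for M ∪ N: add S -> A | B."""
    (a, p1), (b, p2) = g1, g2
    _check_fresh(new_start, p1, p2)
    return new_start, {**p1, **p2, new_start: [[a], [b]]}


def product(g1, g2, new_start="S"):
    """Grammar for M ∙ N: add S -> A B."""
    (a, p1), (b, p2) = g1, g2
    _check_fresh(new_start, p1, p2)
    return new_start, {**p1, **p2, new_start: [[a, b]]}


def closure(g, new_start="S"):
    """Grammar for M*: add S -> A S | Λ."""
    a, p = g
    _check_fresh(new_start, p)
    return new_start, {**p, new_start: [[a, new_start], []]}


# M = a* with start symbol A, N = {b} with start symbol B.
m = ("A", {"A": [["a", "A"], []]})
n = ("B", {"B": [["b"]]})
print(union(m, n))
print(product(m, n))
print(closure(n))
```

The disjointness assumption from the slide matters here: merging the two rule dictionaries would silently overwrite rules if both grammars used the same non-terminal name.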
Context-free languages • A language is said to be context-free if it is generated by a context-free grammar (CFG). • A grammar G = <N, Σ, P, S> is context-free if its production rules are of the form N → (N ∪ Σ)*, i.e., the left-hand side is a single non-terminal. • Unlike regular grammars, the right-hand sides of the production rules in CFGs are unrestricted and can be any combination of terminals and non-terminals. • The regular languages (RLs) form a proper subset of the context-free languages (CFLs). • Things that cannot be expressed by regular grammars but are needed when parsing: • Palindromes. • Balanced brackets. • Counting (e.g., matching the number of a's with the number of b's).
CFG • A context-free grammar is a notation for defining context free languages. • It is more powerful than finite automata or REs, but still cannot define all possible languages. • Useful for nested structures, e.g., parentheses in programming languages. • Basic idea is to use “variables” (non-terminals) to stand for sets of strings. • These variables are defined recursively, in terms of one another.
CFG • A CFG is used to generate the strings belonging to a CFL. • Each production has the form A → w, where A is a non-terminal and w is a string of terminals and non-terminals. • Any non-terminal can be expanded using any of its productions at any point. • Language of a CFG: the set of strings of terminals that can be derived from its start symbol. • A pushdown automaton (PDA) is the kind of automaton capable of accepting the languages defined by CFGs.
CFGs: Alternate Definition • Many textbooks use different symbols and terms to describe CFGs: G = (V, Σ, P, S) • V = variables, a finite set • Σ = alphabet or terminals, a finite set • P = productions, a finite set • S = start variable, S ∈ V • Productions have the form A → α, where A ∈ V and α ∈ (V ∪ Σ)*.
Definition: Context-Free Grammars • Grammar G = (V, T, S, P) • V: variables • T: terminal symbols • S: start variable • P: productions of the form A → x, where A is a variable and x is a string of variables and terminals
CSG • A context-sensitive grammar is a notation for defining context-sensitive languages. • Each production has the form wAx → wyx, • where A is a non-terminal, w and x are strings of terminals and non-terminals, and y is a non-empty string of terminals and non-terminals. • The productions give rules saying "if you see A in a given context, you may replace A by the string y".
CFGs & CFLs: Example 1 • {a^n b^n | n ≥ 0}, one of our canonical non-RLs. • S → ε | a S b • Formally: G = ({S}, {a, b}, {S → ε, S → a S b}, S)
CFGs & CFLs: Example 2 • {a^m b^n c^(m+n) | m, n ≥ 0} • Rewrite as {a^m b^n c^n c^m | m, n ≥ 0}: • S → S' | a S c • S' → ε | b S' c • Derivation example: a^4 b^3 c^7
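The derivation example can be written out step by step. The sketch below assumes the grammar S → S' | a S c, S' → ε | b S' c and lists the sentential forms for a^m b^n c^(m+n); the function name `derive_ambncmn` is hypothetical.

```python
def derive_ambncmn(m, n):
    """Sentential forms of a derivation of a^m b^n c^(m+n)."""
    steps = ["S"]
    # Apply S -> a S c exactly m times: builds the outer a...c pairs.
    for i in range(1, m + 1):
        steps.append("a" * i + "S" + "c" * i)
    # Apply S -> S' once to switch to the inner b...c pairs.
    steps.append("a" * m + "S'" + "c" * m)
    # Apply S' -> b S' c exactly n times.
    for j in range(1, n + 1):
        steps.append("a" * m + "b" * j + "S'" + "c" * j + "c" * m)
    # Apply S' -> ε to finish.
    steps.append("a" * m + "b" * n + "c" * (m + n))
    return steps


for form in derive_ambncmn(4, 3):
    print(form)
```

For m = 4 and n = 3 the derivation has nine steps (ten sentential forms) and ends in aaaabbbccccccc, which is a^4 b^3 c^7.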
CFGs & CFLs: Non-Example • {a^n b^n c^n | n ≥ 0} is not a context-free language; no CFG can describe it. • Intuition: a CFG can count up to n and then count down from n, but n is forgotten after that, i.e., it uses a stack as a counter. • We will see this when using the machine corresponding to CFGs.
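The "stack as a counter" intuition can be illustrated with a small recognizer for a^n b^n. This is a hand-written sketch in the spirit of a pushdown automaton, not a general PDA simulator; the function name `accepts_anbn` and the two state names are assumptions made for the example.

```python
def accepts_anbn(s):
    """Accept exactly the strings a^n b^n (n >= 0) using one stack."""
    stack = []
    state = "push"              # "push": reading a's, "pop": reading b's
    for ch in s:
        if state == "push" and ch == "a":
            stack.append("a")   # count up
        elif ch == "b" and stack:
            state = "pop"
            stack.pop()         # count down
        else:
            return False        # wrong symbol, wrong order, or too many b's
    return not stack            # every a must have been matched


print(accepts_anbn("aaabbb"), accepts_anbn("aab"), accepts_anbn("abab"))
```

Once the stack is empty after the b's, the value of n is gone, so there is nothing left to compare a following block of c's against. That is the informal reason a single stack does not suffice for a^n b^n c^n.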
Parsing • Parsing with a CFG means determining how a statement of a language is built from the categories (non-terminals) defined by the CFG. • Parsing can be expressed using trees, a special type of graph in which no cycles exist. • A parse tree is the graph representation of a derivation. • Programmatically, a parse tree can be represented as a dynamic data structure accessed through a single root node. Dr. Hussien M. Sharaf
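As a minimal sketch of such a dynamic data structure, a parse tree can be built from nodes that each hold a label and a list of children. The class name `Node`, the method name `tree_yield` and the use of "Λ" as the label of an empty leaf are assumptions for illustration. The example tree corresponds to the grammar S → AB, A → Λ | aA, B → Λ | bB and the string aab.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str                                     # terminal, non-terminal, or "Λ"
    children: list = field(default_factory=list)   # empty list for a leaf

    def tree_yield(self):
        """Concatenate the leaf labels from left to right."""
        if not self.children:
            return "" if self.label == "Λ" else self.label
        return "".join(child.tree_yield() for child in self.children)


# Parse tree for "aab": S -> AB, A -> aA -> aaA -> aa, B -> bB -> b
tree = Node("S", [
    Node("A", [Node("a"), Node("A", [Node("a"), Node("A", [Node("Λ")])])]),
    Node("B", [Node("b"), Node("B", [Node("Λ")])]),
])
print(tree.tree_yield())  # aab
```

The whole tree is reachable from the single root node, and reading the leaves left to right recovers the derived string.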
Parse tree • (1) A single vertex labeled with a non-terminal symbol is a parse tree. • (2) If A → y1 y2 … yn is a rule in R, then the tree with root A and children y1, y2, …, yn (in that order) is a parse tree.
Derivations • CFG: S → (S), S → SS, S → є • The language described by a CFG is the set of strings that can be derived from the start symbol using the rules of the grammar. • At each step, we choose a non-terminal to replace. • Derivation: S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()((S))) ⇒ (()(())) • Each intermediate string in the derivation is a sentential form. • This example demonstrates a leftmost derivation: one where we always expand the leftmost non-terminal in the sentential form.
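The leftmost derivation above can be replayed mechanically. The sketch below assumes the three rules are numbered 1: S → (S), 2: S → SS, 3: S → є (the numbering and the function name `leftmost_derivation` are hypothetical) and always rewrites the leftmost S.

```python
RULES = {1: "(S)", 2: "SS", 3: ""}   # right-hand sides; "" stands for є


def leftmost_derivation(choices):
    """Apply the chosen rules, each time to the leftmost S."""
    form = "S"
    steps = [form]
    for rule in choices:
        i = form.find("S")           # position of the leftmost non-terminal
        if i < 0:
            raise ValueError("no non-terminal left to expand")
        form = form[:i] + RULES[rule] + form[i + 1:]
        steps.append(form)
    return steps


print(" => ".join(leftmost_derivation([1, 2, 1, 3, 1, 1, 3])))
# S => (S) => (SS) => ((S)S) => (()S) => (()(S)) => (()((S))) => (()(()))
```

Replacing `form.find("S")` with `form.rfind("S")` would give a rightmost derivation for the same sequence of rule choices, generally of a different string.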
Derivations • Definition: v is one-step derivable from u, written u ⇒ v, if: • u = xαz • v = xβz • α → β in R • Definition: v is derivable from u, written u ⇒* v, if there is a chain of one-step derivations of the form u ⇒ u1 ⇒ u2 ⇒ … ⇒ v.
Derivations • Definition: Given a context-free grammar G = (Σ, NT, R, S), the language generated by (or derived from) G is the set L(G) = {w ∈ Σ* : S ⇒* w}. • Definition: A language L is context-free if there is a context-free grammar G = (Σ, NT, R, S) such that L is generated by G.
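The definition of L(G) suggests a direct, if inefficient, way to list short members of the language: search through sentential forms reachable from S and keep those with no non-terminals. The sketch below is illustrative only. It assumes single-character symbols and a grammar without empty right-hand sides, so that any sentential form longer than the bound can be discarded safely; the function name `language` is hypothetical.

```python
from collections import deque


def language(rules, start, max_len):
    """All terminal strings w with S =>* w and len(w) <= max_len.
    Assumes no rule has an empty right-hand side."""
    nonterminals = set(rules)
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if form is all terminals.
        idx = next((i for i, s in enumerate(form) if s in nonterminals), None)
        if idx is None:
            words.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(words, key=lambda w: (len(w), w))


print(language({"S": ["aSb", "ab"]}, "S", 6))
# ['ab', 'aabb', 'aaabbb']
```

Expanding only the leftmost non-terminal loses nothing here, since every derivable terminal string also has a leftmost derivation.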
Derivation • We derive strings in the language of a CFG by starting with the start symbol and repeatedly replacing some variable A by the right side of one of its productions. • Example: • S → aSb • S → ab • The same grammar written using '|' (read as "or"): • S → aSb | ab
Derivation • CFG: • S → aSb • S → ab • Derivation example for "aabb": • Using S → aSb generates an incomplete string that still contains the non-terminal S. • Then using S → ab to replace the inner S generates "aabb". • S ⇒ aSb ⇒ aabb [successful derivation of aabb]
Derivation Example: Palindrome • Describe palindromes of a's and b's using a CFG: • 1] S → aSa • 2] S → bSb • 3] S → Λ • Derive "baab" from the above grammar: • S ⇒ bSb [by 2] ⇒ baSab [by 1] ⇒ baab [by 3]
CFG Example: Even-Length Palindromes • i.e. {Λ, aa, bb, abba, abbaabba, …} • S → aSa | bSb | Λ • Derive abaaba: S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba • Can you modify this grammar to accept odd-length palindromes?
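The recursive shape of the grammar S → aSa | bSb | Λ translates directly into a recursive membership test. The sketch below mirrors the three productions; the function name `derives_even_palindrome` is hypothetical.

```python
def derives_even_palindrome(s):
    """True if s can be derived from S -> aSa | bSb | Λ."""
    if s == "":
        return True                                 # S -> Λ
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return derives_even_palindrome(s[1:-1])     # S -> aSa or S -> bSb
    return False


print(derives_even_palindrome("abaaba"), derives_even_palindrome("aba"))
# True False
```

The string "aba" is rejected because the recursion bottoms out at the single symbol "b", which none of the three rules can produce. This hints at what the odd-length variant of the grammar would need.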
CFG Example • Describe any string in (a+b)* using a CFG: • 1] S → Λ • 2] S → Y • 3] Y → aY • 4] Y → bY • 5] Y → a • 6] Y → b • Derive "aab" from the above grammar: • S ⇒ Y [by 2] ⇒ aY [by 3] ⇒ aaY [by 3] ⇒ aab [by 6]
Derivations and Parse Trees • Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B • Sample derivations: • S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb • S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb • These two derivations use the same productions, but in different orders. This ordering difference is often uninteresting. Derivation trees give a way to abstract away ordering differences. • Parse tree for both derivations (bracket notation): S[ A[ A[a] A[a] ] B[ b B[b] ] ] • Root label = start variable. • Each interior label = variable. • Each parent/child relation = derivation step. • Each leaf label = terminal or ε. • All leaf labels together = derived string = yield.
Derivations and Parse Trees • We can graphically describe a derivation using a parse tree: • the root is labeled with the start symbol, S • each internal node is labeled with a non-terminal • the children of an internal node A are the symbols on the right-hand side of a production A → α, in order • each leaf is labeled with a terminal (or ε) • A parse tree has a unique leftmost and a unique rightmost derivation (however, we cannot tell which one was used by looking at the tree)
Leftmost vs. Rightmost Derivations • Definition. A leftmost derivation of a sentential form is one in which, at every step, a rule is applied to the leftmost non-terminal. • Definition. A rightmost derivation of a sentential form is one in which, at every step, a rule is applied to the rightmost non-terminal.
Leftmost vs. Rightmost Derivations • Grammar: S → A | A B; A → ε | a | A b | A A; B → b | b c | B c | b B • Parse tree (bracket notation): S[ A[ A[a] A[a] ] B[ b B[b] ] ] • Sample derivations for the string aabb: • S ⇒ AB ⇒ AAB ⇒ aAB ⇒ aaB ⇒ aabB ⇒ aabb • S ⇒ AB ⇒ AbB ⇒ Abb ⇒ AAbb ⇒ Aabb ⇒ aabb • These two derivations are special: • The 1st derivation is leftmost: it always picks the leftmost variable. • The 2nd derivation is rightmost: it always picks the rightmost variable.
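Since a parse tree fixes which production is used at each node, both derivations can be read off the same tree by choosing which unexpanded variable to rewrite next. The sketch below assumes a tree encoded as nested (label, children) tuples with terminals as plain strings; the names `derivation` and `TREE` are hypothetical.

```python
def derivation(tree, leftmost=True):
    """Sentential forms of the leftmost (or rightmost) derivation
    encoded by a parse tree."""
    form = [tree]   # tuples are unexpanded variables, strings are terminals

    def show(items):
        return "".join(x[0] if isinstance(x, tuple) else x for x in items)

    steps = [show(form)]
    while True:
        pending = [i for i, x in enumerate(form) if isinstance(x, tuple)]
        if not pending:
            return steps
        i = pending[0] if leftmost else pending[-1]
        form[i:i + 1] = form[i][1]   # replace the variable by its children
        steps.append(show(form))


# Parse tree for aabb: S[ A[ A[a] A[a] ] B[ b B[b] ] ]
TREE = ("S", [("A", [("A", ["a"]), ("A", ["a"])]),
              ("B", ["b", ("B", ["b"])])])

print(" => ".join(derivation(TREE, leftmost=True)))
# S => AB => AAB => aAB => aaB => aabB => aabb
print(" => ".join(derivation(TREE, leftmost=False)))
# S => AB => AbB => Abb => AAbb => Aabb => aabb
```

Both calls use exactly the same productions, which illustrates why the tree, rather than the order of steps, is the more useful description of how a string is derived.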
Derivation Order: String aab • [figure: leftmost derivation and rightmost derivation]
Another Example: String abbbb • [figure: leftmost derivation and rightmost derivation]
Derivation Tree • [figure: derivation tree and its yield]
Partial Derivation Trees • [figure: partial derivation tree]
Partial Derivation Trees • [figure: partial derivation tree; its yield is a sentential form]