LR-Grammars


PowerPoint Slideshow about 'LR-Grammars' - kerry


LR-Grammars

LR(0), LR(1), and LR(K)

Deterministic Context-Free Languages
• DCFL
• A family of languages that are accepted by a Deterministic Pushdown Automaton (DPDA)
• Many programming languages can be described by means of DCFLs
Prefix and Proper Prefix
• Prefix (of a string)
• Any number of leading symbols of that string
• Example: abc
• Prefixes: ε, a, ab, abc
• Proper Prefix (of a string)
• A prefix of a string, but not the string itself
• Example: abc
• Proper prefixes: ε, a, ab
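
The two definitions above can be checked with a minimal Python sketch. The function names `prefixes` and `proper_prefixes` are illustrative only, and the empty string `''` plays the role of ε.

```python
def prefixes(s: str) -> list[str]:
    """All prefixes of s, from the empty string (ε) up to s itself."""
    return [s[:i] for i in range(len(s) + 1)]


def proper_prefixes(s: str) -> list[str]:
    """All prefixes of s except s itself."""
    return prefixes(s)[:-1]


print(prefixes("abc"))         # ['', 'a', 'ab', 'abc']
print(proper_prefixes("abc"))  # ['', 'a', 'ab']
```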
Prefix Property
• A context-free language (CFL) L is said to have the prefix property if, whenever w is in L, no proper prefix of w is in L
• Not considered a severe restriction
• Why?
• Because we can easily convert a DCFL to a DCFL with the prefix property by introducing an endmarker
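
As a rough illustration of the endmarker idea, the sketch below tests the prefix property on a small finite sample of strings and then appends an endmarker to every string. The sample set, the function names, and the choice of `$` as the marker are assumptions made for this example; a real DCFL is usually infinite, so this only demonstrates the definition.

```python
def has_prefix_property(language: set[str]) -> bool:
    """True if no string in the finite set is a proper prefix of another."""
    return not any(
        u != w and w.startswith(u) for u in language for w in language
    )


def add_endmarker(language: set[str], marker: str = "$") -> set[str]:
    """Append an endmarker (assumed not to occur in any string) to each string."""
    return {w + marker for w in language}


sample = {"a", "ab", "abb"}          # 'a' is a proper prefix of 'ab'
print(has_prefix_property(sample))                 # False
print(has_prefix_property(add_endmarker(sample)))  # True
```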
Suffix and Proper Suffix
• Suffix (of a string)
• Any number of trailing symbols
• Proper Suffix
• A suffix of a string, but not the string itself
Example Grammar
• This is the grammar that will be used in many of the examples:
• S’ → Sc
• S → SA | A
• A → aSb | ab
LR-Grammar
• Left-to-right scan of the input producing a rightmost derivation
• Simply:
• L stands for Left-to-right
• R stands for rightmost derivation
LR-Items
• An item (for a given CFG)
• A production with a dot anywhere in the right side (including the beginning and end)
• In the event of an ε-production: B → ε
• B → · is an item
Example: Items
• Given our example grammar:
• S’ → Sc, S → SA | A, A → aSb | ab
• The items for the grammar are:

S’ → ·Sc, S’ → S·c, S’ → Sc·

S → ·SA, S → S·A, S → SA·, S → ·A, S → A·

A → ·aSb, A → a·Sb, A → aS·b, A → aSb·, A → ·ab, A → a·b, A → ab·
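
The item list above can be generated mechanically. The following is a minimal sketch that assumes a hypothetical encoding in which each production is a `(head, body)` pair and each item is a `(head, body, dot)` triple; none of these names come from the slides.

```python
# The example grammar, one (head, body) pair per production.
GRAMMAR = [
    ("S'", ("S", "c")),
    ("S", ("S", "A")),
    ("S", ("A",)),
    ("A", ("a", "S", "b")),
    ("A", ("a", "b")),
]


def items(grammar):
    """Every (head, body, dot) triple with the dot at positions 0..len(body)."""
    return [
        (head, body, dot)
        for head, body in grammar
        for dot in range(len(body) + 1)
    ]


def show(item):
    """Render an item with '·' marking the dot position."""
    head, body, dot = item
    return f"{head} → {''.join(body[:dot])}·{''.join(body[dot:])}"


all_items = items(GRAMMAR)
print(len(all_items))                               # 15
print(", ".join(show(i) for i in all_items[:3]))    # S' → ·Sc, S' → S·c, S' → Sc·
```

An ε-production would be encoded with an empty body `()`, which yields exactly one item with the dot at position 0.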

Some Notation
• ⇒* = zero or more steps in a derivation
• ⇒*rm = rightmost derivation (zero or more steps)
• ⇒rm = single step in rightmost derivation
Right-Sentential Form
• A sentential form that can be derived by a rightmost derivation
• A string of terminals and variables α is called a sentential form if S ⇒* α
More terms
• Handle
• A substring which matches the right-hand side of a production and represents 1 step in the derivation
• Or more formally:
• (of a right-sentential form γ for CFG G)
• Is a substring β such that:
• S ⇒*rm δAw ⇒rm δβw
• δβw = γ
• If the grammar is unambiguous and has no useless symbols:
• The rightmost derivation of each right-sentential form, and therefore its handle, is unique
Example
• Given our example grammar:
• S’ → Sc, S → SA | A, A → aSb | ab
• An example rightmost derivation:
• S’ ⇒ Sc ⇒ SAc ⇒ SaSbc
• Therefore we can say that SaSbc is a right-sentential form
• The handle is aSb
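
The derivation in this example can be replayed step by step. The sketch below assumes single-character grammar symbols (apart from the start symbol) and uses a hypothetical helper `rightmost_step` that always rewrites the rightmost variable, which is what makes the derivation rightmost.

```python
VARIABLES = {"S'", "S", "A"}


def rightmost_step(form, head, body):
    """Rewrite the rightmost variable of `form` (it must be `head`) with `body`."""
    for i in range(len(form) - 1, -1, -1):
        if form[i] in VARIABLES:
            if form[i] != head:
                raise ValueError(f"rightmost variable is {form[i]}, not {head}")
            return form[:i] + list(body) + form[i + 1:]
    raise ValueError("no variable left to rewrite")


form = ["S'"]
for head, body in [("S'", "Sc"), ("S", "SA"), ("A", "aSb")]:
    form = rightmost_step(form, head, body)
    print("".join(form))
# Sc
# SAc
# SaSbc
```

The body inserted by the last step, aSb, is the handle of the final form SaSbc.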
More terms
• Viable Prefix
• (of a right-sentential form γ)
• Is any prefix of γ ending no farther right than the right end of a handle of γ.
• Complete item
• An item where the dot is the rightmost symbol
Example
• Given our example grammar:
• S’ → Sc, S → SA | A, A → aSb | ab
• The right-sentential form abc:
• S’ ⇒*rm Ac ⇒ abc
• Valid items (A → α·β is valid for viable prefix γ if S ⇒*rm δAw ⇒rm δαβw and δα = γ):
• A → ab· for prefix ab
• A → a·b for prefix a
• A → ·ab for prefix ε
• A → ab· is a complete item, so ab can be reduced to A; Ac is the right-sentential form that precedes abc
LR(0)
• Left-to-right scan of the input producing a rightmost derivation with a look-ahead (on the input) of 0 symbols
• It is a restricted type of CFG
• 1st in the family of LR-grammars
• LR(0) grammars define exactly the DCFLs having the prefix property
Computing Sets of Valid Items
• The definition of LR(0) and the method of accepting L(G) for an LR(0) grammar G by a DPDA depend on:
• Knowing the set of valid items for each viable prefix γ
• For every CFG G, the set of viable prefixes is a regular set
• This regular set is accepted by an NFA whose states are the items for G
Continued
• Given an NFA (whose states are the items for G) that accepts the regular set
• We can apply the subset construction to this NFA and yield a DFA
• The state this DFA reaches on reading a viable prefix γ is the set of valid items for γ
NFA M
• NFA M recognizes the viable prefixes for CFG G
• M = (Q, V ∪ T, δ, q0, Q)
• Q = set of items for G plus state q0
• G = (V, T, P, S)
• Three Rules
• δ(q0, ε) = {S → ·α | S → α is a production}
• δ(A → α·Bβ, ε) = {B → ·γ | B → γ is a production}
• Allows expansion of a variable B appearing immediately to the right of the dot
• δ(A → α·Xβ, X) = {A → αX·β}
• Permits moving the dot over any grammar symbol X if X is the next input symbol
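
A minimal sketch of how the three rules combine with the subset construction: `closure` applies the ε-moves of rule 2 until nothing new appears, `goto` applies rule 3 and then closes the result, and `valid_items` runs the resulting DFA over a prefix. The names and the `(head, body, dot)` item encoding are assumptions made for this example, and the extra state q0 is folded into the initial item set.

```python
GRAMMAR = [
    ("S'", ("S", "c")),
    ("S", ("S", "A")),
    ("S", ("A",)),
    ("A", ("a", "S", "b")),
    ("A", ("a", "b")),
]
VARIABLES = {head for head, _ in GRAMMAR}
START = "S'"


def closure(item_set):
    """ε-closure: expand every variable that sits immediately right of a dot."""
    result, work = set(item_set), list(item_set)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in VARIABLES:
            for head, new_body in GRAMMAR:
                new_item = (head, new_body, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(state, symbol):
    """Move the dot over `symbol` in every item that allows it, then close."""
    moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == symbol}
    return closure(moved)


def valid_items(prefix):
    """DFA state reached on `prefix` (empty if `prefix` is not viable)."""
    state = closure({(h, b, 0) for h, b in GRAMMAR if h == START})
    for symbol in prefix:
        state = goto(state, symbol)
    return state


def show(item):
    head, body, dot = item
    return f"{head} → {''.join(body[:dot])}·{''.join(body[dot:])}"


print(sorted(show(i) for i in valid_items("ab")))  # ['A → ab·']
```

Iterating over `prefix` character by character works here only because every symbol that can occur in a viable prefix of this grammar is a single character.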
Theorem 10.9
• The NFA M has the property that δ(q0, γ) contains A → α·β iff A → α·β is valid for γ
• This theorem gives a method for computing the sets of valid items for any viable prefix
• Note: M is an NFA. It can be converted to a DFA; then, by inspecting each DFA state, it can be determined whether G is an LR(0) grammar
Definition of LR(0) Grammar
• G is an LR(0) grammar if
• The start symbol does not appear on the right side of any productions
• ∀ viable prefixes γ of G, if A → α· is a complete item valid for γ, then it is the only one
• i.e., no other complete item, and no item with a terminal to the right of the dot, is valid for γ
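
The two conditions can be tested mechanically by building every reachable DFA state and inspecting it. The sketch below is one possible way to do that; the function names, the item encoding, and the second grammar `NOT_LR0` are hypothetical and chosen only to show a failing case.

```python
def closure(item_set, grammar):
    """Expand every variable that sits immediately right of a dot."""
    variables = {h for h, _ in grammar}
    result, work = set(item_set), list(item_set)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in variables:
            for head, new_body in grammar:
                new_item = (head, new_body, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(state, symbol, grammar):
    moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == symbol}
    return closure(moved, grammar)


def dfa_states(grammar, start):
    """All non-empty item sets reachable from the initial state."""
    first = closure({(h, b, 0) for h, b in grammar if h == start}, grammar)
    seen, work = {first}, [first]
    while work:
        state = work.pop()
        for symbol in {b[d] for _, b, d in state if d < len(b)}:
            nxt = goto(state, symbol, grammar)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def is_lr0(grammar, start):
    variables = {h for h, _ in grammar}
    # Condition 1: the start symbol never appears on a right side.
    if any(start in body for _, body in grammar):
        return False
    # Condition 2: a complete item tolerates no other complete item
    # and no item with a terminal right of the dot in the same state.
    for state in dfa_states(grammar, start):
        complete = [i for i in state if i[2] == len(i[1])]
        shifts = [i for i in state if i[2] < len(i[1]) and i[1][i[2]] not in variables]
        if complete and (len(complete) > 1 or shifts):
            return False
    return True


EXAMPLE = [("S'", ("S", "c")), ("S", ("S", "A")), ("S", ("A",)),
           ("A", ("a", "S", "b")), ("A", ("a", "b"))]
NOT_LR0 = [("S'", ("S", "c")), ("S", ("a",)), ("S", ("a", "b"))]  # hypothetical

print(is_lr0(EXAMPLE, "S'"))  # True
print(is_lr0(NOT_LR0, "S'"))  # False: S → a· and S → a·b share a state
```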
Facts we now know:
• Every LR(0) grammar generates a DCFL
• Every DCFL with the prefix property has an LR(0) grammar
• Every language with an LR(0) grammar has the prefix property
• L is a DCFL iff L\$ has an LR(0) grammar, where \$ is an endmarker
DPDA’s from LR(0) Grammars
• We trace out the rightmost derivation in reverse
• The stack holds a viable prefix (of a right-sentential form) together with the DFA states reached along it
• Viable prefixes: X1X2…Xk
• States: s1, s2,…,sk
• Stack: s0X1s1…Xksk
Reduction
• If sk contains A → α·
• Then A → α· is valid for X1X2…Xk
• α = suffix of X1X2…Xk
• Let
• α = Xi+1…Xk
• w be such that X1…Xkw is a right-sentential form.
Reduction Continued
• There is a derivation:
• S ⇒*rm X1…XiAw ⇒rm X1…Xkw
• To obtain the right-sentential form preceding X1…Xkw in the rightmost derivation, we reduce α to A
• Therefore, we pop Xi+1…Xk (with their states) from the stack and push A (with its new state) onto the stack
Shift
• If sk contains only incomplete items
• Then the right-sentential form preceding X1…Xkw cannot be formed using a reduction at this point
• Instead we simply “shift” the next input symbol onto the stack
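
The reduce and shift moves can be put together into a small recognizer for the example grammar. This is a sketch under stated assumptions, not the DPDA construction itself: the stack holds DFA states only (each grammar symbol Xi is implied by the state pushed with it), item sets are computed on the fly, and the names `closure`, `goto` and `parse` are illustrative.

```python
GRAMMAR = [
    ("S'", ("S", "c")),
    ("S", ("S", "A")),
    ("S", ("A",)),
    ("A", ("a", "S", "b")),
    ("A", ("a", "b")),
]
VARIABLES = {head for head, _ in GRAMMAR}


def closure(item_set):
    result, work = set(item_set), list(item_set)
    while work:
        _, body, dot = work.pop()
        if dot < len(body) and body[dot] in VARIABLES:
            for head, new_body in GRAMMAR:
                new_item = (head, new_body, 0)
                if head == body[dot] and new_item not in result:
                    result.add(new_item)
                    work.append(new_item)
    return frozenset(result)


def goto(state, symbol):
    moved = {(h, b, d + 1) for h, b, d in state if d < len(b) and b[d] == symbol}
    return closure(moved)


def parse(word):
    """Return the reductions made (a rightmost derivation in reverse), or None."""
    stack = [closure({("S'", ("S", "c"), 0)})]   # s0
    reductions, pos = [], 0
    while True:
        state = stack[-1]
        complete = [i for i in state if i[2] == len(i[1])]
        if complete:
            # Reduce: the complete item is unique because the grammar is LR(0).
            head, body, _ = complete[0]
            reductions.append((head, "".join(body)))
            del stack[len(stack) - len(body):]    # pop one state per body symbol
            if head == "S'":
                return reductions if pos == len(word) else None
            stack.append(goto(stack[-1], head))
        else:
            # Shift: only incomplete items, so consume the next input symbol.
            if pos == len(word):
                return None
            nxt = goto(state, word[pos])
            if not nxt:
                return None
            stack.append(nxt)
            pos += 1


print(parse("abc"))  # [('A', 'ab'), ('S', 'A'), ("S'", 'Sc')]
print(parse("ab"))   # None
```

Reading the returned list from last to first gives the rightmost derivation S’ ⇒ Sc ⇒ Ac ⇒ abc.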
Theorem 10.10
• If L is L(G) for an LR(0) grammar G, then L is N(M) for a DPDA M
• N(M) = the language accepted by empty stack or null stack
Proof
• Construct from G the DFA D
• Transition function: recognizes G’s viable prefixes
• Stack Symbols of M are
• Grammar Symbols of G
• States of D
• M has start state q and other states used to perform reduction
We know that:
• If G is LR(0) then
• Reductions are the only way to get the right-sentential form when the state of the DFA (on the top of the stack) contains a complete item
• When M starts on input w it will construct a right-most derivation for w in reverse order
What we need to prove:
• When a shift is called for and the top DFA state on the stack has only incomplete items then there are no handles
• (Note: if there was a handle, then some DFA state on the stack would have a complete item)
Suppose ∃ a state containing A → α· (a complete item)
• Each state is first put onto the top of the stack
• If it contained A → α·, α would then immediately be reduced to A
• Therefore, a complete item cannot possibly become buried on the stack
Proof continued
• Acceptance occurs when the top of the stack contains the start symbol
• The start symbol by definition of LR(0) grammars cannot appear on the right side of a production
• L(G) always has the prefix property if G is LR(0)
Conclusion of Proof
• Thus, if w is in L(G), M finds the rightmost derivation of w, reduces w to S, and accepts
• If M accepts w, then the sequence of right-sentential forms provides a derivation of w from S
• N(M) = L(G)
Corollary of Theorem 10.10
• Every LR(0) grammar is unambiguous
• Why?
• The rightmost derivation of w is unique
• (Given the construction we provided)
LR(1) Grammars
• LR grammar with 1 look-ahead
• All and only deterministic CFL’s have LR(1) grammars
• Are greatly important to compiler design
• Why?
• Because they are broad enough to include the syntax of almost all programming languages
• Restrictive enough to have efficient parsers (that are essentially DPDAs)
LR(1) Item
• Consists of an LR(0) item followed by a look-ahead set consisting of terminals and/or the special symbol \$
• \$ = the right end of the string
• General Form:
• A → α·β, {a1, a2, …, an}
• The LR(1) items are the states of an NFA that recognizes viable prefixes; the subset construction converts this NFA to a DFA
A grammar is LR(1) if
• The start symbol does not appear on the right side of any productions
• Whenever the set of items, I, valid for some viable prefix includes some complete item A → α·, {a1,…,an}, then
• No ai appears immediately to the right of the dot in any item of I
• If B → β·, {b1,…,bk} is another complete item in I, then ai ≠ bj for any 1 ≤ i ≤ n and 1 ≤ j ≤ k
Accepting LR(1) language:
• Similar to the DPDA used with LR(0) grammars
• However, it is allowed to use the next input symbol during its decision making
• This is accomplished by appending a \$ to the end of the input and the DPDA keeps the next input symbol as part of the state
LR(1) Rules for Reduce/Shift
• If the top set of items has a complete item A → α·, {a1, a2, …, an}, where A ≠ S, reduce by A → α if the current input symbol is in {a1, a2, …, an}
• If the top set of items has an item S → α·, {\$}, then reduce by S → α and accept if the current symbol is \$ (i.e., the end of the input is reached)
• If the top set of items has an item A → α·aβ, T, and a is the current input symbol, then shift
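
The three rules amount to a lookup on the top item set and the current input symbol. The sketch below encodes an LR(1) item as a hypothetical `(head, body, dot, lookahead)` tuple and applies the rules to two hand-written item sets for an assumed grammar S → A, A → Aa | b; neither the grammar nor the item sets come from the slides.

```python
def action(item_set, symbol, start="S"):
    """Decide the move for the top item set and the current input symbol."""
    for head, body, dot, lookahead in item_set:
        if dot == len(body) and symbol in lookahead:
            if head == start:
                return ("accept", head, body)    # rule 2: S → α·, {$}
            return ("reduce", head, body)        # rule 1: A → α·, lookahead match
    for head, body, dot, lookahead in item_set:
        if dot < len(body) and body[dot] == symbol:
            return ("shift",)                    # rule 3: A → α·aβ, T
    return ("error",)


# Item set after reading A:  S → A·, {$}   and   A → A·a, {a, $}
after_A = [
    ("S", ("A",), 1, frozenset({"$"})),
    ("A", ("A", "a"), 1, frozenset({"a", "$"})),
]
# Item set after reading b:  A → b·, {a, $}
after_b = [("A", ("b",), 1, frozenset({"a", "$"}))]

print(action(after_A, "$"))  # ('accept', 'S', ('A',))
print(action(after_A, "a"))  # ('shift',)
print(action(after_b, "a"))  # ('reduce', 'A', ('b',))
```

Trying the reduce rules before the shift rule is harmless for an LR(1) grammar, since its definition rules out a state in which both would fire on the same symbol.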
Regarding the Rules
• The LR(1) conditions guarantee that at most one of the rules applies for any input symbol or \$
• Often for practicality the information is summarized into a table
• Rows: sets of items
• Columns: terminals and \$