LL Parsing

**1.** LL(*) Parsing
Terence Parr
University of San Francisco

**2.**

**3.** Research goals
Make top-down LL-based parsers as powerful as possible:
allows more natural grammars
makes language tools more accessible
My research is constrained by what programmers can/will use:
recursive-descent parsers must be the base
k>1 fixed lookahead
semantic predicates
syntactic predicates: controlled backtracking and a means of specifying ambiguity resolution
And for my next trick… LL(*)

**4.** Background: parsers
Building a parser generator is easy except for the lookahead analysis:
rule ref → “rule()”
token ref → “match(t)”
rule def → void rule() { if ( lookahead-expr-alt-1 ) { match alt 1; } else if ( lookahead-expr-alt-2 ) { match alt 2; } else error; }
The nature of the lookahead expressions dictates the strength of your parser generator.
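Here is a minimal Java sketch of that mapping for a hypothetical rule stat : PRINT ID SEMI | ID ASSIGN ID SEMI ;. The token names, the int[] token input, and the helper methods are illustrative assumptions, not ANTLR output. Each token reference becomes match(t). The rule becomes a method whose if/else chain holds the lookahead expressions, and here LA(1) alone decides.

```java
// Hypothetical grammar:  stat : PRINT ID SEMI | ID ASSIGN ID SEMI ;
class StatParser {
    static final int PRINT = 1, ID = 2, ASSIGN = 3, SEMI = 4, EOF = -1;

    private final int[] tokens; // token types from some lexer (assumed)
    private int p = 0;          // index of the current token

    StatParser(int... tokens) { this.tokens = tokens; }

    /** LA(i): type of the i-th lookahead token (1-based); EOF past the end. */
    int LA(int i) {
        int idx = p + i - 1;
        return idx < tokens.length ? tokens[idx] : EOF;
    }

    /** token ref -> match(t): consume t or report a syntax error. */
    void match(int t) {
        if (LA(1) != t) throw new IllegalStateException("expected " + t + ", found " + LA(1));
        p++;
    }

    /** rule def -> method; the if/else chain holds the lookahead expressions. */
    void stat() {
        if (LA(1) == PRINT) {            // lookahead expression for alt 1
            match(PRINT); match(ID); match(SEMI);
        } else if (LA(1) == ID) {        // lookahead expression for alt 2
            match(ID); match(ASSIGN); match(ID); match(SEMI);
        } else {
            throw new IllegalStateException("no viable alternative at token " + LA(1));
        }
    }

    boolean atEnd() { return LA(1) == EOF; }
}
```

The alternatives differ in their first token, so this decision is LL(1). Everything interesting about a parser generator lies in how those if-conditions are computed when one token is not enough.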

**5.** LL(2) parser example
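The slide's own example was an image and is not recoverable. As a stand-in, here is a sketch of an LL(2) decision for a hypothetical rule in which both alternatives start with ID: an assignment x = y ; versus a call f ( ) ;. Because LA(1) cannot decide, the lookahead expressions also test LA(2). The token names and helper methods are assumptions.

```java
// Hypothetical grammar (LL(2) but not LL(1): both alternatives start with ID)
//   stat : ID ASSIGN ID SEMI        // assignment, e.g.  x = y ;
//        | ID LPAREN RPAREN SEMI    // call, e.g.        f ( ) ;
//        ;
class Ll2Parser {
    static final int ID = 1, ASSIGN = 2, LPAREN = 3, RPAREN = 4, SEMI = 5, EOF = -1;

    private final int[] tokens;
    private int p = 0;

    Ll2Parser(int... tokens) { this.tokens = tokens; }

    int LA(int i) {
        int idx = p + i - 1;
        return idx < tokens.length ? tokens[idx] : EOF;
    }

    void match(int t) {
        if (LA(1) != t) throw new IllegalStateException("expected " + t + ", found " + LA(1));
        p++;
    }

    /** Parses one statement and returns which alternative matched (1 = assignment, 2 = call). */
    int stat() {
        // LA(1) is ID in both alternatives, so each test must also inspect LA(2)
        if (LA(1) == ID && LA(2) == ASSIGN) {
            match(ID); match(ASSIGN); match(ID); match(SEMI);
            return 1;
        } else if (LA(1) == ID && LA(2) == LPAREN) {
            match(ID); match(LPAREN); match(RPAREN); match(SEMI);
            return 2;
        }
        throw new IllegalStateException("no viable alternative");
    }
}
```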

**6.** Lookahead as DFA
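A fixed-lookahead decision can also be written as an explicit DFA. The sketch below assumes a hypothetical rule stat : ID ASSIGN … | ID LPAREN … ;. State 0 expects ID, state 1 examines LA(2), and each accepting edge yields an alternative number. The DFA is acyclic because LL(k) lookahead has a fixed depth. It only peeks at tokens and never consumes them.

```java
// Acyclic lookahead DFA for the hypothetical decision
//   stat : ID ASSIGN ... | ID LPAREN ... ;
// s0 --ID--> s1;  s1 --ASSIGN--> predict alt 1;  s1 --LPAREN--> predict alt 2
class Ll2Dfa {
    static final int ID = 1, ASSIGN = 2, LPAREN = 3, EOF = -1;

    /** Returns the predicted alternative (1 or 2), or 0 if none is viable; tokens are not consumed. */
    static int predict(int[] tokens, int p) {
        int state = 0, i = p;
        while (true) {
            int t = i < tokens.length ? tokens[i] : EOF;
            switch (state) {
                case 0:                       // s0: no lookahead examined yet
                    if (t != ID) return 0;
                    state = 1;
                    i++;
                    break;
                case 1:                       // s1: saw ID, so LA(2) decides
                    if (t == ASSIGN) return 1;
                    if (t == LPAREN) return 2;
                    return 0;
                default:
                    throw new IllegalStateException("unknown state " + state);
            }
        }
    }
}
```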

**7.** Linear approximate lookahead
Note that LA(1) doesn’t help distinguish the alternatives here
Often it’s the depth, not the sequence, of tokens that matters
Reduces the space for lookahead sequences from O(|T|^k) to O(|T| × k), where |T| is the number of token types
Collapse all tokens at each depth d ≤ k into one set, giving k sets
Only slightly weaker than LL(k)
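Here is a small sketch of the idea. It assumes each alternative's lookahead is given as a set of k-token sequences, with placeholder token names. Each alternative's sequences are collapsed into k per-depth sets. Two alternatives are treated as conflicting only if their sets overlap at every depth, because that is exactly when the two cross products share a sequence.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Linear approximate lookahead: keep one token set per depth d = 1..k
// instead of every k-token sequence. An alternative is approximated by
// the cross product of its per-depth sets.
class LinearApprox {
    /** Collapses a set of k-token sequences into k per-depth token sets. */
    static List<Set<String>> collapse(Set<List<String>> sequences, int k) {
        List<Set<String>> depths = new ArrayList<>();
        for (int d = 0; d < k; d++) depths.add(new HashSet<>());
        for (List<String> seq : sequences) {
            for (int d = 0; d < k; d++) depths.get(d).add(seq.get(d));
        }
        return depths;
    }

    /** Full LL(k) test: do the two alternatives share any k-sequence? */
    static boolean fullConflict(Set<List<String>> alt1, Set<List<String>> alt2) {
        return !Collections.disjoint(alt1, alt2);
    }

    /** Approximate test: the cross products overlap iff the sets intersect at every depth. */
    static boolean approxConflict(List<Set<String>> a1, List<Set<String>> a2) {
        for (int d = 0; d < a1.size(); d++) {
            if (Collections.disjoint(a1.get(d), a2.get(d))) return false; // depth d separates them
        }
        return true;
    }
}
```

The weakness shows up when the order of tokens matters. Take {AB, BA} and {AA, BB}. As LL(2) sequences they are disjoint, so fullConflict is false. Both, however, collapse to {A, B} at each depth, so approxConflict reports a conflict.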

**8.** Problem: what can’t LL(k) do?
It can’t see past arbitrarily long constructs from the left edge
For example, it can’t see past A+ here:
You could left-factor, but that’s not always possible, and it’s unnatural!

**9.** Solution overview
Natural extension to the LL(k) lookahead DFA: allow a cyclic DFA that can skip ahead past the As to X or Y
Don’t approximate the entire CFG with a regex; i.e., don’t include R or S
Just predict, then proceed normally with the LL parse
The DFA yields the predicted alternative number
Grammar actions are not sucked into DFAs and aren’t executed during prediction

**10.** LL(*) code
Arbitrary cyclic graphs can’t be encoded without gotos in Java, but here a simple while loop is enough
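Here is a sketch of what such a while-loop predictor might look like. It assumes the decision has the shape a : A+ X … | A+ Y … ;, as on the preceding slides. The token codes and method names are hypothetical. The DFA's only cycle is a self-loop on A, and it becomes the while loop. The first token after the As selects the alternative. Nothing is consumed, so the parser then proceeds normally.

```java
// LL(*) prediction for the hypothetical decision
//   a : A+ X ... | A+ Y ... ;
// No fixed k suffices because the run of As is unbounded. The cyclic DFA is
//   s0 --A--> s1,  s1 --A--> s1,  s1 --X--> alt 1,  s1 --Y--> alt 2
class LLStarPredictor {
    static final int A = 1, X = 2, Y = 3, EOF = -1;

    /** Predicts alt 1 or 2 by scanning (not consuming) tokens from index p; 0 = no viable alt. */
    static int predictA(int[] tokens, int p) {
        int i = p;
        if (la(tokens, i) != A) return 0;    // s0: at least one A is required
        i++;
        while (la(tokens, i) == A) i++;      // s1: self-loop over the remaining As
        switch (la(tokens, i)) {             // first non-A token picks the alternative
            case X: return 1;
            case Y: return 2;
            default: return 0;
        }
    }

    private static int la(int[] tokens, int i) {
        return i < tokens.length ? tokens[i] : EOF;
    }
}
```

For input with one A, prediction looks at two tokens. For a long run of As, it looks at as many tokens as needed and no more.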

**11.** Isn’t that just backtracking?
No. For example, if I can guarantee you will never look ahead more than 10 symbols, it’s just LL(10), right?
It doesn’t backtrack with the parser! The DFA is smaller and faster; e.g., the DFA predicting expr does not follow the deep call chain; the parser does
You don’t have to avoid or unroll actions in the grammar!
The DFAs are efficiently coded and automatically throttle down when less lookahead is needed

**12.** Do we need LL(*) in practice?
Natural grammars are sometimes not LL(k); e.g., C function declaration vs. definition:
From the left edge, the lookahead needed to see ‘;’ vs. ‘{’ is not fixed; we need arbitrary lookahead because of the arg*
If you have actions at ID, you can’t easily refactor
Lookahead will usually be 5 ≤ k ≤ 10 for this decision

**13.** Can we classify LL(*) strength?
Obviously stronger than LL(k) for fixed k
Weaker than syntactic predicates + LL(k), but it’s automatic and faster
ANTLR v3 will have LL(*) + syntactic predicates :)
What about LL(k)’s traditional foe LR(k) and its nefarious minion LALR(1) (yacc)?
No strict ordering! (see next slide)
Weaker than GLR or any other system that handles all context-free grammars

**14.** LL(*) vs LR(k)
LR(k), even with k=1, is generally more powerful than LL(*), or at least more efficient for the same grammar. There is no strict ordering, however: add epsilon rule references to the left edge of our grammar (as happens when actions are added there) and it is no longer LR(k) for any fixed k

**15.** LL(*) Strength Limitations
Limited to a regular approximation:
creates a regular covering approximation to the lookahead language of a context-free grammar fragment
can’t distinguish between context-free fragments
can’t see past recursive structures
Still deterministic; can’t deal with ambiguous grammars; must pick one interpretation

**16.** Can’t see past recursion
LL(*) DFA construction takes the LL stack into consideration, but the resulting DFA has no stack; it uses the token sequence instead
Example weakness (same language, different grammar):

**17.** LL(*) Static Analysis Problems
Sometimes LL(*) creates a giant DFA while looking for more lookahead to distinguish alternatives
most often due to true ambiguity
it won’t ever succeed, but it keeps trying
without a throttle, this would be hideous in time and space
Workarounds
manually set a fixed k lookahead
refactor the grammar if it is ambiguous, or to reduce lookahead requirements
The constant in the algorithm’s O(…) is critical; reducing it dropped java.g processing time from 20 minutes to 10 s

**18.** LL(*) Analysis Benefits
LL(*) analysis and the resulting prediction DFAs are sometimes, paradoxically, simpler
LL(k) must compute all possible sequences of fixed length k using an acyclic DFA
LL(3) lookahead of (A|B)* is {AAA,AAB,ABA,ABB,BAA,BAB,BBA,BBB}
LL(*) lookahead of (A|B)* is simply:
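The following small sketch makes the growth on this slide concrete. It enumerates fixed-length lookahead sequences the way an LL(k) analysis must for (A|B)*. For k = 3 it produces the eight sequences listed above, in the same order, and in general it produces 2^k of them. A cyclic DFA, by contrast, covers every length with a single looping state. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Fixed-k lookahead for (A|B)*: every length-k string over {A, B}.
// The LL(*) DFA replaces this whole set with one state that loops on A and B.
class LookaheadSets {
    /** All length-k sequences over the alphabet, built one depth at a time. */
    static List<String> sequences(List<String> alphabet, int k) {
        List<String> result = new ArrayList<>();
        result.add("");                       // depth 0: the empty prefix
        for (int d = 0; d < k; d++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String t : alphabet) next.add(prefix + t);
            }
            result = next;                    // |result| doubles at each depth for {A, B}
        }
        return result;
    }
}
```

sequences(List.of("A", "B"), 3) returns [AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB], and k = 10 already gives 1024 sequences.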

**19.** LL(*) Algorithm Outline
Build an RTN-like NFA from the grammar (actually similar to LR machine construction)
Run a modified classical NFA-to-DFA conversion (the “subset construction algorithm”):
each DFA state encodes the configurations the NFA could be in after having seen an input sequence, including the rule invocation stack
an NFA configuration (s|alt|context) tracks the NFA state, the predicted alternative, and the rule invocation stack used to reach that state
terminate when a DFA state uniquely predicts an alternative or when nondeterminism is found: configurations (s|i|ctx) and (s|j|ctx) with the same state s and the same or similar context but different alternatives i and j
Verify that the DFA is reduced and that every alternative has a predict state
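The code below is a much-simplified Java sketch of this outline, not ANTLR's implementation. Configurations are (state, alt) pairs with no rule-invocation context, the NFA is built by hand, and the "reduced DFA" verification is omitted. It does show the two changes to classical subset construction described above. A DFA state stops expanding once all of its configurations predict one alternative. The same NFA state reached under two different alternatives is reported as nondeterminism. Records require Java 16 or later.

```java
import java.util.*;

// Simplified LL(*)-style subset construction (assumption: no context stack,
// single-rule NFA). DFA states are sets of (NFA state, predicted alt) configs.
class SubsetConstruction {
    record Config(int state, int alt) {}

    // NFA edges: from -> (label -> targets); the label "" means epsilon.
    private final Map<Integer, Map<String, List<Integer>>> edges = new HashMap<>();
    final List<Set<Config>> states = new ArrayList<>();         // DFA states
    final List<Map<String, Integer>> trans = new ArrayList<>(); // DFA edges
    final Map<Integer, Integer> accept = new HashMap<>();       // DFA state -> predicted alt

    void edge(int from, String label, int to) {
        edges.computeIfAbsent(from, s -> new HashMap<>())
             .computeIfAbsent(label, l -> new ArrayList<>()).add(to);
    }

    private List<Integer> targets(int s, String label) {
        return edges.getOrDefault(s, Map.of()).getOrDefault(label, List.of());
    }

    /** Adds configs reachable via epsilon edges; each keeps its predicted alt. */
    private Set<Config> closure(Set<Config> configs) {
        Set<Config> result = new HashSet<>(configs);
        Deque<Config> work = new ArrayDeque<>(configs);
        while (!work.isEmpty()) {
            Config c = work.pop();
            for (int t : targets(c.state(), "")) {
                Config n = new Config(t, c.alt());
                if (result.add(n)) work.push(n);
            }
        }
        return result;
    }

    private int addState(Set<Config> configs) {
        states.add(configs);
        trans.add(new HashMap<>());
        return states.size() - 1;
    }

    /** Same NFA state under different alts means the decision is nondeterministic. */
    private static void checkNondeterminism(Set<Config> configs) {
        Map<Integer, Integer> altFor = new HashMap<>();
        for (Config c : configs) {
            Integer prev = altFor.putIfAbsent(c.state(), c.alt());
            if (prev != null && prev != c.alt())
                throw new IllegalStateException("nondeterminism at NFA state " + c.state());
        }
    }

    /** altStarts[i] is the NFA state where alternative i+1 begins. */
    void build(int[] altStarts, Set<String> vocabulary) {
        Set<Config> start = new HashSet<>();
        for (int i = 0; i < altStarts.length; i++) start.add(new Config(altStarts[i], i + 1));
        Deque<Integer> work = new ArrayDeque<>();
        work.push(addState(closure(start)));
        while (!work.isEmpty()) {
            int d = work.pop();
            Set<Config> configs = states.get(d);
            Set<Integer> alts = new HashSet<>();
            for (Config c : configs) alts.add(c.alt());
            if (alts.size() == 1) {                  // uniquely predicts: stop expanding
                accept.put(d, alts.iterator().next());
                continue;
            }
            checkNondeterminism(configs);
            for (String t : vocabulary) {            // classic move step, then closure
                Set<Config> moved = new HashSet<>();
                for (Config c : configs)
                    for (int target : targets(c.state(), t)) moved.add(new Config(target, c.alt()));
                if (moved.isEmpty()) continue;
                Set<Config> next = closure(moved);
                int idx = states.indexOf(next);
                if (idx < 0) { idx = addState(next); work.push(idx); }
                trans.get(d).put(t, idx);
            }
        }
    }

    /** Walks the built DFA over the lookahead; returns the predicted alt, or 0 if none. */
    int predict(List<String> lookahead) {
        int d = 0;
        for (String t : lookahead) {
            if (accept.containsKey(d)) return accept.get(d);
            Integer next = trans.get(d).get(t);
            if (next == null) return 0;
            d = next;
        }
        return accept.getOrDefault(d, 0);
    }
}
```

As an example, take a hand-built NFA for a : A+ X | A+ Y. Alternative 1 uses edges 1 -A-> 2, 2 -ε-> 1, and 2 -X-> 3. Alternative 2 uses 4 -A-> 5, 5 -ε-> 4, and 5 -Y-> 6. From this, build produces four DFA states, one of which loops on A, and predict(List.of("A", "A", "A", "Y")) returns 2.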

**20.** Example difference from classical conversion

**21.** Generated Code
Acyclic DFAs are generated inline, as above
Cyclic DFAs are dumped as state objects and walked at parse time with int predict(IntStream input, State start)
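Here is a sketch of the table-walking side built on hypothetical minimal types. IntStream is a stand-in that exposes only LA(i); it is not ANTLR's actual runtime interface. State holds edges keyed by token type, plus the alternative that an accept state predicts. predict follows edges on LA(1), LA(2), … until it reaches an accept state, so cycles in the graph need no special code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: LA(i) returns the i-th lookahead token type (1-based).
interface IntStream {
    int LA(int i);
}

// DFA state object: outgoing edges keyed by token type, and the predicted
// alternative if this is an accept state (0 otherwise).
final class State {
    final Map<Integer, State> edges = new HashMap<>();
    final int predictedAlt;

    State(int predictedAlt) { this.predictedAlt = predictedAlt; }

    State on(int tokenType, State target) { edges.put(tokenType, target); return this; }
}

class DfaWalker {
    /** Walks the DFA from start, peeking at LA(1), LA(2), ...; never consumes input. */
    static int predict(IntStream input, State start) {
        State s = start;
        int i = 1;
        while (s.predictedAlt == 0) {
            State next = s.edges.get(input.LA(i));
            if (next == null) return 0;       // no viable alternative
            s = next;
            i++;
        }
        return s.predictedAlt;
    }

    /** Adapter so an int[] of token types can serve as input; -1 plays EOF. */
    static IntStream of(int... tokens) {
        return i -> i - 1 < tokens.length ? tokens[i - 1] : -1;
    }
}
```

For example, with A = 1, X = 2, Y = 3, build s0 -A-> s1, s1 -A-> s1, s1 -X-> accept(1), and s1 -Y-> accept(2). Then predict(DfaWalker.of(1, 1, 1, 2), s0) returns 1.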

**22.** Summary and Conclusions
LL(*) + syntactic predicates is the most powerful parsing strategy that is accessible and attractive to the average programmer
LL(*) has all the benefits of LL but is much stronger, resulting in natural grammars
It doesn’t alter the recursive-descent parser itself at all; it just enhances its predictive capabilities
The basic algorithm is not that complicated, but making it real and useful is “interesting”; it has taken 2.5 years to fully understand
Pre-release: http://www.antlr.org/download/