LL Parsing


Presentation Transcript


1. 1 LL(*) Parsing. Terence Parr, University of San Francisco

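2. 2 Topics: Research goals; Background; Problem definition; Solution overview; What is LL(*)?; How much more powerful is it?; Limitations; Nondeterminism detection; LL(*) Algorithm; Generated code.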

3. 3 Research goals. Make top-down LL-based parsers as powerful as possible: this allows more natural grammars and makes language tools more accessible. My research is constrained by what programmers can and will use: recursive-descent parsers must be the base; k>1 fixed lookahead; semantic predicates; syntactic predicates (controlled backtracking and a means of specifying ambiguity resolution). And for my next trick… LL(*).

4. 4 Background: parsers. Building a parser generator is easy except for the lookahead analysis: rule ref → "rule()"; token ref → "match(t)"; rule def → void rule() { if ( lookahead-expr-alt 1 ) { match alt 1; } else if ( lookahead-expr-alt 2 ) { match alt 2; } else error; }. The nature of the lookahead expressions dictates the strength of your parser generator.
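The rule-to-method mapping on this slide can be sketched in Java. This is a minimal illustration, not generated ANTLR output: the grammar (stat : ID ASSIGN INT | PRINT ID), the token type constants, and the class name RecursiveDescentSketch are all assumptions made up for the example. Each alternative is guarded by an LL(1) lookahead expression.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical grammar, chosen only for illustration:
//   stat : ID ASSIGN INT   (alt 1)
//        | PRINT ID        (alt 2)
class RecursiveDescentSketch {
    static final int ID = 1, ASSIGN = 2, INT = 3, PRINT = 4, EOF = -1;

    private final List<Integer> tokens;
    private int p = 0; // index of the current token

    RecursiveDescentSketch(List<Integer> tokens) { this.tokens = tokens; }

    // LA(i): type of the token i symbols ahead (1-based); EOF past the end
    int LA(int i) {
        int idx = p + i - 1;
        return idx < tokens.size() ? tokens.get(idx) : EOF;
    }

    // token ref -> match(t)
    void match(int t) {
        if (LA(1) != t) {
            throw new IllegalStateException("expected " + t + " but found " + LA(1));
        }
        p++;
    }

    // rule def -> one method; returns the alternative that was taken
    int stat() {
        if (LA(1) == ID) {            // lookahead-expr for alt 1
            match(ID); match(ASSIGN); match(INT);
            return 1;
        } else if (LA(1) == PRINT) {  // lookahead-expr for alt 2
            match(PRINT); match(ID);
            return 2;
        } else {
            throw new IllegalStateException("no viable alternative at " + LA(1));
        }
    }

    static int parse(Integer... toks) {
        return new RecursiveDescentSketch(Arrays.asList(toks)).stat();
    }

    public static void main(String[] args) {
        System.out.println(parse(ID, ASSIGN, INT));
        System.out.println(parse(PRINT, ID));
    }
}
```

Only the two if-conditions would change with a stronger lookahead analysis; the matching code inside each branch stays the same.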

5. 5 LL(2) parser example

6. 6 Lookahead as DFA

7. 7 Linear approximate lookahead. Note that LA(1) doesn't help distinguish. Often it's the depth, not the sequence of tokens, that matters. Reduces O(|T|^k) to O(|T| x k) space for lookahead sequences. Collapse all tokens at depth d<=k into k sets. Only slightly weaker than LL(k).
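A small sketch of what collapsing sequences into per-depth sets means. The class name LinearApproxSketch, its method names, and the sample sequences {A B, C D} are assumptions for illustration only. The approximation keeps one token set per depth, so it also accepts the mixed sequences A D and C B that full LL(2) would reject, which is where the slight loss of strength comes from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class LinearApproxSketch {
    // Collapse full lookahead sequences into k sets, one per depth.
    static List<Set<String>> collapse(List<List<String>> sequences, int k) {
        List<Set<String>> sets = new ArrayList<>();
        for (int d = 0; d < k; d++) sets.add(new TreeSet<>());
        for (List<String> seq : sequences) {
            for (int d = 0; d < k && d < seq.size(); d++) {
                sets.get(d).add(seq.get(d));
            }
        }
        return sets;
    }

    // Full LL(k) test: the input must equal one of the stored sequences.
    static boolean exactMatch(List<List<String>> sequences, List<String> input) {
        return sequences.contains(input);
    }

    // Approximate test: the token at each depth must be in that depth's set.
    static boolean approxMatch(List<Set<String>> sets, List<String> input) {
        for (int d = 0; d < sets.size(); d++) {
            if (d >= input.size() || !sets.get(d).contains(input.get(d))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<String>> seqs = Arrays.asList(
            Arrays.asList("A", "B"), Arrays.asList("C", "D"));
        List<Set<String>> sets = collapse(seqs, 2);
        List<String> mixed = Arrays.asList("A", "D");
        System.out.println("exact accepts A D:  " + exactMatch(seqs, mixed));
        System.out.println("approx accepts A D: " + approxMatch(sets, mixed));
    }
}
```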

8. 8 Problem: what can't LL(k) do? Can't see past arbitrarily long constructs from the left edge. For example, can't see past A+ here: Could left-factor, but that is not always possible and it's unnatural!

9. 9 Solution overview. Natural extension to LL(k) lookahead DFA: allow a cyclic DFA that can skip ahead past the As to X or Y. Don't approximate the entire CFG with a regex; i.e., don't include R or S. Just predict and proceed normally with the LL parse. The DFA yields the predicted alt number. Grammar actions are not sucked into DFAs and aren't executed during prediction.

10. 10 LL(*) code. Arbitrary cyclic graphs can't be encoded w/o gotos in Java, but here a simple while is ok.
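The slide's code is not preserved in this transcript, so the following is only a sketch of the idea. It assumes a decision of the form a : A+ X | A+ Y (inferred from the earlier mentions of A+, X and Y); the class name CyclicPredictSketch and the token constants are hypothetical. The single cycle in the DFA becomes a plain while loop that scans past the As without consuming any input.

```java
class CyclicPredictSketch {
    static final int A = 1, X = 2, Y = 3;

    // Returns the predicted alternative (1 or 2), or 0 if neither is viable.
    // Prediction only inspects tokens from 'start' onward; nothing is consumed.
    static int predict(int[] tokens, int start) {
        int i = start;
        if (i >= tokens.length || tokens[i] != A) return 0; // A+ needs at least one A
        while (i < tokens.length && tokens[i] == A) {       // cyclic DFA state: loop on A
            i++;
        }
        if (i < tokens.length && tokens[i] == X) return 1;  // accept state for alt 1
        if (i < tokens.length && tokens[i] == Y) return 2;  // accept state for alt 2
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(predict(new int[] {A, A, A, X}, 0));
        System.out.println(predict(new int[] {A, Y}, 0));
    }
}
```

After predict returns, the parser would call the method for the chosen alternative and match the As normally, so grammar actions attached to A run only once, during the real parse.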

11. 11 Isn't that just backtracking? No. For example, if I can guarantee you will never look ahead more than 10 symbols, it's just LL(10), right? Not backtracking with the parser! The DFA is smaller and faster; e.g., the DFA predicting expr does not follow a deep call chain; the parser does. Don't have to avoid or unroll actions in the grammar! The DFAs are efficiently coded and automatically throttle down when less lookahead is needed.

12. 12 Do we need LL(*) in practice? Natural grammars are sometimes not LL(k); e.g., C function decl vs def: From the left edge, the lookahead needed to see the ';' vs '{' is not fixed. We need arbitrary lookahead because of the arg*. If you have actions at ID, you can't easily refactor. Lookahead will usually be 5<=k<=10 for this decision.

13. 13 Can we classify LL(*) strength? Obviously stronger than LL(k) for fixed k. Weaker than syntactic predicates + LL(k), but it's automatic and faster. ANTLR v3 will have LL(*) + syntactic predicates :) What about LL(k)'s traditional foe LR(k) and its nefarious minion LALR(1) (yacc)? No strict ordering! (see next slide) Weaker than GLR or any other system that handles all context-free grammars.

14. 14 LL(*) vs LR(k). LR(k), even with k=1, is generally more powerful than LL(*), or at least more efficient for the same grammar, but there is no strict ordering: add epsilon rule refs to the left edge of our grammar and it's not LR(k) for any fixed k (such references arise from adding actions).

15. 15 LL(*) Strength Limitations. Limited to regular approximation: it creates a regular covering approximation to the lookahead language of a context-free grammar fragment. Can't distinguish between context-free fragments. Can't see past recursive structures. Still deterministic; can't deal with ambiguous grammars; must pick one interpretation.

16. 16 Can't see past recursion. LL(*) DFA construction takes the LL stack into consideration, but the resulting DFA will not have a stack; it uses a sequence instead. Example weakness: (same language, different grammar)

17. 17 LL(*) Static Analysis Problems. Sometimes LL(*) creates a giant DFA looking for more lookahead to distinguish alternatives: this is most often due to true ambiguity; it won't ever succeed, but it keeps trying; w/o a throttle it would be hideous in time/space. Workarounds: can manually set fixed k lookahead; refactor the grammar if ambiguous or to reduce lookahead requirements. The algorithm's O(…) constant is critical; got java.g processing to drop from 20 minutes to 10s.

18. 18 LL(*) Analysis Benefits. LL(*) analysis and the resulting prediction DFAs are, paradoxically, sometimes simpler. LL(k) must compute all possible sequences of fixed length k using an acyclic DFA. LL(3) lookahead of (A|B)* is {AAA,AAB,ABA,ABB,BAA,BAB,BBA,BBB}. LL(*) lookahead of (A|B)* is simply: (diagram not preserved in this transcript)
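The LL(3) set quoted on this slide can be reproduced by brute-force enumeration, which makes the |T|^k growth concrete. This is an illustrative sketch only; the class and method names are made up, and a real analysis derives sequences from the grammar rather than from the bare vocabulary.

```java
import java.util.ArrayList;
import java.util.List;

class LookaheadEnumSketch {
    // Enumerate every sequence of exactly k tokens over the given vocabulary.
    static List<String> sequences(String[] vocab, int k) {
        List<String> out = new ArrayList<>();
        out.add("");
        for (int depth = 0; depth < k; depth++) {
            List<String> next = new ArrayList<>();
            for (String prefix : out) {
                for (String t : vocab) {
                    next.add(prefix + t);
                }
            }
            out = next;
        }
        return out;
    }

    public static void main(String[] args) {
        String[] vocab = {"A", "B"};
        for (int k = 1; k <= 5; k++) {
            // The count doubles with each extra symbol of lookahead.
            System.out.println("k=" + k + ": " + sequences(vocab, k).size() + " sequences");
        }
    }
}
```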

19. 19 LL(*) Algorithm Outline. Build an RTN-like NFA from the grammar (similar to LR machine construction, actually). Apply a modified classical NFA-to-DFA conversion ("subset construction algorithm"). A DFA state encodes the configurations the NFA could be in after having seen an input sequence, including the call invocation stack. An NFA configuration (s|alt|context) tracks the state, the predicted alt, and the rule invocation stack used to get to that state. Terminate the algorithm when a state uniquely predicts an alternative or a nondeterminism is found: (s|i|ctx) and (s|j|ctx) for the same state s but different alts i,j and same/similar context. Verify the DFA is reduced and all alternatives have a predict state.
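The two termination tests in this outline can be sketched over a set of (s|alt|context) configurations. This is not ANTLR's implementation: the class ConfigSketch, its Config type, and the method names are hypothetical, and the slide's "same/similar context" is simplified here to exact equality of the invocation stacks, which is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ConfigSketch {
    // One NFA configuration (s|alt|context).
    static final class Config {
        final int state;             // NFA state number s
        final int alt;               // alternative this path predicts
        final List<Integer> context; // rule invocation stack (return states)

        Config(int state, int alt, Integer... context) {
            this.state = state;
            this.alt = alt;
            this.context = Arrays.asList(context);
        }
    }

    // If every configuration predicts the same alt, return it; otherwise -1.
    static int uniqueAlt(Collection<Config> dfaState) {
        int alt = -1;
        for (Config c : dfaState) {
            if (alt == -1) alt = c.alt;
            else if (alt != c.alt) return -1;
        }
        return alt;
    }

    // Alts that collide: same NFA state and same context, different alt.
    // (Assumption: exact context equality stands in for "same/similar".)
    static Set<Integer> conflictingAlts(Collection<Config> dfaState) {
        Set<Integer> alts = new TreeSet<>();
        List<Config> list = new ArrayList<>(dfaState);
        for (int i = 0; i < list.size(); i++) {
            for (int j = i + 1; j < list.size(); j++) {
                Config a = list.get(i), b = list.get(j);
                if (a.state == b.state && a.alt != b.alt && a.context.equals(b.context)) {
                    alts.add(a.alt);
                    alts.add(b.alt);
                }
            }
        }
        return alts;
    }

    public static void main(String[] args) {
        List<Config> done = Arrays.asList(new Config(3, 1), new Config(4, 1));
        List<Config> clash = Arrays.asList(new Config(5, 1, 7), new Config(5, 2, 7));
        System.out.println("unique alt: " + uniqueAlt(done));
        System.out.println("conflicting alts: " + conflictingAlts(clash));
    }
}
```

A DFA state for which uniqueAlt returns a positive number can become a predict state; one with a non-empty conflict set would be reported as a nondeterminism instead of being expanded further.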

20. 20 Example difference from classical conversion

21. 21 Generated Code. An acyclic DFA is generated inline as above; a cyclic DFA is dumped as state objects and walked at parse time with int predict(IntStream input, State start).
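A sketch of walking state objects at parse time. Only the shape of predict(input, start) comes from the slide; everything else is assumed for illustration: the TokenInput interface is a made-up stand-in for the IntStream type named above (whose real methods are not shown in this transcript), and the State class, ArrayInput, and the sample DFA for a decision like A+ X | A+ Y are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class DfaWalkSketch {
    static final int A = 1, X = 2, Y = 3, EOF = -1;

    // Hypothetical stand-in for the input stream type named on the slide.
    interface TokenInput {
        int LA(int i);        // token type i symbols ahead (1-based)
        void consume();       // advance one token
        int index();          // current position
        void seek(int index); // reset position
    }

    static final class ArrayInput implements TokenInput {
        private final int[] tokens;
        private int p = 0;
        ArrayInput(int[] tokens) { this.tokens = tokens; }
        public int LA(int i) {
            int idx = p + i - 1;
            return idx < tokens.length ? tokens[idx] : EOF;
        }
        public void consume() { p++; }
        public int index() { return p; }
        public void seek(int index) { p = index; }
    }

    static final class State {
        final int predictedAlt; // > 0 marks an accept state
        final Map<Integer, State> edges = new HashMap<>();
        State(int predictedAlt) { this.predictedAlt = predictedAlt; }
    }

    // Cyclic DFA: s0 -A-> s1, s1 -A-> s1, s1 -X-> alt 1, s1 -Y-> alt 2
    static State buildDfa() {
        State s0 = new State(0), s1 = new State(0);
        s0.edges.put(A, s1);
        s1.edges.put(A, s1);
        s1.edges.put(X, new State(1));
        s1.edges.put(Y, new State(2));
        return s0;
    }

    // Walk the DFA, then rewind so the parser sees the input untouched.
    static int predict(TokenInput input, State start) {
        int mark = input.index();
        try {
            State s = start;
            while (s != null && s.predictedAlt == 0) {
                s = s.edges.get(input.LA(1)); // null when no edge matches
                input.consume();
            }
            return s == null ? 0 : s.predictedAlt;
        } finally {
            input.seek(mark);
        }
    }

    public static void main(String[] args) {
        System.out.println(predict(new ArrayInput(new int[] {A, A, X}), buildDfa()));
    }
}
```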

22. 22 Summary and Conclusions. LL(*) + syntactic predicates is the most powerful parsing strategy that is accessible and attractive to the average programmer. LL(*) has all the benefits of LL but is much stronger; it results in natural grammars. It doesn't alter the recursive-descent parser itself at all; it just enhances the predictive capabilities. The basic algorithm is not that complicated, but making it real and useful is "interesting"; it has taken 2.5 years to fully understand. Pre-release: http://www.antlr.org/download/
