
Analyzing Ambiguity of Context-Free Grammars

Claus Brabrand (brabrand(at)itu.dk), IT University of Copenhagen. Robert Giegerich (robert(at)TechFak.Uni-Bielefeld.de), University of Bielefeld, Germany. Anders Møller (amoeller(at)brics.dk), DAIMI, University of Aarhus.


Presentation Transcript


  1. Analyzing Ambiguity of Context-Free Grammars Claus Brabrand brabrand(at)itu.dk IT Uni. of Copenhagen Robert Giegerich robert(at)TechFak.Uni-Bielefeld.de University of Bielefeld, Germany Anders Møller amoeller(at)brics.dk DAIMI, University of Aarhus

  2. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

  3. Motivation (for CFG Ambiguity)
     1. Programming Languages
            STM  : EXP ";"
                 | "if" "(" EXP ")" STM
                 | "if" "(" EXP ")" STM "else" STM
                 | "while" "(" EXP ")" "do" STM
            EXP  : EXP "*" TERM | EXP "/" TERM | TERM
            TERM : TERM "+" FACT | TERM "-" FACT | FACT
            FACT : CONST | VAR
        Example program P: int f() { if (b) if (c) f(); else y++; }
        [Diagram: a computer scientist provides the programming language grammar G (a CFG); the program P is fed to a parser for G. Unambiguous G: the result is what the programmer intended. Ambiguous G: the result may be a different program P'.]
     2. Models of Real-World Physical Structures
            P : "(" P ")" | "(" O ")"
            O : L P | P R | S P S | H
            L : "." L | "."
            R : "." R | "."
            S : "." S | "."
            H : "." H | "." "." "."
        [Diagram: an engineer provides the physical structure model G (a CFG); a sequence (ACGAT…) is fed to a parser for G to predict a physical structure M, or a different structure M'. Labels on the slide: "Ambiguous", "Unambiguous", "beneficial...", "lethal...".]

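  4. Context-Free Grammar Ambiguity
     • Ambiguity: $\exists \omega \in \Sigma^*$ with multiple derivation trees? That is, there exist two distinct derivation trees $T \neq T'$, both rooted at the start nonterminal $s$, with the same yield $\omega$.
     • However: Undecidable! I.e., no one can decide the line between ambiguous and unambiguous grammars.
     • However^2…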

  5. However: Conservative Analysis!
     • Use conservative (over-)approximation:
       • "Yes!" means "G guaranteed unambiguous!"
       • Safely use any GLR parser on G ...and never get two parses at runtime!
     • ...just because it's undecidable doesn't mean there aren't (good) conservative approximations! Indeed, the whole area of static analysis works on "side-stepping undecidability".
     [Diagram: the space of grammars split into ambiguous and unambiguous; the grammar G lies in the "Yes!" region on the unambiguous side.]

  6. Conservative Analysis (cont'd)
     • Undecidability means: "there'll always be a slack"
     • However, still useful!
     • Possible interpretations of "Don't know?":
       • Treat as error (reject grammar): "Please redesign your grammar" (as in LR(k))
       • Treat as warning: "Here are some potential problems"
     [Diagram: the space of grammars split into ambiguous and unambiguous, with a "Don't know?" region in between.]

  7. Problems with Existing Solutions
     1. Hard to reason (locally) about ambiguity:
        • Intricate overall structural property of a grammar
     2. Are "left-to-right" (or "right-to-left") biased:
        • Cannot handle "palindromic grammars" (...a serious problem for RNA analysis)!
     3. Error messages:
        • Hard to "pin-point ambiguity" (in terms of grammar), e.g. "conflicts: 25 shift/reduce, 13 reduce/reduce"
        • Also: would like "shortest examples" for debugging (...especially for grammar non-experts)!

  8. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

  9. Characterization of Ambiguity
     • Theorem 1 (characterization): $G$ is vertically unambiguous $\wedge$ $G$ is horizontally unambiguous $\iff$ $G$ is unambiguous
     • Note:
       • Ambiguity fully characterized
       • Still undecidable (...of course)
       • The structural problem becomes a finite number of linguistic problems

  10. Terminology: Context-Free Grammar
      • $G = \langle N, \Sigma, s, \pi \rangle$, where:
        • $N$: finite set of nonterminals
        • $\Sigma$: finite set of terminals
        • $s \in N$: start nonterminal
        • $\pi : N \to \mathcal{P}(E^*)$: production function, with $E = N \cup \Sigma$
      • Assume (trivially):
        • Reachability (all $n \in N$ reachable from $s$)
        • Productivity (all $n \in N$ derive some string)
      • $L : E^* \to \mathcal{P}(\Sigma^*)$: "language-of" operator; the language of $G$ is $L(s)$
      • Example grammar: EXP : ID | EXP '+' EXP | EXP '*' EXP

  11. Vertical Unambiguity
      • "Vertical unambiguity" of $G$: $\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
      • Example ("xy"):
            S : 'x' Y | X 'y'
            Y : 'y'
            X : 'x'
        Vertically ambiguous string: xy
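To make the vertical condition concrete, here is a minimal Python sketch that searches for a witness string shared by two alternatives of the same nonterminal. It is not the analysis from the talk: it enumerates strings only up to a length bound, so it can find conflicts but can never prove unambiguity. The grammar encoding (a dict from nonterminal to a list of alternatives) and all function names are assumptions made for this illustration.

```python
# Bounded witness search for vertical ambiguity (illustrative sketch only).
# Grammar encoding (hypothetical): nonterminal -> list of alternatives,
# each alternative a list of symbols; symbols that are not keys are terminals.
GRAMMAR = {
    "S": [["x", "Y"], ["X", "y"]],
    "Y": [["y"]],
    "X": [["x"]],
}


def concat(parts, max_len):
    """Concatenate a sequence of string sets, dropping strings longer than max_len."""
    out = {""}
    for part in parts:
        out = {a + b for a in out for b in part if len(a) + len(b) <= max_len}
    return out


def seq_language(alt, lang, max_len):
    """Bounded language of a symbol sequence, given bounded nonterminal languages."""
    return concat([lang[s] if s in lang else {s} for s in alt], max_len)


def bounded_languages(grammar, max_len):
    """Least fixpoint of the grammar equations, restricted to strings of length <= max_len."""
    lang = {n: set() for n in grammar}
    changed = True
    while changed:
        changed = False
        for n, alts in grammar.items():
            for alt in alts:
                new = seq_language(alt, lang, max_len)
                if not new <= lang[n]:
                    lang[n] |= new
                    changed = True
    return lang


def vertical_conflicts(grammar, max_len=6):
    """List (nonterminal, i, j, shortest shared string) for each conflicting pair of alternatives."""
    lang = bounded_languages(grammar, max_len)
    conflicts = []
    for n, alts in grammar.items():
        for i in range(len(alts)):
            for j in range(i + 1, len(alts)):
                common = seq_language(alts[i], lang, max_len) & seq_language(alts[j], lang, max_len)
                if common:
                    conflicts.append((n, i, j, min(common, key=lambda s: (len(s), s))))
    return conflicts
```

On the slide's example grammar, the two alternatives of S share the string "xy", which is the vertically ambiguous string named above.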

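  12. Horizontal Unambiguity
      • "Horizontal unambiguity" of $G$: $\forall n \in N : \forall \alpha \in \pi(n) : \alpha = \alpha_l \alpha_r \Rightarrow \mathrm{overlap}(L(\alpha_l), L(\alpha_r)) = \emptyset$
      • where the "overlap" operator, $\mathrm{overlap} : \mathcal{P}(\Sigma^*) \times \mathcal{P}(\Sigma^*) \to \mathcal{P}(\Sigma^*)$, is given by: $\mathrm{overlap}(X, Y) := \{\, xay \mid x, y \in \Sigma^* \wedge a \in \Sigma^+ \wedge x, xa \in X \wedge y, ay \in Y \,\}$
      • Example ("xay"):
            S : X Y
            X : 'x' | 'x' 'a'
            Y : 'a' 'y' | 'y'
        Horizontally ambiguous string: xay (here x, xa $\in L(X)$ and y, ay $\in L(Y)$)
      [Diagram: the string x a y split between X and Y in two ways, as x · ay and as xa · y.]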
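The overlap of two languages can be computed directly when both are finite sets, which is enough to replay the slide's example. The sketch below is only an illustration under that finiteness assumption; the function name `overlap` mirrors the slide's term, and the set-of-strings representation is chosen for this example.

```python
def overlap(X, Y):
    """Overlap of two finite languages X and Y.

    Returns every string x+a+y where a is non-empty,
    both x and x+a are in X, and both y and a+y are in Y.
    """
    result = set()
    for xa in X:
        # Split xa into x + a with a non-empty (k < len(xa)).
        for k in range(len(xa)):
            x, a = xa[:k], xa[k:]
            if x not in X:
                continue
            for ay in Y:
                # ay must start with the same a, and the remainder y must be in Y.
                if ay.startswith(a) and ay[len(a):] in Y:
                    result.add(xa + ay[len(a):])
    return result
```

With L(X) = {"x", "xa"} and L(Y) = {"ay", "y"} as in the example, the only overlapping string is "xay"; for {"x"} and {"y"} the overlap is empty.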

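  13. Characterization of Ambiguity
      • Theorem 1 (characterization): $G$ is vertically unambiguous $\wedge$ $G$ is horizontally unambiguous $\iff$ $G$ is unambiguous
      • Lemma 1a ("$\Rightarrow$", aka. "soundness"): $G$ is vertically unambiguous $\wedge$ $G$ is horizontally unambiguous $\Rightarrow$ $G$ is unambiguous
      • Lemma 1b ("$\Leftarrow$", aka. "completeness"): $G$ is vertically unambiguous $\wedge$ $G$ is horizontally unambiguous $\Leftarrow$ $G$ is unambiguous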

  14. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

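  15. (Over-)Approximation ($A$)
      • (Over-)approximation $A$ of $L$: given $L : E^* \to \mathcal{P}(\Sigma^*)$, an approximation is a function $A : E^* \to \mathcal{P}(\Sigma^*)$ with $\forall \alpha \in E^* : L(\alpha) \subseteq A(\alpha)$
      • Approximated vertical unambiguity (of $G$ relative to $A$): $\forall n \in N : \forall \alpha, \alpha' \in \pi(n) : \alpha \neq \alpha' \Rightarrow A(\alpha) \cap A(\alpha') = \emptyset$
      • Approximated horizontal unambiguity (of $G$ relative to $A$): $\forall n \in N : \forall \alpha \in \pi(n) : \alpha = \alpha_l \alpha_r \Rightarrow \mathrm{overlap}(A(\alpha_l), A(\alpha_r)) = \emptyset$
      • $A$ is decidable when emptiness of "$\cap$" and of "overlap" is decidable (on co-dom($A$))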

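  16. Unambiguity Approximation
      • Proposition 2 (approximation soundness): $G$ is vertically unambiguous relative to $A$ $\wedge$ $G$ is horizontally unambiguous relative to $A$ $\Rightarrow$ $G$ is unambiguous
      • Proof: "Larger sets don't overlap $\Rightarrow$ smaller sets don't overlap" (contrapositively: "Smaller sets conflict $\Rightarrow$ larger sets conflict"):
        • $A(\alpha) \cap A(\alpha') = \emptyset \Rightarrow L(\alpha) \cap L(\alpha') = \emptyset$
        • $\mathrm{overlap}(A(\alpha_l), A(\alpha_r)) = \emptyset \Rightarrow \mathrm{overlap}(L(\alpha_l), L(\alpha_r)) = \emptyset$
        • So unambiguity relative to $A$ implies vertical and horizontal unambiguity of $G$, and hence, by transitivity via Theorem 1, $G$ is unambiguous.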

  17. Compositionality (of $A$'s)
      • Proposition 3 (compositionality): $A$, $A'$ decidable (over-)approximations $\Rightarrow$ $A \cap A'$ decidable (over-)approximation
      • Proof: follows from definition [proof omitted]
      • Also: "approximations are locally(!) compositional"
      [Diagram: the ambiguous/unambiguous split shown three times, for $A$, for $A'$, and for $A \cap A'$.]

  18. Are there any Approximations!?!
      • YES! E.g., "the worst approximation": $A_{\Sigma^*}(\alpha) := \Sigma^*$ (everything; constant)
      • Almost useless: "Can only acquit totally trivial grammars as unambiguous", e.g. N : 'x'
      • ...but safe(!)

  19. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

  20. Regular Approximation ($A_{MN}$)!
      • $A_{MN}(\alpha)$ = [Mohri-Nederhof]$_G(\alpha)$
      • CFG $\to$ REG (DFA): (over-)approximation
      • Properties of this "black-box":
        • Good (over-)approximation!
        • Produces regular languages: almost everything is decidable (constructively, via automata)!
      • Note: works on the language level, $L(G)$, not on the structure level of the grammar, $G$
      • Reference: "Regular Approximation of Context-Free Grammars through Transformation" [Mohri-Nederhof, 2000]

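  21. Example: Odd/Even
      • Keeping track of parity (odd/even):
            Start : Even ;
                  | Odd ;
            Even  : "(" "(" Even ")" ")" ;
                  | ε ;
            Odd   : "(" "(" Odd ")" ")" ;
                  | "(" ")" ;
      • Exact languages: $L(\mathit{Even}) = \{\, (^{2n} )^{2n} \mid n \geq 0 \,\}$ and $L(\mathit{Odd}) = \{\, (^{2n+1} )^{2n+1} \mid n \geq 0 \,\}$
      • Approximations: $A(\mathit{Even}) = \{\, (^{2n} )^{2m} \mid n, m \geq 0 \,\}$ and $A(\mathit{Odd}) = \{\, (^{2n+1} )^{2m+1} \mid n, m \geq 0 \,\}$
      • The two approximations are disjoint, so the alternatives of Start cannot conflict vertically.
      • Result: unambiguous grammar!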
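As a quick sanity check of the Odd/Even example, the two approximated languages stated on the slide can be transcribed as regular expressions and compared on all short strings. This is only a bounded illustration (the tool itself decides such questions on automata); the names `A_EVEN`, `A_ODD` and `common_strings` are made up for this sketch.

```python
import re
from itertools import product

# Transcriptions of the slide's approximations as regular expressions.
A_EVEN = re.compile(r"(?:\(\()*(?:\)\))*")      # (^(2n) )^(2m), n, m >= 0
A_ODD = re.compile(r"\((?:\(\()*\)(?:\)\))*")   # (^(2n+1) )^(2m+1), n, m >= 0


def common_strings(max_len):
    """All strings over '(' and ')' of length <= max_len that lie in both approximations."""
    hits = []
    for n in range(max_len + 1):
        for chars in product("()", repeat=n):
            s = "".join(chars)
            if A_EVEN.fullmatch(s) and A_ODD.fullmatch(s):
                hits.append(s)
    return hits
```

No string up to the bound lies in both sets, since one side needs an even and the other an odd number of opening parentheses. The approximation is also visibly coarser than the exact language: "(())))" matches `A_EVEN` although it is not in L(Even).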

  22. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

  23. Assessment (implementation) • Java implementation: • 7,400 lines of code (command line + GUI) [ www.brics.dk/grammar/ ]

  24. Technology Transfer • Integrated in DotVocal's "Grammar Studio": Ambiguity analysis: Grammar Studio provides developers a powerful algorithm to test the vertical and horizontal ambiguities. Erasing any ambiguity in a grammar means to improve the effectiveness and by consequence the recognition too.

  25. Examples: Palindromes and "Anti-palindromes"
      • Palindromic examples:
            P : "a" P "a" ;
              | ;
        unambiguous grammar!
            P : "a" P "a" ;
              | "b" P "b" ;
              | "b" ;
              | "a" ;
              | ;
        unambiguous grammar!
            P : "a" P "a" ;
              | "a" ;
              | ;
        unambiguous grammar!
      • "Anti-palindromic" examples:
            R : "a" R "b" ;
              | "b" R "a" ;
              | "a" "b" ;
              | "b" "a" ;
        unambiguous grammar!
            R : "a" R "b" ;
              | "b" R "a" ;
              | ;
        unambiguous grammar!
      • Note: all are non-LR-Regular grammars!!

  26. ...inherent in RNA Analysis!!!
      • "Predicting behavior of genes"
      • "Complementary base pairs" ('G-C', 'A-U', and 'G-U'):
            R : 'G' R 'C' | 'C' R 'G' | 'A' R 'U' | 'U' R 'A' | 'G' R 'U' | 'U' R 'G' | ε

  27. Examples: RNA Analysis (G1)
      • Grammar G1:
            /* ambiguous */
            S[aa]    : "(" S ")" ;
             [aS]    | "." S ;
             [Sa]    | S "." ;
             [SS]    | S S ;
             [empty] | ;
      • Analysis output:
            %> java -jar Grambiguity.jar G1.cfg
            *** vertical ambiguity detected: 'S[aS]' vs. 'S[Sa]'
                ambiguous string: "."
            *** vertical ambiguity detected: 'S[aa]' vs. 'S[SS]'
                ambiguous string: "()"
            *** vertical ambiguity detected: 'S[aS]' vs. 'S[SS]'
                ambiguous string: "."
            *** vertical ambiguity detected: 'S[Sa]' vs. 'S[SS]'
                ambiguous string: "."
            *** vertical ambiguity detected: 'S[SS]' vs. 'S[empty]'
                ambiguous string: ""
            *** horizontal ambiguity detected: 'S[SS:0..0]' vs. 'S[SS:1..1]'
                ambiguous string: "."
            *** ambiguous grammar: 5 vertical ambiguities
                                   1 horizontal ambiguity

  28. Examples: RNA Analysis (G2)
      • Grammar G2:
            /* ambiguous */
            S[aPa]   : "(" P ")" ;
             [aS]    | "." S ;
             [Sa]    | S "." ;
             [SS]    | S S ;
             [empty] | ;
            P[aPa]   : "(" P ")" ;
             [S]     | S ;
      • Analysis output:
            *** vertical ambiguity detected: 'S[aS]' vs. 'S[Sa]'
                ambiguous string: "."
            *** vertical ambiguity detected: 'S[aPa]' vs. 'S[SS]'
                ambiguous string: "()"
            *** vertical ambiguity detected: 'S[aS]' vs. 'S[SS]'
                ambiguous string: "."
            *** vertical ambiguity detected: 'S[Sa]' vs. 'S[SS]'
                ambiguous string: "."
            *** vertical ambiguity detected: 'S[SS]' vs. 'S[empty]'
                ambiguous string: ""
            *** vertical ambiguity detected: 'P[aPa]' vs. 'P[S]'
                ambiguous string: "()"
            *** horizontal ambiguity detected: 'S[SS:0..0]' vs. 'S[SS:1..1]'
                ambiguous string: "."
            *** ambiguous grammar: 6 vertical ambiguities
                                   1 horizontal ambiguity

  29. Examples: RNA Analysis (G3-G6)
      • Grammar G3:
            S[aPa]   : "(" P ")" ;
             [aL]    | "." L ;
             [Ra]    | R "." ;
             [LS]    | L S ;
            L[aPa]   : "(" P ")" ;
             [aL]    | "." L ;
            R[Ra]    : R "." ;
             [empty] | ;
            P[aPa]   : "(" P ")" ;
             [aNa]   | "(" N ")" ;
            N[aL]    : "." L ;
             [Ra]    | R "." ;
             [LS]    | L S ;
      • Grammar G4:
            S[aS]    : "." S ;
             [T]     | T ;
             [empty] | ;
            T[Ta]    : T "." ;
             [aSa]   | "(" S ")" ;
             [TaSa]  | T "(" S ")" ;
      • Grammar G5:
            S[aS]    : "." S ;
             [aSaS]  | "(" S ")" S ;
             [empty] | ;
      • Grammar G6:
            S[LS]    : L S ;
             [L]     | L ;
            L[aFa]   : "(" F ")" ;
             [a]     | "." ;
            F[aFa]   : "(" F ")" ;
             [LS]    | L S ;
      • Result shown on the slide: unambiguous grammar!

  30. Examples: RNA Analysis (G7+G8)
      • Grammar G7:
            S[aPa]   : "(" P ")" ;
             [aL]    | "." L ;
             [Ra]    | R "." ;
             [LS]    | L S ;
            L[aPa]   : "(" P ")" ;
             [aL]    | "." L ;
            R[Ra]    : R "." ;
             [empty] | ;
            P[aPa]   : "(" P ")" ;
             [aNa]   | "(" N ")" ;
            N[aL]    : "." L ;
             [Ra]    | R "." ;
             [LS]    | L S ;
      • Grammar G8:
            S[aS]    : "." S ;
             [T]     | T ;
             [empty] | ;
            T[Ta]    : T "." ;
             [aPa]   | "(" P ")" ;
             [TaPa]  | T "(" P ")" ;
            P[aPa]   : "(" P ")" ;
             [aNa]   | "(" N ")" ;
            N[aS]    : "." S ;
             [Ta]    | T "." ;
             [TaPa]  | T "(" P ")" ;
      • Analysis output (the same message is shown for G7 and for G8):
            *** (potential) vertical ambiguity detected: 'P[aPa]' vs. 'P[aNa]'
                shortest ambiguous string: "(((.)"
            *** (potentially) ambiguous grammar: 1 (potential) vertical ambiguity
                                                 0 (potential) horizontal ambiguities
      • Note: these are all spurious errors due to imprecisions in the analysis
      • Acquitted as unambiguous using unfolding technique!

  31. Examples: "voss" & "voss-light"
      • Grammar:
            P : "(" P ")" ;      // P: Closed structure
              | "(" O ")" ;
            O : L P ;            // O: Open structure
              | P R ;
              | S P S ;
              | H ;
            L : "." L ;          // L: Left bulge
              | "." ;
            R : "." R ;          // R: Right bulge
              | "." ;
            S : "." S ;          // S: Singlestrand
              | "." ;
            H : "." H ;          // H: Hairpin 3+loop
              | "." "." "." ;
      • LR(k): LR(1) = 3 r/r conflicts; LR(3) = 12 r/r conflicts; LR(5) = 93 r/r conflicts; LR(7) = 249 r/r conflicts; LR(9) = 513 r/r conflicts; ...
      • Result: unambiguous grammar!

  32. Example: Java Expressions
            Exp[assign] : Exp1 "=" Exp ;
               [exp1]   | Exp1 ;
            Exp1[or]    : Exp1 "||" Exp2 ;
                [exp2]  | Exp2 ;
            Exp2[and]   : Exp2 "&&" Exp3 ;
                [exp3]  | Exp3 ;
            Exp3[eq]    : Exp3 "==" Exp4 ;
                [neq]   | Exp3 "!=" Exp4 ;
                [exp4]  | Exp4 ;
            Exp4[lt]    : Exp4 "<" Exp5 ;
                [leq]   | Exp4 "<=" Exp5 ;
                [gt]    | Exp4 ">" Exp5 ;
                [geq]   | Exp4 ">=" Exp5 ;
                [exp5]  | Exp5 ;
            /* -- cont'd -- */
            Exp5[add]   : Exp5 "+" Exp6 ;
                [sub]   | Exp5 "-" Exp6 ;
                [exp6]  | Exp6 ;
            Exp6[mul]   : Exp6 "*" Exp7 ;
                [div]   | Exp6 "/" Exp7 ;
                [exp7]  | Exp7 ;
            Exp7[not]   : "!" Exp7 ;
                [exp8]  | Exp8 ;
            Exp8[par]   : "(" Exp ")" ;
                [con]   | Con ;
            Con[num]    : "0" ;
               [id]     | "x" ;
      • Result: unambiguous grammar!

  33. Error Messages (Amb. Example)
      • Ambiguous expressions:
            E[plus] : E "+" E ;
             [mult] | E "*" E ;
             [x]    | "x" ;
      • Analysis output:
            *** vertical ambiguity detected: 'E[plus]' vs. 'E[mult]'
                ambiguous string: "x*x+x"
            *** horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
                ambiguous string: "x+x+x"
            *** horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
                ambiguous string: "x+x+x"
            *** horizontal ambiguity detected: 'E[mult:0..0]' vs. 'E[mult:1..2]'
                ambiguous string: "x*x*x"
            *** horizontal ambiguity detected: 'E[mult:0..1]' vs. 'E[mult:2..2]'
                ambiguous string: "x*x*x"
            *** ambiguous grammar: 1 vertical ambiguity
                                   4 horizontal ambiguities
      • Annotations on the slide: the vertical ambiguity is about the precedence of "+" vs. "*"; the 'E[plus:…]' horizontal ambiguities are about the associativity of "+"; the 'E[mult:…]' ones are about the associativity of "*".
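The ambiguous strings reported for this grammar can be double-checked by counting derivation trees. The following Python sketch does so for the specific grammar E : E "+" E | E "*" E | "x" using a memoised span-based count; it is an illustration written for this example (the function name is hypothetical), not part of the tool.

```python
from functools import lru_cache


def count_parses(s):
    """Number of derivation trees of E for the string s,
    where E : E "+" E | E "*" E | "x"."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees deriving s[i:j] via the alternative E : "x".
        total = 1 if s[i:j] == "x" else 0
        # Trees with a binary operator at position k as the root;
        # both operands s[i:k] and s[k+1:j] must be non-empty.
        for k in range(i + 1, j - 1):
            if s[k] in "+*":
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(s))
```

For "x+x+x", "x*x*x" and "x*x+x" the count is 2, matching the strings reported in the messages above, while "x+x" has a single tree.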

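  34. Benchmark Grammars
      [Figure: diagram placing the benchmark grammars (G1 to G8, Exp, Amb-Exp, O/E, P, R, Base, Voss, Voss-light) relative to the classes LALR(1), LR(1), LR(2), …, LR(k), the class accepted by our analysis ([OUR]), and the UNAMBIGUOUS / AMBIGUOUS regions. Annotations: G1 (5V+1H), G2 (6V+1H), Amb-Exp (1V+4H).]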

  35. Benchmarks (from Schmitz 2007) Unambiguous

  36. Benchmarks (from Schmitz 2007) Ambiguous

  37. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

  38. Related Work (Dynamic)
      • Dynamic disambiguation:
        • "Disambiguation-by-convention": longest match, most specific match, …
        • Customizable: [Bison v. 1.5+]: %dprec, %merge; [ASF+SDF]: "disambiguation filters"
      • Dynamic ambiguity interception:
        • GLR ([Tomita], [Earley], [Bison], [ASF+SDF], …)
        • [AMBER]

  39. Related Work (Static)
      • Static disambiguation:
        • "Disambiguation-by-convention": first match, most specific match, …
        • Customizable: [Yacc]: %left, %right, %nonassoc, %prec
      • Static ambiguity interception:
        • Our work goes here
        • LL(k), LALR(1), LR(k), LR-regular, …
        • Sylvain Schmitz: "Conservative Ambiguity Detection in Context-Free Grammars" (ICALP 2007); "An Experimental Ambiguity Detection Tool" (LDTA 2007)
          • Subsumes LR-regular; incomparable to our technique
          • Example grammar shown on the slide:
                S : A A
                A : 'a' A 'a' | 'b'

  40. Comparative Related Work • "Ambiguity Detection Methods for Context-Free Grammars" • H. J. S. Bas Basten (Master's thesis) • CWI, Universiteit van Amsterdam, Holland • "Ambiguity Detection for Context-Free Grammars in Eli" • Michael Kruse (Master's thesis) • Uni. Paderborn, Germany

  41. Outline • Introduction (and Motivation) • Characterization of Ambiguity • (aka. "Vertical-" and "Horizontal-" Ambiguity) • Framework (for Analyzing Ambiguity) • Regular Approximation (AMN) • Assessment (Applications and Examples) • Related Work • Conclusion

  42. Conclusion
      • Advantages (of our approach):
        • Characterization!
          • Possible to reason (locally) about ambiguity
          • (Composable) analysis framework
          • Complete decision procedure for regular grammars
          • Inherently parallelizable
        • DFA counterexamples:
          • and shortest (possibly) ambiguous string
        • Not "left-to-right" or "right-to-left" biased:
          • Can handle palindromic grammars
          • Well-suited for RNA analysis :)

  43. Conclusion (cont'd) “Analyzing Ambiguity of Context-Free Grammars” It has been known since 1962 that the ambiguity problem for context-free grammars is undecidable. Ambiguity in context-free grammars is a recurring problem in language design and parser generation, as well as in applications where grammars are used as models of real-world physical structures. However, the fact that the problem is undecidable does not mean that there are no useful approximations to the problem. We observe that there is a simple linguistic characterization of the grammar ambiguity problem, and we show how to exploit this to conservatively approximate the problem based on local regular approximations and grammar unfoldings. As an application, we consider grammars that occur in RNA analysis in bioinformatics, and we demonstrate that our static analysis of context-free grammars is sufficiently precise and efficient to be practically useful.

  44. Thank you Questions, please?

  45. BONUS SLIDES

  46. Other Approximation Strategies • The "EmptyString" Approximation: • The "MayMust" Approximation: • …

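  47. Asymptotic (Time) Complexity
      • [Mohri-Nederhof]: $O(n^2 v h)$
      • Vertical ambiguity: $O(n^3 v^4 h^4)$
      • Horizontal ambiguity: $O(n^3 v^3 h^5)$
      • Total: $O(n^3 v^3 h^4 (v + h)) \subseteq O(g^5)$
      • where, for a grammar with nonterminals of the shape $N_1 : e_{1,1} \dots e_{a,1} \mid \dots \mid e_{1,p} \dots e_{a,p}$:
        • $n = |N|$ (number of nonterminals)
        • $v = \max\{\, |\pi(X)| : X \in N \,\}$ (maximal number of alternatives)
        • $h = \max\{\, |\alpha| : \alpha \in \pi(X), X \in N \,\}$ (maximal length of an alternative)
        • $g = nvh = |G|$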
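The total bound is just the sum of the vertical and horizontal bounds, and the coarser bound in terms of the grammar size follows because each factor is at least 1. A short worked derivation, using the slide's symbols n, v, h and g = nvh:

```latex
\begin{align*}
\underbrace{n^3 v^4 h^4}_{\text{vertical}} + \underbrace{n^3 v^3 h^5}_{\text{horizontal}}
  &= n^3 v^3 h^4 \, (v + h) \\
n^3 v^4 h^4 &\le n^5 v^5 h^5 = g^5 \qquad (n, v, h \ge 1) \\
n^3 v^3 h^5 &\le n^5 v^5 h^5 = g^5 \\
\Rightarrow\quad n^3 v^3 h^4 \, (v + h) &\le 2\, g^5 \in O(g^5)
\end{align*}
```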

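  48. $A_{MN}$ is Decidable!
      • Intersection emptiness, $X \cap Y = \emptyset$: constructively decidable (using DFAs), in $O(|X_{DFA}| \cdot |Y_{DFA}|)$
      • Overlap emptiness, $\mathrm{overlap}(X, Y) = \emptyset$: constructively decidable (using DFAs), in $O(|X_{DFA}| \cdot |Y_{DFA}|)$
      • Hence vertical and horizontal unambiguity relative to $A_{MN}$ are constructively decidable, with potential counterexamples (as DFAs); i.e., we can extract shortest (potentially ambiguous) strings!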
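The intersection-emptiness check can be sketched as a breadth-first search over the product of two DFAs, which visits at most |X_DFA| · |Y_DFA| state pairs and, when the intersection is non-empty, returns a shortest common string. The DFA encoding below (a tuple of start state, accepting set, and a partial transition dict) and the example automata are assumptions made for this illustration, not the tool's data structures.

```python
from collections import deque


def shortest_common_string(dfa_x, dfa_y, alphabet):
    """Return a shortest string accepted by both DFAs, or None if there is none.

    A DFA is a tuple (start, accepting_states, transitions), where transitions
    maps (state, symbol) to the next state; missing entries mean "reject".
    """
    start = (dfa_x[0], dfa_y[0])
    queue = deque([(start, "")])
    seen = {start}
    while queue:
        (p, q), word = queue.popleft()
        if p in dfa_x[1] and q in dfa_y[1]:
            return word  # BFS order guarantees minimal length
        for a in alphabet:
            p2 = dfa_x[2].get((p, a))
            q2 = dfa_y[2].get((q, a))
            if p2 is None or q2 is None:
                continue
            if (p2, q2) not in seen:
                seen.add((p2, q2))
                queue.append(((p2, q2), word + a))
    return None


# Hand-built DFAs for the Odd/Even approximations: (^(2n) )^(2m) and (^(2n+1) )^(2m+1).
EVEN = ("e0", {"e0", "c0"}, {("e0", "("): "e1", ("e1", "("): "e0",
                             ("e0", ")"): "c1", ("c1", ")"): "c0", ("c0", ")"): "c1"})
ODD = ("o0", {"d1"}, {("o0", "("): "o1", ("o1", "("): "o2", ("o2", "("): "o1",
                      ("o1", ")"): "d1", ("d1", ")"): "d0", ("d0", ")"): "d1"})
# DFA accepting every string over the two parentheses.
ALL = ("s", {"s"}, {("s", "("): "s", ("s", ")"): "s"})
```

For `EVEN` and `ODD` the search finds no common string, while intersecting `ODD` with `ALL` yields the shortest member of the odd language, "()".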

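  49. Decision Algorithm for the overlap of X and Y
      • For X, Y regular languages (NFAs):
      • All overlappings, "xay" (as DFAs)
      • (essentially a variant of the "DFA product-construction", '×')
      [Diagram: the string x a y split between X and Y; the automata X_NFA and Y_NFA with copies X'_NFA and Y'_NFA are combined into [X;Y]_NFA, in which an overlap corresponds to a path that reads a non-empty a in both X and Y.]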

  50. Example: Expressions
      • Expressions:
            E[term] : T ;
             [plus] | E "+" T ;
            T[x]    : "x" ;
             [par]  | "(" E ")" ;
      • Analysis output:
            *** (potential) vertical ambiguity detected: 'E[term]' vs. 'E[plus]'
                shortest ambiguous string: "x+x"
            *** (potential) horizontal ambiguity detected: 'E[plus:0..0]' vs. 'E[plus:1..2]'
                shortest ambiguous string: "x+x+x"
            *** (potential) horizontal ambiguity detected: 'E[plus:0..1]' vs. 'E[plus:2..2]'
                shortest ambiguous string: "x+x+x"
            *** (potentially) ambiguous grammar: 1 (potential) vertical ambiguity
                                                 2 (potential) horizontal ambiguities
      • Note: general problem with non-linear recursive structures. However, there's a trick...
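The messages for this grammar are only "potential" ambiguities. As a small illustration that the reported strings are not actually ambiguous, the sketch below counts derivation trees for the grammar E : T | E "+" T, T : "x" | "(" E ")" by memoised recursion over substrings. The function name is hypothetical, and the counter is written by hand for this one grammar.

```python
from functools import lru_cache


def count_expression_parses(s):
    """Number of derivation trees of E for s, where
    E : T | E "+" T   and   T : "x" | "(" E ")"."""

    @lru_cache(maxsize=None)
    def count_e(i, j):
        total = count_t(i, j)                       # E : T
        for k in range(i + 1, j - 1):               # E : E "+" T, "+" at position k
            if s[k] == "+":
                total += count_e(i, k) * count_t(k + 1, j)
        return total

    @lru_cache(maxsize=None)
    def count_t(i, j):
        total = 1 if s[i:j] == "x" else 0           # T : "x"
        if j - i >= 2 and s[i] == "(" and s[j - 1] == ")":
            total += count_e(i + 1, j - 1)          # T : "(" E ")"
        return total

    return count_e(0, len(s))
```

Both "x+x" and "x+x+x" have exactly one derivation tree, so the warnings for these strings come from the regular over-approximation rather than from the grammar itself.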
