Tools and Analyses for Ambiguous Input Streams

Andrew Begel and Susan L. Graham, University of California, Berkeley. LDTA Workshop, April 3, 2004.

Presentation Transcript


  1. Tools and Analyses for Ambiguous Input Streams • Andrew Begel and Susan L. Graham, University of California, Berkeley • LDTA Workshop, April 3, 2004

  2-5. Harmonia: Language-aware Editing • Programming by Voice • Code dictation • Voice-based editing commands • Program Transformations • Transformation actions • Pattern-matching constructs • Input streams: Human Speech, Embedded Languages • Each kind of input stream ambiguity requires new language analyses. LDTA 2004

  6. Speech Example • Spoken: for int i equals zero i less than ten i plus plus • Code: for (int i = 0; i < 10; i++) { }

  7-8. Ambiguities • Code: for (int i = 0; i < 10; i++) { } • One possible recognition: 4 int eye equals 0 aye less then 10 i plus plus • ID spelling? • KW or ID? • KW or #?
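
The spelling and category questions on the slide above can be made concrete with a small lookup. The following is a minimal sketch, not the Harmonia lexer: the `HOMOPHONES` table, the function names, and the category assigned to each spelling are assumptions chosen to mirror the words on the slides.

```python
from itertools import product

# Hypothetical homophone table: spoken word -> list of (spelling, category).
# Categories follow the slide labels: KW (keyword), ID (identifier), NUM (number).
HOMOPHONES = {
    "for": [("for", "KW"), ("4", "NUM"), ("four", "ID"), ("fore", "ID")],
    "i": [("i", "ID"), ("eye", "ID"), ("aye", "ID")],
    "zero": [("0", "NUM")],
    "than": [("than", "ID"), ("then", "ID")],
    "ten": [("10", "NUM")],
}


def lex_word(word):
    """Return every (spelling, category) reading of one spoken word."""
    # Words missing from the table are assumed to be plain identifiers.
    return HOMOPHONES.get(word, [(word, "ID")])


def readings(utterance):
    """Enumerate all token sequences for an utterance (exponential; demo only)."""
    alternatives = [lex_word(w) for w in utterance.split()]
    return list(product(*alternatives))


all_readings = readings("for i equals zero")
print(len(all_readings))  # 4 * 3 * 1 * 1 = 12
```

Enumerating every combination up front grows exponentially with utterance length, which is one reason to let the parser prune readings as it goes instead.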

  9. Another Utterance • for times ate equals zero two plus equals one

  10. Many Valid Parses! • 4 * 8 = zero; to += won • for (times; ate == 0; to += 1) { } • fore.times(8).equalsZero(2, plus == 1)

  11-12. Embedded Language Example • C and regexps embedded in Flex • Flex rule for identifiers: [_a-zA-Z]([_a-zA-Z0-9])*i++; RETURN_TOKEN(ID); • Why not this interpretation? [_a-zA-Z]([_a-zA-Z0-9])* i++; RETURN_TOKEN(ID);

  13-17. Legacy Language Example: Fortran • Do loop: DO 57 I = 3,10 (read as DO 57 I=3,10) • Assignment: DO 57 I = 3 (read as DO57I = 3)
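
Because blanks are not significant in fixed-form Fortran, the two statements above differ only in whether a comma follows the `=`. The sketch below illustrates this with two simplified regular expressions. It is an assumption-laden toy covering only these two statement shapes (for example, a comma inside parentheses on the right-hand side would be misclassified), and the names `DO_LOOP`, `ASSIGNMENT`, and `classify` are hypothetical.

```python
import re

# Simplified patterns for fixed-form Fortran, where blanks are not significant.
# They cover only the two statement shapes shown on the slide.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGNMENT = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement):
    """Classify a statement as a DO loop or an assignment after removing blanks."""
    text = statement.replace(" ", "").upper()
    m = DO_LOOP.match(text)
    if m:
        label, var, start, stop = m.groups()
        return ("do_loop", label, var, start, stop)
    m = ASSIGNMENT.match(text)
    if m:
        return ("assignment", m.group(1), m.group(2))
    return ("unknown", text)


print(classify("DO 57 I = 3,10"))  # ('do_loop', '57', 'I', '3', '10')
print(classify("DO 57 I = 3"))     # ('assignment', 'DO57I', '3')
```

The lexer cannot decide between the keyword `DO` and the identifier `DO57I` until it has seen the rest of the statement.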

  18-19. Legacy Language Example: PL/I • Non-reserved keywords • IF IF = THEN THEN THEN = ELSE ELSE ELSE = END END • Token labels shown: ID, ID, KW, ID

  20-21. Input Stream Classification • Embedded languages fall in all four categories!

  22-23. GLR Analysis Architecture • Input: for (i = 0; i < 10; i++) { } • Pipeline: Lexer, GLR Parser, Semantics • Tokens shown: FOR, (, I • Handles syntactic ambiguities

  24-25. Our Contribution: XGLR Analysis Architecture • Input: for i equals zero ... • Pipeline: Lexer, XGLR Parser, Semantics • Tokens shown: FOR, I, 4, EYE • Handles input stream ambiguities

  26-28. LR Parsing [figure: parse stack, parse table, and input stream FOR I = 0 with categories KW, ID, #]

  29-32. GLR Parsing [figure: parse stack, parse table, and input stream FOR I = 0 with categories KW, ID, #]

  33. XGLR in Action

  34. Parsing Homophones [figure: parser in state 23; input FOR BAR]

  35. XGLR Extension: Multiple Spellings, Single and Multiple Lexical Categories [figure: FOUR and FORE as ID, FOR as KW, 4 as NUM]

  36. XGLR Extension: Parsers fork due to input ambiguity [figure: three parsers in state 23, one per lexical category]

  37. Each parser shifts its now unambiguous input [figure: parsers reach states 26, 29, and 35]

  38. The next input is lexed unambiguously [figure: BAR as ID]

  39. ID is only a valid lookahead for two parsers [figure: two parsers advance to states 49 and 42]
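
The fork-then-filter behavior in slides 34 to 39 can be sketched as follows. This is a toy model under stated assumptions, not the XGLR algorithm itself: it handles shifts only (no reductions and no graph-structured stack), and the `SHIFT` table is hypothetical, with state numbers loosely taken from the slides.

```python
# Hypothetical shift table, loosely modeled on the state numbers in the slides:
# (state, lexical category) -> next state. A missing entry means a parse error.
SHIFT = {
    (23, "ID"): 26,
    (23, "KW"): 29,
    (23, "NUM"): 35,
    (26, "ID"): 49,
    (29, "ID"): 42,
}


def step(parsers, alternatives):
    """Advance every parser over one input position.

    parsers: list of (state, tokens_so_far) pairs.
    alternatives: list of (spelling, category) readings of the next input.
    Each parser forks once per lexical category; spellings that share a
    category travel together. Forks with no table entry are dropped.
    """
    by_category = {}
    for spelling, category in alternatives:
        by_category.setdefault(category, []).append(spelling)

    survivors = []
    for state, tokens in parsers:
        for category, spellings in by_category.items():
            nxt = SHIFT.get((state, category))
            if nxt is not None:
                survivors.append((nxt, tokens + [(category, spellings)]))
    return survivors


parsers = [(23, [])]
parsers = step(parsers, [("four", "ID"), ("fore", "ID"), ("for", "KW"), ("4", "NUM")])
after_first = [state for state, _ in parsers]
print(after_first)   # [26, 29, 35]

parsers = step(parsers, [("bar", "ID")])
after_second = [state for state, _ in parsers]
print(after_second)  # [49, 42]
```

In this sketch the parser that read the number 4 has no entry for an ID lookahead, so it is discarded and two parsers remain.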

  40-41. Parsing Embedded Languages • Example BNF grammar contains languages L and W (the suffix on each symbol marks its language):
      b_L -> loop_L d_W END_L
      loop_L -> LOOP_L | ε
      d_W -> WHILE_W NUM_W do_W
      do_W -> DO_W | ε
    • Example inputs: LOOP WHILE 34 END; WHILE 56 DO END

  46. Parsing Embedded Languages [figure: parser S in state 0; input LOOP WHILE 34]

  47. Current parse state has ambiguous lexical language

  48. XGLR Extension: Fork parsers, assign one to each lexical language [figure: parsers L and W, both in state 0]

  49. XGLR Extension: Single spelling, multiple lexical categories • Lex lookahead both in language L and W [figure: LOOP is KW in L and ID in W]

  50. Only LOOP_L is valid lookahead, and is shifted [figure: parser L moves to state 4]
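
Slides 46 to 50 can be illustrated by giving each forked parser its own lexer. The sketch below assumes two hypothetical lexers, `lex_L` and `lex_W`, and hypothetical `LANGUAGES` and `SHIFT` tables. It models only the single shift shown on the slides, so the reduction that would let WHILE appear in state 0 is deliberately left out.

```python
# Hypothetical per-language lexers: each maps a spelling to a token name.
def lex_L(text):
    keywords = {"LOOP": "LOOP_L", "END": "END_L"}
    return keywords.get(text, "ID_L")


def lex_W(text):
    keywords = {"WHILE": "WHILE_W", "DO": "DO_W"}
    if text.isdigit():
        return "NUM_W"
    return keywords.get(text, "ID_W")


LEXERS = {"L": lex_L, "W": lex_W}

# Hypothetical tables: the lexical languages possible in each parse state,
# and the shift actions. Only the shift shown on the slides is modeled.
LANGUAGES = {0: ["L", "W"]}
SHIFT = {(0, "LOOP_L"): 4}


def step(state, text):
    """Fork one parser per candidate language and keep the valid shifts."""
    results = []
    for lang in LANGUAGES[state]:
        token = LEXERS[lang](text)           # lex the lookahead in this language
        nxt = SHIFT.get((state, token))      # is the token a valid lookahead?
        if nxt is not None:
            results.append((lang, token, nxt))
    return results


print(step(0, "LOOP"))  # [('L', 'LOOP_L', 4)]
```

The W parser lexes LOOP as an identifier, finds no action for it in state 0, and is dropped, which matches the outcome on slide 50.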
