
Syntax Analysis: Introduction, Context-Free Grammars, and Parsing Techniques

This chapter provides an overview of syntax analysis, including the role of parsers in a compiler, error handling strategies, context-free grammars, and writing grammars. It also covers top-down and bottom-up parsing techniques, LR parsing, and handling ambiguous grammars. The chapter discusses various error recovery strategies and introduces the concept of parse trees and derivations.



  1. Chapter 4 Syntax Analysis

  2. Content • Overview of this chapter • 4.1 Introduction • 4.2 Context-Free Grammars • 4.3 Writing a Grammar • 4.4 Top-Down Parsing • 4.5 Bottom-Up Parsing • 4.6 Introduction to LR Parsing: Simple LR • 4.7 More Powerful LR Parsers • 4.8 Using Ambiguous Grammars • 4.9 Parser Generators

  3. 4.1 Introduction

  4. 4.1 Introduction In this section, we • Examine the way the parser fits into a typical compiler • Look at typical grammars for arithmetic expressions • Discuss error handling

  5. 4.1.1 The Role of the Parser • Position of parser in compiler model • Types of parsers • Universal • Top-down • Bottom-up

  6. 4.1.2 Representative Grammars • Grammar 4.1 (LR, suitable for bottom-up parsing) • Grammar 4.2 (non-left-recursive, used for top-down parsing) • Grammar 4.3 (for handling ambiguities)

  7. 4.1.3 Syntax Error Handling • Common programming errors: • Lexical errors • Syntactic errors • Semantic errors • Logical errors • Parsing methods allow syntactic errors to be detected • Goals of the error handler in a parser: • Report the presence of errors • Recover from each error • Add minimal overhead

  8. 4.1.4 Error-Recovery Strategies • Panic-Mode Recovery • Phrase-Level Recovery • Error Productions • Global Correction

  9. 4.2 Context-Free Grammars

  10. 4.2 Context-Free Grammars In this section, we • Review the definition of a context-free grammar • Introduce terminology for talking about parsing

  11. 4.2.1 The Formal Definition of a Context-Free Grammar • A context-free grammar consists of: 1. terminals: basic symbols from which strings are formed 2. nonterminals: syntactic variables that denote sets of strings 3. start symbol: one designated nonterminal 4. productions: specify the manner in which strings are built, each consisting of 1) a nonterminal called the head or left side 2) the symbol -> 3) a body or right side consisting of zero or more terminals and nonterminals

  12. 4.2.1 The Formal Definition of a Context-Free Grammar • Example: terminals: id, +, -, *, /, (, ) nonterminals: expression, term, factor start symbol: expression
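The productions themselves are not reproduced in this transcript; the expression grammar referred to here (Grammar 4.1 in the chapter) is conventionally written as:

  expression -> expression + term | expression - term | term
  term       -> term * factor | term / factor | factor
  factor     -> ( expression ) | id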

  13. 4.2.2 Notational Conventions • terminals: 1. Lowercase letters: a, b, c, … 2. Operator symbols: +, *, … 3. Punctuation symbols: parentheses, comma, … 4. Digits: 0, 1, 2, … 5. Boldface strings such as id, if • nonterminals: 1. Uppercase letters: A, B, C, … 2. The letter S 3. Lowercase, italic names such as expr or stmt 4. When discussing programming constructs: E, T, F

  14. 4.2.2 Notational Conventions • Uppercase letters late in the alphabet, such as X, Y, Z, represent grammar symbols (either nonterminals or terminals) • Lowercase letters late in the alphabet, chiefly u, v, …, z, represent strings of terminals • Lowercase Greek letters α, β, γ represent strings of grammar symbols • A set of productions A -> α1, A -> α2, …, A -> αk may be written A -> α1 | α2 | … | αk; we call α1, α2, …, αk the alternatives for A • The head of the first production is the start symbol

  15. 4.2.3 Derivations • Consider the grammar E -> E + E | E * E | - E | ( E ) | id • E => -E reads "E derives -E" • If A -> γ is a production, then αAβ => αγβ (derives in one step) • =>* : derives in zero or more steps • =>+ : derives in one or more steps • Leftmost and rightmost derivations: at each step the leftmost (respectively, rightmost) nonterminal is replaced
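As a concrete illustration (using the expression grammar above), one derivation of the sentence - ( id + id ) is:

  E => -E => -(E) => -(E + E) => -(id + E) => -(id + id)

At each step the leftmost nonterminal is replaced, so this is a leftmost derivation; replacing the rightmost nonterminal instead would give the corresponding rightmost derivation.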

  16. 4.2.4 Parse Trees and Derivations • Parse tree for - ( id + id ) • Constructing a parse tree from a derivation α1 => α2 => … => αn: • BASIS: the tree for α1 = A is a single node labeled A • INDUCTION: suppose αi-1 = X1 X2 … Xk (each Xi is either a nonterminal or a terminal) and αi is derived from αi-1 by replacing Xj, a nonterminal, by β = Y1 Y2 … Ym; that is, at the ith step of the derivation, production Xj -> β is applied to αi-1 to derive αi = X1 X2 … Xj-1 β Xj+1 … Xk

  17. 4.2.4 Parse Trees and Derivations • Example: Sequence of parse trees for derivation (4.8)

  18. 4.2.5 Ambiguity • Ambiguous: a grammar that produces more than one parse tree for some sentence • Example: grammar (4.3) permits two distinct leftmost derivations for the sentence id + id * id (see below)
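Grammar (4.3) itself is not reproduced in the transcript; assuming it is the ambiguous expression grammar E -> E + E | E * E | ( E ) | id, the two leftmost derivations of id + id * id are:

  E => E + E => id + E => id + E * E => id + id * E => id + id * id
  E => E * E => E + E * E => id + E * E => id + id * E => id + id * id

The first derivation yields the parse tree in which * binds more tightly than +; the second yields the tree in which + binds more tightly.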

  19. 4.2.5 Ambiguity • The two parse trees for id + id * id

  20. 4.2.6 Verifying the Language Generated by a Grammar • A proof that a grammar G generates a language L has two parts: 1. Every string generated by G is in L 2. Every string in L can be generated by G • Example: S -> ( S ) S | ε generates the strings of balanced parentheses 1. Every sentence derivable from S is balanced 2. Every balanced string is derivable from S
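A minimal sketch (not part of the original slides; the helper name is_balanced and the greedy parsing strategy are illustrative assumptions) of a recursive-descent recognizer for S -> ( S ) S | ε in Python:

  # Recursive-descent recognizer for S -> ( S ) S | epsilon (balanced parentheses).
  def is_balanced(s: str) -> bool:
      pos = 0

      def parse_S() -> bool:
          nonlocal pos
          if pos < len(s) and s[pos] == '(':   # alternative S -> ( S ) S
              pos += 1                         # match '('
              if not parse_S():                # inner S
                  return False
              if pos < len(s) and s[pos] == ')':
                  pos += 1                     # match ')'
                  return parse_S()             # trailing S
              return False                     # missing ')'
          return True                          # alternative S -> epsilon

      return parse_S() and pos == len(s)       # accept only if all input is consumed

  # is_balanced("(()())") -> True, is_balanced("(()") -> False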

  21. 4.2.7 Context-Free Grammars Versus Regular Expressions • Grammars are more powerful than regular expressions 1. Every construct that can be described by a regular expression can be described by a grammar, but not vice versa 2. Every regular language is a context-free language, but not vice versa • Example 1: (a|b)*abb can be described by a grammar (see below)
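The grammar is missing from the transcript; the usual construction, with one nonterminal per state of an automaton for (a|b)*abb, is:

  A0 -> a A0 | b A0 | a A1
  A1 -> b A2
  A2 -> b A3
  A3 -> ε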

  22. 4.2.7 Context-Free Grammars Versus Regular Expressions • Example 2: the language of strings with an equal number of a's and b's can be described by a grammar but not by a regular expression
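The grammar is not shown in the transcript; one standard grammar generating exactly the strings over {a, b} with equally many a's and b's is:

  S -> a S b S | b S a S | ε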

  23. 4.3 Writing a Grammar

  24. 4.3 Writing a Grammar In this section, we • Discuss how to divide the work between the lexical analyzer and the parser • Consider several grammar transformations: • One technique that can eliminate ambiguity • Left-recursion elimination and left factoring • Consider some programming-language constructs that cannot be described by any context-free grammar

  25. 4.3.1 Lexical Versus Syntactic Analysis • Why use regular expressions to define the lexical syntax? • Separating lexical and syntactic structure provides a convenient way of modularizing the front end of a compiler into two manageable-sized components • The lexical rules of a language are frequently quite simple • Regular expressions provide a more concise and easier-to-understand notation for tokens than grammars • More efficient lexical analyzers can be constructed automatically from regular expressions • Regular expressions are used for: identifiers, constants, keywords, whitespace (constructs without nested structure) • Grammars are used for: balanced parentheses, matching begin-end's, corresponding if-then-else's (nested structures)

  26. 4.3.2 Eliminating Ambiguity • The "dangling-else" grammar is ambiguous, since the string "if E1 then if E2 then S1 else S2" has two parse trees (the else can be attached to either if):

  27. 4.3.2 Eliminating Ambiguity • Rewrite the "dangling-else" grammar so that it is unambiguous: match each else with the closest unmatched then (see the grammar sketched below)
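The rewritten grammar is not reproduced in the transcript; the standard unambiguous version, which forces each else to match the closest unmatched then, is:

  stmt         -> matched_stmt | open_stmt
  matched_stmt -> if expr then matched_stmt else matched_stmt | other
  open_stmt    -> if expr then stmt | if expr then matched_stmt else open_stmt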

  28. 4.3.3 Elimination of Left Recursion • What is left recursion? A grammar is left recursive if it has a nonterminal A such that there is a derivation A =>+ Aα for some string α, e.g. A -> Aα | β • Why eliminate left recursion? Top-down parsing methods cannot handle left-recursive grammars • Technique for eliminating immediate left recursion: 1. Group the A-productions as A -> Aα1 | Aα2 | … | Aαm | β1 | β2 | … | βn, where no βi begins with an A

  29. 4.3.3 Elimination of Left Recursion 2. Replace the A-productions by A -> β1 A' | β2 A' | … | βn A' and A' -> α1 A' | α2 A' | … | αm A' | ε, where A' is a new nonterminal • Example: see the worked grammar and sketch below
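The worked example is missing from the transcript; applying the rule to E -> E + T | T, T -> T * F | F, F -> ( E ) | id yields the non-left-recursive grammar (Grammar 4.2) used later for top-down parsing:

  E  -> T E'
  E' -> + T E' | ε
  T  -> F T'
  T' -> * F T' | ε
  F  -> ( E ) | id

A small Python sketch of the same transformation (the function name and the tuple representation of production bodies are my own assumptions; it handles only immediate left recursion for a single nonterminal):

  # Split A -> A a1 | ... | A am | b1 | ... | bn into
  #   A  -> b1 A' | ... | bn A'
  #   A' -> a1 A' | ... | am A' | epsilon   (epsilon written as the empty tuple)
  def eliminate_immediate_left_recursion(head, bodies):
      new_head = head + "'"                                      # fresh nonterminal A'
      alphas = [b[1:] for b in bodies if b and b[0] == head]     # left-recursive alternatives
      betas  = [b for b in bodies if not (b and b[0] == head)]   # remaining alternatives
      if not alphas:                                             # no left recursion to remove
          return {head: bodies}
      return {
          head:     [beta + (new_head,) for beta in betas],
          new_head: [alpha + (new_head,) for alpha in alphas] + [()],
      }

  # eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
  # -> {"E": [("T", "E'")], "E'": [("+", "T", "E'"), ()]}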

  30. 4.3.4 Left Factoring • Left factoring: a grammar transformation that is useful for producing a grammar suitable for predictive, or top-down, parsing e.g. stmt -> if expr then stmt else stmt | if expr then stmt • Left factoring a grammar: 1. For each nonterminal A, find the longest prefix α common to two or more of its alternatives

  31. 4.3.4 Left Factoring 2. Replace the productions A -> αβ1 | αβ2 | … | αβn | γ by A -> α A' | γ and A' -> β1 | β2 | … | βn, where A' is a new nonterminal 3. Repeatedly apply this transformation until no two alternatives for a nonterminal have a common prefix • Example: see below
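The example is missing from the transcript; left-factoring the two if-productions from slide 30 (with stmt' as the new nonterminal) gives:

  stmt  -> if expr then stmt stmt'   (plus stmt's other alternatives, unchanged)
  stmt' -> else stmt | ε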

  32. Second Assignment • 3.9.4 (2) • 4.2.1 • 4.3.1

  33. The end of Lecture04
