
Chapter 9

Chapter 9. Compilers and Language Translation. The Compilation Process. Phase I: Lexical analysis. Phase II: Parsing. Phase III: Semantics and code generation. Phase IV: Code optimization. Introduction. High-level languages are more difficult to “translate” than assembly languages.

ernst


  1. Chapter 9 Compilers and Language Translation

  2. The Compilation Process • Phase I: Lexical analysis • Phase II: Parsing • Phase III: Semantics and code generation • Phase IV: Code Optimization

  3. Introduction • High-level languages are more difficult to “translate” than assembly languages. • Assembly language and machine language are related 1-to-1. • The relationship between a high-level language and machine language is 1-to-many.

  4. Compiler • The piece of software that translates high-level programming language code into machine language code. • Two distinct goals of a compiler: • Correctness • Efficient and concise code • Example: 2x0+2x1+…+2x50000

  5. The Compilation Process • Scanner → Parser → Code Generator → Optimizer → Object file

  6. Lexical Analysis • The compiler examines the individual characters in the source program and groups them into syntactical units, called tokens, that will be analyzed in succeeding stages. • Analogous to grouping letters into words prior to analyzing text.

  7. Parsing • During this stage the sequence of tokens formed by the scanner is checked to see whether it is syntactically correct according to the rules of the programming language. • Equivalent to checking whether the words in the text form grammatically correct sentences.

  8. Semantic Analysis and Code Generation • If the high-level language statement is structurally correct, then the compiler analyzes its meaning and generates the proper sequence of machine language instructions to carry out these actions.

  9. Code Optimization • The compiler takes the generated code and sees whether it can be made more efficient, either by making it run faster or by having it occupy less memory.

  10. Phase I: Lexical Analysis • The scanner, or lexical analyzer, groups input characters into tokens. • Example: a = b + 319 - delta; • The scanner discards nonessential characters, such as blanks and tabs, and then groups the remaining characters into high-level syntactic units such as symbols, numbers, and operators.

  11. Token Classifications • Token type (classification number) • symbol (1) • number (2) • Others: = (3), + (4), - (5), ; (6), == (7), if (8), else (9), ( (10), ) (11)
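A minimal scanner sketch in Python may make slides 10 and 11 concrete. It uses the classification numbers listed on the slide; the regular expression, the function name `scan`, and the rule that a symbol starts with a letter or underscore are assumptions made for illustration, not part of the original material.

```python
import re

# Classification numbers taken from the token table on the slide.
CLASS = {"=": 3, "+": 4, "-": 5, ";": 6, "==": 7,
         "if": 8, "else": 9, "(": 10, ")": 11}
SYMBOL, NUMBER = 1, 2

# "==" is listed first so that it is not split into two "=" tokens.
# Leading blanks and tabs are skipped by \s* (assumed lexical rules).
TOKEN_RE = re.compile(r"\s*(==|[A-Za-z_]\w*|\d+|[=+\-;()])")

def scan(source):
    """Group the characters of `source` into (text, classification) pairs."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        text = match.group(1)
        if text in CLASS:            # operators, punctuation, if/else
            kind = CLASS[text]
        elif text[0].isdigit():      # a run of digits
            kind = NUMBER
        else:                        # anything else that matched is a symbol
            kind = SYMBOL
        tokens.append((text, kind))
        pos = match.end()
    return tokens

print(scan("a = b + 319 - delta;"))
```

For the example statement the sketch yields the pairs a/1, =/3, b/1, +/4, 319/2, -/5, delta/1, ;/6, which is the stream a parser would consume next.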

  12. Phase II: Parsing • During the parsing phase, a compiler determines whether the tokens recognized by the scanner fit together in a grammatically meaningful way. • Analogous to the operation of “diagramming a sentence”.

  13. Example • To prove that the sequence of words “The man bit the dog” is a correctly formed sentence.

  14. Another Example • The man bit the

  15. Programming Language Example • Statement: a = b + c

  16. Parse Tree • The structure shown in the previous example is called a parse tree. • It starts from the individual tokens a, =, b, +, c and shows how these tokens can be grouped together into predefined grammatical categories such as <symbol>, <addition operator>, and <expression> until the desired goal is reached (in this case, <assignment statement>).

  17. Grammars, Languages and BNF • How does a parser know how to construct the parse tree? • The parser must be given a formal description of the syntax, the grammatical structure, of the language that it is going to analyze. • The most widely used notation for representing the syntax of a programming language is called BNF, an acronym for Backus-Naur Form.

  18. BNF • The syntax of a language is specified as a set of rules, also called productions. • The entire collection of rules is called a grammar. • BNF rule: left-hand side ::= “definition”

  19. BNF Example • <assignment statement>::=<symbol>=<expression> • The rule says that the syntactical construct called <assignment statement> is defined as a <symbol> followed by the token = followed by the syntactical construct called <expression>

  20. Terminals/Nonterminals • BNF uses two types of objects on the right-hand side of a production: • Terminals: actual tokens of the language recognized and returned by a scanner. • Nonterminals: intermediate grammatical categories used to help explain and organize the language.

  21. Goal Symbol • The goal symbol is the highest-level nonterminal. • When the goal symbol has been produced, the parser has finished building the tree, and the statements have been successfully parsed. • The collection of all statements that can be successfully parsed is called the language defined by a grammar.

  22. Meta-symbols • Meta-symbol: a symbol of one language used to describe the characteristics of another language. • BNF has five meta-symbols: <, >, ::=, |, and Λ • | means OR. Ex: <digit> ::= 0|1|2|3|4|5|6|7|8|9 • Λ is the null string. Ex: <signed integer> ::= <sign><number> and <sign> ::= +|-|Λ
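The signed-integer rules can be sketched as a small recognizer in Python, mainly to show how the null-string alternative behaves: it matches without consuming any input. The slide does not define <number>, so the rule `<number> ::= <digit> | <digit><number>` used below is an assumption, as are all function names.

```python
DIGITS = "0123456789"            # <digit> ::= 0|1|2|3|4|5|6|7|8|9

def match_sign(text, pos):
    """<sign> ::= + | - | null string. Returns the position after the sign."""
    if pos < len(text) and text[pos] in "+-":
        return pos + 1
    return pos                   # null-string alternative: consume nothing

def match_number(text, pos):
    """Assumed rule: <number> ::= <digit> | <digit><number>."""
    end = pos
    while end < len(text) and text[end] in DIGITS:
        end += 1
    return end if end > pos else None   # at least one digit is required

def is_signed_integer(text):
    """<signed integer> ::= <sign><number>"""
    return match_number(text, match_sign(text, 0)) == len(text)

for candidate in ["+17", "-42", "42", "+", "4-2"]:
    print(candidate, is_signed_integer(candidate))
```

Because of the null-string alternative, an unsigned string such as 42 is accepted by the same rule that accepts +17 and -42.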

  23. Fundamental Rule of Parsing • If, by repeated applications of the rules of the grammar, a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the language.

  24. Example • A three-rule grammar • <sentence>::=<noun><verb> • <noun>::= bees|dogs • <verb>::=buzz|bite • Example 1: Dogs bite. • Example 2: Bees dogs.
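The three-rule grammar is small enough to check by hand, but a short Python sketch shows the fundamental rule of parsing at work: reduce the words to nonterminals, then see whether the result can be reduced to the goal symbol. The function name `reduce_to_goal` and the handling of capital letters and the final period are assumptions.

```python
NOUNS = {"bees", "dogs"}     # <noun> ::= bees|dogs
VERBS = {"buzz", "bite"}     # <verb> ::= buzz|bite

def reduce_to_goal(text):
    """Return "<sentence>" if the words reduce to the goal symbol, else None."""
    words = text.lower().rstrip(".").split()
    categories = []
    for word in words:
        if word in NOUNS:
            categories.append("<noun>")
        elif word in VERBS:
            categories.append("<verb>")
        else:
            return None                       # not a terminal of this grammar
    # <sentence> ::= <noun><verb>
    return "<sentence>" if categories == ["<noun>", "<verb>"] else None

print(reduce_to_goal("Dogs bite."))   # reduces to the goal symbol
print(reduce_to_goal("Bees dogs."))   # <noun><noun> matches no rule
```

Example 1 reduces to <sentence>, so it is in the language; Example 2 reduces to <noun><noun>, which no rule can turn into the goal symbol.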

  25. Another Example • Grammar for a simplified assignment statement • <assignment statement>::=<variable>=<expression> • <expression>::=<variable>|<variable>+<variable> • <variable>::= x|y|z

  26. Generated Parse Tree

  27. Wrong Path

  28. How to parse? • Parsing is a complex sequence of applying rules, building grammatical constructs, and checking whether things are moving toward the correct answer (the goal symbol). If they are not, the parser “undoes” the rule just applied and tries another. • Look-ahead parsing algorithm: “looking down the road” a few tokens to see what would happen if a certain choice were made.
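The try, undo, and retry idea can be sketched in Python for the simplified assignment grammar of slide 25. This is an illustrative backtracking sketch under stated assumptions, not the algorithm a production compiler would use; the tuple-based tree shape and all function names are hypothetical.

```python
VARIABLES = {"x", "y", "z"}

def parse_variable(tokens, pos):
    """<variable> ::= x|y|z. Returns (tree, next position) or None."""
    if pos < len(tokens) and tokens[pos] in VARIABLES:
        return ("variable", tokens[pos]), pos + 1
    return None

def expression_candidates(tokens, pos):
    """Yield each way <expression> can match, one alternative at a time."""
    first = parse_variable(tokens, pos)
    if first is None:
        return
    left, after_left = first
    # Alternative 1: <expression> ::= <variable>
    yield ("expression", left), after_left
    # Alternative 2: <expression> ::= <variable>+<variable>
    if after_left < len(tokens) and tokens[after_left] == "+":
        second = parse_variable(tokens, after_left + 1)
        if second is not None:
            right, after_right = second
            yield ("expression", left, "+", right), after_right

def parse_assignment(tokens):
    """<assignment statement> ::= <variable>=<expression>"""
    target = parse_variable(tokens, 0)
    if target is None or target[1] >= len(tokens) or tokens[target[1]] != "=":
        return None
    variable, after_variable = target
    for expression, end in expression_candidates(tokens, after_variable + 1):
        if end == len(tokens):            # all tokens used: goal symbol reached
            return ("assignment statement", variable, "=", expression)
        # Otherwise this alternative was a wrong path; drop it and try the next.
    return None

print(parse_assignment(list("x=y+z")))
print(parse_assignment(list("x=x+y+z")))
```

For x=y+z the first alternative leaves +z unconsumed, so it is abandoned and the second alternative succeeds. A longer statement such as x=x+y+z fails under both alternatives, so this grammar cannot parse it.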

  29. Example • It is not possible to build a parse tree with the grammar.

  30. Major Challenge • Design a grammar that: • Includes every valid statement that we want to be in the language • Excludes every invalid statement that we do not want to be in the language

  31. Assignment Statement (2nd try) • <assignment statement>::=<variable>=<expression> • <expression>::=<variable>|<expression>+<expression> (recursive definition) • <variable>::= x|y|z

  32. Resulting Parse Tree

  33. Using Recursive Definition

  34. Validity vs. Ambiguity • It is possible to construct two parse trees for x=x+y+z using the 2nd grammar. They have two different meanings: • x=(x+y)+z • x=x+(y+z)
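The ambiguity can be made visible by enumerating every parse the recursive rule `<expression>::=<variable>|<expression>+<expression>` allows for a chain of additions. The Python sketch below writes each parse tree as a fully parenthesized string; the function name `groupings` is hypothetical.

```python
def groupings(variables):
    """List every parse of v1+v2+...+vn under
    <expression> ::= <variable> | <expression>+<expression>."""
    if len(variables) == 1:
        return [variables[0]]             # <expression> ::= <variable>
    results = []
    # Choose which "+" becomes the root of the tree, then parse both sides.
    for split in range(1, len(variables)):
        for left in groupings(variables[:split]):
            for right in groupings(variables[split:]):
                results.append(f"({left}+{right})")
    return results

for tree in groupings(["x", "y", "z"]):
    print("x =", tree)
```

For three variables the sketch finds exactly the two trees named on the slide. With four variables it finds five, so the number of competing interpretations grows quickly as statements get longer.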

  35. If-else grammar

  36. Parse Tree

  37. Phase III: Semantics and Code Generation • <sentence>::=<noun><verb> • <noun>::= bees|dogs • <verb>::=buzz|bite • Possible combinations: • Dogs bite. • Dogs buzz. • Bees bite. • Bees buzz. • All four are syntactically valid, but not all combinations make sense.

  38. Semantics and Code Generation • A compiler examines the semantics of a programming language statement. It analyzes the meaning of the tokens and tries to understand the actions they perform. • If the statement is meaningless, it is semantically rejected. Otherwise it is translated into machine language.

  39. Example • The statement sum=a+b; is syntactically correct. • But what if the variables are defined as follows: char a; double b; int sum;

  40. Semantic Records • Each nonterminal symbol is associated with a semantic record, a data structure that stores information about a nonterminal, such as the actual name of the object and its data type.
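A semantic record can be sketched as a small Python data class holding the two fields the slide names. The symbol table below reuses the declarations from slide 39; the rule that both operands of + must have the same type is an assumption made for illustration, since real languages often convert mixed operands implicitly instead of rejecting them.

```python
from dataclasses import dataclass

@dataclass
class SemanticRecord:
    name: str         # actual name of the object
    data_type: str    # its declared data type

# Declarations from slide 39: char a; double b; int sum;
SYMBOL_TABLE = {"a": "char", "b": "double", "sum": "int"}

def record_for(name):
    """Build the semantic record for a declared variable."""
    if name not in SYMBOL_TABLE:
        raise NameError(f"{name} is not declared")
    return SemanticRecord(name, SYMBOL_TABLE[name])

def check_addition(left, right):
    """Assumed rule: both operands of + must have the same type."""
    if left.data_type != right.data_type:
        return (f"type mismatch: {left.name} is {left.data_type}, "
                f"{right.name} is {right.data_type}")
    return None       # no semantic error found

print(check_addition(record_for("a"), record_for("b")))
```

Under this assumed rule, a+b is flagged even though the statement parsed correctly, which is the kind of error only the semantic phase can catch.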

  41. Semantic Records (II) • Grows gradually.

  42. Another Situation

  43. Two-Stage Process • Semantic analysis: a pass over the parse tree to determine whether all branches of the tree are semantically valid. • Code generation: the compiler makes a 2nd pass over the parse tree to produce the translated code.
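The two passes can be sketched in Python over a tiny parse tree for a statement of the form target = left + right. Everything here is a hypothetical illustration: the tuple tree shape, the strict no-mixed-types rule, and the LOAD/ADD/STORE instruction names stand in for whatever the target machine actually provides.

```python
def analyze(tree, types):
    """Pass 1: return the type of a subtree, or raise on a semantic error."""
    if isinstance(tree, str):                 # a leaf holding a variable name
        if tree not in types:
            raise NameError(f"{tree} is not declared")
        return types[tree]
    operator, left, right = tree
    left_type, right_type = analyze(left, types), analyze(right, types)
    if left_type != right_type:               # assumed rule: no mixed types
        raise TypeError(f"{operator}: {left_type} vs {right_type}")
    return left_type

def generate(tree):
    """Pass 2: emit hypothetical instructions for target = left + right.
    Only handles operands that are plain variable names."""
    _, target, (_, left, right) = tree
    return [f"LOAD {left}", f"ADD {right}", f"STORE {target}"]

def compile_statement(tree, types):
    analyze(tree, types)      # semantic analysis comes first
    return generate(tree)     # code is generated only if analysis succeeded

tree = ("=", "sum", ("+", "a", "b"))
print(compile_statement(tree, {"sum": "int", "a": "int", "b": "int"}))
```

Keeping the passes separate means no code is emitted for a statement that fails semantic analysis; with the declarations char a; double b; int sum; the first pass raises an error and the second never runs.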

  44. Example

  45. Example (cont’d)

  46. Example (cont’d)

  47. Example (cont’d)

  48. Example (cont’d)

  49. Code Optimization • To make the code more efficient: • Local optimization • Global optimization • Different from programmer optimization with compiler tools such as: • Visual development environments • On-line debuggers • Reusable code libraries

  50. Local Optimization • Look at a very small block of instructions and try to improve it. • Possible approaches • Constant evaluation: x=1+1; • Strength reduction: x=x*2; • Eliminating unnecessary operations
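Two of these approaches can be sketched in Python on expressions written as (operator, left, right) tuples. The representation, the function name `optimize`, and the choice to rewrite x*2 as x+x are assumptions; a real optimizer works on generated machine instructions and chooses rewrites based on the target machine's costs.

```python
def optimize(expr):
    """Apply two local optimizations to an (operator, left, right) tuple.
    Leaves are integer constants or variable names (strings)."""
    if not isinstance(expr, tuple):
        return expr                           # a constant or a variable name
    operator, left, right = expr
    left, right = optimize(left), optimize(right)
    # Constant evaluation: both operands are known at compile time.
    if isinstance(left, int) and isinstance(right, int):
        if operator == "+":
            return left + right
        if operator == "*":
            return left * right
    # Strength reduction: replace multiplication by 2 with an addition.
    if operator == "*" and right == 2 and isinstance(left, str):
        return ("+", left, left)
    return (operator, left, right)

print(optimize(("+", 1, 1)))      # right-hand side of x=1+1
print(optimize(("*", "x", 2)))    # right-hand side of x=x*2
```

The first call folds 1+1 into the constant 2 at compile time; the second turns the multiplication into the addition x+x, which is typically a cheaper instruction.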
