1 / 40

Compiler construction in4020 – course 2001/2002

Compiler construction in4020 – course 2001/2002. Koen Langendoen Delft University of Technology The Netherlands. Goals. understand the structure of a compiler understand how the components operate understand the tools involved scanner generator, parser generator, etc. understanding means

Download Presentation

Compiler construction in4020 – course 2001/2002

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Compiler constructionin4020 – course 2001/2002 Koen Langendoen Delft University of Technology The Netherlands

  2. Goals • understand the structure of a compiler • understand how the components operate • understand the tools involved • scanner generator, parser generator, etc. • understanding means • [theory] be able to read source code • [practice] be able to adapt/write source code

  3. Format:“werkcollege” + practicum • 14 x 2 hours of interactive lectures 1 sp • book “Modern Compiler Design” • schedule: see blackboard • handouts: see blackboard • assignment 2 sp • groups of 2 students • modify reference compiler • oral exam 1 sp

  4. Homework • find a partner for the “practicum” • register your group • send e-mail to koen@pds.twi.tudelft.nl

  5. compiler program in some source language executable code for target machine What is a compiler?

  6. semantic represen- tation front-end analysis back-end synthesis compiler What is a compiler? program in some source language executable code for target machine

  7. Why study compilerconstruction? • curiosity • better understanding of programming language concepts • wide applicability • transforming “data” is very common • many useful data structures and algorithms • practical application of “theory”

  8. Overviewlecture 1 • [introduction] • compiler structure • exercise ----------------- 15 min. break ---------------------- • lexical analysis • excercise

  9. program in some source language executable code for target machine front-end analysis back-end synthesis program in some source language executable code for target machine semantic represen- tation front-end analysis back-end synthesis compiler executable code for target machine back-end synthesis Compiler structure • L+M modules = LxM compilers

  10. program in some source language executable code for target machine front-end analysis back-end synthesis program in some source language executable code for target machine semantic represen- tation front-end analysis back-end synthesis compiler executable code for target machine back-end synthesis Limitations of modular approach • performance • generic vs specific • loss of information • variations must be small • same programming paradigm • similar processor architecture

  11. Semantic representation • heart of the compiler • intermediate code • linked lists of pseudo instructions • abstract syntax tree (AST) program in some source language executable code for target machine semantic represen- tation front-end analysis back-end synthesis compiler

  12. AST example • expression grammar expression  expression ‘+’ term | expression ‘-’ term | term term  term ‘*’ factor | term ‘/’ factor | factor factor  identifier | constant | ‘(‘ expression ‘)’ • example expression b*b – 4*a*c

  13. parse tree: b*b – 4*a*c expression expression ‘-’ term term term factor ‘*’ factor factor term term identifier ‘*’ ‘*’ identifier identifier factor factor ‘c’ identifier constant ‘b’ ‘a’ ‘b’ ‘4’

  14. AST: b*b – 4*a*c ‘-’ ‘*’ ‘*’ ‘b’ ‘b’ ‘*’ ‘c’ ‘4’ ‘a’

  15. identifier constant term expression annotated AST: b*b – 4*a*c type: real loc: reg1 ‘-’ type: real loc: reg1 type: real loc: reg2 ‘*’ ‘*’ type: real loc: sp+16 type: real loc: sp+16 type: real loc: reg2 type: real loc: sp+24 ‘b’ ‘b’ ‘*’ ‘c’ type: real loc: const type: real loc: sp+8 ‘4’ ‘a’

  16. AST exercise (5 min.) • expression grammar expression  expression ‘+’ term | expression ‘-’ term | term term  term ‘*’ factor | term ‘/’ factor | factor factor  identifier | constant | ‘(‘ expression ‘)’ • example expression b*b – (4*a*c) • draw parse tree and AST

  17. Answers

  18. answerparse tree: b*b – 4*a*c expression expression ‘-’ term term term factor ‘*’ factor factor term term identifier ‘*’ ‘*’ identifier identifier factor factor ‘c’ identifier constant ‘b’ ‘a’ ‘b’ ‘4’

  19. answerparse tree: b*b – (4*a*c) expression expression ‘-’ term term factor factor term expression ‘(’ ‘)’ ‘*’ identifier factor identifier ‘b’ ‘4*a*c’ ‘b’

  20. Break

  21. program text lexical analysis tokens syntax analysis AST front-end context handling annotated AST front-end: from program text to AST

  22. program text token description scanner generator lexical analysis tokens parser generator language grammar syntax analysis AST context handling annotated AST front-end: from program text to AST

  23. lex-i-cal: of or relating to words or the vocabulary of a language as distinguished from its grammar and construction Webster’s Dictionary Lexical analysis • covert stream of characters to stream of tokens • what is a token? • sequence of characters with a semantic notion, see language definition • rule of thumb: two characters belong to the same token if inserting white space changes the meaning. digit = *ptr++ - ’0’; digit = *ptr+ + - ’0’;

  24. Lexical analysis • covert stream of characters to stream of tokens • what is a token? • sequence of characters with a semantic notion, see language definition • rule of thumb: two characters belong to the same token if inserting white space changes the meaning. digit = *ptr++ - ’0’; digit = *ptr+ + - ’0’;

  25. Tokens • attributes • type • lexeme • value • file position • examples typedef struct { int class; char *repr; file_pos position; } Token_Type;

  26. Non-tokens • white spaces spaces, tabs, newlines • comments /* a C-style comment */ // a C++ comment • preprocessor directives #include “lex.h” #define is_digit(d) (’0’ <= (d) && (d) <= ’9’)

  27. Regular expressions

  28. Examples ofregular expressions • an integer is a sequence of digits: [0-9]+ • an identifier is a sequence of letters and digits; the first character must be a letter: [a-z][a-z0-9]*

  29. Regular descriptions • structuring regular expressions by introducing named sub expressions letter [a-zA-Z] digit  [0-9] letter_or_digit  letter | digit identifier  letter letter_or_digit* • define before use

  30. Exercise (5 min.) • write down regular descriptions for the following descriptions: • an integral number is a non-zero sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal). • a fixed-point number is an (optional) sequence of digits followed by a dot (’.’) followed by a sequence of digits. • an identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter, but may not be used as the first or last character.

  31. Answers

  32. Answers base  [bo] integral_number  digit+ base? dot  \. fixed_point_number digit* dot digit+ letter [a-zA-Z] digit  [0-9] underscore  _ letter_or_digit  letter | digit letter_or_digit_or_und  letter_or_digit| underscore identifier  letter(letter_or_digit_or_und*letter_or_digit+)?

  33. get_next_token() Lexical analysis • covert stream of characters to stream of tokens • tokens are defined by a regular description • tokens are demanded one-by-one by the syntax analyzer program text AST lexical analyzer syntax analyzer tokens

  34. interface extern Token_Type Token; /* Global variable that holds the current token. */ void start_lex(void); /* Must be called before the first call to * get_next_token(). */ void get_next_token(void); /* Load the next token into the global * variable Token. */

  35. lexical analysis by hand • read complete program text into memory for simplicity • avoids buffering and arbitrary limits • variable length tokens • get_next_token() dispatches on the next character dot input: main() { printf( ”hello world\n”);}

  36. void get_next_token(void) { int start_dot; skip_layout_and_comment(); /* now we are at the start of a token or at end-of-file, so: */ note_token_position(); /* split on first character of the token */ start_dot = dot; if (is_end_of_input(input_char)) { Token.class = EoF; Token.repr = "<EoF>"; return; } if (is_letter(input_char)) {recognize_identifier();} else if (is_digit(input_char)) {recognize_integer();} else if (is_operator(input_char) || is_separator(input_char)) { Token.class = input_char; next_char(); } else {Token.class = ERRONEOUS; next_char();} Token.repr = input_to_zstring(start_dot, dot-start_dot); }

  37. Character classification & token recognition #define is_end_of_input(ch) ((ch) == '\0') #define is_layout(ch) (!is_end_of_input(ch) && (ch) <= ' ') #define is_uc_letter(ch) ('A' <= (ch) && (ch) <= 'Z') #define is_lc_letter(ch) ('a' <= (ch) && (ch) <= 'z') #define is_letter(ch) (is_uc_letter(ch) || is_lc_letter(ch)) #define is_digit(ch) ('0' <= (ch) && (ch) <= '9') #define is_letter_or_digit(ch) (is_letter(ch) || is_digit(ch)) #define is_underscore(ch) ((ch) == '_') #define is_operator(ch) (strchr("+-*/", (ch)) != NULL) #define is_separator(ch) (strchr(";,(){}", (ch)) != NULL) void recognize_integer(void) { Token.class = INTEGER; next_char(); while (is_digit(input_char)) {next_char();} }

  38. Summary • compiler is a structured toolbox • front-end: program text  annotated AST • back-end: annotated AST  executable code • lexical analysis: program text  tokens • token specifications • implementation by hand • exercises • AST • regular descriptions

  39. program text token description scanner generator lexical analysis tokens syntax analysis AST context handling annotated AST Next week • Generating a lexical analyzer • generic methods • specific tool lex

  40. Homework • find a partner for the “practicum” • register your group • send e-mail to koen@pds.twi.tudelft.nl • print handout lecture 2 [blackboard]

More Related