
Today’s Agenda


grover


Presentation Transcript


  1. Today’s Agenda
     • Compilation > Syntax Analysis > Lexical Analysis

  2. Typical Compiler - Phases
     [Figure: block diagram of a compiler taking P in L to P’ in M]
     • Front End: Scanner, Parser, Semantic Analyzer, Type Checker, IR Generator
     • IR: Optimizers
     • Back End: Instruction Selector, Instruction Scheduler, Register Allocator, Code Emitter

  3. Syntax Analysis
     • Typically includes parsing and scanning
     • Syntax specification for a natural language
       • Grammar for sentences and paragraphs
       • Words are defined by convention
     • Syntax specification for a PL
       • Grammar for constructs: expressions, statements, declarations/definitions (of procedures/variables/types/modules etc.)
       • Typically a context-free grammar covers most of it
       • Words and symbols are precisely defined using regular expressions

  4. Lexical Analysis
     • Requirement specification:
       • Input – program as a text stream
       • Output – token stream
       • Chaff (what’s filtered out) – comments, white space (blanks, tabs, newlines)
       • Error – invalid token, invalid comment
     • Example use case scenario:
       • Parser requests Scanner: getNextToken()
       • Scanner returns a token if one is available
       • The token is a terminal symbol in the grammar (which defines the language recognized by the parser)

  5. Lexical Analysis
     Input program:
         module main
         import math.*;
         /* Function main */
         var main : integer --> integer;
         main := fun (in x : integer) ret integer { return f(x); }
         endmodule
     Token stream:
         TK_KEY_MODULE, TK_IDENT (main),
         TK_KEY_IMPORT, TK_IDENT (math), TK_PERIOD, TK_STAR, TK_SEMI,
         TK_KEY_VAR, TK_IDENT (main), TK_COLON, TK_KEY_INTEGER, TK_LONGARROW, TK_KEY_INTEGER, TK_SEMI,
         TK_IDENT (main), TK_ASSIGN, TK_KEY_FUN, TK_LPAREN, TK_KEY_IN, TK_IDENT (x), TK_COLON, TK_KEY_INTEGER, TK_RPAREN, TK_KEY_RET, TK_KEY_INTEGER,
         TK_LBRAC, TK_KEY_RETURN, TK_IDENT (f), TK_LPAREN, TK_IDENT (x), TK_RPAREN, TK_SEMI, TK_RBRAC,
         TK_KEY_ENDMOD

  6. Lexical Analysis
     • Special cases (the C programming language):
       • Uses a pre-processor (or macro-processor) before lexical analysis
       • Processes instructions of the form
         • #define max 1000
         • #include <stdio.h>
         • #define f(x) (x x)
         • #ifdef (x)

  7. Lexical Analysis
     • Token definitions
       • Most tokens are fixed text strings, specified by singleton regular languages
         • Examples: “(”, “->”, “>>”
       • Special mentions: identifiers, literals (numbers, strings, characters), comments
       • Identifiers: ALP (ALP | DIG)*
         • ALP is an alphabetic character; DIG is a digit character
         • Some languages allow some special characters
           • C allows _
           • Scheme allows almost any printable character
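
The identifier definition above can be tried out with Java's regular expression library. This is only a sketch: it assumes ALP means an ASCII letter and DIG an ASCII digit, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class IdentifierPattern {
    // ALP (ALP | DIG)* with ALP = ASCII letter and DIG = ASCII digit (assumed alphabets).
    static final Pattern PLAIN = Pattern.compile("[A-Za-z][A-Za-z0-9]*");
    // C-style variant: the underscore is treated as one more alphabetic character.
    static final Pattern C_STYLE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isPlainIdentifier(String s) {
        return PLAIN.matcher(s).matches();
    }

    static boolean isCIdentifier(String s) {
        return C_STYLE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isPlainIdentifier("x1"));   // starts with a letter
        System.out.println(isPlainIdentifier("1x"));   // starts with a digit
        System.out.println(isPlainIdentifier("_tmp")); // underscore is not in ALP
        System.out.println(isCIdentifier("_tmp"));     // accepted by the C-style variant
    }
}
```

Only the character classes change between the two variants, which is why a language can widen its identifier syntax without touching the shape of the rule.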

  8. Lexical Analysis - Implementation
     • Token definitions:
       • (Reserved) keywords:
         • A specific form of identifier, reserved in some languages (e.g. C, Java) but not in others (Scheme, FORTRAN IV)
       • Numbers: DIG+ (“.” DIG+)?
         • At least one leading digit is required; DIG* (“.” DIG+)? would also accept the empty string
       • Exercise: write a regular expression for C-style comments!
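
A quick way to check a number pattern is to run it against a few edge cases. The sketch below compares DIG* (“.” DIG+)? with the stricter DIG+ (“.” DIG+)?, assuming DIG is an ASCII digit; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

class NumberPattern {
    // DIG* ("." DIG+)? : the leading digits are optional.
    static final Pattern STAR_FORM = Pattern.compile("[0-9]*(\\.[0-9]+)?");
    // DIG+ ("." DIG+)? : at least one leading digit is required.
    static final Pattern PLUS_FORM = Pattern.compile("[0-9]+(\\.[0-9]+)?");

    static boolean matchesStarForm(String s) {
        return STAR_FORM.matcher(s).matches();
    }

    static boolean matchesPlusForm(String s) {
        return PLUS_FORM.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"42", "3.14", ".5", "3.", ""}) {
            System.out.println("\"" + s + "\" star=" + matchesStarForm(s)
                    + " plus=" + matchesPlusForm(s));
        }
    }
}
```

The two forms differ on ".5" and on the empty string. A scanner should never produce a token from zero characters, so the star form needs an extra non-empty check if it is used.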

  9. Lexical Analysis
     • Implementation (from scratch)
       • Construct a finite automaton (first an NFA, then convert it into a DFA)
       • Use a loop and a switch (on the character) to model the DFA’s transitions
       • Note that a state of the DFA summarizes the string accumulated so far, and a final state also identifies the token type
       • Each token is defined by a regular expression and hence has an equivalent DFA
       • The scanner is the mega-DFA that recognizes the union of all the token languages
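
The loop-and-switch idea can be sketched for a tiny union of two token languages, identifiers and unsigned integers. This is an illustration under assumptions, not a full scanner: it switches on the current state and tests the character inside each case, the state names are invented, and TK_NUMBER is a hypothetical token name (TK_IDENT comes from the earlier example).

```java
class DfaSketch {
    enum State { START, IN_IDENT, IN_NUMBER, ERROR }

    // Runs the DFA over the whole input and returns the token type named by
    // the final state, or "INVALID" if the run ends in a non-final state.
    static String classify(String input) {
        State state = State.START;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (state) {
                case START:
                    if (Character.isLetter(c)) state = State.IN_IDENT;
                    else if (Character.isDigit(c)) state = State.IN_NUMBER;
                    else state = State.ERROR;
                    break;
                case IN_IDENT:
                    if (!Character.isLetterOrDigit(c)) state = State.ERROR;
                    break;
                case IN_NUMBER:
                    if (!Character.isDigit(c)) state = State.ERROR;
                    break;
                case ERROR:
                    break; // dead state: no transition leaves it
            }
        }
        switch (state) {
            case IN_IDENT:  return "TK_IDENT";
            case IN_NUMBER: return "TK_NUMBER";
            default:        return "INVALID"; // START and ERROR are not final
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("x1"));
        System.out.println(classify("42"));
        System.out.println(classify("4x"));
    }
}
```

Each final state belongs to exactly one token language here, so reaching it is enough to name the token type.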

  10. Lexical Analysis - Implementation
     • Implementation of FA:
       • Avoid “goto” statements
       • “Goto” statements are harmful:
         • Reduced readability => reduced maintainability
           • Refer to Dijkstra’s article and Knuth’s article
         • Efficiency issues on modern platforms
           • Pipeline interruptions
           • Instruction pre-fetch / cache interruption
           • Page faults
         • Common principle: violation of locality of reference (of instructions)
         • Food for thought: analogous violation of locality of reference (of data)?

  11. Lexical Analysis - Implementation
     • Look-ahead
       • One-character look-ahead is often enough
         • E.g. > and >= in C
       • Multi-character look-ahead is required in some cases:
         • E.g. distinguishing an (unreserved) keyword from an identifier
       • Question: how many look-ahead characters are needed to scan Java expressions? Consider (>, >=, >>, >>=, >>>, >>>=)
     • Look-ahead strategy:
       • Most common look-ahead cases are one-character look-ahead cases =>
         • Just use a single-character look-ahead in the implementation
         • Special-case multi-character look-ahead
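
For the operator family in the question, a longest-match scan can be sketched as below. The sketch relies on the observation that every proper prefix of these operators is itself an operator in the set, so each decision only peeks at the next character. The class and method names are hypothetical, and the NUL character is assumed not to occur in the source text.

```java
class ShiftOperatorScanner {
    // Returns the longest operator from {>, >=, >>, >>=, >>>, >>>=} starting
    // at pos, or null when the character at pos is not '>'.
    static String scanOperator(String src, int pos) {
        if (peek(src, pos) != '>') return null;
        int end = pos + 1;
        // Consume up to two more '>' characters, peeking one character at a time.
        while (end - pos < 3 && peek(src, end) == '>') end++;
        // An optional '=' closes the operator.
        if (peek(src, end) == '=') end++;
        return src.substring(pos, end);
    }

    // One-character look-ahead; '\0' stands for end of input (an assumption).
    static char peek(String src, int i) {
        return i < src.length() ? src.charAt(i) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(scanOperator("a >>>= b", 2));
        System.out.println(scanOperator("a>>b", 1));
        System.out.println(scanOperator("x >= y", 2));
    }
}
```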

  12. Lexical Analysis - Implementation
     • Use buffered I/O
       • Reduced I/O time due to amortized latency
       • Scanner scans the buffer
     • How to handle the end of the buffer?
       • Partial token at the end of the buffer
       • Look-ahead at the end of the buffer
     • The end of the buffer may contain an incomplete token, and reading in a new buffer may block the scanner:
       • Use twin buffers as a circular queue
       • See the reference book (Aho, Sethi, Ullman) for details on buffering schemes
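
A minimal sketch of the twin-buffer idea follows, assuming a forward pointer that wraps around two equal halves and a synchronous refill of the other half each time a boundary is crossed. It is not the scheme from the reference book verbatim: there is no lexeme-start pointer and no sentinel character, and all names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class TwinBuffer {
    private final Reader in;
    private final int half;                  // size of each half
    private final char[] buf;                // the two halves, back to back
    private final int[] limit = new int[2];  // number of valid chars in each half
    private int forward = 0;                 // index of the next char to return

    TwinBuffer(Reader in, int half) throws IOException {
        this.in = in;
        this.half = half;
        this.buf = new char[2 * half];
        fill(0);
    }

    // Fills half h completely, or up to end of input.
    private void fill(int h) throws IOException {
        int n = 0;
        while (n < half) {
            int r = in.read(buf, h * half + n, half - n);
            if (r < 0) break;
            n += r;
        }
        limit[h] = n;
    }

    // Returns the next character, or -1 at end of input.
    int next() throws IOException {
        int h = forward / half;
        if (forward % half >= limit[h]) return -1;  // a short half means EOF
        char c = buf[forward];
        forward = (forward + 1) % (2 * half);       // wrap like a circular queue
        // On crossing a boundary, reload the half being entered; the half just
        // left stays intact, so a token of up to `half` chars remains available.
        if (forward % half == 0) fill(forward / half);
        return c;
    }

    // Convenience helper for the demo: drains a string through the buffer.
    static String readAll(String text, int half) {
        try {
            TwinBuffer tb = new TwinBuffer(new StringReader(text), half);
            StringBuilder sb = new StringBuilder();
            for (int c = tb.next(); c != -1; c = tb.next()) sb.append((char) c);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readAll("module main", 4));
    }
}
```

The refill here still blocks the caller; overlapping the read with scanning would need a second thread or asynchronous I/O, which this sketch leaves out.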

  13. Lexical Analysis
     • Implementation (using tools/libraries)
       • E.g. 1: Lex - a lexical analyzer generator
         • Given a set of tokens (as regular expressions), generates a scanner recognizing the tokens (as a C program, say)
         • See the reference book (Aho, Sethi, Ullman) for buffering details
       • E.g. 2: Use a tokenizer
         • Primitive scanning - serves the purpose in limited implementations
         • See the Java API (java.io.StreamTokenizer; java.util.StringTokenizer)
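
The two Java tokenizer classes can be exercised as follows. This is a sketch using their default settings only; the wrapper class and method names are made up, and note that by default StreamTokenizer treats '/' as a comment character and parses numbers as doubles.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Splits text with java.io.StreamTokenizer using its default syntax table.
    static List<String> streamTokens(String text) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);                      // word token
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));      // numeric token (a double)
                } else {
                    out.add(String.valueOf((char) st.ttype)); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Splits text on white space with java.util.StringTokenizer.
    static List<String> stringTokens(String text) {
        StringTokenizer st = new StringTokenizer(text);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(streamTokens("return f(x);"));
        System.out.println(stringTokens("var main : integer"));
    }
}
```

StringTokenizer only splits on delimiters, so "f(x);" would come back as one piece; StreamTokenizer separates words from punctuation but knows nothing about multi-character operators, which is the sense in which both are primitive.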

  14. Lexical Analysis
     • Design specification: a module (class in Java) with the interfaces (public methods in Java)
       • Token nextToken(); // returns the next token
       • boolean hasMoreTokens();
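
One possible shape for such a module is sketched below. Only the two method signatures come from the slide; the Token fields, the class name SketchScanner, the tiny token set and the TK_INVALID error token are assumptions for illustration, with the other token names borrowed from the earlier example.

```java
import java.util.NoSuchElementException;

class Token {
    final String type;  // e.g. TK_IDENT
    final String text;  // the matched lexeme

    Token(String type, String text) {
        this.type = type;
        this.text = text;
    }
}

class SketchScanner {
    private final String src;
    private int pos = 0;

    SketchScanner(String src) {
        this.src = src;
    }

    // True if a token remains after skipping white space (the chaff).
    public boolean hasMoreTokens() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
        return pos < src.length();
    }

    // Returns the next token; throws if the input is exhausted.
    public Token nextToken() {
        if (!hasMoreTokens()) throw new NoSuchElementException("end of input");
        char c = src.charAt(pos);
        if (Character.isLetter(c)) {
            int start = pos;
            while (pos < src.length() && Character.isLetterOrDigit(src.charAt(pos))) pos++;
            String word = src.substring(start, pos);
            // A keyword is an identifier that appears in a reserved list.
            return new Token(word.equals("return") ? "TK_KEY_RETURN" : "TK_IDENT", word);
        }
        pos++;
        switch (c) {
            case '(': return new Token("TK_LPAREN", "(");
            case ')': return new Token("TK_RPAREN", ")");
            case ';': return new Token("TK_SEMI", ";");
            case ':': return new Token("TK_COLON", ":");
            default:  return new Token("TK_INVALID", String.valueOf(c)); // hypothetical error token
        }
    }

    // Demo helper: the token types of src, separated by spaces.
    static String scanAll(String src) {
        SketchScanner s = new SketchScanner(src);
        StringBuilder sb = new StringBuilder();
        while (s.hasMoreTokens()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(s.nextToken().type);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(scanAll("return f(x);"));
    }
}
```

A parser would drive this with a loop of hasMoreTokens() and nextToken() calls, which matches the use case where the parser requests one token at a time.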
