1 / 40

CS 153: Concepts of Compiler Design October 22 Class Meeting

CS 153: Concepts of Compiler Design October 22 Class Meeting. Department of Computer Science San Jose State University Fall 2014 Instructor: Ron Mak www.cs.sjsu.edu/~mak. Backus Naur Form (BNF). A text-based way to describe source language syntax.

Download Presentation

CS 153: Concepts of Compiler Design October 22 Class Meeting

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 153: Concepts of Compiler DesignOctober 22 Class Meeting Department of Computer ScienceSan Jose State UniversityFall 2014Instructor: Ron Mak www.cs.sjsu.edu/~mak

  2. Backus Naur Form (BNF) • A text-based way to describe source language syntax. • Named after John Backus and Peter Naur. • Text-based means it can be read by a program. • Example: A compiler-compilerthat can automatically generate a parser for a source language after reading (and parsing) the language’s syntax rules written in BNF.

  3. Backus Naur Form (BNF), cont’d • Uses certain meta-symbols. • Symbols that are part of BNF itself but are not necessarily part of the syntax of the source language.

  4. BNF Example: U.S. Postal Address <postal-address> ::= <name-part> <street-part> <city-state-part> <name-part> ::= <first-part> <last-name> | <first-part> <last-name> <suffix> <first-part> ::= <first-name> | <capital-letter> . <suffix> ::= Sr. | Jr. | <roman-numeral> <street-part> ::= <house-number> <street-name> | <house-number> <street-name> <apartment-number> <city-state-part> ::= <city-name> , <state-code> <ZIP-code> <first-name> ::= <name> <last-name> ::= <name> <street-name> ::= <name> <city-name> ::= <name> <house-number> ::= <number> <apartment-number> ::= <number> <state-code> ::= <capital-letter> <capital-letter> <capital-letter> ::= A|B|C|D|E|F|G|H|I|J|K|L|M |N|O|P|Q|R|S|T|U|V|W|X|Y|Z <name> ::= … <number> ::= … etc. Mary Jane 123 Easy StreetSan Jose, CA 95192

  5. BNF: Optional and Repeated Items • To show optional items in BNF, use the vertical bar |. • Example: “An expression is a simple expression optionally followed by a relational operator and another simple expression.” • <expression> ::= <simple expression> | <simple expression> <rel op> <simple expression>

  6. BNF: Optional and Repeated Items • BNF uses recursion for repeated items. • Example: “A digit sequence is a digit followed by zero or more digits.” • <digit sequence> ::= <digit> | <digit> <digit sequence> Right recursive • <digit sequence> ::= <digit> | <digit sequence> <digit> Left recursive

  7. BNF Example: Pascal Number <digit sequence> ::= <digit> | <digit> <digit sequence> <unsigned integer> ::= <digit sequence> <unsigned real> ::= <unsigned integer>.<digit sequence> | <unsigned integer>.<digit sequence> <e> <scale factor> | <unsigned integer > <e> <scale factor> <scale factor> ::= <unsigned integer> | <sign> <unsigned integer> <e> ::= E | e <sign> ::= + | - <number> ::= <unsigned integer> | <unsigned real> Repetition via recursion. The sign is optional.

  8. BNF Example: Pascal IF Statement • It should be straightforward to write a parsing method from either the syntax diagram or the BNF. <if statement> ::= IF <expression> THEN <statement> | IF <expression> THEN <statement> ELSE <statement>

  9. Grammars and Languages • A grammar defines a language. • Grammar =the set of all the BNF rules (or syntax diagrams) • Language = the set of all the legal strings of tokens according to the grammar • Legal string of tokens = a syntactically correct statement

  10. Grammars and Languages • A statement is in the language (it’s syntactically correct) if it can be derived by the grammar. • Each grammar rule “produces”a token string in the language. • The sequence of productions required to arrive at a syntactically correct token string is the derivation of the string._

  11. Grammars and Languages, cont’d • Example: A very simplified expression grammar: • What strings (expressions) can we derive from this grammar? <expr> ::= <digit> | <expr> <op> <expr> | ( <expr> ) <digit> ::= 0|1|2|3|4|5|6|7|8|9 <op> ::= + | *

  12. Derivations and Productions • Is (1 + 2)*3 valid in our expression language? PRODUCTION GRAMMAR RULE <expr> <expr> <op> <expr> <expr> ::= <expr> <op> <expr>  <expr> <op> <digit> <expr> ::= <digit>  <expr> <op> 3 <digit> ::= 3  <expr>*3 <op> ::= *  ( <expr> )*3 <expr> ::= ( <expr> )  ( <expr> <op> <expr> )*3 <expr> ::= <expr> <op> <expr>  ( <expr> <op> <digit> )*3 <expr> ::= <digit>  ( <exp> <op> 2 )*3 <digit> ::= 2  ( <expr> + 2 )*3 <op> ::= +  ( <digit> + 2 )*3 <expr> ::= <digit>  ( 1 + 2 )*3 <digit> ::= 1 <expr> ::= <digit> | <expr> <op> <expr> | ( <expr> )<digit> ::= 0|1|2|3|4|5|6|7|8|9<op> ::= + | * • Yes! The expression is valid.

  13. Extended BNF (EBNF) • Extended BNF (EBNF) adds meta-symbols { } and [ ] • Originally developed by Niklaus Wirth. • Inventor of Pascal. • Early user of syntax diagrams.

  14. Extended BNF (EBNF) • Repetition (one or more): • BNF: • EBNF: • <digit sequence> ::= <digit> | <digit> <digit sequence> • <digit sequence> ::= <digit> { <digit> }

  15. Extended BNF, cont’d • Optional items. • BNF: • EBNF: • <if statement> ::= IF <expression> THEN <statement> | IF <expression> THEN <statement> ELSE <statement> • <if statement> ::= IF <expression> THEN <statement> [ ELSE <statement> ]

  16. Extended BNF, cont’d • Optional items. • BNF: • EBNF: • <expression> ::= <simple expression> | <simple expression> <rel op> <simple expression> • <expression> ::= <simple expression> [ <rel op> <simple expression> ]

  17. Compiler-Compilers • Professional compiler writers generally do not write scanners and parsers from scratch. • A compiler-compiler is a tool for writing compilers. • It can include: • A scanner generator • A parser generator • Parse tree utilities

  18. Compiler-Compilers, cont’d • Feed a compiler-compiler a grammar written in a textual form such as BNF or EBNF. • The compiler-compiler outputs code written in a high-level language that implements a scanner, parser, and a parse tree._

  19. Popular Compiler-Compilers • Yacc • “Yet another compiler-compiler” • Generates a bottom-up parser written in C. • GNU version: Bison • Lex • Generates a scanner written in C. • GNU version: Flex • JavaCC • Generates a scanner and a top-down parser written in Java. • JJTree • Preprocessor for JavaCC grammars. • Generates Java code to build and process parse trees. The code generated by a compiler-compiler can be in a high level language such as C or Java. However, you may find the code to be ugly and hard to read.

  20. JavaCC Compiler-Compiler • Feed JavaCC the grammar for a source language and it will automatically generate a scanner and a parser. • Specify the source language tokens with regular expressions  JavaCC generates a scanner for the source language. • Specify the source language syntax rules with Extended BNF  JavaCC generates a parser for the source language.

  21. JavaCC Compiler-Compiler, cont’d • The generated scanner and parser are written in Java. • Note: JavaCC calls the scanner the “tokenizer”._

  22. Download and Install • Download and install JavaCCand JJTree • http://javacc.java.net/ • Download the examples from the book Generating Parsers with JavaCC • http://generatingparserswithjavacc.com/examples.html

  23. JavaCC Regular Expressions • Literals • <HELLO : "hello"> • Character classes • <SPACE_OR_COMMA : [" ", ","]> • Character ranges • <LOWER_CASE : ["a"-"z"]> • Alternates • <UPPER_OR_LOWER : ["A"-"Z"] | ["a"-"z"]> Token name Token string

  24. JavaCC Regular Expressions • Negation • <NO_DIGITS : ~["0"-"9"]> • Repetition • <THREE_A : ("a"){3}> • <TWO_TO_FOUR_A : ("a"){2,4}> • Quantifiers • <ONE_OR_MORE_A : ("a")+> • <ZERO_OR_ONE_SIGN : (["+", "-"])?> • <ZERO_OR_MORE_DIGITS : (["0"-"9"])*>

  25. Example: helloworld1.jj • Recognize tokens “hello” and “world”. options { BUILD_PARSER=false; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { java.io.StringReadersr = new java.io.StringReader(args[0]); SimpleCharStreamscs = new SimpleCharStream(sr); HelloWorldTokenManagermgr = new HelloWorldTokenManager(scs); for (Token t = mgr.getNextToken(); t.kind != EOF; t = mgr.getNextToken()) { debugStream.println("Found token: " + t.image); } } } TOKEN : { <HELLO : "hello"> | <WORLD : "world"> } Main Java methodof the tokenizer Literals Demo

  26. Example: helloworld2.jj options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { java.io.StringReadersr = new java.io.StringReader(args[0]); SimpleCharStreamscs = new SimpleCharStream(sr); HelloWorldTokenManagermgr = new HelloWorldTokenManager(scs); while (mgr.getNextToken().kind != EOF) {} } } SKIP : { <IGNORE : [" ", ","]> } TOKEN : { <HELLO : "hello"> { debugStream.println("HELLO token: " + matchedToken.image); } | <WORLD : "world"> { debugStream.println("WORLD token: " + matchedToken.image); } } • Recognize tokens “hello” and “world” • Ignore case. • Ignore spaces and commas between tokens. • Use lexical actions. Character class Demo Lexical actions

  27. Example: helloworld3.jj • Recognize words other than “hello” and “world” as identifiers. options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { ... } } SKIP : { <IGNORE : [" ", ","]> } TOKEN : { <HELLO : "hello"> { debugStream.println("HELLO token: " + matchedToken.image); } | <WORLD : "world"> { debugStream.println("WORLD token: " + matchedToken.image); } | <IDENTIFIER : ["a"-"z"](["a"-"z", "0"-"9", "_"])*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } } Why aren’t “hello” and “world” recognized as identifiers? Demo

  28. Example: helloworld4.jj options { BUILD_PARSER=false; IGNORE_CASE=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { ... } } SKIP : { <IGNORE : [" ", ","]> } TOKEN : { <HELLO : "hello"> { debugStream.println("HELLO token: " + matchedToken.image); } | <WORLD : "world"> { debugStream.println("WORLD token: " + matchedToken.image); } | <IDENTIFIER : <LETTER> (<LETTER> | <DIGIT> | "_")*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } | <#DIGIT : ["0"-"9"]> | <#LETTER : ["a"-"z"]> } • Use private tokensDIGIT and LETTER. Demo Private tokens

  29. Example: helloworld5.jj • What’s the tokenizer doing? • Turn on debugging output. options { BUILD_PARSER=false; IGNORE_CASE=true; DEBUG_TOKEN_MANAGER=true; } PARSER_BEGIN(HelloWorld) public class HelloWorld {} PARSER_END(HelloWorld) TOKEN_MGR_DECLS : { public static void main(String[] args) { ... } } SKIP : { <IGNORE : [" ", ","]> } TOKEN : { <HELLO : "hello"> { debugStream.println("HELLO token: " + matchedToken.image); } | <WORLD : "world"> { debugStream.println("WORLD token: " + matchedToken.image); } | <IDENTIFIER : <LETTER> (<LETTER> | <DIGIT> | "_")*> { debugStream.println("IDENTIFIER token: " + matchedToken.image); } | <#DIGIT : ["0"-"9"]> | <#LETTER : ["a"-"z"]> } Demo

  30. The JavaCC Eclipse Plug-in • How to install the plug-in: • Start Eclipse. • Help  Install New Software ... • Click the Add button • Name: SF Eclipse JavaCCLocation: http://eclipse-javacc.sourceforge.net/ • Click the OK button. • Select “SF JavaCC Eclipse Plug-in” • Or, visithttp://sourceforge.net/projects/eclipse-javacc/

  31. letter [other] letter 0 1 digit - digit + E digit digit digit digit . + [other] digit digit E 4 6 7 9 10 11 3 5 8 12 2 - digit [other] [other] Review: DFA for a Pascal Identifier or Number private static final intmatrix[][] = { /* letter digit + - . E other */ /* 0 */{ 1, 4, 3, 3, ERR, 1, ERR}, /* 1 */{ 1, 1, -2, -2, -2, 1, -2 }, /* 2 */{ERR, ERR, ERR, ERR, ERR, ERR, ERR}, /* 3 */{ERR, 4, ERR, ERR, ERR, ERR, ERR}, /* 4 */{ -5, 4, -5, -5, 6, 9, -5 }, /* 5 */{ERR, ERR, ERR, ERR, ERR, ERR, ERR}, /* 6 */{ERR, 7, ERR, ERR, ERR, ERR, ERR}, /* 7 */{ -8, 7, -8, -8, -8, 9, -8 }, /* 8 */{ERR, ERR, ERR, ERR, ERR, ERR, ERR}, /* 9 */{ERR, 11, 10, 10, ERR, ERR, ERR}, /* 10 */ {ERR, 11, ERR, ERR, ERR, ERR, ERR}, /* 11 */{ -12, 11, -12, -12, -12, -12, -12 }, /* 12 */{ERR, ERR, ERR, ERR, ERR, ERR, ERR}, }; Negative numbers in the matrix are accepting states.

  32. The JavaCC Tokenizer • The JavaCCtokenizer (scanner) is a DFA created from the regular expressions. • The generated Java code implements state-transitions with switch statements instead of a matrix._

  33. The JavaCCTokenizer, cont’d • Example (from HelloWorldTokenManager.java): static private intjjMoveStringLiteralDfa0_0() { switch(curChar) { case 72: case 104: return jjMoveStringLiteralDfa1_0(0x4L); case 87: case 119: return jjMoveStringLiteralDfa1_0(0x8L); default : return jjMoveNfa_0(0, 0); } }

  34. Assignment #5 • Use JavaCC to create a simple Java tokenizer(i.e., scanner). • Write the .jj file. • Due Friday, October 31.

  35. Compiler Team Project • Write a compiler for a procedure-oriented source language that will generate code for the Java virtual machine (JVM). • The source language should be a procedural, non- object-oriented language or subset thereof, or a language that the team invents. • Tip: Start with a simple language! • No Scheme, Lisp, or Lisp-like languages.__

  36. Compiler Team Project • The object language must be Jasmin, the assembly language for the Java virtual machine. • You will be provided an assembler that translates Jasmin assembly language programs into .class files for the JVM. • You must use the JavaCCcompiler-compiler. • You can also use any Java code from the Writing Compilers and Interpreters book. • Compile and run source programs written in your language.

  37. Compiler Team Project Deliverables • Java and JavaCCsource files of a working compiler. • The JavaCC.jj (or .jjt) grammar files. • Do not include the Java files that JavaCC generated.

  38. Compiler Team Project Deliverables, cont’d • Written report (5-10 pp. single spaced) • Include: Syntax diagrams for key source language constructs. • Include: Code diagrams for key source language constructs. • Optional: UML diagramsfor key compiler components.

  39. Compiler Team Project Deliverables, cont’d • Instructions on how to build your compiler. • If it’s not standard or not obvious. • Instructions on how to run your compiler (scripts OK). • Sample source programs to compile and execute.

  40. Post Mortem Report • Private individual post mortem report(up to 1 page from each student) • What did you learn from this course? • An assessment of your accomplishments for your project. • An assessment of each of your project team members.

More Related