Using JavaCC

CMSC 431

Automating Lexical Analysis: Overall Picture

[Figure: String stream -> Java scanner program -> Tokens. Scanner generator: RE -> NFA -> DFA -> Minimize DFA -> Simulate DFA]


Building Faster Scanners from the DFA

Table-driven recognizers waste a lot of effort

  • Read (& classify) the next character

  • Find the next state

  • Assign to the state variable

  • Branch back to the top

    We can do better

  • Encode state & actions in the code

  • Do transition tests locally

  • Generate ugly, spaghetti-like code

    (it is OK, this is automatically generated code)

  • Takes (many) fewer operations per input character

state = s0;
string = ε;
char = get_next_char();
while (char != eof) {
    state = δ(state, char);
    string = string + char;
    char = get_next_char();
}
if (state in Final) then
    report acceptance;
else
    report failure;
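To make the contrast concrete, here is a minimal Java sketch of both styles for the pattern [0-9]+. The class name, the state numbering, and the transition table are illustrative assumptions; this is not code produced by JavaCC.

```java
// Two recognizers for the pattern [0-9]+ (all names here are hypothetical).
class NumeralRecognizers {
    // States: 0 = start, 1 = accepting (at least one digit seen), 2 = error sink.
    // Column 0 = digit, column 1 = any other character.
    private static final int[][] DELTA = {
        {1, 2},   // from state 0
        {1, 2},   // from state 1
        {2, 2},   // from state 2
    };

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Table-driven: classify the character, look up the next state,
    // assign it, and branch back to the top, once per character.
    static boolean tableDriven(String input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            int charClass = isAsciiDigit(input.charAt(i)) ? 0 : 1;
            state = DELTA[state][charClass];
        }
        return state == 1;
    }

    // Direct-coded: the current state is implied by the position in the code,
    // so there is no state variable and no table lookup.
    static boolean directCoded(String input) {
        int i = 0;
        int n = input.length();
        // State 0: exactly one digit is required.
        if (i >= n || !isAsciiDigit(input.charAt(i))) {
            return false;
        }
        i++;
        // State 1: loop while digits keep coming.
        while (i < n && isAsciiDigit(input.charAt(i))) {
            i++;
        }
        // Accept only if the whole input was consumed.
        return i == n;
    }
}
```

Both methods accept the same strings; the direct-coded version does the transition tests locally instead of indexing a table.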

Inside lexical analyzer generator

  • How does a lexical analyzer generator work?

    • Get input from the user, who defines tokens in a form that is equivalent to a regular grammar

    • Turn the regular grammar into an NFA

    • Convert the NFA into a DFA

    • Generate the code that simulates the DFA
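The NFA-to-DFA step is usually done with the subset construction. The following Java sketch is one possible way to write it, under stated assumptions: the class and method names are hypothetical, and the hard-coded NFA recognizes (a|b)*abb with an extra start state 4 joined to state 0 by an epsilon edge.

```java
import java.util.*;

// Subset construction sketch: each DFA state is a set of NFA states.
class SubsetConstruction {
    static final char EPSILON = '\0';   // assumed marker for epsilon edges
    private final Map<Integer, Map<Character, Set<Integer>>> nfa = new HashMap<>();

    void addEdge(int from, char symbol, int to) {
        nfa.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(symbol, k -> new TreeSet<>())
           .add(to);
    }

    private Set<Integer> targets(int state, char symbol) {
        Map<Character, Set<Integer>> edges = nfa.get(state);
        if (edges == null) return Collections.<Integer>emptySet();
        Set<Integer> t = edges.get(symbol);
        return t == null ? Collections.<Integer>emptySet() : t;
    }

    // All NFA states reachable from `states` using only epsilon edges.
    Set<Integer> closure(Set<Integer> states) {
        Set<Integer> result = new TreeSet<>(states);
        Deque<Integer> work = new ArrayDeque<>(states);
        while (!work.isEmpty()) {
            int s = work.pop();
            for (int t : targets(s, EPSILON)) {
                if (result.add(t)) work.push(t);
            }
        }
        return result;
    }

    // Epsilon-closure of the states reachable on `symbol`.
    Set<Integer> move(Set<Integer> states, char symbol) {
        Set<Integer> result = new TreeSet<>();
        for (int s : states) result.addAll(targets(s, symbol));
        return closure(result);
    }

    // Worklist loop: discover DFA states until no new subsets appear.
    Map<Set<Integer>, Map<Character, Set<Integer>>> toDfa(int start, char[] alphabet) {
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa = new LinkedHashMap<>();
        Deque<Set<Integer>> work = new ArrayDeque<>();
        Set<Integer> s0 = closure(Collections.singleton(start));
        dfa.put(s0, new HashMap<>());
        work.push(s0);
        while (!work.isEmpty()) {
            Set<Integer> current = work.pop();
            for (char c : alphabet) {
                Set<Integer> next = move(current, c);
                if (next.isEmpty()) continue;          // no transition on c
                if (!dfa.containsKey(next)) {
                    dfa.put(next, new HashMap<>());
                    work.push(next);
                }
                dfa.get(current).put(c, next);
            }
        }
        return dfa;
    }

    // Builds the example NFA, converts it, and simulates the resulting DFA.
    static boolean demo(String input) {
        SubsetConstruction sc = new SubsetConstruction();
        sc.addEdge(4, EPSILON, 0);
        sc.addEdge(0, 'a', 0);
        sc.addEdge(0, 'b', 0);
        sc.addEdge(0, 'a', 1);
        sc.addEdge(1, 'b', 2);
        sc.addEdge(2, 'b', 3);                          // NFA state 3 is accepting
        Map<Set<Integer>, Map<Character, Set<Integer>>> dfa =
            sc.toDfa(4, new char[] {'a', 'b'});
        Set<Integer> state = dfa.keySet().iterator().next();   // first key is the start state
        for (char c : input.toCharArray()) {
            state = dfa.get(state).get(c);
            if (state == null) return false;            // dead state
        }
        return state.contains(3);
    }
}
```

A DFA state is accepting when it contains an accepting NFA state, which is why the final check looks for NFA state 3 inside the current subset. Minimization is left out of this sketch.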

Flow for Using JavaCC

Extracted from

Focus of this Lecture

Structure of a JavaCC File

  • A JavaCC file is composed of 3 portions:

    • Options

    • Class declaration

    • Specification for lexical analysis (tokens), and specification for syntax analysis.

  • For the very first example of JavaCC, let's recognize two tokens: "+" and numerals.

  • Use an editor to create the file and save it as numeral.jj

Using JavaCC for Lexical Analysis

  • JavaCC is a “top-down” parser generator.

  • Some parser generators (such as yacc, bison, and JavaCUP) need a separate lexical-analyzer generator.

  • With JavaCC, you specify the tokens in the same file as the grammar.

Example File

/* main class definition */
PARSER_BEGIN(Numeral)

public class Numeral {
    public static void main(String[] args)
            throws ParseException, TokenMgrError {
        Numeral numeral = new Numeral(System.in);
        /* simple loop: keep getting tokens until end of input */
        while (numeral.getNextToken().kind != EOF);
    }
}

PARSER_END(Numeral)

/* token definitions */
TOKEN :
{
    <ADD: "+">
  | <NUMERAL: (["0"-"9"])+>
}


  • The options portion is optional and is omitted in the previous example.

  • STATIC is a boolean option whose default value is true. If true, all methods and class variables are specified as static in the generated parser and token manager.

    • This allows only one parser object to be present, but it improves the performance of the parser.

    • To perform multiple parses during one run of your Java program, you will have to call the ReInit() method to reinitialize your parser if it is static.

    • If the parser is non-static, you may use the "new" operator to construct as many parsers as you wish. These can all be used simultaneously from different threads.


After calling javacc to compile numeral.jj, eight files are generated if no error messages occur.

They are,,,,,, and

bash-2.05$ javacc numeral.jj

Java Compiler Compiler Version 3.2 (Parser Generator)

(type "javacc" with no arguments for help)

Reading from file numeral.jj . . .

File "" does not exist. Will create one.

File "" does not exist. Will create one.

File "" does not exist. Will create one.

File "" does not exist. Will create one.

Parser generated successfully


JavaCC Specification of a Lexer

Note the need for ( )!



A Full Example

See the sample file

Dealing with errors

  • Error reporting: 123e+q

  • Could consider it an invalid token (lexical error) or

  • return a sequence of valid tokens

    • 123, e, +, q,

    • and let the parser deal with the error.
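The second option can be sketched in Java with a longest-match loop over a few token patterns. The token kinds (in particular ID for "e" and "q"), the class name, and the output format are assumptions made for illustration; a scanner generated by JavaCC is structured differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Longest-match tokenizer sketch: splits "123e+q" into a sequence of valid tokens.
class MaximalMunchSketch {
    private static final String[] KINDS = {"NUMERAL", "ID", "ADD"};
    private static final Pattern[] PATTERNS = {
        Pattern.compile("[0-9]+"),
        Pattern.compile("[A-Za-z]+"),
        Pattern.compile("\\+"),
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int bestLen = 0;
            String bestKind = null;
            // Try every pattern at the current position and keep the longest match.
            for (int k = 0; k < PATTERNS.length; k++) {
                Matcher m = PATTERNS[k].matcher(input).region(pos, input.length());
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    bestLen = m.end() - pos;
                    bestKind = KINDS[k];
                }
            }
            if (bestKind == null) {
                // No pattern matches here: this is a genuine lexical error.
                throw new IllegalArgumentException("Lexical error at offset " + pos);
            }
            tokens.add(bestKind + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return tokens;
    }
}
```

With these patterns, "123e+q" is not a lexical error at all: it yields NUMERAL(123), ID(e), ADD(+), ID(q), and it is the parser that rejects the sequence. Only a character that no pattern accepts, such as "#", raises an error in the scanner.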

Lexical error correction?

  • Sometimes interaction between the Scanner and parser can help

    • especially in a top-down (predictive) parse

    • The parser, when it calls the scanner, can pass as an argument the set of allowable tokens.

    • Suppose the Scanner sees calss in a context where only a top-level definition is allowed.
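One way such cooperation could look, as a rough Java sketch: the parser hands over the lexemes it would accept, and the scanner proposes a repair when the lexeme it saw is one edit away from one of them. The repair rule (a single insertion, deletion, substitution, or adjacent transposition), the class name, and the method names are assumptions for illustration and are not taken from the slides.

```java
import java.util.List;
import java.util.Optional;

// Sketch: repair a misspelled lexeme using the set of tokens the parser allows.
class TokenRepairSketch {
    // Edit distance that also counts swapping two adjacent characters as one edit
    // (the "optimal string alignment" variant).
    static int distance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                                   d[i - 1][j - 1] + cost);
                boolean swapped = i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1);
                if (swapped) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[a.length()][b.length()];
    }

    // Returns the first allowed lexeme exactly one edit away, if any.
    static Optional<String> repair(String lexeme, List<String> allowed) {
        for (String candidate : allowed) {
            if (distance(lexeme, candidate) == 1) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

In this sketch, "calss" is one transposition away from "class", so a call with an allowed list containing "class" proposes that keyword; an unrelated identifier yields no repair and would be reported as an error.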

Same symbol, different meaning.

  • How can the scanner distinguish between binary minus and unary minus?

    • x = -a; vs. x = 3 - a;
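A common answer is that the scanner does not distinguish them: it returns the same token for both, and the context decides. The Java sketch below illustrates one such context rule, namely that a minus following an operand is binary and any other minus is unary. The rule, the token representation as plain strings, and all names are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: decide unary vs. binary minus from the preceding token.
class MinusContextSketch {
    // An operand here is an identifier, an integer literal, or a closing parenthesis.
    static boolean isOperand(String token) {
        return token.matches("[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\\)");
    }

    static List<String> classify(List<String> tokens) {
        List<String> out = new ArrayList<>();
        String previous = null;
        for (String token : tokens) {
            if (token.equals("-")) {
                boolean afterOperand = previous != null && isOperand(previous);
                out.add(afterOperand ? "BINARY_MINUS" : "UNARY_MINUS");
            } else {
                out.add(token);
            }
            previous = token;
        }
        return out;
    }
}
```

For the token sequence x, =, -, a, ; the minus follows "=" and is labeled UNARY_MINUS; for x, =, 3, -, a, ; it follows the operand 3 and is labeled BINARY_MINUS. In a parser-driven design the same decision falls out of the grammar rules instead of an explicit check like this one.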

Scanner “troublemakers”

  • Unclosed strings

  • Unclosed comments.

JavaCC as a Parsing Tool

JavaCC Overview

  • Generates a top-down parser.

    • Could be used to generate a parser for Prolog, whose grammar is LL.

  • Generates a parser in Java.

    • Hence, to continue our example, it can be integrated with any Java-based Prolog compiler/interpreter.

  • Token specification and grammar specification structures are in the same file => easier to debug.

Types of Productions in JavaCC

There can be four different kinds of Productions.

  • Javacode

    • For something that is not context free or is difficult to write a grammar for.

      e.g., recognizing matching braces and error processing.

  • Regular Expressions

    • Used to describe the tokens (terminals) of the grammar.

  • BNF

    • Standard way of specifying the productions of the grammar.

  • Token Manager Declarations

    • The declarations and statements are written into the generated Token Manager (lexer) and are accessible from within lexical actions.

JavaCC Look-ahead Mechanism

  • Exploration of tokens further ahead in the input stream.

  • Backtracking is unacceptable because of its performance cost.

  • By default, JavaCC uses 1 token of look-ahead. A larger number can be specified.

  • Two types of look-ahead mechanisms

    • Syntactic

      The parser checks whether the upcoming tokens match a given expansion.

    • Semantic

      Any arbitrary Boolean expression can be specified as a look-ahead parameter.

      e.g., A -> a B c and B -> b ( c )? Valid strings: “abc” and “abcc”
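The grammar in this example shows why one token of look-ahead is not always enough. The hand-written Java sketch below mimics a top-down parser with no backtracking for A -> a B c and B -> b ( c )?, where each character stands for one token. The class, its methods, and the parameter k are hypothetical and only imitate the effect of a look-ahead limit; this is not JavaCC-generated code.

```java
// Recursive-descent sketch for A -> a B c, B -> b ( c )? with k tokens of look-ahead.
class LookaheadSketch {
    private final String input;
    private final int k;      // number of look-ahead tokens (1 or 2 in this sketch)
    private int pos = 0;

    LookaheadSketch(String input, int k) {
        this.input = input;
        this.k = k;
    }

    // Token at pos + offset, or '\0' past the end of the input.
    private char peek(int offset) {
        int i = pos + offset;
        return i < input.length() ? input.charAt(i) : '\0';
    }

    private boolean eat(char expected) {
        if (peek(0) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    // B -> b ( c )?
    private boolean parseB() {
        if (!eat('b')) return false;
        // Choice point: take the optional c or leave it for A?
        // With k = 1 the parser commits as soon as it sees one c.
        // With k = 2 it takes the c only if another c follows for A.
        boolean takeOptional = (k == 1)
                ? peek(0) == 'c'
                : peek(0) == 'c' && peek(1) == 'c';
        if (takeOptional) eat('c');
        return true;
    }

    // A -> a B c, followed by end of input.
    boolean parseA() {
        return eat('a') && parseB() && eat('c') && pos == input.length();
    }

    static boolean parse(String input, int k) {
        return new LookaheadSketch(input, k).parseA();
    }
}
```

With k = 1 the parser consumes the only c of "abc" inside B and then fails to find the c that A requires, even though "abc" is a valid string. With k = 2 both "abc" and "abcc" are accepted.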


