
Compiler Structures




  1. Compiler Structures 241-437, Semester 1, 2011-2012 • 3. Lex • Objectives • describe lex • give many examples of lex's use

  2. Overview 1. What is lex (and flex)? 2. Lex Program Format 3. Removing Whitespace (white.l) 4. Printing Line Numbers (linenos.l) 5. Counting (counter.l) 6. Counting IDs (ids.l) 7. Matching Rules 8. More Information.

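  3. 1. What is lex (and flex)? • lex is a lexical analyzer generator • flex (the "fast lexical analyzer generator") is a faster, freely available reimplementation of lex, which we'll be using • lex translates a set of regular expressions (REs), each paired with a C action, into C code • The generated code is easy to integrate into C programs such as compilers (and other applications).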

  4. Uses for Lex • Convert input from one form to another. • Extract information from text files. • Extract tokens for a syntax analyzer.

  5. Using Lex • lex source program (lex.l) → lex (flex) → lex.yy.c • lex.yy.c → C compiler → a.out • input stream of chars → a.out → sequence of tokens

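  6. Running Flex • With UNIX:
> flex foo.l
> gcc -Wall -o foo lex.yy.c
> ./foo < inputfile.txt
• You may need to add -ll (lex) or -lfl (flex) at the end of the gcc command • it links in the lex library, which supplies default versions of main() and yywrap() • You may get "warning" messages from gcc (e.g. about unused functions in the generated lex.yy.c).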

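  7. How Lex Works • The lex-generated program (e.g. foo) reads characters from its input (stdin by default) and tries to match them against its REs. • Once it matches a sequence and runs the corresponding action, it continues from the first character after that match, looking for the next RE match. • Any character that matches no RE is copied to the output unchanged.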
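The read-match-repeat cycle can be sketched in plain C. This is only an illustration under stated assumptions. It is not the code flex generates, which builds a table-driven automaton. The sketch has a single hypothetical rule, [0-9]+, whose action counts numbers. The struct scanner type and the scan_numbers function are invented for the example, with the text/leng fields standing in for yytext and yyleng. Unmatched characters are copied to a buffer in the same way that lex's default rule echoes them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner state; text/leng mimic lex's yytext/yyleng. */
struct scanner {
    const char *pos;   /* next unread input character */
    const char *text;  /* start of the most recent match */
    size_t leng;       /* length of the most recent match */
};

/* The only rule, [0-9]+: return how many characters it matches at s. */
static size_t match_digits(const char *s)
{
    size_t n = 0;
    while (isdigit((unsigned char)s[n]))
        n++;
    return n;
}

/* Scan to the end of the input: each match of [0-9]+ runs the action
   "count a number"; any other character is copied to out, like lex's
   default rule (ECHO). Returns the number of matches. */
static int scan_numbers(struct scanner *sc, char *out, size_t outsize)
{
    int numbers = 0;
    size_t used = 0;

    assert(sc != NULL && out != NULL && outsize > 0);
    while (*sc->pos != '\0') {
        size_t n = match_digits(sc->pos);
        if (n > 0) {                  /* a rule matched: run its action */
            sc->text = sc->pos;
            sc->leng = n;
            numbers++;
        } else {                      /* no rule matched: echo one char */
            if (used + 1 < outsize)
                out[used++] = *sc->pos;
            n = 1;
        }
        sc->pos += n;                 /* resume right after the match */
    }
    out[used] = '\0';
    return numbers;
}
```

For the input "ab 12 c345", the function reports 2 numbers and leaves "ab  c" in the output buffer. The last match (text/leng) is "345" with length 3.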

  8. 2. Lex Program Format • A lex program has three sections, separated by lines containing only %%:
REs and/or C code
%%
RE/action rules
%%
C functions

  9. A Lex Program • its three sections are: 1) C code and REs, 2) RE/action rules, 3) C functions
%{
int charCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n]*
%%
{word} { wordCount++; charCount += yyleng; }
[\n]   { charCount++; lineCount++; }
.      { charCount++; }
%%
int main(void)
{
    yylex();
    printf("Chars %d, Words: %d, Lines: %d\n", charCount, wordCount, lineCount);
    return 0;
}
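To make the three rules concrete, the following hand-written C function computes the same counts as the lex program above for an in-memory string. The names count_text and struct counts are hypothetical. This is a sketch of what the rules mean, not an excerpt of lex.yy.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type holding the three counters. */
struct counts {
    int chars, words, lines;
};

/* Mirror the three rules: a word is a maximal run of characters other than
   space, tab and newline; every character counts toward chars; each '\n'
   also counts as a line. */
static struct counts count_text(const char *s)
{
    struct counts c = { 0, 0, 0 };

    assert(s != NULL);
    while (*s != '\0') {
        if (*s != ' ' && *s != '\t' && *s != '\n') {
            /* {word}: consume the longest run, like lex's longest match */
            c.words++;
            while (*s != '\0' && *s != ' ' && *s != '\t' && *s != '\n') {
                c.chars++;
                s++;
            }
        } else {
            if (*s == '\n')          /* [\n] rule also counts a line */
                c.lines++;
            c.chars++;               /* [\n] and . rules both count one char */
            s++;
        }
    }
    return c;
}
```

For "hello world\nbye\n", the function returns 16 characters, 3 words and 2 lines.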

  10. Section 1: Defining a RE • Format: name RE • Examples:
digit  [0-9]
letter [A-Za-z]
id     {letter}({letter}|{digit})*
word   [^ \t\n]*

  11. Regular Expressions in Lex
• x match the char x
• \. match the char .
• "string" match the contents of the string of chars
• . match any char except \n
• ^ match the beginning of a line
• $ match the end of a line
• [xyz] match one char: x, y, or z
• [^xyz] match any char except x, y, and z
• [a-z] match one char in the range a to z

  12.
• r* closure (match 0 or more r's)
• r+ positive closure (match 1 or more r's)
• r? optional (match 0 or 1 r)
• r1r2 match r1 then r2 (concatenation)
• r1|r2 match r1 or r2 (union)
• (r) grouping
• r1/r2 match r1 only when it is followed by r2 (trailing context; r2 is not part of the matched text)
• {name} match the RE defined by name

  13. Example REs (Again)
• [0-9] A single digit.
• [0-9]+ An integer.
• [0-9]+(\.[0-9]+)? An integer or floating point number.
• [+-]?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)? Integer, floating point, or scientific notation.
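One way to experiment with these REs outside lex is the POSIX <regex.h> API, assuming a POSIX system. POSIX extended REs (REG_EXTENDED) use the same notation as the last example. The sketch below therefore anchors that RE with ^...$ and tests whole strings. The is_number helper is hypothetical. Elsewhere, lex's RE syntax differs from POSIX: for example, {name} and "string" have no lex meaning in POSIX.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The last example RE, anchored so the whole string must match. */
static const char *NUMBER_RE = "^[+-]?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?$";

/* Return true if s is an integer, floating point or scientific number.
   (Compiling the RE on every call keeps the sketch short.) */
static bool is_number(const char *s)
{
    regex_t re;
    bool ok;

    assert(s != NULL);
    if (regcomp(&re, NUMBER_RE, REG_EXTENDED | REG_NOSUB) != 0)
        return false;                      /* pattern failed to compile */
    ok = regexec(&re, s, 0, NULL, 0) == 0; /* 0 means "matched" */
    regfree(&re);
    return ok;
}
```

With this pattern, "42", "-3.14" and "6.02e23" match. The strings "3.", ".5" and "1e" do not, because a digit is required after the '.' and after the exponent marker.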

  14. Section 2: RE/Action Rule • A rule has the form: RE { action } • the RE can be written out directly (e.g. \n or .) or can refer to a name defined in section 1 as {name} • the action is any C code • If the RE matches an input character sequence, then the C code is executed.

  15. Section 3: C Functions • This C code is copied unchanged to the end of the generated lexical analyzer (lex.yy.c) • Depending on the lex/flex version, you may need to add the function: int yywrap(void) { return 1; } • yylex() calls yywrap() when it reaches the end of its input; returning 1 signals that there is no more input, so the lexer can terminate
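The protocol behind yywrap() can be simulated in a few lines of C. All names here are hypothetical. A NULL-terminated array of strings stands in for a list of input files, current plays the role of yyin, and wrap() plays the role of yywrap(). A wrap function that switches to the next input and returns 0 keeps the scanner going. Returning 1, as on the slide, ends scanning.

```c
#include <assert.h>
#include <stddef.h>

static const char *const *pending; /* inputs not yet started (NULL-terminated) */
static const char *current;        /* input being scanned; plays the role of yyin */

/* Plays the role of yywrap(): called at each end of input. */
static int wrap(void)
{
    if (*pending == NULL)
        return 1;              /* nothing left: tell the scanner to stop */
    current = *pending++;      /* switch input, like setting yyin to a new file */
    return 0;                  /* more input is ready: keep scanning */
}

/* A trivial "scanner" that counts characters across all inputs. */
static size_t count_all(const char *const inputs[])
{
    size_t total = 0;

    assert(inputs != NULL && inputs[0] != NULL);
    current = inputs[0];
    pending = inputs + 1;
    for (;;) {
        while (*current != '\0') {  /* scan the current input */
            total++;
            current++;
        }
        if (wrap() != 0)            /* end of input: ask whether to go on */
            break;
    }
    return total;
}
```

For {"ab", "cde", NULL}, count_all returns 5 because wrap() chains the second input onto the first.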

  16. 3. Removing Whitespace (white.l)
whitespace [ \t\n]
%%
{whitespace} ;
.            { ECHO; }
%%
int yywrap(void) { return 1; }
int main(void)
{
    yylex();   // the lexical analyzer
    return 0;
}
• whitespace is an RE name, ";" is an empty action, and ECHO is a macro that copies the matched text to the output
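The effect of white.l on a string can be reproduced with a short C function. The name remove_whitespace is hypothetical, and the sketch works on a buffer rather than on stdin/stdout.

```c
#include <assert.h>
#include <string.h>

/* Drop ' ', '\t' and '\n' (the {whitespace} rule with its empty action)
   and copy every other character (the ". { ECHO; }" rule).
   dst must have room for strlen(src) + 1 bytes. */
static void remove_whitespace(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    for (; *src != '\0'; src++)
        if (strchr(" \t\n", *src) == NULL)  /* not whitespace: ECHO it */
            *dst++ = *src;
    *dst = '\0';
}
```

For example, "int x = 1;\n" becomes "intx=1;".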

  17. Usage (lex.yy.c is the flex output file)
> flex white.l
> gcc -Wall -o white lex.yy.c
> ./white < white.l
/*white.l*//*AndrewDavison,May...
>

  18. 4. Printing Line Numbers (linenos.l)
%{
int lineno = 1;
%}
%%
^(.*)\n { printf("%4d\t%s", lineno, yytext); lineno++; }
%%
int yywrap(void) { return 1; }
continued

  19.
int main(int argc, char *argv[])
{
    if (argc > 1) {
        FILE *file = fopen(argv[1], "r");
        if (file == NULL) {
            fprintf(stderr, "Error opening %s\n", argv[1]);
            exit(1);
        }
        yyin = file;   // read from the named file instead of stdin
    }
    yylex();
    if (yyin != stdin)
        fclose(yyin);
    return 0;
}
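A buffer-based C sketch of what linenos.l prints shows how the ^(.*)\n rule behaves at the edges. Empty lines are numbered too, because .* can match zero characters. A last line with no trailing newline does not match the rule, so lex's default rule echoes it unnumbered. The number_lines helper is hypothetical: it writes into out instead of stdout and returns -1 if out is too small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number each '\n'-terminated line of text into out, using the same
   "%4d\t" format as the ^(.*)\n action. Returns the number of lines
   numbered, or -1 if out is too small. */
static int number_lines(const char *text, char *out, size_t outsize)
{
    int lineno = 1;
    size_t used = 0;
    const char *nl;
    int n;

    assert(text != NULL && out != NULL && outsize > 0);
    while ((nl = strchr(text, '\n')) != NULL) {
        /* one line, including its '\n', after a 4-wide line number */
        n = snprintf(out + used, outsize - used, "%4d\t%.*s",
                     lineno, (int)(nl - text + 1), text);
        if (n < 0 || (size_t)n >= outsize - used)
            return -1;                  /* out is too small */
        used += (size_t)n;
        lineno++;
        text = nl + 1;
    }
    /* leftover text without '\n': echoed unchanged, like lex's default rule */
    n = snprintf(out + used, outsize - used, "%s", text);
    if (n < 0 || (size_t)n >= outsize - used)
        return -1;
    return lineno - 1;
}
```

For "ab\n\ncd\n", the function produces three numbered lines, and the middle one is empty.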

  20. Built-in Variables • yytext holds the matched string. • yyleng holds the length of yytext. • yyin is the input stream (stdin by default). • There are several other built-in variables in lex, e.g. yyout, the output stream used by ECHO (stdout by default).

  21. Usage
> flex linenos.l
> gcc -Wall -o linenos lex.yy.c
> ./linenos textfile.txt
> ./linenos < textfile.txt

  22. > ./linenos < linenos.l
   1
   2    /* linenos.l */
   3    /* Andrew Davison, March 2005 */
   4
   5    %{
   6    int lineno = 1;
   7    %}
   8
   9    %%
:
:

  23. 5. Counting (counter.l)
%{
int charCount = 0, wordCount = 0, lineCount = 0;
%}
word [^ \t\n]*
%%
{word} { wordCount++; charCount += yyleng; }
\n     { charCount++; lineCount++; }
.      { charCount++; }
%%
int yywrap(void) { return 1; }
continued

  24.
int main(void)
{
    yylex();
    printf("Characters %d, Words: %d, Lines: %d\n", charCount, wordCount, lineCount);
    return 0;
}

  25. Usage
> flex counter.l
> gcc -Wall -o counter lex.yy.c
> ./counter < counter.l
Characters 496, Words: 78, Lines: 29

  26. 6. Counting IDs (ids.l)
%{
int count = 0;
%}
digit  [0-9]
letter [A-Za-z]
id     {letter}({letter}|{digit})*
%%
{id}   { count++; }
.      ; /* ignore other things */
\n     ;
%%
continued

  27.
int yywrap(void) { return 1; }
int main(void)
{
    yylex();
    printf("No. of Idents: %d\n", count);
    return 0;
}

  28. Usage
> flex ids.l
> gcc -Wall -o ids lex.yy.c
> ./ids < test1.txt
No. of Idents: 6
> l test1.txt
this is a test 177 23 bing2 *((() this5
>

  29. 7. Matching Rules • Lex chooses the rule that matches the longest sequence of input characters.
beg   {…}
begin {…}
Both rules match the start of the input string "beginning", but the second rule is chosen because it matches more characters (5 rather than 3). continued

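  30. If two rules match the same amount of input, then the rule listed first is used.
begin  {…}
[a-z]+ {…}
Both rules match all of the input string "begin", so the first rule is chosen. This is why rules for keywords should be listed before a general identifier rule such as [a-z]+.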
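Both matching rules can be expressed as a small rule-selection function in C. This is an illustrative sketch, not flex's implementation, which encodes these rules in the automaton it generates. The matcher type, the three match_* functions (for beg, begin and [a-z]+) and choose_rule are hypothetical. The function replaces the current choice only when a later rule's match is strictly longer, which gives earlier rules priority on ties.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A rule's matcher returns how many characters it matches at s (0 = none). */
typedef size_t (*matcher)(const char *s);

static size_t match_beg(const char *s)   { return strncmp(s, "beg", 3) == 0 ? 3 : 0; }
static size_t match_begin(const char *s) { return strncmp(s, "begin", 5) == 0 ? 5 : 0; }

/* [a-z]+ */
static size_t match_lower(const char *s)
{
    size_t n = 0;
    while (s[n] >= 'a' && s[n] <= 'z')
        n++;
    return n;
}

/* Longest match wins; on equal lengths the earlier rule wins.
   Returns the chosen rule's index (or -1 if none match) and sets *len. */
static int choose_rule(const matcher rules[], size_t nrules,
                       const char *s, size_t *len)
{
    int best = -1;
    size_t best_len = 0;

    assert(rules != NULL && s != NULL && len != NULL);
    for (size_t i = 0; i < nrules; i++) {
        size_t n = rules[i](s);
        if (n > best_len) {           /* strictly longer only: ties keep earlier rule */
            best = (int)i;
            best_len = n;
        }
    }
    *len = best_len;
    return best;
}
```

With the slide 29 rules, "beginning" selects begin (5 characters). With the slide 30 rules, "begin" selects begin, but "beginning" would select [a-z]+ because it matches 9 characters.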

  31. 8. More Information in our library • Lex and Yacc by Levine, Mason, and Brown, O'Reilly, 2nd edition • On UNIX: • man lex • info lex continued

  32. A Compact Guide to Lex & Yacc by Tom Niemann, http://epaperpress.com/lexandyacc/ • with several calculator examples, which I'll be discussing when we get to yacc • it's also on the course website in the "Niemann Tutorial" subdirectory of "Useful Info" • http://fivedots.coe.psu.ac.th/Software.coe/Compilers/
