1 / 28

Flex Tutorial - Overview of Scanner Generators

Learn how to use Flex and Lex to write programs that use regular expressions to control the flow of input. Generate C code for a scanner function.

swargo
Download Presentation

Flex Tutorial - Overview of Scanner Generators

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A brief [f]lex tutorial Saumya Debray The University of Arizona Tucson, AZ 85721

  2. flex (and lex): Overview Scanner generators: • Helps write programs whose control flow is directed by instances of regular expressions in the input stream. • Output: C code implementing a scanner: • function: yylex() • file: lex.yy.c Input: a set of regular expressions + actions flex (or lex) A quick tutorial on Lex

  3. Using flex file: lex.yy.c lex input spec (regexps + actions) yylex() { … } flex compiler user supplies driver code main() {…} or parser() {…} A quick tutorial on Lex

  4. flex: input format An input file has the following structure: definitions %% rules %% user code required optional Shortest possible legal flex input: %% A quick tutorial on Lex

  5. Definitions • A series of: • name definitions, each of the form namedefinition e.g.: DIGIT [0-9] CommentStart "/*" ID [a-zA-Z][a-zA-Z0-9]* • start conditions • stuff to be copied verbatim into the flex output (e.g., declarations, #includes): • enclosed in %{ … }%, or • indented A quick tutorial on Lex

  6. Rules • The rules portion of the input contains a sequence of rules. • Each rule has the form patternaction where: • pattern describes a pattern to be matched on the input • pattern must be un-indented • action must begin on the same line. A quick tutorial on Lex

  7. Patterns • Essentially, extended regular expressions. • Syntax: similar to grep (see man page) • <<EOF>> to match “end of file” • Character classes: • [:alpha:], [:digit:], [:alnum:], [:space:], etc. (see man page) • {name} where name was defined earlier. • “start conditions” can be used to specify that a pattern match only in specific situations. A quick tutorial on Lex

  8. Example %{ #include <stdio.h> #include <stdlib.h> %} dgt [0-9] %% {dgt}+ return atoi(yytext); %% void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } A flex program to read a file of (positive) integers and compute the average: A quick tutorial on Lex

  9. Example %{ #include <stdio.h> #include <stdlib.h> %} dgt [0-9] %% {dgt}+ return atoi(yytext); %% void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } Definition for a digit (could have used builtin definition [:digit:] instead) definitions Rule to match a number and return its value to the calling routine rules A flex program to read a file of (positive) integers and compute the average: Driver code (could instead have been in a separate file) user code A quick tutorial on Lex

  10. Example %{ #include <stdio.h> #include <stdlib.h> %} dgt [0-9] %% {dgt}+ return atoi(yytext); %% void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } defining and using a name definitions rules A flex program to read a file of (positive) integers and compute the average: user code A quick tutorial on Lex

  11. Example %{ #include <stdio.h> #include <stdlib.h> %} dgt [0-9] %% {dgt}+ return atoi(yytext); %% void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } defining and using a name definitions • char * yytext; • a buffer that holds the input characters that actually match the pattern rules A flex program to read a file of (positive) integers and compute the average: user code A quick tutorial on Lex

  12. Example %{ #include <stdio.h> #include <stdlib.h> %} dgt [0-9] %% {dgt}+ return atoi(yytext); %% void main() { int val, total = 0, n = 0; while ( (val = yylex()) > 0 ) { total += val; n++; } if (n > 0) printf(“ave = %d\n”, total/n); } defining and using a name definitions • char * yytext; • a buffer that holds the input characters that actually match the pattern rules A flex program to read a file of (positive) integers and compute the average: • Invoking the scanner: yylex() • Each time yylex() is called, the scanner continues processing the input from where it last left off. • Returns 0 on end-of-file. user code A quick tutorial on Lex

  13. Avoiding compiler warnings • If compiled using “gcc –Wall” the previous flex file will generate compiler warnings: lex.yy.c: … : warning: `yyunput’ defined but not used lex.yy.c: … : warning: `input’ defined but not used • These can be removed using ‘%option’ declarations in the first part of the flex input file: %option nounput %option noinput A quick tutorial on Lex

  14. Matching the Input • When more than one pattern can match the input, the scanner behaves as follows: • the longest match is chosen; • if multiple rules match, the rule listed first in the flex input file is chosen; • if no rule matches, the default is to copy the next character to stdout. • The text that matched (the “token”) is copied to a buffer yytext. A quick tutorial on Lex

  15. Matching the Input (cont’d) Pattern to match C-style comments: /* … */ "/*"(.|\n)*"*/" Input: #include <stdio.h> /* definitions */ int main(int argc, char * argv[ ]) { if (argc <= 1) { printf(“Error!\n”); /* no arguments */ } printf(“%d args given\n”, argc); return 0; } A quick tutorial on Lex

  16. Matching the Input (cont’d) Pattern to match C-style comments: /* … */ "/*"(.|\n)*"*/" Input: #include <stdio.h> /* definitions */ int main(int argc, char * argv[ ]) { if (argc <= 1) { printf(“Error!\n”); /* no arguments */ } printf(“%d args given\n”, argc); return 0; } longest match: A quick tutorial on Lex

  17. Matching the Input (cont’d) Pattern to match C-style comments: /* … */ "/*"(.|\n)*"*/" Input: #include <stdio.h> /* definitions */ int main(int argc, char * argv[ ]) { if (argc <= 1) { printf(“Error!\n”); /* no arguments */ } printf(“%d args given\n”, argc); return 0; } longest match: Matched text shown in blue A quick tutorial on Lex

  18. Start Conditions • Used to activate rules conditionally. • Any rule prefixed with <S> will be activated only when the scanner is in start condition S. • Declaring a start condition S: • in the definition section: %x S • “%x” specifies “exclusive start conditions” • flex also supports “inclusive start conditions” (“%s”), see man pages. • Putting the scanner into start condition S: • action: BEGIN(S) A quick tutorial on Lex

  19. Start Conditions (cont’d) • Example: • <STRING>[^"]* { …match string body… } • [^"] matches any character other than " • The rule is activated only if the scanner is in the start condition STRING. • INITIAL refers to the original state where no start conditions are active. • <*> matches all start conditions. A quick tutorial on Lex

  20. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*" ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  21. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  22. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  23. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  24. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  25. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  26. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  27. Using Start Conditions • Start conditions let us explicitly simulate finite state machines. • This lets us get around the “longest match” problem for C-style comments. FSA for C comments: flex input: %x S1, S2, S3 %% "/" BEGIN(S1); <S1>"*" BEGIN(S2); <S2>[^*] ; /* stay in S2 */ <S2>"*" BEGIN(S3); <S3>"*“ ; /* stay in S3 */ <S3>[^*/] BEGIN(S2); <S3>"/" BEGIN(INITIAL); non-* * / / * * S1 S2 S3 non-{ /,* } A quick tutorial on Lex

  28. Putting it all together • Scanner implemented as a function int yylex(); • return value indicates type of token found (encoded as a +ve integer); • the actual string matched is available in yytext. • Scanner and parser need to agree on token type encodings • let yacc generate the token type encodings • yacc places these in a file y.tab.h • use “#include y.tab.h” in the definitions section of the flex input file. • When compiling, link in the flex library using “-lfl” A quick tutorial on Lex

More Related