
The Symbol Table


yorick


  1. The Symbol Table
     • used during all phases of compilation
     • maintains information about many source language constructs
     • incrementally constructed and expanded during the analysis phases
     • used directly in the code generation phases
     • efficient storage and access are important in practice (but we won’t worry about efficiency - we’ll just use a linked list)
     • may or may not be constructed during lexical and syntax analysis, depending on the compiler

  2. Constructing the Symbol Table
     • There are three main operations to be carried out on the symbol table:
         • determining whether a string has already been stored
         • inserting an entry for a string
         • deleting a string when it goes out of scope
     • This requires three functions:
         • lookup(s): returns the index of the entry for string s, or -1 if there is no entry
         • insert(s,t): adds a new entry for string s (of token t), and returns its index
         • delete(s): deletes s from the table (or, typically, hides it)
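
The three functions above can be sketched over a linked list, as the first slide suggests. This is only a minimal illustration, not the course's implementation: the Node layout, the hidden flag, the value given to ID_T and the name delete_sym (used in place of delete) are all assumptions. It uses -1 as the "no entry" value, which is what the lexer rules later in the slides test for.

```c
#include <stdlib.h>
#include <string.h>

#define ID_T 1                  /* hypothetical token code for identifiers */

/* One list node per stored string (layout is an assumption). */
typedef struct Node {
    int index;                  /* value returned by insert and lookup */
    int token;                  /* token class, e.g. ID_T */
    int hidden;                 /* non-zero once delete_sym has hidden it */
    char *str;                  /* private copy of the string */
    struct Node *next;          /* next older entry */
} Node;

static Node *first = NULL;      /* head of the list, newest entry first */
static int next_index = 0;      /* index handed to the next new entry */

/* lookup(s): index of the visible entry for s, or -1 if there is none. */
int lookup(const char *s) {
    for (Node *n = first; n != NULL; n = n->next)
        if (!n->hidden && strcmp(n->str, s) == 0)
            return n->index;
    return -1;
}

/* insert(s,t): add a new entry for string s of token t; return its index
   (or -1 if memory runs out). */
int insert(const char *s, int t) {
    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->str = malloc(strlen(s) + 1);
    if (n->str == NULL) {
        free(n);
        return -1;
    }
    strcpy(n->str, s);
    n->index = next_index++;
    n->token = t;
    n->hidden = 0;
    n->next = first;
    first = n;
    return n->index;
}

/* delete(s): hide the newest visible entry for s rather than freeing it. */
void delete_sym(const char *s) {
    for (Node *n = first; n != NULL; n = n->next)
        if (!n->hidden && strcmp(n->str, s) == 0) {
            n->hidden = 1;
            return;
        }
}
```

Hiding instead of unlinking keeps every index valid for later phases, which is one reason the slide says deletion "typically" hides the entry.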

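  3. A simple implementation
     [Figure: a linked list of nodes. Each node holds an index, a token (e.g. ID_T), atts (pointing to an attribute structure), strPtr (the position of its string in the string array) and a next pointer. A Table record holds first, length and last. The string array stores the names one after another, each followed by "#": c o u n t # i # ... n a m e # ...]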
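
The string array in the figure can be sketched as follows. This is a guess at the intent rather than a reconstruction of the original code: the capacity MAX_CHARS, the helper names store_string and string_at_equals, and the reading of length as "characters used so far" are assumptions. It also assumes that names never contain the separator "#".

```c
#include <string.h>

#define MAX_CHARS 1000          /* assumed capacity of the string array */
#define SEP '#'                 /* separator shown in the figure */

static char strings[MAX_CHARS]; /* all names, packed end to end */
static int length = 0;          /* number of characters used so far */

/* Append s and a separator; return the start position, or -1 if full.
   The returned position is what a node would keep in strPtr. */
int store_string(const char *s) {
    int n = (int)strlen(s);
    if (length + n + 1 > MAX_CHARS)
        return -1;
    int pos = length;
    memcpy(strings + pos, s, (size_t)n);
    strings[pos + n] = SEP;
    length += n + 1;
    return pos;
}

/* Non-zero if the name stored at position pos is exactly s. */
int string_at_equals(int pos, const char *s) {
    int n = (int)strlen(s);
    return pos >= 0 && pos + n < length &&
           memcmp(strings + pos, s, (size_t)n) == 0 &&
           strings[pos + n] == SEP;
}
```

Packing the names into one array means each node only needs an integer position, at the cost of a comparison routine that has to check for the separator.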

  4. Declarations
     There are four kinds of entity that may require an entry in the symbol table:
     • Constant, e.g.
           const int MAX = 10000;
     • Variable, e.g.
           int count, marks[100];
     • Type (user-defined), e.g.
           struct Entry {
               int index;
               char *strPtr;
           };
     • Function, e.g.
           int gcd(int n, int m) {
               if (m == 0) return n;
               else return gcd(m, n % m);
           }
     • The attributes represented in the table will depend on the object being declared.
     • All four will typically have a type signature, representing the data type or (for functions) the return type.
     • Constants may have value bindings.
     • Variables may have pointers to memory locations.
     • Functions may have a pointer to code segments.
     • All four may have scope information.
     • In some compilers, separate symbol tables are used for each different kind of declaration; in others, each separate region of the program (e.g. each function) may be given a separate table.
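
One way to picture the attribute structure is a tagged record whose contents depend on the kind of declaration. The sketch below is an assumption throughout: the type and field names (Attributes, type_sig, nest_level, and so on) and the two constructor helpers do not come from the slides, and the type signature is kept as a plain string for brevity.

```c
#include <stddef.h>

typedef enum { CONSTANT, VARIABLE, TYPE, FUNCTION } Kind;

typedef struct {
    Kind kind;                /* which of the four kinds this entry is */
    const char *type_sig;     /* data type, or return type for a function */
    int nest_level;           /* scope information */
    union {
        long value;           /* CONSTANT: value binding */
        size_t address;       /* VARIABLE: memory location */
        size_t code_start;    /* FUNCTION: start of its code segment */
    } u;                      /* TYPE entries use none of these */
} Attributes;

/* Attributes for a declaration such as: const int MAX = 10000; */
Attributes make_constant(const char *type_sig, long value, int nest_level) {
    Attributes a;
    a.kind = CONSTANT;
    a.type_sig = type_sig;
    a.nest_level = nest_level;
    a.u.value = value;
    return a;
}

/* Attributes for a variable placed at a given memory location. */
Attributes make_variable(const char *type_sig, size_t address, int nest_level) {
    Attributes a;
    a.kind = VARIABLE;
    a.type_sig = type_sig;
    a.nest_level = nest_level;
    a.u.address = address;
    return a;
}
```

The union keeps every entry the same size while letting each kind store only the binding that applies to it.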

  5. Scope
     In most high-level languages, variables and functions have restricted scope, i.e. they can only be accessed in specific areas of the source code. The scope of any particular variable may be global, or within a specific code file, or in a file after its declaration, or within specific code blocks.
     In C, blocks are files, function declarations and compound statements (between "{" and "}"). Also, structures and unions can be considered to be blocks.
     In languages with restrictive scoping rules, it is possible to construct the symbol table during lexical analysis.
         {L}+    { entry = lookup(yytext);
                   if (entry == -1)        /* i.e. new ID_T */
                       insert(yytext,ID_T);
                 }

  6. Scoping Rules
     In block-structured languages, the same variable name can be used in different places to refer to different objects. We can no longer simply look to see whether the name has already been entered in the table, as the current use may be a new declaration.
         int i;                  /* i is globally accessible */
         int f1(int k) {         /* a new integer k, in f1 only */
             int j;              /* a new integer j, in f1 only */
             ...
             print i;            /* (the global variable) */
         }
         int f2() {
             int j;              /* a different j, in f2 only */
             ...
         }

  7. Scope and the Symbol Table
     In languages with nested scope, the symbol table functions are more complex.
     • lookup must find the most recently inserted declaration, i.e. search for a declaration of the identifier valid in the current scope.
     • insert must not overwrite previous declarations, but make them inaccessible.
     • delete should hide the most recent declaration and uncover the previous one.
     The symbol table should thus behave as a stack.
     It is still possible to construct the symbol table during the first pass of the compiler if explicit nesting levels are associated with each entry in the table.
     Many compilers prefer to make multiple passes over the source code, first constructing a syntax tree, and then constructing the table once the nested structure of the code is known.
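
The stack-like behaviour described above can be sketched with a list that always adds at the front. This is a minimal illustration under assumed names (Decl, insert_decl, lookup_decl, delete_decl); here lookup_decl returns the nesting level of the declaration it finds, or -1, purely so that the effect of shadowing is easy to observe.

```c
#include <stdlib.h>
#include <string.h>

/* One declaration; newer declarations sit in front of older ones. */
typedef struct Decl {
    char *name;
    int level;                  /* nesting level of this declaration */
    struct Decl *next;          /* older declarations */
} Decl;

static Decl *top = NULL;        /* most recent declaration */

/* insert: push a new declaration without touching older ones.
   Returns 0 on success, -1 if memory runs out. */
int insert_decl(const char *name, int level) {
    Decl *d = malloc(sizeof *d);
    if (d == NULL)
        return -1;
    d->name = malloc(strlen(name) + 1);
    if (d->name == NULL) {
        free(d);
        return -1;
    }
    strcpy(d->name, name);
    d->level = level;
    d->next = top;
    top = d;
    return 0;
}

/* lookup: nesting level of the most recent declaration of name, or -1. */
int lookup_decl(const char *name) {
    for (Decl *d = top; d != NULL; d = d->next)
        if (strcmp(d->name, name) == 0)
            return d->level;
    return -1;
}

/* delete: remove the most recent declaration of name, which uncovers
   the previous one (if any). */
void delete_decl(const char *name) {
    for (Decl **p = &top; *p != NULL; p = &(*p)->next) {
        if (strcmp((*p)->name, name) == 0) {
            Decl *dead = *p;
            *p = dead->next;
            free(dead->name);
            free(dead);
            return;
        }
    }
}
```

Because the search runs from newest to oldest, an inner declaration of j automatically shadows an outer one, and deleting it makes the outer one visible again.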

  8. One-pass symbol table construction
     One possible method of constructing the symbol table during the first pass is shown below.
         Prog  -> Dec Prog
         Prog  -> Main
         Dec   -> VDec ;
         Dec   -> FDec
         VDec  -> int id
         FDec  -> SFDec Par ){ CStat }        decr(stack);
         SFDec -> int id (                    incr(stack);
         Par   -> /* empty */
         Par   -> VDec
         Par   -> PList , VDec
         PList -> VDec
         PList -> VDec , PList

         {L}+    { entry = lookup(yytext,stack);
                   if (entry == -1)
                       insert(yytext,ID_T,stack);
                 }

  9. The stack consists of entries of the form (nesting level, scope value). The same two values are stored as extra fields (Nest and Scope) in each symbol table entry.
     • last is the index of the last entry added to the symbol table.
     • Initially, the stack is set to < (0,0) > and last to 0.
     • insert associates the top of the stack with the new entry.
     • lookup searches for a matching entry, and obtains its nesting level. It moves down the stack until it finds a stack entry with the same nesting level. If the table index of the entry is less than the scope value of that stack entry, it ignores the entry and continues searching the table. If no match is found, it returns -1.
     • decr deletes the top element of the stack.
     • incr adds a new element to the top of the stack, with the nesting level incremented and the last index as its scope value.
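
The rules above can be sketched in C as follows. Several details are assumptions rather than part of the slides: fixed-size arrays, the 31-character name limit, searching the table from the newest entry backwards, treating an entry as invisible when no stack entry has its nesting level, and dropping the token and stack parameters that the lexer rule passes to insert and lookup. The helper see_id is hypothetical and simply mirrors that lexer rule.

```c
#include <string.h>

#define MAX_ENTRIES 100         /* assumed table capacity */
#define MAX_DEPTH   32          /* assumed maximum nesting depth */
#define MAX_NAME    32          /* assumed name buffer size */

typedef struct { int nest, scope; } Scope;
typedef struct { char str[MAX_NAME]; int nest, scope; } Entry;

static Entry table[MAX_ENTRIES];
static int   count = 0;         /* entries in use */
static int   last  = 0;         /* index of the last entry added */
static Scope stack[MAX_DEPTH] = { {0, 0} };   /* initially < (0,0) > */
static int   sp = 0;            /* position of the top of the stack */

/* incr: push (nesting level + 1, last). */
void incr(void) {
    if (sp + 1 < MAX_DEPTH) {
        stack[sp + 1].nest = stack[sp].nest + 1;
        stack[sp + 1].scope = last;
        sp++;
    }
}

/* decr: pop the top element, keeping the initial (0,0). */
void decr(void) {
    if (sp > 0)
        sp--;
}

/* insert: new entry tagged with the top of the stack; returns its index. */
int insert(const char *s) {
    if (count == MAX_ENTRIES)
        return -1;
    strncpy(table[count].str, s, MAX_NAME - 1);   /* long names truncated */
    table[count].str[MAX_NAME - 1] = '\0';
    table[count].nest = stack[sp].nest;
    table[count].scope = stack[sp].scope;
    last = count;
    return count++;
}

/* lookup: index of a visible entry for s, or -1. */
int lookup(const char *s) {
    for (int i = count - 1; i >= 0; i--) {
        if (strcmp(table[i].str, s) != 0)
            continue;
        int k = sp;                     /* move down the stack */
        while (k >= 0 && stack[k].nest != table[i].nest)
            k--;
        if (k < 0)
            continue;                   /* declared deeper than we are now */
        if (i < stack[k].scope)
            continue;                   /* from an earlier, closed scope */
        return i;
    }
    return -1;
}

/* What the lexer rule does for each identifier it sees. */
int see_id(const char *s) {
    int entry = lookup(s);
    if (entry == -1)
        entry = insert(s);
    return entry;
}
```

Feeding it the identifiers of the example program in order (i, f1, incr, k, j, i, decr, f2, incr, j, decr) yields indices 0 to 5 with the same Nest and Scope values as the table on the following slides.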

  10. Index  Str  Nest  Scope  Atts
      0      i    0     0      ...
      1      f1   0     0
      2      k    1     1
      3      j    1     1
      4      f2   0     0
      5      j    1     4

      The changes in the stack are as follows (top on the right):

      Event  Last  Stack (Nest,Scope)
             0     (0,0)
      1      1     (0,0), (1,1)
      2      3     (0,0)
      3      4     (0,0), (1,4)
      4      5     (0,0)

  11. Constructed symbol table
          int i;
          int f1(int k) {
              int j;
              ...
              print i;
          }
          int f2() {
              int j;
              ...
          }

      Index  Str  Nest  Scope  Atts
      0      i    0     0      ...
      1      f1   0     0
      2      k    1     1
      3      j    1     1
      4      f2   0     0
      5      j    1     4

      The changes in the stack are as follows (top on the right):

      Event  Last  Stack (Nest,Scope)
             0     (0,0)
      1      1     (0,0), (1,1)
      2      3     (0,0)
      3      4     (0,0), (1,4)
      4      5     (0,0)

  12. Syntax trees and scope
      [Figure: syntax tree for the example program, rooted at Prog, with VDec, func, l, print, int and id nodes over the identifiers i, f1, k, j, f2 and j.]
      Many compilers simply build a syntax tree on the first pass (while carrying out lexical and syntax analysis). They then make a second pass, constructing the symbol table, checking data types, etc. It should be easier to determine the scope of the identifiers from the syntax tree.
      Multiple passes may be slower, but they can result in more natural grammars, and simpler translation and analysis routines.
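
A second pass of this kind might look like the sketch below, which walks a hand-built tree for the example program and records, for each node, the nesting level of the declaration it makes or refers to. The tree representation (Tree, with child and sibling links), the node kinds and the helper names are all assumptions, and the tree is much simpler than the one in the figure.

```c
#include <stddef.h>
#include <string.h>

typedef enum { PROG, VDEC, FUNC, USE } Kind;

typedef struct Tree {
    Kind kind;
    const char *name;           /* identifier (NULL for PROG) */
    struct Tree *child;         /* first child */
    struct Tree *sibling;       /* next node at the same depth */
    int level;                  /* set by walk(): nesting level of the
                                   declaration made or referred to */
} Tree;

typedef struct { const char *name; int level; } Sym;

static Sym syms[64];            /* live declarations, innermost last */
static int nsyms = 0;

static void declare(const char *name, int level) {
    if (nsyms < 64) {
        syms[nsyms].name = name;
        syms[nsyms].level = level;
        nsyms++;
    }
}

/* Nesting level of the innermost live declaration of name, or -1. */
static int resolve(const char *name) {
    for (int i = nsyms - 1; i >= 0; i--)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].level;
    return -1;
}

/* Second pass: visit t and its siblings at the given nesting level. */
void walk(Tree *t, int level) {
    for (; t != NULL; t = t->sibling) {
        switch (t->kind) {
        case PROG:
            walk(t->child, level);
            break;
        case VDEC:
            declare(t->name, level);
            t->level = level;
            break;
        case FUNC: {
            declare(t->name, level);
            t->level = level;
            int mark = nsyms;           /* parameters and locals start here */
            walk(t->child, level + 1);
            nsyms = mark;               /* leaving the function hides them */
            break;
        }
        case USE:
            t->level = resolve(t->name);
            break;
        }
    }
}

/* The example program, built by hand (later nodes first). */
Tree dec_j2 = { VDEC, "j",  NULL,    NULL,    -1 };
Tree fn_f2  = { FUNC, "f2", &dec_j2, NULL,    -1 };
Tree use_i  = { USE,  "i",  NULL,    NULL,    -1 };
Tree dec_j1 = { VDEC, "j",  NULL,    &use_i,  -1 };
Tree dec_k  = { VDEC, "k",  NULL,    &dec_j1, -1 };
Tree fn_f1  = { FUNC, "f1", &dec_k,  &fn_f2,  -1 };
Tree dec_i  = { VDEC, "i",  NULL,    &fn_f1,  -1 };
Tree prog   = { PROG, NULL, &dec_i,  NULL,    -1 };
```

Calling walk(&prog, 0) resolves the use of i inside f1 to the global declaration. The recursion itself supplies the scope stack, which is one sense in which scope is easier to determine from the tree.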
