
Chap. 8, Declaration Processing and Symbol Tables

J. H. Wang, Dec. 13, 2011

Presentation Transcript


  1. Chap. 8, Declaration Processing and Symbol Tables J. H. Wang Dec. 13, 2011

  2. Outline • Constructing a Symbol Table • Block-Structured Languages and Scopes • Basic Implementation Techniques • Advanced Features • Declaration Processing Fundamentals • Variable and Type Declarations • Class and Method Declarations • An Introduction to Type Checking

  3. Constructing a Symbol Table • We walk (make a pass over) the AST for two purposes • To process symbol declarations • To connect each symbol reference with its declaration • An AST node is enriched with a reference to the name’s entry in the symbol table

  4. Static Scoping • Static scope of an identifier: its defining block plus any contained blocks that do not contain a declaration for that identifier • Global scope: a name space shared by all compilation units • Scopes may be opened and closed by braces ({ } as in C and Java) or by reserved keywords (begin and end as in Ada and Algol)

  5. A Symbol Table Interface • Methods • OpenScope() • CloseScope() • EnterSymbol(name, type) • RetrieveSymbol(name) • DeclaredLocally(name) • Ex. • (Fig. 8.2) Code to build the symbol table for the AST in Fig. 8.1
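The interface above might be rendered in Java roughly as follows. This is a minimal sketch, not the code of Fig. 8.2. The camel-case method names, the use of a String for a symbol's type, and the class ScopedSymbolTable are assumptions made here, and the backing store is simply one hash map per open scope.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java rendering of the slide's interface; types are plain strings.
interface SymbolTable {
    void openScope();
    void closeScope();
    void enterSymbol(String name, String type);
    String retrieveSymbol(String name);   // null if no visible declaration
    boolean declaredLocally(String name);
}

// Minimal backing store: one map per open scope, innermost on top.
class ScopedSymbolTable implements SymbolTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

    public void openScope() { scopes.push(new HashMap<>()); }

    public void closeScope() { scopes.pop(); }

    // New declarations go only into the current scope (assumes one is open).
    public void enterSymbol(String name, String type) { scopes.peek().put(name, type); }

    // ArrayDeque iterates from the top of the stack, so the innermost
    // declaration of a name is found first.
    public String retrieveSymbol(String name) {
        for (Map<String, String> scope : scopes) {
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }

    public boolean declaredLocally(String name) {
        return !scopes.isEmpty() && scopes.peek().containsKey(name);
    }
}
```

Entering x in an outer scope and again in a nested scope makes retrieveSymbol("x") return the inner declaration until closeScope() is called, which matches the innermost-declaration rule for block-structured languages.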

  6. Block-Structured Languages and Scopes • Block-structured languages: languages that allow nested name scopes • Concepts introduced by Algol 60 • Handling Scopes • Current scope: the innermost context • Open scopes (or currently active scopes): the current scope and its surrounding scopes • Closed scopes: all other scopes

  7. Some common visibility rules • Accessible names are those in the current scope and in all other open scopes • If a name is declared in more than one scope, then a reference to the name is resolved to the innermost declaration • New declarations can be made only in the current scope • Global scope • Extern: in C • Public static: in Java

  8. Compilation-unit scope: in C and C++ • Names declared outside of all functions • Package-level scope: in Java • Every function definition is available in the global scope unless it has the static attribute • In C++ and Java, names declared within a class are available to all methods in the class • Protected members are also available to the class’s subclasses • Names declared within a statement block are available to all contained blocks, unless they are redeclared in an inner scope

  9. One Symbol Table or Many? • Two common approaches to implementing block-structured symbol tables • A symbol table associated with each scope • Or a single, global table

  10. An Individual Table for Each Scope • Because name scopes are opened and closed in a last-in first-out (LIFO) manner, a stack is an appropriate data structure for searching them • The innermost scope appears at the top of the stack • OpenScope(): pushes a new symbol table • CloseScope(): pops the top symbol table • (Fig. 8.3) • Disadvantage • A name may have to be searched for in several symbol tables • The cost depends on the number of nonlocal references and the depth of nesting
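To make the stated disadvantage concrete, here is a hedged sketch of the stack-of-tables organization with an instrumented lookup. The class ScopeStack and its lastProbes counter are hypothetical; the counter records how many scope tables one RetrieveSymbol call examines.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a stack of per-scope hash tables (top = last list element),
// instrumented to count how many tables a lookup has to examine.
class ScopeStack {
    private final List<Map<String, String>> stack = new ArrayList<>();
    int lastProbes;   // tables examined by the most recent retrieveSymbol call

    void openScope() { stack.add(new HashMap<>()); }        // push
    void closeScope() { stack.remove(stack.size() - 1); }   // pop

    void enterSymbol(String name, String type) {
        stack.get(stack.size() - 1).put(name, type);
    }

    String retrieveSymbol(String name) {
        lastProbes = 0;
        for (int i = stack.size() - 1; i >= 0; i--) {   // innermost scope first
            lastProbes++;
            String type = stack.get(i).get(name);
            if (type != null) return type;
        }
        return null;   // every open table was searched
    }
}
```

With a global g and five nested scopes opened after it, looking up g from the innermost scope examines six tables, while a name declared in the innermost scope is found after one.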

  11. One Symbol Table • All names in the same table • Each entry is uniquely identified by its name together with its scope name or depth • RetrieveSymbol() need not chain through scope tables to locate a name • More details in Sec. 8.3.3 • (Fig. 8.8)

  12. Basic Implementation Techniques • Entering and Finding Names • The Name Space • An Efficient Symbol Table Implementation

  13. Entering and Finding Names • Examine the time needed to insert symbols, retrieve symbols, and maintain scopes • In particular, we pay attention to the cost of retrieving symbols • Names can be declared no more than once in each scope, but typically referenced multiple times • Various approaches • Unordered list • Ordered list • Binary search trees • Balanced trees • Hash tables

  14. Unordered List • Simplest • Array • Linked list or resizable array • All symbols in a given scope appear adjacently • Insertion: fast • Retrieval: linear scan • Impractically slow for large programs
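A minimal sketch of the unordered-list organization, assuming a resizable array (ArrayList) and an integer nesting level stored in each entry; the class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an unordered-list symbol table backed by a resizable array.
// Entries are appended in declaration order, so the symbols of the
// current scope always sit together at the end of the list.
class UnorderedListTable {
    private static final class Entry {
        final String name;
        final String type;
        final int level;   // nesting level of the declaring scope
        Entry(String name, String type, int level) {
            this.name = name; this.type = type; this.level = level;
        }
    }

    private final List<Entry> entries = new ArrayList<>();
    private int level = 0;

    void openScope() { level++; }

    // Remove the adjacent run of entries belonging to the closing scope.
    void closeScope() {
        while (!entries.isEmpty() && entries.get(entries.size() - 1).level == level) {
            entries.remove(entries.size() - 1);
        }
        level--;
    }

    // Insertion: amortized O(1) append.
    void enterSymbol(String name, String type) { entries.add(new Entry(name, type, level)); }

    // Retrieval: linear scan, newest first, so inner declarations win.
    String retrieveSymbol(String name) {
        for (int i = entries.size() - 1; i >= 0; i--) {
            Entry e = entries.get(i);
            if (e.name.equals(name)) return e.type;
        }
        return null;
    }
}
```

Because scopes close in LIFO order, closeScope() only has to trim the tail of the list; it is the backward linear scan in retrieveSymbol that makes this layout slow.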

  15. Ordered List • Binary search: O(log n) • How to organize the ordered lists for a name declared in multiple scopes? • An ordered list of stacks (Fig. 8.4) • RetrieveSymbol() locates stacks using binary search • CloseScope() examines each stack and pops the stacks whose top symbol is declared in the abandoned scope • To avoid such checking, we maintain a separate linking of symbol table entries that are declared at the same scope level (Sec. 8.3.3) • Fast retrieval, but expensive insertion • Advantageous when the space of symbols is known in advance • E.g., reserved keywords
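The ordered list of stacks can be sketched as below. This illustrates the idea behind Fig. 8.4 and is not the book's code: it keeps names in a sorted ArrayList, finds them by binary search, and stores only the declaring scope level on each name's stack. OrderedListTable and Slot are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of an ordered list of stacks: one slot per distinct name, kept
// sorted by name; each slot's stack holds the scope levels declaring it.
class OrderedListTable {
    static final class Slot {
        final String name;
        final Deque<Integer> levels = new ArrayDeque<>();   // top = innermost declaration
        Slot(String name) { this.name = name; }
    }

    private final List<Slot> slots = new ArrayList<>();   // sorted by name
    private int level = 0;

    // Binary search: index of name, or -(insertion point) - 1 if absent.
    private int find(String name) {
        int lo = 0, hi = slots.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = slots.get(mid).name.compareTo(name);
            if (c == 0) return mid;
            if (c < 0) lo = mid + 1; else hi = mid - 1;
        }
        return -(lo + 1);
    }

    void openScope() { level++; }

    void enterSymbol(String name) {
        int i = find(name);
        if (i < 0) {                       // new name: O(n) shift keeps the list sorted
            i = -(i + 1);
            slots.add(i, new Slot(name));
        }
        slots.get(i).levels.push(level);
    }

    // Level of the innermost visible declaration, or -1 if none.
    int retrieveSymbol(String name) {
        int i = find(name);
        if (i < 0 || slots.get(i).levels.isEmpty()) return -1;
        return slots.get(i).levels.peek();
    }

    // Examine every stack, popping entries declared in the abandoned scope
    // (assumes a name is declared at most once per scope).
    void closeScope() {
        for (Slot s : slots) {
            if (!s.levels.isEmpty() && s.levels.peek() == level) s.levels.pop();
        }
        level--;
    }
}
```

Inserting a new name shifts later slots to keep the list sorted (the expensive insertion noted above), and closeScope() visits every slot, which is exactly the checking that a per-level linking of entries avoids.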

  16. Binary Search Trees • Insert, search: O(log n), given random inputs • Average-case performance does not necessarily hold for symbol tables • Programmers do not choose identifiers at random! • Advantage • Simple, widely known implementation

  17. Balanced Trees • The worst case for binary search trees can be avoided if the tree is kept balanced • E.g.: red-black trees, splay trees • Insert, search: O(log n) (worst case for red-black trees, amortized for splay trees)
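The following sketch illustrates why the average-case bound for binary search trees is fragile: identifiers such as x001, x002, ... that arrive in sorted order turn an unbalanced tree into a chain. The naming pattern is an assumed example, and java.util.TreeMap, which its documentation describes as a red-black tree, stands in for a balanced tree.

```java
import java.util.TreeMap;

// Sketch: an unbalanced BST versus java.util.TreeMap (a red-black tree)
// on identifiers that arrive in sorted order.
class BstDemo {
    static final class Node {
        final String key;
        Node left, right;
        Node(String key) { this.key = key; }
    }

    // Plain BST insertion with no rebalancing; returns the (unchanged) root.
    static Node insert(Node root, String key) {
        Node fresh = new Node(key);
        if (root == null) return fresh;
        Node cur = root;
        while (true) {
            int c = key.compareTo(cur.key);
            if (c == 0) return root;   // already present
            if (c < 0) {
                if (cur.left == null) { cur.left = fresh; return root; }
                cur = cur.left;
            } else {
                if (cur.right == null) { cur.right = fresh; return root; }
                cur = cur.right;
            }
        }
    }

    static int height(Node n) {
        return n == null ? 0 : 1 + Math.max(height(n.left), height(n.right));
    }

    // Identifiers x001, x002, ... inserted in sorted order: each new key
    // becomes the right child of the previous one.
    static Node sortedBst(int n) {
        Node root = null;
        for (int i = 1; i <= n; i++) root = insert(root, String.format("x%03d", i));
        return root;
    }

    // Same keys in a red-black tree: get/put stay O(log n) in the worst case.
    static TreeMap<String, Integer> sortedTreeMap(int n) {
        TreeMap<String, Integer> t = new TreeMap<>();
        for (int i = 1; i <= n; i++) t.put(String.format("x%03d", i), i);
        return t;
    }
}
```

sortedBst(100) has height 100, one level per identifier, whereas TreeMap's balancing keeps get and put at O(log n) for the same inputs.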

  18. Hash Tables • Most common, due to their excellent performance • Insert, search: O(1) on average, given • A sufficiently large table • A good hash function • Appropriate collision-handling techniques • (Sec. 8.3.3)
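A hash table with separate chaining, one standard collision-handling technique, might look like this. The sketch assumes the caller fixes the table size and uses the same polynomial hash (multiplier 31) as Java's String.hashCode(); both choices are illustrative, and scoping is left out.

```java
// Sketch of a hash table with separate chaining for symbol names.
class ChainedHashTable {
    private static final class Entry {
        final String name;
        String type;
        Entry next;   // next entry in the same bucket
        Entry(String name, String type, Entry next) {
            this.name = name; this.type = type; this.next = next;
        }
    }

    private final Entry[] buckets;

    ChainedHashTable(int size) { buckets = new Entry[size]; }

    // Polynomial hash with multiplier 31 (the same formula as String.hashCode()).
    private int index(String name) {
        int h = 0;
        for (int i = 0; i < name.length(); i++) h = 31 * h + name.charAt(i);
        return Math.floorMod(h, buckets.length);   // non-negative even after overflow
    }

    void put(String name, String type) {
        int i = index(name);
        for (Entry e = buckets[i]; e != null; e = e.next) {
            if (e.name.equals(name)) { e.type = type; return; }   // update existing
        }
        buckets[i] = new Entry(name, type, buckets[i]);   // prepend to the chain
    }

    String get(String name) {
        for (Entry e = buckets[index(name)]; e != null; e = e.next) {
            if (e.name.equals(name)) return e.type;
        }
        return null;
    }
}
```

A prime table size such as 211 is a common choice; with a single bucket every name collides and each lookup degrades to a linear scan of one chain.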

  19. The Name Space • Properties to consider • The name of a symbol does not change during compilation • Symbol names persist throughout compilation • Great variance in the length of identifier names • Unless an ordered list is maintained, comparisons of symbol names involve only equality and inequality • In favor of one logical name space (Fig. 8.5)

  20. Names are inserted, but never deleted • Each name is referenced by two fields • Origin (its starting position in the name space) • Length
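The single logical name space can be sketched as one growing character buffer. This assumes that a name is referenced by its origin (start offset) and length; NameSpace and NameRef are hypothetical names.

```java
// Sketch of a single logical name space: identifier characters are
// appended to one buffer and never deleted; a name is referenced by
// (origin, length) rather than by a separate String object.
class NameSpace {
    private final StringBuilder chars = new StringBuilder();

    static final class NameRef {
        final int origin;   // offset of the first character in the buffer
        final int length;
        NameRef(int origin, int length) { this.origin = origin; this.length = length; }
    }

    NameRef add(String name) {
        NameRef ref = new NameRef(chars.length(), name.length());
        chars.append(name);
        return ref;
    }

    String text(NameRef ref) {
        return chars.substring(ref.origin, ref.origin + ref.length);
    }

    // Equality test without building a String: compare lengths, then characters.
    boolean sameName(NameRef ref, String candidate) {
        if (ref.length != candidate.length()) return false;
        for (int i = 0; i < ref.length; i++) {
            if (chars.charAt(ref.origin + i) != candidate.charAt(i)) return false;
        }
        return true;
    }
}
```

Because names are never deleted, a NameRef stays valid for the whole compilation, and equality can be tested without allocating a String.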

  21. An Efficient Symbol Table Implementation • A symbol table entry containing • Name • Type • Hash • Var • Level • Depth

  22. Two index structures • Hash table • Scope display • Symbols at the same level • (Fig. 8.7) & (Fig. 8.8)
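Below is a hedged sketch of one global table with the two index structures: a hash index from each name to its innermost entry, and a scope display holding, for each open depth, a chain of the entries declared there. Java's HashMap replaces explicit hash chains, and the fields outer and nextInScope are this sketch's own names; they are not claimed to match the Var and Level fields of Fig. 8.7 exactly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a single global symbol table with two index structures.
class GlobalSymbolTable {
    static final class Entry {
        final String name;
        final String type;
        final int depth;          // nesting depth of the declaring scope
        final Entry outer;        // the declaration this one hides, if any
        final Entry nextInScope;  // next entry declared at the same depth
        Entry(String name, String type, int depth, Entry outer, Entry nextInScope) {
            this.name = name; this.type = type; this.depth = depth;
            this.outer = outer; this.nextInScope = nextInScope;
        }
    }

    private final Map<String, Entry> index = new HashMap<>();   // name -> innermost entry
    private final List<Entry> display = new ArrayList<>();      // depth -> chain of entries

    void openScope() { display.add(null); }

    // Walk only the closing scope's chain, re-exposing hidden declarations.
    void closeScope() {
        int d = display.size() - 1;
        for (Entry e = display.get(d); e != null; e = e.nextInScope) {
            if (e.outer != null) index.put(e.name, e.outer);
            else index.remove(e.name);
        }
        display.remove(d);
    }

    // Assumes at least one scope is open.
    void enterSymbol(String name, String type) {
        int d = display.size() - 1;
        Entry e = new Entry(name, type, d, index.get(name), display.get(d));
        index.put(name, e);
        display.set(d, e);
    }

    // One hash probe, whatever the nesting depth.
    Entry retrieveSymbol(String name) { return index.get(name); }

    boolean declaredLocally(String name) {
        Entry e = index.get(name);
        return e != null && e.depth == display.size() - 1;
    }
}
```

retrieveSymbol is a single hash lookup regardless of nesting depth; the extra work moves to closeScope(), which walks only the entries of the scope being closed.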

  23. Advanced Features • Extensions of the simple symbol table framework to accommodate advanced features of modern programming languages • Name augmentation (overloading) • Name hiding and promotion • Modification of search rules

  24. Records and Typenames • Aggregate data structures • struct, record • E.g. a.b.c.d • C, Ada, Pascal: a reference must completely specify the containers and the field • COBOL, PL/I: intermediate containers can be omitted if the reference can be resolved unambiguously • Abbreviated references such as a.c or c.d are more difficult to read • Records can be nested arbitrarily deeply • Tree • typedef: alias for a type • Symbol table
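A sketch of field lookup for fully qualified references, assuming each record type owns a table mapping field names to field types (RecType and RecordLookup are hypothetical names). It follows the C/Ada/Pascal rule, so every intermediate container must be named.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a record type owns a table of its fields, so a fully qualified
// reference such as a.b.c.d is resolved one component at a time.
class RecType {
    final String name;
    final Map<String, RecType> fields = new HashMap<>();   // empty for non-record types
    RecType(String name) { this.name = name; }

    // Convenience for building nested records; returns this for chaining.
    RecType field(String fieldName, RecType type) {
        fields.put(fieldName, type);
        return this;
    }
}

class RecordLookup {
    // Returns the type of the referenced field, or null if any component is missing.
    static RecType resolve(Map<String, RecType> variables, String path) {
        String[] parts = path.split("\\.");
        RecType t = variables.get(parts[0]);
        for (int i = 1; i < parts.length && t != null; i++) {
            t = t.fields.get(parts[i]);
        }
        return t;
    }
}
```

Under this rule resolve(vars, "a.c") fails even if a contains some nested record with a field c; COBOL/PL/I-style elision would instead require searching the nested records for a unique match. A typedef could be recorded by binding a second type name to the same RecType object.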

  25. Overloading and Type Hierarchies • Method overloading is allowed in object-oriented languages such as C++ and Java • If each definition has a unique type signature • Number and types of the parameters and return type • E.g.: print(int), print(String) • To view a method definition not only in terms of its name but also its type signature • To encode the type signature of a method along with its name • E.g.: M(int): void • To record a method along with a list of its overloaded definitions
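One way to record a method along with its overloaded definitions is sketched below: each name maps to a list of definitions, and each definition can produce an encoded signature such as print(int): void. OverloadTable and MethodDef are hypothetical; resolution here requires an exact match of parameter types, and, as in Java, two definitions may not differ only in return type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: each method name maps to the list of its overloaded definitions.
class OverloadTable {
    static final class MethodDef {
        final String name;
        final List<String> params;
        final String returnType;
        MethodDef(String name, List<String> params, String returnType) {
            this.name = name; this.params = params; this.returnType = returnType;
        }
        // Encoded signature, e.g. "print(int): void".
        String signature() { return name + "(" + String.join(",", params) + "): " + returnType; }
    }

    private final Map<String, List<MethodDef>> byName = new HashMap<>();

    // Adds an overload; rejects one whose parameter types duplicate an
    // existing definition (differing only in return type is not enough).
    boolean define(MethodDef m) {
        List<MethodDef> defs = byName.computeIfAbsent(m.name, k -> new ArrayList<>());
        for (MethodDef d : defs) {
            if (d.params.equals(m.params)) return false;
        }
        defs.add(m);
        return true;
    }

    // Exact-match resolution on argument types; null if nothing matches.
    MethodDef resolve(String name, List<String> argTypes) {
        List<MethodDef> defs = byName.get(name);
        if (defs == null) return null;
        for (MethodDef d : defs) {
            if (d.params.equals(argTypes)) return d;
        }
        return null;
    }
}
```

Real overload resolution in C++ and Java also considers conversions and subtyping, which is where choices such as resize(Shape) vs. resize(Rectangle) arise.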

  26. Operator overloading is allowed in C++ and Ada • Ada allows literals to be overloaded • E.g.: diamond in two different enumeration types: as a playing card and as a gem • Pascal and Fortran allow the same symbol to represent both the invocation of a method and the value of the method’s result • Two entries in the symbol table • C allows the same name to be used as a local variable, a struct name, and a label

  27. E.g.: (in Ex. 17) • main() { struct xxx { int a, b; } c; int xxx; xxx: c.a = 1; } • Type extension through subclassing is allowed in Java and C++ • resize(Shape) vs. resize(Rectangle)
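The C example compiles because C keeps separate name spaces for labels, for struct/union/enum tags, for the members of each struct or union, and for all other (ordinary) identifiers. The hedged sketch below models three of them with one map each; struct members and scoping are omitted, CNameSpaces is a hypothetical name, and the parser is assumed to know the kind from context (after struct, before a colon, and so on).

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Sketch: one table per C name-space kind, so "xxx" can be a struct tag,
// a variable and a label at the same time.
class CNameSpaces {
    enum Kind { ORDINARY, TAG, LABEL }

    private final Map<Kind, Map<String, String>> spaces = new EnumMap<>(Kind.class);

    CNameSpaces() {
        for (Kind k : Kind.values()) spaces.put(k, new HashMap<>());
    }

    // Returns false on a clash within one name space (info must be non-null).
    boolean declare(Kind kind, String name, String info) {
        return spaces.get(kind).putIfAbsent(name, info) == null;
    }

    String lookup(Kind kind, String name) { return spaces.get(kind).get(name); }
}
```

In the example, struct xxx goes into TAG, the variables c and xxx into ORDINARY, and the label xxx into LABEL, so none of the declarations clash.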

  28. Implicit Declarations • In some languages, the appearance of a name in a certain context serves to declare the name as well • E.g.: labels in C • In Fortran: an undeclared variable’s type is inferred from its first letter • In Ada: a for-loop index is implicitly declared to be of the same type as the range specifier • A new scope is opened for the loop so that the loop index cannot clash with an existing variable • E.g. for (int i=1; i<10; i++) { … }
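Fortran's default rule can be sketched as follows: absent IMPLICIT statements, an undeclared name beginning with I through N is INTEGER and any other name is REAL. ImplicitTable is a hypothetical class that enters such a name into the table at its first use; names are upper-cased because Fortran treats letter case as insignificant.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch of Fortran-style implicit declaration: the first use of an
// undeclared name declares it, with a type taken from its first letter.
class ImplicitTable {
    private final Map<String, String> declared = new HashMap<>();

    // Explicit declarations override the implicit rule.
    void declare(String name, String type) { declared.put(name.toUpperCase(Locale.ROOT), type); }

    // Default rule: first letter I-N gives INTEGER, anything else REAL.
    static String implicitType(String name) {
        char c = Character.toUpperCase(name.charAt(0));   // assumes a non-empty name
        return (c >= 'I' && c <= 'N') ? "INTEGER" : "REAL";
    }

    // Look the name up, implicitly declaring it on first use.
    String lookupOrDeclare(String name) {
        return declared.computeIfAbsent(name.toUpperCase(Locale.ROOT), ImplicitTable::implicitType);
    }
}
```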

  29. Export and Import Directives • Export: some local scope names are to become visible outside that scope • Typically associated with modularization features such as Ada packages, C++ classes, C compilation units, and Java classes • Java: public attribute, e.g., the String class in the java.lang package • C: all functions are known outside unless the static attribute is specified • In a large software system, the space of global names can become polluted and disorganized • C, C++: header files • Java: import • Ada: use

  30. Altered Search Rules • To alter the way in which symbols are found in the symbol table • In Pascal: with R do S • First try to resolve an identifier as a field of the record R • Advantageous if R is a complex name • Can usually generate efficient code • Forward references in recursive data structures or methods • A portion of the program references a definition that has not yet been processed • Must be announced in advance in some languages (e.g., forward declarations in Pascal)
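Pascal's with R do S rule can be sketched as a search that consults the fields of the active with records before the ordinary open scopes. WithScopeTable and its methods are hypothetical, and scopes and record fields are modeled as plain maps from names to type names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch of Pascal's "with R do S" search rule: inside S, an identifier
// is first resolved as a field of R (innermost with first) and only then
// in the ordinary open scopes.
class WithScopeTable {
    private final Deque<Map<String, String>> scopes = new ArrayDeque<>();
    private final Deque<Map<String, String>> withRecords = new ArrayDeque<>();

    void openScope() { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }
    void enterSymbol(String name, String type) { scopes.peek().put(name, type); }

    // Entering/leaving "with R do": R's field table becomes the first place searched.
    void beginWith(Map<String, String> recordFields) { withRecords.push(recordFields); }
    void endWith() { withRecords.pop(); }

    String retrieveSymbol(String name) {
        for (Map<String, String> fields : withRecords) {   // innermost with first
            String type = fields.get(name);
            if (type != null) return type;
        }
        for (Map<String, String> scope : scopes) {         // then the usual scope search
            String type = scope.get(name);
            if (type != null) return type;
        }
        return null;
    }
}
```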

  31. Symbol Table Summary • The symbol table organization in this chapter efficiently represents scope-declared symbols in a block-structured language • Most languages include rules for symbol promotion to a global scope • Issues such as inheritance, overloading, and aggregate data types must be considered

  32. Declaration Processing Fundamentals • Attributes in the symbol table • Internal representations of declarations • Identifiers are used in many different ways in a modern programming language • Variables, constants, types, procedures, classes, and fields • Not every identifier has the same set of attributes • We need a data structure to store this variety of information • Using a struct that contains a tag and a union with a member for each possible value of the tag • Using an object-based approach: an Attributes class and appropriate subclasses
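The object-based approach might look like this in Java: an abstract Attributes class with a subclass per kind of identifier, each carrying only the fields that kind needs. The subclasses, their fields and the describe helper are illustrative assumptions, not the book's classes.

```java
import java.util.List;

// Sketch of the object-based alternative to a tagged union.
abstract class Attributes {
    abstract String kind();
}

final class VariableAttributes extends Attributes {
    final String type;
    final boolean isConstant;
    VariableAttributes(String type, boolean isConstant) { this.type = type; this.isConstant = isConstant; }
    String kind() { return isConstant ? "constant" : "variable"; }
}

final class TypeAttributes extends Attributes {
    final String description;   // a real compiler would hold a type descriptor here
    TypeAttributes(String description) { this.description = description; }
    String kind() { return "type"; }
}

final class MethodAttributes extends Attributes {
    final List<String> paramTypes;
    final String returnType;
    MethodAttributes(List<String> paramTypes, String returnType) {
        this.paramTypes = paramTypes; this.returnType = returnType;
    }
    String kind() { return "method"; }
}

final class AttributeDemo {
    // Clients downcast when they need kind-specific data.
    static String describe(Attributes a) {
        if (a instanceof VariableAttributes) return a.kind() + " of type " + ((VariableAttributes) a).type;
        if (a instanceof MethodAttributes) return "method returning " + ((MethodAttributes) a).returnType;
        return a.kind();
    }
}
```

The instanceof tests in describe play the role that a switch on the tag would play in the struct-and-union version.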

  33. Type Descriptor Structures

  34. Type Checking Using an Abstract Syntax Tree • Using the visitor pattern (in Chap. 7) • SemanticsVisitor: a subclass of Visitor • The top-level visitor for processing declarations and doing semantic checking on the AST nodes • TopDeclVisitor • A specialized visitor invoked by SemanticsVisitor for processing declarations • TypeVisitor • A specialized visitor used to handle an identifier that represents a type or a syntactic form that defines a type (such as an array)

  35. Variable and Type Declarations • Simple variable declarations • A type name and a list of identifiers • (Fig. 8.12) • Visitor actions: (Fig. 8.13)
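The visitor action for a simple variable declaration can be sketched as: resolve the type name once, then enter every identifier in the current scope, reporting any identifier already declared there. This is not Fig. 8.13; VarDeclProcessor, its Attr class, the predeclared types int and float, and the "<error>" type are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of processing a declaration such as "int a, b;".
class VarDeclProcessor {
    static final class Attr {
        final String kind;   // "type" or "variable"
        final String type;
        Attr(String kind, String type) { this.kind = kind; this.type = type; }
    }

    private final Deque<Map<String, Attr>> scopes = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    VarDeclProcessor() {
        openScope();   // outermost scope holds the predeclared type names
        scopes.peek().put("int", new Attr("type", "int"));
        scopes.peek().put("float", new Attr("type", "float"));
    }

    void openScope() { scopes.push(new HashMap<>()); }
    void closeScope() { scopes.pop(); }

    Attr retrieve(String name) {
        for (Map<String, Attr> s : scopes) {   // innermost scope first
            Attr a = s.get(name);
            if (a != null) return a;
        }
        return null;
    }

    void visitVarDecl(String typeName, List<String> ids) {
        Attr t = retrieve(typeName);
        String type;
        if (t != null && t.kind.equals("type")) {
            type = t.type;
        } else {
            errors.add(typeName + " is not a type");
            type = "<error>";   // still declare the names so later uses are not reported as undeclared
        }
        Map<String, Attr> current = scopes.peek();
        for (String id : ids) {
            if (current.containsKey(id)) errors.add(id + " is already declared in this scope");
            else current.put(id, new Attr("variable", type));
        }
    }
}
```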

  36. Visit Method

  37. Handling Type Names

  38. Type Declarations • A name and a description of the type to be associated with it • (Fig. 8.15) • Visit method: (Fig. 8.16)
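Processing a type declaration binds the declared name to a type descriptor built from its definition. The sketch below assumes hypothetical descriptor classes and a single scope; lookupType stands in for what a TypeVisitor would compute, and none of it is the code of Fig. 8.16.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: a type declaration binds a name to a type descriptor.
abstract class TypeDescriptor {}

final class IntegerTypeDescriptor extends TypeDescriptor {
    @Override public String toString() { return "integer"; }
}

final class ArrayTypeDescriptor extends TypeDescriptor {
    final TypeDescriptor element;
    final int size;
    ArrayTypeDescriptor(TypeDescriptor element, int size) { this.element = element; this.size = size; }
    @Override public String toString() { return "array[" + size + "] of " + element; }
}

class TypeDeclProcessor {
    private final Map<String, TypeDescriptor> typeNames = new HashMap<>();
    final List<String> errors = new ArrayList<>();

    TypeDeclProcessor() { typeNames.put("integer", new IntegerTypeDescriptor()); }

    // Descriptor for a type name; null if undeclared.
    TypeDescriptor lookupType(String name) { return typeNames.get(name); }

    // Processes "type name = definition".
    void visitTypeDecl(String name, TypeDescriptor definition) {
        if (typeNames.containsKey(name)) errors.add(name + " is already declared");
        else typeNames.put(name, definition);
    }
}
```

Declaring Count with the descriptor of integer makes both names refer to the same descriptor object, which is how an alias behaves.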
