Names and Binding

In procedural programming, you write instructions that manipulate the “state” of the process, where the “state” is the collection of variables and their values. In this chapter we will consider the ideas of variables, storage, binding, types and scope.

Design issues for identifiers (names)
• Does the language have a maximum length?
  • most languages either have no restriction or the restriction is large enough to be immaterial (e.g., 31 in C and Pascal, 30 in COBOL)
• Does the language have legal connectors?
  • most languages use _ or “camel” notation
  • COBOL uses - (hyphen), which detracts from readability
• Are letters case sensitive?
  • this can detract from both readability and writability
• Are the special words of the language context sensitive (key words) or reserved?
  • in FORTRAN, INTEGER and REAL are context sensitive, leading to this possibility:
      INTEGER REAL
      REAL INTEGER
    (the first line declares a variable named REAL of type INTEGER, the second a variable named INTEGER of type REAL)

Variables
• A variable is an abstraction of a memory location (for reference) and the type-specific methods to perform operations
• Address: the memory address or location of the variable, sometimes referred to as the l-value (the location needed when the variable is on the left-hand side of an assignment)
• Aliases arise when multiple identifiers refer to the same memory location; aliases are created through
  • pointers pointing at the same allocated piece of memory
  • parameter passing when the item is passed through a pointer
  • union types (we explore this in chapter 6)
• Type: specifies the allowable range of values to be stored and the allowable operations
• Value: the current value stored in memory, sometimes referred to as the r-value
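Aliasing can be sketched in a few lines. The example uses Python purely for illustration (it is not one of the languages on the slide), and the names `scores`, `alias`, `add_bonus` and `independent` are hypothetical. Two identifiers end up referring to the same storage, first through plain assignment of a reference and then through parameter passing.

```python
# Two names bound to the same list object: an alias.
scores = [70, 85]
alias = scores                  # no copy is made; both names refer to one object
alias.append(90)                # a change made through one name...
same_object = alias is scores   # ...is visible through the other (True)


def add_bonus(values, bonus):
    """Mutate the caller's list in place: 'values' aliases the argument."""
    for i in range(len(values)):
        values[i] += bonus


add_bonus(scores, 5)            # inside the call, 'values' aliases 'scores'

independent = list(scores)      # an explicit copy is not an alias
independent.append(0)           # does not affect 'scores'
```

After this runs, `scores` and `alias` both hold `[75, 90, 95]`, while `independent` has diverged. This is why aliases hurt readability: the change made inside `add_bonus` is not apparent from the call site.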

Binding
• Binding is the association of an object with its attributes, operations, or name
• There are many different types of bindings, and bindings can occur at different times, for example:
  • design time binding: binding * to multiplication
  • language implementation time binding: int size and operations
  • compile time binding: binding a type to a variable
    • for instance, after int x; x will be treated as an int from there forward
  • link time binding: binding a function name to a specific function definition
  • load time binding: binding a variable to a memory location
  • run time binding: binding a variable to a memory location, or binding a polymorphic variable to a specific class type
• Binding is static if it occurs before run time and remains unchanged; otherwise binding is dynamic
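The run time case, binding a polymorphic variable to a specific class type, can be sketched as follows. This is an illustrative Python example under stated assumptions: the classes `Shape`, `Rectangle` and `Square` are hypothetical and do not come from the slides.

```python
class Shape:
    def area(self):
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


# The name 'shape' is bound to a different class on each iteration, and the
# call shape.area() is bound to a concrete method only when it executes.
areas = []
for shape in (Rectangle(2, 3), Square(4)):
    areas.append((type(shape).__name__, shape.area()))
```

Here `areas` ends up as `[("Rectangle", 6), ("Square", 16)]`; the same source-level call selected two different methods, which is what makes this binding dynamic rather than static.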

Bindings: Variable Declarations
• Binding the identifier name to the declared type
• Are variables declared explicitly or implicitly?
  • most languages require explicit variable declarations
  • exceptions are
    • FORTRAN: if the first letter is I..N, then the variable is an integer, otherwise a real
    • BASIC and PL/I: type binding occurs when the variable is first assigned a value
    • Perl, JavaScript, Ruby, Lisp: variables are typeless, that is, they can change types any time the values assigned to them are changed (see dynamic type binding below)
• Dynamic Type Binding
  • binding of type is not explicit but derived by examining assignment statements at runtime
  • Lisp, JavaScript, Perl, Ruby, and PHP are all like this
  • note that in Perl, different kinds of variables are determined by their name: $name is a scalar, @name is an array, %name is a hash structure
• Type Inference
  • using inference rules to determine the type returned by a function; this is the case in the languages ML, Miranda and Haskell (these languages are statically typed, the types are simply not written out)
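Dynamic type binding can be observed directly. The sketch below uses Python, which is not in the list above but binds types the same way; the variable names are hypothetical. The type belongs to the value currently assigned, not to the name.

```python
value = 42
first_type = type(value).__name__    # 'int'

value = "forty-two"                  # same name, now bound to a string
second_type = type(value).__name__   # 'str'

value = [4, 2]                       # and now to a list
third_type = type(value).__name__    # 'list'
```

No declaration ever fixes the type of `value`, so a compiler cannot know in advance which operations on it are legal; that check has to wait until run time.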

Static Lifetime
• The lifetime is the time from which a variable is bound to memory until it is unbound
  • a static lifetime means that the variable is bound before program execution begins and remains the same until program termination
• Variables are statically bound if they are
  • global variables
  • variables in a C function declared as static
  • variables in a FORTRAN subroutine
    • this allows the subprogram to retain the value of the variable after the subprogram terminates
    • however, this prohibits recursion or shadowing
• Static is the most efficient form of binding
  • access is performed using direct memory addressing mode
  • no overhead needed for allocation or deallocation at runtime

Stack Dynamic Lifetime
• The variable is bound when execution reaches the variable declaration in the code and unbound when the code that includes the declaration terminates
  • local variables and parameters in procedures, functions, methods are stack dynamic for the Algol-descended languages (C/C++/Java, Pascal, Ada, etc.)
  • variables are pushed onto the run-time stack when the function/method begins execution and are popped off the stack when it terminates
  • allocation and deallocation are performed by the run-time environment
• Stack dynamic binding allows for recursion
  • FORTRAN’s static handling of parameters does not allow for recursion
• Extra runtime overhead is needed for allocation and deallocation
  • memory space for these variables is provided on the run-time stack
• Variables declared inside a block are stack dynamic
  • { int x; …} in C/C++/Java, or inside begin…end blocks in Pascal/Ada/Algol
• Stack dynamic lifetimes will not allow a history of values to be saved after the block of code in which they were declared has terminated
  • FORTRAN 95 includes a Save list instruction to save the list of variables somewhere other than the stack
  • if you want to retain a variable’s value in C, declare it as static
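Two consequences of this lifetime rule, support for recursion and loss of history, can be sketched as follows. Python is used only for illustration (the slides do not cover it, and how its interpreter stores activation records internally is not the point here); `countdown`, `trace` and `next_id` are hypothetical names.

```python
def countdown(n, trace):
    local = n * 10                 # bound when this activation starts
    if n > 0:
        countdown(n - 1, trace)    # inner activations get their own n and local
    trace.append((n, local))       # this activation's values are still intact


trace = []
countdown(2, trace)                # trace == [(0, 0), (1, 10), (2, 20)]


def next_id():
    counter = 0                    # re-created on every call
    counter += 1
    return counter                 # no history survives between calls


ids = [next_id(), next_id(), next_id()]   # [1, 1, 1]
```

Each activation of `countdown` keeps its own `n` and `local`, which is what makes recursion possible, while `next_id` returns 1 every time because its local is unbound when the call ends. Retaining the count would need a variable with a longer lifetime, such as a `static` local in C.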

Explicit Dynamic Lifetime
• Variables are allocated and deallocated explicitly at runtime
  • memory for these types of variables comes from a system reserved area called the heap
• Allocated memory from the heap is nameless
  • that is, there is no binding of a named variable to a memory location
  • these locations must be referenced by pointers
    • the pointer might be named, as in int *p;
    • or the pointer may itself be a field of a struct allocated from the heap
  • we use this memory for dynamic structures like linked lists and trees
• Allocation and deallocation
  • in C/C++, we allocate heap memory using malloc and calloc
  • in Java, C++, Pascal and Ada, we use new
  • in PL/I, we use allocate
  • we must explicitly deallocate heap memory in most languages (but not Java or C#, where garbage collection is used)
  • C# has parameters that can be either stack dynamic or explicit dynamic
• The variable’s type is bound at compile time even though the variable’s memory is not allocated until run-time
• Deallocation is partly implementation specific: for instance, free in C returns the block to the allocator for reuse, but whether that memory is handed back to the operating system depends on the implementation
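The idea of nameless heap storage reached only through references can be sketched with a small linked list. The sketch uses Python for illustration, which behaves like the Java case above: allocation is an explicit constructor call, and deallocation is left to the garbage collector rather than written by the programmer. `Node`, `head` and `to_list` are hypothetical names.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node        # reference stored inside a heap object


head = None
for value in (3, 2, 1):
    head = Node(value, head)         # each Node is a nameless heap object


def to_list(node):
    """Follow the chain of references and collect the stored values."""
    out = []
    while node is not None:
        out.append(node.value)
        node = node.next
    return out


values = to_list(head)               # [1, 2, 3]

head = head.next                     # the first node is now unreachable
remaining = to_list(head)            # [2, 3]
```

Only `head` is a named variable; every node is reachable solely through it or through another node's `next` field. In C the unlinked node would have to be released with `free` before the last pointer to it is overwritten, otherwise it leaks.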

Implicit Dynamic Lifetime
• The variable’s memory is only bound while it is assigned a value
  • the variable’s attributes are bound during this time so that, if the variable is unbound and bound to a new memory location, then its attributes (including type) change
  • Lisp and JavaScript both use this approach
  • ALGOL 68 also used this approach for “flex” arrays
• Like explicit dynamic lifetime binding, the memory comes from the run-time heap, but here allocation is implicit: you do not have to explicitly use an allocation instruction like new or malloc
  • as with Java and C#, there is no need to explicitly deallocate the memory; it is garbage collected when no pointers are pointing at it
• This form of lifetime has the highest degree of flexibility but also the highest amount of overhead
  • no compile-time type checking is possible since the type can change, so any type checking (if performed at all) must be done at run-time
  • allows for generic code which can operate on any type
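Both sides of this trade-off, generic code and run-time-only type checking, show up in one short sketch. It is written in Python as an illustration (the slide names Lisp and JavaScript, which behave similarly in this respect), and `largest` is a hypothetical helper.

```python
def largest(items):
    """Return the largest element of a non-empty list of mutually comparable items."""
    best = items[0]
    for item in items[1:]:
        if item > best:
            best = item
    return best


largest_int = largest([3, 9, 4])                   # works on integers
largest_word = largest(["pear", "apple", "fig"])   # and, unchanged, on strings

# Nothing rejects a mixed list ahead of time; the error surfaces only when
# the comparison is actually executed.
try:
    largest([1, "two"])
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__
```

The same function body serves any element type, which is the flexibility the slide refers to; the price is that the bad call fails during execution rather than being reported by a compiler.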

Type Checking
• Ensures that operands of an operator are compatible
  • a compatible type is either one that is legal for the operator or is allowed via an implicit coercion (coercion is the automatic conversion of a value’s type to a legal type for the operation)
  • in C and Java, an int can be coerced into a float and a float can be coerced into a double; Java does not allow the reverse without an explicit cast, whereas C will also narrow implicitly in an assignment
  • in Pascal, an integer can be coerced into a string, but in C++ or Java it must be converted explicitly (Java does so implicitly only in string concatenation)
• A type error (often called a type mismatch) occurs if an operand is not appropriate for an operator and cannot be coerced
• If all bindings of variable to type are static, then type checking can be done completely at compile time
  • JavaScript, Lisp and APL perform dynamic (run-time) type checking
  • type checking is complicated if a memory location can store different types at different times of a program’s execution, which is the case in FORTRAN or with union types (we visit this in chapter 6)
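The difference between a coercion and a type error can be sketched as follows. Python is used for illustration only (it checks types at run time, like the dynamically checked languages above); the variable names are hypothetical.

```python
# Coercion: the int operand is converted to float automatically.
total = 1 + 2.5
total_type = type(total).__name__    # 'float'

# Type error: there is no implicit conversion from int to str for '+',
# so the operation is rejected and an explicit conversion is needed.
try:
    label = "count: " + 3
except TypeError:
    label = "count: " + str(3)
```

`total` is `3.5` and `label` ends up as `"count: 3"`, but only after the explicit `str(...)` call. Which conversions count as legal coercions is a language design decision, and more permissive rules catch fewer mistakes.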

Strongly Typed
• A programming language is strongly typed if all type checking errors are detectable before run-time
  • a more restrictive definition is that every identifier in a program has a single type associated with it, known at compile time
  • both definitions require static binding
  • having a strongly typed language is desirable because it offers the best reliability with regard to type errors
• Few languages are strongly typed because nearly all languages have a mechanism to get around type checking
  • FORTRAN: in early versions, parameters were not checked, and union types are available (EQUIVALENCE)
  • Pascal: has variant records (we will examine this in chapter 6)
  • C, C++: functions may have parameters that skip type checking, and both have union types
  • Ada, Java and C#: they are close to being strongly typed in that no type errors can arise implicitly; however, all three languages have unchecked conversions (casts in Java and C#) which can lead to typing errors
  • APL, SNOBOL, LISP: dynamic type binding
  • ML is strongly typed, but some of the types are inferred instead of declared

Type Compatibility
• Type compatibility determines whether a type error should arise or not
  • types are compatible if one is coercible into another
• Languages will determine compatibility based on one of two strategies:
  • name type compatibility
    • if the variables are declared using the same declaration or the same type name
    • example: int x; int y; // x and y are name type compatible
  • compatibility by structure
    • if the variables have the same structure even though they are of differently named types, for example:
        struct foo { int x; float y; };
        struct bar { int x; float y; };
        foo a; bar b; // a and b are structure type compatible
• C uses name compatibility for struct, union and enum types and compatibility by structure for its other types, while C++ uses name compatibility
• Ada uses name compatibility except for anonymous arrays, which use structure compatibility
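The two strategies can be contrasted with a small sketch that mimics the checks a compiler would perform. It is written in Python for illustration, and the helpers `name_compatible`, `layout` and `structure_compatible` are hypothetical; they are not part of any language's actual type checker.

```python
from dataclasses import dataclass, fields


@dataclass
class Foo:
    x: int
    y: float


@dataclass
class Bar:
    x: int
    y: float


def name_compatible(a, b):
    """Name compatibility: both values must have the very same named type."""
    return type(a) is type(b)


def layout(obj):
    """Describe a record as an ordered list of (field name, field type) pairs."""
    return [(f.name, f.type) for f in fields(obj)]


def structure_compatible(a, b):
    """Structure compatibility: same fields, same types, same order."""
    return layout(a) == layout(b)


a, b = Foo(1, 2.0), Bar(1, 2.0)
by_name = name_compatible(a, b)             # False: Foo and Bar are different names
by_structure = structure_compatible(a, b)   # True: identical layouts
```

Name compatibility is cheap to check and keeps accidentally identical records apart; structure compatibility is more flexible but requires comparing whole layouts and cannot distinguish two types that merely happen to look alike.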

Scope
• Scope is the range of statements from which a variable is “visible” (where the variable can be referenced)
• Scope rules of a given language determine how the name being referenced will be associated with a particular variable in memory of that name
  • this is necessary when dealing with non-local references to variables, or when variables are re-declared inside of blocks
• We will examine two forms of scope, static scope and dynamic scope
  • nearly all languages use static scope because it is easier to understand and type checking can be performed at compile time
  • Lisp is one of the few that has used dynamic scope, so we will consider it, although today Lisp makes static scope available because of the difficulties in reading code that is dynamically scoped

Static Scope
• Introduced in ALGOL 60 to bind names to non-local variables and has been copied by most languages since
  • scope for any variable can be determined prior to execution
• Two subtypes of static scoping
  • languages where subprograms can be nested (e.g., Ada, Pascal)
  • languages where subprograms cannot be nested (e.g., C, Java)
• If subprograms can be nested, then this creates a hierarchy of scopes formed by the definitions of the subprograms
  • example: if sub1 is defined inside of sub2, which is inside of sub3, then a variable referenced in sub1 but not declared in sub1 would be looked for in sub2, and if not in sub2, then in sub3
  • if two or more subprograms use the same name for a variable, the reference is to the declaration in the nearest enclosing subprogram, whereas the outer variable is “hidden”
  • in Ada, a hidden variable can still be accessed via notation like: sub1.x

Example of Static Scope
• Consider a program with nested subprograms: Main contains A and B, A contains C and D, B contains E
    MAIN
      A
        C
        D
      B
        E
• In the language of this program, a nested subprogram can call any nested subprogram above it within the same subprogram, and can be called by the subprogram it is nested in
  • so A can call C and D, D can call C, but C cannot call D
  • B (which appears below A) can call A or E, but D and C cannot call B or E
• Assume x is declared in MAIN, B and C, and assume MAIN calls B, which calls E, which calls A, which calls D
  • if x is referenced in E, it is B’s x
  • if x is referenced in D, it is MAIN’s x (not C’s), because D is statically nested inside A and MAIN but not inside C
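The same nesting and call chain can be reproduced in any statically scoped language that allows nested subprograms. The sketch below uses Python for illustration (it is not the language of the example above); the function names mirror the subprograms, and the `seen` dictionary is a hypothetical device for recording which x each reference resolves to.

```python
def main():
    x = "MAIN's x"
    seen = {}

    def a():
        def c():
            x = "C's x"        # C's own x; it is never visible to D
            return x

        def d():
            seen["D"] = x      # not declared in D or A, so it is MAIN's x

        d()

    def b():
        x = "B's x"

        def e():
            seen["E"] = x      # not declared in E, so it is B's x
            a()                # E calls A, which calls D

        e()

    b()                        # MAIN calls B, which calls E
    return seen


result = main()
```

`result` maps `"E"` to `"B's x"` and `"D"` to `"MAIN's x"`. D is reached through B and E at run time, yet the lookup ignores the call chain and follows only the textual nesting.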

Blocks
    void scopeexample(int x) {
        …                        // reference 1
        {
            int x;               // declaration A
            …                    // reference 2
            {
                …                // reference 3
                {
                    int x;       // declaration B
                    …            // reference 4
                    {
                        …        // reference 5
                    }
                }
            }
        }
        …                        // reference 6
    }
• In C-like languages
  • static scoping is not as much an issue because subprograms are not nested inside one another
  • any non-local variable will be a variable declared in the file (otherwise there is an error)
• C-languages do allow blocks, which have the same scope rules as static scoping
  • a variable is found in the section of code in which it is referenced, or else you must follow the blocks outward until you reach the definition or reach the end of the function
• In the above example, reference 1 is to the parameter, references 2 and 3 are to declaration A, references 4 and 5 are to declaration B, and reference 6 is to the parameter again
• If x were not the name of the parameter, references 1 and 6 would yield compile-time errors (x would be undeclared there unless a file-level x exists), while reference 2 would still resolve to declaration A

Dynamic Scoping
• Scope is based on the sequence of calling subprograms, not their physical location
  • to determine a reference, search backward through the chain of subprogram calls until the subprogram in which the variable was declared is found
  • dynamic scoping was used in APL, SNOBOL4, early LISPs
• Example:
    MAIN
      - declaration of x
      SUB1
        - declaration of x
        ...
        call SUB2
        ...
      SUB2
        ...
        - reference to x
        ...
  • MAIN calls SUB1, SUB1 calls SUB2, SUB2 references x
  • static scoping: the reference to x is to MAIN's x
  • dynamic scoping: the reference to x is to SUB1's x
• From the previous example, if MAIN calls B calls E calls A calls D, and we access x in D, then we will reference B’s x, not MAIN’s
• Dynamic scoping makes a program less readable and less reliable because a non-local reference cannot be determined until run-time
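The backward search through the chain of calls can be simulated explicitly. The sketch below is in Python, which is itself statically scoped, so the dynamic lookup is modelled by hand: `call_stack` and `lookup` are hypothetical helpers, and the direct call from the main program to `sub2` is an addition for contrast that is not part of the example above.

```python
call_stack = []   # one dict of local declarations per active subprogram


def lookup(name):
    """Search the active subprograms, most recently called first."""
    for frame in reversed(call_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)


def sub2():
    call_stack.append({})                    # SUB2 declares no x of its own
    try:
        return lookup("x")                   # the reference to x
    finally:
        call_stack.pop()


def sub1():
    call_stack.append({"x": "SUB1's x"})     # SUB1 declares x
    try:
        return sub2()
    finally:
        call_stack.pop()


def main_program():
    call_stack.append({"x": "MAIN's x"})     # MAIN declares x
    try:
        via_sub1 = sub1()                    # MAIN -> SUB1 -> SUB2
        direct = sub2()                      # MAIN -> SUB2 (added for contrast)
        return via_sub1, direct
    finally:
        call_stack.pop()


via_sub1, direct = main_program()
```

The single reference inside `sub2` yields `"SUB1's x"` on one path and `"MAIN's x"` on the other. That dependence on the caller is exactly why a non-local reference cannot be resolved by reading the code.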

Referencing Environment
• The referencing environment is the collection of variables which are accessible (visible) to a given statement
  • in statically scoped languages, each statement’s referencing environment can be determined at compile time
  • in dynamically scoped languages, the referencing environment of a statement consists of all variables in the local subprogram plus the variables of all other active subprograms (i.e., execution started but not yet terminated)
• Example in Ada:
    procedure Example is
      A, B : Integer;
      …
      procedure Sub1 is
        X, Y : Integer;
      begin
        …          -- point 1
      end;
      procedure Sub2 is
        X : Integer;
        …
        procedure Sub3 is
          X : Integer;
        begin
          …        -- point 2
        end;
      begin
        …          -- point 3
      end;
    begin
      …            -- point 4
    end;
• Referencing environment at
  • 1: X, Y of Sub1; A, B of Example
  • 2: X of Sub3; A and B of Example (X of Sub2 is hidden but accessible as Sub2.X)
  • 3: X of Sub2; A and B of Example
  • 4: A and B of Example

Constants, Variable Initializations
• A named constant is an identifier bound to a value at the time it is bound to storage and unalterable during its lifetime
  • constants can aid readability and reliability of a program
  • Ada, C++, Java allow dynamic binding of constants so that the value is not set by the programmer but can be determined at runtime (for instance, passed into a method as a parameter)
• For convenience, variable initialization can occur prior to execution
  • FORTRAN: Integer Sum
             Data Sum /0/
  • Ada: Sum : Integer := 0;
  • ALGOL 68: int first := 10;
  • Java: int num = 5;
  • LISP: (Let (x (y 10)) ... )
