CS 3304 Comparative Languages • Lecture 13: Subroutines and Control Abstraction • 28 February 2012
Introduction • Abstraction: a process by which the programmer can associate a name with a potentially complicated program fragment that can then be thought of in terms of its purpose, rather than in terms of its implementation: • Control abstraction: performs a well-defined operation. • Data abstraction: representation of information. • The subroutine is the principal mechanism for control abstraction: • Mostly parameterized: • Actual parameters: arguments passed into a subroutine. • Formal parameters: parameters in the subroutine definition. • Function: a subroutine that returns a value. • Procedure: a subroutine that does not return a value. • Subroutines are usually declared before being used.
Subroutine Call Stack • Each routine, as it is called, is given a new stack frame (activation record) at the top of the stack. • A frame may contain arguments and/or return values, bookkeeping information (including the return address and saved registers), local variables, and/or temporaries. • When a subroutine returns, its frame is popped from the stack. • Stack pointer register: contains the address of either the last used location at the top of the stack, or the first unused location, depending on convention. • Frame pointer register: contains an address within the frame.
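The push/pop discipline above can be sketched in Python by managing activation records by hand. This is only an illustration: `Frame`, `local_vars`, and `factorial_with_explicit_stack` are hypothetical names, and a real implementation keeps frames in contiguous memory addressed through the stack pointer and frame pointer rather than in a list.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """One activation record: an argument, bookkeeping, and locals."""
    routine: str
    argument: int
    return_to: str                      # stands in for the return address
    local_vars: dict = field(default_factory=dict)


def factorial_with_explicit_stack(n):
    """Compute n! while pushing and popping frames explicitly."""
    stack = []                          # the end of the list is the top of the stack
    max_depth = 0

    # "Calls": push one frame per activation, from fact(n) down to fact(0).
    caller = "main"
    for arg in range(n, -1, -1):
        stack.append(Frame("fact", arg, return_to=caller))
        caller = f"fact({arg})"
        max_depth = max(max_depth, len(stack))

    # "Returns": pop each frame and hand its result to the caller.
    result = 1                          # value returned by fact(0)
    stack.pop()
    while stack:
        frame = stack.pop()
        frame.local_vars["partial"] = result
        result = frame.argument * frame.local_vars["partial"]
    return result, max_depth
```

Calling `factorial_with_explicit_stack(4)` returns the product together with the deepest stack height reached, one frame per active call.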
Allocation Strategies • Static: • Code. • Globals. • Own variables. • Explicit constants (including strings, sets, other aggregates). • Small scalars may be stored in the instructions themselves. • Stack: • Parameters. • Local variables. • Temporaries. • Bookkeeping information: return address (saved program counter), dynamic link, saved registers, line number, saved display entries, static link. • Heap: • Dynamic allocation.
Subroutine Nesting • Static chains: languages with nested subroutines and static scoping (Pascal, Ada). • Objects that lie in lexically surrounding subroutines: neither local nor global. • Each stack frame contains a reference to the frame of the lexically surrounding subroutine. • Dynamic link: the saved value of the frame pointer, which is restored on subroutine return. • The lexically surrounding routine is always active!
Calling Sequence • Maintenance of the stack is the responsibility of the calling sequence and the subroutine prologue and epilogue: • Space is saved by putting as much as possible in the prologue (code executed at the beginning) and epilogue (code executed at the end). • Time may be saved by putting work in the caller instead, where more information may be known: • E.g., there may be fewer registers in use at the point of call than are used somewhere in the callee. • A common strategy is to divide registers into caller-saves and callee-saves sets: • The caller uses the “callee-saves” registers first. • It uses “caller-saves” registers only if necessary. • Local variables and arguments are assigned fixed offsets from the stack pointer or frame pointer at compile time: • Some storage layouts use a separate arguments pointer.
Calling Sequence: Caller • Saves any caller-saves registers whose values will be needed after the call. • Computes the values of arguments and moves them into the stack or registers. • Computes the static link (if this is a language with nested subroutines), and passes it as an extra, hidden argument. • Uses a special subroutine call instruction to jump to the subroutine, simultaneously passing the return address on the stack or in a register.
Calling Sequence: Callee • Prologue: • Allocates a frame by subtracting an appropriate constant from the stack pointer. • Saves the old frame pointer into the stack, and assigns the frame pointer an appropriate new value. • Saves any callee-saves registers that may be overwritten by the current routine (including the static link and return address, if they were passed in registers). • Epilogue: • Moves the return value (if any) into a register or a reserved location in the stack. • Restores callee-saves registers if needed. • Restores the frame pointer and the stack pointer. • Jumps back to the return address. • After the return, the caller: • Moves the return value to wherever it is needed. • Restores caller-saves registers if needed.
Typical Stack Frame • Usually grows downward toward lower addresses. • Arguments are accessed at positive offsets from the frame pointer. • Local variables and temporaries are accessed at negative offsets from the frame pointer. • Arguments to be passed to called routines are assembled at the top of the frame using positive offsets from the stack pointer.
Special-Case Optimizations • Many parts of the calling sequence, prologue, and/or epilogue can be omitted in common cases. • Particularly leaf routines (those that don't call other routines): • Leaving things out saves time. • Simple leaf routines don't use the stack, don't even use memory, and are exceptionally fast. • Display: • With static chains, access to an object in a scope k levels out requires that the static chain be dereferenced k times. • This number can be reduced to a constant by use of a display, a small array that replaces the static chain: • The j-th element of the display contains a reference to the frame of the most recently active subroutine at lexical nesting level j. • An object k levels out can be found using the address stored in element j = i - k of the display, where i is the nesting level of the current routine.
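The difference between the two lookup schemes can be sketched in Python. Everything here is a hypothetical model: `ScopedFrame`, `lookup_via_static_chain`, `build_display`, and `lookup_via_display` are illustrative names, and the display is rebuilt by walking the static chain only to keep the sketch short (a real implementation would update it incrementally at call and return).

```python
class ScopedFrame:
    """Activation record with a nesting level and a static link."""
    def __init__(self, name, level, static_link, variables):
        self.name = name
        self.level = level              # lexical nesting level (outermost = 1)
        self.static_link = static_link  # frame of the lexically surrounding routine
        self.variables = variables


def lookup_via_static_chain(current, k, var):
    """Dereference the static link k times, then read the variable."""
    frame = current
    hops = 0
    for _ in range(k):
        frame = frame.static_link
        hops += 1
    return frame.variables[var], hops


def build_display(current):
    """display[j] = frame of the most recently active routine at level j."""
    display = {}
    frame = current
    while frame is not None:
        display[frame.level] = frame
        frame = frame.static_link
    return display


def lookup_via_display(display, i, k, var):
    """One indexed access: element j = i - k of the display."""
    return display[i - k].variables[var]


# Routine A (level 1) encloses B (level 2), which encloses C (level 3).
frame_a = ScopedFrame("A", 1, None, {"x": 10})
frame_b = ScopedFrame("B", 2, frame_a, {"y": 20})
frame_c = ScopedFrame("C", 3, frame_b, {"z": 30})

value_chain, hops = lookup_via_static_chain(frame_c, 2, "x")
display = build_display(frame_c)
value_display = lookup_via_display(display, frame_c.level, 2, "x")
```

Both lookups find `x` in A's frame; the static chain needs two dereferences from C, while the display needs a single indexed access regardless of k.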
CISC versus RISC • Compilers for CISC machines tend to pass arguments on the stack; compilers for RISC machines tend to pass arguments in registers. • Compilers for CISC machines usually dedicate a register to the frame pointer; compilers for RISC machines often do not. • Compilers for CISC machines often rely on special-purpose instructions to implement parts of the calling sequence; available instructions on a RISC machine are typically much simpler.
Other Improvements • Register windows: a hardware mechanism that offers an alternative to saving and restoring registers on subroutine calls: • Map the limited set of register names in the instruction set architecture (ISA) onto some subset (window) of a much larger collection of registers. • Old and new mappings may overlap, which supports argument passing. • In-line expansion: a copy of the “called” routine becomes a part of the “caller”, so no actual subroutine call occurs. • Avoids overheads such as space allocation, branch delays (call and return), maintaining the static chain/display, and saving/restoring registers. • Allows the compiler to perform code improvement such as global register allocation, instruction scheduling, etc. • Usually the compiler chooses which subroutines to expand in-line. • A programmer can suggest it (C++, C99). • It is semantically neutral: no effect on the meaning of the program. • Increases the code size.
Parameter Passing • Most subroutines are parameterized. • Most languages use a prefix notation for calls to user-defined subroutines: the subroutine name followed by a parenthesized argument list. • Lisp: the function name goes inside the parentheses: (max a b). • ML: names can be defined as infix operators. • Lisp/Smalltalk: user-defined subroutines use the same style of syntax as built-in operators. • Examples: • Pascal: if a > b then max := a else max := b; • Lisp: (if (> a b) (setf max a) (setf max b)) • Smalltalk: (a > b) ifTrue: [max <- a] ifFalse: [max <- b].
Parameter Modes • Parameter-passing modes and related semantic details are heavily influenced by implementation issues. • The two most common parameter-passing modes (mostly for languages with a value model of variables): • Call-by-value: each actual parameter is assigned into the corresponding formal parameter when a subroutine is called, and from then on the two are independent. • Call-by-reference: each formal parameter introduces, within the body of the subroutine, a new name for the corresponding actual parameter. • Aliases: arise if the actual parameter is also visible within the subroutine under its original name. • The distinction between value and reference parameters is fundamentally an implementation issue.
Value and Reference Parameters • Call-by-value/result: • Copies the actual parameters into the corresponding formal parameters at the beginning of subroutine execution. • Copies the formal parameters back to the corresponding actual parameters when the subroutine returns. • Pascal: parameters are passed by value by default. • A reference parameter is preceded by the keyword var. • C: always passed by value. • Fortran: always passed by reference.
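The observable difference between call-by-reference and call-by-value/result shows up when two formals alias the same actual. The following Python sketch simulates both modes; Python offers neither mode directly, so `Cell` and both functions are hypothetical stand-ins for a variable's storage location and for what a compiler would generate.

```python
class Cell:
    """A mutable box standing in for a variable's storage location."""
    def __init__(self, value):
        self.value = value


def increment_both_by_reference(a, b):
    """Formals a and b are new names for the caller's cells."""
    a.value += 1
    b.value += 1


def increment_both_by_value_result(actual_a, actual_b):
    """Copy in at entry, work on local copies, copy out at return."""
    a = actual_a.value                  # copy in
    b = actual_b.value
    a += 1
    b += 1
    actual_a.value = a                  # copy out
    actual_b.value = b


x = Cell(0)
increment_both_by_reference(x, x)       # both formals alias x
by_reference = x.value                  # 2: each increment hits the same cell

y = Cell(0)
increment_both_by_value_result(y, y)    # the copies are independent until return
by_value_result = y.value               # 1: both copies held 1 at copy-out
```

Without aliasing, the two modes would produce the same result; the program can tell them apart only because the same variable is passed twice.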
Call-by-Sharing • Call-by-value and call-by-reference don’t make much sense in a language with a reference model of variables. • Pass the reference itself and let the actual and formal parameters refer to the same object: • Different from call-by-value: although the actual parameter is copied to the formal parameter, the referenced object can be modified. • Different from call-by-reference: while the object can be changed, the identity of that object cannot change. • Java uses call-by-value for built-in types and call-by-sharing for user-defined class types. • C# can provide passing by reference by labeling a formal parameter and each corresponding argument with ref or out keyword.
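Python itself passes parameters by sharing, so the two contrasts above can be shown directly. This is a small sketch; `append_item` and `rebind` are illustrative names.

```python
def append_item(items):
    """Mutates the shared object: the caller sees the change."""
    items.append(99)


def rebind(items):
    """Rebinds only the local formal: the caller's variable is unaffected."""
    items = [0, 0, 0]
    return items


numbers = [1, 2, 3]
append_item(numbers)
after_mutation = list(numbers)          # [1, 2, 3, 99]

replacement = rebind(numbers)
after_rebinding = list(numbers)         # still [1, 2, 3, 99]
```

The callee can change the object that `numbers` refers to, but it cannot make the caller's `numbers` refer to a different object.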
Call-by-Reference • Some languages (Pascal, Modula) provide both call-by-reference and call-by-value: • Call-by-reference: • Used if the called subroutine should change the value of an actual parameter. • Requires an extra level of indirection. • Can also be used to pass large arguments cheaply, but the callee may then modify them by accident, which could introduce bugs. • Call-by-value: • Used to ensure that the called subroutine does not change the actual parameter. • Requires copying actuals to formals, a potentially time-consuming operation when arguments are large.
Read-Only Parameters • Modula-3 provides a READONLY parameter mode. • Any formal parameter whose declaration is preceded by READONLY cannot be changed by the called routine: • It cannot appear on the left-hand side of an assignment statement. • It cannot be read from a file. • It cannot be passed by reference to any other subroutine. • C provides const. • Tends to confuse the key pragmatic issue (does the implementation pass a value or a reference?) with two semantic issues: • Is the callee allowed to change the formal parameter? • If so, will the changes be reflected in the actual parameter?
Parameter Modes in Ada • Three parameter-passing modes: in, out and in out. • in parameters pass information from the caller to the callee: they can be read by the callee but not written. • out parameters pass information from the callee to the caller. • in out parameters pass information in both directions: they can be both read and written. • For scalar and access (pointer) parameter types all three modes are implemented by copying values: • in: call-by-value. • in out: call-by-value/result. • out: call-by-result. • Erroneous program: one that can tell the difference between value and address-based implementations of (nonscalar, nonpointer) in out parameters. • Euclid outlaws the creation of aliases to hide the distinction between reference and value/result.
References in C++ • C++ improves on C by introducing an explicit notion of a reference. • Reference parameters are specified by preceding their name with an ampersand in the header of the function. • References in C++ see their principal use as parameters. • Another important use is for function returns, especially for objects that do not support a copy operation (e.g., file buffer). • The object-oriented features of C++, and its operator overloading make reference returns particularly useful.
Closures as Parameters • A closure is a reference to a subroutine together with its referencing environment. • It may be passed as a parameter. • A closure needs to include both a code address and a referencing environment. • Subroutines are routinely passed as parameters (and returned as results) in functional languages. • Object closure: in object-oriented language a method is packaged with its environment within an explicit object. • C# delegates: provide type safety without the restrictions of inheritance.
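A brief Python sketch of a closure being created, returned, and passed as a parameter. The names `make_counter`, `step`, and `apply_n_times` are illustrative only.

```python
def make_counter(start):
    """Return a closure: the code of `step` plus the binding of `count`."""
    count = start

    def step(increment=1):
        nonlocal count                  # refers to make_counter's local variable
        count += increment
        return count

    return step


def apply_n_times(subroutine, n):
    """Receive a closure as a parameter and call it n times."""
    result = None
    for _ in range(n):
        result = subroutine()
    return result


counter_a = make_counter(10)
counter_b = make_counter(100)           # a separate referencing environment
last_a = apply_n_times(counter_a, 3)    # 13
last_b = counter_b()                    # 101
```

Each call to `make_counter` produces the same code address paired with a different referencing environment, which is why the two counters advance independently.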
Call-by-Name • A call-by-name parameter is re-evaluated in the caller’s referencing environment every time it is used. • The effect is as if the called routine had been textually expanded at the point of call, with the actual parameter (which may be a complicated expression) replacing every occurrence of the formal parameter. • Label parameters: Both Algol 60 and Algol 68 allow a label to be passed as a parameter. If a called routine performs a goto to such a label, control will usually need to escape the local context, unwinding the subroutine call stack. • Both call-by-name and label parameters lead to code that is difficult to understand.
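Call-by-name can be approximated in Python by passing a parameterless function (a thunk) that is re-evaluated on every use. The sketch below follows the classic summation idiom known as Jensen's device; `sum_by_name`, `env`, `set_i`, and `term` are hypothetical names, and the dictionary `env` merely stands in for the caller's referencing environment.

```python
def sum_by_name(set_index, low, high, term):
    """Sum `term` for index = low..high; `term` is re-evaluated on each use."""
    total = 0
    for i in range(low, high + 1):
        set_index(i)                    # assign to the caller's index variable
        total += term()                 # re-evaluate the actual parameter
    return total


env = {"i": 0}                          # stands in for the caller's environment
values = [3, 1, 4, 1, 5]
evaluations = []


def set_i(new_value):
    env["i"] = new_value


def term():
    evaluations.append(env["i"])        # record each re-evaluation
    return values[env["i"]] * 2


total = sum_by_name(set_i, 0, 4, term)  # 2 * (3 + 1 + 4 + 1 + 5) = 28
```

The expression `values[i] * 2` is evaluated five times, once per value of `i`, which is the behavior that makes call-by-name both expressive and hard to follow.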
Special-Purpose Parameters • Conformant arrays: a formal array parameter whose shape is finalized at run time. • Default (optional) parameters: ones that need not necessarily be provided by the caller. If one is missing, a preestablished default value is used instead. • Named parameters: instead of being positional, some languages allow parameters to be named (keyword parameters). Their order does not matter: put(item => 37, base => 8); • Variable numbers of arguments, e.g., printf/scanf in C: int printf(const char *format, ...)
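Default, named, and variable-length parameters can all be illustrated in Python. The `put` function below only loosely mirrors the Ada call shown above: its `width` parameter and its body are assumptions made for this sketch, as is the helper `total`.

```python
def put(item, base=10, width=0):
    """Format an integer; `base` and `width` have default values."""
    digits = "0123456789abcdef"
    if item == 0:
        text = "0"
    else:
        text, n = "", abs(item)
        while n > 0:
            text = digits[n % base] + text
            n //= base
        if item < 0:
            text = "-" + text
    return text.rjust(width)


def total(*values):
    """Accept a variable number of positional arguments."""
    return sum(values)


default_base = put(37)                  # base defaults to 10
named = put(base=8, item=37)            # keyword order does not matter
padded = put(37, width=5)               # skips `base` by naming `width`
variadic = total(1, 2, 3, 4)
```

Named parameters also make it possible to supply a later default parameter while omitting an earlier one, as the `padded` call does.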
Function Returns • The syntax varies greatly. • Early imperative languages: an assignment statement whose left-hand side is the name of the function. • More recent: an explicit return statement. • Some languages allow the result of the function to have a name in its own right: procedure A_max(ref A[1:*]: int) returns rtn : int • Many languages place restrictions on the types of objects that can be returned from a function; others are more permissive: • Standard Pascal: only a scalar or pointer value. • C: a composite type (a struct, though not an array). • ML, Python: a tuple of values. • Modula-3, Ada 95: a subroutine implemented as a closure.
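Returning a tuple of values, as mentioned for Python above, can be sketched as follows. The function name `min_max` is illustrative.

```python
def min_max(values):
    """Return two results at once as a tuple."""
    smallest = largest = values[0]
    for v in values[1:]:
        if v < smallest:
            smallest = v
        if v > largest:
            largest = v
    return smallest, largest


low, high = min_max([3, 1, 4, 1, 5])    # the tuple is unpacked at the call site
```

The explicit return statement hands back one tuple object; the caller's multiple assignment then splits it into separate names.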
Generic Subroutines and Modules • Goal: performing the same operation for a variety of different object types. • Provide an explicitly polymorphic generic facility that allows a collection of similar subroutines or modules (with different types) to be created from a single copy of the source code. • Generic modules (classes): very useful for creating containers, data abstractions that hold a collection of objects. • Generic subroutines (methods): needed in generic modules. • Generic parameters: • Java, C#: only types. • Ada, C++: more general, including ordinary values as well as types (including subroutines and classes).
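A generic container and a generic subroutine can be sketched with Python's `typing` module. Note the assumptions: Python's type parameters are only hints checked by external tools, not enforced at run time, so this illustrates the source-level idea rather than any of the compiled implementations discussed in this lecture. `BoundedQueue` and `first_of` are hypothetical names.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class BoundedQueue(Generic[T]):
    """A FIFO container written once and usable for any element type T."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._items: List[T] = []

    def enqueue(self, item: T) -> None:
        if len(self._items) >= self._capacity:
            raise OverflowError("queue is full")
        self._items.append(item)

    def dequeue(self) -> T:
        if not self._items:
            raise IndexError("queue is empty")
        return self._items.pop(0)


def first_of(queue: BoundedQueue[T]) -> T:
    """A generic subroutine: its result type follows the queue's element type."""
    return queue.dequeue()


int_queue: BoundedQueue[int] = BoundedQueue(2)
int_queue.enqueue(7)
int_queue.enqueue(8)
name_queue: BoundedQueue[str] = BoundedQueue(1)
name_queue.enqueue("ada")
first_int = first_of(int_queue)
first_name = first_of(name_queue)
```

One copy of the source serves both the integer queue and the string queue, which is the purpose of a generic module used as a container.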
Implementation Options • Ada, C++: • Purely static: all the work is done at compile time. • The compiler creates a separate copy of the code for every instance. • Java: • All instances of a given generic share the same code at run time. • If T is a generic type parameter in Java, then objects of class T are (automatically) treated as instances of Object. • C#: • Creates specialized implementations of a generic for different built-in or value types (like C++). • The generic code must be typesafe, independent of the arguments provided in any particular instantiation (like Java).
Generic Parameter Constraints • Because a generic is an abstraction, it is important that its interface provide all the information that must be known by a user of the abstraction. • Constraining generic parameters: the operations permitted on a generic parameter type must be explicitly declared. • Java, C#: require that a generic parameter support a particular set of methods. • C++, Modula-3: no explicit constraints; the compiler instead checks how the parameters are used.
Implicit Instantiation • Before the generic can be used, an instance of a generic class must be created (e.g., C++): queue<int, 50> *my_queue = new queue<int, 50>(); • Some languages require the same for subroutines (e.g., Ada): procedure int_sort is new sort(integer, int_array, "<"); … int_sort(my_array); • Other languages (C++, Java, C#) instantiate generic subroutines implicitly, treating them as a form of overloading.
Generics in C++, Java, and C# • C++: • Most ambitious. • Templates are intended for almost any programming task that requires substantially similar but not identical copies of an abstraction. • Java/C#: • Provide generics purely for the sake of polymorphism. • Java: • Design influenced by the desire for backward compatibility with existing versions of the language and the existing virtual machines and libraries. • C#: • Generics were planned from the very beginning, although they first shipped in version 2.0.
Summary • Subroutines allow the programmer to encapsulate code behind a narrow interface. • Subroutine call stack contains stack frames (activation records) for currently active subroutines. • There are several parameter-passing modes, all of which are implemented by passing values, references, or closures. • Generics allow a control abstraction to be parameterized (at compile time) in terms of the types of its parameters, rather than just their values.