Lecture 6: Data Representation II

Reading: Sebesta 5.4-5.7, 7.3-7.4 (Supplementary Texts: Pratt 5.1.4, 6.1.4, 6.4; Tucker & Noonan 3.1, 4.3, 5.4.1, 5.5.1) (Resources on C and Pascal available on the internet)

Overview
• Types
• Motivation for typed languages
• Issues in Type Checking
• How to type check?
• How to cater for Polymorphism
• Type Equivalence
• When to type check?
• Strongly and weakly typed languages

1. Motivation for typed languages
• Untyped Languages: perform any operation on any data.
• Example: Assembly
    movi 5 r0     // Move integer 5 (2’s complement) to r0
    addf 3.6 r0   // Treat the bit representation in r0 as a
                  // floating-point representation and add 3.6 to it.
  Result? You can be sure that r0 does not contain 8.6!
• (+) Flexibility: “I can do anything I want to and you can’t stop me.”
• (–) Ease of error checking (programs are prone to errors, especially huge ones): “I am human, my brain is limited, I can’t remember and monitor everything.”

1. Motivation for typed languages
Typed Languages:
• A type represents a set of values. Programs / procedures / operators are functions from an input type to an output type.
• Type checking is the activity of ensuring that the operands / arguments of an operator / procedure are of compatible types, using a set of rules that associate a type with every expression in the language. (These rules are known as the type system.)
• A type error results when an operator is applied to an operand of an inappropriate / incompatible type.
• Output of a type system:
  • There are type errors (wrt the type system) => Program is NOT type-safe.
  • There are no type errors (wrt the type system) => Program is type-safe.

1. Motivation for typed languages

                                       Program really has errors    Program really does not have errors
TC says that there are type errors     Usually true                 Possible (TC errs on the conservative side)
TC says that there are no type errors  ?????                        ?????

When TC says that there are no type errors:
• The program MAY still have errors.
• It may still have type errors due to unsafe features of the language. This is due to bad type system design.
• It may have logic errors. This shows that type errors are but one of the many kinds of error you will encounter.

1. Motivation for typed languages
Typed Languages:
• (+) Error detection
• (+) Program documentation
• (–) Loss of flexibility (but it’s OK, I don’t lose much freedom anyway, since I don’t usually program in that way in the first place. I gain more than I lose.)

2. Issues in Type Checking
• How to type-check?
• How to cater for polymorphism?
• What is your definition of “compatible type”?
• When to perform type checking?
• Is your language strongly or weakly typed?

2.1 How to type-check?
Definition:
• Type statements are of the form:
    <expr> : <type>
  meaning that an expression <expr> ‘is-of-the-type’ (the ‘:’ symbol) <type>.
• Examples:
  • 3 : int
  • 3+4 : int
  • 3.14 : real
  • “abc” : String
  • while (x < 5) {x++;} : Stmt

2.1 How to type-check?
Definition:
• Type rules are of the form:

    e1 : t1   e2 : t2   …   en : tn
    -------------------------------- (rule name)
           f e1 e2 … en : t

  where each ei : ti is a type statement, n ≥ 0.
• The rule is interpreted as “IF e1 is of type t1 and … and en is of type tn THEN f e1 e2 … en is of type t.”

2.1 How to type-check?
Examples of type rules:
• Rule for constants:

    1 : int     2 : int     3 : int

• Rule for addition:

    E1 : int   E2 : int
    ------------------- (+)
       E1 + E2 : int

• Rule for boolean comparison:

    E1 : int   E2 : int
    ------------------- (==)
      E1 == E2 : bool

2.1 How to type-check?
Examples of type rules:
• Rule for assignment statement:

    x : T   E : T
    -------------- (:=)
    x := E; : Stmt

• Rule for if-statement:

    E1 : bool   S1 : Stmt   S2 : Stmt
    --------------------------------- (if)
      if (E1) {S1} else {S2} : Stmt

2.1 How to type-check?
• Rules of Type Checking
  • Type of value => known in advance.
  • Type of variable => known from the declaration.
  • Type of function => known from the types of its arguments (in the declaration) and the type of its result (also in the declaration).
  • Type of expression => inferred from its sub-expressions.

2.1 How to type-check?
• Given the program:
    int x;
    …
    x := x+1;
    …
• …and given the rules for constants, (+), (==), (:=) and (if) defined earlier.
• A program/expression is type-safe if we can construct a derivation tree to give a type for that program/expression:

                x : int   1 : int
                ----------------- (+)
    x : int        x+1 : int
    ------------------------ (:=)
        x:=x+1; : Stmt

2.1 How to type-check?
• Given the program:
    int x;
    float y;
    …
    if (x == 3) {
      y := x;
    } else {
      x := x+1;
    }
    …
• …and given the rules for constants, (+), (==), (:=) and (if) defined earlier.
• A program/expression is type-safe if we can construct a derivation tree to give a type for that program/expression. Follow the rules! Try to build the tree:

  Condition:
      x : int   3 : int
      ----------------- (==)
        x==3 : bool

  Then-branch (y is float but x is int, so rule (:=) does not apply):
      y : float   x : int
      ------------------- ???
         y:=x; : Stmt

  Else-branch:
                  x : int   1 : int
                  ----------------- (+)
      x : int        x+1 : int
      ------------------------ (:=)
          x:=x+1; : Stmt

  Whole statement (needs all three sub-derivations):
      x==3 : bool   y:=x; : Stmt   x:=x+1; : Stmt
      ------------------------------------------- (if)
        if (x==3) {y:=x;} else {x:=x+1;} : Stmt

• Cannot build the tree => Not type-safe.
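The "follow the rules, try to build a tree" procedure can be sketched as a small recursive checker. This is only an illustrative sketch, not part of the lecture: the tuple encoding of programs and the names `type_of`, `check_stmt` and `TypeCheckError` are assumptions made for the example. Each branch of the code corresponds to one of the rules for constants, (+), (==), (:=) and (if).

```python
# Hypothetical mini-checker for the slide rules; all names are illustrative.
class TypeCheckError(Exception):
    """Raised when no rule applies, i.e. no derivation tree can be built."""


def type_of(expr, env):
    """Return the type of an expression, or raise TypeCheckError."""
    if isinstance(expr, int):              # rule for constants: n : int
        return "int"
    if isinstance(expr, str):              # variable: type known from declaration
        return env[expr]
    op, e1, e2 = expr                      # binary expression (op, e1, e2)
    t1, t2 = type_of(e1, env), type_of(e2, env)
    if op == "+" and t1 == t2 == "int":    # rule (+)
        return "int"
    if op == "==" and t1 == t2 == "int":   # rule (==)
        return "bool"
    raise TypeCheckError(f"no rule for {t1} {op} {t2}")


def check_stmt(stmt, env):
    """Return 'Stmt' if the statement is well typed, else raise."""
    if stmt[0] == ":=":                    # rule (:=): x : T and E : T
        _, var, expr = stmt
        if env[var] != type_of(expr, env):
            raise TypeCheckError(f"cannot assign to {var}")
        return "Stmt"
    if stmt[0] == "if":                    # rule (if)
        _, cond, s1, s2 = stmt
        if type_of(cond, env) != "bool":
            raise TypeCheckError("condition is not bool")
        check_stmt(s1, env)
        check_stmt(s2, env)
        return "Stmt"
    raise TypeCheckError("unknown statement")


env = {"x": "int", "y": "float"}           # int x; float y;
ok = check_stmt((":=", "x", ("+", "x", 1)), env)

program = ("if", ("==", "x", 3),
           (":=", "y", "x"),               # y := x; has no matching rule
           (":=", "x", ("+", "x", 1)))
try:
    check_stmt(program, env)
    rejected = False
except TypeCheckError:
    rejected = True
```

Here `ok` is `"Stmt"` for `x := x+1;`, while the if-program is rejected because no rule gives a type to `y := x;`.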

2.2 How to cater for Polymorphism
• Polymorphism = poly (many) + morph (form)
• Polymorphism is the ability of a data object to take on or assume many different forms.
• Polymorphism can be categorized into 2 types:
  • Ad-hoc Polymorphism
  • Universal Polymorphism

2.2 How to cater for Polymorphism
Cardelli and Wegner’s classification (1985):

    Polymorphism
    • Ad-Hoc
      • Coercion
      • Overloading
    • Universal
      • Parametric
      • Inclusion

• Ad-hoc polymorphism is obtained when a function works, or appears to work, on several different types (which may not exhibit a common structure) and may behave in unrelated ways for each type.
• Universal polymorphism is obtained when a function works uniformly on a range of types; these types normally exhibit some common structure.

2.2 How to cater for Polymorphism
Cardelli and Wegner’s classification (1985):

    Polymorphism
    • Ad-Hoc (this lecture)
      • Coercion
      • Overloading
    • Universal (covered in FP & OO)
      • Parametric
      • Inclusion

2.2 Polymorphism – Coercion
COERCION
• A coercion is an operation that converts the type of an expression to another type. It is done automatically by the language compiler. (If the programmer manually forces a type conversion, it is called casting.)

       E : int
      --------- (Int-Float Coercion)
      E : float

• Example:
    int x;
    float y;
    ...
    y := x;
    ...

2.2 Polymorphism – Coercion
Example of the use of COERCION
• Given the program:
    int x;
    float y;
    …
    if (x == 3) {
      y := x;
    } else {
      x := x+1;
    }
    …
• Add in a new rule, alongside the rules for constants, (+), (==), (:=) and (if):

       E : int
      --------- (Int-Float Coercion)
      E : float

• The derivation tree can now be built:

  Condition:
      x : int   3 : int
      ----------------- (==)
        x==3 : bool

  Then-branch:
                   x : int
                  --------- (Coercion)
      y : float   x : float
      --------------------- (:=)
          y:=x; : Stmt

  Else-branch:
                  x : int   1 : int
                  ----------------- (+)
      x : int        x+1 : int
      ------------------------ (:=)
          x:=x+1; : Stmt

  Whole statement:
      x==3 : bool   y:=x; : Stmt   x:=x+1; : Stmt
      ------------------------------------------- (if)
        if (x==3) {y:=x;} else {x:=x+1;} : Stmt
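The effect of adding the Int-Float Coercion rule to the assignment check can be sketched in a few lines. This is a simplified illustration under assumed names (`coerces_to`, `check_assign` and the `allow_coercion` flag are not from the lecture): with the rule enabled, an `int` expression may also be given type `float`, so `y := x;` becomes well typed, while the reverse direction is still rejected.

```python
def coerces_to(source, target):
    """Int-Float Coercion: an expression of type int may also be typed float."""
    return source == target or (source == "int" and target == "float")


def check_assign(var_type, expr_type, allow_coercion):
    """Rule (:=): the variable and the expression must have the same type T.

    With coercion enabled, the expression may first be retyped by the
    Int-Float Coercion rule before rule (:=) is applied.
    """
    if allow_coercion:
        return coerces_to(expr_type, var_type)
    return var_type == expr_type


# float y; int x;  y := x;
without_rule = check_assign("float", "int", allow_coercion=False)
with_rule = check_assign("float", "int", allow_coercion=True)
# int x; float y;  x := y;  (no float-to-int rule was added)
reverse = check_assign("int", "float", allow_coercion=True)
```

`without_rule` is `False`, `with_rule` is `True`, and `reverse` stays `False` because only the widening rule was added.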

2.2 Polymorphism – Coercion

    Coercion
    • Widening:  int → float
    • Narrowing: float → int

  Theoretically speaking, int ⊆ float.

• Widening coercion converts a value to a type that can include (at least approximations of) all of the values of the original type. Widening is safe most of the time. It can be unsafe in certain cases.
• Narrowing coercion converts a value to a type that cannot store (even approximations of) all of the values of the original type. Narrowing is unsafe. Information is lost during the conversion.
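A small Python experiment illustrates both remarks. It is only a sketch: Python is used here for convenience, its integers are unbounded and its `float` is an IEEE 754 double, and it has no implicit narrowing, so the narrowing step is written as an explicit `int(...)` conversion (a cast rather than a coercion).

```python
# Widening: mixed int/float arithmetic coerces the int operand to float.
big = 2**53 + 1                 # not exactly representable as a double
widened = big + 0.0             # implicit int -> float coercion
precision_lost = int(widened) != big

# Narrowing: float -> int drops the fractional part.
narrowed = int(3.9)
```

`precision_lost` is `True`: the widened value is only an approximation of `big`, which is one of the cases where widening is unsafe. `narrowed` is `3`, so the fraction `.9` is lost.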

2.2 Polymorphism – Coercion
Coercions (+) Increase flexibility in programming
• Example:
    float x,y,z;
    int a,b,c;
• If I have no coercions, and I intend to add y and a and store the result in x, then writing…
    x = y + ((float) a);
  …is too much of a hassle.
• Therefore coercion is good.

2.2 Polymorphism – Coercion
Coercions (–) Decrease Reliability (error detection)
• Example:
    float x,y,z;
    int a,b,c;
• If I have coercions, and I intend to add x and y and store the result in z, but I accidentally write…
    z = x + a;
  …then my error will go undetected, because the compiler will simply coerce a to a float.
• Therefore coercion is bad.

2.2 Polymorphism – Coercion
Coercions:
• A lot of them: PL/I, Fortran, C, C++
• Fewer: Java (permits only widening)
• Very few: Ada

2.2 Polymorphism – Overloading
OVERLOADING
• An overloaded operation has different meanings, and different types, in different contexts.

    E1 : int   E2 : int
    ------------------- (+-int)
       E1 + E2 : int

    E1 : float   E2 : float
    ----------------------- (+-float)
        E1 + E2 : float

2.2 Polymorphism – Overloading
Example of the use of Overloading
• Given the program:
    int x,y,z;
    float a,b,c;
    …
    if (x == 3) {
      x := y + z;
    } else {
      a := b + c;
    }
    …
• Add in a new rule, alongside the rules for constants, (+), (==), (:=) and (if):

    E1 : float   E2 : float
    ----------------------- (+-float)
        E1 + E2 : float

• The derivation tree:

  Condition:
      x : int   3 : int
      ----------------- (==)
        x==3 : bool

  Then-branch:
                  y : int   z : int
                  ----------------- (+)
      x : int        y+z : int
      ------------------------ (:=)
          x:=y+z; : Stmt

  Else-branch:
                    b : float   c : float
                    --------------------- (+-float)
      a : float        b+c : float
      ---------------------------- (:=)
            a:=b+c; : Stmt

  Whole statement:
      x==3 : bool   x:=y+z; : Stmt   a:=b+c; : Stmt
      --------------------------------------------- (if)
        if (x==3) {x:=y+z;} else {a:=b+c;} : Stmt
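One way to picture how a checker handles an overloaded `+` is a lookup table from operand types to result type, with one entry per rule. This is an illustrative sketch with assumed names (`PLUS_RULES`, `type_of_plus`), not the lecture's implementation; no coercion rule is included, so mixed operands match no entry.

```python
# One table entry per typing rule for the overloaded operator "+".
PLUS_RULES = {
    ("int", "int"): "int",          # rule (+-int)
    ("float", "float"): "float",    # rule (+-float)
}


def type_of_plus(t1, t2):
    """Pick the rule for E1 + E2 from the operand types; None if no rule fits."""
    return PLUS_RULES.get((t1, t2))


int_sum = type_of_plus("int", "int")        # y + z with int y, z
float_sum = type_of_plus("float", "float")  # b + c with float b, c
mixed = type_of_plus("int", "float")        # no overloading rule applies
```

The context (the operand types) selects which meaning of `+` is used; `mixed` is `None` because neither rule matches without a coercion.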

2.2 Polymorphism – Overloading
Overloading (+) Increase flexibility in programming
• Examples are when the user wants to use one operator to express similar ideas.
• Example:
    int a,b,c;
    int p[10], q[10], r[10];
    int x[10][10], y[10][10], z[10][10];
    a = b * c;   // Integer multiplication
    p = a * q;   // Scalar multiplication
    x = y * z;   // Matrix multiplication
• Therefore overloading is good.

2.2 Polymorphism – Overloading
Overloading (–) Decrease Reliability (error detection)
• Examples are when the user intends to use the operator in one context, but accidentally uses it in another.
• Example:
  • In many languages, the minus sign is overloaded for both unary and binary use.
  • x = z - y and x = -y will both compile.
  • What if I intend to write the first, but accidentally leave out the ‘z’?

2.2 Polymorphism – Overloading
Overloading (–) Decrease Reliability (error detection)
• Even for common operations, overloading may not be good.
• Example:
    int sum, count;
    float average;
    ...
    average = sum / count;
  Since sum and count are integers, integer division is performed first, and only then is the result coerced to float.
• That’s why Pascal has div for integer division and / for floating-point division.
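The pitfall can be reproduced with concrete numbers. The sketch below uses Python 3, which, like Pascal, separates the two operations (`//` for integer division, `/` for floating-point division); the variable names and the values 7 and 2 are made up for illustration.

```python
total, count = 7, 2

# What the overloaded "/" does in the example above:
# integer division first, then the result is converted to float.
truncated = float(total // count)

# What the programmer actually wanted: floating-point division.
average = total / count
```

`truncated` is `3.0` while `average` is `3.5`; with a single overloaded `/` the first result is produced silently.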

2.2 Polymorphism – Overloading
Overloading
• Do you allow the user to perform overloading (flexibility)? Or are all overloaded functions predefined in the language (controlled reliability)?
• If you allow the user to perform overloading, can the user overload existing operators in the language? (E.g. C++ allows you to overload +, -, *, / to the extent that + can become * and * can become +!)
• Again, power and flexibility vs reliability (the dangers of misuse).
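The "+ can become *" danger is easy to demonstrate in any language with user-defined operator overloading. The sketch below uses Python's `__add__` hook rather than C++; the class name `Weird` is invented for the example.

```python
class Weird:
    """A number-like wrapper whose '+' has been overloaded to multiply."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        # Legal, but misleading: "+" now means multiplication.
        return Weird(self.value * other.value)


result = (Weird(3) + Weird(4)).value
```

`result` is `12`, not `7`. The language accepts this definition, so readability depends entirely on the programmer's discipline.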

2.2 Polymorphism – Summary
• Coercion and Overloading: use them, but don’t abuse them. Use them wisely and don’t overdo it. Just like fire: useful, yet dangerous if not managed carefully.

2.3 Type Equivalence

    type                              // type definitions
      Q = array [1..10] of integer;
      S = array [1..10] of integer;
      T = S;
    var                               // variable declarations
      a : Q;
      b : S;
      c : T;
      d : array [1..10] of integer;
    begin
      a := b;   // Is this allowed? That is, are a and b the same type?
      a := c;   // Is this allowed?
      a := d;   // Is this allowed?
      b := c;   // Is this allowed?
    end.

The same program with more descriptive type names:

    type                              // type definitions
      Queue = array [1..10] of integer;
      Stack = array [1..10] of integer;
      Tree = Stack;
    var                               // variable declarations
      a : Queue;
      b : Stack;
      c : Tree;
      d : array [1..10] of integer;
    begin
      a := b;   // Is this allowed? That is, are a and b the same type?
      a := c;   // Is this allowed?
      a := d;   // Is this allowed?
      b := c;   // Is this allowed?
    end.

• If you said “yes” to most of them, chances are that you are adopting structural equivalence.
• If you said “no” most of the time, then it is likely that you are adopting name equivalence.

2.3 Type Equivalence
Difference between named types and anonymous types.
• The type of a variable is described either through:
  • A type name: (1) a name defined using a type definition command (e.g. ‘type’ in Pascal, ‘typedef’ in C), or (2) one of the primitive numeric types (e.g. int, float).
  • Or directly through a type constructor (e.g. array-of, record-of, pointer-to). In this case, the variable has an anonymous type, i.e. a type without a type name.

2.3 Type Equivalence
Example:

    type                              // type definitions
      Q = array [1..10] of integer;
      S = array [1..10] of integer;
      T = S;
    var                               // variable declarations
      a : Q;
      b : S;
      c : T;
      d : array [1..10] of integer;
    begin
      a := b;   // Is this allowed? That is, are a and b the same type?
      a := c;   // Is this allowed?
      a := d;   // Is this allowed?
      b := c;   // Is this allowed?
    end.

• Q, S, T are type names.
• d has a type, but d does not have a type name.

2.3 Type Equivalence
• When are two types equivalent (≡)?
  Rule 1: For any type name T, T ≡ T.
  Rule 2: If C is a type constructor and T1 ≡ T2, then C T1 ≡ C T2.
  Rule 3: If it is declared that type name = T, then name ≡ T.
  Rule 4 (Symmetry): If T1 ≡ T2, then T2 ≡ T1.
  Rule 5 (Transitivity): If T1 ≡ T2 and T2 ≡ T3, then T1 ≡ T3.
• What rules do you want to use?

2.3 Type Equivalence
• Which of Rules 1 to 5 does each scheme use?
  • Structural Equivalence uses all the rules to check for type equivalence.
  • (Pure) Name Equivalence uses only the first rule. Unless two variables have the same type name, they are treated as having different types.
  • Declarative Equivalence leaves out the second rule.

2.3 Type Equivalence
Example:

    type                              // type definitions
      Q = array [1..10] of integer;
      S = array [1..10] of integer;
      T = S;
    var                               // variable declarations
      a,x : Q;
      b : S;
      c : T;
      d : array [1..10] of integer;
      e : array [1..10] of integer;
    begin
      a := x;   // Is this allowed? That is, are a and x the same type?
      a := b;   // Is this allowed?
      a := c;   // Is this allowed?
      a := d;   // Is this allowed?
      b := c;   // Is this allowed?
      d := e;   // Is this allowed?
    end.

Is the assignment allowed under Structural (SE), Name (NE) and Declarative (DE) Equivalence?

    Statement   SE    NE    DE
    a := x;     yes   yes   yes
    a := b;     yes   no    no
    a := c;     yes   no    no
    a := d;     yes   no    no
    b := c;     yes   no    yes
    d := e;     yes   no    no

R1: For any type name T, T ≡ T.
R2: If C is a type constructor and T1 ≡ T2, then C T1 ≡ C T2.
R3: If it is declared that type name = T, then name ≡ T.
R4 (Symmetry): If T1 ≡ T2, then T2 ≡ T1.
R5 (Transitivity): If T1 ≡ T2 and T2 ≡ T3, then T1 ≡ T3.
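The three schemes can be compared with a short Python sketch that recomputes the SE / NE / DE columns for this example. Everything here is an assumption made for illustration: the `Array` class, the `site` field (which tags each textual occurrence of `array [1..10] of integer` as a distinct anonymous type, since without Rule 2 two separate occurrences cannot be proved equivalent), and the function names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Array:
    lo: int
    hi: int
    elem: object   # element type: a type name (str) or another Array
    site: int      # hypothetical tag for the textual occurrence


# type Q = array..., S = array..., T = S
DECLS = {
    "Q": Array(1, 10, "integer", site=1),
    "S": Array(1, 10, "integer", site=2),
    "T": "S",
}
# var a,x : Q; b : S; c : T; d, e : anonymous array types
VARS = {
    "a": "Q", "x": "Q", "b": "S", "c": "T",
    "d": Array(1, 10, "integer", site=3),
    "e": Array(1, 10, "integer", site=4),
}


def resolve(t):
    """Follow type declarations (Rule 3) until a constructor or primitive."""
    while isinstance(t, str) and t in DECLS:
        t = DECLS[t]
    return t


def name_equiv(t1, t2):
    """Rule 1 only: both must be the very same type name."""
    return isinstance(t1, str) and isinstance(t2, str) and t1 == t2


def decl_equiv(t1, t2):
    """No Rule 2: constructors match only if they are the same occurrence."""
    r1, r2 = resolve(t1), resolve(t2)
    if isinstance(r1, Array) and isinstance(r2, Array):
        return r1.site == r2.site
    return r1 == r2


def struct_equiv(t1, t2):
    """All rules: compare the resolved structures component by component."""
    r1, r2 = resolve(t1), resolve(t2)
    if isinstance(r1, Array) and isinstance(r2, Array):
        return (r1.lo, r1.hi) == (r2.lo, r2.hi) and struct_equiv(r1.elem, r2.elem)
    return r1 == r2


def verdicts(lhs, rhs):
    """Return (SE, NE, DE) for the assignment lhs := rhs."""
    t1, t2 = VARS[lhs], VARS[rhs]
    return struct_equiv(t1, t2), name_equiv(t1, t2), decl_equiv(t1, t2)
```

For instance, `verdicts("b", "c")` gives `(True, False, True)`, matching the `b := c;` row: the declaration `T = S` links the two names under DE, but the names themselves differ under NE.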

2.3 Type Equivalence
Name Equivalence
• Easy to implement checking, since we need only compare the names.
• Very restrictive, inflexible. Example:
    type idxtype = 1..100;
    var count : integer;
        index : idxtype;

Structural Equivalence
• Harder to implement, since entire structures must be compared.
• Other issues to consider: e.g. arrays with the same size but different subscripts. Are they the same type? (Similarly for records and enumerations.)
• More flexible, yet the flexibility can be bad too. Example:
    type celsius = real;
         fahrenheit = real;
    var x : celsius;
        y : fahrenheit;
    ...
    x := y;   // Allowed?

2.3 Type Equivalence
• Different languages adopt different rules. And the rules may change for one language (people can change their minds too!)
• Pascal
  • Before 1982 – unknown.
  • ISO 1982 – Declarative Equivalence.
  • ISO 1990 – Structural Equivalence.
• C: Structural Equivalence, except for structs and unions, for which C uses Declarative Equivalence. If the two structs are in different files, then C goes back to Structural Equivalence.
• C++: Name Equivalence.
• Haskell/SML: Structural Equivalence.

2.4 When to perform Type Checking?
When is the variable bound to the type?
• Compile time (static type binding), e.g. SML, Pascal.
  When can I type check? In theory, you can choose to type check at compile time or at run time. In practice, languages try to do as much of it statically as possible.
• Run time (dynamic type binding), e.g. JavaScript, APL.
  No choice but to do dynamic type checking.

2.4 When to perform Type Checking?
• Static Type Checking – done at compile time.
  • (+) Done only once
  • (+) Earlier detection of errors
  • (–) Less program flexibility (fewer shortcuts and tricks)

2.4 When to perform Type Checking?
• Dynamic Type Checking – done at run time.
  • (–) Done many times
  • (–) Late detection of errors
  • (–) More memory needed, since we need to maintain type information for all the current values in their respective memory cells.
  • (–) Slows down overall execution, since extra code is inserted into the program to detect type errors.
  • (+) Program flexibility (allows you to ‘hack’ dirty code)
• Refer to Pratt 5.1.4 for a detailed discussion. Sebesta 5.5 is brief.
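Late detection can be seen directly in a dynamically checked language. In the Python sketch below (the function name `risky` is made up), the ill-typed expression is accepted when the function is defined and is reported only when the offending line actually executes.

```python
def risky(flag):
    if flag:
        return 1 + 2
    # Type error: str + int. It is detected only if this line is executed.
    return "abc" + 1


first = risky(True)          # runs fine; the bad branch is never reached

try:
    risky(False)             # now the run-time type check fires
    detected = False
except TypeError:
    detected = True
```

`first` is `3` and `detected` is `True`: a test run that never takes the second branch would not reveal the error, whereas a static checker would reject the program before it runs.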

2.4 When to perform Type Checking?
• Hybrid
  • Type check statically as much as possible. Whatever you can’t type check statically, check dynamically.
  • Are there such cases? Yes, when a language provides a construct that allows a memory location to store values of different types at different times during execution (next section).

2.5 Strong Type Systems
• A programming language is defined to be strongly typed if type errors are always detected STATICALLY.
• A language with a strong type system only allows type-safe programs to be successfully compiled into executables. (Otherwise, the language is said to have a weak type system.)
• Programs in a language with a strong type system are guaranteed to execute without type errors. (The only errors left to contend with are logic errors.)
