Chapter 7: Data Types

Chapter 7: Data Types • 7.1 Type Systems • 7.2 Type Checking • 7.3 Records (Structures) and Variants (Unions) • 7.4 Arrays • 7.5 Sets • 7.6 Lists • 7.7 Pointers and Recursive Types

Data Types • Computers manipulate sequences of bits • But most programs manipulate more general data • Numbers • Strings • Lists • … • Programming languages provide data types that raise the level of abstraction from bits to data • But computer hardware only knows about bits!

Data Types: The Purpose of Types • Types provide implicit context • Compilers can infer more information, so programmers write less code • E.g. the expression a+b in Java may add two integers, add two floats, or concatenate two strings, depending on the context • Types provide a set of semantically valid operations • Compilers can detect semantic mistakes • Type checking makes sure that certain meaningless operations do not occur. It cannot prevent all meaningless operations, but it catches enough of them to be useful.

Type Systems • High-level languages have type systems • All objects and expressions have a type • E.g. int (*)(const void *, const void *) is the type of a pointer to a C or C++ function such as int compare(const void *, const void *) • A type system consists of • A mechanism for defining types and associating them with certain language constructs • A set of rules for type checking • Type equivalence • Type compatibility • Type inference
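The function-pointer type mentioned above is the one the C standard library's qsort expects for its comparison callback. A minimal sketch, assuming we only want to sort ints; compare_ints and sort_ints are names made up for this illustration:

```c
#include <stdlib.h>

/* Comparison callback. A pointer to it has the type
   int (*)(const void *, const void *). */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Hypothetical helper: sorts n ints in ascending order. */
void sort_ints(int *values, size_t n)
{
    /* The declared type of cmp is exactly the type from the slide. */
    int (*cmp)(const void *, const void *) = compare_ints;
    qsort(values, n, sizeof values[0], cmp);
}
```

The compiler checks the initialization of cmp against the declared pointer type, so a callback with a different signature would be reported at compile time.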

Type Systems: Type Checking • Type checking is the process of ensuring that a program obeys the language’s type compatibility rules • Strongly typed languages always detect type errors • Weakly typed languages do not • Strong typing means that the language prevents you from applying an operation to data on which it is not appropriate • All expressions and objects must have a type • All operations must be applied in appropriate type contexts

Type Systems: Static Type Checking • Static typing means that the compiler can do all the checking at compile time. • Common Lisp is strongly typed, but not statically typed. • Ada is statically typed. • Pascal is almost statically typed. • Java is strongly typed, with a non-trivial mix of things that can be checked statically and things that have to be checked dynamically.

Type Systems: Dynamic Type Checking • Dynamic type checking is performed at run time. It is a form of late binding. • It tends to be found in languages that delay other issues as well. • Lisp, Scheme, and Smalltalk are dynamically typed • Languages with dynamic scoping are generally dynamically typed.

What is a type? • Three points of view: • Denotational: a type is a set of values • Constructive: a type is either a built-in type or a composite type • Composite types are created using type constructors • E.g. In Java, boolean is a built-in type, while boolean[] is a composite type • Abstraction-based: a type is an interface that defines a set of consistent operations • These points of view complement each other

Classification of Types: Built-in Types • Built-in/Primitive/Elementary types • Mimic hardware units • E.g. boolean, character, integer, real (float) • Their terminology and implementation vary across languages • Characters are traditionally one-byte quantities using the ASCII character set • Early computers had different byte sizes • The 8-bit byte was standardized by Fred Brooks et al. with the IBM System/360 • Other character sets have also been used • Newer languages use the Unicode character set

Built-in Types: Numeric Types • Most languages support integers and floats • The range of values is implementation dependent • Some languages support other numeric types • Complex numbers (e.g. Fortran, Python) • Rational numbers (e.g. Scheme, Common Lisp) • Signed and unsigned integers (e.g. C, Modula-2) • Fixed point numbers (e.g. Ada) • Some languages distinguish numeric types depending on their precision • Single vs. double precision numbers • C’s int (typically 4 bytes) and long (4 or 8 bytes, depending on the platform)
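Because the ranges are implementation dependent, a C program can ask the compiler rather than assume them. A small sketch using only the standard headers; int_bits and long_bits are hypothetical helper names:

```c
#include <limits.h>
#include <stddef.h>

/* Storage width of int on this platform, in bits. */
size_t int_bits(void) { return sizeof(int) * CHAR_BIT; }

/* Storage width of long on this platform, in bits. */
size_t long_bits(void) { return sizeof(long) * CHAR_BIT; }
```

The C standard only guarantees minimums (at least 16 bits for int and 32 for long), so the values returned differ between, for example, 64-bit Linux and 64-bit Windows.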

Classification of Types: Enumerations • Enumerations improve program readability and error checking • They were first introduced in Pascal • E.g. type weekday = (sun, mon, tue, wed, thu, fri, sat); • They define an order, so they can be used in enumeration-controlled loops • The same feature is available in C • enum weekday {sun, mon, tue, wed, thu, fri, sat}; • Other languages use constants to define enumerations • Pascal’s approach is more complete: integers and enumerations are not compatible
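The contrast between Pascal and C can be seen in a short C sketch. It reuses the weekday enumeration from the slide; count_days and day_plus_one are hypothetical helpers added for illustration:

```c
enum weekday { sun, mon, tue, wed, thu, fri, sat };

/* Counts the days from first to last inclusive, using the order of the
   enumeration to drive the loop. */
int count_days(enum weekday first, enum weekday last)
{
    int n = 0;
    for (int d = first; d <= last; d++) {
        n++;
    }
    return n;
}

/* C treats enumeration constants as integers, so this arithmetic
   compiles. Pascal would reject mixing the two types. */
int day_plus_one(enum weekday d) { return d + 1; }
```

Note that day_plus_one(sat) yields 7, which is not a valid weekday; this is the kind of error Pascal's stricter rule is meant to catch.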

Classification of Types: Subranges • Subranges improve program readability and error checking • They were first introduced in Pascal • E.g. type test_score = 0..100; type workday = mon..fri; • They define an order, so they can be used in enumeration-controlled loops • The distinction between derived types and subranges (i.e., subtypes) is a feature of Ada • E.g. type test_score is new integer range 0..100; subtype workday is weekday range mon..fri;

Classification of Types: Composite Types • Composite/Constructed types are created by applying a constructor to one or more simpler types • Examples • Records • Variant Records • Arrays • Sets • Pointers • Lists • Files

Classification of Types: Orthogonality • Orthogonality is an important property in the design of type systems • A collection of features is orthogonal if there are no restrictions on the ways in which the features can be combined. • Pascal is more orthogonal than Fortran, because it allows arrays of anything, for instance. • Orthogonality is nice primarily because it makes a language easy to understand, easy to use, and easy to reason about.

Type Checking • Type equivalence • When are the types of two components the same? • Type compatibility • When can a value of type A be used in a context that expects type B? • Type inference • What is the type of an expression, given the types of the operands?

Type Checking: Type Equivalence • When are the types of two objects the same? • Why is this important? The compiler wants to determine whether an object can be used in a certain context. At a minimum, the object can be used if its type and the type expected are equivalent.

Type Equivalence • Type equivalence is defined in two principal ways • Structural equivalence • Name equivalence • Two types are structurally equivalent if they have identical type structures • They must have the same components • E.g. Fortran • Two types are nominally equivalent (name equivalent) if they have the same name • E.g. Java • Name equivalence is more fashionable these days

Type Equivalence: Structural Equivalence • Structural equivalence depends on a simple comparison of type descriptions. • How to compare two structures: substitute out all names and expand all the way down to built-in types. The original types are equivalent if the expanded type descriptions are the same. • Pointers complicate matters, but the Algol folks figured out how to handle them in the late 1960s. The simple (not quite correct) approach is to pretend all pointers are equivalent.

Type Equivalence: Structural Equivalence • Consider these two structures: typedef struct { int a, b; } foo1; typedef struct { int a, b; } foo2; • Every language that uses structural equivalence considers these two types equivalent

Type Equivalence: Structural Equivalence • Is the order of the fields in the declaration relevant? typedef struct { int b; int a; } foo2; typedef struct { int a; int b; } foo1; • Most languages that use structural equivalence still consider these two types equivalent

Type Equivalence: Structural Equivalence Pitfall • Are these two types structurally equivalent? typedef struct { char *name; char *address; int age; } student; typedef struct { char *name; char *address; int age; } school; • They are, and it is unlikely that the programmer intended to allow these two types to be used in the same context (although a compiler that uses structural equivalence will have no trouble doing so).

Type Equivalence: Name Equivalence • Name equivalence assumes type definitions with different names are different types • Otherwise, why did the programmer create two definitions? • It solves the previous problem • student and school are not nominally equivalent • Aliases: • Under strict name equivalence, aliases are not equivalent • type A = B is a definition (i.e. new object) • Under loose name equivalence, aliases are equivalent • type A = B is a declaration (i.e. binding)
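C happens to treat two separately declared structs as distinct types, so it can illustrate how name equivalence keeps student and school apart even though their layouts match. A minimal sketch, assuming a C11 compiler (for _Generic); TYPE_NAME and the three helper functions are invented for this illustration:

```c
#include <stddef.h>

typedef struct { char *name; char *address; int age; } student;
typedef struct { char *name; char *address; int age; } school;

/* _Generic selects on the static type of its argument, so it can only
   list both types if the compiler regards them as different. */
#define TYPE_NAME(x) _Generic((x), student: "student", school: "school")

const char *name_of_student_var(void) { student s = {0}; return TYPE_NAME(s); }
const char *name_of_school_var(void)  { school  c = {0}; return TYPE_NAME(c); }

/* Structurally the two records are identical: same size, same offsets. */
int same_layout(void)
{
    return sizeof(student) == sizeof(school)
        && offsetof(student, address) == offsetof(school, address)
        && offsetof(student, age) == offsetof(school, age);
}
```

Assigning a school value to a student variable in this program would be rejected by the compiler, which is the protection the slide describes.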

Type Equivalence: Name Equivalence and Aliases • Loose name equivalence may introduce undesired type equivalences
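C's typedef behaves like a loose alias, which makes it a convenient way to see the problem. In this sketch the celsius and fahrenheit aliases and both functions are hypothetical names chosen for illustration:

```c
typedef double celsius;
typedef double fahrenheit;

/* Converts a Celsius temperature to Fahrenheit. */
fahrenheit to_fahrenheit(celsius c) { return c * 9.0 / 5.0 + 32.0; }

/* Compiles without complaint even though the second call passes a value
   that is already in Fahrenheit: to the compiler both aliases are
   simply double. */
fahrenheit convert_twice(celsius c)
{
    fahrenheit f = to_fahrenheit(c);
    return to_fahrenheit(f);   /* unit error the type checker cannot see */
}
```

Under strict name equivalence the two aliases would be distinct types and the second call would be a compile-time error.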

7.2.2 Type Conversion • A value of one type can be used in a context that expects another type by means of a type conversion or type cast • Two flavors: • Converting type cast: the underlying bits are changed • E.g. In C, int i; float f = 3.4; i = (int) f; • Nonconverting type cast: the underlying bits are not altered • E.g. In C, int i; float f; int j = 4; i = j; i = *((int *) &f);
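The two flavors can be compared side by side. This is a sketch rather than a definitive recipe: it assumes float is 32 bits wide, and it uses memcpy for the nonconverting case because dereferencing a cast pointer, as in *((int *) &f), conflicts with C's aliasing rules on modern compilers. The function names are made up:

```c
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

/* Converting cast: 3.4 becomes the integer 3, so the bit pattern of the
   result differs from that of the operand. */
int convert(float f) { return (int)f; }

/* Nonconverting cast: the float's bits are copied unchanged and merely
   reinterpreted as an unsigned integer. */
uint32_t reinterpret(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

On a machine with IEEE 754 floats, reinterpret(1.0f) returns 0x3F800000, while convert(1.0f) returns 1.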

Notes Does Java use structural or name equivalence? Make an entry in your Java notebook for this. p 419

7.2.3 Type Compatibility • Most languages do not require type equivalence in every context • Instead, a value's type is required to be compatible with that of the context in which it appears. • Java example: result = tirandXmlClient.process(GET_COST, xmlString); Assume that process() returns an Account object. What is the type of result?

7.2.3 Type Compatibility • In Ada, two types T and S are compatible if any of the following conditions is true: • T and S are equivalent • T is a subtype of S • S is a subtype of T • T and S are arrays with the same number of elements and the same type of elements

Type Compatibility • Implicit type conversions due to type compatibility are known as type coercions • They may involve low-level conversions • Example: type weekday is (sun,mon,tue,wed,thur,fri,sat); subtype workday is weekday range mon..fri;

Type Compatibility • Type coercions make type systems weaker • Example:
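One way to see the weakening effect is in C, where coercions between numeric types happen silently. The sketch below is only an illustration with invented function names, not the example from the original slide:

```c
/* The int -1 is coerced to unsigned int before the comparison and
   becomes a very large value, so the result is 0 (false). */
int minus_one_less_than_one(void)
{
    int i = -1;
    unsigned int u = 1;
    return i < u;
}

/* The fractional part is silently discarded by the coercion to int. */
int truncated(double d)
{
    int n = d;
    return n;
}
```

Both functions are accepted by the type checker (many compilers emit at most a warning), yet neither does what a reader might expect from the mathematics.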

Type Compatibility • Generic container objects require a generic reference type • Example
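In C the generic reference type is void *. A minimal sketch of a container built on it, assuming a small fixed capacity; the stack type and its functions are hypothetical and stand in for whatever example the slide originally showed:

```c
#include <stddef.h>

#define STACK_CAPACITY 8

/* A tiny fixed-capacity stack of generic references. */
struct stack {
    void *items[STACK_CAPACITY];
    size_t count;
};

/* Returns 1 on success, 0 if the stack is full. */
int stack_push(struct stack *s, void *item)
{
    if (s->count == STACK_CAPACITY) return 0;
    s->items[s->count++] = item;
    return 1;
}

/* Returns the most recently pushed reference, or NULL if the stack is empty. */
void *stack_pop(struct stack *s)
{
    return s->count ? s->items[--s->count] : NULL;
}
```

Any pointer can be pushed without a cast, but the caller must cast the popped void * back to the right type, and the compiler cannot check that the cast is correct.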

Type Inference • What is the type of an expression, given the types of the operands? • Generally easy, but subranges and composites may complicate this issue • Example: result = 5 + 7.5 + sin(j) - 10.5E-12
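In C the inferred type of an arithmetic expression can be observed with C11's _Generic, which selects on the static type of its operand. A sketch assuming a C11 compiler; TYPE_NAME and the three functions are invented for illustration:

```c
/* Maps the static type of an expression to a printable name. */
#define TYPE_NAME(e) _Generic((e), \
    int: "int", long: "long", float: "float", double: "double", \
    default: "other")

/* int + double: the int operand is converted, the result is double. */
const char *type_of_int_plus_double(void) { return TYPE_NAME(5 + 7.5); }

/* int + float: the result is float. */
const char *type_of_int_plus_float(void) { return TYPE_NAME(5 + 7.5f); }

/* Character constants have type int in C, so their sum is int. */
const char *type_of_char_plus_char(void) { return TYPE_NAME('a' + 'b'); }
```

By the same rule, every step of the slide's expression involves a double operand, so in C the whole expression would have type double.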

An Example • Strict name equivalence: types are equivalent if they refer to the same declaration • Loose name equivalence: types are equivalent if they refer to the same outermost constructor, i.e. if they refer to the same declaration after factoring out any type aliases • Example:
    type alink = pointer to cell;
    subtype blink = alink;
    p, q : pointer to cell;
    r : alink;
    s : blink;
    u : alink;

An Example • Structural equivalence says all five variables have the same type. • Strict name equivalence equates the types of p and q, and of r and u, because each pair refers back to the same type declaration (a constructor on the right-hand side of a variable declaration is an implicit declaration of an anonymous type). • Loose name equivalence equates the types of p and q, and of r, s, and u, because in both cases we can trace back to the same "^" (pointer) constructor.

Records • Also known as ‘structs’ and ‘types’. • C example:
    struct resident {
        char initials[2];
        int ss_number;
        bool married;
    };
• fields – the components of a record, usually referred to using dot notation.

Nesting Records • Most languages allow records to be nested within each other. • Pascal example:
    type two_chars = array [1..2] of char;
    type married_resident = record
        initials: two_chars;
        ss_number: integer;
        incomes: record
            husband_income: integer;
            wife_income: integer;
        end;
    end;

Memory Layout of Records • Fields are stored adjacently in memory. • Memory is allocated for the fields in the order in which they are declared. • Fields are aligned for easy reference, which may leave unused holes between them. • [Figure: two layouts of the same record in rows of 4 bytes (32 bits), one optimized for space and one optimized for memory alignment]
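The effect of alignment can be measured in C with sizeof and offsetof. The sketch below reuses the resident record from the Records slide; the reordered variant and the helper functions are assumptions added for illustration, and the exact numbers depend on the compiler and platform:

```c
#include <stdbool.h>
#include <stddef.h>

/* The record from the Records slide, fields in declaration order. */
struct resident {
    char initials[2];
    int ss_number;
    bool married;
};

/* Same fields, reordered so the small fields can share one aligned word. */
struct resident_reordered {
    int ss_number;
    char initials[2];
    bool married;
};

/* Padding = total size minus the sum of the field sizes. */
size_t padding_declared(void)
{
    return sizeof(struct resident)
         - (sizeof(char[2]) + sizeof(int) + sizeof(bool));
}

size_t padding_reordered(void)
{
    return sizeof(struct resident_reordered)
         - (sizeof(char[2]) + sizeof(int) + sizeof(bool));
}

/* Where the compiler actually placed ss_number in the first layout. */
size_t ss_number_offset(void) { return offsetof(struct resident, ss_number); }
```

On a typical platform with 4-byte, 4-byte-aligned int, the first layout needs more padding than the second. C compilers keep fields in declaration order, so this kind of reordering is left to the programmer.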

Simplifying Deep Nesting • Modifying records with deep nesting can become bothersome.
    book[3].volume[7].issue[11].name := 'Title';
    book[3].volume[7].issue[11].cost := 199;
    book[3].volume[7].issue[11].in_print := TRUE;
• Fortunately, this problem can be simplified. • In Pascal, the keyword with "opens" a record.
    with book[3].volume[7].issue[11] do
    begin
        name := 'Title';
        cost := 199;
        in_print := TRUE;
    end;

Simplifying Deep Nesting • Modula-3 and C provide better methods for manipulation of deeply nested records. • Modula-3 assigns aliases to allow multiple openings
    WITH var1 = book[1].volume[6].issue[12],
         var2 = book[5].volume[2].issue[8] DO
        var1.name := var2.name;
        var2.cost := var1.cost;
    END;
• C allows pointers to records • What could you write in C to mimic the code above?
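One possible answer in C is to take the address of each nested record once and work through the pointers. The record definitions below are hypothetical (the slides never declare book, volume, or issue), and the array sizes are chosen only so that the indices used are valid:

```c
#include <string.h>

/* Hypothetical record types standing in for the slide's book/volume/issue. */
struct issue  { char name[32]; int cost; int in_print; };
struct volume { struct issue issue[13]; };
struct book   { struct volume volume[8]; };

/* Mimics the Modula-3 WITH statement: var1 and var2 are aliases for the
   two deeply nested records. The book array must have at least 6 elements. */
void copy_between_issues(struct book *book)
{
    struct issue *var1 = &book[1].volume[6].issue[12];
    struct issue *var2 = &book[5].volume[2].issue[8];

    strcpy(var1->name, var2->name);   /* var1.name := var2.name */
    var2->cost = var1->cost;          /* var2.cost := var1.cost */
}
```

Unlike Pascal's with, the pointer names stay visible in every access, so it is always clear which record a field belongs to.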

Variant Records • variant records – provide two or more alternative sets of fields. • discriminant – the field that determines which alternative is in use. • Useful when only one of the alternatives can be valid at a given time.

Variant Records – Pascal Example
    type resident = record
        initials: array [1..2] of char;
        id_number: integer;
        case married: boolean of
            true: (
                husband_income: integer;
                wife_income: integer
            );
            false: (
                income: real
            )
    end;
[Figure: memory layout of the record when married is TRUE and when it is FALSE]
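C has no variant records as such, but the same idea is commonly written as a struct that holds a discriminant next to a union. A rough C counterpart of the Pascal record, in which the member names couple, single, and u and the function total_income are assumptions made for this sketch:

```c
#include <stdbool.h>

struct resident {
    char initials[2];
    int id_number;
    bool married;   /* discriminant: selects the valid alternative */
    union {
        struct { int husband_income; int wife_income; } couple;  /* married  */
        struct { double income; } single;                        /* !married */
    } u;
};

/* Reads whichever alternative the discriminant says is valid. */
double total_income(const struct resident *r)
{
    if (r->married)
        return r->u.couple.husband_income + r->u.couple.wife_income;
    return r->u.single.income;
}
```

Nothing in C forces code to consult married before touching u, so keeping the discriminant and the union consistent is the programmer's job.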

Unions • A union is like a record • But the different fields take up the same space within memory
    union foo {
        int i;
        float f;
        char c[4];
    };
• The union’s size is only 4 bytes (assuming 4-byte int and float)!
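The claim about shared storage can be checked directly. A small sketch using the union from the slide; size_of_foo and members_overlap are hypothetical helpers:

```c
#include <stddef.h>

union foo {
    int i;
    float f;
    char c[4];
};

/* Size of the whole union: large enough for its largest member. */
size_t size_of_foo(void) { return sizeof(union foo); }

/* Every member starts at the same address as the union itself. */
int members_overlap(void)
{
    union foo u;
    return (void *)&u.i == (void *)&u.f
        && (void *)&u.f == (void *)u.c;
}
```

With 4-byte int and float, size_of_foo() returns 4, whereas a struct with the same three fields would need at least 12 bytes.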

Union example (from an assembler)
    union DisasmInst {
    #ifdef BIG_ENDIAN
        struct { unsigned char a, b, c, d; } chars;
    #else
        struct { unsigned char d, c, b, a; } chars;
    #endif
        int intv;
        unsigned unsv;
        struct { unsigned offset:16, rt:5, rs:5, op:6; } itype;
        struct { unsigned offset:26, op:6; } jtype;
        struct { unsigned function:6, sa:5, rd:5, rt:5, rs:5, op:6; } rtype;
    };

Another union example (C++)
    #include <cstdlib>
    #include <iostream>
    using namespace std;

    // Checks that the host's byte order matches the BIG_ENDIAN or
    // LITTLE_ENDIAN macro the program was compiled with; exits otherwise.
    void CheckEndian() {
        union {
            char charword[4];
            unsigned int intword;
        } check;

        check.charword[0] = 1;
        check.charword[1] = 2;
        check.charword[2] = 3;
        check.charword[3] = 4;

    #ifdef BIG_ENDIAN
        if (check.intword != 0x01020304) { /* big */
            cout << "ERROR: Host machine is not Big Endian.\nExiting.\n";
            exit(1);
        }
    #else
    #ifdef LITTLE_ENDIAN
        if (check.intword != 0x04030201) { /* little */
            cout << "ERROR: Host machine is not Little Endian.\nExiting.\n";
            exit(1);
        }
    #else
        cout << "ERROR: Host machine not defined as Big or Little Endian.\n";
        cout << "Exiting.\n";
        exit(1);
    #endif // LITTLE_ENDIAN
    #endif // BIG_ENDIAN
    }
