Data Types and Data Structures

Ranga Rodrigo

Does the computer know about data types? No.

Data Types • Computer programs manipulate data of various types, such as: • numbers, both integral and floating point, • characters, based on the ASCII code, • boolean values, and • compound structures such as arrays and records.

In memory, however, all data is held as bit patterns which must be interpreted before the data can be processed. There is clearly a threat of insecurity here: if a bit pattern is interpreted wrongly, the program may crash or produce erroneous results.
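The point about interpretation can be made concrete with a small C++ sketch. The function names bits_of and float_from_bits are made up for this illustration, and the sketch assumes a 32-bit IEEE 754 float, which the language itself does not guarantee.

```cpp
#include <cstdint>
#include <cstring>

// Copy the raw bits of a float into an unsigned integer, without any
// numeric conversion. Assumption: float is 32 bits wide.
std::uint32_t bits_of(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Copy a bit pattern the other way, so that it is read as a float.
float float_from_bits(std::uint32_t bits) {
    float value = 0.0f;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

On an IEEE 754 machine, bits_of(1.0f) is 0x3F800000. Read as an integer, that same pattern is 1065353216, which is nothing like 1: the bits carry no record of which reading is the intended one.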

Type Errors • Type errors arise when an operation defined for one type of data is applied to another. • E.g., if you try to add an array to a string. • In general, these errors are detected at run-time if and when the run-time system tries to execute the erroneous operation.

Untyped Languages • E.g., Perl. • It is the programmer's responsibility to avoid run-time type errors. • Any variable can store data of any type, and it is up to the programmer to make sure that operations are only applied to data of the correct type. • Interpreted languages are often untyped.

Untyped Languages • Variables do not have a type. • Programmers have to keep track of what is stored where. • Errors may only be caught at run-time, when it may be too late. • Worse, data corruption may take place unnoticed.

Typed Languages • Typed languages try to use the compiler to detect type errors. • This is to ensure that programs will not crash at run-time. • This is widely seen as a crucial aspect of language security.

Typed Languages • Variables and similar entities have defined types. • Each type has a range of permissible operations defined for it. • The compiler can therefore ensure that operations are only applied to data of the correct sort.

What is a type-secure language? A type-secure language is one which would in no circumstances give rise to a run-time error related to types. It is hard to guarantee this property.

Languages
• Untyped: Perl
• Typed
  • Strongly typed: Pascal, Eiffel
  • Weakly typed: C

Weak and Strong Typing • The distinction relates to the extent to which the compiler will silently convert data of one type to another related type. • E.g., converting numeric types, or integers to addresses in C. • Weak typing has the potential to let through more errors than strong typing.

Type Conversion and Casting • Sometimes it is convenient or necessary to convert data at run time from one type to another. • A common example is given by calculations which need to mix integer and floating point data. • Here, some languages, including C++, Java and Eiffel, will carry out some data conversion automatically, e.g., changing an integer data item into the corresponding floating point value.
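A minimal C++ sketch of such a mixed calculation follows. The function names are invented for the illustration. The first function shows what happens when no conversion takes place, the second shows the automatic conversion from integer to floating point.

```cpp
// Both operands are int, so the division is carried out on integers
// and the fractional part is lost before the result becomes a double.
double integer_average(int total, int count) {
    return total / count;
}

// One operand is a double, so the compiler silently converts the
// other operand from int to double before dividing.
double mixed_average(int total, int count) {
    double exact_total = total;   // implicit int -> double conversion
    return exact_total / count;   // count is converted automatically
}
```

With a total of 7 and a count of 2, integer_average gives 3.0 whereas mixed_average gives 3.5.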

Conversions in Eiffel • Some conversions might involve a loss of information: for example, converting a floating point value like 3.14159 into an integer. • C++ allows such conversions, whereas Java and Eiffel compilers report an error in this case. • If the programmer wants the conversion to go ahead, an explicit function call can be inserted in Eiffel to specify exactly what conversion is required.

In C++ and Java • Casting can be used:
    int main() {
        float f = 3.14159f;
        int i = (int) f;   // explicit cast: the fractional part is discarded
    }

Dangers of Casting • There are dangers associated with the unrestricted use of casts. • A cast provides a means for a programmer to override the type checks implemented by the compiler. • Casting is a common source of programming errors. • For this reason, languages intended to be secure, like Eiffel, do not support casting.
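As a hedged illustration of how a cast lets a value change silently, the C++ sketch below uses two invented helper functions. Both compile without complaint, because the cast tells the compiler that the programmer accepts the consequences.

```cpp
// Floating point to integer: the fractional part is discarded
// (truncation toward zero), not rounded.
int to_int(double value) {
    return static_cast<int>(value);
}

// Signed to unsigned: a negative number has no unsigned counterpart,
// so it wraps around to a very large positive value.
unsigned int as_unsigned(int value) {
    return static_cast<unsigned int>(value);
}
```

Here to_int(-2.7) yields -2, and as_unsigned(-1) yields the largest value an unsigned int can hold. Neither result is flagged at compile time or at run time.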

Types • Value types: the actual data of interest (an integer or a boolean value, say) will be stored in the allocated memory. • Reference types: a memory address will be stored. The actual data will be placed at the memory location pointed to by the reference.

Reference Types • Reference types allow the same data to be referred to at different points in a program, and complex data structures to be constructed. • Reference types allow data structures to be passed as parameters efficiently: instead of copying the whole structure, a reference is passed.

Value Types • With value types, there is no danger of accidental corruption of data through sharing. • Value types avoid the overhead of having to de-reference an address before getting hold of the data to be processed.

Types in Java • All classes in Java define reference types: this means that if Person is a class, then following a variable declaration such as Person p; the variable p will only be able to hold references. • Java also defines a range of primitive value types (e.g., int, char and boolean) which allow simple data to be manipulated efficiently.

Primitive Types in Java • The values of the primitive types lie outside the Java class hierarchy. • So in one sense Java is not a pure object-oriented language. • To get round this, Java defines classes that correspond to the built-in types---Integer, Boolean etc.

Autoboxing and Unboxing • It can sometimes be rather clumsy and confusing to convert data between value types and the corresponding reference types. • To deal with this problem, later versions of Java have introduced autoboxing and unboxing: in effect, built-in conversions between the primitive types and the corresponding reference types.

Types in C++ • In C++, the distinction between value and reference types is orthogonal to the distinction between simple and class data. • In other words, the two concepts are quite independent of each other. • It is possible to have references to ints in C++, and equally for an instance of a class to be stored as a simple value.

Reference Types in C++ • In C++, reference types are defined explicitly:
    Person p;          // a value
    Person& pr = p;    // a reference, which must be bound when it is declared
    Person* pp = &p;   // a pointer, also requiring dereferencing
• This gives great flexibility in the way that memory is managed, but is also a common source of programming errors. • By contrast, in Java it is impossible to take the address of or obtain a reference to an int, and there is no equivalent of the pointer manipulations possible in C++.
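The practical difference between the three forms shows up most clearly in parameter passing. The sketch below assumes a hypothetical Person with a single name field; the three rename functions are likewise invented for the illustration.

```cpp
#include <string>

// Hypothetical class standing in for the Person used on the slide.
struct Person {
    std::string name;
};

// Takes a value: the function works on its own copy, so the
// caller's object is untouched.
void rename_copy(Person p, const std::string& new_name) {
    p.name = new_name;
}

// Takes a reference: the function works on the caller's object.
void rename_ref(Person& p, const std::string& new_name) {
    p.name = new_name;
}

// Takes a pointer: the caller passes an address, and the function
// must dereference it (here with ->) to reach the object.
void rename_ptr(Person* p, const std::string& new_name) {
    if (p != nullptr) {
        p->name = new_name;
    }
}
```

Calling rename_copy leaves the caller's Person unchanged, while rename_ref and rename_ptr both modify it. The pointer version additionally has to allow for a null address, which is one of the ways this flexibility turns into errors.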

Types in Eiffel • In Eiffel, every type is a class, including basic types like INTEGER. • There are no special primitive types as in Java, so in this sense Eiffel is more object-oriented than Java. • This makes the language very consistent and conceptually simple.

Types in Eiffel • However, to avoid the inefficiency involved if it were necessary to dereference addresses to calculate something like 3 + 4, Eiffel provides a mechanism of expanded types to enable data to be stored by value rather than by reference. • In a way this is the opposite of C++: • in C++, all data is stored by value by default, and operators are provided to define reference types; • in Eiffel, data is stored by reference by default, and a keyword (expanded) is provided enabling some data to be stored by value.

Expanded Types • If a class is defined as expanded, variables of that class hold data values, not references. • The classes defining the basic types are all defined to be expanded:
    expanded class INTEGER ...

Expanded Types • Expanded classes cannot be unexpanded. • So it is not possible to define a variable which holds a reference to an INTEGER, but for each expanded class there is a corresponding reference class defined in the Base Library, e.g., INTEGER_REF. • So Eiffel can provide a consistent type system, without a performance handicap on basic types. For practical purposes, the language works much as expected; it is rarely necessary to deal explicitly with expanded types.

Specifying EXPANDED Variables • It is also possible to specify that individual variables are expanded, in which case they will hold data values instead of references:
    x : expanded COUNTER
• In a case like this, the class must provide a creation procedure with no arguments, so that the variable can be correctly initialized.

User-Defined Types • Classic languages, like Pascal, defined a range of basic types and a number of user-defined types which enabled programmers to define more complex data structures based on the basic types. • User-defined types included sets, subtypes, enumerated types, records and arrays. • In OO languages, the class is the main vehicle for the definition of user-defined types, essentially replacing and extending record types.

Enumerations • This is a user-defined type consisting of a fixed number of values, normally given names and thought of as uninterpreted symbols. • They are commonly implemented by assigning a unique integer value to each symbol. • E.g., in C++ and Java 5.0:
    enum Colour {red, yellow, green};
• In C++, this defines Colour to be a value type. • In Java 5.0, this is a form of class definition: attributes and methods can be added, and enum types include the functionality inherited from Object.
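The integer implementation behind the symbols can be observed directly in C++. The helper functions code_of and colour_from below are invented for this sketch.

```cpp
enum Colour { red, yellow, green };

// An unscoped C++ enumerator converts implicitly to its integer value.
// By default the first enumerator is 0 and each later one is 1 higher.
int code_of(Colour c) {
    return c;
}

// Going the other way is not implicit: an explicit cast is needed.
// The caller is assumed to pass a code that matches an enumerator.
Colour colour_from(int code) {
    return static_cast<Colour>(code);
}
```

So code_of(red) is 0 and code_of(green) is 2. The asymmetry is deliberate: every Colour is a valid int, but not every int is a valid Colour.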

Enumerations in Eiffel • Eiffel does not support the declaration of enumeration types. • A set of constant attributes to act as an enumeration can be defined:
    red : INTEGER is unique ;
    yellow : INTEGER is unique ;
    green : INTEGER is unique ;
• Unique attributes are guaranteed to have values that are different from that of any other unique attribute defined in the same class. • They are typically used in inspect statements to discriminate the various possible cases.

Arrays • Since the beginning of programming, languages have included arrays to facilitate the handling of repeated data. • Arrays are characterized by two basic properties: • The type of data contained in the array. • The size of the array, specified either as a number of elements, or by giving array bounds, i.e., the lowest and highest permissible indices.

Arrays in C++ • In C/C++ arrays are treated in most contexts as pointers, i.e., the address of the area in memory holding the array. • There is no notion of an array type, though the component type of arrays is given. • Arrays are created by specifying the required length, but this length cannot then be checked at run-time: it is the programmer's responsibility to keep track of the end of an array, e.g., by null-terminating strings. • This is very insecure.
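The two usual ways of keeping track of the end of an array can be sketched as follows. The function names sum and string_length are chosen for the illustration; the second does by hand what the standard strlen does.

```cpp
#include <cstddef>

// The parameter is written as an array but is really a pointer to the
// first element: the length is not available inside the function, so
// the caller has to pass it separately and get it right.
int sum(const int values[], std::size_t length) {
    int total = 0;
    for (std::size_t i = 0; i < length; ++i) {
        total += values[i];
    }
    return total;
}

// A null-terminated string marks its own end with a sentinel
// character instead of carrying a length.
std::size_t string_length(const char* text) {
    std::size_t n = 0;
    while (text[n] != '\0') {
        ++n;
    }
    return n;
}
```

Nothing stops a caller from passing a length larger than the real array, or a character array with no terminator. In both cases the loop runs past the end of the data.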

Arrays in Pascal • In Pascal, an array type is defined by the component type and the bounds, from which the length can be deduced. • This was found to be very strict: for example, a sort routine written for arrays of one length won't work for arrays of another, even though the algorithm would work unchanged. • Pascal got round this problem by defining a looser array type for parameters, that specified only the component type of the array, and allowing the size of these arrays to be obtained at run-time.

Arrays in Java • In Java, arrays are quasi-objects, though there is no array class defined. • In particular, you can find out the length of an array at run-time by using something that looks very like a member of a class (e.g., a.length).

Arrays in General • Languages have converged on defining array types simply in terms of the component type, and letting the size of an array object be determined at run-time. • In fact, no extra type security is obtained by including array size in the type, as the compiler cannot check that array bounds will not be exceeded at run-time, so run-time errors cannot be eliminated.

Arrays in Eiffel • In Eiffel, arrays are defined by a class, like all types. • The syntax is the same as other classes, with the addition of special notation for array literals:

    x : ARRAY[INTEGER]
    create x.make(1, 10)
    x := << 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 >>   -- assign constant array
    x.put(42, 1)                                    -- put 42 at position 1 in x
    io.put_integer(x.item(1))                       -- prints 42
    io.put_integer(x @ 1)                           -- synonym for "item"

C++ Notation in Eiffel Arrays • In later versions of Eiffel, the same notation as C++ or Java can be used to store data in an array and to retrieve it:
    x[1] := 42
    io.put_integer(x[1])

Run-Time Violations of Bounds • Languages respond in different ways to a run-time violation of an array's bounds. • C and C++ perform no check, so the access simply goes ahead with unpredictable results; more secure languages such as Java and Eiffel will raise a run-time exception if an out-of-bounds array access is attempted.
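The exception-raising behaviour can be tried out in C++ as well, although not with built-in arrays. The sketch below relies on the standard library's std::vector, whose at() member checks the index and throws std::out_of_range, in contrast to the unchecked operator[]. The function name checked_access_fails is invented for the illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading index i through the checked accessor at()
// raises std::out_of_range, and false if the access succeeds.
bool checked_access_fails(const std::vector<int>& v, std::size_t i) {
    try {
        (void)v.at(i);   // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}
```

For a vector of three elements, index 2 succeeds and index 3 is reported as an error. Writing v[3] instead would compile and run without any check at all.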

Arrays and Design by Contract • Consider the following partial definition of a class intended to record details of a football team's performances during a season.

    class RESULTS
    create
        make
    feature
        points : ARRAY[INTEGER]
        played : INTEGER
        total : INTEGER

        make( games : INTEGER ) is
            do
                create points.make(1, games)
            end

        add_result( pts : INTEGER ) is
            do
                played := played + 1
                points.put( pts, played )
                total := total + pts
            end
    end

Invariants • The invariant for this class should state, among other things, that all the values in the array points should be equal to 0, 1 or 3, the possible points that a team can be awarded for each game. • Using conventional boolean expressions, the only way of specifying this would be to write something like the following, which is infeasible:
    (points[1] = 0 or points[1] = 1 or points[1] = 3) and (points[2] = 0 ...) and ...

    valid_results : BOOLEAN is
        local
            i : INTEGER
        do
            Result := true
            from
                i := 1
            until
                not Result or else i > points.upper
            loop
                Result := Result and (points[i] = 0 or points[i] = 1 or points[i] = 3)
                i := i + 1
            end
        end

    invariant
        valid_results

Infeasible

for_all and there_exists • A better approach would be to find a way to mimic the quantifiers of logic, or in other words to extend the boolean expressions used in assertions so that it is possible to say things like "every element of the array is ..." or "at least one element of the array is ...". • Eiffel provides this kind of facility by defining the features for_all and there_exists in the ARRAY class.

for_all and there_exists • Both for_all and there_exists apply a given Boolean-valued function to every element of an array. • for_all returns true if every element of the array satisfies the given function, and there_exists returns true if at least one does. • The way in which the function is supplied to for_all and there_exists varies between different versions of EiffelStudio.

With EiffelStudio 5.6 or Later
    valid_result( i : INTEGER ) : BOOLEAN is
        do
            Result := i = 0 or else i = 1 or else i = 3
        end

    invariant
        valid_results: points.for_all( agent valid_result(?) )
• Here the helper function valid_result tests a single value, and the loop code is provided by the for_all feature. • The keyword agent creates a 'function reference', and the '?' indicates which parameter should be replaced by each array element.

With EiffelStudio 5.7 or Later • With EiffelStudio 5.7 or later it is possible to use the agent keyword to define an anonymous function. • This means that the invariant can be written without defining a separate feature whose job is simply to check the value of an array element.
    invariant
        valid_results: points.for_all(
            (agent (i : INTEGER) : BOOLEAN
                do
                    Result := i = 0 or else i = 1 or else i = 3
                end ) )
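For comparison, the same quantifier idea can be sketched in C++ with the standard algorithms std::all_of and std::any_of. This is an analogy rather than a translation of the Eiffel class: C++ has no built-in invariant clause, and the function names valid_results and has_win are invented for the illustration.

```cpp
#include <algorithm>
#include <vector>

// Tests a single value, like the Eiffel helper valid_result.
bool valid_result(int pts) {
    return pts == 0 || pts == 1 || pts == 3;
}

// "For all": every recorded result is 0, 1 or 3. The named helper is
// passed as a function, much as with agent valid_result(?).
bool valid_results(const std::vector<int>& points) {
    return std::all_of(points.begin(), points.end(), valid_result);
}

// "There exists": at least one game earned 3 points. An anonymous
// function (a lambda) takes the place of a named helper.
bool has_win(const std::vector<int>& points) {
    return std::any_of(points.begin(), points.end(),
                       [](int pts) { return pts == 3; });
}
```

As in the Eiffel versions, the loop itself is supplied by the library, and the caller only states the condition that each element must satisfy.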
