Data Types and Data Structures

Presentation Transcript


  1. Ranga Rodrigo Data Types and Data Structures

  2. Does the computer know about data types? No.

  3. Data Types • Computer programs manipulate data of various types, such as: • numbers, both integral and floating point, • characters, based on the ASCII code, • boolean values, and • compound structures such as arrays and records.

  4. In memory, however, all data is held as bit patterns which must be interpreted before the data can be processed. There is clearly a threat of insecurity here: if a bit pattern is interpreted wrongly, the program may crash or produce erroneous results.
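
The point about interpretation can be seen directly in Java. The following is only a minimal sketch: the class and method names are made up for illustration, while Float.intBitsToFloat is a standard library call. The same 32-bit pattern gives quite different values depending on whether it is read as an integer or as a floating point number.

```java
final class BitPatternDemo {
    // Reinterpret the 32 bits of an int as an IEEE 754 single-precision float.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    // Reinterpret the low 16 bits of an int as a character code.
    static char asChar(int bits) {
        return (char) bits;
    }

    public static void main(String[] args) {
        int bits = 0x3F800000;               // one fixed bit pattern
        System.out.println(bits);            // read as an integer: 1065353216
        System.out.println(asFloat(bits));   // read as a float: 1.0
        System.out.println(asChar(0x41));    // the pattern 0x41 read as a character: A
    }
}
```

Nothing in the bit pattern itself says which reading is the right one; that knowledge lives in the program.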

  5. Type Errors • Type errors arise when an operation defined for one type of data is applied to another. • E.g., if you try to add an array to a string. • In general, these errors are detected at run-time if and when the run-time system tries to execute the erroneous operation.

  6. Untyped Languages • E.g., Perl. • It is the programmer's responsibility to avoid run-time type errors. • Any variable can store data of any type, and it is up to the programmer to make sure that operations are only applied to data of the correct type. • Interpreted languages are often untyped.

  7. Untyped Languages • Variables do not have a type. • Programmers have to keep track of what is stored where. • Errors may only be caught at run-time, when it may be too late. • Worse, data corruption may take place unnoticed.

  8. Typed Languages • Typed languages try to use the compiler to detect type errors. • This is to ensure that programs will not crash at run-time. • This is widely seen as a crucial aspect of language security.

  9. Typed Languages • Variables and similar entities have defined types. • Each type has a range of permissible operations defined for it. • The compiler can therefore ensure that operations are only applied to data of the correct sort.

  10. What is a type-secure language? A type-secure language is one which would in no circumstances give rise to a run-time error related to types. It is hard to guarantee this property.

  11. Languages • Untyped: Perl • Typed, strongly: Pascal, Eiffel • Typed, weakly: C

  12. Weak and Strong Typing • The distinction relates to the extent to which the compiler will silently convert data of one type to another related type. • E.g., converting numeric types, or integers to addresses in C. • Weak typing has the potential to let through more errors than strong typing.

  13. Type Conversion and Casting • Sometimes it is convenient or necessary to convert data at run time from one type to another. • A common example is given by calculations which need to mix integer and floating point data. • Here, some languages, including C++, Java and Eiffel, will carry out some data conversion automatically, e.g., changing an integer data item into the corresponding floating point value.

  14. Conversions in Eiffel • Some conversions might involve a loss of information: for example, converting a floating point value like 3.14159 into an integer. • C++ allows such conversions, whereas Java and Eiffel compilers report an error in this case. • If the programmer wants the conversion to go ahead, an explicit function call can be inserted in Eiffel to specify exactly what conversion is required.

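  15. In C++ and Java • Casting can be used. The statements below are valid inside a function in either language:
      float f = 3.14159f ;  // initialised before it is read
      int i ;
      i = (int) f ;         // explicit cast discards the fraction: i is 3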
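
The difference between automatic and explicit conversion can be sketched in Java as follows. The class and method names are illustrative only. Widening an int to a double needs no cast, while narrowing a double to an int must be requested with a cast and loses the fractional part.

```java
final class ConversionDemo {
    // Widening: the compiler converts the int to a double automatically.
    static double widen(int i) {
        return i;
    }

    // Narrowing: the cast is mandatory and truncates towards zero.
    static int narrow(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(widen(3) / 2);     // 1.5, because the division is done on doubles
        System.out.println(narrow(3.14159));  // 3
        System.out.println(narrow(-3.9));     // -3
        // int bad = 3.14159;                 // rejected by the Java compiler without a cast
    }
}
```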

  16. Dangers of Casting • There are dangers associated with the unrestricted use of casts. • It provides a means for a programmer to override the type checks implemented by the compiler. • Casting is a common source of programming errors. • For this reason, languages intended to be secure, like Eiffel, do not support casting.
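
A small Java sketch of how a cast overrides the compiler's check; all names here are hypothetical. The cast compiles for any Object, so a wrong assumption about the actual type only shows up when the program runs.

```java
final class CastDangerDemo {
    // The cast asks the compiler to trust the programmer; the check is deferred to run time.
    static int lengthOf(Object o) {
        String s = (String) o;
        return s.length();
    }

    // Reports whether calling lengthOf on the argument fails with a run-time type error.
    static boolean failsAtRunTime(Object o) {
        try {
            lengthOf(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthOf("abc"));                      // 3
        System.out.println(failsAtRunTime(Integer.valueOf(42)));  // true
    }
}
```

In Java the bad cast is at least caught by the run-time system; in C++ a comparable C-style cast between unrelated pointer types is not checked at all.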

  17. Types • Value types: the actual data of interest (an integer or a boolean value, say) will be stored in the allocated memory. • Reference types: a memory address will be stored. The actual data will be placed at the memory location pointed to by the reference.

  18. Reference Types • Reference types allow the same data to be referred to at different points in a program, and complex data structures to be constructed. • Reference types allow data structures to be passed as parameters efficiently: instead of copying the whole structure, a reference is passed.

  19. Value Types • With value types, there is no danger of accidental corruption of data through sharing. • Value types avoid the overhead of having to de-reference an address before getting hold of the data to be processed.
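
The contrast between sharing and copying can be sketched in Java, where int is a value type and arrays are accessed through references. The class and method names are illustrative only.

```java
final class SharingDemo {
    // Value type: assignment copies the data, so the original is unaffected.
    static int changeCopyOfValue() {
        int a = 1;
        int b = a;    // b holds its own copy of the value
        b = 99;
        return a;     // still 1
    }

    // Reference type: assignment copies the reference, so both names share one array.
    static int changeThroughSharedReference() {
        int[] a = {1};
        int[] b = a;  // b refers to the same array as a
        b[0] = 99;
        return a[0];  // now 99
    }

    public static void main(String[] args) {
        System.out.println(changeCopyOfValue());             // 1
        System.out.println(changeThroughSharedReference());  // 99
    }
}
```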

  20. Types in Java • All classes in Java define reference types: this means that if Person is a class, then following a variable declaration such as Person p; the variable p will only be able to hold references. • Java also defines a range of primitive value types (e.g., int, char, boolean), which allow simple data to be manipulated efficiently.

  21. Primitive Types in Java • The values of the primitive types lie outside the Java class hierarchy. • So in one sense Java is not a pure object-oriented language. • To get round this, Java defines classes that correspond to the built-in types---Integer, Boolean etc.

  22. Autoboxing and Unboxing • It can sometimes be rather clumsy and confusing to convert data between value types and the corresponding reference types. • To deal with this problem, later versions of Java (from Java 5.0) introduced autoboxing and unboxing: in effect, built-in conversions between the primitive types and the corresponding reference types.
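
A minimal Java sketch of these conversions, assuming Java 5.0 or later (the diamond syntax used in main needs Java 7). The class and method names are made up for illustration; List, ArrayList and Integer are standard library types.

```java
import java.util.ArrayList;
import java.util.List;

final class BoxingDemo {
    // Each Integer in the list is unboxed to an int as the loop reads it.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        values.add(1);                   // autoboxing: the int 1 becomes an Integer
        values.add(Integer.valueOf(2));  // the explicit conversion that autoboxing replaces
        Integer boxed = 3;               // autoboxing in an assignment
        int plain = boxed;               // unboxing in an assignment
        values.add(plain);
        System.out.println(sum(values)); // 6
    }
}
```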

  23. In C++, the distinction between value and reference types is orthogonal to the distinction between simple and class data. • In other words, the two concepts are quite independent of each other. • It is possible to have references to ints in C++, and equally for an instance of a class to be stored as a simple value.

  24. Reference Types in C++ • In C++, reference types are defined explicitly.
      Person p;         // a value
      Person& pr = p;   // a reference, which must be bound when it is declared
      Person* pp = &p;  // a pointer, also requiring dereferencing
      • This gives great flexibility in the way that memory is managed, but is also a common source of programming errors. • By contrast, in Java it is impossible to take the address of or obtain a reference to an int, and there is no equivalent of the pointer manipulations possible in C++.

  25. Types in Eiffel • In Eiffel, every type is a class, including basic types like INTEGER. • There are no special primitive types as in Java, so in this sense Eiffel is more object-oriented than Java. • This makes the language very consistent and conceptually simple.

  26. Types in Eiffel • However, to avoid the inefficiency involved if it was necessary to dereference addresses to calculate something like 3 + 4, Eiffel provides a mechanism of expanded types to enable data to be stored by value rather than by reference. • In a way this is the opposite of C++: • in C++, all data is stored by value by default, and operators are provided to define reference types; • in Eiffel, data is stored by reference by default, and an operator is defined enabling some data to be stored by value.

  27. Expanded Types • If a class is defined as expanded, variables of that class hold data values, not references. • The classes defining the basic types are all defined to be expanded: expanded class INTEGER ...

  28. Expanded Types • Expanded classes cannot be unexpanded. • So it is not possible to define a variable which holds a reference to an INTEGER, but for each expanded class there is a corresponding reference class defined in the Base Library, e.g., INTEGER_REF. • So Eiffel can provide a consistent type system, without a performance handicap on basic types. For practical purposes, the language works much as expected; it is rarely necessary to deal explicitly with expanded types.

  29. Specifying EXPANDED Variables • It is also possible to specify that individual variables are expanded, in which case they will hold data values instead of references: x : expanded COUNTER • In a case like this, the class must provide a creation procedure with no arguments, so that the variable can be correctly initialized.

  30. User-Defined Types • Classic languages, like Pascal, defined a range of basic types and a number of user-defined types which enabled programmers to define more complex data structures based on the basic types. • User-defined types included sets, subtypes, enumerated types, records and arrays. • In OO languages, the class is the main vehicle for the definition of user-defined types, essentially replacing and extending record types.

  31. Enumerations • This is a user-defined type consisting of a fixed number of values, normally given names and thought of as uninterpreted symbols. • They are commonly implemented by assigning a unique integer value to each symbol. • E.g., in C++ and Java 5.0: enum Colour {red, yellow, green}; • In C++, this defines Colour to be a value type. • In Java 5.0, this is a form of class definition: attributes and methods can be added, and enum types include the functionality inherited from Object.
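
To illustrate the remark that a Java enum is a form of class definition, here is a small sketch. The type TrafficColour, its seconds attribute and its next method are hypothetical, chosen only for illustration; ordinal, values and valueOf are provided by Java for every enum.

```java
enum TrafficColour {
    RED(30), YELLOW(5), GREEN(25);

    // A hypothetical attribute attached to each symbol.
    private final int seconds;

    TrafficColour(int seconds) {
        this.seconds = seconds;
    }

    int seconds() {
        return seconds;
    }

    // A method defined on the enum: the symbol that follows this one, wrapping round.
    TrafficColour next() {
        return values()[(ordinal() + 1) % values().length];
    }
}

final class EnumDemo {
    public static void main(String[] args) {
        System.out.println(TrafficColour.YELLOW.ordinal());  // 1, the underlying integer
        System.out.println(TrafficColour.GREEN.next());      // RED
        System.out.println(TrafficColour.RED.seconds());     // 30
    }
}
```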

  32. Enumerations in Eiffel • Eiffel does not support the declaration of enumeration types. A set of constant attributes to act as an enumeration can be defined: red : INTEGER is unique ; yellow : INTEGER is unique ; green : INTEGER is unique ; • Unique attributes are guaranteed to have values that are different from that of any other unique attribute defined in the same class. They are typically used in inspect statements to discriminate the various possible cases.

  33. Arrays • Since the beginning of programming, languages have included arrays to facilitate the handling of repeated data. • Arrays are characterized by two basic properties: • The type of data contained in the array. • The size of the array, specified either as a number of elements, or by giving array bounds, i.e., the lowest and highest permissible indices.

  34. Arrays in C++ • In C/C++, arrays are treated in most contexts as pointers, i.e., the address of the area in memory holding the array. • When an array is handled in this way, its length is not part of the type, though the component type is given. • Arrays are created by specifying the required length, but this length cannot then be checked at run-time: it is the programmer's responsibility to keep track of the end of an array, e.g., null-terminated strings. • This is very insecure.

  35. Arrays in Pascal • In Pascal, an array type is defined by the component type and the bounds, from which the length can be deduced. • This was found to be very strict: for example, a sort routine written for arrays of one length won't work for arrays of another length, even though the algorithm would work unchanged. • Pascal got round this problem by defining a looser array type for parameters, that specified only the component type of the array, and allowing the size of these arrays to be obtained at run-time.

  36. Arrays in Java • In Java, arrays are quasi-objects, though there is no array class defined. • In particular, you can find out the length of an array at run-time by reading its length, which looks very like a field of an object.
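
A short Java sketch of reading an array's size at run-time; the class and method names are illustrative only. Because the size is not part of the type int[], the same routine works for arrays of any length.

```java
final class ArrayLengthDemo {
    // Works for an int array of any size: the length is read from the array itself.
    static int sum(int[] values) {
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3}));  // 6
        System.out.println(sum(new int[10]));          // 0, since new int arrays are zero-filled
        System.out.println(new int[10].length);        // 10
    }
}
```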

  37. Arrays in General • Languages have converged on defining array types simply in terms of the component type, and letting the size of an array object be determined at run-time. • In fact, no extra type security is obtained by including array size in the type, as the compiler cannot check that array bounds will not be exceeded at run-time, so run-time errors cannot be eliminated.

  38. Arrays in Eiffel • In Eiffel, arrays are defined by a class, like all types. • The syntax is the same as other classes, with the addition of special notation for array literals:

  39. x : ARRAY[INTEGER]
      create x.make(1, 10)
      x := << 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 >>  -- assign constant array
      x.put(42, 1)                                   -- put 42 at position 1 in x
      io.put_integer(x.item(1))                      -- prints 42
      io.put_integer(x @ 1)                          -- synonym for "item"

  40. C++ Notation in Eiffel Arrays • In later versions of Eiffel, the same notation as C++ or Java can be used to store data in an array and to retrieve it: x[1] := 42 io.put_integer(x[1])

  41. Run-Time Violations of Bounds • Languages respond in different ways to a run-time violation of an array's bounds. • C and C++ perform no check, so an out-of-bounds access goes undetected and may corrupt other data; more secure languages such as Java and Eiffel will raise a run-time exception if an out-of-bounds array access is attempted.
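
The Java behaviour can be sketched as follows. The class and method names are made up for illustration; ArrayIndexOutOfBoundsException is the standard exception raised by the Java run-time system.

```java
final class BoundsDemo {
    // Tries to store into values[index] and reports whether the run-time system refused.
    static boolean isRejected(int[] values, int index) {
        try {
            values[index] = 42;
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] values = new int[3];                   // valid indices are 0, 1 and 2
        System.out.println(isRejected(values, 2));   // false
        System.out.println(isRejected(values, 3));   // true
        System.out.println(isRejected(values, -1));  // true
    }
}
```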

  42. Arrays and Design by Contract • Consider the following partial definition of a class intended to record details of a football team's performances during a season.

  43. class RESULTS
      create
          make
      feature
          points : ARRAY[INTEGER]
          played : INTEGER
          total : INTEGER
          make( games : INTEGER ) is
              do
                  create points.make(1, games)
              end
          add_result( pts : INTEGER ) is
              do
                  played := played + 1
                  points.put( pts, played )
                  total := total + pts
              end
      end

  44. Invariants • The invariant for this class should state, among other things, that all the values in the array points should be equal to 0, 1 or 3, the possible points that a team can be awarded for each game. • Using conventional boolean expressions, the only way of specifying this would be to write something like (points[1] = 0 or points[1] = 1 or points[1] = 3) and (points[2] = 0 ...) and ..., which is infeasible.

  45. valid_results : BOOLEAN is
          local
              i : INTEGER
          do
              Result := true
              from
                  i := 1
              until
                  not Result or else i > points.upper
              loop
                  Result := Result and (points[i] = 0 or points[i] = 1 or points[i] = 3)
                  i := i + 1
              end
          end
      invariant
          valid_results
      Infeasible

  46. for_all and there_exists • A better approach would be to find a way to mimic the quantifiers of logic, or in other words to extend the boolean expressions used in assertions so that it is possible to say things like "every element of the array is ..." or "at least one element of the array is ...". • Eiffel provides this kind of facility by defining the features for_all and there_exists in the ARRAY class.

  47. for_all and there_exists • Both for_all and there_exists apply a given Boolean-valued function to every element of an array. • for_all returns true if every element of the array satisfies the given function, and there_exists returns true if at least one does. • The way in which the function is supplied to for_all and there_exists varies between different versions of EiffelStudio.

  48. With EiffelStudio 5.6 or later, a helper function valid_result tests a single value, and the loop code is provided by the for_all feature. The keyword agent creates a 'function reference', and the '?' indicates which parameter should be replaced by each array element. • With EiffelStudio 5.7 or later it is possible to use the agent keyword to define an anonymous function. This means that the invariant can be written without defining a separate feature whose job is simply to check the value of an array element.

  49. With EiffelStudio 5.6 or Later
      valid_result( i : INTEGER) : BOOLEAN is
          do
              Result := i = 0 or else i = 1 or else i = 3
          end
      invariant
          valid_results: points.for_all( agent valid_result(?) )

  50. With EiffelStudio 5.7 or Later
      invariant
          valid_results: points.for_all( (agent (i : INTEGER) : BOOLEAN
              do
                  Result := i = 0 or else i = 1 or else i = 3
              end ) )
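
For comparison, the same "every element satisfies ..." check can be sketched in Java. This is only an analogue, not the Eiffel mechanism: Java has no built-in class invariants, so the check is called explicitly, and the class below is a hypothetical counterpart of RESULTS. Arrays.stream and allMatch are standard library calls (Java 8 or later), with the lambda playing the role of the anonymous agent.

```java
import java.util.Arrays;

final class Results {
    private final int[] points;  // zero-filled, so unplayed games count as 0
    private int played;
    private int total;

    Results(int games) {
        points = new int[games];
    }

    // Records a result, then checks the invariant by hand, as Java will not do it for us.
    void addResult(int pts) {
        points[played] = pts;
        played++;
        total += pts;
        if (!validResults()) {
            throw new IllegalStateException("invariant violated by result " + pts);
        }
    }

    // True if every element of points is 0, 1 or 3.
    boolean validResults() {
        return Arrays.stream(points).allMatch(p -> p == 0 || p == 1 || p == 3);
    }

    int total() {
        return total;
    }

    public static void main(String[] args) {
        Results season = new Results(3);
        season.addResult(3);
        season.addResult(1);
        System.out.println(season.total());         // 4
        System.out.println(season.validResults());  // true
    }
}
```

A matching anyMatch call would play the role of there_exists.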
