Chapter 6 Data Types
Chapter 6 Topics • Introduction • Primitive Data Types • Character String Types • User-Defined Ordinal Types • Array Types • Associative Arrays • Record Types • Union Types • Pointer and Reference Types
Data Type • A data type defines • a collection of data values and • a set of predefined operations on those values.
Data Types and Ease of Implementation in a Programming Language • Computer programs produce results by manipulating data. • An important factor in determining the ease with which programs can perform this task is • how well the data types available in the language being used match the objects in the real-world problem space. • e.g.: If a language provides an employee type, it is easier for a programmer to write an employee-management program. • It is therefore crucial that a language support an appropriate collection of data types and structures.
Data Type Evolution • The contemporary concepts of data types have evolved over the last 50 years. • In the earliest languages, all problem space data structures had to be modeled with only a few basic language-supported data structures. • For example, in pre-90 Fortrans, linked lists and binary trees are commonly modeled with arrays.
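The array-based modeling mentioned above can be sketched as follows. This is a minimal illustration written in C rather than Fortran; the names array_list, array_list_sum, and array_list_example, the LIST_END sentinel, and the capacity of 8 are all assumptions made for the example. Each node is a slot in two parallel arrays, and an integer index plays the role of a pointer.

```c
#define LIST_END (-1)   /* hypothetical sentinel index marking the end of the list */

/* Parallel arrays stand in for list nodes: node i holds value[i]
   and the index of its successor in next[i]. */
struct array_list {
    int value[8];
    int next[8];
    int head;           /* index of the first node, or LIST_END if empty */
};

/* Walk the list by following indices instead of pointers. */
int array_list_sum(const struct array_list *list)
{
    int sum = 0;
    for (int i = list->head; i != LIST_END; i = list->next[i]) {
        sum += list->value[i];
    }
    return sum;
}

/* Build the list 10 -> 20 -> 30, stored out of order in the arrays. */
struct array_list array_list_example(void)
{
    struct array_list list = {
        .value = { 30, 10, 20 },
        .next  = { LIST_END, 2, 0 },
        .head  = 1
    };
    return list;
}
```

The logical order of the list is determined by the next array, not by the physical order of the slots, which is what lets an array imitate a linked structure.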
Evolution of Data Type Design • One of the most important advances in the evolution of data type design, introduced in ALGOL 68, is to provide • a few basic types and • a few flexible structure-defining operators that allow a programmer to design a data structure (that is, a structured type) for each need.
Advantages of User-defined Types – (1) • User-defined types provide improved readability through the use of meaningful names for types. • User-defined types also aid modifiability: • A programmer can change the type of a category of variables in a program by changing only a type declaration statement.
Advantages of User-defined Types – (2) • They allow type checking of the variables of a special category of use, which would otherwise not be possible. • For example, zoo is a user-defined type: • We can check whether two variables are of the same zoo data type.
Abstract Data Type • Taking the concept of a user-defined type a step further, we arrive at abstract data types. • The fundamental idea of an abstract data type is that the interface of a type, which is visible to the user, is separated from the representation of values of that type and the implementation of the set of operations on values of that type, which are hidden from the user. • All of the types provided by a high-level programming language are abstract data types.
Scalar Types • In computing, a scalar variable is one that can hold only one value at a time, as opposed to structured variables such as arrays, lists, hashes, and records. • A scalar data type is the type of a scalar variable. • For example, char, int, float, and double are the most common scalar data types in the C programming language.
Structured Data Types • The two most common structured (nonscalar) data types are • arrays and • records.
Type Operators • A few data types are specified by type operators, or constructors, which are used to form type expressions. • For example, C uses brackets and asterisks as type operators to specify arrays and pointers.
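A short sketch of how brackets and asterisks combine into type expressions in C. The variable names (scores, cursor, table, row) and the helper scores_length are hypothetical and exist only for illustration.

```c
#include <stddef.h>

int  scores[10];    /* []  forms the type "array of 10 int"        */
int *cursor;        /* *   forms the type "pointer to int"         */
int *table[3];      /* array of 3 pointers to int                  */
int (*row)[3];      /* pointer to an array of 3 int                */

/* Number of elements in scores, recovered from its array type. */
size_t scores_length(void)
{
    return sizeof scores / sizeof scores[0];
}
```

The last two declarations use the same two operators but build different types; the parentheses decide which operator applies first.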
Variables and Descriptors • It is convenient, both logically and concretely, to think of variables in terms of descriptors. • A descriptor is the collection of the attributes of a variable.
Implementation of Descriptors • In an implementation, a descriptor is an area of memory that stores the attributes of a variable.
Implementation of Descriptors with Only Static Attributes • If the attributes are all static, descriptors are required only at compile time. • These descriptors • are built by the compiler, usually as a part of the symbol table and • are used during compilation.
Implementation of Descriptors with Dynamic Attributes • For dynamic attributes, however, part or all of the descriptor must be maintained during execution. • In this case, the descriptor is used by the run-time system. • In all cases, descriptors are used • for type checking and • for building the code for the allocation and deallocation operations.
Object • The word object is often associated with • the value of a variable and • the space it occupies. • In this book, the author reserves object exclusively for instances of user-defined abstract data types.
Primitive Data Types • Data types that are not defined in terms of other types are called primitive data types. • Nearly all programming languages provide a set of primitive data types. • Some of the primitive types are merely reflections of the hardware. • For example, most integer types. • Others require only a little non-hardware support for their implementation.
Primitive Data Types and Structured Types • The primitive data types of a language are used, along with one or more type constructors, to provide the structured types.
Numeric Type • Many early programming languages had only numeric primitive types. • Numeric types still play a central role among the collections of types supported by contemporary languages. • Integer • Floating-Point • Decimal • Boolean Types • Character Types • Complex
Integer • The most common primitive numeric data type is integer. • Many computers now support several sizes of integers. • These sizes of integers, and often a few others, are supported by some programming languages. • For example, Java includes four signed integer sizes: byte, short, int, and long.
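Java's byte, short, int, and long are 8, 16, 32, and 64 bits wide. As a rough C counterpart, the exact-width types from <stdint.h> can be used; the sketch below assumes a platform that provides all four (the C standard makes them optional, though mainstream platforms have them). The function names are hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Width in bits of each exact-width signed integer type. */
int bit_width_int8(void)  { return (int)(sizeof(int8_t)  * CHAR_BIT); }
int bit_width_int16(void) { return (int)(sizeof(int16_t) * CHAR_BIT); }
int bit_width_int32(void) { return (int)(sizeof(int32_t) * CHAR_BIT); }
int bit_width_int64(void) { return (int)(sizeof(int64_t) * CHAR_BIT); }
```

The matching limits are available as macros such as INT8_MIN and INT64_MAX.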
Integer Types and Hardware • A signed integer value is represented in a computer by a string of bits, with one of the bits (typically the leftmost) representing the sign. • Most integer types are supported directly by the hardware.
Representation of negative numbers in C [stackoverflow] • ISO C (C99), section 6.2.6.2/2, states that an implementation must choose one of three different representations for integral data types: • two's complement, • one's complement or • sign/magnitude • It's incredibly likely that the two's complement implementations far outweigh the others.
Integer Representation [stackoverflow] • In all those representations, positive numbers are identical. • To get the negative representation for a positive number, you: • invert all bits for one's complement. • invert all bits then add one for two's complement. • invert just the sign bit for sign/magnitude.
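The three negation rules above can be sketched on 8-bit patterns. This is an illustration only: the function names are hypothetical, and the code manipulates raw bit patterns in a uint8_t rather than relying on how the host machine represents signed integers.

```c
#include <stdint.h>

/* Each function takes the 8-bit pattern of a non-negative value (0..127)
   and returns the pattern that represents its negation. */

uint8_t negate_ones_complement(uint8_t bits)
{
    return (uint8_t)~bits;            /* invert all bits */
}

uint8_t negate_twos_complement(uint8_t bits)
{
    return (uint8_t)(~bits + 1u);     /* invert all bits, then add one */
}

uint8_t negate_sign_magnitude(uint8_t bits)
{
    return (uint8_t)(bits ^ 0x80u);   /* invert only the sign bit */
}
```

For the value 5 (0000 0101), these rules give 1111 1010, 1111 1011, and 1000 0101 respectively.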
Floating-point • Floating-point data types model real numbers, but the representations are only approximations for most real values. • For example, neither of the fundamental numbers π or ℯ (the base for the natural logarithms) can be correctly represented in floating-point notation. • Of course, neither of these numbers can be accurately represented in any finite space.
Problems of Floating-Point Numbers – (1) • On most computers, floating-point numbers are stored in binary; hence, they cannot accurately represent most real numbers. • For example: • in decimal: 0.1 • in binary: 0.0001100110011… (a repeating pattern; the digits after the point have place values 2^-1, 2^-2, 2^-3, 2^-4, …)
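The repeating binary expansion of 0.1 can be reproduced with exact integer arithmetic. The sketch below is a hypothetical helper, binary_fraction_digits, that treats the fraction as num/den (1/10 for 0.1) and generates one binary digit per doubling step.

```c
#include <stddef.h>

/* Write the first n binary digits after the point of num/den
   (with 0 <= num < den) into out, followed by a terminating null.
   out must have room for n + 1 characters. */
void binary_fraction_digits(unsigned num, unsigned den, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++) {
        num *= 2;                 /* shift the fraction left by one binary place */
        if (num >= den) {
            out[i] = '1';         /* this place value fits into the fraction */
            num -= den;
        } else {
            out[i] = '0';
        }
    }
    out[n] = '\0';
}
```

Calling binary_fraction_digits(1, 10, 13, buf) fills buf with 0001100110011. The remainder never reaches zero, so the pattern 0011 repeats forever and any finite binary format has to cut it off.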
Problems of Floating-Point Numbers – (2) • Accuracy can also be lost through arithmetic operations. • For example, when a is a small number and b and c are large numbers, (a/b)×c and (a×c)/b are mathematically equivalent, but (a/b)×c may lose accuracy. • For more information on the problems of floating-point notation, see any book on numerical analysis.
Internal Representation of Floating-Point Values • Most newer machines use the IEEE Floating-Point Standard 754 format to represent floating-point numbers. • Under this format, floating-point values are represented as fractions (mantissa or significand) and exponents. • Language implementers use whatever representation is supported by the hardware.
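An extreme case of this effect can be sketched in C, assuming double is the IEEE 754 double-precision format. The function names are hypothetical. With a = 1e-200 and b = c = 1e200, the quotient a/b is too small to be represented and underflows to zero, so the first ordering loses the result completely, while the second keeps it.

```c
/* volatile forces each intermediate result to be rounded to double,
   even on platforms that evaluate expressions in extended precision. */

double divide_then_multiply(double a, double b, double c)
{
    volatile double quotient = a / b;   /* may underflow to zero */
    return quotient * c;
}

double multiply_then_divide(double a, double b, double c)
{
    volatile double product = a * c;    /* stays within range in this example */
    return product / b;
}
```

The second ordering is not safer in general: when a is not small, the product a×c can overflow. Which ordering is better depends on the magnitudes involved.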
IEEE Floating-point Formats • [Figure: bit layouts of the single-precision and double-precision formats]
Real Value of Single-precision Floating-Point Format [wikipedia] • The real value = $(-1)^{\text{sign}} \times (1.b_{22}b_{21}\ldots b_{0})_2 \times 2^{e-127}$
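The formula can be checked by pulling the three fields out of a float and rebuilding its value. This is a sketch that assumes float is the IEEE 754 single-precision format and handles only normalized values (biased exponent 1 to 254); the names float_fields, split_float, two_to_the, and float_value are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct float_fields {
    uint32_t sign;       /* 1 bit                    */
    uint32_t exponent;   /* 8 bits, biased by 127    */
    uint32_t fraction;   /* 23 bits, b22 down to b0  */
};

/* Copy the bytes of x into an integer and slice out the fields. */
struct float_fields split_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    struct float_fields f = {
        .sign     = bits >> 31,
        .exponent = (bits >> 23) & 0xFFu,
        .fraction = bits & 0x7FFFFFu
    };
    return f;
}

/* 2 raised to the power k, by repeated doubling or halving. */
static double two_to_the(int k)
{
    double r = 1.0;
    for (; k > 0; k--) r *= 2.0;
    for (; k < 0; k++) r /= 2.0;
    return r;
}

/* Apply (-1)^sign * (1.fraction)_2 * 2^(e - 127) to normalized fields. */
double float_value(struct float_fields f)
{
    double significand = 1.0 + (double)f.fraction / 8388608.0;   /* 2^23 */
    double magnitude = significand * two_to_the((int)f.exponent - 127);
    return f.sign ? -magnitude : magnitude;
}
```

For -6.25f, which is -1.1001 (binary) × 2^2, split_float yields sign 1, exponent 129, and fraction 0x480000, and float_value rebuilds -6.25.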
Floating-point Type • Most languages include two floating-point types, often called float and double. • The float type is the standard size, usually being stored in four bytes of memory. • The double type is provided for situations where larger fractional parts are needed. • Double-precision variables usually • occupy twice as much storage as float variables and • provide at least twice the number of bits of fraction.
Precision and Range • The collection of values that can be represented by a floating-point type is defined in terms of precision and range. • Precision is the accuracy of the fractional part of a value, measured as the number of bits. • Range is a combination of the range of fractions, and, more importantly, the range of exponents.
Decimal • Most larger computers that are designed to support business systems applications have hardware support for decimal data types. • Decimal data types store a fixed number of decimal digits, with the decimal point at a fixed position in the value.
Internal Representation of a Decimal Value • Decimal types are stored very much like character strings, using binary codes for the decimal digits. • These representations are called binary coded decimal (BCD). • In some cases, they are stored one digit per byte, but in others they are packed two digits per byte. Either way, they take more storage than binary representations. • The operations on decimal values are done in hardware on machines that have such capabilities; otherwise, they are simulated in software.
BCD Example[wikipedia] • To BCD-encode a decimal number using the common encoding, each decimal digit is stored in a four-bit nibble. • Thus, the BCD encoding for the number 127 would be: 0001 0010 0111
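The packed encoding can be sketched as a small conversion routine. The function name to_packed_bcd is hypothetical, and the 32-bit result is an assumption that limits the input to eight decimal digits; higher digits are silently dropped.

```c
#include <stdint.h>

/* Pack a value into BCD, one decimal digit per four-bit nibble,
   least significant digit in the lowest nibble. */
uint32_t to_packed_bcd(uint32_t value)
{
    uint32_t bcd = 0;
    for (int shift = 0; value != 0 && shift < 32; shift += 4) {
        bcd |= (value % 10u) << shift;   /* place the next decimal digit */
        value /= 10u;
    }
    return bcd;
}
```

For 127 the result is 0x127, that is, the bit pattern 0001 0010 0111. Written in hexadecimal, a packed BCD value reads the same as the decimal number it encodes.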
Boolean Types • Their range of values has only two elements, one for true and one for false. • They were introduced in ALGOL 60 and have been included in most general-purpose languages designed since 1960. • One popular exception is C, in which numeric expressions can be used as conditionals. • In such expressions, all operands with nonzero values are considered true, and zero is considered false.
Internal Representation of a Boolean Type Value • A Boolean value could be represented by a single bit. • But because a single bit of memory is difficult to access efficiently on many machines, they are often stored in the smallest efficiently addressable cell of memory, typically a byte.
Character Types • Character data are stored in computers as numeric codes. • The most commonly used coding is ASCII (American Standard Code for Information Interchange), which uses the values 0 to 127 to code 128 different characters. • Many programming languages include a primitive type for character data.
More about ASCII Code[wikipedia] • ASCII is, strictly, a 7-bit code, meaning it uses the bit patterns representable with seven binary digits (a range of 0 to 127 decimal) to represent character information. • At the time ASCII was introduced, many computers dealt with 8-bit groups (bytes or, more specifically, octets) as the smallest unit of information; the eighth bit was commonly used as a parity bit for error checking on communication lines or other device-specific functions. • Machines which did not use parity typically set the eighth bit to zero, though some systems such as Prime machines running PRIMOS set the eighth bit of ASCII characters to one.
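One way the eighth bit could carry parity is sketched below. The source says only that a parity bit was used; the choice of even parity and the function name add_even_parity are assumptions for illustration.

```c
#include <stdint.h>

/* Return the 7-bit code with a parity bit placed in bit 7 so that
   the total number of 1 bits in the byte is even. */
uint8_t add_even_parity(uint8_t code)
{
    uint8_t ones = 0;
    for (int bit = 0; bit < 7; bit++) {
        ones += (code >> bit) & 1u;      /* count 1 bits in the 7-bit code */
    }
    return (uint8_t)((code & 0x7Fu) | ((ones & 1u) << 7));
}
```

The code 0x41 (the letter A, 100 0001) already has an even number of 1 bits and is sent unchanged, while 0x43 (the letter C, 100 0011) has three and becomes 0xC3.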
Unicode • Because of the globalization of business and the need for computers to communicate with other computers around the world, the ASCII character set is rapidly becoming inadequate. • Unicode, originally designed as a 16-bit character set, has been developed as an alternative. • Unicode includes the characters from most of the world's natural languages. • For example, Unicode includes the Cyrillic alphabet, as used in Serbia, and the Thai digits.
Character String Type • A character string type is one in which the values consist of sequences of characters. • Character strings also are an essential type for all programs that do character manipulation.
Design Issues • The two most important design issues that are specific to character string types are the following: • Should strings be simply • a special kind of character array or • a primitive type (with no array-style subscripting operations)? • Should strings have static or dynamic length?
String Operations • The common string operations are: • Assignment • Catenation • Substring reference • Comparison • Pattern matching.
Internal Representation of a String Type Value in C and C++ • If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced as such in the language. • This is the approach taken by C and C++.
String Operations in C and C++ • C and C++ use char arrays to store character strings. • C and C++ provide a collection of string operations through a standard library whose header file is string.h. • Most uses of strings and most of the library functions use the convention that character strings are terminated with a special character, null, which is represented with zero. • This is an alternative to maintaining the length of string variables. • The library operations simply carry out their operations until the null character appears in the string being operated on.
Character String Literals in C • The character string literals that are built by the compiler end with the null character. • For example, consider the following declaration: char *str = "apples"; • In this example, str is a char pointer set to point at the string of characters apples0, where 0 is the null character. • This initialization of str is legal because character string literals are represented by char pointers, rather than the string itself.
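The "carry on until the null character" convention can be sketched with a hand-written length function. The name scan_length is hypothetical; it mimics what a library routine such as strlen does, but it is not the library's actual implementation.

```c
#include <stddef.h>

/* Count characters up to, but not including, the terminating null. */
size_t scan_length(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0') {   /* the null character is the only end marker */
        n++;
    }
    return n;
}
```

Because no length is stored with the string, finding the length costs a full scan, and a missing terminator sends the loop past the end of the array.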
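The terminating null in a literal can be observed directly. In this sketch the pointer is declared const char * rather than char *, a deliberate deviation from the slide's declaration because string literals should not be modified; the helper names literal_storage and stored_char_after_text are hypothetical.

```c
#include <stddef.h>

const char *str = "apples";   /* points at the characters a p p l e s \0 */

/* Bytes the compiler reserves for the literal, including the null. */
size_t literal_storage(void)
{
    return sizeof "apples";
}

/* The character stored just after the six visible characters. */
char stored_char_after_text(void)
{
    return str[6];
}
```

The literal occupies seven bytes: six for the visible characters and one for the null character that the compiler appends.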
String Types in Java (1) • In Java, strings are supported as a primitive type by the String class, whose values are constant strings. • The String class represents character strings. • All string literals in Java programs, such as "abc", are implemented as instances of this class. • Strings are constant; their values cannot be changed after they are created[oracle].
String Types in Java (2) • Java also provides the StringBuffer class, whose values are changeable and are more like arrays of single characters. • Subscripting is allowed on StringBuffer variables.