Data Types

Data Types • Introduction • Primitive Data Types • Character String Types • User Defined Ordinal Types • Array Types • Associative Arrays • Record Types • Union Types • Pointer and Reference Types

Introduction • Data Type: A collection of data values and a set of predefined operations on those values. • Programs produce results by manipulating data. • In versions of Fortran before Fortran 90, linked lists and binary trees were implemented with arrays. • COBOL introduced decimal data values and a structured data type for records. • PL/I extended accuracy specification to integer and floating-point types, a capability later adopted by Ada and Fortran.

Introduction… • ALGOL 68 provided a few basic types and a few flexible structure-defining operators that allow a programmer to design a data structure for each need. • User-defined types provide improved readability through the use of meaningful names for types. • User-defined types also provide modifiability: a programmer can change the type of a category of variables in a program by changing only a type declaration statement. • Abstract data types emerged from user-defined types. • Abstract data types: the interface of a type, which is visible to the user, is separated from the representation and the set of operations on values of that type, which are hidden.

Introduction… • The two most common structured data types are arrays and records. • Descriptor: The collection of the attributes of a variable, stored in an area of memory. • When the attributes are all static, descriptors are built by the compiler, usually as part of the symbol table, and are used only during compilation. • Object: An instance of a user-defined type. • Design Issue: What operations are provided for variables of the type and how are they specified?

Primitive Data Types • Data types that are not defined in terms of other types are called primitive data types. • Primitive data types are used along with one or more type constructors ( brackets and asterisks) to provide the structured types. • Numeric Types • Integer • Floating-Point • Complex • Decimal • Boolean Types • Character Types

Primitive Data Types… Integer • Integer: Many computers support several sizes of integers. • Example • Java includes four signed integer types: byte, short, int and long. • C++ and C# include unsigned integer types (integer values without signs), which are often used for binary data. • A signed integer value is represented in a computer by a string of bits, with one of the bits (the leftmost) representing the sign. • A negative integer could be stored in sign-magnitude notation, in which the sign bit is set to indicate negative and the remainder of the bit string represents the absolute value of the number.

Primitive Data Types… Integer … • Most computers now use a notation called twos complement to store negative integers. In twos complement notation, the representation of a negative number is formed by taking the logical complement of the positive version of the number and adding one. • Ones complement notation is still used by some computers. In ones complement, the negative of an integer is stored as the logical complement of its absolute value. • The disadvantage of ones complement is that it has two representations of zero.
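
The complement-and-add-one rule above can be checked with a small Python sketch. The helper names `twos_complement` and `negate_by_rule` and the 8-bit width are assumptions made for illustration; they are not part of the original slides.

```python
def twos_complement(value: int, bits: int = 8) -> str:
    """Return the bit string that stores `value` in `bits`-bit twos complement."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    # Masking to `bits` bits gives the same pattern as complement-and-add-one.
    return format(value & ((1 << bits) - 1), f"0{bits}b")


def negate_by_rule(bit_string: str) -> str:
    """Apply the rule from the text: complement every bit, then add one."""
    bits = len(bit_string)
    mask = (1 << bits) - 1
    complemented = int(bit_string, 2) ^ mask
    return format((complemented + 1) & mask, f"0{bits}b")


print(twos_complement(5))           # 00000101
print(twos_complement(-5))          # 11111011
print(negate_by_rule("00000101"))   # 11111011
```

Both routes produce the same pattern for -5, and zero has a single representation, which is the advantage over ones complement.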

Primitive Data Types… Floating-Point • Floating-point data types model real numbers, but the representations are only approximations for most real values. • Ex:- Neither of the fundamental numbers π or e can be correctly represented in floating-point notation. • Another problem with floating-point types is the loss of accuracy through arithmetic operations. • Floating-point values are represented as fractions and exponents, a form that is borrowed from scientific notation.
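
A short Python sketch of both problems mentioned above: approximate representation and loss of accuracy through arithmetic. It assumes Python's `float`, which is an IEEE 754 double on the common platforms; the variable names are illustrative.

```python
import math

# 0.1 and 0.2 have no exact binary representation, so the sum is only close to 0.3.
total = 0.1 + 0.2
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True

# Loss of accuracy: the 1.0 is smaller than the spacing between doubles near 1e16.
big = 1e16
print((big + 1.0) - big)          # 0.0
```

Comparing with a tolerance, as `math.isclose` does, is the usual way to work around the first problem.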

Primitive Data Types… Floating-Point… • Older computers used a variety of different representations for floating-point values. • Most newer machines use the IEEE Floating-Point Standard 754 format. • Most languages include two floating-point types, often called float and double. • The float type is the standard size, usually stored in four bytes of memory.

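Primitive Data Types… Floating-Point… • The double type is provided for situations where larger fractional parts or a larger range of exponents are needed. • [Figure: IEEE 754 formats. Single precision: 1 sign bit, 8-bit exponent, 23-bit fraction. Double precision: 1 sign bit, 11-bit exponent, 52-bit fraction.]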
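
The field widths of the two IEEE 754 formats (8 and 23 bits for single precision, 11 and 52 bits for double precision, plus one sign bit each) can be inspected with Python's standard `struct` module. This is a minimal sketch; `float_fields` and `double_fields` are hypothetical helper names.

```python
import struct


def float_fields(x: float):
    """Split a single-precision value into (sign, exponent, fraction) fields."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    return raw >> 31, (raw >> 23) & 0xFF, raw & 0x7FFFFF


def double_fields(x: float):
    """Split a double-precision value into (sign, exponent, fraction) fields."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    return raw >> 63, (raw >> 52) & 0x7FF, raw & ((1 << 52) - 1)


print(float_fields(-0.75))   # (1, 126, 4194304)
print(double_fields(1.0))    # (0, 1023, 0)
```

The exponent fields are stored with a bias (127 and 1023), which is why 1.0 shows an exponent of 1023 rather than 0.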

Primitive Data Types… Complex • Some programming languages support a complex data type. • Ex:- Python. Complex values are represented as ordered pairs of floating-point values. The imaginary part of a complex literal is specified by following it with a j or J. • Languages that support a complex type include operations for arithmetic on complex values.
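
A brief Python illustration of the complex type described above. The values are arbitrary examples.

```python
z = 3 + 4j                 # imaginary part marked with j
w = complex(1.0, -2.0)     # the same type, built from an ordered pair

print(z.real, z.imag)      # 3.0 4.0
print(z + w)               # (4+2j)
print(z * w)               # (11-2j)
print(abs(z))              # 5.0
```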

Primitive Data Types… Decimal • Decimal data types store a fixed number of decimal digits, with the decimal point at a fixed position in the value. • These are the primary data types for business data processing and are therefore essential to COBOL. • C# also has a decimal data type. • The disadvantages of decimal types are that the range of values is restricted because no exponents are allowed, and that their representation in memory is somewhat wasteful. • Decimal types are stored very much like character strings, using binary codes for the decimal digits. These representations are called binary coded decimal (BCD). • In some cases they are stored one digit per byte; in others they are packed two digits per byte. Either way, they take more storage than binary representations.
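
The packed form, two decimal digits per byte, can be sketched as follows. `pack_bcd` and `unpack_bcd` are hypothetical helpers written for this note, not a standard library API, and real BCD formats usually also encode a sign.

```python
def pack_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits two per byte (packed BCD, unsigned)."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected decimal digits only")
    if len(digits) % 2:
        digits = "0" + digits          # pad to a whole number of bytes
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])
        for i in range(0, len(digits), 2)
    )


def unpack_bcd(packed: bytes) -> str:
    """Recover the digit string from packed BCD bytes."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in packed)


packed = pack_bcd("1234")
print(packed.hex())         # 1234
print(unpack_bcd(packed))   # 1234
```

Each 4-bit half of a byte holds one digit, so only 10 of its 16 possible patterns are ever used, which is where the extra storage goes.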

Primitive Data Types… Boolean • Boolean types are the simplest of all types. • The range of values has only two elements: one for true and one for false. • They were introduced in ALGOL 60 and have been included in most general-purpose languages designed since 1960. • Boolean types are used to represent switches or flags. • Although integers can be used for these purposes, the use of Boolean types is more readable. • A Boolean value could be represented by a single bit, but because a single bit of memory cannot be accessed efficiently, Boolean values are stored in the smallest efficiently addressable cell of memory, typically a byte.

Primitive Data Types… Character • Character data are stored in computers as numeric codings. • Traditionally, the most commonly used coding was ASCII (American Standard Code for Information Interchange), a 7-bit code usually stored in an eight-bit byte, which uses the values 0 to 127 to code 128 different characters. • ISO 8859-1 is an eight-bit character code that allows 256 different characters (Ada 95 uses ISO 8859-1). • ASCII has become inadequate because of the globalization of business and the need for computers to communicate with other computers around the world.

Primitive Data Types… Character… • A 16-bit character set named Unicode was developed as an alternative (the standard has since grown beyond 16 bits). Unicode includes the characters from most of the world's natural languages. • The first 128 characters of Unicode are identical to those of ASCII. • Java was the first widely used language to use the Unicode character set. Later, JavaScript, Python, Perl and C# also adopted Unicode. • To provide the means of processing codings of single characters, most programming languages include a primitive type for them.

Character String Types • A Character string type is one in which the values consist of sequences of characters. • The input and output of all kinds of data are often done in terms of strings. • Design Issues: The two most important design issues are • Should strings be simply a special kind of character array or a primitive type? • Should strings have static or dynamic length? • Strings and Their Operations: The common string operations are assignment, concatenation, substring reference, comparison and pattern matching

Character String Types… • A substring reference is a reference to a substring of a given string. In the context of arrays, substring references are called slices. • Assignment and comparison operations on character strings are complicated by the possibility of assigning and comparing operands of different lengths. • If strings are not defined as a primitive type, string data is usually stored in arrays of single characters and referenced individually. This approach is followed in C and C++. • C and C++ use char arrays to store character strings. They provide a collection of string operations through a standard library whose header file is string.h. • Character strings are terminated with a special character, null, which is represented with zero. • Ex: char str[] = "apples"; Here str is an array of seven char elements: the six letters followed by the null character.

Character String Types… • The most commonly used library functions in C and C++ are • strcpy:- copies one string to another • strcat:- concatenates one string onto another • strcmp:- lexicographically compares two given strings • strlen:- returns the number of characters in the given string, not counting the null • The parameters and return values for most of the string manipulation functions are char pointers that point to arrays of char. • C++ programmers should use the string class from the standard library, rather than char arrays and the C string library.

Character String Types… • Fortran 95 treats strings as a primitive type and provides assignment, relational operators, concatenation and substring reference operations for them. • In Java, strings are supported as a primitive type by the String class, whose values are constant strings, and the StringBuffer class, whose values are changeable and are more like arrays of single characters. • Python also has strings as a primitive type and has operations for substring reference, concatenation, indexing to access individual characters, as well as methods for searching and replacement. • Perl, JavaScript, Ruby and PHP include built-in pattern matching operations. The pattern matching expressions are loosely based on mathematical regular expressions and are themselves called regular expressions. • Ex:- /[A-Za-z][A-Za-z\d]+/ • Regular-expression support is also included in the class libraries of C++, Java, Python and C#.
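
The example pattern above (a letter followed by one or more letters or digits) can be tried with Python's standard `re` module. The input text is an arbitrary illustration.

```python
import re

# Same pattern as in the slide: one letter, then one or more letters or digits.
name_pattern = re.compile(r"[A-Za-z][A-Za-z\d]+")

matches = name_pattern.findall("x1 = total2 + 9lives")
print(matches)                            # ['x1', 'total2', 'lives']

# The + requires at least one character after the first letter.
print(name_pattern.fullmatch("a"))        # None
```

Note that `9lives` only matches from the `l` onward, because the pattern must start with a letter.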

Character String Types… • String Length Options: There are several design choices regarding the length of string values. • Static length strings: the length is static and set when the string is created (Python, Java, C++, Ruby, C#). • Limited dynamic length strings: strings can have varying length, up to a declared and fixed maximum set by the variable's definition (C and C++). • Dynamic length strings: strings can have varying length with no maximum (JavaScript and Perl). • Ada 95 supports all three string length options. • Type String (Standard package) • Type Bounded_String (Ada.Strings.Bounded package) • Type Unbounded_String (Ada.Strings.Unbounded package)

Character String Types… • Implementation of Character String Types • A descriptor for a static character string type, which is required only during compilation, has three fields. • The first field of every descriptor is the name of the type. • In the case of static character strings, the second field is the type's length (in characters). The third field is the address of the first character. • [Figure: compile-time descriptor for static strings]

Character String Types… • Limited dynamic strings require a run-time descriptor to store both the fixed maximum length and the current length. • Run-time descriptor for limited dynamic strings • The limited dynamic strings of C and C++ do not require run-time descriptors, because the end of a string is marked with the null character.
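
As a rough model of the run-time descriptor described above, the sketch below keeps a fixed maximum length and a current length alongside the characters. `BoundedString` is a hypothetical class written for illustration, loosely in the spirit of Ada's Bounded_String; it is not any language's actual implementation.

```python
class BoundedString:
    """A limited dynamic length string with a descriptor-like pair of lengths."""

    def __init__(self, max_length: int, text: str = "") -> None:
        self.max_length = max_length      # fixed when the variable is defined
        self._chars = ""
        self.assign(text)

    @property
    def current_length(self) -> int:
        """Current length, which may vary between 0 and max_length."""
        return len(self._chars)

    def assign(self, text: str) -> None:
        """Store a new value, checking it against the fixed maximum."""
        if len(text) > self.max_length:
            raise ValueError("string exceeds declared maximum length")
        self._chars = text

    def __str__(self) -> str:
        return self._chars


name = BoundedString(10, "apples")
print(name.current_length, name.max_length)   # 6 10
name.assign("pears")
print(name.current_length)                    # 5
```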

Character String Types… • Dynamic length strings require more complex storage management. The length of a string, and therefore the storage to which it is bound, must grow and shrink dynamically. • There are three approaches to supporting the dynamic allocation and deallocation. • First, strings can be stored in a linked list, so that when a string grows, the newly required cells can come from anywhere in the heap. The drawbacks are the extra storage occupied by the links in the list representation and the necessary complexity of string operations.

Character String Types… • The second approach is to store strings as arrays of pointers to individual characters allocated in the heap. This method still uses extra memory, but string processing can be faster than with the linked list approach. • The third alternative is to store complete strings in adjacent storage cells. If a string grows and the space next to it is not sufficient, a new area of memory is found that can store the complete new string, and the old part is moved to this area. Then the memory cells used for the old string are deallocated.

User-Defined Ordinal Types • An ordinal type is one in which the range of possible values can be easily associated with the set of positive integers. • In Java, the primitive ordinal types are the integer types, char and boolean. • There are two user-defined ordinal types that have been supported by programming languages: enumeration and subrange.

User-Defined Ordinal Types… Enumeration Types • An enumeration type is one in which all of the possible values (named constants) are provided, or enumerated, in the definition. • Enumeration types provide a way of defining and grouping collections of named constants, which are called enumeration constants. • The definition of a typical enumeration type in C# is as follows: enum days {Mon, Tue, Wed, Thu, Fri, Sat, Sun}; • The enumeration constants are typically implicitly assigned the integer values 0, 1, …, but can be explicitly assigned any integer literal in the type's definition.

User-Defined Ordinal Types… Enumeration Types… Design issues for Enumeration types are • Is an enumeration constant allowed to appear in more than one type definition, and if so, how is the type of an occurrence of that constant in the program checked? • Are enumeration values coerced ( changing an entity of one type to another ) to integer? • Are any other types coerced to an enumeration type?

User-Defined Ordinal Types… Designs • In languages that do not have enumeration types, programmers usually simulate them with integer values. • Ex:- Suppose we want to represent colors in a C program and C did not have an enumeration type. We can use 0 to represent red, 1 to represent blue and so on. These can be defined as int red = 0, blue = 1; • The problem with this is that, since we have not defined a type for our colors, there is no type checking when they are used. • It is legal to add the two together, although that is not our intended operation. • They can also be combined with any other numeric type operand using any arithmetic operator. • They can be assigned any integer value, thereby destroying the relationship with the colors.

User-Defined Ordinal Types… Designs… • C and Pascal were the first widely used languages to include an enumeration data type. • C++ includes C’s enumeration types. EX:- enum colors { red, blue, green, yellow, black}; colors myColor = blue, yourColor = red; • The colors type uses the default internal values for the enumeration constants,0,1,..., although the constants could have been assigned any integer literal. • The enumeration values are coerced to int when they are put in integer context. This allows their use in any numeric expression

User-Defined Ordinal Types… • Designs… • Ex:- If the current value of myColor is blue, the expression • myColor++ • would assign green to myColor. • In C++, no value of any other type is coerced to an enumeration type. • Ex:- myColor = 4; is illegal in C++. This assignment would be legal if the right side had been cast to the colors type. • C++ enumeration constants can appear in only one enumeration type in the same referencing environment. • In Ada, enumeration literals are allowed to appear in more than one declaration in the same referencing environment. These are called overloaded literals.

User-Defined Ordinal Types… • Designs… • If an overloaded literal and an enumeration variable are compared, the literal's type is resolved to be that of the variable. • In some cases, the programmer must indicate some type specification for an occurrence of an overloaded literal to avoid a compilation error. • In Ada, neither the enumeration literals nor the enumeration variables are coerced to integers, so both the range of operations and the range of values of enumeration types are restricted, allowing many programmer errors to be detected by the compiler. • An enumeration type was added to Java in Java 5.0 in 2004. All enumeration types in Java are implicitly subclasses of the predefined class Enum.

User-Defined Ordinal Types… • Designs… • The internal numeric value of an enumeration variable can be fetched with the ordinal method (Java). • An enumeration variable is never coerced to any other type (Java). • C# enumeration types are like those of C++, except that they are not coerced to integer. • When this material was written, none of the scripting languages (Perl, JavaScript, PHP, Python and Ruby) included enumeration types; Python has since added an enum module to its standard library (version 3.4).
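
Python's standard library has provided an `enum` module since version 3.4. The sketch below uses it to illustrate the restricted design described for Ada, Java and C#, in which enumeration values are not coerced to integers. The `Color` type and its members are assumptions made for this example.

```python
from enum import Enum


class Color(Enum):
    RED = 0
    BLUE = 1
    GREEN = 2


my_color = Color.BLUE
print(my_color.value)      # 1  (explicit access, similar to Java's ordinal method)
print(my_color == 1)       # False: no coercion to integer

try:
    my_color + Color.RED   # arithmetic on enumeration values is rejected
except TypeError as exc:
    print("rejected:", type(exc).__name__)   # rejected: TypeError

print(Color(2) is Color.GREEN)   # True: conversion from an integer must be explicit
```

Because `+` is not defined for `Color`, the kind of accidental arithmetic that is legal with C-style integer constants is reported as an error.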

User-Defined Ordinal Types… Subrange Types • A subrange type is a contiguous subsequence of an ordinal type. • Ex:- 12..14 is a subrange of the integer type. • Subrange types were introduced by Pascal and are included in Ada. • There are no design issues that are specific to subrange types.

User-Defined Ordinal Types… Subrange Types… Ada’s Design • In Ada, subranges are included in the category of types called subtypes. Subtypes are not new types, they are new names for possibly restricted versions of existing types. • Ex:- type Days is (Mon, Tue, Wed, Thu, Fri, Sat, Sun); subtype Weekdays is Days range Mon..Fri; subtype Index is Integer range 1..100; In the example, the restriction on the existing types is in the range of possible values.

User-Defined Ordinal Types… Subrange Types… • All of the operations defined for the parent type are also defined for the subtype, except assignment of values outside the specified range. • Ex:- Day1 : Days; Day2 : Weekdays; … Day2 := Day1; The assignment is legal unless the value of Day1 is Sat or Sun. • The compiler must generate range-checking code for every assignment to a subrange variable. While types are checked for compatibility at compile time, subranges require run-time range checking.
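
Python has no subrange types, but the run-time range check that an Ada compiler inserts for `Day2 := Day1` can be imitated by hand. In this sketch `Days` and `WeekdayVar` are hypothetical names; the check lives in a property setter so that every assignment goes through it.

```python
from enum import IntEnum


class Days(IntEnum):
    MON = 0
    TUE = 1
    WED = 2
    THU = 3
    FRI = 4
    SAT = 5
    SUN = 6


class WeekdayVar:
    """A variable of the parent type Days restricted to the range MON..FRI."""

    def __init__(self, value: Days) -> None:
        self.value = value                 # goes through the checked setter

    @property
    def value(self) -> Days:
        return self._value

    @value.setter
    def value(self, new_value: Days) -> None:
        # The range check that a compiler would generate implicitly.
        if not Days.MON <= new_value <= Days.FRI:
            raise ValueError(f"{Days(new_value).name} is outside MON..FRI")
        self._value = Days(new_value)


day1 = Days.SAT
day2 = WeekdayVar(Days.MON)
try:
    day2.value = day1                      # fails the range check at run time
except ValueError as exc:
    print("range check failed:", exc)      # range check failed: SAT is outside MON..FRI
print(day2.value.name)                     # MON
```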

User-Defined Ordinal Types… Implementation of User-Defined Ordinal Types • Enumeration types are usually implemented as integers. • Subrange types are implemented in the same way as their parent types, except that range checks must be implicitly included by the compiler in every assignment of a variable or expression to a subrange variable. • This step increases code size and execution time but is usually considered well worth the cost.

Array Types • An array is a homogeneous aggregate of data elements in which an individual element is identified by its position in the aggregate, relative to the first element. • A reference to an array element in a program often includes one or more non-constant subscripts; such references require additional run-time calculation to determine the memory location being referenced. • Design Issues: • What types are legal for subscripts? • Are subscripting expressions in element references range-checked? • When are subscript ranges bound? • When does array allocation take place? • Are rectangular multidimensional arrays allowed? • Can arrays be initialized when they have their storage allocated? • What kinds of slices are allowed, if any?

Array Types • Arrays and indices • Array elements are referenced by a two-level syntactic mechanism: the array name and the subscripts or indices. • Arrays are sometimes called finite mappings. • array_name(subscript_value_list) -> element • The syntax is the array name followed by the list of subscripts, which is surrounded by either parentheses or brackets. • In Ada: B(I) • In C-based languages: a[i] • The type of subscripts is often a subrange of integers, but Ada allows any ordinal type to be used, such as Boolean, character and enumeration. • In Perl, $list[-2] references the element with the subscript 3 if the array @list has five elements with the subscripts 0..4.

Array Types • Subscript Bindings and Array Categories • The binding of the subscript type to an array variable is usually static, but the subscript value ranges are sometimes dynamically bound. • In C-based languages the lower bound is fixed at zero. • In Fortran 95 it defaults to one. • In some languages, subscript ranges must be specified by the programmer • There are five categories of arrays. The categories are based on the • Binding to subscript ranges • Binding to storage • Where the storage is allocated.

Array Types • Array Categories • A static array: the subscript ranges are statically bound and storage allocation is static. The advantage of static arrays is efficiency: no dynamic allocation or deallocation is required. • A fixed stack-dynamic array: the subscript ranges are statically bound, but the allocation is done at declaration elaboration time during execution. • A stack-dynamic array: both the subscript ranges and the storage allocation are dynamically bound at elaboration time; once bound, they remain fixed during the lifetime of the variable. • A fixed heap-dynamic array: similar to a fixed stack-dynamic array in that the subscript ranges and the storage binding are both fixed after storage is allocated, but the bindings are done when the program requests them during execution, and storage is allocated from the heap. • A heap-dynamic array: the binding of subscript ranges and storage allocation is dynamic and can change any number of times during the array's lifetime. Arrays can grow and shrink during program execution as the need for space changes.

Array Types • Heterogeneous Arrays: A heterogeneous array is one in which the elements need not be of the same type. Such arrays are supported by Perl, Python, JavaScript and Ruby. In all of these languages, arrays are heap-dynamic. • Array Initialization: • In Fortran 95: Integer, Dimension(3) :: List = (/0, 5, 5/) • In C, C++, Java and C#: int list[] = {4, 5, 7, 8}; • Character strings in C and C++ are implemented as arrays of char: char name[] = "friend"; • char *names[] = {"Bob", "Jack", "Henry"}; • In Java: String[] names = {"Bob", "Jack", "Henry"}; • In Ada: List : array (1..5) of Integer := (1, 3, 5, 7, 9); Bunch : array (1..5) of Integer := (1 => 17, 3 => 34, others => 0);

Array Types • Implementation of Array Types • address(list[k]) = address(list[lower_bound]) + ((k - lower_bound) * element_size) • [Figure: the location of the [i, j] element in a matrix with columns 1, 2, …, j, j+1, …, n]
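
The access function above can be written out as a small Python sketch. The two-dimensional version assumes row-major storage and a lower bound of 1 for both subscripts, which the slide's figure suggests but does not state; the function names and the base address 1000 are illustrative only.

```python
def element_address(base: int, k: int, lower_bound: int, element_size: int) -> int:
    """Address of list[k], given the address of list[lower_bound]."""
    return base + (k - lower_bound) * element_size


def matrix_element_address(base: int, i: int, j: int, n_columns: int,
                           element_size: int, lower_bound: int = 1) -> int:
    """Address of a[i, j] in a row-major matrix with n_columns per row (assumed layout)."""
    # Skip (i - lower_bound) complete rows, then (j - lower_bound) elements in row i.
    offset = (i - lower_bound) * n_columns + (j - lower_bound)
    return base + offset * element_size


print(element_address(1000, 5, 1, 4))             # 1016
print(matrix_element_address(1000, 2, 3, 4, 4))   # 1024
```

With a lower bound of zero, as in the C-based languages, the subtraction disappears and the calculation is slightly cheaper.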

Array Types • Implementation of Array Types… • [Figure: compile-time descriptor for single-dimensioned arrays] • [Figure: compile-time descriptor for a multidimensional array]

Associative Arrays • An associative array is an unordered collection of data elements that are indexed by an equal number of values called keys. • In an associative array, the user-defined keys must be stored in the structure, so each element is in fact a pair of entities, a key and a value. • Associative arrays are supported directly by Perl, Python and Ruby, and by the standard class libraries of Java, C++ and C#. • The only design issue is the form of the references to their elements. • In Perl, associative arrays are often called hashes, because in the implementation their elements are stored and retrieved with hash functions. • Every hash variable name must begin with a percent sign (%). • %salaries = ("Bob" => 1200, "Henry" => 2000); • An element is referenced or added with a $ and braces: $salaries{"Mary"} = 3000; • An element is removed with delete: delete $salaries{"Henry"}; • The entire hash is emptied by assigning the empty list to it: %salaries = ();
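
The same sequence of operations can be written with Python's built-in dictionary type, which is its associative array. This is a sketch for comparison with the Perl lines above; the `snapshot` variable is added only to keep a copy before the structure is emptied.

```python
salaries = {"Bob": 1200, "Henry": 2000}   # create with two key/value pairs

salaries["Mary"] = 3000                   # add an element
del salaries["Henry"]                     # remove an element

snapshot = dict(salaries)
print(snapshot)                           # {'Bob': 1200, 'Mary': 3000}
print("Henry" in salaries)                # False

salaries.clear()                          # empty the whole structure
print(len(salaries))                      # 0
```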

Record Types • A record is a heterogeneous aggregate of data elements in which the individual elements are identified by names. • Records are needed in programs to model collections of data that are not homogeneous. • Ex:- Information about a college student might include name, student number, grade point average, etc. • A data type for such a collection might use a character string for the name, an integer for the student number and a floating-point value for the grade point average. Records are designed for this kind of need. • Records were introduced in COBOL in the early 1960s. • In C, C++ and C#, records are supported with the struct data type.

Record Types… • In C++, structures are a minor variation on classes. • In C#, structures are also related to classes, but are quite different. • C# structures are stack-allocated value types, as opposed to class objects, which are heap-allocated reference types. • Structures in C++ and C# are normally used as encapsulation structures, rather than data structures. • The design issues that are specific to records are • What is the syntactic form of references to fields? • Are elliptical references allowed?

Record Types… • The COBOL form of a record declaration, which is part of the data division of a COBOL program is as follows • 01 EMPLOYEE-RECORD. • 02 EMPLOYEE-NAME. • 05 FIRST PICTURE IS X(20). • 05 MIDDLE PICTURE IS X(10). • 05 LAST PICTURE IS X(20). • 02 HOURLY-RATE PICTURE IS 99V99. • The EMPLOYEE-RECORD record consists of the EMPLOYEE-NAME record and the HOURLY-RATE field. The numerals 01,02 and 05 that begin the lines of the record declaration are level numbers, which indicate their relative hierarchical structure of the record. • The PICTURE Clauses show the formats of the field storage locations, X(20) specifies 20 alphanumeric characters and 99V99 specifying four decimal digits with the decimal point in the middle.

Record Types… • In Ada, records cannot be anonymous; they must be named types. • Ex:- Ada record declaration • type Employee_Name_Type is record • First: String (1..20); • Middle: String (1..10); • Last: String (1..20); • end record; • type Employee_Record_Type is record • Employee_Name: Employee_Name_Type; • Hourly_Rate: Float; • end record; • Employee_Record: Employee_Record_Type; • In Java and C#, records can be defined as data classes, with nested records defined as nested classes. Data members of such classes serve as the record fields.

Record Types… • References to the individual fields of records are syntactically specified by several different methods. • In COBOL: MIDDLE OF EMPLOYEE-NAME OF EMPLOYEE-RECORD • Most of the other languages use dot notation for field references, where the components of the reference are connected with periods. • In Ada: Employee_Record.Employee_Name.Middle • C and C++ use the same syntax for referencing the members of their structures. • In Fortran 95, field references have the same form, but percent signs (%) are used instead of periods.
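
For comparison, the employee record and its dot-notation field references can be sketched in Python with `dataclasses`, in the spirit of the data-class approach mentioned for Java and C#. The class names mirror the Ada example; the field values are placeholders invented for this illustration.

```python
from dataclasses import dataclass


@dataclass
class EmployeeName:
    first: str
    middle: str
    last: str


@dataclass
class EmployeeRecord:
    employee_name: EmployeeName    # nested record
    hourly_rate: float


# Placeholder values, not taken from the slides.
employee_record = EmployeeRecord(EmployeeName("Jane", "Q", "Doe"), 25.50)

# Fully qualified reference, outermost record first, as in the Ada form.
print(employee_record.employee_name.middle)   # Q
employee_record.hourly_rate = 27.00
print(employee_record.hourly_rate)            # 27.0
```

Note the ordering: dot notation names the enclosing record first, while the COBOL form names the innermost field first.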
