What is Data and Information?

Data: The term data may be new to beginners, but it is simple to understand. Data is the name given to basic facts and entities, such as names and numbers. It can be almost anything: the name of a person or a place, a number, and so on. Typical examples of data are weights, prices, costs, numbers of items sold, employee names, product names, addresses, tax codes, and registration marks.

Information: Information is data that has been converted into a more useful or intelligible form. It helps people in their decision-making process. Examples include timetables, merit lists, report cards, headed tables, printed documents, pay slips, receipts, and reports. Information is obtained by assembling items of data into a meaningful form. For example, the marks obtained by students and their roll numbers are data, while the report card is information.

Difference between Data and Information

Data is the material on which computer programs work. It can be numbers, letters of the alphabet, words, special symbols, and so on, but by themselves these have no meaning. For example, the sequence of digits 240343 is meaningless by itself, since it could refer to a date of birth, a part number for an automobile, the number of rupees spent on a project, the population of a town, the number of people employed in a large organization, and so on. Once we know what the sequence refers to, it becomes meaningful and can be called information. Written as 24-03-43, for instance, it may mean a date of birth of 24th March 1943.
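The point about the digits 240343 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the century (1900) is an assumption supplied by the reader of the data, not something contained in the digits themselves.

```python
from datetime import date

raw = "240343"  # the bare digits from the text: data with no meaning attached


def as_date_of_birth(digits, century=1900):
    """Read six digits as DDMMYY. The century is an assumption, not part of the data."""
    day = int(digits[0:2])
    month = int(digits[2:4])
    year = century + int(digits[4:6])
    return date(year, month, day)


def as_rupees(digits):
    """Read the very same digits as an amount of money."""
    return f"Rs. {int(digits):,}"


print(as_date_of_birth(raw).isoformat())  # 1943-03-24
print(as_rupees(raw))                     # Rs. 240,343
```

The same data yields different information depending on the interpretation applied to it.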

A set of words is data, but a sentence that relates them is information. For example, “ANNUAL-EXAMINATION, AMITABH, JYOTSNA, PHYSICS” is a set of data, whereas “JYOTSNA SCORED THE HIGHEST MARKS IN PHYSICS IN ANNUAL EXAMINATION” is information.

Database

Related information placed in an organized form makes a database. Organizing data is necessary because unorganized information is of little use. Common examples of organized information are the dictionary, the telephone directory, the student record register, and your own address book. In each of these, the data is stored in some particular order, that is, in an organized form. In a dictionary, the words are arranged in alphabetical order along with their meanings, so that it is easy to look up any word whose meaning is required. Without this ordering, how could you find one word out of, say, 10,000 words? Similarly, anybody can make a database of his or her own to keep information in an organized manner. Your own address directory, where you keep the addresses and phone numbers of your near and dear ones, is also a database. Now let us move one step ahead. What do we do with that database?

There are many operations we can perform on a database, such as:
• To add new information (e.g. to add the address of a new friend to your address book)
• To view or retrieve the stored information (e.g. to find the address of one of your old friends)
• To modify or edit the existing information (e.g. your friend has shifted to a new place, so his or her address has to be changed)
• To remove or delete unwanted information (e.g. your friend has changed his or her mobile number, so the old number has to be removed from the list)
• To arrange the information in a desired order
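The operations listed above can be sketched with a small in-memory address book. This is a minimal illustration using a Python dictionary, not a real database; all names, addresses, and phone numbers are made up.

```python
# A hypothetical address book: each name maps to a record of fields.
address_book = {
    "Jyotsna": {"address": "12 Park Road", "phone": "555-0101"},
    "Amitabh": {"address": "7 Lake View", "phone": "555-0102"},
}

# Add new information.
address_book["Suneet"] = {"address": "3 Hill Street", "phone": "555-0103"}

# View or retrieve stored information.
print(address_book["Amitabh"]["address"])  # 7 Lake View

# Modify existing information: a friend has moved.
address_book["Jyotsna"]["address"] = "45 River Lane"

# Delete unwanted information.
del address_book["Amitabh"]

# Arrange the information in a desired order (here, alphabetically by name).
sorted_names = sorted(address_book)
print(sorted_names)  # ['Jyotsna', 'Suneet']
```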

Manual database and its problems

Consider the accounts department of an organization. To calculate the employees’ salaries every month, the staff have to keep a record of every employee and perform a number of calculations: adding allowances such as DA (dearness allowance) and HRA (house rent allowance) to the basic salary, making deductions such as loan recoveries, income tax, and insurance, and finally preparing the pay slips showing the net pay. This whole procedure is repeated every month and is a tedious and laborious job. It is mere calculation and does not require any judgment or intelligence, so spending human skill on such repetitive work is not a wise decision.
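The monthly calculation described above is simple arithmetic, which is why it is a natural job for a computer. A minimal sketch follows; the function name and the figures are hypothetical, and real payroll rules are more involved.

```python
def net_pay(basic, da, hra, loan_recovery, income_tax, insurance):
    """Net pay = basic salary plus allowances, minus deductions."""
    gross = basic + da + hra
    deductions = loan_recovery + income_tax + insurance
    return gross - deductions


# Illustrative figures only.
print(net_pay(basic=30000, da=6000, hra=4500,
              loan_recovery=2000, income_tax=3500, insurance=500))  # 34500
```

Once written, the same function can be applied to every employee record every month without further effort.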

Consider another situation: a magazine publisher with 10,000 subscribers receives a cheque from Mr. Suneet Bhatia with a request to renew his subscription, but Mr. Suneet Bhatia does not mention his subscription number. Now the publisher has to search the entire list of 10,000 names to find his subscription number. This is a boring job, isn’t it?
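The publisher's search can be sketched as follows. The subscriber list is generated for illustration, and the position of Mr. Suneet Bhatia in it is an arbitrary assumption. The sketch contrasts scanning the list entry by entry with looking the name up in an index built once.

```python
# Hypothetical subscriber list: (subscription_number, name) pairs.
subscribers = [(number, f"Subscriber {number}") for number in range(1, 10_001)]
subscribers[7_499] = (7_500, "Suneet Bhatia")


def find_by_scanning(name):
    """Manual approach: check every entry until the name turns up."""
    for checked, (number, subscriber_name) in enumerate(subscribers, start=1):
        if subscriber_name == name:
            return number, checked
    return None, len(subscribers)


# An index built once maps each name straight to its subscription number.
# This sketch assumes names are unique, which a real list cannot guarantee.
index_by_name = {name: number for number, name in subscribers}

print(find_by_scanning("Suneet Bhatia"))  # (7500, 7500): found after checking 7500 entries
print(index_by_name["Suneet Bhatia"])     # 7500: found in a single lookup
```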

Database and Computers

• A computer has a large storage capacity. It can store thousands of records at a time.
• It works at high speed: it can quickly search for any desired information, arrange data in alphabetical order, perform calculations on the data, and repeat these operations as often as needed.
• A computer is more accurate than a person at such repetitive work.
• Data in computers can be stored in the form of files, records, and fields.
• There are two approaches to storing data in computers: the file-based approach and the database approach.
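The terms file, record, and field mentioned above can be illustrated with a small comma-separated file. The student data is hypothetical, and an in-memory string stands in for a file on disk so that the sketch is self-contained.

```python
import csv
import io

# A hypothetical student file: each line is a record,
# and each comma-separated value within a line is a field.
file_contents = io.StringIO(
    "roll_no,name,marks\n"
    "1,Jyotsna,92\n"
    "2,Amitabh,85\n"
)

records = list(csv.DictReader(file_contents))

print(len(records))        # 2 records in the file
print(records[0]["name"])  # Jyotsna: the "name" field of the first record
```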

File Based Approach (FBA)

File-based systems were an early attempt to computerize the manual filing system that we are all familiar with. A file system is a method for storing and organizing computer files and the data they contain so that they are easy to find and access. File systems may use a storage device such as a hard disk or CD-ROM, and they involve maintaining the physical location of the files.

In our own homes, we probably have some sort of filing system containing receipts, guarantees, invoices, bank statements, and the like. When we need to look something up, we go to the filing system and search through it from the first entry until we find what we want. Alternatively, we may have an indexing system that helps us locate what we want more quickly. For example, we may have divisions in the filing system, or separate folders for different types of item that are in some way logically related.

The manual filing system works well when the number of items to be stored is small. It even works quite adequately when there are large numbers of items and we only have to store and retrieve them. However, the manual filing system breaks down when we have to cross-reference or process the information in the files. For example, a typical real estate agent’s office might have a separate file for each property for sale or rent, each potential buyer and renter, and each member of staff. Finding, say, the properties that suit a particular buyer then means going through these separate files by hand.

Limitations of the File-Based Approach

The following problems are associated with the file-based approach.

Separated and Isolated Data

To make a decision, a user might need data from two separate files. First, analysts and programmers had to evaluate the files to determine the specific data required from each file and the relationships between the data. Only then could applications be written in a programming language to process and extract the needed data. Imagine the work involved if data from several files was needed.

Duplication of data

Often the same information is stored in more than one file. Uncontrolled duplication of data is undesirable for several reasons:
• Duplication is wasteful. It costs time and money to enter the data more than once.
• It takes up additional storage space, again with associated costs.
• Duplication can lead to loss of data integrity; in other words, the data is no longer consistent.

For example, consider the duplication of data between the Payroll and Personnel departments. If a member of staff moves to a new house and the change of address is communicated only to Personnel and not to Payroll, the person’s pay slip will be sent to the wrong address. A more serious problem occurs if an employee is promoted with an associated increase in salary. Again, the change is notified to Personnel but does not filter through to Payroll. Now the employee is receiving the wrong salary. When this error is detected, it will take time and effort to resolve.
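The Payroll and Personnel example can be sketched with two separate copies of the same record. The employee number, name, and figures are hypothetical; the point is only that an update applied to one copy leaves the other out of date.

```python
# Two departments each keep their own copy of the same employee's details.
personnel_file = {"E101": {"name": "A. Kumar", "address": "12 Park Road", "salary": 40000}}
payroll_file = {"E101": {"name": "A. Kumar", "address": "12 Park Road", "salary": 40000}}

# The house move and the promotion are reported to Personnel only.
personnel_file["E101"]["address"] = "45 River Lane"
personnel_file["E101"]["salary"] = 46000


def inconsistent_fields(emp_id):
    """List the fields whose values differ between the two copies."""
    ours, theirs = personnel_file[emp_id], payroll_file[emp_id]
    return sorted(field for field in ours if ours[field] != theirs.get(field))


print(inconsistent_fields("E101"))  # ['address', 'salary']
```

If the details were stored once and shared by both departments, there would be no second copy to fall out of step.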

Data Dependence

In file processing systems, files and records were described by specific physical formats that programmers coded into the application program. If the format of a certain record was changed, the code in every program containing that format had to be updated. Furthermore, instructions for data storage and access were written into the application’s code. Therefore, changes in storage structure or access methods could greatly affect the processing or results of an application.
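Data dependence can be illustrated with a fixed-width record whose layout is hard-coded in the program. The layout (a 10-character name followed by a 6-digit salary) and the function name are assumptions made for this sketch.

```python
# Format known to the program: name in columns 0-9, salary in columns 10-15.
record_v1 = "Jyotsna".ljust(10) + "040000"


def read_salary(record):
    """The column positions are hard-coded, so the program depends on the file format."""
    return int(record[10:16])


print(read_salary(record_v1))  # 40000

# The file format changes: the name field is widened to 15 characters.
record_v2 = "Jyotsna".ljust(15) + "040000"

# The unchanged program now reads five padding spaces and one digit,
# and silently returns a wrong value instead of the salary.
print(read_salary(record_v2))  # 0
```

Every program that embeds the old column positions would need the same correction, which is the maintenance burden the text describes.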

Difficulty in representing data from the user’s view

To create useful applications for the user, data from various files often had to be combined. In file processing, it was difficult to determine the relationships between isolated data in order to meet user requirements.

Data Inflexibility

Program-data interdependency* and data isolation limited the flexibility of file processing systems in answering users’ ad hoc information requests.

*Program-data independence, the opposite property, refers to the capability of leaving data intact and accessible regardless of modifications to the database containing the data.

Incompatible file formats

As the structure of files is embedded in the application programs, the structures depend on the application programming language. For example, the structure of a file generated by a COBOL program may be different from the structure of a file generated by a ‘C’ program. The direct incompatibility of such files makes them difficult to process jointly.

Data Security

The security of data is low in a file-based system because data maintained in flat files* is easily accessible.

*Flat file: Each file, called a flat file, contained and processed information for one specific function, such as accounting or inventory. Programmers used programming languages such as COBOL and C++ to write applications that directly accessed flat files to perform data management services and provide information for users.

Transaction problem

The FBA does not guarantee the transaction (ACID) properties: atomicity, consistency, isolation, and durability.

Concurrency Problem

When multiple users access the same piece of data at the same time, the access is said to be concurrent. The FBA leaves the control of such access to each application program. Example: ATM money transfer.

Poor data modeling of real world

The FBA cannot represent complex data and inter-file relationships, which results in poor data modeling of the real world.
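To show what the atomicity part of the ACID properties means for a money transfer, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. It illustrates the guarantee that a database system provides and that plain files lack; the table, the account names, and the balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()


def transfer(source, target, amount):
    """Move money between accounts; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer("A", "B", 200), balances())   # True {'A': 300, 'B': 300}
print(transfer("A", "B", 1000), balances())  # False {'A': 300, 'B': 300}
```

In the second call the credit to account B is applied first and then undone when the debit fails, so no money appears from nowhere.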

Database Approach

To overcome the limitations of the file-based approach listed above, a new and more effective approach was required, known as the database approach. A database is a computer-based record-keeping system whose overall purpose is to record and maintain information. The database is a single, large repository of data that can be used simultaneously by many departments and users. Instead of disconnected files with redundant data, all data items are integrated with a minimum amount of duplication. The database is no longer owned by one department but is a shared corporate resource.

The database holds not only the organization’s operational data but also a description of this data. For this reason, a database is also defined as a self-describing collection of integrated records. The description of the data is known as the Data Dictionary or Meta Data (the ‘data about data’). It is the self-describing nature of a database that provides program-data independence.
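The idea of a self-describing database can be seen in SQLite, used here through Python's built-in sqlite3 module purely as one convenient example. The student table is hypothetical. The database stores the description of the table alongside the data, and a program can ask for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, marks INTEGER)")

# The database describes itself: table names are kept in the sqlite_master catalog...
tables = [row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]

# ...and PRAGMA table_info returns one row per column
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(tables)   # ['student']
print(columns)  # [('roll_no', 'INTEGER'), ('name', 'TEXT'), ('marks', 'INTEGER')]
```

Because the structure is recorded in the database rather than in each program, an application can discover it at run time instead of hard-coding it.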

A database implies separation of physical storage from use of the data by an application program to achieve program/data independence. Using a database system, the user or programmer or application specialist need not know the details of how the data are stored and such details are “transparent to the user”. Changes (or updating) can be made to data without affecting other components of the system. These changes include, for example, change of data format or file structure or relocation from one device to another.

Characteristics of data in a database

The data in a database should have the following features:
• Shared: Data in a database are shared among different users and applications.
• Validity/Integrity/Correctness: Data should be correct with respect to the real world entity that they represent.
• Security: Data should be protected from unauthorized access.
• Consistency: Whenever more than one data element in a database represents related real world values, the values should be consistent with respect to the relationship.
• Non-redundancy: No two data items in a database should represent the same real world entity.
• Independence: Data at different levels should be independent of each other, so that changes at one level do not affect the other levels.

To create, manage and manipulate data in databases, a management system known as a database management system (DBMS) was developed.
