organizing data and information
Download
Skip this Video
Download Presentation
Organizing Data and Information

Loading in 2 Seconds...

play fullscreen
1 / 62

organizing data and information - PowerPoint PPT Presentation


  • 746 Views
  • Uploaded on

Organizing Data and Information. Chapter 5. Topics in Ch. 5. General data management concepts and terms Two major approaches to data management Three database models The common functions performed by database management systems Three popular end-user database management systems

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'organizing data and information' - mike_john


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
topics in ch 5
Topics in Ch. 5
  • General data management concepts and terms
  • Two major approaches to data management
  • Three database models
  • The common functions performed by database management systems
  • Three popular end-user database management systems
  • Recent database developments.
data and information
Data and Information

What is data?

  • A collection of raw unprocessed facts, figures, and symbols

What is information?

  • Data that is organized, meaningful, and useful

How are data and information related?

  • Computers process data into information
slide4
What is a database?

Includes a collection of data organized so you can access, retrieve, and use the data

Database software allows you to

Create a computerized database

Add, change, and delete data

Sort and retrieve data from the database

Create forms and reports for the data in the database

Database software also called a database management system (DBMS)

hierarchy of data
Hierarchy of data
  • Bit
    • A single binary digit with either a one or zero state
  • Character
    • A byte, which is the basic building block of information
  • Field
    • A combination of one or more characters
    • The smallest unit of data a user accesses
    • A field name uniquely identifies each field
  • Record
    • A collection of related fields
  • Files or Tables
    • A collection of related records
  • Database
    • A collection of related files or tables
data entities attributes and keys8
Data entities, Attributes, and Keys
  • Entity
    • Generalized class of people, places, or things for which data is collected, stored, and maintained
    • E.g. at a university, students, library books, and courses; the entity in figure 5.2 is employee.
  • Attributes
    • A characteristic of an entity
    • E.g., employee number, last name, first name, hire date, and department number in figure 5.2
  • Key
    • A field or set of fields that identifies a record
      • A primary key: a field or a set of fields the uniquely defines a record: e.g. SSN
      • A secondary key: an alternative key that can be used to access records: e.g., last name
traditional approach file management approach
What is a file management approach?

Each department or area within an organization has its own set of files

Records in one file often do not relate to the records in any other file

Traditional approach (File management approach)
weakness of the traditional approach
Weakness of the “Traditional” Approach
  • Data redundancy
    • Wastes resources such as storage space and people’s time
    • The files often store the same fields in multiple files
    • Requires a larger storage capacity
    • Compromises data integrity
      • Data integrity
        • The degree to which data is correct
        • When a database contains errors, it loses integrity
        • Very important because computers and people use information to make decisions and take actions
  • Program-data dependence
    • Programs and data developed and organized for one application are incompatible with programs and data organized differently for another application
  • Isolated data
    • It is often difficult to access data stored in many files across several departments
what are the strengths of the database approach
Reduced data redundancy

Improved data integrity

Shared data

Reduced development time

Easier modification and updating

so on ( Table 5.1 p. 175)

A high cost involved with acquiring and implementing a database

Increased vulnerability

What are the strengths of the database approach?

What are the weakness of the database approach?

slide15
How do a database application and a file processing application differ in the way they might store data?

File processing example

Database example

data design issues
Data Design Issues
  • Content: What data should be collected?
  • Access: What data should be given to what users?
  • Logical structure: How will the data be organized to make sense to a particular user?
  • Physical organization: Where will the data actually be located?
data modeling
Data Modeling
  • Logical design
    • How data are grouped together and how that are related to one another
  • Physical design
    • Combining or splitting some of the groups identified in the logical design
  • Planned data redundancy
  • Data model
    • A diagram used by a database designer to show the logical relationships among the entities in the database
    • Entity-relationship (ER) diagrams
correspondence with er model
Correspondence with ER Model
  • Relations (tables) correspond with entity types and with many-to-many relationship types
  • Rows correspond with entity instances and with many-to-many relationship instances
  • Columns correspond with attributes
  • NOTE: The word relation (in relational database) is NOT the same same the word relationship (in ER model)
key fields
Key Fields
  • Keys are special fields that serve two main purposes:
    • Primary keys are unique identifiers of the relation in question. Examples include employee numbers, social security numbers, etc. This is how we can guarantee that all rows are unique
    • Foreign keys are identifiers that enable a dependent relation (on the many side of a relationship) to refer to its parent relation (on the one side of the relationship)
basic er symbols
Basic ER Symbols

Entity symbol

Attribute symbol

Relationship symbol

slide24

(a) CUSTOMER entity type with composite attribute

(b) CUSTOMER relation with address detail

database models
Database Models
  • Every database and DBMS is based on a specific data model
  • Consists of rules and standards that define how the database organizes data
  • Defines how users view the organization of the data
  • Three types of database models
    • Hierarchical models
      • Data is organized in a top-down or inverted tree structure
    • Network models
      • An extension of the hierarchical model in which a member may have many owners
    • Relational models
      • Stores data in tables that consist of rows and columns
      • Each row has a primary key and each column has a unique name
operations of a relational database
Operations of a relational database
  • Selection operation
    • The operation extracts data from a row (record)
  • Projection operation
  • Join operation
database management system
Database Management System
  • A DBMS is a data storage and retrieval system which permits data to be stored non-redundantly while making it appear to the user as if the data is well-integrated
  • Software that allows you to create, access, and manage a database
  • Available for many sizes and types of computers.
database management system33

Application

#1

Application

#2

Application

#3

DBMS

Database

containing

centralized

shared data

Database Management System

DBMS manages data resources like an operating system manages hardware resources

database management systems dbmss
Provide a user view

Schema: the logical description of an entire database

Subschema: a file that contains a description of part of the database and identifies which users can view or modify the data items in that part of the database

Provide tools to create & modify the database

Data Definition Language (DDL)

Data dictionary: a detailed description of all the data used in the database

Store & retrieve data

Manipulate data and produce reports

SQL (Structured Query Language)

Data manipulation language (DML)

Database Management Systems (DBMSs)
schema
Schema
  • A description of the entire database
  • Subschema
    • Shows only some of the records and their relationships in the database
  • Data Definition Language (DDL)
    • Command used to create a database
    • Commands that define a database, including creating, altering, and dropping tables and establishing constraints
    • Schemas and subschemas are described using a DDL.
slide38
Data Dictionary

System tables that store metadata

Contains data about each file in the database and each field within those files

Sometimes called metadata

Users usually can view some of these tables

Users are restricted from updating them

slide39
What is a default value for a field?

A value that the DBMS initially displays in a field

Data dictionary allows you to specify a default value for a field

data dictionary benefits
Data Dictionary Benefits
  • Reduced data redundancy
  • Faster program development
  • Easier data & information modification
  • Increased data reliability
slide41
What is a query?

A request for specific data from the database

Four commonly used methods to access data

Query language

Query by example

Form

Report generator

slide42
What is Structured Query Language (SQL)?

The standard data manipulation language for relational databases

A query language that allows you to manage, update, and retrieve data

Has special keywords and rules that you include in SQL statements (Data manipulation language)

projection operation

join operation

selection operation

slide43
Query by Example (QBE)

query results

wizard for querying the database

  • The program retrieves records that match criteria you enter in the form fields
  • Uses a graphical user interface
  • Available in MS Access
  • MS Access translates QBE to SQL and vice versa
slide44
What is a form?

Sometimes called a data entry form

A window on the screen that provides areas for entering or changing data in a database

concurrency control
Concurrency Control
  • Problem – in a multi-user environment, simultaneous access to data can result in interference and data loss
  • Solution – Concurrency Control
    • A method of dealing with situation in which two or more people need to access the same record in a database at the same time.
    • Serializability –
      • Finish one transaction before starting another
    • Locking Mechanisms
      • The most common way of achieving serialization
      • Data that is retrieved for the purpose of updating is locked for the updater
      • No other user can perform update until unlocked
popular dbms
Popular DBMS
  • Popular mainframe-based DBMSs
    • Oracle
    • IBM DB/2
    • Microsoft SQL Server
  • Popular end-user DBMSs
    • MS Access
    • Lotus Approach
    • Inprise’s dBASE
selecting a dbms
Selecting a DBMS
  • Database size
  • Number of concurrent users
  • Performance
  • Integration
  • Features
  • Vendor
  • Cost
database developments
Database Developments
  • Distributed database
  • Data warehouses
    • Data marts
    • Data mining
  • Online analytical processing (OLAP)
distributed database
Distributed Database
  • Distributed Database: A single logical database that is spread physically across computers in multiple locations that are connected by a data communications link
    • Replicated database: a database that holds a duplicate set of frequently used data
  • Decentralized Database: A collection of independent databases on non-networked computers

They are NOT the same thing!

reasons for distributed database
Reasons forDistributed Database
  • Business unit autonomy and distribution
  • Data sharing
  • Data communication costs
  • Multiple application vendors
  • Database recovery
  • Transaction and analytic processing
advantages of distributed database over centralized databases
Advantages ofDistributed Database over Centralized Databases
  • Increased reliability/availability
  • Local control over data
  • Modular growth
  • Lower communication costs
  • Faster response for certain queries
disadvantages of distributed database compared to centralized databases
Disadvantages ofDistributed Database compared to Centralized databases
  • Software cost and complexity
  • Processing overhead
  • Data integrity exposure
  • Slower response for certain queries
slide53
Data Warehouse:
    • A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
    • Subject-oriented: e.g. customers, patients, students, products
    • Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources
    • Time-variant: Can study trends and changes
    • Nonupdatable: Read-only, periodically refreshed
  • Data Mart:
    • A data warehouse that is limited in scope, a single aspect of a company’s business, e.g. finance, inventory, or personnel
need for data warehousing

Comparison of operational and informational systems

Need for Data Warehousing
  • Integrated, company-wide view of high-quality information (from disparate databases)
  • Separation of operational and informational systems and data (for improved performance)
d ata warehouses
Data Warehouses
  • Data mining: an information analysis tool that involves the automated discovery of patterns and relationships among data
      • E.g.: identifies common characteristics of customers who buy the same products from your company
      • Predicts which customers are likely to leave your company and go to a competitor
      • Reveals the difference between a typical customer this month versus last month
on line analytical processing olap
On-Line Analytical Processing (OLAP)
  • Used to store and deliver data warehouse information
  • The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
  • Relational OLAP (ROLAP)
    • Traditional relational representation
  • Multidimensional OLAP (MOLAP)
    • Cube structure
  • OLAP Operations
    • Cube slicing – come up with 2-D view of data
    • Drill-down – going from summary to more detailed views
slide60

Summary report

Figure 11-23:

Example of drill-down

Drill-down with color added

open database connectivity odbc
Open Database Connectivity (ODBC)
  • Standards that make it easier to transfer and access data among different databases.
  • Advantages of ODBC
    • ODBC can be used to export, import, or link tables between different applications
  • For example, a table in an Access database can be exported to a Paradox database or a spreadsheet. Table and data can also be imported using ODBC.
  • Linking allows an application, such as a database, to use an object in another application, such as a spreadsheet, without actually importing the object into the application. An advantage of linking is that as the object is updated in its original location, the changes will be reflected in the document to which it is linked.
object relational database management systems
Object-Relational Database Management Systems
  • A DBMS capable of manipulating audio, video, and graphical data
  • Hypertext
  • Hypermedia
  • Spatial data technology
    • Uses an object-relational database to store & access data via location. For example, NASA’s database of satellite photos of the Earth could be studied & manipulated to identify sources of pollutants.
ad