Data and Databases - PowerPoint PPT Presentation

The Data Basics

  • Data

    • Facts concerning things such as people, objects, or events

  • Information

    • data that have been processed and presented in a form suitable for human interpretation

  • Database

    • a collection of interrelated, shared, and controlled data


Modern Database Systems

(Diagram, built up over four slides: Accounting, Finance, and Sales data, each used by its own Application Programs; then all three sets of application programs sharing one Integrated Database; finally, the application programs reaching the Integrated Database through a DBMS.)

Advantages of Modern Database Environments

  • Minimal data redundancy

  • Data consistency

  • Integration of data

  • Data sharing

  • Ease of application development

  • Security, privacy, and integrity controls

  • Data accessibility and responsiveness

  • Data independence

  • Reduced program maintenance


Components of the Modern Database Environment

(Diagram, built up over five slides, showing these components:)

  • Data Administrators, System Developers, and End-users

  • User Interface

  • Application Programs

  • CASE Tools

  • DBMS

  • Repository

  • Database


Distributed Databases


What is it?

  • Historically, the traditional database system was highly decentralized

  • Modern integrated databases brought back the concept of centralization

  • The distributed database concept is now pushing back toward decentralized data

  • Within the next 10 years, integrated databases may be an “antique curiosity”


Definitions

  • Database

    • a collection of interrelated, shared, and controlled data

  • Distributed database

    • logically interrelated collection of shared and controlled data distributed over a computer network

  • A Distributed DBMS (DDBMS)

    • software that manages a distributed database and makes that distribution transparent to the user


DDBMS

  • A single database is split into several fragments

    • each fragment is stored on a separate computer

    • each fragment is controlled by a separate DBMS

    • all the computers are part of a single network

  • Users run applications that access the data

    • applications that access only local data are local applications

    • applications that access data located at other sites are global applications

  • A DDBMS contains at least one global application
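The fragment/site split and the local vs. global distinction can be sketched in a few lines. This is a toy illustration only, assuming a hypothetical `customers` table horizontally fragmented by region across three sites; `Site` and `DDBMS` are made-up names, not a real product's API. Whether a query counts as local or global depends on which fragments it must touch, and the caller never has to know where the fragments live.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Site:
    """One computer on the network, holding one fragment under its own (toy) DBMS."""
    name: str
    rows: list = field(default_factory=list)


class DDBMS:
    """Toy distributed DBMS: callers query one logical table, never the fragments."""

    def __init__(self, sites: dict):
        self.sites = sites  # fragment key (region) -> Site storing that fragment

    def insert(self, row: dict) -> None:
        # Horizontal fragmentation: each row is stored at the site for its region.
        self.sites[row["region"]].rows.append(row)

    def query(self, from_site: str, predicate: Callable[[dict], bool],
              regions: Optional[Iterable[str]] = None) -> tuple:
        """Return (matching rows, is_global) for a query issued at `from_site`."""
        # Without a region restriction, every fragment has to be scanned.
        needed = list(self.sites) if regions is None else list(regions)
        rows = [r for reg in needed for r in self.sites[reg].rows if predicate(r)]
        # Local application: only the issuing site's own fragment is accessed.
        is_global = any(self.sites[reg].name != from_site for reg in needed)
        return rows, is_global


sites = {"east": Site("site1"), "west": Site("site2"), "north": Site("site3")}
db = DDBMS(sites)
for row in [{"id": 1, "region": "east", "name": "Ada"},
            {"id": 2, "region": "west", "name": "Bo"},
            {"id": 3, "region": "north", "name": "Cy"}]:
    db.insert(row)

# Issued at site1 and restricted to its own fragment: a local application.
local_rows, local_is_global = db.query("site1", lambda r: True, regions=["east"])
# Issued at site1 but needing every fragment: a global application.
all_rows, all_is_global = db.query("site1", lambda r: r["id"] > 1)
```

A real DDBMS would also decide where to execute each part of a global query and ship results across the network; here that is reduced to a list comprehension.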


A Sample Topology

(Diagram: Database 1, Database 2, and Database 3 linked by a Network.)


Homogeneous vs. Heterogeneous

  • Homogeneous (all sites use the same DBMS product)

    • easier to design

    • provides for incremental growth (adding new sites is easy)

  • Heterogeneous (sites may run different DBMS products)

    • usually arises when existing databases are integrated after the fact

    • translations are required to communicate between the different DBMSs

    • relational DBMS sites typically use a gateway to convert the other DBMSs' languages and data models
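A gateway can be pictured as a translator sitting in front of each foreign site. The sketch below assumes two hypothetical non-relational sites, one storing nested documents and one storing positional records; the gateway classes are invented for illustration and each converts its site's data model into relational rows (column → value), so the relational site sees one uniform table.

```python
from typing import Iterable, Protocol


class Gateway(Protocol):
    def rows(self) -> Iterable[dict]: ...


class DocumentSiteGateway:
    """Hypothetical site storing nested documents such as {"cust": {"id": .., "name": ..}}."""

    def __init__(self, docs: list):
        self.docs = docs

    def rows(self) -> Iterable[dict]:
        for doc in self.docs:
            # Flatten the nested document into relational columns.
            yield {"id": doc["cust"]["id"], "name": doc["cust"]["name"]}


class LegacySiteGateway:
    """Hypothetical site storing positional records: (id, name)."""

    def __init__(self, records: list):
        self.records = records

    def rows(self) -> Iterable[dict]:
        for rec_id, name in self.records:
            # Map tuple positions onto named columns.
            yield {"id": rec_id, "name": name}


def global_select(gateways: list) -> list:
    """The relational site sees one table, whatever DBMS sits behind each gateway."""
    return sorted((row for gw in gateways for row in gw.rows()), key=lambda r: r["id"])


combined = global_select([
    DocumentSiteGateway([{"cust": {"id": 2, "name": "Bo"}}]),
    LegacySiteGateway([(1, "Ada")]),
])
```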


Data Representation

  • Binary digit (bit)

  • String of bits (byte, usually 8 bits)

  • EBCDIC vs. ASCII

  • Picture Element (Pixel)
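The EBCDIC vs. ASCII difference is easy to see by encoding the same characters both ways. This sketch uses Python's built-in `cp037` codec, which is one of several EBCDIC code pages (the slide does not say which variant it means), and shows each byte as its string of 8 bits.

```python
def bits(data: bytes) -> list:
    """Show each byte as a string of 8 binary digits (bits)."""
    return [format(b, "08b") for b in data]


text = "A1"
ascii_bytes = text.encode("ascii")   # ASCII: 'A' = 0x41, '1' = 0x31
ebcdic_bytes = text.encode("cp037")  # EBCDIC (code page 037): 'A' = 0xC1, '1' = 0xF1

print("ASCII: ", ascii_bytes.hex(), bits(ascii_bytes))
print("EBCDIC:", ebcdic_bytes.hex(), bits(ebcdic_bytes))
```

The same text therefore needs translation when it moves between an EBCDIC mainframe and an ASCII system.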


How much does your data weigh?

  • If an 8 GB hard disk weighs approximately one pound (including its share of the enclosure, power supply, and electronics)…

  • …8 TB would theoretically weigh one ton!

  • Some companies

    • Aetna: 21.8 tons (174.6 TB across 4100 DASD)

    • Boeing: 50-150 TB (6 – 19 tons)

    • Atos Origin: 37.5 tons (300 TB)

      • Source: Computerworld, April 23, 2001
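The company figures above all use the slide's ratio of 8 TB per ton, so a one-line conversion reproduces them (the capacities are the slide's; rounding matches the slide's one decimal place or whole tons).

```python
TB_PER_TON = 8  # slide's rule of thumb: 8 TB of disk storage weighs about one ton


def storage_weight_tons(terabytes: float) -> float:
    """Approximate weight of disk storage, in tons, for a capacity given in TB."""
    return terabytes / TB_PER_TON


for company, tb in [("Aetna", 174.6), ("Atos Origin", 300),
                    ("Boeing (low)", 50), ("Boeing (high)", 150)]:
    print(f"{company}: {storage_weight_tons(tb):.1f} tons")
```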


Data Storage

  • In the Web era, data is piling up quickly and storage space is at a premium

  • Storage solutions

    • Server-hosted storage

    • SCSI Arrays

    • Network Attached Storage (NAS)

    • Storage Area Networks (SAN)


Server-hosted storage

  • Both applications and storage on same server

  • Advantages

    • Server, OS, and storage all from the same vendor

    • Easy to replicate

  • Disadvantages

    • Expansion limited by server architecture (may need to replace existing media)

    • Free space on one server not easily accessed by another server

    • Maintenance affects server and storage (CPUs become obsolete before storage)


SCSI Arrays (Small Computer System Interface, “scuzzy”)

  • In an InfoWorld survey, 67% of respondents were using SCSI arrays for storage

  • Often used with RAID

  • Advantages

    • Embedded computer manages configuration and monitors performance

    • Can be made fault-tolerant

    • SCSI cable offers good throughput

  • Disadvantages

    • Expansion is difficult once the space is used

    • Significant cabling layout costs (SCSI cable length is limited)
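The slide does not say which RAID level these arrays use. As one illustration of how an array can be made fault-tolerant, this sketch shows XOR parity of the kind used in RAID 5: the parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. The block contents are made-up sample bytes, and real RAID 5 also rotates parity across disks, which is omitted here.

```python
from functools import reduce


def xor_blocks(blocks: list) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks striped across three disks (sample data).
data = [b"SCSI", b"RAID", b"DISK"]
parity = xor_blocks(data)  # stored on a fourth disk

# Simulate losing disk 1: rebuild its block from the surviving blocks plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```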


Network Attached Storage (NAS)

  • Devices that can be plugged into LAN using standard network cables

  • Advantages

    • Easiest and cheapest

    • Pre-configured with OS tailored for data handling

    • Capacity ranges from a few GB to several TB

    • Easy to connect

    • Faulty components can be changed without downtime

  • Disadvantages

    • Adds burden to LAN traffic

    • Access speed limited by bandwidth

    • Each NAS device has to be managed independently


Storage Area Networks (SAN)

  • Dedicated network of servers and storage devices

  • Uses hubs and switches

  • No limit to the number of storage servers

  • Uses fiber (Fibre Channel): can span long distances with good bandwidth

  • Easy to set up, but needs special adapters

  • Works with any OS

  • Easy migration from old systems


SAN – why so few?

  • In the InfoWorld survey, only 14% had SAN

  • Problems cited

    • Lack of internal knowledge

    • High cost (can be several million dollars depending on size)

    • Perception that it is only for large companies

  • Computerworld (Jan 28, 2002 issue) projected that 70% of corporate data would be on SANs by 2005

  • Possible solution

    • Storage Service Provider (SSP)

    • Manage data for you


The future of storage

  • Fibre Channel

    • network technology designed for storage and server clustering

  • iSCSI

    • SCSI commands encapsulated in IP packets for transmission over Ethernet networks

  • Fibre Channel over IP (FCIP)

    • Tunnels data between geographically dispersed SANs over IP networks

  • Internet Fibre Channel Protocol (iFCP)

    • Hybrid version of FCIP that sends FC data over IP networks using iSCSI protocols (to interconnect existing SANs)

  • InfiniBand

    • An I/O technology designed to overcome the limitations of traditional PCI buses
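To picture what “SCSI commands encapsulated in IP packets” means for iSCSI, this sketch builds a real 10-byte SCSI READ(10) command descriptor block (CDB) and wraps it in a simplified, hypothetical header before it would be handed to TCP/IP. Real iSCSI (RFC 7143) uses a 48-byte Basic Header Segment with many more fields; the `encapsulate`/`decapsulate` helpers here are illustrative only.

```python
import struct

READ_10 = 0x28  # SCSI READ(10) operation code


def read10_cdb(lba: int, blocks: int) -> bytes:
    """Build a READ(10) CDB: opcode, flags, 4-byte LBA, group, 2-byte length, control."""
    return struct.pack(">BBIBHB", READ_10, 0, lba, 0, blocks, 0)


def encapsulate(cdb: bytes, task_tag: int) -> bytes:
    """Hypothetical simplified header (task tag + CDB length) placed in front of the CDB.

    Not the real iSCSI PDU layout; it only shows the layering idea.
    """
    return struct.pack(">IH", task_tag, len(cdb)) + cdb


def decapsulate(payload: bytes) -> tuple:
    """Reverse of encapsulate(): recover the task tag and the SCSI command."""
    task_tag, length = struct.unpack(">IH", payload[:6])
    return task_tag, payload[6:6 + length]


cdb = read10_cdb(lba=2048, blocks=8)
payload = encapsulate(cdb, task_tag=1)  # would travel over TCP (iSCSI's default port is 3260)
tag, recovered = decapsulate(payload)
```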


Storage Virtualization

  • Software that links different storage devices into one virtual pool

    • Can link NAS, SAN, and DASD

  • Helps with storage management by introducing a new layer of abstraction

  • Excellent for creating a sense of homogeneity across dissimilar devices

  • An example is SANsymphony by DataCore Software
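A minimal sketch of the virtual-pool idea, assuming the simplest possible layout (plain concatenation): the pool presents one logical block address space and translates each logical block into a device and an offset on it. The device names and sizes are hypothetical, and real virtualization software adds much more than this mapping.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str    # e.g. a NAS share, SAN LUN, or DASD volume (hypothetical names below)
    blocks: int  # capacity in blocks


class StoragePool:
    """Concatenates several devices into one virtual block address space."""

    def __init__(self, devices: list):
        self.devices = devices

    @property
    def capacity(self) -> int:
        return sum(d.blocks for d in self.devices)

    def locate(self, logical_block: int) -> tuple:
        """Translate a pool-wide block number into (device name, block on that device)."""
        if not 0 <= logical_block < self.capacity:
            raise IndexError("block outside the virtual pool")
        offset = logical_block
        for device in self.devices:
            if offset < device.blocks:
                return device.name, offset
            offset -= device.blocks
        raise AssertionError("unreachable: capacity check above covers every block")


pool = StoragePool([Device("nas0", 100), Device("san-lun3", 500), Device("dasd7", 50)])
print(pool.capacity, pool.locate(0), pool.locate(150), pool.locate(620))
```

Applications address only the pool, so devices can be added or swapped behind it without changing how the storage is addressed.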


Data Warehouses and Data Mining


Data Requirements

  • Organizations need access to

    • operational data

    • historical data

    • legacy data

    • subscription databases

    • internet data

  • Organizations need to

    • combine data, slice and dice, do complex analysis...
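“Slice and dice” means viewing the same data along different dimensions. This minimal sketch uses a handful of invented sales facts and only the standard library: slicing fixes one dimension (region = East), dicing selects a sub-cube on several dimensions (East in Q1), and totals are then computed along a remaining dimension (product or quarter).

```python
from collections import defaultdict

# Hypothetical sales facts: (region, product, quarter, amount)
facts = [
    ("East", "Laptop", "Q1", 120),
    ("East", "Phone", "Q1", 80),
    ("West", "Laptop", "Q1", 95),
    ("East", "Laptop", "Q2", 130),
    ("West", "Phone", "Q2", 60),
]


def total_by(rows, key_index):
    """Sum the amount for each value of the chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[3]
    return dict(totals)


# Slice: fix a single dimension value.
east = [r for r in facts if r[0] == "East"]
# Dice: restrict several dimensions at once.
east_q1 = [r for r in facts if r[0] == "East" and r[2] == "Q1"]

east_by_product = total_by(east, 1)      # East sales per product
east_q1_by_product = total_by(east_q1, 1)
all_by_quarter = total_by(facts, 2)      # all sales per quarter
```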


Analytical Processing Requirements

  • Database systems need to support at least four levels of analysis within the firm

    • simple queries

    • “what if” analysis

    • causation

    • prognostication


The levels...

  • Simple queries

    • using historical and current data

    • typically done with spreadsheets or SQL

  • “What-if”

    • if labor costs increase by 5% next year, and sales are stagnant, what will happen to profits?

    • spreadsheets and database tools
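Both levels can be sketched with Python's built-in `sqlite3` module: a simple SQL query over historical and current data, then the slide's what-if question (labor costs up 5% next year, sales stagnant) applied to the latest year. The `financials` table, its columns, and its figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE financials (year INTEGER, sales REAL, labor REAL, other REAL)")
conn.executemany(
    "INSERT INTO financials VALUES (?, ?, ?, ?)",
    [(2022, 950.0, 380.0, 290.0), (2023, 1000.0, 400.0, 300.0)],
)

# Simple query: profit per year from historical and current data.
profits = conn.execute(
    "SELECT year, sales - labor - other AS profit FROM financials ORDER BY year"
).fetchall()


def what_if(sales: float, labor: float, other: float, labor_growth: float) -> float:
    """Next year's profit if labor costs grow by labor_growth and sales stay flat."""
    return sales - labor * (1 + labor_growth) - other


sales, labor, other = conn.execute(
    "SELECT sales, labor, other FROM financials WHERE year = 2023"
).fetchone()
next_year_profit = what_if(sales, labor, other, labor_growth=0.05)
print(profits, next_year_profit)
```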


Levels...

  • Causation

    • step back and analyze the past to see what caused the current state of events

    • why did cough syrup sales increase in the Northeast in January when they stayed constant elsewhere? Influenza? A competitor going out of business?

  • Prognostication

    • what current conditions must change to increase profits by 5% next year?
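Prognostication works backward from a goal. This minimal sketch assumes a simple profit model (profit = sales − labor − other costs) with invented figures and asks the slide's question two ways: how much must sales grow if costs stay fixed, or how much must labor costs fall if sales stay fixed, to raise profit by 5%?

```python
def required_changes(sales: float, labor: float, other: float, profit_growth: float) -> tuple:
    """Return (sales increase %, labor cut %); either change alone meets the profit target."""
    profit = sales - labor - other
    extra_profit = profit * profit_growth
    # Option 1: hold costs fixed, so sales must rise by the extra profit.
    sales_increase_pct = 100 * extra_profit / sales
    # Option 2: hold sales fixed, so labor costs must fall by the extra profit.
    labor_cut_pct = 100 * extra_profit / labor
    return sales_increase_pct, labor_cut_pct


sales_pct, labor_pct = required_changes(sales=1000.0, labor=400.0, other=300.0, profit_growth=0.05)
print(f"Either grow sales {sales_pct:.1f}% or cut labor costs {labor_pct:.2f}%")
```

Real prognostication would combine many such levers and forecast them, but the backward reasoning from a target is the same.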


Data Warehouses

  • Aimed at supporting all levels of analysis and information formats

  • Decision support systems (DSSs) have existed for many years

  • Labeled “data warehouses” in the 1990s, and top executives began to take notice

  • Many different definitions (some relating to data, others to people or processes)


Simple Definition

A data warehouse is a collection of integrated, subject-oriented databases designed to support the decision support function, where each unit of data is relevant to some moment in time.


Four Defining Concepts

  • Subject-oriented

  • Integrated

  • Time-variant

  • Non-volatile
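Two of these concepts, time-variant and non-volatile, can be sketched together: the warehouse only ever appends time-stamped snapshots and never updates or deletes in place, so each unit of data is tied to a moment in time and past states remain queryable. The `Warehouse` class and the account data are hypothetical.

```python
from datetime import date
from typing import Optional


class Warehouse:
    """Append-only store of time-stamped snapshots (non-volatile, time-variant)."""

    def __init__(self):
        self._rows = []  # (as_of date, key, value); never modified after loading

    def load(self, as_of: date, key: str, value: float) -> None:
        # Loading adds a new snapshot; existing rows are never changed or deleted.
        self._rows.append((as_of, key, value))

    def value_as_of(self, key: str, when: date) -> Optional[float]:
        """Latest value recorded for key on or before the given date."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        return max(candidates, key=lambda c: c[0])[1] if candidates else None


wh = Warehouse()
wh.load(date(2024, 1, 31), "acct-42", 500.0)
wh.load(date(2024, 2, 29), "acct-42", 650.0)  # new snapshot; the January row stays
print(wh.value_as_of("acct-42", date(2024, 2, 15)), wh.value_as_of("acct-42", date(2024, 3, 1)))
```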


Concepts...

  • Subject-oriented

    • requires database design

    • revolves around specific business entities

    • many companies simply pull together old files

  • Integrated data

    • data warehouse database designed using a proper methodology

    • consistency in naming conventions for keys, relationships etc.

    • warehouses require large design effort


Data Mining

True genius resides in the capacity for evaluation of uncertain, hazardous, and often conflicting information

- Sir Winston Churchill


What is data mining?

  • Large databases can be searched for relationships, patterns, and trends that, before the search, were not known to exist.

  • Data mining is the process of asking a processing engine to show answers to questions that we do not know how to ask.


Data Mining techniques

  • Four major types of processing algorithms (or rules):

    • associations

    • clustering

    • classification

    • sequential patterns


Associations (Link Analysis)

  • Find correlations between one set of items or events and another such set

  • e.g.: 78% of all people who buy a desktop PC will also buy add-ons

  • e.g.: a large percentage of buyers will buy potato chips if they are stacked near the beverage aisle
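Association rules are commonly scored by support (the fraction of transactions containing all the items) and confidence (of the transactions containing the antecedent, the fraction that also contain the consequent); a statement like “78% of desktop buyers also buy add-ons” is a confidence. A minimal sketch over a few invented market baskets, using a printer as the add-on:

```python
def support(baskets: list, items: set) -> float:
    """Fraction of all baskets that contain every item in `items`."""
    return sum(items <= b for b in baskets) / len(baskets)


def confidence(baskets: list, antecedent: set, consequent: set) -> float:
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(baskets, antecedent | consequent) / support(baskets, antecedent)


baskets = [
    {"desktop", "printer"},
    {"desktop", "monitor", "printer"},
    {"desktop"},
    {"chips", "soda"},
    {"desktop", "speakers"},
]
rule_support = support(baskets, {"desktop", "printer"})
rule_conf = confidence(baskets, {"desktop"}, {"printer"})  # rule: desktop -> printer
print(f"support={rule_support:.2f} confidence={rule_conf:.2f}")
```

Mining algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds, rather than scoring one rule given in advance.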


Clustering

  • Used to discover hitherto unknown or unsuspected classes of data

  • Applications include defect analysis and group affinity analysis

  • e.g.: finding a common characteristic shared by good customers who cancel their own credit cards
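Clustering groups records without predefined labels. The slide does not name an algorithm; as one common technique, here is a minimal k-means sketch on invented two-feature customer data (say, years as a customer and monthly spend). Fixed starting centroids keep the example reproducible.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster's mean; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


customers = [(1, 20), (2, 25), (1.5, 22), (8, 90), (9, 95), (8.5, 100)]
centers, groups = kmeans(customers, centroids=[customers[0], customers[3]])
print(centers)
```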


Classification

  • Discovers the rules that determine whether an item belongs to a particular subset of data (a subtype)

  • e.g.: credit card approval

    • do a customer's characteristics place him or her in the subset of customers who are approved to charge?
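Classification learns rules from records whose class is already known and then applies them to new records. As a deliberately tiny stand-in for real classifiers such as decision trees, this sketch learns a single-threshold rule (a “decision stump”) for credit card approval from invented past applicants, using income as the only characteristic.

```python
def learn_stump(records, feature, label):
    """Find the threshold on `feature` that best separates approved from denied records.

    Learned rule: approve if record[feature] >= threshold.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({r[feature] for r in records}):
        correct = sum((r[feature] >= threshold) == r[label] for r in records)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


history = [
    {"income": 18_000, "approved": False},
    {"income": 25_000, "approved": False},
    {"income": 42_000, "approved": True},
    {"income": 55_000, "approved": True},
    {"income": 61_000, "approved": True},
]
threshold = learn_stump(history, "income", "approved")


def classify(applicant: dict) -> bool:
    """Apply the learned rule to a new applicant."""
    return applicant["income"] >= threshold
```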


Sequential Patterns

  • Mostly used for pattern analysis

  • Uses the historical store of all transactions in a warehouse

  • e.g.: Buyers who purchase window coverings and then buy linens within three months will purchase furniture within the next 12 months (new residence furnishings buying pattern)
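A minimal sketch of checking the example pattern against a transaction history: for each customer, look for window coverings, then linens within three months, then furniture within the following 12 months. Treating three months as 90 days and 12 months as 365 days, and the transaction data itself, are assumptions for illustration.

```python
from datetime import date, timedelta

LINENS_WINDOW = timedelta(days=90)      # "within three months" (assumed 90 days)
FURNITURE_WINDOW = timedelta(days=365)  # "within the next 12 months" (assumed 365 days)


def follows_pattern(purchases: list) -> bool:
    """True if window coverings -> linens (within 90 days) -> furniture (within 365 days) occurs in order."""
    purchases = sorted(purchases)
    for d1, item1 in purchases:
        if item1 != "window coverings":
            continue
        for d2, item2 in purchases:
            if item2 != "linens" or not d1 <= d2 <= d1 + LINENS_WINDOW:
                continue
            for d3, item3 in purchases:
                if item3 == "furniture" and d2 <= d3 <= d2 + FURNITURE_WINDOW:
                    return True
    return False


history = {
    "cust-1": [(date(2023, 1, 10), "window coverings"), (date(2023, 2, 20), "linens"),
               (date(2023, 9, 5), "furniture")],
    "cust-2": [(date(2023, 1, 10), "window coverings"), (date(2023, 8, 1), "linens"),
               (date(2023, 9, 5), "furniture")],
}
matches = {cust: follows_pattern(p) for cust, p in history.items()}
print(matches)
```

Across a whole warehouse, the fraction of customers matching the pattern gives its support, which is how a mining run would decide whether the pattern is worth reporting.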
