1 / 31

Chapter 1

Chapter 1. For IS420/IS520. Define “database”. A database is a self-describing collection of integrated tables that is computerized Self describing Collection of related tables/records Computerized

jason
Download Presentation

Chapter 1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 1 For IS420/IS520

  2. Define “database” • A database is a self-describing collection of integrated tables that is computerized • Self describing • Collection of related tables/records • Computerized • In the real world, any database can only be accessed by the Database Management System (DBMS) that created the database

  3. Why databases • Share data • Data/information needs to be shared to run a business • We can run a company without its CEO, but not without data • Reduce redundancy • Reduce inconsistent data • Support transaction • Support data integrity • Enforce security • Support and enforce for standards • Meet complicated requirements

  4. DBMS • It is a piece of system software • At enterprise level, it is always a piece of server level software • It is OS dependent • The major vendors • Oracle • MySQL • Microsoft • IBM • Sybase • RDS

  5. Services of an enterprise DBMS • Provides basic services: • Moving data to/from physical files using OS • Managing concurrent access, NOT NECESSARILY by multiple users • Managing transactions so that each is an “all or nothing” unit of work • Support SQL • Provisions for backup and recovery • Security to prevent unauthorized access and modification

  6. Layers of Data Abstraction • The DBMS presents users with distinct views of the data (views), while storingthe data only once, to support users and application with different access rights.

  7. From Andy Oppel’s website

  8. External Model • Represents user’s point of view • External level consists of many different external views • Each external view is created for different users or application and show different portion of the actual database • User may be unaware of other entities, attributes and relationships • External views are described in External schemas which are written in the Data Definition Language

  9. Conceptual Model • Data architect point of view • Middle level that contains entirety of the DB • Complete logical model • Entities, attributes, relationships, constraints on data, semantic information, security and integrity • Conceptual model supports external views • Closest to E-R diagram • Some people think this is DBA’s view, I disagree • For example, a split table is not a view for this level • Conceptual model is relatively constant • It is written in DDL, stored in system database and compiled by DBMS

  10. Physical Model • Describes how the data is stored in the DB, that is the physical implementation of the DB • Includes data structures and file organization • Works with Os • To lay out data and other DB objects on the storage devices • Build indexes, etc. • Simply, it is managed by OS but under the instruction of DBMS, which is why DBMS is always OS dependent

  11. Why Three Level Architecture • To achieve data independence • Logical data independence • Physical data independence • This is the main reason we move from file systems to databases

  12. Logical Data Independence • Refers to immunity of external schemas to changes in conceptual schema such as • adding and removing records, • adding tables, indexes, or views • adding columns • However, removing columns and changing access right may affect external schemas • Conceptual schema should not affect external schema or rewrites of application programs

  13. Physical Data Independence • Refers to immunity of conceptual schema to changes in the physical file structure such as different organizations and storage devices • For example, if DBA moved database files from one disk to another should not trigger changes to conceptual or external schemas.

  14. DBMS models • Flat Files • Hierarchical Model • Network Model • ****Relational Model • Object-Oriented Model • Object-Relational Model

  15. Relational Data Model • Primary unit of storage is the table • Each table may be used independently or joins may be used to combine tables • Relationships defined using primary and foreign keys (This is referred as INTEGRATED in our db definition) • Easily defined integrity constraints

  16. Relational Model -- Benefits • Benefits: • Very fast retrieval in most cases • Users need no awareness of physical storage • Complex queries relatively easily developed • Data usually more accurate • Easier application programming, relatively speaking • Standard query language (SQL) • Supported by relational algebra and relational calculus

  17. Relational Model -- Drawbacks • Drawbacks: • Tables must be joined to retrieve related data, a very expensive operation both in terms of CPU and memory • Users must understand relationships between Tables to pull information out of the databases • Users must learn SQL or rely on some query tool

  18. An Example of E-R model

  19. Terminology Comparison

  20. Why Relational • Most stable -- mature technology with well established ANSI and ISO standards • A few reputable vendors for DBMS • Easy to define, maintain and manipulate data with SQL • Data is well protected using defined integrity constraints

  21. History of DBMS • GUAM (Generalized Update Access Method from North American Aviation) provided the first hierarchical structure • In the mid-60’s, IBM joined NAA to develop GUAM into IMS (Information Management System), the first commercial hierarchical DBMS • IDS (Integrated Data Store) from General Electric as the first network DBMS

  22. History of DBMS (2) • IBM extended IMS to include some network capabilities, overcoming the “single parent” restriction • CODASYL (Conference on Data Systems Languages) and the DBTG (Database Task Group) published standards in 1969-1971

  23. History of DBMS (3) • Dr. E.F. (Ted) Codd delivers pioneering white paper "A Relational Model of Data for Large Shared Data Banks" in June, 1970. • CODASYL published specifications for a standard Network DBMS in 1971, which began 5 years of heated controversy between a group of Network DBMS advocates (including Charles Bachman) and a group of Relational advocates.

  24. CODASYL “Camp" Position • Relational Model too mathematical that programmers would not be able to understand and use it. • An efficient implementation of Relational could not be built. • Online transaction processing applications want to do a-record-at-a-time processing, not a set processing

  25. Relational “Camp” Position • Nothing as complicated as the DBTG proposal could possibly be the right way to do data management. • Set-oriented queries are too difficult to program using the DBTG language. • CODASYL has no formal underpinnings (read math).

  26. A true event • The debate came to head at the 1975 ACMSIGMOD conference in a debate where Codd and 2 others squared off against Bachman and 2 others. The audience was more confused at the end of the two talks and ensuing discussion than at the outset.

  27. CODASYL vs. Relational • By the late 1970's, interest in CODASYL began to decline. Michael Stonebraker believes this was because of 2 factors: • Easier to use relational languages such as QUEL and SQL were developed. • Prototype Relational RDBMS's were developed that proved that implementations could be done with reasonable efficiency.

  28. Prototype Relational RDBMSs • System R, developed by 15 IBM researchers in San Jose under the direction of Frank King from 1974 to 1978. • INGRES developed by a team of UC Berkeley students under the direction of Michael Stonebraker and Eugene Wong from 1973 to 1977. • Sybase was started with some INGRES people

  29. Early Commercial RDBMSs • System R was built commercially and became the basis for HP ALLBASE and IDMS/SQL. • A bit later, Larry Ellison started Oracle and independently implemented the external specifications for System R. • IBM, with some rewriting, developed System R into SQL/DS and DB2.

  30. History of DBMSs • In 1976, Peter Chen presented the Entity-Relationship model, bolstering modeling weaknesses in the relational model. Many other modeling techniques followed. • Nowadays, when we say database, we mean relational database • Most databases are there to support transactional applications (one record at a time to record, multi records to retrieve for displaying), such as Amazon • Other database application types are batch and decision support systems.

  31. Trends •  XML for document processing and exchange of information • Data mining and decision support systems • Unstructured queries • NoSQL (MongoDB) • DBMS for cloud computing such as RDS from AMAZON

More Related