1 / 37

CS511 Design of Database Management Systems

CS511 Design of Database Management Systems. Lecture 02: Codd’s Relational Model Kevin C. Chang. Announcements. Comments/questions in class: please ask for and use a microphone to speak up please tell us your name first. Contributions of this Paper?. Contributions of this Paper?.

dacian
Download Presentation

CS511 Design of Database Management Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS511Design of Database Management Systems Lecture 02: Codd’s Relational Model Kevin C. Chang

  2. Announcements • Comments/questions in class: • please ask for and use a microphone to speak up • please tell us your name first

  3. Contributions of this Paper?

  4. Contributions of this Paper? • Concepts of data independence and declarative queries • argues need for more natural data-centric modeling • argues declarative queries • Complete definition of relational model • data structure based on relations • algebra for manipulating data • Formal theoretic notions • expressive power, redundancy, and consistency • starting research on “database theories”

  5. Data Model • Mathematical formalism providing: • notation & structure for describing data • set of operations used to manipulate that data • A way of organizing a collection of facts pertaining to a system being managed • A way of thinking about the world and the phenomena • Fundamental “paradigm” of data management

  6. DBMS: Levels of Abstraction?? • Physical implementation • storage structures, access methods • Logical data model • conceptual data structure and manipulation • Views • different portions of the database • ?? Who should see each different level?

  7. The State of the World: COBOL Camp • Network/hierarchical DBMS, 1960’s • IDS network DBMS: Bachman at GE, 1961 • IMS hierarchical DBMS: IBM in 1968 – Still exists today! # • CODASYL approach to data management, 1960’s • CODASYL: Conf. Of Data System Languages, set up by US DOD, to standardize software applications • COBOL (comm. bus. oriented lang.) defined by CODASYL • ruled the business data processing world • incorporating prior data definition languages • DBTG (Database Task Group, under CODASYL), 1971 • closely aligned with COBOL • DBTG Report would standardize network model • Bachman (for network model) got Turing award in 1973

  8. The State of the World: Relational camp • Codd’s paper in 1970 • resistance even within IBM • First implementations, 1973 • System R at IBM San Jose Lab • INGRES at UC.Berkeley • The “Great Debate” in 1975 SIGMOD conf. • Codd got Turing award in 1981

  9. Arguments Against the Other Side? • COBOL/CODASYL  Relational • too mathematical (to understand) • Relational  COBOL/CODASYL • too complicated (to program)

  10. All About Paradigm Shift • Bachman: offline sequential access --> online navigation This revolution in thinking is changing the programmer from a stationery viewer of objects passing before him in core into mobile navigator who is able to probe and traverse a database at will. The Programmer as Navigator, 1973 Turing Lecture • Codd: navigation over records & links --> declaration over relations The most important motivation for this research work is to provide a sharp and clear boundary between the logical and physical aspects of data management. … We call this data independence objective. Relational Databases: A Practical Foundation for Productivity, 1981 Turing Award Lecture

  11. Network Model: DBTG Report • Network DB: • a collection of records • record = collection of fields • similar to an entity in E-R model • records connected by binary, many-to-one links • similar to binary relationships in E-R model • simulate one-to-one, many-to-many by many-to-one

  12. Network Model: Implementation # • Student record linked to enrollment record • A lot of pointer linkage • ring-structured ptrs implements many-one links • Data manipulation is thus navigational Johnson … CS001 A+ CS308 B

  13. DBTG Query Example select name from student where dept = “CS” student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; print (student.name); findduplicate student using dept; end

  14. DBTG Query Example: Predicates select name from student where dept = “CS” and grade >= “A-” student.dept = “CS”; findany student using dept; while DB-status = 0 do begin get student; if student.grade >= “A-” print (student.name); findduplicate student using dept; end

  15. DBTG Query Example: Navigation # select E.grade from student S, enrollment E where S.name = “Johnson” and E.id = S.id student.name = “Johnson”; findany student using name; find first enrollment within StudentEnroll while DB-status = 0 do begin get enrollment; print (enrollment.grade); findnext enrollment within StudentEnroll; end

  16. What’s Wrong?

  17. What’s Wrong? • Mix presentation and access in modeling • very much like HTML today :-) • Difficult and complex to program • Data dependence

  18. Data Dependence • Ordering dependence • stored ordering should be independent of access & presentation • Indexing dependence • indices should be redundant and for “optimization” • Access path dependence • access paths to data should not be hard-coded

  19. Relational Model Provide Independence? • Ordering dependence? • Indexing dependence? • Access path dependence?

  20. Relations • Definition • given sets S1, S2, …, Sn (not distinct) • relation R is a subset of the Cartesian product S1 x S2 x … x Sn • Sj is jth domain of R, n is degree of R • Properties • each row represents an n-tuple of R • ordering of rows is immaterial • all rows are distinct • ordering of columns is significant • label each column with the name of the corresponding domain

  21. Data Manipulation • Relational algebra • operations • Relation calculus • semantics in terms of logics • Essential Beauty of Relational Model: Named Relations Expressible Relations

  22. Operations on Relations • Usual set operations: • since relations are sets of homogeneous tuples

  23. Operations on Relations: Relation Deriving • Permutation • interchange the columns of an n-ary relation • Projection • select columns and remove any duplication in the rows • Join • selectively combining tuples in two relations • as a “class” of new relations that losslessly take some columns from either source relations-- why such a def suffices for here? • Composition • join two relations and remove join columns • Restriction • filter one relation with another

  24. Algebra: Questions • What’s missing in this set of operators? • Is it minimal? • How is it different from “current” algebra?

  25. The Algebra Primitives: NOW • Selection • Projection • Set union • Set difference • Cartesian product • Renaming

  26. Redundancy and Consistency • Redundancy • redundant if something can be “derived” from others • foundation: what operations allowed in “derivation” • Consistency • data snapshot must satisfy some constraints • Set the landscape for database theory research • e.g., normal forms; normalization

  27. What’s Good about the Relational Model? • Simplicity! • Mathematically complete data model • Declarative query languages • queries can be automatically compiled, executed, and optimized without resorting to programming

  28. Unexpected Benefits • Client-server architecture • SQL request/response enables high-level, compact exchange between clients and server • clients: input and output, application logics • server: data processing • Parallel processing: relations in and out • pipeline: piping the output of one op into the next • partition: N op-clones, each processes 1/N input • Graphical user interfaces • relations fits the spreadsheet (table) metaphor

  29. What Have Changed Over the Years? • Changed by implementations • First implementations (System R and INGRES) started 1973, 74 • Changed over time

  30. What Have Changed Over the Years? • Columns identified by position instead of names • ordering of columns was significant • names (as role.domain) were simply “user interface” • Rows were distinct-- set semantics of relation • SQL: “select distinct” to eliminate duplicates

  31. What Have Changed Over the Years? • Semantically defined non-algebraic join • Codd’s joins can be “plural” • joins are now defined “operationally”

  32. What Have Changed Over the Years? • Non-simple domains: i.e. complex objects • allowed only built-in data types • new: object-relational DB, multimedia DB • Generations of relations: temporal aspect • temporal databases • e.g.: query GPA at the end of year 2000

  33. Problems with the Relational Model • Too simple? • unable to handle beyond business data processing? • Data is often hierarchical/complex in nature • normalization is unnatural decomposition of data for storage, to be assembled by joins at query time

  34. Network/Hierarchical Model Coming Back? • Web is a huge navigational database! • XML is both navigational and hierarchical <student> <name>John Smith</name> <dept>CS</dept> <enrollments> <enrollment> <course>CS311</course> <grade>A+</grade> </enrollment> … … // more enrollments </enrollment> <student>

  35. Relational System Influence • Codd’s paper published in 1970 • First implementations, 1973-- • System R at IBM San Jose Lab, 1974-1978 • INGRES at UC.Berkeley, 1973-1977 • System R influence: • IBM DB2 • HP ALLBASE • Oracle: started from published spec. of System R • INGRES: • member later funded Sybase • evolved into Microsoft SQL server by buying code from Sybase

  36. What’s Next? • System R: • one of the first implementations of relational DBMS • IBM San Jose Research Labs

  37. End Of Talk

More Related