lecture 1 introduction to databases l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Lecture 1: Introduction to databases PowerPoint Presentation
Download Presentation
Lecture 1: Introduction to databases

Loading in 2 Seconds...

play fullscreen
1 / 20

Lecture 1: Introduction to databases - PowerPoint PPT Presentation


  • 125 Views
  • Uploaded on

Lecture 1: Introduction to databases. Timothy G. Griffin Easter Term 2008 – IB/Dip/IIG www.cl.cam.ac.uk/Teaching/current/Databases/. Database Prehistory. Data entry . Storage and retrieval . Query processing . Sorting . Early Automation.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Lecture 1: Introduction to databases' - adamina


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
lecture 1 introduction to databases

Lecture 1:Introduction to databases

Timothy G. Griffin

Easter Term 2008 – IB/Dip/IIG

www.cl.cam.ac.uk/Teaching/current/Databases/

database prehistory
Database Prehistory

Data entry

Storage and retrieval

Query processing

Sorting

early automation
Early Automation
  • Data management and application code were all tangled together
    • Hard to modify
    • Hard to generalize
  • Many competing approaches
  • Data manipulation code written at very low levels of abstraction
our hero e f codd
Our Hero --- E. F. Codd

Edgar F. "Ted" Codd ( August 23, 1923 - April 18, 2003) was a British computer scientist who invented relational databases while working for IBM.

He was born in Portland, Dorset, studied maths

and chemistry at Oxford. He was a pilot in the

Royal Air Force during WWII. In 1948 he joined IBM

in New York as a mathematical programmer. He fled

the USA to Canada during the McCarthy period.

Later, he returned to the USA to earn a doctorate in CS

from the University of Michigan in Ann Arbor. He then

joined IBM research in San Jose.

His 1970 paper “A Relational Model of Data for

Large Shared Data Banks” changed everything.

In the mid 1990’s he coined the term OLAP.

database management systems dbmss
Database Management Systems (DBMSs)

Database abstractions

allow this interface to

be cleanly defined and

this allows applications

and data management

systems to be

implemented

separately.

Your Applications

Go Here

DBMS

Raw Resources (bare metal)

today database systems are ubiquitous
Today, Database Systems are Ubiquitous

Database system design from the European Bioinformatics Institute (Hinxton UK)

Other archives

Database

design

Development

DB

End Users

Data exchange

Service Tools

Production DB

Service DB

Submission tools

Submitters

Data Distrib.

Add value

(computation)

Releases

&

Updates

Releases

&

Updates

Q/C etc

Add value (review etc.)

what is a database system
What is a database system?
  • A database is a large, integrated collection of data
  • A database contains a model of something!
  • A database management system (DBMS) is a software system designed to store, manage and facilitate access to the database
what does a database system do
What does a database system do?
  • Manages Very Large Amounts of Data
  • Supports efficient access to Very Large Amounts of Data
  • Supports concurrent access to Very Large Amounts of Data
  • Supports secure, atomic access to Very Large Amounts of Data
databases are a rich area for computer science
Databases are a Rich Area for Computer Science
  • Programming languages and software engineering (obviously)
  • Data structures and algorithms (obviously)
  • Logic, discrete maths, computation theory
    • Some of today’s most beautiful theoretical results are in “finite model theory” --- an area derived directly from database theory
  • Systems problems: concurrency, operating systems, file organisation, networks, distributed systems…

Many of the concepts covered in this course are “classical” --- they form

the heart of the subject. But the field of databases is still evolving and

producing new and interesting research (hinted at in lectures 11 & 12).

what this course is about
What this course is about
  • According to Ullman, there are three aspects to studying databases:
    • Modelling and design of databases
    • Programming
    • DBMS implementation
  • This course addresses 1 and 2
course outline
Course Outline
  • Introduction
  • Entity-Relationship Model
  • The Relational Model
  • The Relational Algebra
  • The Relational Calculus
  • Schema refinement: Functional dependencies
  • Schema refinement: Normalisation
  • Transactions
  • Online Analytical Processing (OLAP)
  • More OLAP
  • Basic SQL and Integrity Constraints
  • Further relational algebra, further SQL
recommended reading
Recommended Reading
  • Date, “An introduction to database systems”, 8th ed.
  • Elmasri & Navathe, “Fundamentals of database systems”, 4th ed.
  • Silberschatz, Korth & Sudarshan, “Database system concepts”, 4th ed.
  • Ullman & Widom, “A first course in database systems”.
  • OLAP
    • DB2/400: Mastering Data Warehousing Functions. (IBM Redbook) Chapters 1 & 2 only. http://www.redbooks.ibm.com/abstracts/sg245184.html
    • Data Warehousing and OLAPHector Garcia-Molina (Stanford University)http://www.cs.uh.edu/~ceick/6340/dw-olap.ppt
    • Data Warehousing and OLAP Technology for Data Mining Department of ComputingLondon Metropolitan Universityhttp://learning.unl.ac.uk/csp002n/CSP002N_wk2.ppt
some systems to play with
Some systems to play with
  • mysql:
    • www.mysql.org
    • Open source, quite powerful
  • PostgreSQL:
    • www.postgresql.org
    • Open source, powerful
  • MicrosoftAccess:
    • Simple system, lots of nice GUI wrappers
  • Commercial systems:
    • Oracle 10g (www.oracle.com)
    • SQL Server 2000 (www.microsoft.com/sql)
    • DB2 (www.ibm.com/db2)
database system architecture
Database system architecture
  • It is common to describe databases in two ways
    • The logical level:
      • What users see, the program or query language interface, …
    • The physical level:
      • How files are organised, what indexing mechanisms are used, …
  • It is traditional to split the logical level into two: overall database design (conceptual) and the views that various users get to see
  • A schema is a description of a database
three level architecture

External

level

Conceptual

level

Physical

level

Three-level architecture

External

Schema 1

External

Schema 2

External

Schema n

Conceptual

Schema

Internal

Schema

logical and physical data independence
Logical and physical data independence
  • Data independence is the ability to change the schema at one level of the database system without changing the schema at the next higher level
  • Logical data independence is the capacity to change the conceptual schema without changing the user views
  • Physical data independence is the capacity to change the internal schema without having to change the conceptual schema or user views
database design process
Database design process
  • Requirements analysis
    • User needs; what must database do?
  • Conceptual design
    • High-level description; often using E/R model
  • Logical design
    • Translate E/R model into (typically) relational schema
  • Schema refinement
    • Check schema for redundancies and anomalies
  • Physical design/tuning
    • Consider typical workloads, and further optimise

Next Lecture

the fundamental tradeoff of database performance tuning
The Fundamental Tradeoff of Database Performance Tuning
  • De-normalized data can often result in faster query response
  • Normalized data leads to better transaction throughput, and avoids “update anomalies” (corruption of data integrity)
  • Yes, indexing data can speed up transactions, but this just proves
  • the point --- an index IS redundant data. General rule of thumb:
  • indexing will slow down transactions!

What is more important in your database --- query response

or transaction throughput? The answer will vary.

What do the extreme ends of the spectrum look like?

a theme of this course oltp vs olap
A Theme of this Course:OLTP vs. OLAP
  • OLTP = Online Transaction Processing
    • Need to support many concurrent transactions (updates and queries)
    • Normally associated with the “operational database” that supports day-to-day activities of an organization.
  • OLAP = Online Analytic Processing
    • Often based on data extracted from operational database, as well as other sources
    • Used in long-term analysis, business trends.
design heterogeneity
Design Heterogeneity

De-normalized Derived Tables

--- for fast access

Database system design from

the European

Bioinformatics Institute (Hinxton UK)

Other archives

Database

design

Normalized Tables

Development

DB

End Users

Data exchange

Service Tools

Production DB

Service DB

Submission tools

Submitters

Data Distrib.

Add value

(computation)

Releases

&

Updates

Releases

&

Updates

Q/C etc

Add value (review etc.)