C20 0046 database management systems lecture 1
Download
1 / 39

- PowerPoint PPT Presentation


  • 188 Views
  • Uploaded on

C20.0046: Database Management Systems Lecture #1. M.P. Johnson Stern School of Business, NYU Spring, 2008. What Is a Database?. A large, integrated collection of data which models a real-world enterprise: Entities students, courses, instructors, TAs Relationships

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - mikko


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
C20 0046 database management systems lecture 1 l.jpg

C20.0046: Database Management SystemsLecture #1

M.P. Johnson

Stern School of Business, NYU

Spring, 2008

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


What is a database l.jpg
What Is a Database?

  • A large, integrated collection of data

  • which models a real-world enterprise:

    • Entities

      • students, courses, instructors, TAs

    • Relationships

      • Hillary is currently taking C20.0046

      • Barack is currently teaching C20.0046

      • John is currently TA-ing C20.0046 but took it last semester

  • A Database Management System (DBMS)is a software package that stores and manages DBs

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Databases are everywhere non web l.jpg
Databases are everywhere: non-web

  • criminal/terrorist: TIA

  • NYPD’s CompStat

    • Tracking crime stats by precinct

  • airline bookings

  • Retailers: Wal-Mart, etc.

    • when to re-order, purchase patterns, data-mining

  • Genomics

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Databases are everywhere web l.jpg
Databases are everywhere: web

  • retail: Amazon, etc.

    • data-mining: Page You Made

  • search engines

  • searchable DBs: IMDB, tvguide.com, etc.

  • Web2.0 sites:

    • flickr = images + tags

  • CMS systems (Wikis, blog & forum software, etc.)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Databases involved in ordering a pizza l.jpg
Databases involved in ordering a pizza?

  • Pizza Hut’s DB

  • Credit card records

  • CC  approval by credit agencies

  • phone company’s records

    • (“Pull his LUDs, Lenny.”)

  • Caller ID

    • Error-checking, anticrime

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Your wallet is full of db records l.jpg
Your wallet is full of DB records…

  • Driver’s license

  • Credit cards

  • NYUCard

  • Medical insurance card

  • Social security card

  • Money (serial numbers)

  • Photos (ids on back)

  • Etc…

“You may not be interested in databases, but databases are interested in you.” - Trotsky

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Example of a traditional db app l.jpg
Example of a Traditional DB App

  • Suppose we build a system

  • We store:

    • checking accounts

    • savings accounts

    • account holders

    • state of each of each person’s accounts

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Can we do without a dbms l.jpg
Can we do without a DBMS?

Sure! Start by storing the data in files:

checking.txt savings.txt customers.txt

Now write C or Java programs to implement specific tasks…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Doing it without a dbms l.jpg
Doing it without a DBMS...

  • Transfer $100 from George’s savings to checking:

Write a C program to do the following:

  • Read savings.txt

  • Find&update the line w/“George”

  • balance -= 100

  • Write savings.txt

  • Read checking.txt

  • Find&update the line w/“George”

  • balance += 100

  • Write checking.txt

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Problems without an dbms l.jpg
Problems without an DBMS...

1. System crashes:

  • Same problem even if reordered

  • High-volume  (Rare  frequent)

    2. Simultaneous access by many users

  • George and Dick visit ATMs at same time

  • Lock checking.txt before each use–what is the problem?

  • Read savings.txt

  • Find&update the line w/ “George.”

  • Write savings.txt

  • Read checking.txt

  • Find&update the line w/ “George”

  • Write checking.txt

CRASH!

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Problems without a dbms l.jpg
Problems without a DBMS...

3. Large data sets (100s of GBs, or TBs, …)

  • No indices

    • Finding “George” in huge flatfile is expensive

  • Modifications intractable without better data structures

    • “George”  “Georgie” is very expensive

    • Deletions are very expensive

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Problems without an dbms12 l.jpg
Problems without an DBMS...

5. Security?

  • File system may lack security features

  • File system security may be coarse

    6. Application programming interface (API)?

  • Interfaces, interoperability

    7. How to query the data?

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


In homebrew system must support l.jpg
In homebrew system, must support

  • failover/rovery

  • concurrent use

  • deal with large datasets?

  • security

  • interop?

  • querying in what?

  •  DBMS as application

  • Q: How does a DBMS solve these problems?

  • A: See third part of course, but for now…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


One big issue transaction processing l.jpg
One big issue: Transaction processing

  • Grouping of several queries (or other DB operations) into one transaction

  • ACID test properties

    • Atomicity

      • all or nothing

    • Consistency

      • constraints on relationships

    • Isolation

      • concurrency control

      • simulated solipsism

    • Durability

      • Crash recovery

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Atomicity durability l.jpg
Atomicity & Durability

  • Avoiding inconsistent state

  • A DBMS prevents this outcome

    • xacts are all or nothing

  • One simple idea: log progress of and plans for each xact

    • Durability: changes stay made (with log…)

    • Atomicity: entire xact is committed at once

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Isolation l.jpg
Isolation

  • Many users  concurrent execution

    • Disk access is slow (compared to CPU)

    •  don’t waste CPU – keep running

  • Interweaving actions of different user programs

  •  but can lead to inconsistency:

    • e.g., two programs simultaneously withdraw from the same account

  • For each user, should look like a single-user system

    • Simulated solipsism

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Isolation17 l.jpg
Isolation

  • Contrast with a file in two Notepads

    • Strategy: ignore multiple users

    • whichever saves last wins

    • first save is overwritten

  • Contrast with a file in two Words

    • Strategy: blunt isolation

    • One can edit

    • To the other it’s read-only

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Consistency l.jpg
Consistency

  • Each xact (on a consistent DB) must leave it in a consistent state

    • can define integrity constraints

    • checks that the defined claims about the data

    • Only xacts obeying them are allowed

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


A level up data models l.jpg
A level up: data models

  • Any DBMS uses a data model: collection of concepts for describing data

  • Relational data model: basically universal

    • Oracle, DB2, SQLServer, other SQL DBMSs

      • Relations: table of rows & columns

    • a rel’s schema defines its fields

  • Though some have OO extensions…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Data schemas l.jpg
Data Schemas

  • Schema: description of partic set of data, using some data model

  • “Physical schema”

    • Physical files on disk

  • Schema

    • Set of relations/tables, with structure

  • Views (“external schema”)

    • Virtual tables generated for user types

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Schema e g college registrar l.jpg
Schema e.g.: college registrar

  • Schema:

    • Students(ssn: string, name: string, login: string, age: int, gpa: real)

    • Courses(cid: string, cname: string, credits: int)

    • Enrolled(sid:string, cid:string, grade: string)

  • Physical schema:

    • Relations stored as unordered text files.

    • Indices on first column of each rel

  • Views:

    • My_courses(cname: string, grade: string, credits: int)

    • Course_info(ssn: string, name: string, status: string)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


How the programmer sees the dbms l.jpg
How the programmer sees the DBMS

  • Start with SQL DDL to create tables:

  • Continue with SQL to populate tables:

CREATE TABLE Students (

Name CHAR(30)

SSN CHAR(9) PRIMARY KEY NOT NULL,

Category CHAR(20)

);

INSERT INTO Students

VALUES('Hillary', '123456789', 'undergraduate');

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


How the programmer sees the dbms23 l.jpg
How the programmer sees the DBMS

Takes:

Students:

  • Ultimately files, but complex

Courses:

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Querying s tructured q uery l anguage l.jpg
Querying: Structured Query Language

  • Find all the students who have taken C20.0046:

  • Find all the students who C20.0046 previously:

  • Find the students’ names:

SELECT SSN FROM Takes

WHERE CID='C20.0046';

SELECT SSN FROM Takes

WHERE CID='C20.0046' AND Semester='Fall, 2005';

SELECT Name FROM Students, Takes

WHERE Students.SSN=Takes.SSN AND

CID='C20.0046' AND Semester='Fall, 2005';

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Database industry l.jpg
Database Industry

  • Relational databases are based on set theory

  • Commercial DBMSs: Oracle, IBM’s DB2, Microsoft’s SQL Server, etc.

  • Opensource: MySQL, PostgreSQL, etc.

  • DBAs manage these

  • Programmers write apps (CRUD, etc.)

  • XML (“semi-structured data”) also important

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


The study of dbms l.jpg
The Study of DBMS

  • Primary aspects:

    • Data modeling

    • SQL

    • DB programming

    • DBMS implementation

  • This course covers all four (tho less of #4)

  • Also will look at some more advanced areas

    • XML, websearch, column-oriented DBs, RAID, RegExs, MapReduce…

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Course outline l.jpg
Course outline

  • Database design:

    • Entity/Relationship models

    • Modeling constraints

  • The relational model:

    • Relational algebra

    • Transforming E/R models to relational schemas

  • SQL

    • DDL & query language

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Course outline28 l.jpg
Course outline

  • Programming for databases

  • Some DB implementation

    • Indexes, sorting, xacts

  • Advanced topics…

  • May change as course progresses

    • partly in response to audience

  • Also “current events”

    • Slashdot/whatever, Database Blog, etc.

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Textbook database management systems l.jpg
Textbook: Database Management Systems

  • by Raghu Ramakrishnan & Johannes Gehrke

    • 3 edition (August 14, 2002)

  • Available:

    • NYU bookstore

    • Amazon/BN (may be cheaper)

    • Amazon.co.uk (may be cheaper still)

    • Links on class page

    • Difficult but good

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Sql readings l.jpg
SQL Readings

  • Many SQL references available online

  • Good online (free) SQL tutorials include:

    • A Gentle Introduction to SQL (http://sqlzoo.net/)

    • SQL for Web Nerds (http://philip.greenspun.com/sql/)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Communications l.jpg
Communications

  • M. P. Johnson

  • To receive class mail:

    • Activate account: http://start.stern.nyu.edu

    • Forward mail: http://simon.stern.nyu.edu

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Communications32 l.jpg
Communications

  • Web page: http://pages.stern.nyu.edu/~mjohnson/dbms/

    • syllabus

    • course policies

    • antecedent courses

  • Blackboard web site

    • Some materials will be available here

    • Discussion board

      • send general-interest messages here to benefit all

    • Go to http://sternclasses.nyu.edu

    • Click on C20.0046

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Grading l.jpg
Grading

  • Prerequisites:

    • Light programming experience

    • A bit of “mathematical maturity”

    • Interest in IT/CS

  • Requirements & base score:

    • Homework 15%: O(3)

    • Project: 30% - see below…

    • Midterm (closed book/notes): 20%

    • Final (closed book/notes; likely 2hrs in class): 25%

    • Class participation/pop-quizzes: 10%

    • Stern Curve

  • Consistent class attendance is required

    • Absences will seriously affect your total grade.

    • Final score = base score – 2n-1

    • where n = # missed quizzes (if n>0)

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


The project design end to end db web app l.jpg
The Project: design end-to-end DB web app

  • data model

    • Identify entities & their relationships

    •  relations

  • creation of DB in Oracle/MySQL

    • Population with real(alistic) data

  • web app for accessing/modifying data

    • Identification of “interesting” questions & actions

    • Produce DBMS interface

  • Work in pairs (/threes)

    • Choose topic on your own

    • Start forming your group today!

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


Collaboration model l.jpg
Collaboration model

  • Homework and exams done individually

  • Project done with your team members only, though can in general use any tools

  • Non-cited use of others’ problem solutions, code, etc. = plagiarism

  • See Stern’s stern academic honesty policy

  • Contact me if you’re at all unclear before a particular case

  • Cite any materials used if you’re at all unclear after

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


On going feedback l.jpg
On-going Feedback

  • Don’t be afraid to ask questions

    • Some parts will be abstract/mathematical

  • Topic selection will be partly based on student interest

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


So what is this course about really l.jpg
So what is this course about, really?

  • Languages: SQL (some XML …)

  • Data modeling

  • Some theory! (rereading…)

    • Functional dependencies, normal forms

    • e.g., how to find most efficient schema for data

  • Some DBMS implementation (algs & data structs)

  • Algorithms and data structures (in the latter part)

    • e.g., indices make data much faster to find – how?

  • Lots of DB implementation and hacking for the project

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


For next time l.jpg
For next time

  • Get the book

  • Skim chapter 1

  • Start reading chapter 2

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


For right now tonight email survey l.jpg
For right now/tonight: email survey

  • Send to SOMEWHERE

  • name

  • previous cs/is/math/logic courses

  • previous programming experience

  • career plans: programmer, DBA, MBA, etc.

  • why taking class/what you’re interested in learning about

M.P. Johnson, DBMS, Stern/NYU, Spring 2008


ad