1 / 40

C20.0046: Database Management Systems Lecture #1

C20.0046: Database Management Systems Lecture #1. Matthew P. Johnson Stern School of Business, NYU Spring, 2004. Personnel. Instructor: Matthew P. Johnson KMEC 8-176, mjohnson@stern.nyu.edu Office hours: KMC 8-176, Th 12:15-2:15

jana
Download Presentation

C20.0046: Database Management Systems Lecture #1

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. C20.0046: Database Management SystemsLecture #1 Matthew P. Johnson Stern School of Business, NYU Spring, 2004 M.P. Johnson, DBMS, Stern/NYU, Sp2004

  2. Personnel • Instructor: Matthew P. Johnson • KMEC 8-176, mjohnson@stern.nyu.edu • Office hours: KMC 8-176, Th 12:15-2:15 • please visit! • Tutor/TF/grader: Matthew P. Johnson! M.P. Johnson, DBMS, Stern/NYU, Sp2004

  3. Communications • Web page: http://www.columbia.edu/~mpj9/dbms • syllabus • course policies • may move in the future… • Blackboard web site • Some materials will be available here • Discussion board • send general-interest messages here to benefit all! • Go to http://sternclasses.nyu.edu • Click on C20.0046 M.P. Johnson, DBMS, Stern/NYU, Sp2004

  4. Acknowledgements • Thanks to Ramesh, Ullman, et al., Raghu and Johannes, Dan Suciu, Arthur Keller, David Kuijt for course materials • See classpage for other related, antecedent DBMS courses M.P. Johnson, DBMS, Stern/NYU, Sp2004

  5. What Is a Database? • A very large, integrated collection of data. • Models real-world enterprise. • Entities (e.g., students, courses, instructors, TAs) • Relationships (e.g., Joe is taking C20.0046) • George is currently taking C20.0046 • Dick is currently teaching C20.0046 • Condi is currently TA-ing C20.0046 but took it last semester • A Database Management System (DBMS)is a software package designed to store and manage databases. M.P. Johnson, DBMS, Stern/NYU, Sp2004

  6. Databases are everywhere • Example: Ordering a pizza • Databases involved? • Pizza Hut’s DB • stores previous orders by customer • stores previous credit cards used • Credit card records • huge databases of (attempted) purchases • location, date, amount, parties • phone company’s records • Local Usage Details (“Pull his LUDs, Lenny.”) • Caller ID • ensures reported address matches destination M.P. Johnson, DBMS, Stern/NYU, Sp2004

  7. Your wallet is full of DB records • Driver’s license • Credit cards • NYUCard • Medical insurance card • Social security card • Gym membership • Money (serial numbers) • Maybe even photos M.P. Johnson, DBMS, Stern/NYU, Sp2004

  8. Databases are everywhere • Q: Websites backed by DBMSs? • retail: Amazon, etc. • data-mining: Page You Made • search engines: Google, etc. • directories: Internic, etc. • searchable DBs: IMDB, tvguide.com, etc. • Q: Non-web examples of DBMSs? • criminal/terrorist: TIA • airline bookings • NYPD’s CompStat • all serious crime stats by precinct • Retailers: Wal-Mart, etc. • when to re-order, purchase patterns, data-mining • Genomics! M.P. Johnson, DBMS, Stern/NYU, Sp2004

  9. Example of a Traditional DB App Suppose we are building a system to store the information about: • checking accounts • savings accounts • account holders • state of each of each person’s accounts M.P. Johnson, DBMS, Stern/NYU, Sp2004

  10. Can we do it without a DBMS? Sure we can! Start by storing the data in files: checking.txt savings.txt customers.txt Now write C or Java programs to implement specific tasks M.P. Johnson, DBMS, Stern/NYU, Sp2004

  11. Doing it without a DBMS... • Transfer $100 from George’s savings to checking: Write a C program to do the following: • Read ‘savings.txt’ • Find&update the record “George” • balance -= 100 • Write ‘savings.txt’ • Read ‘checking.txt’ • Find&update the record “George” • balance += 100 • Write ‘checking.txt’ M.P. Johnson, DBMS, Stern/NYU, Sp2004

  12. Problems without an DBMS... 1. System crashes: • Q: What is the problem ? • A: George lost his $100 • Same problem even if reordered 2. Simultaneous access by many users • George and Dick visit ATMs at same • Lock checking.txt before each use– what is the problem? • Read ‘savings.txt’ • Find&update the rec “George.” • Write ‘savings.txt’ • Read ‘checking.txt’ • Find&update the rec “George” • Write ‘checking.txt’ CRASH ! M.P. Johnson, DBMS, Stern/NYU, Sp2004

  13. Problems without an DBMS... 3. Large data sets (say 50GB) • Why is this a problem? • No indices • Finding “George” in huge flatfile is expensive • Modifications intractable without better data structures • “George”  “Georgie” is very expensive • Deletions are very expensive M.P. Johnson, DBMS, Stern/NYU, Sp2004

  14. Problems without an DBMS... 5. Security? • File system may be insecure • File system security may be coarse 6. Application programming interface (API)? • suppose need other apps to access DB 7. How to interact with other DBMSs? M.P. Johnson, DBMS, Stern/NYU, Sp2004

  15. General problems to solve • In building our own system, many Qs arise: • how do we store the data? (file organization, etc.) • how do we query the data? (write programs…) • make sure that updates don’t mess things up? • leave the DB “consistent” • provide different views on the data? • e.g., ATM user’s view v. bank teller’s view • how do we deal with crashes? • Too hard! Go buy a DBMS! • Q: How does a DBMS solve these problems? • A: See third part of course M.P. Johnson, DBMS, Stern/NYU, Sp2004

  16. Big issue: Transaction processing • Grouping of several queries (or other database actions) into one transaction • ACID properties • Atomicity • all or nothing • Consistency • constraints on relationships • Isolation • concurrency control • Simulated solipsim • Durability • Crash recovery M.P. Johnson, DBMS, Stern/NYU, Sp2004

  17. Atomicity & Durability • Saw how George lost $100 with makeshift DBMS • DBMS prevents this outcome • xacts are all or nothing • One idea: Keep a log (history) of all actions in set of xacts • Durability: Use log to redo or undo certain ops in crash recovery • Atomicity: don’t really commit changes until end • Then, all at once M.P. Johnson, DBMS, Stern/NYU, Sp2004

  18. Isolation • Concurrent execution is essential for performance. • Frequent, slow disk accesses •  don’t waste CPU – keep running • Interleaving actions of different user programs • can lead to inconsistency: • e.g., two programs simultaneously withdraw from the same account • DBMS ensures such problems don’t arise: • users can pretend they are using a single-user system. M.P. Johnson, DBMS, Stern/NYU, Sp2004

  19. Isolation • Contrast with a file in two Notepads • Strategy: ignore multiple users • whichever saves last wins • first save is overwritten • Contrast with a file in two Words • Strategy: blunt isolation • One can edit • To the other it’s read-only M.P. Johnson, DBMS, Stern/NYU, Sp2004

  20. Consistency • Each xant (on a consistent DB) must leave it in a consistent state • can define integrity constraints • checks the defined claims about the data remain true M.P. Johnson, DBMS, Stern/NYU, Sp2004

  21. Data Models • Any DBMS uses a data model: collection of concepts for describing data • Schema: description of partic set of data, using some data model • Relational data model: most widely used (by far) data model • Oracle, DB2, SQLServer, other SQL DBMSs • main concept: relation ~ table of rows & columns • a rel’s schema defines its fields M.P. Johnson, DBMS, Stern/NYU, Sp2004

  22. Example: university database • Conceptual schema: • Students(ssn: string, name: string, login: string, age: int, gpa: real) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid:string, cid:string, grade: string) • Physical schema: • Relations stored as unordered text files. • Indices on first column of each rel • External Schema (View): • Course_info(ssn: string, name: string) • My_courses(cname: string, grade: string) M.P. Johnson, DBMS, Stern/NYU, Sp2004

  23. How the programmer sees the DBMS • Start with DDL to create tables: • Continue with DML to populate tables: CREATE TABLE Students ( Name CHAR(30) SSN CHAR(9) PRIMARY KEY NOT NULL, Category CHAR(20) ) . . . INSERT INTO Students VALUES(‘Howard, ‘123456789’, ‘undergraduate’) . . . . M.P. Johnson, DBMS, Stern/NYU, Sp2004

  24. How the programmer sees the DBMS • Tables: • Still implemented as files, but behind the scenes can be quite complex Takes: Students: Courses: “data independence” = separate logical view from physical implementation M.P. Johnson, DBMS, Stern/NYU, Sp2004

  25. Querying: Structured Query Language • Find all the students who have taken C20.0046: • SELECT SSN FROM Takes WHERE CID=“C20.0046” • Find all the students who C20.0046 last fall: • SELECT SSN FROM Takes WHERE CID=“C20.0046” AND Semester=“Fall, 2003” • Find the students’ names: • SELECT Name FROM Students, Takes WHERE Students.SSN=Takes.SSN AND CID=“C20.0046” AND Semester=“Fall, 2003” • Query processor does this efficiently. M.P. Johnson, DBMS, Stern/NYU, Sp2004

  26. Database Industry • Relational databases are a great success of theoretical ideas. • based on most “theoretical” type of math there is: set theory • DBMS companies are among the largest software companies in the world. • Oracle, IBM (with DB2), Microsoft (SQL Server, Microsoft Access), Sybase. • Also opensource: MySQL, Postgres, etc. • $20B+ industry. • XML (“semi-structured data”) also important • New lingua franca for exchanging data M.P. Johnson, DBMS, Stern/NYU, Sp2004

  27. The Study of DBMS • Several aspects: • Modeling and design of databases • DBMS programming: querying and update • DBMS implementation • This course covers all three • though more time on first two • Also will look at some more advanced areas • XML, data-mining, LDAP? M.P. Johnson, DBMS, Stern/NYU, Sp2004

  28. Databases are used by • DB app programmers • desktop app programmers • web developers • Database administrators (DBAs) • design schemas • security/authorization • crash recovery • tuning • better paid than programmers! • Everyone else (perhaps indirectly) M.P. Johnson, DBMS, Stern/NYU, Sp2004

  29. Course outline • Database design: • Entity/Relationship models • Modeling constraints • The relational model: • Relational algebra • Transforming E/R models to relational schemas • SQL • Views and triggers M.P. Johnson, DBMS, Stern/NYU, Sp2004

  30. Outline (Continued) • Connecting to a database from a programming language • Storage and indexing • Transactions • XML • Advanced topics • May change as course progresses • partly in response to audience M.P. Johnson, DBMS, Stern/NYU, Sp2004

  31. Textbook • Database Systems: The Complete Book • Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer D. Widom • 1st Edition (2001) • Available: • NYU bookstore • Amazon/BN (direct links on classpage) • Amazon.co.uk (total is/was about $20 less) • First two chapters in PDF on classpage M.P. Johnson, DBMS, Stern/NYU, Sp2004

  32. SQL Readings • Optional reference: SQL in a Nutshell • Online (free) SQL tutorials include: • A Gentle Introduction to SQL (http://sqlzoo.net/) • SQL for Web Nerds (http://philip.greenspun.com/sql/) M.P. Johnson, DBMS, Stern/NYU, Sp2004

  33. Grading • Prerequisites: Programming experience • presumably C/C++/Java • Work & Grading: • Homework 30%: O(4) • Project: 30% - see below. • Midterm (closed book/notes): 15% • Final (closed book/notes): 20% • Class participation: 5% • Stern Curve • Class attendance is required • Absences will affect your total grade M.P. Johnson, DBMS, Stern/NYU, Sp2004

  34. The Project: design end-to-end DB app • data model • Identify entities (and fields), relationships • Identify resulting relations (tables) • creation of DB in Oracle • Insertion of real(alistic) data • (web) app for accessing/modifying data • Identification of “interesting” questions to ask • Production of DBMS interface • Work in pairs (start forming now) • Choose topic on your own: previous e.g.s online • Start forming your group today! M.P. Johnson, DBMS, Stern/NYU, Sp2004

  35. Collaboration • Homework and exams done individually • Project done with your team members only • Non-cited use of others’ problem solutions, code, etc. = plagiarism • See Stern’s stern academic honesty policy • Contact me if you’re at all unclear before a particular case • Cite any materials used if you’re at all unclear after a particular case M.P. Johnson, DBMS, Stern/NYU, Sp2004

  36. On-going Feedback • Don’t wait until the end-of-semester course evals to complain or give feedback on how to improve course. (It’s too late for you then!) • Come see me early on during my office hours • or send me email with your concerns • “We’re in touch, so you be in touch.” M.P. Johnson, DBMS, Stern/NYU, Sp2004

  37. Summary • DBMS used to maintain, query large datasets. • Benefits include recovery from system crashes, concurrent access, data integrity, security, and quick application development. • Database skills are critical in financial services, marketing and other business areas! M.P. Johnson, DBMS, Stern/NYU, Sp2004

  38. So what is this course about, really ? A bit of everything ! • Languages: SQL, XPath, XQuery • Data modeling • Some theory! • Functional dependencies, normal forms • e.g., how to find most efficient schema for data • Algorithms and data structures (in the third part) • e.g., indices make data much faster to find – but how? • Lots of implementation and hacking for the project • Business DBMS examples/cases • Most importantly: how to meet real-world needs M.P. Johnson, DBMS, Stern/NYU, Sp2004

  39. For next time • Get the book • Read chapters 1, 2.1-2.2 M.P. Johnson, DBMS, Stern/NYU, Sp2004

  40. For right now: written survey • name • previous cs/is/math/logic courses • previous programming experience • career plans: programmer, DBA, MBA, etc. • why taking class • any religious holidays during class M.P. Johnson, DBMS, Stern/NYU, Sp2004

More Related