1 / 18

Relational Databases: Overview, Operators, and Queries

This recitation covers the history and growth of relational databases, the difference between database systems and file systems, the concept of data abstraction, tables in a relational database, queries and relational operators, query processing, and relational joins.

pkim
Download Presentation

Relational Databases: Overview, Operators, and Queries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 6001structure & interpretation of computer programsrecitation 8/ october 10, 1997

  2. overview • today’s ideas • relational databases • relational operators and queries • simple query optimization

  3. history • origins • 1960’s: network and hierarchical databases • 1970: Codd invents the relational model; revolutionizes the field • early 1980’s: IBM starts DB2, Oracle founded • growth • late 1980’s: RDBs become big business: Oracle, Sybase, Ingres, Informix, etc • 1997: Oracle’s revenue is $5B, total RDB market about $10-15B • future • object-oriented databases unlikely to take off • but hybrid ‘object-relational databases’ are starting to get popular

  4. what’s a database? • what does a database do? • stores large volumes of data • allows rapid access for many different kinds of query • databases includes mechanisms for handling • secondary storage management: indexing and storage of data that doesn’t fit in memory • persistence: data shouldn’t go away! • concurrency control: many users at once • distribution: users at different sites • data protection: making sure updates are consistent • recovery: when system fails, eg • the difference between DB systems and file systems • programmer doesn’t have to worry about how data is laid out on disk(sometimes called the ‘data independence principle’)

  5. the big idea • what was wrong with early databases (network and hierarchical) • the user’s operations were coded directly in terms of data structures • these data structures closely mirrored the way the data was actually stored • when developer changed the data structures for performance,the queries had to be rewritten • the solution • have an abstract data model • queries and insertions all expressed in terms of abstract model • hide the details of the actual data structures from the user • in other words… • data abstraction!

  6. tables • what’s in a relational database • an RDB consists of a collection of tables or relations • each table has a name • columns have names too (called attributes) • each row is a sequence of values one for each column • these may be numbers, booleans, strings, etc • rows are also called tuples or records • example • EVENTS table • EVENT CAFEMEISTER HOURS • java-jump ben 10 • tour-de-café alice 15 • seattle-sleepless carol 8 • ATHLETES table • NAME SPONSOR • alice coffee-connection • ben starbucks • carol green-mountain

  7. queries • to extract information from the RDB • present it with a query • examples • who won the java-jump event? • what company is sponsoring ben? • which company sponsored the winner of the java-jump event? • kinds of query • some queries can be answered by examining only one table • who won the java-jump event? –> look at EVENTS table • what company is sponsoring ben? –> look at ATHLETES table • some need more than one table • which company sponsored the winner of the java-jump event? • –> look up winner in EVENTS and then find sponsor in ATHLETES • query processing • queries are declarative: they say what you want, not how to get it • database system translates query into operation on internal data structures • may optimize the query before applying, so that it goes faster

  8. queries on a single table • for uniformity • think of result of a query as a table itself • two ways to make a table smaller • eliminate some columns • eliminate some rows • eliminating columns • query • select <column names> from <table> • example • select CAFEMEISTER, HOURS from EVENTS • CAFEMEISTER HOURS • ben 10 • alice 15 • carol 8

  9. simple queries, continued • eliminating rows • query • filter <condition> from <table> • example • filter NAME = ben from ATHLETES • NAME SPONSOR • ben starbucks • operators • filter and select are called relational operators • they are like the operations of an abstract data type • take a table and produce another table

  10. combining queries • example • who won the java-jump event? • want a table containing only the name of the winner • so only column is CAFEMEISTER • want to restrict rows so that EVENT is java-jump • query: • select CAFEMEISTER from • filter EVENT = java-jump from • EVENTS • could do it the other way round too • other examples • who sponsors alice? • which athletes did more than 9 hours? • whom does starbucks sponsor?

  11. relational join • relational join • given two tables T1 and T2 with columns a1, b1, c1, … and a2, b2, c2, … • make a new table with columns a1, b1, c1, … a2, b2, c2, … • for each row R1 in T1 and each row R2 in T2, make a new row R1 R2 • how many rows if each table has k rows? k^2 • assume that column names of T1 and T2 are disjoint

  12. example of a join • join EVENTS and ATHLETES • EVENT CAFEMEISTER HOURS NAME SPONSOR • java-jump ben 10 alice coffee- connection • tour-de-café alice 15 alice coffee- connection • seattle-sleepless carol 8 alice coffee- connection • java-jump ben 10 ben starbucks • tour-de-café alice 15 ben starbucks • seattle-sleepless carol 8 ben starbucks • java-jump ben 10 carol green-mountain • tour-de-café alice 15 carol green-mountain • seattle-sleepless carol 8 carol green-mountain

  13. queries involving two tables • now we can formulate a query on two tables by • joining the tables together into a single table • applying filter and select to the new table • example • which company sponsored the winner of the java-cup event? • query is • select SPONSOR from • filter EVENT = java-jump from • filter CAFEMEISTER = NAME from • join EVENTS and ATHLETES • before filter, we have • EVENT CAFEMEISTER HOURS NAME SPONSOR • java-jump ben 10 ben starbucks • final table (result of query) is • SPONSOR • starbucks

  14. more example queries • exercises • which events did coffee-connection sponsor the winner of? • who sponsored an athlete who did more than 12 hours?

  15. query optimization • note • join is expensive! • want to apply it to small tables whereever possible • could have done one of the selects before the join first • instead of • select SPONSOR from • filter EVENT = java-jump • filter CAFEMEISTER = NAME from • join EVENTS and ATHLETES • we could write • select SPONSOR from • filter CAFEMEISTER = NAME from • join ATHLETES and • filter EVENT = java-jump from EVENTS • now join gives table with only 3 rows instead of 9 • we’d like • to write the query either way, and have the database figure out a better way to write it • this is called query optimization: can be expressed with simple rules

  16. join of tables with shared column names • convenient to • use same names for related columns • for example, CAFEMEISTER in EVENTS and NAME in ATHLETES could both be NAME • refine the meaning of join • can’t have two columns with same name • so • form a table with both columns • eliminate rows with different values in the two columns • merge the two columns • example • assume CAFEMEISTER column in EVENTS is now called NAME • join of EVENTS and ATHLETES: • EVENT NAME HOURS SPONSOR • tour-de-café alice 15 coffee- connection • java-jump ben 10 starbucks • seattle-sleepless carol 8 green-mountain

  17. query optimizations • which of these rules are valid? • filter C1 from filter C2 from T = filter C1 and C2 from T • select COLS1 from select COLS2 from T = select COLS1 U COLS2 from T • under what conditions are these valid? • select COLS from (join of T1 and T2) = join (select COLS from T1) and (select COLS from T2) • filter C from (join of T1 and T2) = join of (select C from T1) and T2

  18. puzzles for student presentations • procedures with variable length argument lists • find out from the R4RS manual how to write a procedurethat can take a variable number of arguments • make up an entertaining procedure that exploits this feature • powerlists • define a procedure that takes a list and returns a list ofall the sublists • for example, (p (list 1 2 3)) should return a list containing()(1)(2)(3)(1 2)(1 3)(2 3)(1 2 3)in some order • use map!

More Related