1 / 21

Advanced Computing

Advanced Computing. Data Bases: Basic concepts Based in part on open access material from: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrke http://pages.cs.wisc.edu/~dbbook/index.html Databases by Timothy Griffin www.cl.cam.ac.uk/Teaching/current/Databases &

moenj
Download Presentation

Advanced Computing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Computing Data Bases: Basic concepts Based in part on open access material from: Database Management Systems by Raghu Ramakrishnan and Johannes Gehrkehttp://pages.cs.wisc.edu/~dbbook/index.html Databases by Timothy Griffinwww.cl.cam.ac.uk/Teaching/current/Databases & Scientific Data Management by Laura Bright & Bill Howehttp://web.cecs.pdx.edu/~howe/cs410/

  2. What is a database? "A database is a collection of information stored (usually in a computing system) in a systematic way, such that users (usually a computer program) can consult it to answer questions.“ (see WikiPedia - The Free Encyclopedia.)Databases are in almost any medium to large system in the world today. This is useful to handle scalability. As the expectations of users and the complexity of applications continue to increase, most application systems would need a database to store and manage information.Databases have grown larger and more powerful as technology advances. It is now common to have databases which are terabytes in size and hosted on multiple serversData is stored in database tables and SQL is used to access or modify this data.

  3. Database management systems • A database is a collection of (usually persistent) data that model the entities of interest, by means of properties. • To handle large amounts of data computers use what we call a Database Management System (DBMS) – These are software systems that support the creation, population, and querying of a database Aims in the access to data: Efficiency, Concurrency, Security

  4. Database Tables • Tables are the basic structures within databases that are used to store data. Each database table consists of rows and columns. Rows are called Records (the entities to be modelled) , and columns are called Fields (their properties of interest). • As the number of records grows, more rows are added into the table • A record is expressed as a collection of fields, which describe the relevant aspects of the single unit of data stored in the table. • Each column in the database table shows the fields of the different records and has a data-type associated.

  5. Data types • Each field in a table has a data-type associated. • Types fix the way fields are represented and stored in the binary implementation inside the computing system. • This allows to have efficient mechanism to reach a record in the serialized version of the table.

  6. Relationships • As data grow in a database, it may be more efficient to have several tables to store it. When data types become more complex, and/or the properties to attach to a field would be better expressed as records in another table, records of this tables are referenced from fields in records of the other tables. • The referencing mechanism is the definition of one of the fields of the record as a key • Keys are used in the referencing table as data stored in a field

  7. Relational Database • When data need to be expressed in several related tables, we call it a relational database • The software used to manage them is a Relational Database Management System (RDBMS). It consists of a number of tables and single schema (definition of tables and attributes) • Example: Students (sid, name, login, age, gpa) • Students  identifies the table • sid, name, login, age, gpa  identify attributes • sid  is primary key

  8. Programming over Database Management Systems (DBMSs) Database abstractions allow this interface to be cleanly defined and this allows applications and data management systems to be implemented separately. Applications Go Here DBMS Raw Resources (and its binary code)

  9. External level Conceptual level Physical level Three-level architecture External Schema 1 External Schema 2 External Schema n … Conceptual Schema Internal Schema

  10. An Example Table • Students (sid: string, name: string, login: string, age: integer, gpa: real)

  11. Another example: Courses • Courses (cid, instructor, quarter, dept)

  12. Keys • Primary key – minimal subset of fields that is unique identifier for a tuple • sid is primary key for Students • cid is primary key for Courses • Foreign key –connections between tables • Courses (cid, instructor, quarter, dept) • Students (sid, name, login, age, gpa) • How do we express which students take each course?

  13. Many to many relationships • In general, need a new table Enrolled(cid, grade, studid) Studid is foreign key that references sid in Student table Student Foreign key Enrolled

  14. Indexes • When a table will be searched using different fields, it is convenient to have them pre-ordered according to those fields. • To have quick access to it, it is convenient to have an additional table that links ordering values(fields) to keys of the records in the related table. • Idea: speed up access to desired data • “Find all students with gpa > 3.3” • May need to scan entire table • Index consists of a set of entries pointing to locations of each search key

  15. Types of Indexes • Clustered vs. Unclustered • Clustered- ordering of data records same as ordering of data entries in the index • Unclustered- data records in different order from index • Primary vs. Secondary • Primary – index on fields that include primary key • Secondary – other indexes

  16. Example: Clustered Index • Sorted by sid

  17. Example: Unclustered Index • Sorted by sid • Index on gpa

  18. Comments on Indexes • Indexes can significantly speed up query execution • But inserts more cost • May have high storage overhead • Need to choose attributes to index wisely! • What queries are run most frequently? • What queries could benefit most from an index?

  19. Summary: Why are RDBMS useful? • Data independence – provides abstract view of the data, without details of storage • Efficient data access – uses techniques to store and retrieve data efficiently • Reduced application development time – many important functions already supported • Centralized data administration • Data Integrity and Security • Concurrency control and recovery

  20. So, why don’t scientists use them? • “I tried to use databases in my project, but they were just too [slow | hard-to-use | expensive | complex] . So I use files”. • Gray and Szalay, Where Rubber Meets the Sky: Bridging the Gap Between Databases and Science • Some other limitations of RDBMS • Arrays • Hierarchical data

  21. Example: Taxonomy of Organisms • Hierarchy of categories: • Kingdom - phylum – class – order – family – genus - species • How would you design a relational schema for this? Animals Chordates Arthropods Vertebrates insects spiders crustaceans birds reptiles mammals

More Related