Cs 540 database management systems
This presentation is the property of its rightful owner.
Sponsored Links
1 / 38

CS 540 Database Management Systems PowerPoint PPT Presentation


  • 86 Views
  • Uploaded on
  • Presentation posted in: General

CS 540 Database Management Systems. RDBMS Architecture and Data Storage. Database Implementation. User Requirements. SQL. Physical Storage . Conceptual Design. Schema. Data. Relational Model. Entity Relationship(ER) Model. Files and Indexes. The big advantage of RDBMS.

Download Presentation

CS 540 Database Management Systems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Cs 540 database management systems

CS 540 Database Management Systems

RDBMS Architecture and Data Storage


Database implementation

Database Implementation

User Requirements

SQL

Physical Storage

Conceptual Design

Schema

Data

Relational Model

Entity Relationship(ER) Model

Files and Indexes


The big advantage of rdbms

The big advantage of RDBMS

  • It separates logical level (schema) from physical level (implementation).

  • Physical data independence

    • Users do not worry about how their data is stored and processes on the physical devices.

    • It is all SQL!

    • Their queries work over (almost) all RDBMS deployments.


Issues in logical level

Issues in logical level

  • Data models

    • Relational, XML , …

  • Query language

  • Data quality

    • normalization

  • Usability

  • ...


Issues on physical level

Issues on physical level

  • Processor: 100 – 1000 MIPS

  • Main memory: 1μs – 1 ns

  • Secondary storage: higher capacity and durability

  • Disk random access : Seek time + rotational latency + transfer time

    • Seek time: 4 ms - 15 ms!

    • Rotational latency: 2 ms – 7 ms!

    • Transfer time: around 1000 Mb/ sec

    • Read, write in blocks.


Storage capacity versus access time

Storage capacity versus access time

offline

tape

nearline

tape &

optical

disks

1015

1013

magnetic

optical

disks

1011

electronic

secondary

online

tape

109

typical capacity (bytes)

electronic

main

107

from Gray & Reuter

updated in 2002

105

cache

103

103

10-9

10-6

10-3

10-0

access time (sec)


Storage cost versus access time

Storage cost versus access time

from Gray & Reuter

104

cache

electronic

main

online

tape

102

electronic

secondary

magnetic

optical

disks

nearline

tape &

optical

disks

dollars/MB

100

10-2

offline

tape

10-4

103

10-9

10-6

10-3

10-0

access time (sec)


Gloomy future moor s law

Gloomy future: Moor’s law

  • Speed of processors and cost and maximum capacity of storage increase exponentially over time.

  • But storage (main and secondary) access time grows much more slowly.

  • This is why managing and analyzing big data is hard.


Issues in physical level

Issues in physical level

Three things are important in the database systems: performance, performance, and performance! ( Bruce Lindsay, co-creator of System R)


Issues in physical level1

Issues in physical level

  • Other things also matter

    • Reliability when it comes to transactions.

  • But performance is still a big deal.


Is it easy to achieve good performance

Is it easy to achieve good performance?

  • Let’s build an RDBMS.

  • It supports core SQL.

  • No stored procedure for this version!


Storing data

Storing Data

  • Store each relation in an ASCII file:

    Person (SSN, Name, Age)

    person.txt:

    111222333 - John - 24

    444222111 - Charles - 43


Storing data1

Storing Data

  • Store schema information in a catalogue relation:

    Catalogue (AttrName, Type, RelName, Position)

    catalogue.txt:

    SSN - String – Person - 1

    Name - String - Person - 2

    Age – Integer – Person - 3


Sql support

SQL Support

  • SQL compiler

  • Like any other computer language compiler.

    SELECT SSN

    FROM Person;

    SSN

    111222333

    444222111


Query execution selection

Query Execution: Selection

  • Find the selection attribute position from the catalogue.

  • Scan the file that contains the relation.

  • Show the tuples that satisfy the condition.

    SELECT *

    FROM Person

    WHERE SSN = 111222333;


Query execution join

Query Execution: Join

  • Read the catalogue to find the info on join attributes.

  • Read the first relation, for each tuple:

    • Read the second relation, for each tuple:

    • Assemble the join tuple

    • Output if they satisfy the condition.

      SELECT *

      FROM Person, PersonAddr

      WHERE Person.SSN = PersonAddr.SSN

      and Person.SSN = 111222333;


Performance issues storing data

Performance Issues: Storing Data

  • Update John to Sheldon

    • Rewrite the whole file very slow

    • Type conversion slow

  • Delete the tuple with SSN of 111222333.

    Person (SSN, Name, Age)

    person.txt:

    111222333 - John - 24

    444222111 - Charles - 43


Performance issues selection

Performance Issues: Selection

  • We have to scan the whole relation to select some tuples  very slow

  • We can use an index to find the tuples much fasters.

    SELECT *

    FROM Person

    WHERE SSN = 111222333;


Performance issues selection1

Performance Issues: Selection

  • Read tuples one by one

    • Much faster if we read a whole bunch of them together: caching

      SELECT *

      FROM Person

      WHERE SSN = 111222333;


Performance issues join

Performance Issues: Join

  • Quadratic I/O access

    • Very slow for large relations

      SELECT *

      FROM Person, PersonAddr

      WHERE Person.SSN = PersonAddr.SSN and Person.SSN = 111222333;


Performance issues query execution

Performance Issues: Query Execution

  • Two ways of executing the query

    • First join, then select

    • First select, then join much faster

    • Query (execution) optimization.

      SELECT *

      FROM Person, PersonAddr

      WHERE Person.SSN = PersonAddr.SSN

      and Person.SSN = 111222333;


Reliability

Reliability

  • Update the name in person

    • Power outage  is the operation done?

    • Disk crash

      Update Person

      SET Name = Smith

      WHERE Person.SSN = 111222333;


Probably not that many people download our rdbms

Probably not that many people download our RDBMS 

  • Let’s redesign the components of our RDBMS


Database implementation1

Database Implementation

Data storage

User Requirements

Physical Storage

Conceptual Design

Schema

Relational Model

Entity Relationship(ER) Model

Files and Indexes


Random access versus sequential access

Random access versus sequential access

  • Disk random access : Seek time + rotational latency + transfer time.

  • Disk sequential access: reading blocks next to each other

  • No seek time or rotational latency

  • Much faster than random access


Units of data on physical device

Units of data on physical device

  • Fields: data items

  • Records

  • Blocks

  • Files


Fields

Fields

  • Fixed size

    • Integer, Boolean, …

  • Variable length

    • Varchar, …

    • Null terminated

    • Size at the beginning of the string


Records sets of fields

Records: Sets of Fields

  • Schema

    • Number of fields, types of fields, order, …

  • Fixed format and length

    • Record holds only the data items

  • Variable format and length

    • Record holds fields and their size, type, … information

  • Range of formats in between


Record header

Record Header

  • Pointer to the record schema ( record type)

  • Record size

  • Timestamp

  • Other info …


Blocks

Blocks

  • Collection of records

  • Reduces number of I/O access

  • Different from OS blocks

    • Why should RDBMS manage its own blocks?

    • It knows the access pattern better than OS.

  • Separating records in a block

    • Fixed size records: no worry!

    • Markers between records

    • Keep record size information in records or block header.


Spanned versus un spanned

Spanned versus un-spanned

  • Unspanned

    • Each records belongs to only one block

  • Spanned

    • Records may store across multiple records

    • Saves space

    • The only way to deal with large records and fields: blob, image, …


Heap versus sorted files

Heap versus Sorted Files

  • Heap files

    • There is not any order in the file

    • New blocks (records) are inserted at the end of file.

  • Sorted files

    • Order blocks (and records) based on some key.

    • Physically contiguous or using links to the next blocks.


Average cost of data operations

Average Cost of Data Operations

  • Insertion

    • Heap files are more efficient.

    • Overflow areas for sorted files.

  • Search for a record

    • Sorted files are more efficient.

  • Search for a range of records

    • Sorted files are more efficient.

  • Deletion

    • Heap files are more efficient

    • Although we find the record faster in the sorted file.


Indirection

Indirection

  • The address of a record on the disk

  • Physical address

    • Device ID, Cylinder #, Track #, …

  • Map physical addresses to logical addresses

    • Flexible in moving records for insertion and deletion

    • Costly lookup

    • Many options in between, tradeoff

Physical address on disk

Logical address


Block header

Block Header

  • Data about block

  • File, relation, DB IDs

  • Block ID and type

  • Record directory

  • Pointer to free space

  • Timestamp

  • Other info …


Row and column stores

Row and Column Stores

  • We have talked about row store

    • All fields of a record are stored together.


Row and column stores1

Row and Column Stores

  • We can store the fields in columns.

    • We can store SSNs implicitly.


Row versus column store

Row versus column store

  • Column store

    • Compact storage

    • Faster reads on data analysis and mining operations

  • Row store

    • Faster writes

    • Faster reads for record access (OLTP)

  • Further reading

    • Mike Stonebreaker, et al, “C-Store, a column oriented DBMS”, VLDB’05.


  • Login