Ecommerce technology 20 751 databases
This presentation is the property of its rightful owner.
Sponsored Links
1 / 48

eCommerce Technology 20-751 Databases PowerPoint PPT Presentation


  • 81 Views
  • Uploaded on
  • Presentation posted in: General

eCommerce Technology 20-751 Databases. Concepts. Relational model SQL DB construction Normalization ER diagrams Transactions Web support. Critical Role of Data. Without data, an organization cannot function especially in eCommerce Initially, data was prepared for specific applications

Download Presentation

eCommerce Technology 20-751 Databases

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Ecommerce technology 20 751 databases

eCommerce Technology20-751Databases


Concepts

Concepts

  • Relational model

  • SQL

  • DB construction

    • Normalization

    • ER diagrams

  • Transactions

  • Web support


Critical role of data

Critical Role of Data

  • Without data, an organization cannot function

    • especially in eCommerce

  • Initially, data was prepared for specific applications

    • payroll data for the payroll system

    • parts lists for the bill of materials system

    • sales data for statistical analysis

  • By 1970, clear that data had common properties

  • Data for many applications could be stored together in an organized way

    • database instead of separate collections


What is a database

What is a Database?

  • No formal definition

  • A collection of related data allowing:

    • insert(add new data)

    • delete(delete existing data)

    • update(change existing data = delete + insert)

    • query(retrieve all data having a certain property)

  • What does “related” mean?


Database management system

Database Management System

  • Based on a data model, e.g. relational, object, hierarchical

  • Has data definition language (DDL) to identify data

  • Has data manipulation language (DML) for queries and updates

  • Separates structure of data from DB implementation

  • Enforces data structure and content rules

  • Handles transactions, concurrent operations

  • Allows backup and recovery from errors

  • Connects to other software


The relational model

The Relational Model

  • A set is a collection of unique items

    { CS, HCII, ISRI, RI, LTI, CALD } Divisions of SCS

    { CS, HCII, CS, HCII, RI, CS } NOT A SET (repeated elements)

  • A relation on two sets A, B is a set of pairs of elements, one from A and one from B

    A = { 46-870, 20-751, 46-749, 20-753, 20-770 }

    B = { GSIA, SCS }

    R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) }

  • Relations can be defined on any number of sets


The relational model of data

The Relational Model of Data

A = { 46-870, 20-751, 46-749, 20-753, 20-770 }

B = { GSIA, SCS }

R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) }

20-770

46-870

46-749

20-753

20-751

GSIA

SCS

This is the graph of the relation R


The relational model of data1

The Relational Model of Data

A = { 46-870, 20-751, 46-749, 20-753, 20-770 }

B = { GSIA, SCS }

R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) }

CONTAINS ONLY

COURSE NUMBERS

CONTAINS ONLY

SCHOOL NAMES

This is a table of the relation R


The relational model of data2

The Relational Model of Data

Relations are not necessarily binary. May involve many sets:

  • Each row is a 7-tuple. Relation on 7 sets.

  • No implied ordering of either rows or columns. Sorting is irrelevant

  • Note: bad table design since “DEPT” is an attribute of “FACULTY”, not “COURSE”


Tables

Tables

  • A relation can be represented as a table

  • One row for each tuple in the relation

  • Easier to draw than a graph

  • Table has implicit order (of rows and columns)

    • But: a relation has no ordering, either of tuples or attributes

  • The cardinality C(R) of a relation R is the number of tuples it contains = # of rows in its table

  • Relational model represents data as a collection of unordered two-dimensional tables


Ecommerce technology 20 751 databases

Keys

  • Key: an attribute (or minimum set of attributes) that uniquely defines a tuple

    • In the example relation, “Course” is a key

  • A relation may have more than one key.

  • A set of attributes that can serve as a key is a candidate key.

  • One is chosen as the primary key.

  • Keys are used to reference (retrieve) tuples.


Foreign keys

Foreign Keys

  • A key from one relation that is an attribute of another relation is a foreign key.

  • If we had a “Faculty” relation, then “Faculty” would be a foreign key in the “Courses” relation.

  • Foreign keys connect relations together.

FOREIGN KEY:

PRIMARY KEY

PRIMARY KEY


Operations on relations

Course, Room(Courses)

Courses

Operations on Relations

  • Projection

    List specific attributes L (columns) of R, written L(R)

    E.g. show course number and room


Operations on relations1

Operations on Relations

  • Selection (extract horizontal slices)

    • List all tuples of relation R whose attributes satisfy condition C, written C(R)

    • E.g. show all tuples with Room = 152, Room=152(R)

  • Projection & Selection are unary (1-table) operations


Structured query language sql

Structured Query Language (SQL)

  • A data manipulation language for manipulating relational databases

  • SELECT queries the database

  • UPDATE modifies relations

  • DELETE removes tuples

    Syntax of the SQL SELECT command:

    SELECT { attributes }FROM { table }WHERE { attribute-conditions };


Structured query language sql1

Structured Query Language (SQL)

YIELDS DISTINCT TUPLES

SINCE CourseNo IS A KEY

  • Projection

    • SQL: SELECT CourseNo, Room FROM Courses;

    • SQL: SELECT DISTINCT Room FROM Courses;

  • Selection

    • SELECT * FROM Courses WHERE Room= “152”;

    • Give a table of all courses that meet in 152

MUST ASK FOR DISTINCT TUPLES

SINCE Room IS NOT A KEY


Ecommerce technology 20 751 databases

Join

  • The natural joinA * B consists of tuples with matching attributes (names & values) in A and B

  • Natural join is a way of obtaining information across tables


Natural join a b

Natural Join A * B

  • Attribute names and values must match

  • Also called “inner join”: Statistics * Geography

  • Cartesian product and join are binary operations

Statistics:

Geography:

*

=


Joins in sql

Joins in SQL

  • SELECT City, State, RadioFROM StatisticsINNER JOIN GeographyON Statistics.City = Geography.City

Statistics:

Geography:

Result of Query:


Other sql constructs

Other SQL Constructs

  • ORDERBY (sorting)

    • SELECT Company, OrderNumber FROM Orders ORDER BY Company;

  • BETWEEN

    • SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen‘;

  • ALTER TABLE (changes table structure)

  • Functions

    • SUM()

    • COUNT()

    • MAX()


Database constraints

Database Constraints

  • Domain (data validity) constraints

    • All values in a column must be from the same domain

    • Example: all salaries are positive numeric dollar amounts. “Monthly” is invalid.

  • Entity Integrity

    • Every entity must have a unique primary key. (Otherwise, can’t access the entity)

  • Referential Integrity

    • Every foreign key value in a relation must match a primary key in the foreign relation table


Functional dependency

Functional Dependency

  • Attribute B is functionally dependent on attribute A if the value of A uniquely determines B

    • One-to-one relationship: two functional dependencies: A depends on B; B depends on A

    • Many-to-one relationship: one functional dependency: B depends on A

    • Many-to-many relationship: no dependencies: neither A nor B depends on the other

  • Functional dependencies are constraints between attributes or sets of attributes. They must be maintained or error or inconsistency will result.


Normalization

Normalization

  • A relation is well-structured if it is non-redundant and allows INSERT, MODIFY and DELETE without error or inconsistency.

  • Normalization assists in maintaining functional dependencies and preventing errors and inconsistencies.

  • DELETE anomaly:

  • Deleting “Jones” removes all information about course 46-870 (namely that its room is 150)

  • In the information is in another table, it shouldn’t be here also.


Normalization1

Normalization

  • MODIFY anomaly:

  • Suppose Smith’s email address changes. Every line in the table corresponding to Smith must be changed or data will be inconsistent.

  • An attribute unique to a key should be entered only once in the database.


Normalization2

Normalization

  • Restructuring to produce smaller, well-structured equivalent relations, reduce data replication

  • First Normal Form. Make all attributes atomic. No multiple values.

MULTIPLE

VALUES

MULTIPLE

VALUES

FIRST

NORMAL

FORM:


Second normal form

Second Normal Form

  • Eliminate partial functional dependencies. Every non-key attribute must depend on all key attributes (or redundancy can result).

Capital DEPENDS ON State ONLY, NOT CITY

KEY IS(City, State)

NOT IN 2NF:

2NF:

DECOMPOSE

INTO TWO

TABLES

There are many other normal forms and normalization rules


Entity relationship er diagrams

Entity-Relationship (ER) Diagrams

  • Must specify:

    • Entities (things to be represented in the database)

    • Attributes (properties of entities)

    • Relationships (relations among entities)

  • These can be modeled by entity-relationship diagrams

  • The diagrams are used as a guide to designing the database


Entity relationship er diagrams1

Entity-Relationship (ER) Diagrams

  • Entity types

    • Entity type: Store

    • Entities: Downtown Store, Squirrel Hill Store, Oakland Store

  • Relationships between entity types:

  • This is the “Has” relationship

  • Direction of arrow is important (“Branch has Staff,” not “Staff Has Branch”)

EXAMPLE FROM CONNOLLY & BEGG


Entity relationship er diagrams2

Entity-Relationship (ER) Diagrams

  • Relationships need not be binary:

  • This is the “Arranges” relationship; it can be though of as a 4-tuple (Solicitor, Bid, Buyer, Institution)

EXAMPLE FROM CONNOLLY & BEGG


Entity relationship er diagrams3

Entity-Relationship (ER) Diagrams

EXAMPLE FROM CONNOLLY & BEGG


Entity relationship er diagrams4

Entity-Relationship (ER) Diagrams

EXAMPLE FROM CONNOLLY & BEGG


Web database connectivity

Web Database Connectivity

JDBC = Java Database Connectivity

SQLJ = Java-Embedded

SQL

SOURCE: CONNOLLY & BEGG


Distributed databases

Distributed Databases

  • Databases in which data is stored in more than one location but appears local to the user

    • Replicated: multiple copies of database

    • Partitioned: data is split among locations

  • Fragmentation

    • Information about fragments is stored in a distributed data catalog (DDC)

    • Horizontal v. vertical fragmentation


Distributed databases1

Distributed Databases

  • Advantages

    • Reduced load on central DB

    • Lower cost (data spread among small machines)

    • Reliability (machine failure is not fatal)

    • Fast access to local data

    • Ease of growth

  • Disadvantages

    • Complexity. Difficult to maintain consistency

    • Security (many access points)

    • Telecommunications required


Distributed databases2

Distributed Databases

  • Products

    • PeerDirect

  • Issues

    • Updating of information in a distributed database is a form of transaction processing


What is a transaction

What is a Transaction?

  • An action requiring a series of steps and database updates.

  • Transactions are the basis of Ecommerce.

  • Transactions may be distributed. Steps processed on different computers.

  • Transactions may fail. One or more steps may be unsuccessful.

  • Transaction systems must be recoverable. Data and “state” must be restored after failure.


Atm transaction database

ATM Transaction Database

ACCOUNT MASTER

STOLEN CARDS

RECENT ACTIVITY

POSTING LOG


Atm withdrawal

ATM Withdrawal

1 Check STOLEN If card is stolen, ABORT

2 Check PIN If wrong, retry 3 times, ABORT

3 Check RECENT v. BALANCE Too much activity, ABORT

4 Check ATM reserve If not enough money, ABORT

5 Update RECENT Indicate new activity

6 Update BALANCE Debit bank account

7 Write to Log Record transaction

8 Update ATM reserve Debit ATM balance

9 Tell ATM to dispense money Pay the man

10 Check dispensing status If failed, ABORT

11 Make updates permanent COMMIT the transaction

(Think: “to memory”)


Ecommerce technology 20 751 databases

ACID

  • Four minimum requirements of a transaction T in a transaction system:

  • Atomic. T executes completely or not at all.

  • Consistent. T preserves database consistency and integrity.

  • Isolated. T executes as if it were running alone. Not affected by other concurrent transactions.

  • Durable. T’s results preserved during failure.


Atomicity

Atomicity

  • Transaction: John pays Mary $100.

  • Take $100 out of John’s account.

  • Add $100 to Mary’s account

  • Problems:

    • John or Mary might not have an account

    • John might not have $100

    • System might fail after subtracting $100 from John

  • If failure occurs, must undo partial results

  • “Commit”: successful recording of a transaction

  • “Abort”: failure of a transaction.


Consistency

Consistency

  • Maintain database constraints

    • data validity

    • unique primary keys

    • referential integrity

    • conservation conditions (debits = credits, total cash = sub of cash in all accounts, etc.)


Isolation

Isolation

  • Two transactions T1, T2 are interleaved if some steps of one are performed after the other starts but before it completes.T1 has steps A B C D E F; T2 has steps P Q R SA BPC DQ R SE F is an interleaved schedule.P Q R SA B C D E F is not interleaved.

  • A sequence of transactions is isolated if their steps can be interleaved without affecting the result. Transactions are blind to simultaneous execution.


Durability

Durability

  • Results must survive failure.

  • Logging. Maintaining a record of all data updates so databases can be repaired if failure occurs.

  • Updates must be logged before they are performed. If failure occurs, transaction can complete from failure point.

  • If abort is necessary, can undo logged transactions.

  • Without logging, can’t recover from some types of failures.


Simultaneous transactions

Simultaneous Transactions

  • If all TP were done by one single-threaded process, it would be easy. Just execute one step at a time.

  • With just two threads (or processes) it’s complicated.

    2 transactions T1, T2: READ A; A = A+1; WRITE A;

  • Value of A is 6, but it should be 7!


Locking

Locking

  • A solution is LOCKING. Associate a variable with one process at a time. LOCK the others out.

  • LOCK A; READ A; A = A+1; WRITE A; UNLOCK A;

  • If T1 starts first, it locks A.

  • When T2 tries to lock A, it can’t. It has to wait.

  • T1 finishes completely before T2 can lock A.

  • After T1 finishes, A = 6

  • After T2 finishes, A = 7, the correct value

  • Locking achieves atomicity


Deadlock

Deadlock

T1: LOCK A; LOCK B; B=A+B; UNLOCK A; UNLOCK B;

T2: LOCK B; LOCK A; B=A-B; UNLOCK B; UNLOCK A;

This is deadlock. Neither transaction can complete.


Ways to eliminate deadlock

Ways to Eliminate Deadlock

1.Require each transaction to request all locks at the same time. System either grants them all or none.

  • Problem: very restrictive. Transactions cannot be interleaved. Essentially serial execution.

    2.Assign an ordering to the variables: A=1;B=2;C=3 …Require transactions to request locks in that order.

    3.Do nothing. Periodically check for deadlock. If it exists, cancel out a transaction.


Ecommerce technology 20 751 databases

Q

A

&


  • Login