6. 관계형 데이터베이스 설계

6. 관계형 데이터베이스 설계 서울시립대학교 전자전기컴퓨터공학부 김한준

Introduction • Database Design => 어떤 table을 만들 것인가? • 목표 : 불필요한 중복(redundancy)없이 필요한 정보를 모두 표현(저장)할 수 있는 schema => 어떤 attribute를 갖는 어떤 table을 둘 것인가? R = ( A B C D E ) <----- single relation schema 정의 DB1 = { R1, …… , Rn } <----- DB schema 정의 (set of relation schemas)

정보 중복 (information repetition) • ‘학년(YEAR)’ 속성이 학생 Table에서 수강 Table로 옮김. • 같은 학번의 Record가 반복될 때마다 ‘학년’이 반복됨. 수강 학번 과목번호 성적 학년 100 C413 A 4 100 E412 A 4 200 C123 B 3 300 C312 A 1 300 C324 C 1 300 C413 A 1 400 C312 A 4 400 C324 B 4 400 C413 C 4 500 C312 B 2 학생 수강 과목 학과 담당교수 성적 학번 학년 과목번호 학과 과목이름 이름 학점

Pitfalls in relational DB design • 기피해야 할 속성들 • Repetition of information • Inability to represent certain info. • Loss of information 예) Lending = (b_name, asset, b_city, loan#, c_name, amount) • Lending이 10,000개면 asset 값이 변할 때마다 같은 내용을 10,000번 바꾸어야 한다. • Key는 null이 될 수 없으므로 Loan이 하나도 없는 branch의 정보는 넣을 수 없다 (loan#가 없으므로). • 대출이 없으면 지점에 관한 정보를 저장할 수 없다 해결책 => Decomposition !!

Decomposition Lending = (b_name, asset, b_city, loan#, c_name, amount) • Anomalies (이상현상) • Insertion anomaly : loan# 없이 b_name, ~등 insert불가 • Deletion anomaly : 어떤 branch의 마지막 account delete • Update anomaly : 중복이 많은 경우 • 원인 => 정보의 중복 => 여러 개체가 하나의 table에 합쳐짐 => Decompose!

예제 테이블 정규화한 테이블 정규화하지 않은 테이블

정보의 중복에 따른 anomaly • 삽입이상 • 새로운 회사원 튜플을EMP_DEPT에 삽입하려면, 부서에 대한 attribute 값도 알아야 한다. 그렇지 않으면, NULL값이 들어가게 된다. • 사원이 아직 없는 새로운 부서에 대한 정보는 삽입하기가 곤란 • 주키가NULL ! • 삭제 이상 • 그 부서에서 일하는 마지막 회사원을 삭제했을 때에는 그 부서에 대한 정보자체도 사라짐. • 갱신이상 • 한 부서의 attribute중 하나의 값을 변경시키려 할 때 그 부서에서 일하는 모든 회사원에 대한 정보를 수정해야 함.

정보중복에 따른 Anomaly • 삭제 이상(deletion anomaly) • 만일 학번이 200인 학생이 과목 ‘C123’의 등록을 취소하다면 • 이 학생이 3학년이라는 정보까지 함께 삭제됨. • 한 투플을 삭제함으로써 유지해야 될 정보까지도 삭제되는 연속 삭제(triggered deletion) 현상 • 정보의 손실(loss of information)

정보중복에 따른 Anomaly • 삽입 이상(insertion anomaly) • 만일 새로 학번이 600인 학생이 학년이 2학년이라는 사실을 이 릴레이션에 입력하려면 ? • 과목을 등록하지 않는 한 이 삽입이 성공할 수 없음. 왜냐하면, ‘학번’과 ‘과목번호’가 이 릴레이션의 기본키임. • 가상의 임의의 과목번호를 함께 삽입 • 데이터를 삽입하려고 할 때, 불 필요한 데이터를 함께 삽입

정보중복에 따른 Anomaly • 갱신 이상(update anomaly) • 학번 300인 학생의 학년을 1에서 2로 변경 • 학번 300인 모든 투플을 변경 • 만약, • 여러 투플 중 일부만 갱신하면 정보의 모순성(inconsistency) 발생

특정 정보의 표현 불가 대부(지점명, 대부번호, 고객명, 대부금액) => R1(대부금액, 고객명) R2(지점명, 대부번호, 대부금액) • “Lee가 대부를 받는 지점명은?” -> 알 수 없음

Functional Dependencies (FD) • 어떤 attribute(또는 a set of attributes)의 값이 다른 attribute(a set of attributes)의 값을 결정 • 키의 개념의 일반화 • 정의 R: relation scheme. Let ⊆R, ⊆ R Functional dependency → holds on R, if in any legal relation r(R), for all pairs of tuples t1 and t2 in r, if t1[]=t2[] then t1[]=t2[].

Functional Dependencies (FD) • 예 ) 학생 학번 학과 학과장 친구 친구학과 1111 전산 박영희 2222 전산 홍길동 학과  학과장 친구  친구학과

Functional Dependencies (FD) • FD는 Integrity Constraint의 일종 • FD는 한 개의 instance를 보고 구할 수 있는 것이 아님 • Designer가 다른 Integrity constraint를 define하듯이 schema의 의미에 맞게 결정함 • Trivial FD • FD that holds in every possible instance • → is trivial iff⊇ • Superkey •   R is a super key if   R • Candidate key •   R is a candidate key if   R and A, (-A)  R

Functional Dependency의 예 Loan-info-schema =( branch-name,loan-number, customer-name,amount). loan-number  amount loan-number  branch-name loan-number  customer-name

Functional Dependency 관련 rules • Inference Rules for FD • Rules that can be used to derive new FD from existing ones • Armstrong’s Inference Rules A1. Reflexitivity rule: If ⊇, then → A2. Augmentation rule: If → holds and  is a set of attributes, then →  A3. Transitivity rule: If → and  → , then → 

Functional Dependency 관련 rules • [Proof of A3] (Transitivity rule: If → and  → , then → ) Given  →  and  →  Suppose t1 and t2 are tuples such that t1[ ] = t2[] then by  →  , t1[] = t2[] then by  → , t1[] = t2[] Thus,  →  holds

Functional Dependency 관련 rules • Additional inference Rules for FD • Union rule:    and   ,then    • Decomposition rule:   , then    and    • Pseudotransitivity rule:    and    , then   

Functional Dependency 관련 rules • Closure • Let F be a set of FD. The closure of F, F+, is the set of all FDs logically implied () by F • A set of inference rules is • sound if using them, we can’t derive (deduce) any FD not in F+ • complete if given a set F, the rules allow us to derive all members of F+ • Armstrong’s rules are sound and complete

Functional Dependency 예제 • R = ( A,B,C,G,H,I) • F = {A B A C CG H CG I B H} • F+ - A H - AG I, AG H - CG HI, A BC

Functional Dependency 알고리즘 • Closure of Attribute Sets •  : a set of attributes + ( -closure): set of all attributes functionally determined by  (under F) *  + R, then  is a superkey of R • An algorithm to compute + result := ; while (changes to result) do foreachFD  in F do begin if   result then result := result   ; end

Functional Dependency: closure예제 • R = ( A,B,C,G,H,I ) • F = {A  B • A  C • CG  H • CG  I • B  H} • (AG+ ) • 1. result = AG • 2. result = ABG (A  B 이고 A  AG) • 3. result = ABCG (A  C 이고 A  ABG) • 4. result = ABCGH (CG  H 이고 CG  ABCG) • 5. result = ABCGHI (CG  I 이고 CG  ABCGH) • AG 는 후보 키인가?

Functional Dependency: cover • A cover of F is any F’ such that F’+ = F+ • A FD gF is redundant if (F- {g})+ = F+ or g (F- {g})+ • F’ is a nonredundant (minimal) cover of F if F’+ = F+ and F’ contains no redundant FD

Functional Dependency: canonical cover • Canonical Cover • simplified minimal cover • Extraneous attribute • →  F • A   is extraneous if F (F - {→ })  {( - A)→ } • A is extraneous if F (F - {→ })  {→( - A)} • 예) A→ BC AD → B (D를 빼도 A → B) • Fc is a canonical cover for F if • Fc+ = F+ • No FD in Fc contains an extraneous attribute • Each left side of a FD in Fc is unique

Functional Dependency: canonical cover 예제 • R = (A, B, C) • F = {A  BC, B  C, A  B, AB  C} • A  BC and A  B  A  BC • B  C  AB  C • A는 AB  C에서 이질적이다. • A  BC  A  B , B  C • C는 A  BC에서 이질적이다. • F의 canonical cover Fc • A  B • B  C

Decomposition • 정의 R : relation scheme {R1, ..., Rn} : database scheme (set of relation schemes) {R1, ..., Rn} is a decomposition of R if R = R1 ... Rn (즉, R의 모든 attribute가 R1, … ,Rn에 존재) 학생(학번, 이름, 학과, 학과장, 학과전화, 학년) => 학생(학번, 이름, 학년, 학과) 학과(학과, 학과장, 학과전화) Lending = (b_name, asset, b_city, loan#, c_name, amount) => Branch = (b_name, asset, b_city) Loan = (loan#, c_name, amount)

Normalizationusing Functional Dependencies • Lossless Decomposition Lending = (b_name, asset, b_city, loan#, c_name, amount) : anomalies due to repetition of information => Branch = (b_name, asset, b_city) Loan = (loan#, c_name, amount) • 문제점 : 상관 관계가 없어짐 (연결 정보의 손실) loss of information => Branch = (b_name, asset, b_city) Loan = (loan#, c_name, amount, b_city) • natural join시 tuple은 증가, but information (relationship) 상실 • loss of information

Decomposition Careless decomposition leads to loss of information 잘못된 연결 정보 설정 (Lossy decomposition) 일반적으로 : r (R), ri=  Ri(r) 일때 r ⊆ r1▷◁▷◁rn Definition: Decomposition {R1, ..., Rn } is a lossless join decomposition of R if r = r1▷◁▷◁rn • 기준 information은 원래의 information in r

B 1 2 R1 = (A) A   A B  1  2  1 B(r) A B  1  2  1  2 A(r) r Decomposition • R = (A, B)의 분해 • A(r) B(r) R2 = (B)

Decomposition Lemma {R1,...,Rn} is a lossless decomposition if 1) R1∩R2 → R1, or 2) R1∩R2 → R2 • 즉, 분해한 두 개의 schema중 하나가 다른 하나의 key를 포함하면 연관 관계의 손실이 없다. r1 r2 r1 c a 1 c a a 1 d a 1 d a b 2 r e b 2 e b b 3 f b 3 f b r2

Dependency Preservation • Dependency Preservation 이름 → 시, 도 시 → 도 • dependency preserve하는 경우 이름 → 시 시 → 도 => 이름 → 시, 도

Dependency Preservation • dependency preserve하지 않는 경우 이름 → 시 이름 → 도 • lossless decomposition • but “시 → 도”를test 하려면 join필요 • => dependency를 preserve하지 못함.

Dependency Preservation Definition F : set of FD on R. {R1, ..., Rn} : decomposition of R. Fi : restriction of F to Ri ⇒ Set of all F.D.s in F+ that include only attributes of Ri Definition Let F'=F1∪…∪ Fn. The decomposition is dependency-preserving if F+ = F'+ 각각의 Ri에서 검증되는 FD만으로 원래의 FD를 보장

Dependency Preservation • Decomposition시 주의 사항 • lossless decomposition • 다음 종속중인 적어도 하나가 F+에 있도록 - R1  R2  R1 - R1  R2  R2 2) dependency preserving 각각의 Ri에서 검증되는 FD만으로 원래의 FD를 보장  그렇지 않으면, 데이터 갱신 時 FD 위배 검사 비용이 많이 든다. 3) reduce repetition of information 릴레이션R1과 R2가 BCNF 또는 3NF에 있도록 해야 한다.

예제 • R = (A , B, C) • F = {A  B, B  C} • R1 = (A, B), R2 = (B, C) - lossless join decomposition R1  R2 = {B} 이고 B  BC - dependency preserving • R1 = (A, B), R2 = (A, C) - lossless join decomposition R1  R2 = {A} 이고 A  AB - not dependency preserving (R1 R2를 계산해 보지 않고는 B  C를 체크할 수 없다)

Normal Form • Boyce-Codd Normal Form (BCNF) • For every nontrivial FD X→Y in F+, • Non-Trivial또는 Trivial의 의미? • Y  X 일 때, X→Y은 trivial • X is a superkey of R.

BCNF의 예제 • R = (A , B, C) • F = {A  B, B  C} Key = {A} • R은 BCNF가 아니다 • 분해 R1 = (A, B), R2 = (B, C) - R1과 R2는 BCNF이다

Normal Form • Third-Normal Form ∀ X→Y ∈ F+ at least one of the following is true ① X → Y is trivial. ② X is superkey for R. ③ Each attribute in Y is contained in a candidate key for R.

Normal Form 예제 Lending = (b_name, b_city, asset, c_name, loan#, amount) b_name → asset b_city loan# → amount b_name Branch = (b_name, b_city, asset) → BCNF Loan = (c_name, loan#, amount, b_name) → not BCNF Loan = (loan#, amount, b_name) → BCNF Borrower = (c_name, loan#) → BCNF Branch, Loan, Borrower → 전체가 BCNF

Normal Form 예제 R(street, city, zip) street city→ zip zip → city /* R is in 3NF but not in BCNF (nontrivial & zip is not key) */ R1(street, zip) R2(city, zip) /* violate 하는 FD의 attribute로 하나의 relation을 */ Now R1, R2 are in BCNF but not dependency-preserving. => cannot achieve both BCNF and dependency-preserving.

Normal Form 예제 • Relation R(A, B, C, D)에 대해, • FD F = {AB → C, C → D, D → A}가 성립 • 릴레이션R은 BCNF인가? 아니면 BCNF를 만족하도록 분해하시오. • 릴레이션R은 3NF인가? 아니면 3NF를 만족하도록 분해하시오. C B A R1 = (A B C) R2 = (C D) D ?

A B C C D E Normal Form • Normal Form 의 기능 또는 의미 • 단위 entity는 각각 독립된 relation으로! • attribute간의 dependency는 key에만 존재하도록! • BCNF vs. 3NF • 항상 dependency-preserving한 BCNF존재하는 것은 아님 • 3NF는 BCNF보다 약간 조건을 풀어 준 형태 • 항상 dependency preserving한 3NF를 구할 수 있다. • Why BCNF? => 3NF에도 redundant case 존재 • Why do we want dependency preserving decomposition ? => easy integrity constraint enforcement R1 R2 • FD: A → E • join해야 test가능 • dependency preserving일 경우 각 table을 따로 test해도 보장

관계형데이터베이스설계 • Design Goals 1. BCNF 2. lossless join 3. dependency preserving If not possible 1. 3NF 2. lossless join 3. dependency preserving

주요포인트 1. Not every set of FD's can be represented by a BCNF schema. 2. It is computationally very difficult to determine if a given relation is in BCNF. • It is always possible to obtain a dependency-preserving decomposition into 3NF • 3NF, BCNF 만드는 법 익힐 것. • Normal Form 의 기능 또는 의미 • 단위 entity는 각각 독립된 relation으로! • attribute간의 dependency는 key에만 존재하도록!

7장 보충

Exercise R(c_id, e_id, b_name, type) e_id b_name c_id, b_name e_id Key = (c_id, e_id) /* R is in 3NF but not in BCNF */ c_id, e_id R (c_id, b_name)+ = (c_id, b_name, e_id, b_name, type) so, (c_id, b_name) is a candidate key. => R is in 3NF

Exercise • R(a, b, c) • FD = {bc} • FD를 enforce하는 assertion 만들기 CREATE ASSERTION FD CHECK (NOT EXISTS (SELECT b FROM r GROUP BY b HAVING COUNT(DISTINCT c) > 1 ) )

Exercise • Candidate key 구하기 R = (A, B, C, D, E) F = { A->BC, CD->E, B->D, E->A} F+ = { A->B, A->C, A->D A->CD->E B->D => BC->CD, … } So, candidate key= A, CD, BC, E

Exercise R = (branch-name, branch-city, assets, customer-name, loan#, amount) F = {branch-name->assets branch-city loan#-> amount branch-name} Key = {loan#, customer-name} R1 = {branch-name, assets, branch-city} R2 = {customer-name, loan#, amount, branch-name}

Exercise R = (학번, 과목#, 과목명, 등급) F = { 과목#->과목명 학번,과목#->등급 } • BCNF결과 • R1 = (과목#, 과목명) • R2 = (학번, 등급, 과목#)

6. 관계형 데이터베이스 설계

6. 관계형 데이터베이스 설계

Presentation Transcript

Lab 6-6

6-6

Lab 6-6

6/6/2014

6/6/2014

8.11 6 6

6-6

6-6

6-6

6/6/14

Illustration 6/2-6/6 Livoti

6-2-6

Eph. 6:6

LE 6-6

6-6

P-6-6

6/6/14

Lesson 6-6

Lesson 6-6

6-6

6-6

6-6