Program 3

1 / 19

# Program 3 - PowerPoint PPT Presentation

BCNF vs 3NF Program 3 BCNF Written recursively Pseudo code: doBCNF(schema r , fds) Let fd = BCNF violating dependency in fds if fd is null then return else split into two tables table1 = fd[domain] U fd[range] table2 = r – fd[range] doBCNF(table2, fds) BCNF Results

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Program 3

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
BCNF
• Written recursively
• Pseudo code:

doBCNF(schema r , fds)

Let fd = BCNF violating dependency in fds

if fd is null then return

else

split into two tables

table1 = fd[domain] U fd[range]

table2 = r – fd[range]

doBCNF(table2, fds)

BCNF Results
• Average time taken to do decomposition
• 36ms
• Pros:
• Easy to code
• Fast
• Cons:
• Fds lost (see sample inputs)
BCNF Sample Inputs
• Example From Class
• Input:
• R = {A,B,C,D,E,G}
• F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG}
• Output:
• { {C,A}, {D,E,G}, {B,C,D} }
• Lost Fds:
• ACD->B, AB->C,BE->C,GC->BD,CE->AG
Another BCNF Example
• From Class Also
• Input:
• R = {C,T,H,R,S,G}
• F = {CS->G, C->T, HR->C, HS->R, HT->R}
• Output:
• { {C,S,G}, {C,T}, {H,R,C}, {H,R,S} }
• Lost Fds
• HT->R
Beers Example (from book)
• Input:
• F = { {beer -> manf}, {bar -> bar_addr license}, {person -> person_addr, phone}, {bar, beer -> price } }
• Output:
• Lost Fds
• None
Movies Example
• Input:
• F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} }
• Output
• Lost Fds
3NF Synthesis
• Implementation
• Broke into modules that performed a separate step in the algorithm
• Average Time Taken: 200ms
• Pros:
• Lossless
• Cons:
• Slow and complex
3NF Example 1
• Example From Class
• Input:
• R = {A,B,C,D,E,G}
• F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG}
• Output:
• { {A,B,C}, {B,C,D}, {D,E,G}, {B,E,C}, {C,G,B}, {C,E,G} }
3NF Example 2
• From Class Also
• Input:
• R = {C,T,H,R,S,G}
• F = {CS->G, C->T, HR->C, HS->R, HT->R}
• Output:
• { {S,C,G}, {C,T}, {R,H,C}, {S,H,R}, {T,H,R} }
3NF Example 3
• Input:
• F = { {beer -> manf}, {bar -> bar_addr license}, {person -> person_addr, phone}, {bar, beer -> price } }
• Output:
• Same as BCNF
3NF Example 4
• Input:
• F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} }
• Output
Hybrid Approach
• First use 3NF to generate temporary schemas
• Next, use BCNF on each of the temporary schemas to further decompose
• Eliminate possible redundancies
• Average Time: 227ms (which is to be expected)
Hybrid Example 1
• Example From Class
• Input:
• R = {A,B,C,D,E,G}
• F = {AB->C, C->A, BC->D, ACD->B, D->EG, BE->C,CG->BD,CE->AG}
• Output:
• { {C,A}, {B,C,D}, {D,E,G}, {E,B,C}, {G,C,B}, {E,C,G} }
• Lost Fds
• ?
Hybrid: Movies
• Input:
• F = { {Movie,Year -> Director,Studio,Duration,BoxOfficeRevenue, Awards, Rating}, {Director, Studio, Year -> Movie}, {LeadActor,Year -> Movie} {LeadActress, Year -> Movie} }
• Output
• Same as 3NF
Data Mining
• Artificial Neural Network
• Weka Implementation
• ANN with error back propagation
• Oracle-XE Database
• Java Connectivity
• Oracle Thin Driver
• Get database data with java and use Weka to build ANN and classify
Data Set
• Census Data
• Predict Income Class of US citizens given census data.
• Income Class: <=\$50,000 or >\$50,000
• For simplicity (difficult to predict exact income)
• > 30,000 tuples
• CSV:
• http://cs.uga.edu/~mcknight/nlp/data.csv
• http://128.192.101.74:9090/apex
Results
• ANN with 27 internal nodes, 55 input nodes
• Training Data:
• 66% of original data used for training
• the remaining 33% used as test set for classification
• Training Time: 2050.91 seconds
• Error:

Correctly classified: 84.0665 %

Mean absolute error: 0.1698

Root mean squared error: 0.3558

Relative absolute error: 46.6323 %

Root relative squared error: 83.9026 %