Advance Database Systems and Applications COMP 6521

Professor: Dr. Gosta Grahne Lab Instructor: Ashkan Azarnik Group 5 Deyvid William Romeo Honvo Venkatesh S R. Advance Database Systems and Applications COMP 6521. Contents. Project 1 External Sorting Algorithm, 2PMMS Implementation Project 2

Presentation Transcript


  1. Professor: Dr. Gosta Grahne Lab Instructor: Ashkan Azarnik Group 5 Deyvid William Romeo Honvo Venkatesh S R Advance Database Systems and Applications COMP 6521

  2. Contents • Project 1: External Sorting Algorithm, 2PMMS Implementation • Project 2: Mining Frequent Itemsets from Secondary Memory • Part 1: Problem Analysis & Algorithm Consideration • Part 2: Algorithm Description & Design Principles

  3. Project 1: External Sorting Algorithm, 2PMMS Implementation

  4. Problem Statement

  5. PROJECT 1 • Develop a program which sorts numbers in ascending order using Two-Phase Multiway Merge Sort (2PMMS) with a limit of 5 MB of virtual memory. • External sorting is required when the data being sorted does not fit into the main memory of a computing device and must instead reside in slower external memory (usually a hard drive).

  6. Two-Phase Multiway Merge-Sort (2PMMS) Solution [Diagram: Unsorted File -> Phase 1 -> Sorted Runs -> Phase 2 -> Sorted File]

  7. Approach to the problem • In the 1st phase, chunks of data that fit in main memory are read, sorted using the built-in sort function from the Arrays class (Java), and written out to temporary files. • In the 2nd phase (merging), the sorted temporary files are combined with a multiway merge into a single larger file.
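The merging phase described on this slide can be sketched as a k-way merge driven by a priority queue. This is a minimal illustration, not the group's actual code: the class name `RunMerger`, the one-integer-per-line file format, and the use of `PriorityQueue` are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of phase 2: merge sorted run files (one integer per line). */
final class RunMerger {

    /** The current smallest unread value of one run, plus the reader it came from. */
    private static final class Head implements Comparable<Head> {
        final int value;
        final BufferedReader source;

        Head(int value, BufferedReader source) {
            this.value = value;
            this.source = source;
        }

        @Override
        public int compareTo(Head other) {
            return Integer.compare(value, other.value);
        }
    }

    /** Merges all sorted runs into a single sorted output file. */
    static void merge(List<Path> runs, Path output) throws IOException {
        PriorityQueue<Head> heap = new PriorityQueue<>();
        List<BufferedReader> readers = new ArrayList<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            // Seed the heap with the first value of every run.
            for (Path run : runs) {
                BufferedReader reader = Files.newBufferedReader(run, StandardCharsets.UTF_8);
                readers.add(reader);
                pushNext(heap, reader);
            }
            // Repeatedly emit the global minimum and refill from the same run.
            while (!heap.isEmpty()) {
                Head smallest = heap.poll();
                out.write(Integer.toString(smallest.value));
                out.newLine();
                pushNext(heap, smallest.source);
            }
        } finally {
            for (BufferedReader reader : readers) {
                reader.close();
            }
        }
    }

    /** Reads the next line of a run, if any, and adds it to the heap. */
    private static void pushNext(PriorityQueue<Head> heap, BufferedReader reader)
            throws IOException {
        String line = reader.readLine();
        if (line != null) {
            heap.add(new Head(Integer.parseInt(line.trim()), reader));
        }
    }
}
```

Only one value per run sits in the heap at any time; the rest of the memory use comes from each reader's internal buffer, so the number of runs that can be merged in one pass is bounded by the memory budget.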

  8. Challenges Faced: Which algorithm to choose? • After a few tests, we decided to use the built-in sort function from Java, which implements a tuned quicksort algorithm. • This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance. • Efficient average case compared to other sort algorithms. • A buffer of size 750,000 was used for the 1st phase. • newBufferedReader from Java 7 was used to read files.
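A minimal sketch of the first phase, using the pieces named on this slide (`Files.newBufferedReader`, `Arrays.sort`, and a fixed-size buffer). The class name `RunGenerator`, the run file names, and the one-integer-per-line input format are assumptions for illustration. If the buffer holds `int` values, as assumed here, 750,000 entries occupy about 3 MB, which would fit under the 5 MB limit; the slides do not say what the buffer actually held.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of phase 1: cut the input into sorted runs. */
final class RunGenerator {

    /** Reads the input in chunks of bufferSize integers and writes one sorted run per chunk. */
    static List<Path> createRuns(Path input, Path tempDir, int bufferSize) throws IOException {
        List<Path> runs = new ArrayList<>();
        int[] buffer = new int[bufferSize]; // reused for every chunk
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            int count = 0;
            String line;
            while ((line = in.readLine()) != null) {
                buffer[count++] = Integer.parseInt(line.trim());
                if (count == bufferSize) {
                    runs.add(writeRun(buffer, count, tempDir, runs.size()));
                    count = 0;
                }
            }
            if (count > 0) { // last, partially filled chunk
                runs.add(writeRun(buffer, count, tempDir, runs.size()));
            }
        }
        return runs;
    }

    /** Sorts the first count entries in place and writes them to a run file. */
    private static Path writeRun(int[] buffer, int count, Path tempDir, int index)
            throws IOException {
        Arrays.sort(buffer, 0, count);
        Path run = tempDir.resolve("run-" + index + ".txt");
        try (BufferedWriter out = Files.newBufferedWriter(run, StandardCharsets.UTF_8)) {
            for (int i = 0; i < count; i++) {
                out.write(Integer.toString(buffer[i]));
                out.newLine();
            }
        }
        return run;
    }
}
```

A call such as `RunGenerator.createRuns(input, tempDir, 750_000)` would mirror the buffer size quoted on the slide.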

  9. List of Data Structures • Primitive Types: Boolean, Integer, Long • Abstract Types: Array, String • Arrays (Linear Data Structure): Integer Array, Boolean Array, Long Array • I/O: newBufferedReader

  10. Project 2: Mining Frequent Itemsets from Secondary Memory • Develop an application that computes the frequent itemsets of all sizes (pairs, triples, quadruples, etc.) from a set of transactions, based on an input support threshold percentage.

  11. Algorithms Considered: FP-Growth vs Eclat • Eclat uses a purely vertical representation, whereas FP-Growth combines both vertical and horizontal representations in its FP-tree structure. • FP-Growth takes a lot of memory and is more difficult to implement than Eclat.

  12. ECLAT • Better execution time • Memory efficient • Basic algorithm • Very good for dense datasets • Requires less memory than FP-Growth • Map of BitSets
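The "map of BitSets" idea can be sketched as follows: each item maps to a `BitSet` of the transaction ids that contain it, and the support of a larger itemset is the cardinality of the intersection of its items' BitSets. This is a textbook-style Eclat sketch under stated assumptions, not the group's implementation: the class name `EclatSketch`, the in-memory transaction list, and the space-separated itemset keys are all hypothetical.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical Eclat sketch over a vertical (item -> transaction ids) layout. */
final class EclatSketch {

    /** Returns each frequent itemset (items sorted, space-separated) with its support count. */
    static Map<String, Integer> mine(List<String[]> transactions, double minSupportPercent) {
        int minCount = (int) Math.ceil(minSupportPercent * transactions.size() / 100.0);

        // Build the vertical layout: bit t is set when transaction t contains the item.
        Map<String, BitSet> tidsets = new TreeMap<>();
        for (int tid = 0; tid < transactions.size(); tid++) {
            for (String item : transactions.get(tid)) {
                tidsets.computeIfAbsent(item, key -> new BitSet()).set(tid);
            }
        }

        // Keep only the frequent single items, in sorted order.
        List<String> items = new ArrayList<>();
        List<BitSet> tids = new ArrayList<>();
        for (Map.Entry<String, BitSet> entry : tidsets.entrySet()) {
            if (entry.getValue().cardinality() >= minCount) {
                items.add(entry.getKey());
                tids.add(entry.getValue());
            }
        }

        Map<String, Integer> result = new LinkedHashMap<>();
        extend("", items, tids, minCount, result);
        return result;
    }

    /** Depth-first extension: intersect the BitSet of item i with every later item. */
    private static void extend(String prefix, List<String> items, List<BitSet> tids,
                               int minCount, Map<String, Integer> result) {
        for (int i = 0; i < items.size(); i++) {
            String itemset = prefix.isEmpty() ? items.get(i) : prefix + " " + items.get(i);
            result.put(itemset, tids.get(i).cardinality());

            List<String> nextItems = new ArrayList<>();
            List<BitSet> nextTids = new ArrayList<>();
            for (int j = i + 1; j < items.size(); j++) {
                BitSet both = (BitSet) tids.get(i).clone();
                both.and(tids.get(j)); // transactions containing the itemset plus item j
                if (both.cardinality() >= minCount) {
                    nextItems.add(items.get(j));
                    nextTids.add(both);
                }
            }
            if (!nextItems.isEmpty()) {
                extend(itemset, nextItems, nextTids, minCount, result);
            }
        }
    }
}
```

Because support counting is a bitwise AND followed by a population count, no rescan of the transactions is needed once the BitSets are built, which is one plausible reason for the execution-time claim on the slide.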

  13. ECLAT Implementation: List of Data Structures • Primitive Types: Boolean, Integer, Double • Abstract Types: Hash Map, String • Arrays: Array List (Dynamic), Bit Set (Bit Array), String Array

  14. ECLAT Implementation 1. Scan the original file and find the frequent items. 2. Generate n partitions (files) that contain groups of frequent items. 3. Read every file, register items/transactions, then find the frequent itemsets and write them to the output file.

  15. ECLAT Implementation • Divide-and-conquer approach • Algorithm based on the concepts of Diskmine and Projection described in the Professor's paper "Mining Frequent Itemsets from Secondary Memory" • The large database is decomposed into a number of small databases to be processed • Each database contains a percentage of the frequent items and all greater items in the same transaction
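One way to read the decomposition described on this slide is sketched below: the frequent items are split into contiguous groups, and each transaction contributes to a group's partition the suffix that starts at its first item belonging to that group, so the partition holds that item and all greater items. This is an interpretation of the slide, not the Diskmine algorithm from the paper or the group's code; the class name `ProjectionSketch`, integer item ids ordered numerically, and in-memory lists standing in for partition files are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of projecting a transaction database into partitions. */
final class ProjectionSketch {

    /**
     * @param transactions each transaction as an array of item ids
     * @param frequent     the frequent item ids, sorted ascending and distinct
     * @param groupSize    how many frequent items each partition is responsible for (> 0)
     * @return one list of projected transactions per group of frequent items
     */
    static List<List<int[]>> partition(List<int[]> transactions, int[] frequent, int groupSize) {
        int groups = (frequent.length + groupSize - 1) / groupSize;
        List<List<int[]>> partitions = new ArrayList<>();
        for (int g = 0; g < groups; g++) {
            partitions.add(new ArrayList<>());
        }

        for (int[] transaction : transactions) {
            // Drop infrequent items and put the rest in ascending order.
            int[] kept = Arrays.stream(transaction)
                    .filter(item -> Arrays.binarySearch(frequent, item) >= 0)
                    .sorted()
                    .distinct()
                    .toArray();

            int lastGroup = -1;
            for (int pos = 0; pos < kept.length; pos++) {
                int group = Arrays.binarySearch(frequent, kept[pos]) / groupSize;
                if (group != lastGroup) {
                    // First item of this group in the transaction: keep it and all greater items.
                    partitions.get(group).add(Arrays.copyOfRange(kept, pos, kept.length));
                    lastGroup = group;
                }
            }
        }
        return partitions;
    }
}
```

Under this reading, every itemset whose smallest item belongs to a given group can be counted from that group's partition alone, so the partitions can be mined one at a time within the memory budget.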

  16. ECLAT Implementation 1. Scan the original file and find the frequent items. 2. Generate n partitions (files) that contain groups of frequent items. 3. Read every file, register items/transactions, then find the frequent itemsets and write them to the output file.

  17. ECLAT Implementation Improved 1. Scan the original file and find the frequent items. 2. Generate n partitions that contain groups of frequent items, grouped based on their frequency. 3. Read every file, register items/transactions, then find the frequent itemsets and write them to the output file.

  18. Thanks! Merci!
