
An Efficient External Sorting Algorithm for Flash Memory Embedded Devices





  1. An Efficient External Sorting Algorithm for Flash Memory Embedded Devices Tyler Cossentine - M.Sc. Thesis Defense

  2. Overview
     • Introduction
     • Previous work
     • Flash MinSort
     • Experimental results
     • Conclusions

  3. Introduction
     Embedded systems are devices that perform a few simple functions. Embedded devices typically have limited power, memory and computational resources. Many embedded systems applications involve storing and querying large datasets. Sorting algorithms are commonly used in query processing.

  4. Embedded Devices
     Not designed to be general-purpose devices.
     • Wireless sensor networks, smart cards, etc.
     Can communicate with other devices through wired or wireless interfaces.
     Hardware constraints:
     • Battery powered
     • Low-power microcontroller
     • Limited memory (as little as 1 kB)
     • Small amount of local storage (flash or EEPROM)

  5. Sensor Networks
     Sensor networks are used in military, environmental, agricultural and industrial applications. A wireless sensor node contains a microcontroller, a sensing system, local storage, a battery and a wireless radio. Devices may process data locally or send it to a common collection point (sink) for processing. On-device data storage and query processing has the potential to reduce communication and energy use [6][8].

  6. Flash Memory
     A type of EEPROM:
     • Available in higher capacities
     • Organized as pages of data
     • A page must be erased before it is written
     • The erase unit is typically a block of pages
     Two types: NOR and NAND
     • NOR memory supports byte-level reads
     • NAND requires an error-correcting code (ECC)
     Unique performance characteristics:
     • Asymmetric read and write costs (reads are 10-100 times faster)
     • Low-cost random reads
     • Memory wear
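The erase-before-write rule can be illustrated with a toy model. This is a minimal Python sketch, not a device driver: the `FlashBlock` class and its page and block sizes are hypothetical. It assumes the usual flash behavior that programming can only change bits from 1 to 0, while an erase, which works on a whole block, resets every bit to 1.

```python
class FlashBlock:
    """Toy model of one flash erase block (hypothetical sizes)."""

    def __init__(self, pages_per_block: int = 4, page_size: int = 8) -> None:
        self.page_size = page_size
        # Erased flash reads as all 1 bits (0xFF).
        self.pages = [bytearray(b"\xff" * page_size) for _ in range(pages_per_block)]
        self.erase_count = 0  # each erase adds wear to the block

    def erase(self) -> None:
        """Reset every page in the block to 0xFF (the erase unit is the block)."""
        for page in self.pages:
            page[:] = b"\xff" * self.page_size
        self.erase_count += 1

    def program(self, page_no: int, data: bytes) -> None:
        """Write one page; only 1 -> 0 bit transitions are allowed."""
        page = self.pages[page_no]
        if len(data) != self.page_size:
            raise ValueError("data must fill exactly one page")
        for i, new in enumerate(data):
            if new & ~page[i] & 0xFF:  # some 0 bit would have to become 1
                raise ValueError(f"byte {i}: erase the block before rewriting")
        page[:] = data

    def read(self, page_no: int) -> bytes:
        return bytes(self.pages[page_no])


blk = FlashBlock()
blk.program(0, bytes([0x0F] * 8))      # allowed: only clears bits
try:
    blk.program(0, bytes([0xF0] * 8))  # needs 0 -> 1 transitions
    rewrite_ok = True
except ValueError:
    rewrite_ok = False                 # rejected without an erase
blk.erase()                            # whole block back to 0xFF
blk.program(0, bytes([0xF0] * 8))      # succeeds after the erase
```

The erase counter hints at why writes are costly on flash: every rewrite of a page implies erasing its whole block, which adds wear.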

  7. Flash Memory: Memory Array (figure) [1]

  8. Flash Memory: Block Diagram (figure) [1]

  9. Relation (figure)

  10. Sorting Algorithms
      Sorting is a fundamental class of algorithms because it enables efficient ordering of results, joins, grouping and aggregation.
      An in-memory sort can be performed when the entire dataset fits into memory:
      • Merge sort
      • Quicksort
      External sorting:
      • Uses external memory (e.g., a hard disk) to sort the dataset
      • External merge sort is the standard in databases

  11. Previous Work: One Key Scan
      The most memory-efficient external sorting algorithm is one key scan [2].
      • Performs D+1 scans, where D is the number of distinct sort key values.
      • Keeps track of:
         • current: the sort key value being output in this scan.
         • split: the next smallest sort key value encountered (the smallest value greater than current).
      • Needs an initial scan to determine the values of current and split.
      • Requires only enough memory to store two sort key values.
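A minimal Python sketch of one key scan follows. It assumes the relation is a list of pages and that each tuple is a hypothetical `(key, payload)` pair. Apart from the output, it keeps only `current` and `split`, and it counts scans so the D+1 figure can be checked.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical tuple layout: (sort key, payload)


def one_key_scan(pages: Sequence[Sequence[Row]]) -> Tuple[List[Row], int]:
    """Return the tuples in key order and the number of scans performed."""
    # Initial scan: find the smallest key, which becomes `current`.
    current: Optional[int] = None
    for page in pages:
        for key, _ in page:
            if current is None or key < current:
                current = key
    scans = 1
    out: List[Row] = []  # on a device, tuples would be streamed, not stored
    while current is not None:
        split: Optional[int] = None  # smallest key greater than `current`
        for page in pages:
            for row in page:
                key = row[0]
                if key == current:
                    out.append(row)
                elif key > current and (split is None or key < split):
                    split = key
        scans += 1
        current = split
    return out, scans


pages = [[(3, "a"), (1, "b")], [(2, "c"), (1, "d")]]
out, scans = one_key_scan(pages)  # D = 3 distinct keys, so 4 scans
```

Every scan reads the whole relation, which is why the cost grows with the number of distinct keys rather than with the available memory.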

  12. Previous Work: Heap Sort
      A heap sort algorithm called FAST(1) [7] uses a binary heap of N tuples to store the next smallest tuples encountered during a scan.
      • Performs T/N scans, where T is the number of tuples and N is the number of tuples that fit into memory.
      • Requires enough memory to store at least one tuple.
      • May be slower than one key scan if there are few distinct sort key values, the tuple size is large or the dataset is large.
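The following Python sketch illustrates the bounded-heap idea behind heap sort. It is not the published FAST(1) code. It assumes hypothetical `(key, payload)` tuples and a heap capacity of `n` tuples, and it breaks ties by each tuple's position in the relation so that duplicate keys are neither lost nor output twice across passes. Because `heapq` is a min-heap, entries are negated so that the root is the largest kept candidate, which is the one evicted when a smaller tuple arrives.

```python
import heapq
from typing import List, Optional, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical tuple layout: (sort key, payload)


def heap_sort_passes(pages: Sequence[Sequence[Row]], n: int) -> Tuple[List[Row], int]:
    """Sort by repeated scans, keeping at most n tuples in memory per scan."""
    if n < 1:
        raise ValueError("the heap must hold at least one tuple")
    total = sum(len(page) for page in pages)
    out: List[Row] = []
    last: Optional[Tuple[int, int]] = None  # (key, position) of last output tuple
    passes = 0
    while len(out) < total:
        heap: list = []  # entries (-key, -position, row); root = largest kept
        pos = 0
        for page in pages:
            for row in page:
                cand = (row[0], pos)
                pos += 1
                if last is not None and cand <= last:
                    continue  # already output in an earlier pass
                item = (-cand[0], -cand[1], row)
                if len(heap) < n:
                    heapq.heappush(heap, item)
                elif item > heap[0]:  # smaller than the largest kept tuple
                    heapq.heapreplace(heap, item)
        passes += 1
        batch = sorted(heap, reverse=True)  # ascending by (key, position)
        out.extend(row for _, _, row in batch)
        last = (-batch[-1][0], -batch[-1][1])
    return out, passes


pages = [[(3, "a"), (1, "b")], [(2, "c"), (1, "d")]]
out, passes = heap_sort_passes(pages, n=2)  # ceil(4 / 2) = 2 passes
```

The pass count depends only on T and N, not on the key distribution, which matches the later observation that heap sort performs the same number of passes on random, real and ordered data.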

  13. Previous Work: External Merge Sort
      External merge sort [5] is the standard sorting algorithm used in databases.
      • An initial read pass constructs sorted sublists, each the size of the RAM allocated to the operator.
      • The merge phase can consist of multiple passes.
      • Each merge pass buffers one page from each sublist being merged, performs the merge and writes a temporary result to flash.
      • The algorithm requires at least three pages of memory.
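A simplified Python sketch of external merge sort with a memory budget of `mem_pages` pages (a hypothetical parameter) is shown below. Sorted runs are kept as Python lists and the page-at-a-time buffering is abstracted away; on a device each run would be written to flash. Run generation counts as the first pass, and each merge pass combines up to `mem_pages - 1` runs, leaving one buffer page for output.

```python
import heapq
from typing import List, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical tuple layout: (sort key, payload)


def external_merge_sort(pages: Sequence[Sequence[Row]], mem_pages: int) -> Tuple[List[Row], int]:
    """Return the sorted tuples and the number of passes over the data."""
    if mem_pages < 3:
        raise ValueError("external merge sort needs at least three pages of memory")

    def by_key(row: Row) -> int:
        return row[0]

    # Pass 1: fill memory with mem_pages pages, sort them, write a run.
    runs: List[List[Row]] = []
    for i in range(0, len(pages), mem_pages):
        chunk = [row for page in pages[i:i + mem_pages] for row in page]
        runs.append(sorted(chunk, key=by_key))
    passes = 1
    # Merge passes: one input buffer per run plus one output buffer.
    fan_in = mem_pages - 1
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[j:j + fan_in], key=by_key))
                for j in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes


keys = [5, 3, 8, 1, 9, 2, 7, 4]
pages = [[(k, f"t{k}")] for k in keys]  # one tuple per page, for brevity
out, passes = external_merge_sort(pages, mem_pages=3)  # 3 runs -> 2 -> 1
```

Unlike the scan-based algorithms, every merge pass writes its output back to storage, which is the cost that matters on flash.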

  14. Previous Work: Summary
      External merge sort requires writes and a significant amount of memory, which rules it out for some embedded applications.
      Existing sorting algorithms for datasets stored in flash memory favor reads over writes.
      Existing sorting algorithms do not take advantage of low-cost random reads.
      Performance depends on the properties of the input dataset.
      Data collected in applications such as sensor networks is often clustered spatially and temporally.

  15. Flash MinSort: Overview
      Flash MinSort [3] uses low-cost random reads to retrieve only the pages required during a scan of the relation.
      It builds a dynamic index over the relation that stores the minimum value in each region. A region represents one or more pages of data.
      The algorithm maintains a current minimum value and a next minimum value.
      During a pass, only pages located in a region whose minimum value equals the current minimum are read.

  16. Flash MinSort: Overview
      While a region is being read, the algorithm tracks the next smallest value in the region (next) and the position of the next tuple with the current value (nextIdx).
      After a region has been read, its minimum value in the index is updated.
      Flash MinSort adapts to the size of the input relation and caches pages when given additional memory.
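The core loop of Flash MinSort can be sketched in Python as follows. This is a simplified illustration, not the thesis implementation. It assumes hypothetical `(key, payload)` tuples and regions of `pages_per_region` adjacent pages, and it scans a whole region per visit instead of tracking `nextIdx` within a page buffer. It counts page reads so the effect of the min index can be seen.

```python
import math
from typing import List, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical tuple layout: (sort key, payload)


def flash_minsort(pages: Sequence[Sequence[Row]], pages_per_region: int = 1) -> Tuple[List[Row], int]:
    """Return the sorted tuples and the number of page reads."""
    regions = [range(s, min(s + pages_per_region, len(pages)))
               for s in range(0, len(pages), pages_per_region)]
    reads = 0
    # Initial scan: build the min index (one key per region).
    index: List[float] = []
    for region in regions:
        reads += len(region)
        keys = [row[0] for p in region for row in pages[p]]
        index.append(min(keys, default=math.inf))
    out: List[Row] = []
    current = min(index, default=math.inf)
    while current != math.inf:
        for r, region in enumerate(regions):
            if index[r] != current:
                continue  # no tuple with the current key here: skip the read
            nxt = math.inf  # smallest key in this region above `current`
            for p in region:
                reads += 1  # random read of one page
                for row in pages[p]:
                    if row[0] == current:
                        out.append(row)
                    elif current < row[0] < nxt:
                        nxt = row[0]
            index[r] = nxt  # the region's new minimum
        current = min(index)
    return out, reads


pages = [
    [(1, "a"), (9, "b"), (9, "c"), (1, "d")],
    [(2, "e"), (1, "f"), (2, "g"), (1, "h")],
    [(5, "i"), (5, "j"), (5, "k"), (5, "l")],
]
out, reads = flash_minsort(pages)
```

For these three pages the index lets the sort finish with 8 page reads (3 to build the index, 5 during the passes). With `pages_per_region=3` there is a single region, so every pass reads all pages and the count rises to 15, which is what one key scan needs for D = 4 distinct values (D+1 = 5 scans of 3 pages). This mirrors the point that MinSort generalizes one key sort.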

  17. Flash MinSort: Example
      (Figure: dataset pages, min index, page buffer and output. In the trace below, each region is one page.)
      • Output #1: scan the min index and find 1 in region #1; search page #1; output tuple #1 (next = 9, nextIdx = 4).
      • Output #2: output tuple #4; region #1 minimum set to 9.
      • Output #3: find 1 in region #7; search page #7; output tuple #2 (next = 2, nextIdx = 4).
      • Output #4: output tuple #4; region #7 minimum set to 2.
      • Output #5: find 1 in region #8; search page #8; output tuple #1 (next = ∞, nextIdx = 2).
      • Output #6: output tuple #2 (next = ∞, nextIdx = 3).
      • Output #7: output tuple #3 (next = ∞, nextIdx = 4).
      • ...
      Output: 1 (pg. 1, tuple 1), 1 (pg. 1, tuple 4), 1 (pg. 7, tuple 2), 1 (pg. 7, tuple 4), 1 (pg. 8, tuple 1), 1 (pg. 8, tuple 2), 1 (pg. 8, tuple 3), 1 (pg. 8, tuple 4), 2 (pg. 6, tuple 4), 2 (pg. 7, tuple 1), 2 (pg. 7, tuple 3), ...

  18. Flash MinSort: Performance
      In the ideal case, each region represents a single page. The amount of memory required to store the minimum value of each page is LK × P, where LK is the size of the sort key and P is the number of pages.
      If there is not enough memory, each region represents two or more adjacent pages. The minimum amount of memory required is 4 × LK, for two regions.
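The memory figures on this slide can be turned into a small sizing helper. This Python sketch is hypothetical: it assumes the index holds one minimum per region plus two extra keys (the current and next minimum), so R regions need (R + 2) × LK bytes, which gives the slide's 4 × LK minimum for R = 2. The example numbers come from the experimental setup: 10,000 records of 16 bytes in 512-byte pages (32 per page, assuming tuples do not span pages), a 2-byte key and a 2 kB operator budget.

```python
import math


def pages_for(num_tuples: int, tuple_bytes: int, page_bytes: int) -> int:
    """Pages needed when tuples do not span page boundaries."""
    per_page = page_bytes // tuple_bytes
    return math.ceil(num_tuples / per_page)


def plan_regions(num_pages: int, key_bytes: int, mem_bytes: int) -> tuple:
    """Return (pages per region NP, number of regions R) for a memory budget.

    Assumed sizing rule (hypothetical): (R + 2) * key_bytes <= mem_bytes,
    i.e. one minimum per region plus the current and next minimum.
    """
    max_regions = mem_bytes // key_bytes - 2
    if max_regions < 2:
        raise ValueError("need at least 4 * key_bytes of memory")
    np_ = max(1, math.ceil(num_pages / max_regions))
    return np_, math.ceil(num_pages / np_)


P = pages_for(10_000, 16, 512)
print(P, P * 2, plan_regions(P, 2, 2048), plan_regions(P, 2, 100))
# 313 626 (1, 313) (7, 45)
```

Under these assumptions, a one-page-per-region index for the real dataset needs 313 × 2 = 626 bytes, well within 2 kB; with only 100 bytes the pages are grouped seven to a region, giving 45 regions.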

  19. Flash MinSort: Direct Reads
      If the flash chip supports direct byte reads, Flash MinSort is even more efficient because it only needs to read the sort key values.
      Performance parameters:
      • P = number of pages, T = number of tuples, NP = number of pages in a region
      • DR = average number of distinct values in a region, R = number of regions
      • LK = size of key in bytes, LT = size of tuple in bytes

  20. Flash MinSort: Comparison
      Considering only page reads, Flash MinSort is:
      • Faster than one key sort in all cases.
      • Faster than heap sort unless the input size is only a small multiple of the memory size (e.g., 2 to 5).
      • Faster than external merge sort for a wide range of possible configurations, even while using less memory and performing no writes.

  21. Experimental Evaluation
      The experimental evaluation compares Flash MinSort, one key sort, heap sort and external merge sort, with 2 kB of memory available to the operators.
      Sensor node hardware:
      • Atmel ATmega644p (8 MHz)
      • 4 kB SRAM
      • 2 MB Atmel AT45DB161D serial flash (512-byte page size)
      • The node design was used for field measurement of soil moisture for an automated irrigation controller [4].
      Dataset:
      • Three months of live soil sensing data, plus generated ordered and random datasets. The real dataset has 10,000 records (160 kB) and 43 distinct values.
      • Record size is 16 bytes; the sort key is a 2-byte integer.

  22. Raw Device Performance
      • Time to read 50,000 tuples: 5.3 seconds
      • Time to write 50,000 tuples: 23 seconds
      • Write-to-read ratio: 4.7
      • Time to scan 50,000 sort keys: 2.1 seconds
      Notes:
      • Buffering a page in processor memory is more efficient than using the on-chip buffers, due to bus communication and latency.
      • Bus speeds affect the write-to-read ratio. Although writing is considerably slower on the chip itself, the difference was masked by the speed of the processor and bus.

  23. Real Data
      Heap sort is not shown because its times are orders of magnitude longer:
      • 100 bytes (5 tuples): 10,000 passes, 3,377 seconds
      • 1,200 bytes (74 tuples): 302 seconds
      MinSortDR is the direct-read version of MinSort.
      External merge sort: 1,536 bytes (3 pages), 7 passes, 76 seconds.

  24. Random Data
      • Dataset with 10,000 records and 500 distinct values (1 to 500).
      • Heap sort performs the same number of passes regardless of the dataset (random, real or ordered).
      • External merge sort took 78 seconds, because sorting during initial run generation took slightly more time.

  25. Ordered Data
      • Sorted real dataset with 10,000 tuples and 43 distinct values.
      • MinSort did not detect sorted regions, but it still benefits from detecting duplicates of the same value within a region.
      • External merge sort took 75 seconds.

  26. Results Summary
      MinSort is faster than one key sort and heap sort, with or without direct byte reads from the device.
      • Especially good for sensor data that exhibits temporal clustering.
      • MinSort is a generalization of one key sort, and the performance of both algorithms depends on the number of distinct values.
      Heap sort is not competitive for small memory sizes.
      • The ratio of available RAM to dataset size is key.

  27. Results Summary
      External merge sort performs well, but requires at least three pages (1,536 bytes) of memory.
      • For the real dataset on this platform, external merge sort will never be faster than Flash MinSort, assuming it needs at least two passes.
      • For wireless sensing applications, handling the additional storage space and wear leveling complicates both system design and performance.

  28. Solid State Drives: Experimental Setup
      Solid state drives (SSDs) have sophisticated controllers that support wear leveling, address translation and buffer management.
      Test system:
      • AMD Opteron 2.1 GHz
      • 32 GB DDR3
      • Intel X25 SSD (1.6 write-to-read ratio)
      Data:
      • 5,000,000 tuples (80 MB)
      • 16-byte tuples

  29. Solid State Drives: Real Data (43 distinct sort key values)

  30. Solid State Drives: Random Data (500 distinct sort key values)

  31. Conclusion
      Flash MinSort is a sorting algorithm designed for datasets stored in flash memory on computationally constrained embedded devices.
      It outperforms existing algorithms by exploiting low-cost random reads.
      Depending on the properties of the dataset, Flash MinSort can also outperform external merge sort on SSDs.

  32. References

  33. References
