A Configurable High-Throughput Linear Sorter System

Jorge Ortiz, Information and Telecommunication Technology Center, 2335 Irving Hill Road, Lawrence, KS, jorgeo@ku.edu.


Presentation Transcript


  1. A Configurable High-Throughput Linear Sorter System • Jorge Ortiz, Information and Telecommunication Technology Center, 2335 Irving Hill Road, Lawrence, KS, jorgeo@ku.edu • David Andrews, Computer Science and Computer Engineering, The University of Arkansas, 504 J.B. Hunt Building, Fayetteville, AR, dandrews@uark.edu

  2. Introduction

  3. Introduction • Sorting is an important system function • Popular sorting algorithms are not efficient or fast in hardware implementations • Linear sorters are ideal for hardware, but sort at a rate of only 1 value per cycle • Sorting networks offer better throughput, but at a high area and latency cost • A better solution is needed for high-throughput, low-latency sorting

  4. Contributions • Expanding the linear sorter implementation and making it versatile, reconfigurable and better suited for streaming input and output • Parallelizing the linear sorter for increased throughput • Implementing the high-throughput linear sorter, and outmatching the performance of current linear sorter approaches

  5. Background

  6. Background • Software quicksort, mergesort and heapsort use divide-and-conquer techniques to achieve efficiency • Hardware sorting plagued with overhead from data movements, synchronization, bookkeeping and memory accesses • Need better use of concurrent data comparisons and swaps, rather than the extended execution of multiple assembly instructions like its software counterpart

  7-13. Sorting Networks • Swap comparators sort pairs of values • Sink lowest value, then operate on remaining Sn-1 items • Receive parallel data at inputs • High #PE and latency, re-sort with each new insertion • Bubble Sort example (animation over slides 7-13), input 3 2 5 4 1: 3 2 5 4 1 → 2 3 5 4 1 → 2 3 4 5 1 → 2 3 4 1 5 → 2 3 1 4 5 → 2 1 3 4 5 → 1 2 3 4 5
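The bubble sort example on these slides (input 3 2 5 4 1) can be replayed in software. The following is a minimal sketch, not the authors' hardware: it models a swap comparator as a function on adjacent list positions and records the state after every swap. The names `compare_swap` and `bubble_sort_trace` are illustrative assumptions.

```python
from typing import List


def compare_swap(values: List[int], i: int) -> bool:
    """Swap comparator: put values[i] and values[i + 1] in ascending order.

    Returns True if a swap took place.
    """
    if values[i] > values[i + 1]:
        values[i], values[i + 1] = values[i + 1], values[i]
        return True
    return False


def bubble_sort_trace(values: List[int]) -> List[List[int]]:
    """Bubble sort that returns the input followed by the state after each swap."""
    state = list(values)
    trace = [list(state)]
    # Each pass moves one value to its final place, so the unsorted
    # prefix shrinks by one item per pass.
    for unsorted_len in range(len(state), 1, -1):
        for i in range(unsorted_len - 1):
            if compare_swap(state, i):
                trace.append(list(state))
    return trace


trace = bubble_sort_trace([3, 2, 5, 4, 1])
for step in trace:
    print(step)
```

In this software model the comparators run one after another; a hardware sorting network would evaluate independent comparators in parallel, which is where the processing-element count and latency cost noted on the slides come from.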

  14-20. Linear Sorters • Sorted insertions • Forwards incoming value to all nodes • Each node shifts autonomously depending on neighbors' values • Single clock latency, small logic & regular structure • Streaming input & output • Serial input, need higher throughput • Example (animation over slides 14-20), inserting 3, 2, 5, 4, 1 one value per cycle; sorter contents: 3 → 2 3 → 2 3 5 → 2 3 4 5 → 1 2 3 4 5 • Output: 1 2 3 4 5
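The per-node decision described on these slides can be sketched behaviorally. This is a software model under stated assumptions, not the authors' node design: empty nodes are represented by `None` and compare as larger than any value, the sorter keeps ascending order, and every node computes its next state from a snapshot of the old state to mimic a single clock edge. The function name `insert` and the overflow behavior are illustrative choices.

```python
from typing import List, Optional


def insert(nodes: List[Optional[int]], value: int) -> None:
    """Insert one value into a linear sorter in a single simulated clock cycle.

    The value is broadcast to all nodes; each node looks only at its own
    content and its left neighbor's content to decide what to do.
    """
    if nodes and nodes[-1] is not None:
        raise OverflowError("linear sorter is full")
    old = list(nodes)  # snapshot: all nodes update simultaneously
    for i in range(len(nodes)):
        own_greater = old[i] is None or old[i] > value
        left_greater = i > 0 and (old[i - 1] is None or old[i - 1] > value)
        if not own_greater:
            continue                 # own value sorts before the new one: hold
        if left_greater:
            nodes[i] = old[i - 1]    # left neighbor also shifts: take its value
        else:
            nodes[i] = value         # this is the insertion point: load new value


sorter: List[Optional[int]] = [None] * 5
for v in [3, 2, 5, 4, 1]:
    insert(sorter, v)
    print(sorter)
```

Because each node needs only two comparisons and its neighbor's value, the structure stays regular as the depth grows, but only one value enters per cycle.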

  21. Configurable Linear Sorter

  22. Configurable Linear Sorter • Increase versatility for linear sorters • Configurable: linear sorter depth; sorting direction; sort on tags (for example, timestamps) rather than data; user-defined data and tag size

  23. Configurable Linear Sorter • Increase functionality for linear sorters • Detect full conditions • Buffer input while full • Retrieve output serially for streaming • Delete top value, freeing nodes • Augment with left shift functionality • Test tags before deleting them
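The configuration options and added functions listed on these two slides can be pictured with a small behavioral model. This is only a sketch under assumptions: the class name `ConfigurableSorter`, its method names, and the choice to admit one buffered entry after each deletion are hypothetical, and the model uses a Python list rather than hardware nodes.

```python
from collections import deque
from typing import Any, Deque, List, Optional, Tuple

Entry = Tuple[int, Any]  # (tag, data): entries are ordered by tag, not by data


class ConfigurableSorter:
    def __init__(self, depth: int, descending: bool = False) -> None:
        self.depth = depth              # configurable sorter depth
        self.descending = descending    # configurable sorting direction
        self.nodes: List[Entry] = []
        self.pending: Deque[Entry] = deque()  # input buffered while full

    def full(self) -> bool:
        return len(self.nodes) >= self.depth

    def _before(self, a: int, b: int) -> bool:
        return a > b if self.descending else a < b

    def _place(self, entry: Entry) -> None:
        pos = 0
        while pos < len(self.nodes) and not self._before(entry[0], self.nodes[pos][0]):
            pos += 1
        self.nodes.insert(pos, entry)

    def insert(self, tag: int, data: Any = None) -> None:
        if self.full():
            self.pending.append((tag, data))
        else:
            self._place((tag, data))

    def top_tag(self) -> Optional[int]:
        """Tag at the output end, so a caller can test it before deleting."""
        return self.nodes[0][0] if self.nodes else None

    def delete_top(self) -> Entry:
        entry = self.nodes.pop(0)       # remaining entries shift left
        if self.pending:                # a freed node admits buffered input
            self._place(self.pending.popleft())
        return entry


sorter = ConfigurableSorter(depth=3)
for tag in [5, 7, 6, 2]:
    sorter.insert(tag, data=f"payload-{tag}")
print(sorter.full(), sorter.top_tag(), list(sorter.pending))
print(sorter.delete_top(), sorter.top_tag())
```

A caller that sorts on timestamps might compare `top_tag()` against the current time and call `delete_top()` only when the entry is due, which corresponds to testing tags before deleting them.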

  24-39. Extended Linear Sorter System • [Animation: tags are inserted one per clock cycle into a six-node linear sorter (Node 1 to Node 6) while sorted output is retrieved serially. Figure labels: Clock Cycles, Inserted Tags, Sorted Output. The first tags inserted, in clock cycles 0 to 4, are 5, 7, 6, 2, 1; the animation runs through clock cycle 15. Slides 33-35 carry the title "Interleaved Linear Sorter System".]

  40. Sorter Node Architecture

  41. Interleaved Linear Sorter

  42. Interleaved Linear Sorter • Increase throughput by using multiple linear sorters with parallel inputs • Interleave parallel inputs into linear sorters through modulo arithmetic • Distribute data evenly among linear sorters to avoid full conditions • Service each linear sorter in round-robin fashion to re-sort their outputs
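The interleaving and round-robin servicing described on this slide can be sketched as follows. This is a simplified model, not the authors' system: each linear sorter is stood in for by a sorted Python list, the class name `InterleavedSorter` is hypothetical, and the round-robin drain only yields globally sorted output under the assumption that tags are consecutive integers starting at 0 (as with sequence numbers). Only the first eight tags in the example come from the slides; the rest of the input order is made up for illustration.

```python
import bisect
from typing import Iterator, List


class InterleavedSorter:
    def __init__(self, width: int) -> None:
        self.width = width
        # One sorted list per linear sorter (lane).
        self.lanes: List[List[int]] = [[] for _ in range(width)]

    def insert(self, tag: int) -> None:
        # Modulo arithmetic selects the lane; insort keeps the lane sorted.
        bisect.insort(self.lanes[tag % self.width], tag)

    def drain(self) -> Iterator[int]:
        # Visit lanes in round-robin order, taking each lane's smallest tag.
        # Assumption: tags are dense (0, 1, 2, ...), so the next tag in
        # global order always sits at the head of the next lane.
        lane = 0
        while any(self.lanes):
            if self.lanes[lane]:
                yield self.lanes[lane].pop(0)
            lane = (lane + 1) % self.width


ils = InterleavedSorter(width=4)
# First eight tags as on the slides; the remaining order is assumed.
for tag in [13, 4, 3, 10, 7, 5, 1, 6, 2, 9, 12, 15, 0, 11, 14, 8]:
    ils.insert(tag)
print(ils.lanes)
result = list(ils.drain())
print(result)
```

With evenly distributed tags each lane receives the same share of the input, which is why the slide stresses even distribution as the way to avoid full conditions.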

  43. Interleaved Linear Sorter • ILS width = 4 • Four parallel inputs: {13, 04, 03, 10} • After interleaving mod 4: {04, 13, 10, 03}, at linear sorter indices [00, 01, 02, 03] • But what happens for inputs {07, 05, 01, 06}?
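The question on this slide can be checked with a few lines of arithmetic: 13, 4, 3, 10 have distinct residues mod 4, while 5 and 1 both map to linear sorter 1 and no input maps to sorter 0. The sketch below routes one cycle of inputs and reports which tags collide. The function name `route_cycle` and the policy of letting the first tag in input order win while deferring the rest are assumptions; the slides only label colliding tags as preempted.

```python
from typing import Dict, List, Tuple


def route_cycle(inputs: List[int], width: int) -> Tuple[Dict[int, int], List[int]]:
    """Route one cycle of parallel inputs to linear sorters by tag mod width.

    Returns (accepted, deferred): accepted maps sorter index to the tag it
    receives this cycle; deferred lists tags that lost a collision.
    """
    accepted: Dict[int, int] = {}
    deferred: List[int] = []
    for tag in inputs:
        lane = tag % width
        if lane in accepted:
            deferred.append(tag)   # assumed policy: later tag waits a cycle
        else:
            accepted[lane] = tag
    return accepted, deferred


first = route_cycle([13, 4, 3, 10], 4)
second = route_cycle([7, 5, 1, 6], 4)
print(first)
print(second)
```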

  44-50. Interleaved Linear Sorter • [Animation: four linear sorters (LS 0 to LS 3) receive four tags per clock cycle. Figure labels: Clock Cycles, Inserted Tags, with tags marked Inserted or Preempted. Clock cycle 0 inserts 13, 4, 3, 10; clock cycle 1 presents 7, 5, 1, 6. In the final frame each sorter holds the tags congruent to its index mod 4: LS 0: 0 4 8 12; LS 1: 1 5 9 13; LS 2: 2 6 10 14; LS 3: 3 7 11 15.]
