Yuanyuan Sun Feiteng Yang

Using SIMD Registers and instructions to Enable Instruction-Level Parallelism in Sorting Algorithms Yuanyuan Sun Feiteng Yang

Source • Source ACM Symposium on Parallel Algorithms and Architectures Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures • Authors Timothy Furtak José Nelson Amaral Robert Niewiadomski

Outline • Introduction • Sorting network • Sorting algorithms • Experimental evaluation • Contributions

Introduction • Use SIMD resources to improve the performance of sorting algorithms for short sequence. • Initial inspiration: need for Fast sorting of short sequences implementation of Graphics rendering in interactive video game • SIMD machineries

Introduction SIMD machineries • X86-64’s SSE2 (Streaming SIMD Extensions 2) • G5’s AltiVec AltiVec,SSE2: SIMD instruction sets, both feature 128-bit vector registers

Sorting network • a comparator network produces a sorted output for any possible input sequence. • COMP(a, b) — the inputs are two storage units: memory locations, registers, or vector-register elements — a and b, each containing a numerical input.

Sorting network • Size: the total number of comparators in the network. • Depth: the length of the critical path in its dependence graph.

Sorting network

Sorting network • A comparator moves the larger value to the left, and the smaller value to the right. • For instance, Figure1 size=5,width=3; Inputs: a = 7, b = 2, c = 5, d = 9 Output: a = 9, b = 7, c = 5, d = 2.

Supporting hardware for Sorting Network • The comparator required by a sorting network is easily constructed using these two operations, a copy instruction, and a temporary variable. • Min and max instructions min(a, b) = a : a ≤ b b : otherwise max(a, b) = a : a ≥ b b : otherwise

Supporting hardware for Sorting Network • x86-64 architectures supports the SSE2 min and max operations that return the minimum (maximum) packed single-precision floating-point values.

Supporting hardware for Sorting Network • Width: the number of vectors being sorted. x86-64 has 16 XMM vector registers, and each register can hold 4 floating-point values. Sorting the values in n XMM registers using a sorting network produces 4 sorted streams of data of length n. 1 ≤ n < 16, one register must be reserved as temporary storage for the swap of values.

Three sorting methods • Two pass sorting with insertion sorting • Two pass sorting with merge sorting • One pass sorting (Register sorting)

Tow pass sorting • In the first phase the SIMD registers and instructions are used to generate a partially-sorted output. • In the second phase a standard sorting algorithm — insertion sort and mergesort are investigated in this paper — finishes the sorting.

First phase: SIMD sort Vector registers …… After SIMD sort:

Second phase • Insertion sort • Merge sort A1<A5<A9 A2<A6<A10 A3<A7<A11 A4<A8<A12 A1<A2<A3 A4<A5<A6 A7<A8<A9 A10<A11<A12

One pass sorting (Register sorting) • Algorithm input • Initial state • Align a set of comparators • Write values back to memory

4-elements example • P1={comp(a,c) comp(b,d)} • P2={comp(a,b) comp(c,d)} • P3={comp(b,c)}

One concrete example

SSE2 instructions used

The method is also applied to sort Key-pointer pairs and D-heaps.

Evaluation

Contributions • Effectively use SIMD resources to improve performance of sorting short sequence through the reduction of memory references and increases in ILP.

Contributions • 1.three algorithms that use the SIMD machinery for efficient in-register sorting of short sequences • 2.a method to use iterative-deepening search to find fast instruction sequences to move data within the SIMD registers • 3.an extensive experimental study that indicates the elimination of loads, stores, branches correlates well with improvement performance.

Yuanyuan Sun Feiteng Yang

Yuanyuan Sun Feiteng Yang

Presentation Transcript

Sun

Sun Yang Department of International Cooperation National Energy Administration

Sun

Sun

Sun

SUN

SUN

Sun

Conica, Cui Yuanyuan

Yuanyuan Zhao, Chunyang He, Yang yang Beijing Normal University, Beijing, China, 100875

—— Yuhang High School, Shen Yuanyuan

Sun

SUN

Sun

sun

Sun, glorious sun!!!

Sun

SUN