SSE2

guido

Presentation Transcript


  1. SSE2 with a focus on floating point

  2. Supported data types • For floating point (i.e., real numbers), MASM supports: • real4 • single precision; IEEE standard; analogous to C's float • real8 • double precision; IEEE standard; analogous to C's double • real10 • double extended precision (80 bits, the x87 register format) • Not one of the IEEE standard's basic formats • NaN = Not a Number (see p. 4-14 of v1)
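
A small C illustration of the real4/real8 analogy. It assumes a typical x86 C compiler where float and double are the IEEE 754 single- and double-precision formats, which is true for GCC, Clang, and MSVC. The helper names are hypothetical. The block checks the two sizes and the defining NaN property: a NaN is unequal even to itself.

```c
#include <math.h>     /* NAN */
#include <stddef.h>   /* size_t */

/* Hypothetical helpers: C's float/double play the roles of real4/real8. */
size_t real4_size(void) { return sizeof(float); }   /* 4 bytes, like real4 */
size_t real8_size(void) { return sizeof(double); }  /* 8 bytes, like real8 */

/* NaN compares unequal to everything, including itself. */
int nan_is_unequal_to_itself(void)
{
    volatile double x = NAN;   /* volatile discourages constant folding */
    return x != x;             /* 1 for a NaN */
}
```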

  3. IEEE Standard 754 • SSE2 supports 32 and 64 bit f.p. data • x87 supports 32, 64, and 80 bit f.p. data
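
The 64-bit format that SSE2 operates on can be taken apart field by field. Below is a minimal C sketch. `split_double` and `double_fields` are hypothetical names. The layout is the IEEE 754 double format:

- 1 sign bit
- 11 exponent bits, biased by 1023
- 52 stored fraction bits, with the leading 1 implicit

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of an IEEE 754 double. */
typedef struct {
    unsigned sign;       /* 1 bit */
    unsigned exponent;   /* 11 bits, biased by 1023 */
    uint64_t fraction;   /* 52 stored bits; the leading 1 is implicit */
} double_fields;

double_fields split_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* reinterpret the bytes safely */

    double_fields f;
    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FF);
    f.fraction = bits & 0xFFFFFFFFFFFFFull;   /* low 52 bits */
    return f;
}
```

For example, 1.5 splits into sign 0, exponent 1023 (2^0), and a fraction with only its top bit set (the binary .1 after the implicit 1).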

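  4. Note: These are 24-bit binary numbers (single precision carries a 24-bit significand). Here they are in base 10: 2.00000000000000 and 1.99999988079071, i.e., 2 and the largest single-precision value below 2.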
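
The two numbers are adjacent single-precision values. This C sketch builds both from their bit patterns; `float_from_bits` is a hypothetical helper, and the sketch assumes float is IEEE 754 binary32. 0x40000000 encodes 2.0. 0x3FFFFFFF has all 24 significand bits set (1.111...1 in binary), which is 2 - 2^-23.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float. */
float float_from_bits(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* float_from_bits(0x40000000u) -> 2.0f
   float_from_bits(0x3FFFFFFFu) -> 2 - 2^-23, the float just below 2.0 */
```

Printing the second value with `printf("%.14f\n", float_from_bits(0x3FFFFFFFu))` gives the 1.99999988079071 shown on the slide.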

  5. SSE2

  6. SSE2 • SSE2 = Streaming SIMD Extensions 2 • SIMD = Single Instruction, Multiple Data • SSE2 was introduced in 2000 on Pentium 4 and Intel Xeon processors.

  7. History of SSE • 1996 Intel MMX • 1998 AMD 3DNow! • 1999 Intel SSE on P3 • 2000 Intel SSE2 on P4 • 2003 Intel SSE3 (since Prescott P4) • 2006 Intel Supplemental SSE3 (SSSE3, since Woodcrest Xeons) • 2006 Intel SSE4 (4.1 and 4.2) • 2007 AMD SSE5 (proposed 2007; reworked as XOP and FMA4, implemented 2011 in AMD Bulldozer) • 2008 Intel AVX (proposed 2008, implemented 2011 in Intel Sandy Bridge and AMD Bulldozer) • AVX widens the 128-bit XMM registers to 256 bits; the wider registers are called YMM.

  8. SSE2 and MASM • You must use MASM v6.15 or newer for SIMD support. (MASM v6.15 is available from the course software web page.) • You must enable MASM support for these instructions with the following directives:

        .686                    ; instructions for Pentium Pro (or better)
        .xmm                    ; allow SIMD instructions
        .model flat, stdcall    ; no crazy segments!

  9. SSE2 • Each of the 8 128-bit registers available in 32-bit mode (xmm0...xmm7) can hold: • 16 packed 1-byte integers • 8 packed word (2-byte) integers • 4 packed doubleword (4-byte) integers • 2 packed quadword (8-byte) integers • 1 double quadword (16 bytes) • 4 packed single-precision (4 bytes each) floating-point values • 2 packed double-precision (8 bytes each) floating-point values
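
One way to picture these layouts is a C union that overlays every packed view on the same 16 bytes. This is a sketch only. It assumes an x86/x86-64 compiler providing `<emmintrin.h>`. `xmm_view` and `make_pd` are hypothetical names, and the union is for illustration, not a model of how the hardware stores registers.

```c
#include <emmintrin.h>   /* SSE2 types and intrinsics (x86/x86-64) */
#include <stdint.h>

/* Hypothetical: one 128-bit value seen through each packed layout. */
typedef union {
    __m128i  as_int;   /* 1 double quadword (16 bytes) */
    __m128d  as_pd;    /* 2 packed doubles */
    uint8_t  u8[16];   /* 16 packed bytes */
    uint16_t u16[8];   /* 8 packed words */
    uint32_t u32[4];   /* 4 packed doublewords */
    uint64_t u64[2];   /* 2 packed quadwords */
    float    f32[4];   /* 4 packed singles */
    double   f64[2];   /* 2 packed doubles, element 0 = bits 63-0 */
} xmm_view;

xmm_view make_pd(double lo, double hi)
{
    xmm_view v;
    v.as_pd = _mm_set_pd(hi, lo);   /* note: _mm_set_pd takes (high, low) */
    return v;
}
```

For example, `make_pd(1.0, 2.0).u64[1]` is 0x4000000000000000, the bit pattern of 2.0 sitting in bits 127-64.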

  10. Packed instruction example

  11. Packed instruction example

  12. Scalar instruction example

  13. Scalar instruction example
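
The packed and scalar example slides were images that did not survive in this transcript. The C sketch below is a stand-in, not the original examples. It uses the intrinsics `_mm_add_pd` (addpd) and `_mm_add_sd` (addsd) from `<emmintrin.h>` and assumes an x86/x86-64 compiler. `packed_vs_scalar` is a hypothetical helper.

```c
#include <emmintrin.h>

/* Hypothetical: results are stored as {low lane, high lane}. */
void packed_vs_scalar(double out_packed[2], double out_scalar[2])
{
    __m128d a = _mm_set_pd(10.0, 1.0);   /* high = 10.0, low = 1.0 */
    __m128d b = _mm_set_pd(20.0, 2.0);   /* high = 20.0, low = 2.0 */

    /* Packed: both lanes are added -> {3.0, 30.0} */
    _mm_storeu_pd(out_packed, _mm_add_pd(a, b));

    /* Scalar: only the low lanes are added; a's high lane passes
       through unchanged -> {3.0, 10.0} */
    _mm_storeu_pd(out_scalar, _mm_add_sd(a, b));
}
```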

  17. IA32 Registers: • 8 32-bit GPRs • Integer only • 8 80-bit fp regs • Floating point only • 8 64-bit mmx regs • Integer only • Re-uses fp regs • 8 128-bit xmm regs • Integer and fp • These will be the focus of our discussion.

  18. XMM register formats

  19. Using the SSE2 registers • The utilities.asm MASM code (on the course’s software web page) contains a function, dumpXmm64, that displays the contents of all 8 xmm registers as pairs of 64-bit double-precision fp values. To use it: call dumpXmm64
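
For experimenting from C instead of MASM, a rough stand-in for dumpXmm64 might look like the sketch below. It is hypothetical in three ways:

- It prints a single `__m128d` value rather than all eight registers.
- It does not reproduce the output format of the course's utilities.asm routine.
- It assumes an x86 compiler with `<emmintrin.h>`.

```c
#include <emmintrin.h>
#include <stdio.h>

/* Hypothetical: extract both 64-bit lanes of an xmm value. */
void xmm64_lanes(__m128d v, double *lo, double *hi)
{
    *lo = _mm_cvtsd_f64(v);                       /* bits 63-0   */
    *hi = _mm_cvtsd_f64(_mm_unpackhi_pd(v, v));   /* bits 127-64 */
}

/* Hypothetical: print one value as a pair of doubles, high lane first. */
void dump_xmm64(const char *name, __m128d v)
{
    double lo, hi;
    xmm64_lanes(v, &lo, &hi);
    printf("%s: [127-64] = %g  [63-0] = %g\n", name, hi, lo);
}
```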

  20. Sample SSE2 instructions • Data movement • Arithmetic • Comparison • Conversion

  22. SSE2 data movement instructions • movhpd • Move High Packed Double-Precision Floating-Point Value • movlpd • Move Low Packed Double-Precision Floating-Point Value • movsd • Move Scalar Double-Precision Floating-Point Value

  23. SSE2 data movement instructions • movhpd - Move High Packed Double-Precision Floating-Point Value • for memory to XMM move: • DEST[127-64] ← SRC; DEST[63-0] unchanged • Ex. movhpd xmm0, m64 • for XMM to memory move: • DEST ← SRC[127-64] • Ex. movhpd m64, xmm2
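
In C, the two movhpd forms correspond to the intrinsics `_mm_loadh_pd` and `_mm_storeh_pd`. Intel documents both as movhpd, though an optimizing compiler may pick an equivalent instruction. This is a minimal sketch assuming an x86 compiler with `<emmintrin.h>`; `load_high` and `store_high` are hypothetical names.

```c
#include <emmintrin.h>

/* movhpd xmm, m64: DEST[127-64] <- m64; DEST[63-0] unchanged */
__m128d load_high(__m128d reg, const double *m64)
{
    return _mm_loadh_pd(reg, m64);
}

/* movhpd m64, xmm: m64 <- SRC[127-64] */
void store_high(double *m64, __m128d reg)
{
    _mm_storeh_pd(m64, reg);
}
```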

  24. SSE2 data movement instructions • movlpd - Move Low Packed Double-Precision Floating-Point Value • for memory to XMM move: • DEST[127-64] unchanged; DEST[63-0] ← SRC • Ex. movlpd xmm1, m64 • for XMM to memory move: • DEST ← SRC[63-0] • Ex. movlpd m64, xmm2
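
The movlpd counterparts are `_mm_loadl_pd` and `_mm_storel_pd`. The same assumptions as the movhpd sketch apply: x86, `<emmintrin.h>`, and hypothetical helper names.

```c
#include <emmintrin.h>

/* movlpd xmm, m64: DEST[63-0] <- m64; DEST[127-64] unchanged */
__m128d load_low(__m128d reg, const double *m64)
{
    return _mm_loadl_pd(reg, m64);
}

/* movlpd m64, xmm: m64 <- SRC[63-0] */
void store_low(double *m64, __m128d reg)
{
    _mm_storel_pd(m64, reg);
}
```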

  25. SSE2 data movement instructions • movsd - Move Scalar Double-Precision Floating-Point Value • when source and destination operands are both XMM registers: • DEST[127-64] remains unchanged; DEST[63-0] ← SRC[63-0] • Ex. movsd xmm1, xmm3 • when source operand is XMM register and destination operand is memory location: • DEST ← SRC[63-0] • Ex. movsd m64, xmm2 • when source operand is memory location and destination operand is XMM register: • DEST[127-64] ← 0000000000000000H; DEST[63-0] ← SRC • Ex. movsd xmm1, m64
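
The three movsd forms behave differently with the upper half. This matters most for the load from memory, which zeroes bits 127-64. Below is a sketch using the intrinsics that correspond to each form. It assumes x86 and `<emmintrin.h>`; the wrapper names are hypothetical.

```c
#include <emmintrin.h>

/* movsd xmm, xmm: DEST[63-0] <- SRC[63-0]; DEST[127-64] unchanged */
__m128d movsd_reg_reg(__m128d dest, __m128d src) { return _mm_move_sd(dest, src); }

/* movsd m64, xmm: m64 <- SRC[63-0] */
void movsd_mem_reg(double *m64, __m128d src) { _mm_store_sd(m64, src); }

/* movsd xmm, m64: DEST[63-0] <- m64; DEST[127-64] <- 0 */
__m128d movsd_reg_mem(const double *m64) { return _mm_load_sd(m64); }
```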

  26. Sample SSE2 instructions • Data movement • Arithmetic (scalar) • Comparison • Conversion

  27. SSE2 scalar arithmetic instructions • addsd - Add Scalar Double-Precision Floating-Point Values • subsd - Subtract Scalar Double-Precision Floating-Point Values • mulsd - Multiply Scalar Double-Precision Floating-Point Values • divsd - Divide Scalar Double-Precision Floating-Point Values • There is also sqrtsd, but SSE2 has no sine or cosine instructions! For those we have to use the x87 instructions (fsin, fcos).
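
sqrtsd follows the same scalar pattern as the other four. The intrinsic `_mm_sqrt_sd(a, b)` computes the square root of b's low lane and copies a's high lane into the result. Below is a minimal sketch assuming x86 and `<emmintrin.h>`; `sqrt_low` is a hypothetical wrapper.

```c
#include <emmintrin.h>

/* sqrtsd: low lane <- sqrt(src[63-0]); high lane taken from keep_high */
__m128d sqrt_low(__m128d keep_high, __m128d src)
{
    return _mm_sqrt_sd(keep_high, src);
}
```

For example, `sqrt_low(_mm_set_pd(8.0, 0.0), _mm_set_pd(100.0, 16.0))` yields a low lane of 4.0 and a high lane of 8.0.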

  28. SSE2 scalar arithmetic instructions • addsd • DEST[63-0] ← DEST[63-0] + SRC[63-0] • DEST[127-64] remains unchanged

  29. SSE2 scalar arithmetic instructions • subsd • DEST[63-0] ← DEST[63-0] − SRC[63-0] • DEST[127-64] remains unchanged

  30. SSE2 scalar arithmetic instructions • mulsd • DEST[63-0] ← DEST[63-0] * SRC[63-0] • DEST[127-64] remains unchanged

  31. SSE2 scalar arithmetic instructions • divsd • DEST[63-0] ← DEST[63-0] / SRC[63-0] • DEST[127-64] remains unchanged
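
The four scalar definitions share one property: only bits 63-0 change. This C sketch applies addsd, subsd, mulsd, and divsd (via `_mm_add_sd`, `_mm_sub_sd`, `_mm_mul_sd`, `_mm_div_sd`) to the same operands. It shows DEST[127-64] passing through every time. It assumes x86 and `<emmintrin.h>`; `scalar_ops` is hypothetical.

```c
#include <emmintrin.h>

/* Hypothetical: each result stored as {low lane, high lane}. */
void scalar_ops(double out[4][2])
{
    __m128d dest = _mm_set_pd(99.0, 8.0);   /* high = 99.0, low = 8.0 */
    __m128d src  = _mm_set_pd(-1.0, 2.0);   /* high ignored, low = 2.0 */

    _mm_storeu_pd(out[0], _mm_add_sd(dest, src));   /* {10.0, 99.0} */
    _mm_storeu_pd(out[1], _mm_sub_sd(dest, src));   /* { 6.0, 99.0} */
    _mm_storeu_pd(out[2], _mm_mul_sd(dest, src));   /* {16.0, 99.0} */
    _mm_storeu_pd(out[3], _mm_div_sd(dest, src));   /* { 4.0, 99.0} */
}
```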

  32. Sample SSE2 instructions • Data movement • Arithmetic (packed) • Comparison • Conversion

  33. SSE2 packed arithmetic instructions • addpd - Add Packed Double-Precision Floating-Point Values • subpd - Subtract Packed Double-Precision Floating-Point Values • mulpd - Multiply Packed Double-Precision Floating-Point Values • divpd - Divide Packed Double-Precision Floating-Point Values

  34. SSE2 packed arithmetic instructions • addpd - Add Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] + SRC[63-0] • DEST[127-64] ← DEST[127-64] + SRC[127-64]

  35. SSE2 packed arithmetic instructions • subpd - Subtract Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] − SRC[63-0] • DEST[127-64] ← DEST[127-64] − SRC[127-64]

  36. SSE2 packed arithmetic instructions • mulpd - Multiply Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] * SRC[63-0] • DEST[127-64] ← DEST[127-64] * SRC[127-64]

  37. SSE2 packed arithmetic instructions • divpd - Divide Packed Double-Precision Floating-Point Values • DEST[63-0] ← DEST[63-0] / SRC[63-0] • DEST[127-64] ← DEST[127-64] / SRC[127-64]
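
The packed versions apply the same operation to both lanes at once. Below is a sketch with `_mm_add_pd`, `_mm_sub_pd`, `_mm_mul_pd`, and `_mm_div_pd`. It assumes x86 and `<emmintrin.h>`; `packed_ops` is a hypothetical helper, and the operands are chosen so that every result is exact.

```c
#include <emmintrin.h>

/* Hypothetical: each result stored as {low lane, high lane}. */
void packed_ops(double out[4][2])
{
    __m128d dest = _mm_set_pd(12.0, 8.0);   /* high = 12.0, low = 8.0 */
    __m128d src  = _mm_set_pd(3.0, 2.0);    /* high =  3.0, low = 2.0 */

    _mm_storeu_pd(out[0], _mm_add_pd(dest, src));   /* {10.0, 15.0} */
    _mm_storeu_pd(out[1], _mm_sub_pd(dest, src));   /* { 6.0,  9.0} */
    _mm_storeu_pd(out[2], _mm_mul_pd(dest, src));   /* {16.0, 36.0} */
    _mm_storeu_pd(out[3], _mm_div_pd(dest, src));   /* { 4.0,  4.0} */
}
```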

  38. Sample SSE2 instructions • Data movement • Arithmetic • Comparison • Conversion

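  39. SSE2 compare instruction • comisd • Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS • Compares the low 64 bits of the two operands and sets ZF, PF, and CF the way an unsigned integer compare would, so it is typically followed by ja, jae, jb, jbe, je, or jne • An unordered result (a NaN operand) sets ZF, PF, and CF all to 1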
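
From C, comisd is reachable through the `_mm_comi*_sd` intrinsics, each of which returns the outcome of one flag test as an int. Below is a minimal three-way compare. It assumes x86 and `<emmintrin.h>`; `compare_low` is a hypothetical helper and deliberately does not handle NaN inputs.

```c
#include <emmintrin.h>

/* Hypothetical: -1, 0, or 1 for a < b, a == b, a > b (no NaN handling). */
int compare_low(double a, double b)
{
    __m128d va = _mm_set_sd(a);             /* low = a, high = 0 */
    __m128d vb = _mm_set_sd(b);
    if (_mm_comilt_sd(va, vb)) return -1;   /* comisd, then a "below" test */
    if (_mm_comigt_sd(va, vb)) return 1;    /* comisd, then an "above" test */
    return 0;
}
```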

  40. Sample SSE2 instructions • Data movement • Arithmetic • Comparison • Conversion

  41. SSE2 conversion instructions • cvtsd2si • Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer • cvtsi2sd • Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value

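  42. SSE2 conversion instructions • cvtsd2si • Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer • DEST[31-0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63-0]) • The result is rounded according to the MXCSR rounding mode (round to nearest by default); cvttsd2si truncates toward zero instead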

  43. SSE2 conversion instructions • cvtsi2sd • Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value • DEST[63-0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31-0]) • DEST[127-64] remains unchanged
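
The two conversions map to `_mm_cvtsd_si32` (cvtsd2si) and `_mm_cvtsi32_sd` (cvtsi2sd). The sketch also uses `_mm_cvttsd_si32` (cvttsd2si) to contrast rounding with truncation.

Assumptions:

- an x86 target with `<emmintrin.h>`;
- the default MXCSR rounding mode (round to nearest, ties to even);
- the wrapper names are hypothetical.

```c
#include <emmintrin.h>

/* cvtsd2si: rounds using MXCSR (nearest-even by default) */
int to_int_rounded(double d) { return _mm_cvtsd_si32(_mm_set_sd(d)); }

/* cvttsd2si: always truncates toward zero, like a C cast */
int to_int_truncated(double d) { return _mm_cvttsd_si32(_mm_set_sd(d)); }

/* cvtsi2sd: DEST[63-0] <- (double)i; DEST[127-64] unchanged */
__m128d int_to_low(__m128d keep_high, int i) { return _mm_cvtsi32_sd(keep_high, i); }
```

Under the default mode, `to_int_rounded(2.5)` gives 2 and `to_int_rounded(3.5)` gives 4 (ties go to the even integer). In contrast, `to_int_truncated(-2.7)` gives -2.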

  44. The end! (We are only “scratching the surface.”)
