Vector Units and Quaternions

Jim Van Verth

Red Storm Entertainment

jimvv@redstorm.com

About This Talk
  • Will discuss how to do quaternion math on PS2
  • Assume that you already know and want to use quaternions
  • Assume that you already know something about how the VU works
About Me
  • Lead engineer at Red Storm Entertainment
  • Not a quaternion god
  • Not a vector unit god
  • Not really familiar with VCL
  • Just a 3D guy trying to get by…
About the code
  • Most examples written in macro mode (VU0)
  • Easy to translate to micro mode
  • Examples that would be faster in micro mode are discussed separately
Matrices on PS2
  • PS2 is really well set up to do matrices
  • Multiplies are highly parallel
  • Not so good for quaternions
Matrix Multiply
  • This is what we’re up against
  • Takes 4/7 cycles to transform a point
  • Takes 16/19 cycles to concat matrices (9/12 cycles for 3x3 matrix)

vmulax ACC, vf2, vf1x
vmadday ACC, vf3, vf1y
vmaddaz ACC, vf4, vf1z
vmaddw vf6, vf5, vf1w
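As a reference for what those four instructions compute, here is a scalar C sketch. It assumes, as the register usage suggests, that vf2 through vf5 hold the matrix rows and vf1 holds the point as a row vector, so the result is a weighted sum of rows; `vec4` and `transform_point` are illustrative names, not part of any PS2 library.

```c
/* Illustrative 4-float type standing in for one VU register. */
typedef struct { float x, y, z, w; } vec4;

/* result = p.x*row[0] + p.y*row[1] + p.z*row[2] + p.w*row[3]
   One multiply (vmulax) seeds the accumulator; three multiply-adds
   (vmadday, vmaddaz, vmaddw) add in the remaining scaled rows. */
static vec4 transform_point(vec4 p, const vec4 row[4])
{
    const float s[4] = { p.x, p.y, p.z, p.w };
    vec4 acc = { s[0] * row[0].x, s[0] * row[0].y,
                 s[0] * row[0].z, s[0] * row[0].w };
    for (int i = 1; i < 4; ++i) {
        acc.x += s[i] * row[i].x;
        acc.y += s[i] * row[i].y;
        acc.z += s[i] * row[i].z;
        acc.w += s[i] * row[i].w;
    }
    return acc;
}
```

On the VU each of those steps updates all four lanes in a single instruction, which is why the whole transform issues in four cycles.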

Why Quaternions?
  • Quaternions take up less space: 4 floats vs. 9 (best case)
  • Quaternions interpolate well
  • Easier to correct floating point drift (renormalize vs. Gram-Schmidt orthogonalization)
Quaternions on VU
  • Fit very well
  • Four floats, aligned to a 16-byte (128-bit) boundary
  • Work just like a homogeneous point
  • Make sure they are stored (x,y,z,w), not (w,x,y,z)
Quaternion Multiplication
  • If quaternion is (x, y, z, w) or (v, w), then $q_1 q_2 = (w_1 v_2 + w_2 v_1 + v_1 \times v_2,\; w_1 w_2 - v_1 \cdot v_2)$
  • Uses only standard vector operations
    • Add, scale, dot product, cross product
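The product formula maps directly onto scalar code, which is handy as a reference when checking a VU implementation. This C sketch assumes the (x, y, z, w) storage order recommended on the previous slide; `quat` and `quat_mul` are illustrative names.

```c
/* Quaternion in VU order: vector part (x, y, z), scalar part w. */
typedef struct { float x, y, z, w; } quat;

/* q1*q2 = (w1*v2 + w2*v1 + v1 x v2,  w1*w2 - v1.v2) */
static quat quat_mul(quat a, quat b)
{
    quat r;
    r.x = a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y);
    r.y = a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z);
    r.z = a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x);
    r.w = a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z);
    return r;
}
```

A quick sanity check is Hamilton's rule: i·j = k, j·i = -k, and i·i = -1.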
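Quaternion Mult on PS2
  • q1 in vf1, q2 in vf2, product in vf3
  • Interleaves dot product and rest via accumulator
  • Takes advantage of linearity of cross product
  • Cycle count: 8/11
  • Fewer cycles than a matrix concat (16/19)!

vmul vf3, vf1, vf2
vopmula.xyz acc, vf1, vf2
vmaddaw.xyz acc, vf2, vf1w
vmaddaw.xyz acc, vf1, vf2w
vsubaz.w acc, vf3, vf3z
vmsubax.w acc, vf0, vf3x
vmsuby.w vf3, vf0, vf3y
vopmsub.xyz vf3, vf2, vf1

(The .w instructions read the component products in vf3.xyz, so they must run before vopmsub.xyz overwrites them.)

$w = w_1 w_2 - v_1 \cdot v_2$

$v = w_1 v_2 + w_2 v_1 + v_1 \times v_2$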

Vector Rotation
  • Formula for vector rotation: $p' = q\,p\,q^{-1}$, with p treated as a quaternion whose w is 0
  • Two multiplies take 16 cycles, plus the inverse
  • Can do better
Vector Rotation, Take Two
  • If q is normalized, then can do: $p' = (v \cdot p)\,v + w^2 p + 2w\,(v \times p) + v \times (v \times p)$
  • This is faster than two straight multiplies on a serial processor
  • Faster on a vector processor, too!
Vector Rotation on VU
  • p in vf1, q in vf2
  • Register roles: vf5 = v × p, vf6w = w², vf7w = 2w, vf11w = v · p

vmul vf11, vf1, vf2
vopmula.xyz acc, vf2, vf1
vopmsub.xyz vf5, vf1, vf2
vmul.w vf6w, vf2w, vf2w
vadd.w vf7w, vf2w, vf2w
vmulax.w accw, vf0w, vf11x
vmadday.w accw, vf0w, vf11y
vmaddz.w vf11w, vf0w, vf11z
vopmula.xyz acc, vf2, vf5
vmaddaw.xyz acc, vf5, vf7w
vmaddaw.xyz acc, vf1, vf6w
vmaddaw.xyz acc, vf2, vf11w
vopmsub.xyz vf3, vf5, vf2

$p' = (v \cdot p)\,v + w^2 p + 2w\,(v \times p) + v \times (v \times p)$

  • First part builds all the pieces
  • Second part adds ’em all together
  • Cycles: 13/16
  • Better than two straight multiplies (16 cycles plus the inverse)
  • Worse than a matrix transform (4/7)


Full Transforms
  • Combination of translation vector t, quat r, 3 scale factors s
  • Once again, want to transform point
  • Basic formula: $p' = r\,(s \odot p)\,r^{-1} + t$ (scale componentwise, rotate, then translate)
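A scalar C sketch of the scale-rotate-translate transform. The order used here (scale componentwise, rotate by r, then add t) is inferred from the Point Transformation VU code, which scales vf1 first and adds the translation last. `xform` and `xform_point` are hypothetical names, and r is assumed to be unit length.

```c
#include <math.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } quat;   /* (x, y, z, w) order, unit length */

/* Hypothetical transform record: translation t, rotation r, per-axis scale s. */
typedef struct { vec3 t; quat r; vec3 s; } xform;

static vec3 cross3(vec3 a, vec3 b)
{
    vec3 c = { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
    return c;
}

/* p' = r (s*p) r^-1 + t */
static vec3 xform_point(const xform *T, vec3 p)
{
    vec3 sp = { T->s.x * p.x, T->s.y * p.y, T->s.z * p.z };   /* scale */
    vec3 v = { T->r.x, T->r.y, T->r.z };
    float w = T->r.w;
    float vp = v.x * sp.x + v.y * sp.y + v.z * sp.z;
    vec3 vxp = cross3(v, sp);
    vec3 vvp = cross3(v, vxp);
    vec3 out = {                                                /* rotate + translate */
        vp * v.x + w * w * sp.x + 2.0f * w * vxp.x + vvp.x + T->t.x,
        vp * v.y + w * w * sp.y + 2.0f * w * vxp.y + vvp.y + T->t.y,
        vp * v.z + w * w * sp.z + 2.0f * w * vxp.z + vvp.z + T->t.z
    };
    return out;
}
```

With an identity rotation this reduces to a per-axis scale followed by a translation, which makes a convenient first test.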
Point Transformation
  • p in vf1, q in vf2
  • Scale in vf3
  • Translation in vf4
  • Takes four extra cycles for scale (including stalls), one extra for translation
  • Cycle count: 18/21

vmul vf1, vf1, vf3
vmul vf11, vf1, vf2
vopmula.xyz acc, vf2, vf1
vopmsub.xyz vf5, vf1, vf2
vmul.w vf6w, vf2w, vf2w
vadd.w vf7w, vf2w, vf2w
vmulax.w accw, vf0w, vf11x
vmadday.w accw, vf0w, vf11y
vmaddz.w vf11w, vf0w, vf11z
vopmula.xyz acc, vf2, vf5
vmaddaw.xyz acc, vf5, vf7w
vmaddaw.xyz acc, vf1, vf6w
vmaddaw.xyz acc, vf2, vf11w
vmaddaw.xyz acc, vf4, vf0w
vopmsub.xyz vf3, vf5, vf2

Transform Concatenation
  • Look at formula:
  • Have to transform point and multiply two quaternions and multiply scales
Transform Concatenation
  • Takes 8 cycles for quat multiply, 18 for transform, 1 for scale
  • Have three stall cycles available
  • Bottom line: 24/27 cycles
  • Much slower than matrix multiplication
  • Not recommended
Matrix Conversion
  • Quat-vector transformation not as efficient as matrix-vector transformation (13 cycles vs. 4)
  • To do multiple points, want to convert quaternion to a 4x4 matrix
Matrix Conversion
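  • Corresponding 4x4 matrix to normalized quat q = (x,y,z,w), in the row-vector convention used by the VU matrix multiply, is:

$$
M_q = \begin{pmatrix}
1-2(y^2+z^2) & 2(xy+wz) & 2(xz-wy) & 0 \\
2(xy-wz) & 1-2(x^2+z^2) & 2(yz+wx) & 0 \\
2(xz+wy) & 2(yz-wx) & 1-2(x^2+y^2) & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
$$

  • Not obvious how to do this efficiently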
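A scalar C sketch of the conversion. It assumes the row-vector convention used by the VU matrix multiply (the point’s x, y, z, w weight the matrix rows), which makes this matrix the transpose of the column-vector form found in many math texts; `quat_to_matrix` is an illustrative name.

```c
typedef struct { float x, y, z, w; } quat;   /* (x, y, z, w) order, unit length */

/* Row-vector convention: a point p is transformed as p * M, so row i is the
   image of basis axis i and a translation would go in row 3. */
static void quat_to_matrix(quat q, float m[4][4])
{
    float x = q.x, y = q.y, z = q.z, w = q.w;
    m[0][0] = 1.0f - 2.0f * (y * y + z * z);
    m[0][1] = 2.0f * (x * y + w * z);
    m[0][2] = 2.0f * (x * z - w * y);
    m[1][0] = 2.0f * (x * y - w * z);
    m[1][1] = 1.0f - 2.0f * (x * x + z * z);
    m[1][2] = 2.0f * (y * z + w * x);
    m[2][0] = 2.0f * (x * z + w * y);
    m[2][1] = 2.0f * (y * z - w * x);
    m[2][2] = 1.0f - 2.0f * (x * x + y * y);
    m[0][3] = m[1][3] = m[2][3] = 0.0f;   /* right column */
    m[3][0] = m[3][1] = m[3][2] = 0.0f;   /* bottom row   */
    m[3][3] = 1.0f;
}
```

For q = (0.5, 0.5, 0.5, 0.5), a 120° turn about (1, 1, 1), the rows come out as (0, 1, 0), (0, 0, 1) and (1, 0, 0): the x axis maps to y, y to z, and z to x.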
Matrix Conversion
  • Two approaches
  • One works well in macro mode
  • One in micro mode
    • uses Lower instructions to achieve better parallelism
Matrix Conversion (macro)
  • Idea: matrix is built from two other matrices
Matrix Conversion (macro)
  • Simplification: matrix multiply is series of row vector multiplies
  • Create right matrix, generate left matrix via accumulator tricks
Matrix Conversion (macro)
  • Look at one row in matrix multiply:

vmulax ACC, vf5, vf1x
vmadday ACC, vf6, vf1y
vmaddaz ACC, vf7, vf1z
vmaddw vf9, vf8, vf1w

  • Or could just do:

vmulaw ACC, vf8, vf1w
vmadday ACC, vf6, vf1y
vmaddaz ACC, vf7, vf1z
vmaddx vf9, vf5, vf1x

  • It’s a sum of scaled rows (linear), so order doesn’t matter: any term can seed the accumulator
Matrix Conversion (macro)
  • Idea: all values we need for left matrix are in quaternion
  • Load accumulator with mula by w value (always positive)
  • vmadd or vmsub to multiply by positive or negative value and accumulate

vmulaw.xyz acc, vf2, vf5w
vmaddax.xyz acc, vf3, vf5x
vmadday.xyz acc, vf4, vf5y
vmsubz.xyz vf13, vf1, vf5z

Matrix Conversion (macro)
  • More simplification:
    • Last row of Mq always (0,0,0,1), don’t compute!
    • Last column always 0 too, don’t compute!
    • Last row of Rq just the quat in VU format
  • Just build:
Matrix Conversion (macro)

Stage one:
  • Load quat in vf4
  • Build right matrix
  • Clear right column of result

vaddw.x vf1, vf0, vf4w
vaddz.y vf1, vf0, vf4z
vsuby.z vf1, vf0, vf4y
vsubz.x vf2, vf0, vf4z
vaddw.y vf2, vf0, vf4w
vaddx.z vf2, vf0, vf4x
vaddy.x vf3, vf0, vf4y
vsubx.y vf3, vf0, vf4x
vaddw.z vf3, vf0, vf4w
vmr32.w vf12, vf0
vmr32.w vf13, vf0
vmr32.w vf14, vf0

vf1 = (w, z, -y, ~)
vf2 = (-z, w, x, ~)
vf3 = (y, -x, w, ~)
vf4 = (x, y, z, w)
Matrix Conversion (macro)

Stage two:
  • Matrix multiply to get first three rows
  • Set bottom row to (0, 0, 0, 1)
  • Note: accumulate only on xyz (w already cleared)
  • Cycles: 25/28

vmulaw.xyz acc, vf1, vf4w
vmaddaz.xyz acc, vf2, vf4z
vmsubay.xyz acc, vf3, vf4y
vmaddx.xyz vf12, vf4, vf4x
vmulaw.xyz acc, vf2, vf4w
vmaddax.xyz acc, vf3, vf4x
vmadday.xyz acc, vf4, vf4y
vmsubz.xyz vf13, vf1, vf4z
vmulaw.xyz acc, vf3, vf4w
vmaddaz.xyz acc, vf4, vf4z
vmadday.xyz acc, vf1, vf4y
vmsubx.xyz vf14, vf2, vf4x
vmove.xyzw vf15, vf0
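To see why the two stages produce the rotation matrix, here is a scalar C model of them. The row table and the weight table are read off the stage-one register comments and the stage-two instructions; the function name is illustrative, and the summation order differs from the VU’s accumulator order, which does not change the result mathematically.

```c
typedef struct { float x, y, z, w; } quat;   /* (x, y, z, w) order, unit length */

/* Scalar model of the two-stage macro-mode conversion (illustrative). */
static void quat_to_matrix_two_stage(quat q, float m[4][4])
{
    const float x = q.x, y = q.y, z = q.z, w = q.w;
    /* Stage one: three rows of the right matrix, plus the quat's xyz. */
    const float R[4][3] = {
        {  w,  z, -y },   /* vf1 */
        { -z,  w,  x },   /* vf2 */
        {  y, -x,  w },   /* vf3 */
        {  x,  y,  z }    /* vf4 */
    };
    /* Stage two: weights applied to vf1..vf4 for each output row. */
    const float k[3][4] = {
        {  w,  z, -y,  x },   /* vf12 */
        { -z,  w,  x,  y },   /* vf13 */
        {  y, -x,  w,  z }    /* vf14 */
    };
    for (int i = 0; i < 3; ++i) {
        for (int c = 0; c < 3; ++c)
            m[i][c] = k[i][0] * R[0][c] + k[i][1] * R[1][c]
                    + k[i][2] * R[2][c] + k[i][3] * R[3][c];
        m[i][3] = 0.0f;                  /* right column cleared by vmr32.w */
    }
    m[3][0] = m[3][1] = m[3][2] = 0.0f;  /* bottom row: vmove vf15, vf0 */
    m[3][3] = 1.0f;
}
```

Expanding any entry by hand, for example row 0 column 0 as w·w + z·(-z) - y·y + x·x, gives the familiar 1 - 2(y² + z²) once the quaternion is unit length.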
Matrix Conversion (micro)
  • Lots of duplicate calculations in matrix
  • Idea: calculate only what we need, use shifting and accumulator tricks to parallelize efficiently
  • Devised by Colin Hughes of SCEE
Matrix Conversion (micro)
  • Three parts
    • Calculate elements
    • Clear matrix
    • Shift, add and copy into place
  • 16/19 cycles
  • Each line pairs an upper instruction (left) with a lower instruction (right)

mula acc, vf1, vf1          loi SQRT_2
muli vf3, vf1, I            mr32.w vf24, vf0
madd vf2, vf1, vf1          nop
addw vf4, vf0, vf0w         nop
opmula acc, vf3, vf3        move vf27, vf0
msubw vf5, vf3, vf3w        mr32.w vf26, vf0
maddw vf6, vf3, vf3w        mr32.w vf25, vf0
addaw.xyz acc, vf0, vf0w    nop
msubax.yz acc, vf4, vf2x    nop
msuby.z vf26, vf4, vf2y     mr32 vf3, vf5
msubay.xz acc, vf4, vf2y    mr32 vf7, vf6
msubz.y vf25, vf4, vf2z     mr32.y vf24, vf5
msubz.x vf24, vf4, vf2z     mr32.x vf26, vf5
addy.z vf24, vf0, vf6y      mr32.z vf25, vf3
addx.y vf26, vf0, vf6x      mr32.x vf25, vf7
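One trick in the micro-mode version appears to be loading √2 into I: once q is scaled by √2, every pairwise product (including those with w) comes out already doubled, so 2xy, 2yz, 2xw and the rest need no separate multiply by 2. The following C sketch models that idea, with the squares doubled as q·q + q·q like the madd line. It assembles the elements in the row-vector layout used by the VU matrix multiply; that layout is an assumption for illustration, not a transcription of where the micro code stores each element, and the function name is hypothetical.

```c
#include <math.h>

typedef struct { float x, y, z, w; } quat;   /* (x, y, z, w) order, unit length */

static void quat_to_matrix_shared(quat q, float m[4][4])
{
    const float k = sqrtf(2.0f);                  /* role of "loi SQRT_2" */
    const float sx = k * q.x, sy = k * q.y, sz = k * q.z, sw = k * q.w;

    /* Products of sqrt(2)-scaled terms are already doubled: sx*sy == 2xy. */
    const float xy = sx * sy, yz = sy * sz, zx = sz * sx;
    const float xw = sx * sw, yw = sy * sw, zw = sz * sw;

    /* Doubled squares, computed as q*q + q*q like the madd line. */
    const float xx = q.x * q.x + q.x * q.x;
    const float yy = q.y * q.y + q.y * q.y;
    const float zz = q.z * q.z + q.z * q.z;

    /* Each off-diagonal pair shares one product: (xy + zw, xy - zw), etc. */
    m[0][0] = 1.0f - yy - zz; m[0][1] = xy + zw;        m[0][2] = zx - yw;
    m[1][0] = xy - zw;        m[1][1] = 1.0f - xx - zz; m[1][2] = yz + xw;
    m[2][0] = zx + yw;        m[2][1] = yz - xw;        m[2][2] = 1.0f - xx - yy;
    m[0][3] = m[1][3] = m[2][3] = 0.0f;
    m[3][0] = m[3][1] = m[3][2] = 0.0f;
    m[3][3] = 1.0f;
}
```

Only six distinct cross products and three squares are computed; every off-diagonal element is one add or subtract away, which is the "calculate only what we need" idea.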
Matrix Conversion
  • If you’re converting a quaternion and going to use it immediately, can make some assumptions
  • Don’t create bottom row (just use vf0)
  • Don’t clear right column (just use xyz)
  • Saves four cycles in macro mode case
Transform to Matrix
  • Use one of the quaternion matrix techniques
  • Scale first three rows by each scale factor
  • Replace last row with translation
  • Results:
    • 29/32 for macro mode
    • 20/23 for micro mode
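A scalar C sketch of the transform-to-matrix steps, assuming the row-vector convention of the VU matrix multiply: scaling row i by the i-th scale factor applies the scale before the rotation, and the translation goes in row 3. `xform_to_matrix` and `matrix_point` are hypothetical names.

```c
#include <math.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } quat;   /* (x, y, z, w) order, unit length */

/* Rows 0-2: rotation rows scaled by s.x, s.y, s.z. Row 3: translation t. */
static void xform_to_matrix(vec3 t, quat r, vec3 s, float m[4][4])
{
    float x = r.x, y = r.y, z = r.z, w = r.w;
    const float sc[3] = { s.x, s.y, s.z };
    const float rot[3][3] = {
        { 1.0f - 2.0f * (y * y + z * z), 2.0f * (x * y + w * z), 2.0f * (x * z - w * y) },
        { 2.0f * (x * y - w * z), 1.0f - 2.0f * (x * x + z * z), 2.0f * (y * z + w * x) },
        { 2.0f * (x * z + w * y), 2.0f * (y * z - w * x), 1.0f - 2.0f * (x * x + y * y) }
    };
    for (int i = 0; i < 3; ++i) {
        for (int c = 0; c < 3; ++c)
            m[i][c] = sc[i] * rot[i][c];      /* scale row i */
        m[i][3] = 0.0f;
    }
    m[3][0] = t.x; m[3][1] = t.y; m[3][2] = t.z; m[3][3] = 1.0f;   /* translation row */
}

/* p' = (p, 1) * M */
static vec3 matrix_point(float m[4][4], vec3 p)
{
    vec3 o = {
        p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + m[3][0],
        p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + m[3][1],
        p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + m[3][2]
    };
    return o;
}
```

Once built, each additional point costs one matrix multiply, which is the reason for converting when many points share a transform.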
Normalization
  • Need to normalize quaternion to keep it useful for rotation
    • (Also avoids floating point drift)
  • Fortunately PS2 has reciprocal square root instruction
  • Unfortunately it takes a while
Normalization
  • Compute dot product
  • Compute 1/length
  • Scale quaternion
  • With stalls, takes 24/27 cycles

vmul vf2, vf1, vf1
vaddaz.w acc, vf2, vf2z
vmaddax.w acc, vf0, vf2x
vmaddy.w vf2, vf0, vf2y
vrsqrt Q, vf0w, vf2w
vwaitq
vmulq vf1, vf1, Q
Normalization
  • Another approach
    • From “The Inner Product”, March 2002 Game Developer by Jonathan Blow
    • Approximate 1/x via Newton-Raphson iteration
    • First iteration takes (looks like) 4/7 cycles on VU0
    • Second iteration takes as long as RSQRT
    • Recommend: if x > 0.91521198, use approx
    • Otherwise use RSQRT
Interpolation
  • This is where it’s at
  • It would be great if it was fast
  • Um, well…
Interpolation
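  • First look at spherical linear interp: $\mathrm{slerp}(q, r, t) = \dfrac{\sin((1-t)\theta)\,q + \sin(t\theta)\,r}{\sin\theta}$, where $\cos\theta = q \cdot r$
  • That’s a lot of sines
  • Could precompute $\theta$ and $1/\sin\theta$
  • But at least 28 cycles for one of the other sines
  • We (RSE) don’t use slerp anyway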
Interpolation
  • Lerp, then: $\mathrm{lerp}(q, r, t) = (1-t)\,q + t\,r$
  • On VU this is simply (q in vf1, r in vf2, t in vf3w):
    • vaddax acc, vf1, vf0x
    • vmsubaw acc, vf1, vf3w
    • vmaddw vf1, vf2, vf3w
  • Need to normalize afterwards
  • Makes 30/33 cycles
Interpolation
  • Not quite that simple
  • Problem: if q·r < 0, interpolation will take the long way around the sphere
  • Need to negate one quat
  • Gives the same orientation, but the interpolation takes the short route
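Linear Interpolation
  • Compute dot product
  • Check for negative
  • Interpolate
  • Follow up with normalization
  • Takes 43/46 cycles
  • q in vf1, r in vf2, t in vf3w; unnormalized result in vf1
  • cfc2 t0,$16 reads the VU0 status flag, whose bit 1 (0x0002) is the sign flag
  • Relies on the assembler filling the branch delay slots (default reorder mode)

vmul vf4, vf1, vf2
vaddaz.w acc, vf4, vf4z
vmaddax.w acc, vf0, vf4x
vmaddy.w vf4, vf0, vf4y
vnop
vnop
vnop
cfc2 t0,$16
and t0,t0,0x0002
vaddax acc, vf1, vf0x
beq t0,zero,Add
vmsubaw acc, vf2, vf3w
b Finish
Add:    vmaddaw acc, vf2, vf3w
Finish: vmsubw vf1, vf1, vf3w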
Linear Interpolation
  • There’s more we can do
  • Jonathan Blow’s article, again
  • Use spline to correct error in lerp
  • More investigation needed
  • Initial results: takes about 24-26 more cycles
  • Looks faster than slerp, more accurate than lerp
How We’re Using All This
  • A bit research-y at the moment
  • VU0-based math library
  • Optimization in specific routines
  • In particular, concatenation and interpolation for bone animation
  • More memory savings: store quat as 4.12 fixed-point shorts
Conclusions
  • Quaternions useful on PS2
  • Cheaper to concatenate (alone)
  • Convert to matrix to transform
  • Use linear interpolation
  • Check out Jonathan Blow’s article
References
  • Shoemake, Ken, “Animating Rotation with Quaternion Curves,” Computer Graphics, Vol. 19, No. 3 (July 1985).
  • EE Core Instruction Set Manual
  • VU User’s Manual
  • Sony newsgroups
  • Blow, Jonathan, “Hacking Quaternions,” Game Developer, Vol. 9, No. 3 (March 2002). [get updated source from www.gdmag.com/code.htm]
Please hand in comment sheets
  • Slides available at:

http://obiwan.redstorm.com/~jimvv
