1 / 26

# What is the WHT anyway, and why are there so many ways to compute it? - PowerPoint PPT Presentation

What is the WHT anyway, and why are there so many ways to compute it?. Jeremy Johnson. 1, 2, 6, 24, 112, 568, 3032, 16768,…. Walsh-Hadamard Transform. y = WHT N x, N = 2 n. WHT Algorithms. Factor WHT N into a product of sparse structured matrices Compute: y = (M 1 M 2 … M t )x

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about ' What is the WHT anyway, and why are there so many ways to compute it?' - francis-floyd

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### What is the WHT anyway, and why are there so many ways to compute it?

Jeremy Johnson

1, 2, 6, 24, 112, 568, 3032, 16768,…

• y = WHTN x, N = 2n

WHT Algorithms compute it?

• Factor WHTN into a product of sparse structured matrices

• Compute: y = (M1 M2 … Mt)x

yt = Mtx

y2 = M2 y3

y = M1 y2

Factoring the WHT Matrix compute it?

• AC Ä BD = (A Ä B)(C Ä D)

• A Ä B = (A Ä I)(I Ä B)

• A Ä (BÄ C) = (A Ä B)Ä C

• ImÄ In = Imn

WHT2Ä WHT2 = (WHT2 Ä I2)(I2 Ä WHT2)

Recursive and Iterative Factorization compute it?

WHT8= (WHT2 Ä I4)(I2 Ä WHT4)

• = (WHT2 Ä I4)(I2 Ä ((WHT2 ÄI2) (I2 Ä WHT2)))

• = (WHT2 Ä I4)(I2 Ä (WHT2 ÄI2)) (I2 Ä (I2 Ä WHT2))

• = (WHT2 Ä I4)(I2 Ä (WHT2 ÄI2)) ((I2 Ä I2) Ä WHT2)

• = (WHT2 Ä I4)(I2 Ä WHT2 ÄI2) ((I2 Ä I2) Ä WHT2)

= (WHT2 Ä I4)(I2 Ä WHT2 ÄI2) (I4 Ä WHT2)

Recursive Algorithm compute it?

WHT8 = (WHT2 I4)(I2  (WHT2  I2)(I2  WHT2))

é compute it?

ù

1

1

1

1

1

1

1

1

ê

ú

-

-

-

-

1

1

1

1

1

1

1

1

ê

ú

ê

ú

-

-

-

-

1

1

1

1

1

1

1

1

ê

ú

-

-

-

-

1

1

1

1

1

1

1

1

ê

ú

=

ê

ú

-

-

-

-

1

1

1

1

1

1

1

1

ê

ú

-

-

-

-

1

1

1

1

1

1

1

1

ê

ú

ê

ú

-

-

-

-

1

1

1

1

1

1

1

1

ê

ú

-

-

-

-

ê

ú

1

1

1

1

1

1

1

1

ë

û

é

ù

é

ù

é

ù

1

1

1

1

1

1

ê

ú

ê

ú

ê

ú

-

1

1

1

1

1

1

ê

ú

ê

ú

ê

ú

ê

ú

ê

ú

ê

ú

-

1

1

1

1

1

1

ê

ú

ê

ú

ê

ú

-

-

1

1

1

1

1

1

ê

ú

ê

ú

ê

ú

ê

ú

ê

ú

ê

ú

-

1

1

1

1

1

1

ê

ú

ê

ú

ê

ú

-

-

1

1

1

1

1

ê

ú

ê

ú

ê

ú

ê

ú

ê

ú

ê

ú

-

-

1

1

1

1

1

1

ê

ú

ê

ú

ê

ú

-

-

-

ê

ú

ê

ú

ê

ú

1

1

1

1

1

1

ë

û

ë

û

ë

û

Iterative Algorithm

WHT8 = (WHT2 I4)(I2  WHT2  I2)(I4  WHT2)

WHT Algorithms compute it?

• Recursive

• Iterative

• General

n compute it?

+ ··· +

n

n

+ ··· +

n

i

+1

t

1

i

-

1

WHT Implementation

• Definition/formula

• N=N1* N2NtNi=2ni

• x=WHTN*x x =(x(b),x(b+s),…x(b+(M-1)s))

• Implementation(nested loop)

R=N; S=1;

for i=t,…,1

R=R/Ni

forj=0,…,R-1

for k=0,…,S-1

S=S* Ni;

M

b,s

t

Õ

)

Ä

Ä

(

I

I

WHT

WHT

=

n

n

2

2

2

2

i

i

=

1

9 compute it?

4

4

3

4

2

1

1

3

3

4

1

2

1

1

1

2

2

1

1

1

1

4

1

1

1

1

1

1

2

2

1

1

1

1

Partition Trees

Left Recursive

Right Recursive

Balanced

Iterative

Ordered Partitions compute it?

• There is a 1-1 mapping from ordered partitions of n onto (n-1)-bit binary numbers.

• There are 2n-1 ordered partitions of n.

162 = 1 0 1 0 0 0 1 0

1|1 1|1 1 1 1|1 1  1+2+4+2 = 9

3 compute it?

2

1

1

1

Enumerating Partition Trees

00

01

01

3

3

3

2

1

2

1

1

1

10

10

11

3

3

1

2

1

1

1

Search Space compute it?

• Optimization of the WHT becomes a search, over the space of partition trees, for the fastest algorithm.

• The number of trees:

Size of Search Space compute it?

• Let T(z) be the generating function for Tn

Tn = (n/n3/2), where =4+8  6.8284

• Restricting to binary trees

Tn = (5n/n3/2)

WHT Package compute it?Püschel & Johnson (ICASSP ’00)

• Allows easy implementation of any of the possible

WHT algorithms

• Partition tree representation

W(n)=small[n] | split[W(n1),…W(nt)]

• Tools

• Measure runtime of any algorithm

• Measure hardware events

• Search for good implementation

• Dynamic programming

• Evolutionary algorithm

Histogram (n = 16, 10,000 samples) compute it?

• Wide range in performance despite equal number of arithmetic operations (n2n flops)

• Pentium III consumes more run time (more pipeline stages)

• Ultra SPARC II spans a larger range

Operation Count compute it?

Theorem. Let WN be a WHT algorithm of size N. Then the number of floating point operations (flops) used by WN is Nlg(N).

Proof. By induction.

Instruction Count Model compute it?

• A(n) = number of calls to WHT procedure

• = number of instructions outside loops

Al(n) = Number of calls to base case of size l

•  l = number of instructions in base case of size l

• Li = number of iterations of outer (i=1), middle (i=2), and

• outer (i=3) loop

• i = number of instructions in outer (i=1), middle (i=2), and

• outer (i=3) loop body

Small[1] compute it?

.file "s_1.c"

.version "01.01"

gcc2_compiled.:

.text

.align 4

.globl apply_small1

.type apply_small1,@function

apply_small1:

movl 8(%esp),%edx //load stride S to EDX

fldl (%eax) // st(0)=R7=x[0]

fldl (%eax,%edx,8) //st(0)=R6=x[S]

fld %st(1) //st(0)=R5=x[0]

fxch %st(2) //st(0)=R5=x[0],s(2)=R7=x[0]+x[S]

fsubp %st,%st(1) //st(0)=R6=x[S]-x[0] ?????

fxch %st(1) //st(0)=R6=x[0]+x[S],st(1)=R7=x[S]-x[0]

fstpl (%eax) //store x[0]=x[0]+x[S]

fstpl (%eax,%edx,8) //store x[0]=x[0]-x[S]

ret

Recurrences compute it?

Recurrences compute it?

Histogram using Instruction Model (P3) compute it?

 l = 12,  l = 34, and  l = 106

 = 27

1 = 18, 2 = 18, and 1 = 20

Dynamic Programming compute it?

n

Cost(

),

min

n1+… + nt= n

T

T

nt

n1

where Tn is the optimial tree of size n.

This depends on the assumption that Cost only depends on

the size of a tree and not where it is located.

(true for IC, but false for runtime).

For IC, the optimal tree is iterative with appropriate leaves.

For runtime, DP is a good heuristic (used with binary trees).

Optimal Formulas compute it?

UltraSPARC

[1], [2], [3], [4], [5], [6]

[[3],[4]]

[[4],[4]]

[[4],[5]]

[[5],[5]]

[[5],[6]]

[[4],[[4],[4]]]

[[[4],[5]],[4]]

[[4],[[5],[5]]]

[[[5],[5]],[5]]

[[[5],[5]],[6]]

[[4],[[[4],[5]],[4]]]

[[4],[[4],[[5],[5]]]]

[[4],[[[5],[5]],[5]]]

[[5],[[[5],[5]],[5]]]

Pentium

[1], [2], [3], [4], [5], [6]

[7]

[[4],[4]]

[[5],[4]]

[[5],[5]]

[[5],[6]]

[[2],[[5],[5]]]

[[2],[[5],[6]]]

[[2],[[2],[[5],[5]]]]

[[2],[[2],[[5],[6]]]]

[[2],[[2],[[2],[[5],[5]]]]]

[[2],[[2],[[2],[[5],[6]]]]]

[[2],[[2],[[2],[[2],[[5],[5]]]]]]

[[2],[[2],[[2],[[2],[[5],[6]]]]]]

[[2],[[2],[[2],[[2],[[2],[[5],[5]]]]]]]

Different Strides compute it?

• Dynamic programming assumption is not true. Execution time depends on stride.