
# Calculating Stack Distances Efficiently

Calculating Stack Distances Efficiently. George Almasi, Calin Cascaval, David Padua, {galmasi,cascaval,padua}@cs.uiuc.edu. This talk is about: algorithms to calculate stack distance histograms, and speed/memory optimization of trace analysis to create stack distance histograms.





### Calculating Stack Distances Efficiently

What this talk is, and is not, about

This talk is about:

• Algorithms to calculate stack distance histograms

• Speed/memory optimization of trace analysis to create stack distance histograms

This talk is not about:

• Why stack distance histograms are or are not useful

• The relative merits of inter-reference distance vs. stack distance

• Speed/memory optimization of applications

• Inter-reference distance: the number of other references between two references to the same address in the trace

• Stack distance: the number of distinct addresses referenced between two references to the same address

Example trace: a b c d b c d e a

For the two references to a: inter-reference distance = 7, stack distance = 4 (the distinct addresses b, c, d, e).
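As a quick check on these two definitions, here is a brute-force helper that computes both distances for one reference in a trace; the function name and list-of-characters trace representation are illustrative, not from the talk:

```python
def inter_ref_and_stack_distance(trace, t):
    """For the reference at index t, find the previous reference to the
    same address and return (inter-reference distance, stack distance).
    Returns (None, None) for a first-time reference (a cold miss)."""
    addr = trace[t]
    for t0 in range(t - 1, -1, -1):
        if trace[t0] == addr:
            between = trace[t0 + 1:t]          # refs strictly between the pair
            return len(between), len(set(between))
    return None, None

# The example trace from the slide: the two references to 'a' are
# separated by 7 references to 4 distinct addresses (b, c, d, e).
trace = list("abcdbcdea")
print(inter_ref_and_stack_distance(trace, 8))  # (7, 4)
```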

Stack Distances As Cache Misses

Given the stack distance histogram s(Δ), a fully associative cache of size C with LRU replacement sees:

hits(C) = Σ_{Δ=1..C} s(Δ)

misses(C) = Σ_{Δ=C+1..∞} s(Δ)
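The two sums can be sketched directly. This toy helper treats the stack distance as a 1-based LRU stack depth, with infinity standing in for a cold (first-reference) miss; the dict-based histogram and the function name are assumptions, not the authors' code:

```python
from math import inf

def hits_and_misses(hist, C):
    """hist maps a stack distance (1-based LRU stack depth; inf for a
    first reference) to its reference count. A reference hits in a
    fully associative LRU cache of size C iff its distance <= C."""
    hits = sum(n for d, n in hist.items() if d <= C)
    misses = sum(n for d, n in hist.items() if d > C)
    return hits, misses

# Hypothetical histogram: 5 cold misses, the rest reuses.
hist = {1: 50, 2: 30, 4: 10, inf: 5}
print(hits_and_misses(hist, 2))  # (80, 15)
print(hits_and_misses(hist, 4))  # (90, 5)
```

Because of the inclusion property of LRU, one histogram yields the miss count for every cache size at once.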

• Compute the number of cache hits and misses as follows:

• Given that at time t, ref(t) = x, find t0, the time of the last previous reference to x

• Inter-reference distance: t - t0 - 1

• Efficient implementation: a (hash) table with H(x) = t0, the trace index of the last reference to x

• Memory usage ~2x original program; cost O(1) per reference
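A one-pass sketch of that hash-table scheme, using a Python dict for H; a minimal illustration under assumed names, not the paper's implementation:

```python
def inter_reference_distances(trace):
    """One pass over the trace; H maps each address to the trace index
    of its last reference, so each lookup/update is O(1)."""
    H = {}
    dists = []
    for t, x in enumerate(trace):
        if x in H:
            dists.append(t - H[x] - 1)   # refs strictly between t0 and t
        else:
            dists.append(None)           # first reference: cold miss
        H[x] = t
    return dists

print(inter_reference_distances(list("abcdbcdea")))
# [None, None, None, None, 2, 2, 2, None, 7]
```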

[Figure: successive snapshots of the LRU stack as the trace is processed; when x is referenced again, Depth(x) gives its stack distance, x moves to the stack top, and the entries above it shift down.]

• Simulates an infinite cache with LRU replacement policy

• nice properties (inclusion!)

• naïve implementation: stack as linked list/array

• m = 250,000 average maximum stack depth

• list traversal/array updates; O(m) per trace element
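The naïve O(m)-per-reference scheme can be sketched with a plain Python list standing in for the LRU stack; names are illustrative:

```python
def stack_distances_naive(trace):
    """Maintain the LRU stack as a list (top at index 0). For each
    reference, search for the address, record how many distinct
    addresses sit above it, and move it to the top: O(m) per
    reference, where m is the current stack depth."""
    stack, dists = [], []
    for x in trace:
        if x in stack:
            depth = stack.index(x)       # distinct addresses above x
            stack.pop(depth)
            dists.append(depth)
        else:
            dists.append(None)           # cold miss: infinite distance
        stack.insert(0, x)               # x becomes most recently used
    return dists

print(stack_distances_naive(list("abcdbcdea")))
# [None, None, None, None, 2, 2, 2, None, 4]
```

With m around 250,000 this linear scan per reference is what the hole-based algorithms below avoid.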

Insight: stack is contained in trace

[Figure: the trace up to time t alongside the LRU stack at time t, showing that the stack contents can be read directly off the trace: the most recent reference to each distinct address, in order.]

• Index tx in the trace is a hole if ref(tx) has been referenced again at a later time ty, with tx < ty < t.

• Using holes, we can say:

• stackdist(t) = refdist(t) - #holes(t0 to t)

• How many holes are there between t0 and t?

[Figure: the trace segment from the previous reference to a (at t0) to the current reference to a (at t); the marked positions between them are holes.]
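The identity stackdist(t) = refdist(t) - #holes(t0 to t) can be checked brute-force on the example trace; this hypothetical helper is for illustration only, not the paper's algorithm:

```python
def stack_distance_via_holes(trace, t):
    """Recompute the stack distance of the reference at index t as its
    inter-reference distance minus the number of holes between t0 and t.
    Index tx is a hole if trace[tx] is referenced again before t."""
    addr = trace[t]
    t0 = max(i for i in range(t) if trace[i] == addr)
    refdist = t - t0 - 1
    holes = sum(1 for tx in range(t0 + 1, t)
                if trace[tx] in trace[tx + 1:t])
    return refdist - holes

# a b c d b c d e a: refdist 7, three holes (the first b, c, d), so 4.
print(stack_distance_via_holes(list("abcdbcdea"), 8))  # 4
```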

• Single tree operation: count_and_add(t0)

• Determines the number of holes between t0 and t; adds a new hole at t0

• Adding a hole at position p can extend an interval, create a new interval, or fuse two existing ones:

• Extend: p = n+1 turns k:n into k:n+1

• Create new interval: p > n+1 leaves k:n and adds p:p

• Join two intervals: a hole at n+1 fuses k:n and n+2:p into k:p
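A toy model of count_and_add over hole intervals; a sorted Python list stands in for the balanced RB/AVL tree, and the class and method names are illustrative:

```python
import bisect

class HoleIntervals:
    """Disjoint, sorted intervals of holes. count_and_add(t0) returns
    how many holes lie at positions > t0, then inserts a new hole at
    t0, extending/creating/fusing intervals as on the slide. By
    construction t0 is never already inside an interval."""

    def __init__(self):
        self.ivals = []   # list of [lo, hi], sorted, disjoint

    def count_and_add(self, t0):
        count = sum(hi - max(lo, t0 + 1) + 1
                    for lo, hi in self.ivals if hi > t0)
        i = bisect.bisect_left(self.ivals, [t0, t0])
        left = i > 0 and self.ivals[i - 1][1] == t0 - 1
        right = i < len(self.ivals) and self.ivals[i][0] == t0 + 1
        if left and right:                       # join two intervals
            self.ivals[i - 1][1] = self.ivals[i][1]
            del self.ivals[i]
        elif left:                               # extend k:n to k:n+1
            self.ivals[i - 1][1] = t0
        elif right:                              # extend on the left
            self.ivals[i][0] = t0
        else:                                    # create new interval
            self.ivals.insert(i, [t0, t0])
        return count

h = HoleIntervals()
print(h.count_and_add(5), h.count_and_add(3), h.count_and_add(4))  # 0 1 1
print(h.ivals)  # [[3, 5]] - the hole at 4 fused the two intervals
```

The sorted-list version costs O(#intervals) per insert; the tree versions below get this down to O(log m).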

Pre-allocated hole trees

• A modified version of the B&K algorithm: better memory usage

• The tree is pre-allocated, binary, and balanced

• Each node contains a number: the number of holes in its right subtree

• Memory used by a node depends on the node's depth

[Figure: a pre-allocated hole tree built over the trace; leaves mark each position as hole (1) or not (0), and internal nodes store hole counts. Walking the path for t0, count_and_add either accumulates a node's count (count += n) or increments it (n = n+1).]
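The pre-allocated tree can be sketched with a Fenwick (binary indexed) tree over the n trace positions. This is a stand-in for the authors' modified B&K tree, not their exact layout, and all names here are illustrative:

```python
class HoleBIT:
    """Sketch of a pre-allocated hole tree as a Fenwick tree of size n
    (the trace length). mark(t) records a hole at position t;
    count_after(t0) returns the number of holes at positions > t0.
    Both walk one root-to-leaf path: O(log n), no pointer chasing."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)
        self.total = 0

    def mark(self, t):            # position t becomes a hole
        self.total += 1
        i = t + 1
        while i <= self.n:
            self.tree[i] += 1
            i += i & -i

    def count_after(self, t0):    # holes strictly after t0
        count, i = 0, t0 + 1
        while i > 0:              # prefix count of holes at 0..t0
            count += self.tree[i]
            i -= i & -i
        return self.total - count

bit = HoleBIT(16)
for t in (5, 3, 4):
    bit.mark(t)
print(bit.count_after(3))  # 2
```

Since the whole array is allocated up front, memory is O(n) in the trace length, which is the practical limit mentioned below.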

Many Questions

Q: Why holes instead of stack elements?

A: Holes need 1/2 the maintenance of stack elements.

Q: Will the interval tree grow without bound?

A: No. Intervals fuse together spontaneously.

Q: How big will the tree be?

A: The number of intervals is O(stack depth); a tree of stack elements would be the same size.

Q: Will the tree be unbalanced?

A: Yes, because it tends to grow on one side.

More questions

Q: Which balanced trees can we use?

A: RB and AVL.

Q: Which is better?

A: AVL is better.

Q: Why?

A: Shorter average tree height (h+1 vs. 2h), and not all operations change the tree structure.

Comparisons

• RB/AVL interval trees: exec time O(log(m)), memory usage O(m); AVL better than RB

• Pre-allocated trees: exec time O(log(n)), memory usage O(n); no pointer chasing and good locality, but memory usage hits a practical limit

• Holes are better in both cases: reduced maintenance

• Stack distances with holes:

• using RB/AVL interval trees

• using pre-allocated trees

• Using holes reduces linear overhead by 20-40% for both kinds of algorithms.