Finding Reference Affinity Groups in Traces Using Sampling

Chengliang Zhang, Yutao Zhong, Chen Ding, Mitsunori Ogihara
University of Rochester
TDM'2004, 08/22/2004

Computer Memory Hierarchy

Why Reference Affinity
• Reference affinity measures how closely a group of data elements is accessed together during an execution.
• Previous research obtained a 12% speedup on Pentium IV PCs by applying structure splitting and array regrouping based on the results of strict reference affinity analysis.
• Weak reference affinity is a generalized, probabilistic version of strict reference affinity.

Outline
• Weak reference affinity model
  • Data element, trace, reuse distance
  • Weak reference affinity and its properties
• A sampling method
• Comparisons with k-distance analysis

Trace and Reuse Distance
• A data element: a memory cell, a file, or a block
• A trace: a sequence of accesses to data elements
• π: logical time → data element
• Inverse function Γ: data element → logical times
• Reuse distance δ(i, j): the number of distinct data elements accessed between two logical times i and j
[Figure: example trace …ABBBC…DEFGH…IJJKL, with three sections labeled by reuse distances 2, 4, and 3]
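The definitions above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it uses 1-based logical times, and it assumes δ(i, j) counts the distinct elements accessed at times i, …, j−1, so δ(i, i) = 0. That half-open convention is our assumption; it is the one under which the k = 1 strict-affinity example later in the deck separates {w, x} from {y, z}. The helper names `pi`, `inverse`, and `reuse_distance` are hypothetical.

```python
from collections import defaultdict


def pi(trace, t):
    """π: the data element accessed at 1-based logical time t."""
    return trace[t - 1]


def inverse(trace):
    """Γ: map each data element to the logical times at which it is accessed."""
    gamma = defaultdict(list)
    for t, x in enumerate(trace, start=1):
        gamma[x].append(t)
    return dict(gamma)


def reuse_distance(trace, i, j):
    """δ(i, j): number of distinct elements accessed at times i, ..., j-1.

    Assumption: half-open convention, so δ(i, i) == 0 and two
    adjacent accesses are at distance 1.
    """
    if i > j:
        i, j = j, i
    return len(set(trace[i - 1:j - 1]))
```

For the trace w x w x u y z, Γ(w) = [1, 3], δ(1, 2) = 1, and δ(4, 6) = 2 (the section x u).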

Weak Reference Affinity
• Given a trace, k, and θ, a group G of data elements is a weak reference affinity group if and only if:
• 1: For any x, y ∈ G,
  • (a) either at least a θ fraction of the occurrences of x are each k-linked to some occurrence of y relative to G,
  • (b) or x and y are connected through the transitive closure of (a).
• 2: No proper superset of G has this property.

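Properties of Weak Reference Affinity
• For given k and θ, the weak reference affinity groups form a unique partition of the data elements, denoted <k,θ>.
• The partitions for different <k,θ> form a lattice under the finer-partition relation.
[Figure: finer-partition lattice]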
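The finer-partition relation can be checked directly. The sketch below is a generic illustration of the standard partition lattice, not the paper's construction: the hypothetical `is_finer` tests refinement, and the hypothetical `meet` computes the coarsest common refinement of two partitions. Partitions are lists of sets of data elements.

```python
def is_finer(P, Q):
    """True if partition P refines Q: every block of P lies inside a block of Q."""
    return all(any(set(p) <= set(q) for q in Q) for p in P)


def meet(P, Q):
    """Coarsest common refinement of P and Q: the nonempty pairwise intersections."""
    return [set(p) & set(q) for p in P for q in Q if set(p) & set(q)]
```

With the strict-affinity example later in the deck, the k = 1 partition [{w, x}, {y, z}] is finer than the k = 2 partition [{w, x, y, z}], and not the other way round.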

Properties of Weak Reference Affinity
• Given a trace and a weak affinity group G with link length k and threshold θ, for any x ∈ G there exists a y ∈ G such that more than θ|Γ(x)| sections of the trace satisfy:
  • the section has x and y at its two ends;
  • the reuse distance of the section is at most k(|G|-1).
• This property is the basis of our sampling method.
[Figure: trace …x…y…x…y…y…x…x…, with sections between x and y of reuse distance ≤ k(|G|-1)]

Verification Section
• A verification section for x and y is a section of the trace such that:
  • x and y are at its two ends;
  • its reuse distance is at most k(|G|-1).
• Critical verification section from x to y: for an occurrence t of x (π(t) = x), the shortest verification section that has t at one end and an occurrence of y at the other.
• There are more than θ|Γ(x)| critical verification sections from x to y.
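A brute-force way to count critical verification sections for a pair (x, y) is sketched below. It is a hypothetical helper, not the paper's algorithm. Because a shorter section on the same side of t is contained in a longer one, it suffices to test the nearest occurrence of y on each side of every occurrence t of x; t contributes one critical verification section exactly when one of those two sections qualifies. The sketch assumes x ≠ y, 1-based times, and the half-open reuse distance (distinct elements from the left end up to, but excluding, the right end).

```python
from bisect import bisect_left


def count_critical_sections(trace, x, y, k, group_size):
    """Count occurrences of x that have a verification section to y.

    Each such occurrence has exactly one critical (shortest) section.
    Assumes x != y and 1-based logical times.
    """
    limit = k * (group_size - 1)
    y_times = [t for t, e in enumerate(trace, start=1) if e == y]

    def delta(a, b):
        # Assumed half-open reuse distance for a < b: distinct elements at a..b-1.
        return len(set(trace[a - 1:b - 1]))

    count = 0
    for t, e in enumerate(trace, start=1):
        if e != x:
            continue
        pos = bisect_left(y_times, t)
        candidates = []
        if pos > 0:  # nearest occurrence of y before t
            candidates.append(delta(y_times[pos - 1], t))
        if pos < len(y_times):  # nearest occurrence of y after t
            candidates.append(delta(t, y_times[pos]))
        if candidates and min(candidates) <= limit:
            count += 1
    return count
```

In the trace x a y b c d e f g x y h i j k x with k = 1 and |G| = 2 (limit 1), only the second x qualifies (section x y), so the count is 1; with k = 2 the first x also qualifies (section x a), giving 2. The property on this slide would then be checked as `count > theta * len(occurrences of x)`.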

Sampling Method
[Figure: a sampled window in the trace that contains a critical verification section]
• Sliding window of size n: a section of the trace containing accesses to at most n distinct data elements.
• Sampling method:
  • Estimate an upper bound g on the group size.
  • Select sliding windows of size 2gk by sampling.
  • Compute $\mathrm{confidence}(x, y) = \frac{\#\,\text{windows containing both } x \text{ and } y}{\min(\#\,\text{windows containing } x,\ \#\,\text{windows containing } y)}$.
  • If confidence(x, y) > θ/2, then x and y are in the same group.
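The sampling steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: window start positions are drawn uniformly at random, each window is extended to the right until one more distinct element would exceed 2gk, and pairs whose confidence exceeds θ/2 are merged with union-find, so the groups are the connected components of the confident pairs. The function names, the sample count, and the random seed are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def sample_windows(trace, n, num_samples, rng):
    """Yield the element set of each sampled window (hypothetical helper).

    Assumption: a window starts at a uniformly random position and extends
    right for as long as it touches at most n distinct data elements.
    """
    for _ in range(num_samples):
        start = rng.randrange(len(trace))
        seen = set()
        for t in range(start, len(trace)):
            elem = trace[t]
            if elem not in seen and len(seen) == n:
                break  # one more distinct element would exceed the window size
            seen.add(elem)
        yield seen


def sampled_affinity_groups(trace, g, k, theta, num_samples=2000, seed=0):
    """Group elements whose sampled confidence exceeds theta / 2."""
    rng = random.Random(seed)
    single = Counter()  # number of windows containing x
    pair = Counter()    # number of windows containing both x and y
    for window in sample_windows(trace, 2 * g * k, num_samples, rng):
        single.update(window)
        pair.update(combinations(sorted(window), 2))

    # Union-find: merging every confident pair yields connected components.
    parent = {x: x for x in trace}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (x, y), both in pair.items():
        confidence = both / min(single[x], single[y])
        if confidence > theta / 2:
            parent[find(x)] = find(y)

    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return sorted(sorted(group) for group in groups.values())
```

On a synthetic trace built from randomly ordered bursts such as a b a b, c d c d, …, calling `sampled_affinity_groups(trace, g=2, k=1, theta=0.8)` should recover {a, b}, {c, d}, and so on, because elements of the same burst almost always share a window while elements of different bursts rarely do.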

k-distance Analysis [Zhong+’04]
• Compute each data element's reuse signature as a histogram.
• Compute the Manhattan distance between the signatures of every pair of data elements.
• If the distance is smaller than kB, the two elements belong to the same group.
[Figure: reuse signatures of two data elements x and y]
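The k-distance test can be sketched as below. Several details are assumptions, since the slide does not give them: B is taken to be the number of histogram bins; an element's signature is built by sorting its reuse distances, splitting them into B equal-count bins, and averaging each bin; and reuse distance here is the number of distinct other elements accessed between two consecutive accesses to the same element. Elements with fewer than B reuses are skipped. The function names are hypothetical, and this is not the implementation from [Zhong+’04].

```python
from itertools import combinations


def reuse_distances(trace):
    """Map each element to its reuse distances (assumed: distinct other
    elements accessed since its previous access)."""
    last, dists = {}, {}
    for t, x in enumerate(trace):
        if x in last:
            dists.setdefault(x, []).append(len(set(trace[last[x] + 1:t])))
        last[x] = t
    return dists


def signature(distances, B):
    """Assumed signature: mean of each of B equal-count bins of sorted distances."""
    d = sorted(distances)
    n = len(d)
    sig = []
    for b in range(B):
        chunk = d[b * n // B:(b + 1) * n // B]  # nonempty because n >= B
        sig.append(sum(chunk) / len(chunk))
    return sig


def k_distance_pairs(trace, k, B):
    """Pairs of elements whose signatures are within Manhattan distance < k*B."""
    sigs = {x: signature(d, B)
            for x, d in reuse_distances(trace).items() if len(d) >= B}
    pairs = set()
    for x, y in combinations(sorted(sigs), 2):
        manhattan = sum(abs(a - b) for a, b in zip(sigs[x], sigs[y]))
        if manhattan < k * B:
            pairs.add((x, y))
    return pairs
```

On the trace (ab)^10 (cdef)^10 with k = 1 and B = 3, a and b have signature [1, 1, 1], c through f have [3, 3, 3], and the Manhattan distance of 6 between the two kinds is not below kB = 3, so only same-kind pairs are reported.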

Experiment Setup
• Synthetic trace generator

Evaluation Criteria
• Match rate = $\frac{\#\,\text{correctly predicted groups}}{\#\,\text{actual groups}}$
• Accuracy:
  • Suppose an actual group G is split into parts P1, P2, …, Pn, which are scattered among the algorithm-detected groups G1, G2, …, Gn.
  • The overall accuracy is defined as the average of the per-group accuracies.
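The match rate can be computed as below. Since the slide does not say when a predicted group counts as correct, this sketch assumes it must equal an actual group exactly; accuracy is not sketched because its per-group formula is not given here. `match_rate` is a hypothetical name.

```python
def match_rate(actual, predicted):
    """#correctly predicted groups / #actual groups.

    Assumption: a predicted group is correct only if it exactly equals
    some actual group.
    """
    actual_set = {frozenset(g) for g in actual}
    predicted_set = {frozenset(g) for g in predicted}
    return len(actual_set & predicted_set) / len(actual_set)
```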

Comparison - Weakness

Comparison - k

Comparison - Scalability

Related Work
• Compiler analysis [Thabit’81] [Chilimbi’01] [Ding&Zhong’03] [Ding&Zhong’04]
• Web systems [Chinen&Yamaguchi’97] [Duchamp’99] [Pitkow&Pirolli’99] [Su+’00] [Yang&Zhang’01]
• File systems [Zhou+’01] [Jiang&Zhang’02] [Jiang&Zhang’04] [Chen+’04]
• Frequent sequence mining [Agrawal&Srikant’95] [Mannila+’97] [Han+’00] [Chudova&Smyth’02] [Pei+’02] [Hirao+’03]

Summary
• A weak reference affinity model
  • <k,θ> is a unique partition
  • Different <k,θ> form a lattice
• A sampling method
  • Better accuracy and scalability than k-distance analysis

Thanks!

Example of Strict Reference Affinity
• Trace: w x w x u y z … z y z y v w x w x …
• k = 2: affinity group {w, x, y, z}
• k = 1: affinity groups {w, x} and {y, z}

Difference between Reference Affinity Analysis (RAA) and Frequent Sequence Mining (FSM)
• Reference affinity analysis allows:
  • repeated accesses within a pattern, e.g. …ABBBC…ABC…ABCC…
  • variations within a pattern, e.g. …ADBEC…AFBC…ABGC…
  • any order within a pattern, e.g. …ABC…ACB…BAC…

Strong Reference Affinity [Zhong+’04]
• Given a trace and k, a group G of data elements is a strict reference affinity group if and only if:
• 1. For any x and y in G and for any i ∈ Γ(x), there exists a j ∈ Γ(y) such that i and j are k-linked relative to G.
• 2. There is no G' such that G ⊂ G' (a proper superset) and G' also satisfies condition 1.

Properties of Strong Reference Affinity
• <k> is a unique partition of the data elements.
• <k> is a finer partition than <k'> when k < k'.
• Strong reference affinity therefore gives a hierarchical partition of the data elements.
[Figure: finer-partition hierarchy]

k-linked Path Relative to G
• There is a k-linked path between logical times i and j (assume i < j) relative to a data element group G iff there exists a list of logical times t1, t2, …, tn such that:
  • i < t1 < t2 < … < tn < j;
  • π(i), π(t1), …, π(tn), π(j) are distinct elements of G;
  • δ(i, t1), δ(t1, t2), …, δ(tn, j) ≤ k.
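A direct search for a k-linked path follows the definition above. This is a brute-force sketch (hypothetical `k_linked`, 1-based times, half-open reuse distance as an assumption), exponential in the worst case and meant only for small examples. It relies on δ(cur, t) never decreasing as t moves right, so the scan from each point can stop at the first t with δ(cur, t) > k.

```python
from functools import lru_cache


def k_linked(trace, G, k, i, j):
    """True if a k-linked path relative to group G joins 1-based times i < j."""
    def delta(a, b):
        # Assumed half-open reuse distance: distinct elements at times a..b-1.
        return len(set(trace[a - 1:b - 1]))

    @lru_cache(maxsize=None)
    def search(cur, used):
        for t in range(cur + 1, j + 1):
            if delta(cur, t) > k:
                break  # δ(cur, t) only grows with t
            x = trace[t - 1]
            if x not in G or x in used:
                continue  # path elements must be distinct members of G
            if t == j:
                return True
            if search(t, used | {x}):
                return True
        return False

    start = trace[i - 1]
    return start in G and search(i, frozenset({start}))
```

On a finite instance of the strict-affinity example, w x w x u y z z y z y v w x w x with G = {w, x, y, z}, time 1 (w) reaches time 6 (y) through x at time 4 when k = 2 (two links of distance 2), but not when k = 1.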

Example of k-linked path
• k = 2, G = {A, C, D, F, H}, i = 1, j = 8
[Figure: a trace indexed by logical time, with a k-linked path from i to j whose links have reuse distances 2, 1, 2, 2]

Effect of Sample Size

Strict vs. Weak Reference Affinity
• For a given k, the strict reference affinity groups form a finer partition than the weak reference affinity groups at that k and any θ.
• Even when θ = 1, weak reference affinity groups may not be strict reference affinity groups.
• Example: …XAWBYCWDZ…XEWFYGWHZ. When θ = 1 and k = 2, X, W, Y, and Z belong to the same weak reference affinity group, but not to the same strict one. The strict reference affinity groups are {X}, {Z}, and {W, Y}.
