
Recursive Data Structure Profiling

Easwaran Raman, David I. August (Princeton University)


Presentation Transcript


  1. Recursive Data Structure Profiling. Easwaran Raman, David I. August, Princeton University

  2. Motivation
     • Huge processor-memory performance gap
       • Latency > 100 cycles
     • Significant fraction of memory operations in typical programs
     • In many applications, Recursive Data Structures (RDS) constitute a large fraction of memory usage
     [Figure: CPU and DRAM curves plotted by year, 1980 to 2000, on a log scale from 1 to 1000]

  3. Motivation
     • Techniques to minimize the performance impact of this gap
       • Caching, prefetching, out-of-order (OoO) execution
     • Not very successful for RDS
       • Difficult to statically determine many RDS properties
       • Accesses are irregular and usually lie on the critical path of execution
     Traversal code (the short loop body prevents efficient OoO execution):
         while (valid(node)) {
             // do something with node->data
             node = next(node);
         }
     An RDS layout example (non-contiguous layout results in irregular access patterns): nodes at 0x1000, 0x2000, 0x3000, 0x4000

  4. Motivation
     • Linearization [Clark76, Luk99]
       • Speculation recovery costs outweigh the benefits if the next pointer field gets overwritten frequently
     • Information on the dynamic behavior of the entire RDS is important
     Linearized traversal:
         index = 0;
         head = pos[index];
         while (head) {
             foo(head);
             head = pos[++index];   // advance to the next linearized slot
             check(head);
         }
     [Figure: list reached from head, with node addresses 1000, 1008, 1012, 1004, 1016, shown next to the pos array; the placement of the nodes in the figure corresponds to their placement in memory]

  5. RDS Profile
     • RDS profiling gives a ‘logical’ understanding of runtime behavior
       • ‘Application creates 100 trees’ instead of ‘application allocates 2MB in heap’
       • ‘Linked list traversed 10 times’ instead of ‘Address 0x10004000 accessed 200 times’
     • Profile for linearization: next pointer field in list L is modified n times

  6. RDS Discovery
     C function for creating a tree:
         node *tree_create(){
             node *n = (node *)malloc(…);
             …
             n->left = tree_create(…);
             n->right = tree_create(…);
         }
     Execution trace in (pseudo) assembly:
         call malloc                  ; id = 1
         mov r10 = r8
         …
         call tree_create
         …
         call malloc                  ; id = 2
         …
         mov r11 = r8
         store r10[offset1] = r11     ; create 1->2
         call tree_create
         …
         call malloc                  ; id = 3
         …
         mov r12 = r8
         store r10[offset2] = r12     ; create 1->3
     • Assign a unique id to each value returned by malloc and create a node labeled by that id
     • Connect nodes by a directed edge if both the address and the value of a store have valid ids
     [Figure: Dynamic Shape Graph (DSG) with nodes 1, 2, 3]
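The two discovery rules on this slide can be sketched in C as follows. This is a minimal illustration, not the profiler's actual code: the names `Dsg`, `dsg_on_malloc`, `dsg_on_store`, and `dsg_id_of` are hypothetical, the tables are fixed-size, and address lookup is a linear scan for brevity. It assumes the instrumentation reports each allocation's base address and size, and each store's target address and stored value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 64
#define MAX_EDGES 256

/* One record per dynamic allocation: [base, base + size) maps to id. */
typedef struct { uintptr_t base; size_t size; int id; } Alloc;
typedef struct { int from, to; } Edge;

typedef struct {
    Alloc allocs[MAX_NODES];
    int n_allocs;
    Edge edges[MAX_EDGES];
    int n_edges;
} Dsg;

/* Rule 1: every value returned by malloc gets a fresh id (ids start at 1). */
static int dsg_on_malloc(Dsg *g, uintptr_t base, size_t size) {
    assert(g->n_allocs < MAX_NODES);
    int id = g->n_allocs + 1;
    g->allocs[g->n_allocs++] = (Alloc){ base, size, id };
    return id;
}

/* Id of the allocation containing addr, or 0 if addr has no valid id. */
static int dsg_id_of(const Dsg *g, uintptr_t addr) {
    for (int i = 0; i < g->n_allocs; i++) {
        const Alloc *a = &g->allocs[i];
        if (addr >= a->base && addr - a->base < a->size) return a->id;
    }
    return 0;
}

/* Rule 2: a store adds a directed edge only if both the target address
 * and the stored value have valid ids. Returns 1 if an edge was added. */
static int dsg_on_store(Dsg *g, uintptr_t addr, uintptr_t value) {
    int from = dsg_id_of(g, addr);
    int to = dsg_id_of(g, value);
    if (from == 0 || to == 0) return 0;
    assert(g->n_edges < MAX_EDGES);
    g->edges[g->n_edges++] = (Edge){ from, to };
    return 1;
}
```

Starting from a zero-initialized `Dsg`, replaying the trace above (three allocations, then stores of nodes 2 and 3 into fields of node 1) would leave the edges 1->2 and 1->3 in `edges`, while a store of a non-pointer value adds nothing.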

  7. RDS Discovery
         array = malloc(…);
         for (i=…)
             array[i] = create_tree(…);
         …
     • Multiple RDS instances can be connected together in the DSG!
     • To separate them, we use properties of the static code
     • Use another graph called the Static Shape Graph (SSG)
     [Figure: DSG with nodes 1 to 7]

  8. RDS Discovery
     Execution trace in (pseudo) assembly:
         call malloc                      ; id = 1
         mov r20 = r8
         … call malloc                    ; id = 2
         … mov r10 = r8
         … …
         … call tree_create
         … … call malloc                  ; id = 3
         … … mov r11 = r8
         … store r10[offset1] = r11       ; create 2->3
         … call tree_create
         … … call malloc                  ; id = 4
         … … mov r12 = r8
         … store r10[offset2] = r12       ; create 2->4
         store r20[0] = r10               ; create 1->2
     • For every static call to malloc, create a node with a unique id in the Static Shape Graph (SSG)
     • If a store creates an edge, connect the corresponding static nodes
     • Check for strongly connected components (SCCs) in the SSG
     • Connect two dynamic nodes only if their corresponding static nodes are in the same SCC
     [Figure: SSG with static nodes A and T; DSG with dynamic nodes 1 to 7]
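The SCC filter described on this slide can be sketched in C as below. This is an illustrative sketch under stated assumptions: the names `Ssg`, `ssg_add_edge`, `ssg_close`, and `ssg_same_scc` are hypothetical, static malloc call sites are numbered from 0, and SCC membership is decided with a simple transitive closure (Warshall's algorithm) instead of a linear-time SCC algorithm such as Tarjan's, which is adequate only because the number of static sites is small.

```c
#include <assert.h>
#include <string.h>

#define MAX_SITES 16

typedef struct {
    int n;                                      /* number of static malloc call sites */
    unsigned char edge[MAX_SITES][MAX_SITES];   /* edge[a][b]: a store linked site a to site b */
    unsigned char reach[MAX_SITES][MAX_SITES];  /* transitive closure of edge */
} Ssg;

static void ssg_init(Ssg *s, int n) {
    assert(n <= MAX_SITES);
    memset(s, 0, sizeof *s);
    s->n = n;
}

/* Called when a store creates a dynamic edge between nodes allocated
 * at static sites `from` and `to`. */
static void ssg_add_edge(Ssg *s, int from, int to) {
    s->edge[from][to] = 1;
}

/* Warshall's algorithm: reach[i][j] becomes 1 if j is reachable from i. */
static void ssg_close(Ssg *s) {
    memcpy(s->reach, s->edge, sizeof s->reach);
    for (int k = 0; k < s->n; k++)
        for (int i = 0; i < s->n; i++)
            for (int j = 0; j < s->n; j++)
                if (s->reach[i][k] && s->reach[k][j]) s->reach[i][j] = 1;
}

/* Two sites share an SCC if each reaches the other (or they are the same site). */
static int ssg_same_scc(const Ssg *s, int a, int b) {
    return a == b || (s->reach[a][b] && s->reach[b][a]);
}
```

Under this reading of the slide, if the array's call site is A and the call site inside tree_create is T, the stores in the trace yield the static edges T->T and A->T. A and T then fall into different SCCs, so the dynamic edge 1->2 would be dropped and each tree stays a separate RDS instance.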

  9. Experimental setup
     • Uses Pin, a dynamic instrumentation tool for Itanium
     • Mappings between address ranges and dynamic ids are stored in an AVL tree
       • Most recent mapping is cached
     • A mix of benchmarks from SPEC, Olden and other pointer-intensive applications
       • Dynamic instruction count varies from a few million (ks) to over 300 billion (mesa)
     • All experiments run on a 900 MHz Itanium 2 with 2 GB RAM running RH 7.1
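The address-to-id lookup with a one-entry cache can be sketched in C as follows. This is not the profiler's implementation: to keep the example short it swaps the AVL tree for a sorted array with binary search, and the names `RangeMap`, `rm_insert`, and `rm_lookup` are hypothetical. It assumes allocation ranges never overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RANGES 128

typedef struct { uintptr_t base; size_t size; int id; } Range;

typedef struct {
    Range r[MAX_RANGES];   /* sorted by base, assumed non-overlapping */
    int n;
    int last;              /* index of the most recent hit, or -1 */
    long cache_hits;       /* lookups answered by the cached mapping */
} RangeMap;

static void rm_init(RangeMap *m) { m->n = 0; m->last = -1; m->cache_hits = 0; }

static int rm_contains(const Range *r, uintptr_t addr) {
    return addr >= r->base && addr - r->base < r->size;
}

/* Insert a new range, keeping the array sorted by base address. */
static void rm_insert(RangeMap *m, uintptr_t base, size_t size, int id) {
    assert(m->n < MAX_RANGES);
    int i = m->n;
    while (i > 0 && m->r[i - 1].base > base) { m->r[i] = m->r[i - 1]; i--; }
    m->r[i] = (Range){ base, size, id };
    m->n++;
    m->last = -1;          /* indices may have shifted: invalidate the cache */
}

/* Return the id owning addr, or 0 if addr lies in no known range. */
static int rm_lookup(RangeMap *m, uintptr_t addr) {
    if (m->last >= 0 && rm_contains(&m->r[m->last], addr)) {
        m->cache_hits++;
        return m->r[m->last].id;
    }
    /* Binary search for the last range whose base is <= addr. */
    int lo = 0, hi = m->n - 1, best = -1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (m->r[mid].base <= addr) { best = mid; lo = mid + 1; }
        else hi = mid - 1;
    }
    if (best >= 0 && rm_contains(&m->r[best], addr)) {
        m->last = best;
        return m->r[best].id;
    }
    return 0;
}
```

The cache pays off when consecutive accesses touch the same node, which is common when a loop body reads several fields of one node before following the next pointer.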

  10. Profiler Performance
      • Profile: RDS size, lifetime, access count
      • Memory: <16 MB for all but 3 applications
      • Baseline: execution using Pin (~10 times slower than native)

  11. RDS usage statistics
      • SCCs in static shape graph (RDS types)
        • Usually a few (<5) per benchmark, with a maximum of 31 in parser
      • #RDS instances (connected components in DSG)
        • Exhibit a wide range (1 in mcf to around a million in parser)
        • Tend to be live for long if the program creates only a few of them
      • Sizes of RDS instances
        • Vary from a single-node self-loop (parser) to a few hundred thousand nodes (mcf, parser)
      • #pointer-chasing loads
        • Significant in many benchmarks
      • Applications show vast diversity in RDS usage
        • A good reason for profiling them!
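Counting RDS instances as connected components of the DSG can be sketched with a union-find structure. This is an assumed approach for illustration only; the slides do not say how components are computed, and the names `Components`, `cc_union`, and `cc_count` are hypothetical. Dynamic nodes are numbered from 0 here, and only edges that survive the SCC filter are expected to be passed to `cc_union`.

```c
#include <assert.h>

#define MAX_DYN 1024

typedef struct { int parent[MAX_DYN]; int n; } Components;

/* Start with every dynamic node in its own component. */
static void cc_init(Components *c, int n) {
    assert(n <= MAX_DYN);
    c->n = n;
    for (int i = 0; i < n; i++) c->parent[i] = i;
}

/* Find the representative of x's component, with path halving. */
static int cc_find(Components *c, int x) {
    while (c->parent[x] != x) {
        c->parent[x] = c->parent[c->parent[x]];
        x = c->parent[x];
    }
    return x;
}

/* Merge the components of a and b (edge direction is ignored). */
static void cc_union(Components *c, int a, int b) {
    int ra = cc_find(c, a), rb = cc_find(c, b);
    if (ra != rb) c->parent[ra] = rb;
}

/* Number of components, counting isolated nodes as components too. */
static int cc_count(Components *c) {
    int count = 0;
    for (int i = 0; i < c->n; i++)
        if (cc_find(c, i) == i) count++;
    return count;
}
```

Note that this sketch counts isolated nodes (for example, an array whose outgoing edges were filtered out) as components of their own; a profiler would presumably decide separately whether such nodes count as RDS instances.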

  12. Temporal distribution

  13. Cumulative distribution of RDS lifetimes

  14. RDS Stability
      • Stability of an RDS: a notion of how 'array-like' an RDS is
      • Stability index: an attempt to quantify this notion
        • Identify the time instances (alteration points) when changes occur to the RDS structure (by stores that replace existing pointers)
        • Count the traversals between successive alteration points
        • Stability index = #intervals that account for ‘most’ of the traversals
      • Lower index means higher stability
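One way to read the stability index definition is sketched in C below. The slide leaves 'most' unspecified, so both the threshold parameter `percent` and the greedy interpretation (take the busiest intervals first until they cover that share of all traversals) are assumptions made here for illustration; `stability_index` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparator: sort long values in descending order. */
static int cmp_desc(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x < y) - (x > y);
}

/* counts[i] is the number of traversals seen in the i-th interval between
 * successive alteration points. The array is sorted in place.
 * Returns the smallest number of intervals whose traversals add up to at
 * least `percent` percent of all traversals, or 0 if there were none. */
static int stability_index(long *counts, int n, int percent) {
    long total = 0;
    for (int i = 0; i < n; i++) total += counts[i];
    if (total == 0) return 0;

    qsort(counts, (size_t)n, sizeof counts[0], cmp_desc);

    long covered = 0;
    for (int i = 0; i < n; i++) {
        covered += counts[i];
        if (covered * 100 >= total * (long)percent) return i + 1;
    }
    return n;
}
```

With a 90 percent threshold, an RDS whose interval counts are {2, 95, 3} gets index 1 (one long quiet interval holds nearly all traversals, so it behaves like an array), while {25, 25, 25, 25} gets index 4 (traversals are spread across many alterations).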

  15. Cumulative distribution of stability index

  16. Conclusion
      • Aggressive data structure level optimization techniques for RDS need profile information for improved performance
      • RDS profiling gives a better understanding of the runtime behavior of RDS
      • RDS usage varies widely across benchmarks

  17. Extra Slides

  18. RDS Profiling: Definitions
      • RDS type: the abstract form of the logical data structure that is manipulated by the program
        • Examples: list, binary tree, graph, etc.
        • Can be mutually recursive (nodes point to their incident edges and vice versa to form a graph)
      • RDS instance: a concrete realization of the RDS type
        • Example: the tree created in function foo, the list pointed to by the first entry of the hash table
