
Clustered Indexing for Conditional Branch Predictors


Presentation Transcript


  1. Clustered Indexing for Conditional Branch Predictors. Veerle Desmet, Ghent University, Belgium

  2. Clustered Indexing for Conditional Branch Predictors. Veerle Desmet, Ghent University, Belgium

  3. Conditional Branches
  for (i=0; i<50; i++) { /* a loop... */ } /* next statements */
  if (i > 0) /* something */ else /* something else */
  How frequently do conditional branches occur? Roughly 1 in 8 instructions.

  4. Program Execution
  • Fetch = take the next instruction
  • Decode = analyze its type and read operands
  • Execute
  • Write Back = write the result
  [Figure: R1=R2+R3 moving through the four stages: fetched, decoded as an addition of operands 4 and 3, the computation executed, and the result written back so R1 contains 7]

  5. Pipelined Architectures
  Parallel versus sequential:
  • A constant flow of instructions becomes possible
  • Faster applications
  • Limitation due to conditional branches
  [Figure: a stream of instructions (R1=R2+R3, R4=R3-1, R5=R2+1, R7=2*R1, R5=R6, if R1>0) advancing cycle by cycle through the Fetch, Decode, Execute, and Write Back stages]

  6. Problem: Branches
  • Branches introduce bubbles
  • This hurts pipeline throughput
  [Figure: once "if R1>0" enters the pipeline, the next instructions to fetch (then-branch R2=R2-1, R7=0, or the else-branch) are unknown, so bubbles (?) fill the front of the pipeline until the branch resolves]

  7. Solution: Prediction
  • Fetch those instructions that are likely to be executed
  • Correct prediction = gain; misprediction = penalty
  [Figure: the pipeline keeps fetching past "if R1>0" along the predicted path (R2=R2-1, R7=2*R1, ...) without bubbles]

  8. Today's Architecture
  [Figure: block diagram of a modern out-of-order processor: instruction cache feeding fetch, decode, register rename, and dispatch into an instruction window; functional units and the register file execute instructions and reorder logic retires them; a branch predictor steers fetch; throughput is measured in IPC]

  9. Clustered Indexing for Conditional Branch Predictors. Veerle Desmet, Ghent University, Belgium

  10. Bimodal Branch Predictor
  • Predict the outcome of a condition (e.g., if or else)
  • based on the unique branch address: its low k bits index the prediction table
  • Update the prediction table with the actual outcome
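
The lookup and update on this slide can be sketched in C. This is a minimal sketch: the table size and function names are illustrative, and the 2-bit saturating counters are an assumption, although they are the usual choice for a bimodal predictor.

```c
#include <stdint.h>

#define K 10                          /* index width: 2^10 = 1024 entries */
#define TABLE_SIZE (1u << K)

/* 2-bit saturating counters, 0..3; values >= 2 mean "predict taken" */
static uint8_t counters[TABLE_SIZE];

static unsigned bimodal_index(uint32_t branch_addr) {
    return branch_addr & (TABLE_SIZE - 1);   /* low k bits of the address */
}

int bimodal_predict(uint32_t branch_addr) {
    return counters[bimodal_index(branch_addr)] >= 2;
}

void bimodal_update(uint32_t branch_addr, int taken) {
    uint8_t *c = &counters[bimodal_index(branch_addr)];
    if (taken && *c < 3) (*c)++;     /* saturate at "strongly taken" */
    if (!taken && *c > 0) (*c)--;    /* saturate at "strongly not taken" */
}
```

Because the index uses only the branch address, two branches whose addresses agree in the low k bits share a counter, which is exactly the aliasing examined on a later slide.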

  11. Global History Branch Predictor
  • Predict the outcome of a condition (e.g., a for loop)
  • based on the global history, e.g. 111101111011110, whose low k bits index the prediction table
  • Update the prediction table and the global history

  12. Gshare Branch Predictor [McFarling]
  The global history is XORed with the branch address to form the k-bit original index into the prediction table.
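
The index computation on this slide is just an XOR and a mask; a minimal sketch in C (the function name is illustrative):

```c
#include <stdint.h>

/* Gshare index [McFarling]: XOR the global branch history with the
 * branch address and keep the low k bits as the table index. */
unsigned gshare_index(uint32_t branch_addr, uint32_t global_history, int k) {
    uint32_t mask = (1u << k) - 1;
    return (branch_addr ^ global_history) & mask;
}
```

XORing the history into the address spreads one branch's entries over the table according to the path taken to reach it.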

  13. Misprediction Rate: gshare (SPEC INT 2000)
  [Figure: misprediction rate (%) versus predictor size in bytes, 10 to 1,000,000 on a log scale; lower is better]

  14. Aliasing
  • Resource limitations: 8 entries, so the index is 3 bits
  • Two different branches A and B can map to the same index (e.g. 101) and therefore use the same prediction information
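
The collision on this slide is easy to reproduce: with a 3-bit index, any two addresses that agree in their low 3 bits share an entry. A short sketch (the concrete addresses below are made up):

```c
#include <stdint.h>

/* 8-entry table, so the index is the low 3 bits of the branch address. */
unsigned table_index(uint32_t branch_addr) {
    return branch_addr & 0x7u;
}
```

For example, addresses 0x0D (...01101) and 0x25 (...00101) both yield index 101, so two distinct branches read and write the same counter.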

  15. Aliasing (SPEC INT 2000)
  [Figure: alias rate (%), broken down into destructive, constructive, and neutral aliasing, versus predictor size in bytes; the alias rate falls as the predictor grows]

  16. Clustered Indexing for Conditional Branch Predictors. Veerle Desmet, Ghent University, Belgium

  17. Basic Observations
  • Branches with similar behavior can share prediction information:
  1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1
  1 1 1 1 0 0 0 0 1 1 1 1 0 1 0 1
  • Such branches can use the same table entry, e.g.
  1 1 1 1 0 0 0 0
  1 1 1 1 0 1 0
  (time runs from left to right)

  18. Time Varying Behavior
  [Figure: the outcome streams of four branches A, B, C, and D split into phases; each branch is summarized per phase by its taken rate, e.g. A: 100%, 0%, 100%, 50%; NE = not executed in that phase]

  19. Branch Clustering
  • Each branch represents a point in N-dimensional space (one per-phase taken rate per dimension)
  • Clusters are formed by the k-means algorithm
  [Figure: the per-phase taken-rate vectors of branches A, B, C, and D]

  20. k-Means Cluster Algorithm
  1. Choose initial centers
  2. Assign each point to its nearest center
  3. Redefine the centers
  4. Restart with the new centers
  [Figure: points and centers (X) after each step]

  21. k-Means Cluster Algorithm (continued)
  Repeating steps 2 and 3 eventually changes nothing: a stable solution is reached.
  [Figure: the final centers (X) with their clusters]
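
The steps above can be sketched in C. This toy version clusters one-dimensional taken rates instead of the N-dimensional per-phase vectors used in the talk, and all names and data are made up:

```c
#include <math.h>

#define NPOINTS 6   /* number of branches */
#define KC 2        /* number of clusters */

/* made-up taken rates: three mostly-taken and three mostly-not-taken branches */
static const double rate[NPOINTS] = {1.00, 0.95, 0.90, 0.10, 0.05, 0.00};

void kmeans(int assign[NPOINTS]) {
    double center[KC] = {0.0, 1.0};              /* 1. initial centers */
    for (int iter = 0; iter < 100; iter++) {
        /* 2. assign each point to its nearest center */
        for (int i = 0; i < NPOINTS; i++) {
            int best = 0;
            for (int c = 1; c < KC; c++)
                if (fabs(rate[i] - center[c]) < fabs(rate[i] - center[best]))
                    best = c;
            assign[i] = best;
        }
        /* 3. redefine each center as the mean of its members */
        for (int c = 0; c < KC; c++) {
            double sum = 0.0; int cnt = 0;
            for (int i = 0; i < NPOINTS; i++)
                if (assign[i] == c) { sum += rate[i]; cnt++; }
            if (cnt > 0) center[c] = sum / cnt;
        }
        /* 4. repeat; a fixed iteration cap stands in for a convergence
           test on the assignments */
    }
}
```

The real clustering works in N dimensions with Euclidean distance; only the distance computation and the mean generalize, the loop structure stays the same.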

  22. Determining k of k-Means
  • k is chosen by the BIC score (Bayesian Information Criterion)
  • Trade-off between k and the goodness of a clustering
  [Figure: stable solutions with k=2 and k=3; which is best?]
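
The BIC score mentioned here has, in its standard form, the shape (the notation is mine, not the slide's):

```latex
\mathrm{BIC}(k) \;=\; \hat{\ell}_k \;-\; \frac{p_k}{2}\,\log n
```

where \(\hat{\ell}_k\) is the maximized log-likelihood of the k-cluster model, \(p_k\) its number of free parameters, and \(n\) the number of points. The candidate k with the highest score is kept, so an extra cluster must buy enough likelihood to pay its parameter penalty, which captures the trade-off on this slide.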

  23. Branch Clustering (SPEC INT 2000)
  • From 8 to 33 clusters per benchmark (mcf: 8; gcc and parser: 33)
  • Each branch belongs to exactly one cluster
  [Figure: branches A, B, C, and D each assigned to a cluster based on their per-phase taken rates]

  24. Clustered Indexing for Conditional Branch Predictors. Veerle Desmet, Ghent University, Belgium

  25.–27. Subtables
  • Example: 8 entries, index = 3 bits; 4 clusters, so 2 cluster bits; original index 101
  • The cluster bits select a subtable of the prediction table; the remaining original-index bits select an entry within it
  • 3 to 6 cluster bits are needed in practice [SPECint2000]
  • Can be used in every predictor scheme
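
One plausible way to wire the example together in C (the exact bit layout is an assumption; the slide only fixes a 3-bit index and 2 cluster bits): the cluster bits become the top bits of the final index and the original index supplies the remaining low bits.

```c
#include <stdint.h>

/* Subtable indexing sketch: with a k-bit table index and c cluster bits,
 * the cluster selects one of 2^c subtables and the low k-c bits of the
 * original index select the entry inside that subtable. */
unsigned subtable_index(unsigned cluster, unsigned orig_index, int k, int c) {
    unsigned low = orig_index & ((1u << (k - c)) - 1);  /* keep k-c low bits */
    return (cluster << (k - c)) | low;
}
```

With the slide's numbers (k=3, c=2, original index 101), only one original-index bit survives, which is why subtabling behaves like many small per-cluster predictors.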

  28. Subtables for Bimodal
  The cluster bits and branch-address bits together index the prediction table.
  [Figure: misprediction rate versus predictor size (bytes) for the original and the clustered bimodal predictor]

  29. Subtables for Gshare
  The cluster bits and the gshare index (global history XOR branch address) together index the prediction table.
  19% better for SMALL predictors.
  [Figure: misprediction rate versus predictor size (bytes) for the original and the clustered gshare predictor]

  30. Why Clustered Indexing Works
  • Subtabling uses smaller per-cluster predictors
  • More aliasing is therefore expected… but
  • more of that aliasing is constructive

  31. Hashing: an Alternative to Subtables
  The cluster number is hashed into the gshare index (global history XOR branch address), so the original global history length is kept.
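
The slide does not give the hash function; XORing the cluster number into the gshare index is one simple possibility, sketched here (the function name and the hash choice are assumptions):

```c
#include <stdint.h>

/* Hashed clustered index: fold the cluster number into the ordinary
 * gshare index instead of stealing index bits for subtables, so the
 * full k bits of global history still participate in the index. */
unsigned hashed_index(uint32_t branch_addr, uint32_t global_history,
                      uint32_t cluster, int k) {
    uint32_t mask = (1u << k) - 1;
    return (branch_addr ^ global_history ^ cluster) & mask;
}
```

Unlike subtabling, no index bits are reserved for the cluster, which matches the slide's point that the original history length is preserved.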

  32. Hashing for Gshare
  5% better for LARGE predictors.
  [Figure: misprediction rate versus predictor size (bytes) for original gshare, clustered subtables, and clustered hashing; an inset zooms in on large predictors (1 KB to 1 MB)]

  33. Self Profile-Based Clustering
  • Limit study: the identified clusters are optimal for the given execution
  [Figure: branches A, B, C, and D with per-phase taken rates, each assigned to its cluster]

  34. Cross Profile-Based Clustering
  • Clusters are identified on the SPEC-train inputs; an additional cluster catches branches unseen during profiling
  [Figure: cluster assignments from the self profile compared with those from the train-input profile; most branches land in the matching cluster (OK), and a branch not executed on the train input falls into the additional cluster]

  35. Cross Profile-Based Clustering
  Cross-clustered predictors remain good:
  • GSHARE @ small budgets: subtables give 12.3% fewer mispredictions (19% for self-clustered)
  • GSHARE @ large budgets: hashing is 3% better (5% for self-clustered)
  [Figure: misprediction rate versus predictor size (bytes) for bimodal and gshare, comparing the original, self-clustered, and cross-clustered variants, with a zoomed plot for the gshare subtable and hashed variants]

  36. Conclusion
  • Small branch predictors suffer from aliasing, which is frequently destructive
  • Exploit constructive aliasing by clustering branches
  • Implementation: subtables (usable in all branch prediction schemes) or hashing (specific to gshare)
  • Gshare misprediction rate @ 1KiB: reduced by 19% (self), 12.3% (cross); @ 256KiB: reduced by 5% (self), 3% (cross)

  37. Questions?

  38. The End
