Modification of a Copy Function to Reduce Average Cycles Per Element Using the Y86 Processor
By Jake Coogle and Doris Marley
Implementing iaddl • Most significant modification. • Replaced numerous instructions: without iaddl, adding a constant to a register takes an irmovl followed by an addl. • Lowered CPE by 2.93.
Better Branch Prediction • Second most significant modification. • Reordered code so that conditional jumps are taken more often. • Fewer mispredicted branches mean fewer cycles spent on misprediction recovery. • Some code had to be duplicated to preserve functionality. • Lowered CPE by 1.85.
Eliminating Bubble • Next most significant modification. • Bubble was caused by two back-to-back memory accesses. • Remedied by inserting another instruction between them. • Eliminated one instruction from each loop iteration. • Lowered CPE by 1.0.
Check If Positive • Tied with the previous modification in significance. • An earlier instruction already sets the condition codes, so that result is used to decide whether to jump. • Eliminated one instruction from each loop iteration. • Lowered CPE by 1.0.
From Count++ to Count-- • Fifth most significant modification. • Start the count at the length instead of 0. • Decrement when a value is negative. • Count register is updated less frequently. • Lowered CPE by 0.78.
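The count-down idea can be sketched in C. This is a minimal illustration, not the presenters' Y86 code: it assumes the copy function copies `len` words and returns the number of positive ones, and the names `copy_count_up` and `copy_count_down` are hypothetical. Zero is treated as not positive here, so the second version decrements on `val <= 0`.

```c
#include <stddef.h>

/* Baseline (assumed shape): start at 0 and increment per positive element. */
int copy_count_up(const int *src, int *dst, size_t len)
{
    int count = 0;
    for (size_t i = 0; i < len; i++) {
        int val = src[i];
        dst[i] = val;
        if (val > 0)
            count++;        /* executed once for every positive element */
    }
    return count;
}

/* Count-down variant: start at len and decrement per non-positive element. */
int copy_count_down(const int *src, int *dst, size_t len)
{
    int count = (int)len;
    for (size_t i = 0; i < len; i++) {
        int val = src[i];
        dst[i] = val;
        if (val <= 0)
            count--;        /* executed only for non-positive elements */
    }
    return count;
}
```

Both functions return the same value for any input. The second touches the counter less often only when positive elements outnumber the rest, which is presumably the case for the benchmark data behind the reported saving.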
Implementing ileave • Least significant modification. • Replaced one instruction per function call. • Reduced CPE by 0.07.
Total CPE reduction: 7.63 • Average CPE reduced to 10.52.
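As a quick check on the totals, the six per-modification reductions can be summed in C. This sketch works in hundredths of a cycle to avoid floating-point rounding, and the function names are hypothetical. The starting CPE it derives (18.15) is inferred from the two reported figures and is not stated on the slides.

```c
#include <stddef.h>

/* Per-modification CPE reductions from the slides, in hundredths of a cycle:
   iaddl, branch prediction, bubble, check-if-positive, count--, ileave. */
static const int cpe_reductions[] = {293, 185, 100, 100, 78, 7};

/* Sum of all reported reductions, in hundredths of a cycle. */
int total_reduction_hundredths(void)
{
    int total = 0;
    size_t n = sizeof cpe_reductions / sizeof cpe_reductions[0];
    for (size_t i = 0; i < n; i++)
        total += cpe_reductions[i];
    return total;
}

/* Starting CPE implied by a final CPE plus the summed reductions. */
int implied_baseline_hundredths(int final_cpe_hundredths)
{
    return final_cpe_hundredths + total_reduction_hundredths();
}
```

Calling `total_reduction_hundredths()` gives 763, matching the reported 7.63, and `implied_baseline_hundredths(1052)` gives 1815, that is, a starting CPE of 18.15.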