Enhancing Y86 Pipeline for Reduced CPE Time

Reducing Average CPE Time On A Y86 Pipelined Processor Darren Stikes DS58062

Y86 Processor • Has a pipelined architecture. • Has 5 stages: Fetch, Decode, Execute, Memory, Write (F D E M W). • Its pipeline can hold up to 5 instructions at a time.

The Five Stages (Fetch) • Reads the instruction at the address held in the program counter • Determines the address of the next instruction

Decode Stage • Reads the registers used by the instruction • Places their values in the pipeline registers so they can be used by the execute stage

Execute Stage • Computes memory addresses • Uses the ALU to perform the instruction's computation on register values

Memory Stage • Reads or writes the memory data needed by the current instruction

Write Stage • Saves the value computed or loaded in the earlier stages into a register. (Writes to a memory address happen in the memory stage.)

Problems? • Since more than one instruction runs through the pipeline at a time, problems occur when data needed from a previous instruction hasn't been computed yet. • Some of these problems can be fixed by adding bubbles or stalling. [Figure: pipeline timing diagram of overlapping F D E M W stages, with stalled instructions repeating their F and D stages]
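A small Python sketch can redraw the kind of timing diagram this slide showed. It is an illustration only, not the author's tooling: the function name `pipeline_diagram` and the `stalls` parameter are hypothetical, and a stall is modeled simply as holding an instruction in decode for extra cycles.

```python
STAGES = "FDEMW"  # Fetch, Decode, Execute, Memory, Write

def pipeline_diagram(n_instructions, stalls=None):
    """Return one text row per instruction; column c shows its stage in cycle c.

    stalls (hypothetical parameter) maps an instruction index to the number
    of extra cycles it is held in decode before entering execute.
    """
    stalls = stalls or {}
    rows = []
    prev = None  # stage entry cycles of the previous instruction, plus its exit cycle
    for i in range(n_instructions):
        # Fetch becomes free when the previous instruction moves on to decode.
        entry = [0 if prev is None else prev[1]]
        for s in range(1, len(STAGES)):
            earliest = entry[s - 1] + 1
            if s == 2:  # entering execute: apply any stall cycles
                earliest += stalls.get(i, 0)
            if prev is not None:
                # A stage can only be entered once the previous instruction left it.
                earliest = max(earliest, prev[s + 1])
            entry.append(earliest)
        entry.append(entry[-1] + 1)  # cycle after the write stage
        row = [" "] * entry[-1]
        for s, letter in enumerate(STAGES):
            for cycle in range(entry[s], entry[s + 1]):
                row[cycle] = letter
        rows.append("".join(row))
        prev = entry
    return rows

# Three instructions; the second one stalls for one cycle.
for line in pipeline_diagram(3, {1: 1}):
    print(line)
```

In this model the stalled instruction shows a repeated D and the instruction behind it a repeated F, which matches the "F D D E M W" and "F F D E M W" pattern of a one-cycle stall.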

Common Problems • Most problems in a pipelined architecture can be handled by forwarding, which means reading a value from a pipeline register before it has been written back normally.

Unsolved Problems • Load/use data hazards • Branch misprediction

Load/Use Data Hazards Solutions • Happens when an instruction needs a value that the previous instruction is still reading from memory. • Forwarding doesn't work here because the two instructions simply need more cycles between them. • Put instructions between the two that use registers independent of the one with the problem. This gives the load time to collect its value.
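The check described above can be sketched in Python. This is a simplified illustration under stated assumptions: only `mrmovl` is treated as a load, only immediately adjacent instructions are compared, implicit uses of `%esp` are ignored, and the helper names (`parse`, `registers_read`, `count_load_use_stalls`) are made up for this example.

```python
import re

# Instructions whose last operand is only written, never read (simplified list).
WRITE_ONLY_DEST = {"irmovl", "rrmovl", "mrmovl", "popl"}

def parse(line):
    """Split 'op a, b' into the opcode and a list of operand strings."""
    op, _, operands = line.strip().partition(" ")
    parts = [p.strip() for p in operands.split(",") if p.strip()]
    return op, parts

def loaded_register(line):
    """Register that a mrmovl fills from memory, or None for other instructions."""
    op, parts = parse(line)
    return parts[-1] if op == "mrmovl" else None

def registers_read(line):
    """Registers whose values the instruction needs as inputs."""
    op, parts = parse(line)
    if op in WRITE_ONLY_DEST:
        parts = parts[:-1]  # the destination is not an input
    return set(re.findall(r"%\w+", " ".join(parts)))

def count_load_use_stalls(listing):
    """Count loads that are immediately followed by a use of the loaded register."""
    stalls = 0
    for load, user in zip(listing, listing[1:]):
        reg = loaded_register(load)
        if reg is not None and reg in registers_read(user):
            stalls += 1
    return stalls

hazard = ["mrmovl (%ebx), %eax", "rmmovl %eax, (%ecx)", "andl %eax, %eax"]
fixed = ["mrmovl (%ebx), %eax", "iaddl $1, %esi", "rmmovl %eax, (%ecx)"]
print(count_load_use_stalls(hazard), count_load_use_stalls(fixed))
```

The second listing separates the load from its use with an instruction that touches only `%esi`, so the sketch reports no stall for it.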

Branch Misprediction Solution • With the Y86 pipeline architecture, if a conditional branch is in the pipeline, the processor assumes that the branch will be taken. • The problem arises when the branch ends up not taken and falls through. All the instructions that have already started down the pipeline are then the wrong instructions and need to be removed. • If you have an idea of how the condition codes will be set at the time of the branch, you can rearrange your code so that the branch is taken most of the time.
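To see why the direction of the branch matters, here is a rough Python estimate of the average cost per branch under always-taken prediction. It assumes a two-cycle penalty per misprediction, as in the textbook Y86 pipeline; the function name and the 70% figure in the usage lines are hypothetical and not taken from the slides.

```python
MISPREDICT_PENALTY = 2  # assumed: two wrongly fetched instructions are cancelled

def expected_branch_penalty(p_taken, penalty=MISPREDICT_PENALTY):
    """Average wasted cycles per branch when the pipeline always predicts taken."""
    if not 0.0 <= p_taken <= 1.0:
        raise ValueError("p_taken must be a probability between 0 and 1")
    # Only the not-taken cases are mispredicted.
    return (1.0 - p_taken) * penalty

# Hypothetical workload: 70% of the tested values are positive.
p_positive = 0.7
skip_if_not_positive = expected_branch_penalty(1.0 - p_positive)  # branch taken on <= 0
skip_if_positive = expected_branch_penalty(p_positive)            # branch taken on > 0
print(skip_if_not_positive, skip_if_positive)
```

With these assumed numbers, writing the test so that the branch is taken for the common case lowers the average penalty from about 1.4 to about 0.6 cycles per branch.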

Enhancements! • Hardware: Added the instruction iaddl, which adds a number directly to a register. This saves the lines of code used to place a number in a register just to add it once. • Hardware: Added the instruction leave, which takes the place of the two instructions rrmovl %ebp, %esp and popl %ebp. • Software: Replaced the old instructions with the new ones. • Software: Rearranged the conditional branch inside the loop so that it is taken most of the time. • Software: Placed a line of code between the two instructions of a load/use hazard. • Software: Moved the place where I subtract the length, then removed the andl instruction whose only purpose was to set the condition codes.
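The effect of the two new instructions can be modeled in a few lines of Python. This is a behavioral sketch only, not the HCL changes made to the processor: the dictionary-based machine state, the helper names, and the example addresses are assumptions, and the overflow flag is left out.

```python
def rrmovl(state, src, dst):
    """rrmovl src, dst: copy one register to another."""
    state["regs"][dst] = state["regs"][src]

def popl(state, dst):
    """popl dst: load the word at the stack pointer, then release 4 bytes."""
    regs = state["regs"]
    value = state["mem"][regs["esp"]]
    regs["esp"] += 4
    regs[dst] = value

def leave(state):
    """leave, modeled as %esp <- old %ebp + 4 and %ebp <- M[old %ebp]."""
    regs = state["regs"]
    old_ebp = regs["ebp"]
    regs["esp"] = old_ebp + 4
    regs["ebp"] = state["mem"][old_ebp]

def iaddl(state, constant, dst):
    """iaddl $constant, dst: add a constant to a register and set ZF/SF."""
    regs = state["regs"]
    result = (regs[dst] + constant) & 0xFFFFFFFF  # 32-bit wraparound
    regs[dst] = result
    state["cc"] = {"ZF": result == 0, "SF": result >= 0x80000000}

def example_state():
    """Hypothetical machine state: a saved frame pointer stored at address 0x100."""
    return {"regs": {"esp": 0x0F0, "ebp": 0x100, "edx": 1},
            "mem": {0x100: 0x120},
            "cc": {"ZF": False, "SF": False}}

two_step = example_state()
rrmovl(two_step, "ebp", "esp")
popl(two_step, "ebp")

one_step = example_state()
leave(one_step)
print(two_step == one_step)
```

Both paths end in the same state, which is the sense in which leave replaces the two-instruction sequence. The iaddl model also shows why the instruction can stand in for a separate test: it sets the condition codes as a side effect of the addition.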

Changes #1: replaced the irmovl/addl pairs with iaddl
BEFORE:
Loop:
    mrmovl (%ebx), %eax
    rmmovl %eax, (%ecx)
    andl %eax, %eax
    jle Npos
    irmovl $1, %edi
    addl %edi, %esi
Npos:
    irmovl $1, %edi
    subl %edi, %edx
    irmovl $4, %edi
    addl %edi, %ebx
    addl %edi, %ecx
    andl %edx, %edx
    jg Loop
AFTER:
Loop:
    mrmovl (%ebx), %eax
    rmmovl %eax, (%ecx)
    andl %eax, %eax
    jle Npos
    iaddl $1, %esi
Npos:
    iaddl $-1, %edx
    iaddl $4, %ebx
    iaddl $4, %ecx
    andl %edx, %edx
    jg Loop
Also added the leave instruction at the bottom of the code.

Changes #2: rearranged the branch inside the loop so it is usually taken
BEFORE:
Loop:
    mrmovl (%ebx), %eax
    rmmovl %eax, (%ecx)
    andl %eax, %eax
    jle Npos
    iaddl $1, %esi
Npos:
    iaddl $-1, %edx
    iaddl $4, %ebx
    iaddl $4, %ecx
    andl %edx, %edx
    jg Loop
AFTER:
Loop:
    mrmovl (%ebx), %eax
    rmmovl %eax, (%ecx)
    iaddl $1, %esi
    andl %eax, %eax
    jg pos
    iaddl $-1, %esi
pos:
    iaddl $-1, %edx
    iaddl $4, %ebx
    iaddl $4, %ecx
    andl %edx, %edx
    jg Loop

Changes #3: placed an instruction between the load and its use
BEFORE:
Loop:
    mrmovl (%ebx), %eax
    rmmovl %eax, (%ecx)
    iaddl $1, %esi
    andl %eax, %eax
    jg pos
    iaddl $-1, %esi
pos:
    iaddl $-1, %edx
    iaddl $4, %ebx
    iaddl $4, %ecx
    andl %edx, %edx
    jg Loop
AFTER:
Loop:
    mrmovl (%ebx), %eax
    iaddl $1, %esi
    rmmovl %eax, (%ecx)
    andl %eax, %eax
    jg pos
    iaddl $-1, %esi
pos:
    iaddl $-1, %edx
    iaddl $4, %ebx
    iaddl $4, %ecx
    andl %edx, %edx
    jg Loop

Changes #4: moved the length decrement next to the branch and removed the extra andl
BEFORE:
Loop:
    mrmovl (%ebx), %eax
    iaddl $1, %esi
    rmmovl %eax, (%ecx)
    andl %eax, %eax
    jg pos
    iaddl $-1, %esi
pos:
    iaddl $-1, %edx
    iaddl $4, %ebx
    iaddl $4, %ecx
    andl %edx, %edx
    jg Loop
AFTER:
Loop:
    mrmovl (%ebx), %eax
    iaddl $1, %esi
    rmmovl %eax, (%ecx)
    andl %eax, %eax
    jg pos
    iaddl $-1, %esi
pos:
    iaddl $4, %ebx
    iaddl $4, %ecx
    iaddl $-1, %edx
    jg Loop

Results • The results after each change are as follows: • Overall, the changes lowered my average CPE by 36%, from 18.15 to 11.59.
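The 36% figure can be checked with a short calculation. CPE here is read as cycles per element (total cycles divided by the number of elements processed); the function names are illustrative, and the cycle count in the usage lines is a made-up number chosen only to show the formula.

```python
def cpe(total_cycles, n_elements):
    """Cycles per element for one run."""
    return total_cycles / n_elements

def percent_reduction(before, after):
    """Relative improvement, as a percentage of the starting value."""
    return (before - after) / before * 100.0

# Values reported on the slide.
print(round(percent_reduction(18.15, 11.59), 1))

# Hypothetical run: 1159 cycles to process 100 elements.
print(cpe(1159, 100))
```

The reduction works out to roughly 36.1%, consistent with the rounded 36% stated on the slide.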

[Chart: results in the order in which I made the changes]
