
Out-of-Order OpenRISC: 2-Semester Project

Semester B: OR1200 ISA Extension

Final B Presentation

10.3.14

By: Vova Menis-Lurie

Sonia Gershkovich

Advisor: Mony Orbach

Spring 2013

Content:

1. Project Overview

a. Background

b. Goals

2. The System: OR1200

3. Project Flow

a. Simulation Environment

b. Out-of-Order Implementation

c. Super-Scalar Implementation

d. ISA Extension

4. Conclusions

Project Overview: Background
  • OpenRISC 1200 (OR1200) is an open-source Verilog implementation of the OR1000 ISA
  • In Part A, we created a basic working environment on the XUPV5 board and an SoC with the OR1200 CPU
Project Overview: Project Goal
  • Initial goal:
      • Out-of-Order execution processor implementation based on the OR1200 implementation
  • Changed goal:
      • Super-Scalar processor implementation based on the OR1200 implementation
  • Final goal:
      • ISA extension implementation for OR1200
[Figure: OR1200 top-level block diagram. Blocks shown: CPU and QMEM; instruction side: IMMU, ICache, WBIU to the WB bus; data side: DMMU, DCache, Store Buffer, WBIU to the WB bus; 32-bit datapaths]
Project Flow

  • Cache initialization function in assembly to enable the caches (the WB interface protocol requires 3 cycles per transaction, which is not effective for RTL analysis and implementation improvements)
  • Simulation environment creation (testbench)
  • Out-of-Order implementation attempt
  • Super-Scalar implementation attempt
  • ISA extension of the current implementation
Simulation Environment

Environment features:

  • UART interface emulation
  • Waveform generation
  • One Makefile for:
              • RTL compilation
              • Testbench instantiation
              • C program compilation
              • Running the simulation
              • Assembly code file creation
              • Xilinx RAM initialization file generation
Simulation Environment

Environment features:

  • Advanced monitor:
              • Monitors all data and control transactions of the SoC
              • Monitors states and SPR values
              • Creates log files with the desired information:
                • Register file state after each instruction
                • Execution time analysis
Out-of-Order implementation attempt

[Figure: OR1200 CPU block diagram inside OR1200 top: IF, CTRL, GenPC, PC / Next PC, Freeze, Except, Operand MUX, RF, ALU, FPU, MAC, LSU, CFGR, SPRS, WB MUX]

Fundamental statements (based on Tomasulo's algorithm):

  • Execution parallelism must be implemented!
  • Non-architectural shadow registers must be implemented.
  • In-order commitment (software sees in-order execution).
  • For LSU instruction parallelism:
      • multi-port memory and a wider bus
      • multi-port Cache, QMEM, and MMU
  • Branch prediction is not necessary:
      • the delay slot is handled at compiler level
  • Multiple ALUs are not an effective solution:
      • ALU instructions execute in one cycle
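The in-order-commitment requirement above can be illustrated in software. The C sketch below assumes a reorder buffer (ROB), the usual extension of Tomasulo's algorithm for in-order retirement; it is not part of the OR1200 RTL. Results land in ROB entries (playing the role of the non-architectural shadow registers) in any order, but the architectural register file is written only from the head of the buffer. `rob_t`, `rob_alloc`, `rob_complete`, `rob_commit` and `ROB_SIZE` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ROB_SIZE 8   /* hypothetical buffer depth */
#define NUM_REGS 32  /* OR1000 general-purpose registers */

typedef struct {
    bool     busy;   /* entry allocated */
    bool     done;   /* result produced (possibly out of order) */
    unsigned rd;     /* architectural destination register */
    uint32_t value;  /* result held in the non-architectural slot */
} rob_entry_t;

typedef struct {
    rob_entry_t e[ROB_SIZE];
    unsigned head, tail, count;
    uint32_t arf[NUM_REGS];  /* architectural registers, written only at commit */
} rob_t;

void rob_init(rob_t *r) {
    memset(r, 0, sizeof *r);
}

/* Allocate an entry in program order; returns its tag. */
unsigned rob_alloc(rob_t *r, unsigned rd) {
    assert(r->count < ROB_SIZE && rd < NUM_REGS);
    unsigned tag = r->tail;
    r->e[tag] = (rob_entry_t){ .busy = true, .done = false, .rd = rd, .value = 0 };
    r->tail = (r->tail + 1) % ROB_SIZE;
    r->count++;
    return tag;
}

/* An execution unit finishes; this may happen in any order. */
void rob_complete(rob_t *r, unsigned tag, uint32_t value) {
    assert(tag < ROB_SIZE && r->e[tag].busy);
    r->e[tag].value = value;
    r->e[tag].done = true;
}

/* Retire finished entries from the head only, preserving program order.
   Returns the number of instructions committed by this call. */
unsigned rob_commit(rob_t *r) {
    unsigned n = 0;
    while (r->count > 0 && r->e[r->head].done) {
        rob_entry_t *h = &r->e[r->head];
        r->arf[h->rd] = h->value;
        h->busy = false;
        r->head = (r->head + 1) % ROB_SIZE;
        r->count--;
        n++;
    }
    return n;
}
```

For example, if three instructions are allocated and the second and third finish first, `rob_commit` retires nothing until the first one completes, and then retires all three in program order, so a later write to the same register correctly wins.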

Super-Scalar implementation attempt

Fundamental statements:

  • Still in-order commitment: multiple execution pipes must not affect the in-order execution seen by software.
  • Fetch and decode are non-parallel, to avoid dependencies between instructions.
  • Not all dependencies are visible at the fetch/decode stage: LSU results may be required.
  • The fetch and decode units would have to be completely rewritten, starting from the current implementation.
  • The exception engine must support 2 pipes, which requires a complete redesign of the exception unit.
  • A multi-port SPRS must be implemented.
  • Parallel LSU instruction execution in 2 pipes requires multi-port memories and a wider bus.
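The dependency problem for dual issue can be sketched as a pairwise check at decode. In this hypothetical C model (`decoded_t` and `check_pair` are illustrative names, not OR1200 signals), register RAW/WAW hazards between two adjacent instructions are visible from the register fields alone, while two LSU operations involving a store can only be reported as "unknown", because whether they overlap depends on addresses computed later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical decoded-instruction summary; field names are illustrative. */
typedef struct {
    int  rd;        /* destination register, or -1 if none */
    int  ra, rb;    /* source registers, or -1 if unused */
    bool is_load;
    bool is_store;
} decoded_t;

typedef enum {
    PAIR_OK,          /* no dependency visible */
    PAIR_REG_HAZARD,  /* RAW or WAW on a register: detectable at decode */
    PAIR_MEM_UNKNOWN  /* LSU pair with a store: overlap depends on addresses */
} pair_check_t;

/* Can the younger instruction y issue together with the older instruction x? */
pair_check_t check_pair(decoded_t x, decoded_t y) {
    assert(x.rd < 32 && y.rd < 32);
    if (x.rd >= 0) {
        if (y.ra == x.rd || y.rb == x.rd) return PAIR_REG_HAZARD;  /* RAW */
        if (y.rd == x.rd)                 return PAIR_REG_HAZARD;  /* WAW */
    }
    bool x_mem = x.is_load || x.is_store;
    bool y_mem = y.is_load || y.is_store;
    if (x_mem && y_mem && (x.is_store || y.is_store))
        return PAIR_MEM_UNKNOWN;  /* cannot be resolved at decode */
    return PAIR_OK;
}
```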
ISA Extension: Final Goal
  • The gcc OR1000 compiler and assembler support empty slots for a custom ISA extension:
          • 7 non-parameterized commands:
            • l.cust1
            • l.cust2
            • l.cust3
            • l.cust4
            • l.cust6
            • l.cust7
            • l.cust8
          • 1 highly parameterized command:
            • l.cust5 Rd, Ra, Rb, L immediate[5:0], K immediate[4:0]
            • Allows 2048 (64 L values × 32 K values) commands that operate on 3 registers.
  • The compiler will not use the ISA extension when generating assembly code from C code, but gcc allows assembly instructions to be used alongside C code.
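The 2048 figure follows from the l.cust5 field widths: 6 bits of L and 5 bits of K give 2^6 × 2^5 = 2^11 combinations. The sketch below packs an l.cust5 instruction word, assuming the OpenRISC 1000 architecture manual's layout (opcode 0x3C in bits 31:26, then rD, rA, rB, L in bits 10:5 and K in bits 4:0); `encode_cust5` and `cust5_variants` are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed l.cust5 layout:
   [31:26] opcode 0x3C | [25:21] rD | [20:16] rA | [15:11] rB | [10:5] L | [4:0] K */
#define OR1K_OPC_CUST5 0x3Cu

uint32_t encode_cust5(unsigned rd, unsigned ra, unsigned rb, unsigned l, unsigned k) {
    assert(rd < 32 && ra < 32 && rb < 32);
    assert(l < 64 && k < 32);  /* L is 6 bits wide, K is 5 bits wide */
    return (OR1K_OPC_CUST5 << 26) | (rd << 21) | (ra << 16) | (rb << 11) | (l << 5) | k;
}

/* Number of distinct (L, K) combinations available to l.cust5. */
unsigned cust5_variants(void) {
    return (1u << 6) * (1u << 5);  /* 64 * 32 */
}
```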
l.cust Commands Implementation

4 non-parameterized commands:

  • l.cust1
            • Set flag (unconditional)
  • l.cust2
            • Unset flag (unconditional)
  • l.cust3
            • Set carry (unconditional)
  • l.cust4
            • Unset carry (unconditional)
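A behavioural C model of these four commands is a convenient reference when checking the RTL in simulation. The sketch assumes the OR1000 Supervision Register (SR) bit positions F = bit 9 and CY = bit 10; `cust_flag_op` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed OR1000 Supervision Register bit positions. */
#define SR_F  (1u << 9)   /* flag used by compare/branch instructions */
#define SR_CY (1u << 10)  /* carry */

/* Hypothetical model of l.cust1..l.cust4 as defined in this project:
   n selects the command, sr is the SR value before execution. */
uint32_t cust_flag_op(unsigned n, uint32_t sr) {
    assert(n >= 1 && n <= 4);
    switch (n) {
    case 1:  return sr | SR_F;    /* l.cust1: set flag */
    case 2:  return sr & ~SR_F;   /* l.cust2: unset flag */
    case 3:  return sr | SR_CY;   /* l.cust3: set carry */
    default: return sr & ~SR_CY;  /* l.cust4: unset carry */
    }
}
```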
l.cust Commands Implementation

l.cust5 parameterized command: the K immediate selects the command, the L immediate defines options

  • K=0x1
            • Replace byte L of A with byte 0 of B; result in D
  • K=0x2
            • Set bit L of A (result in D)
  • K=0x3
            • Unset bit L of A (result in D)
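A reference model for K=0x1 to K=0x3 fits in a few lines of C. This sketch assumes byte 0 is the least significant byte of the register (bits 7:0), and that L is limited to 0..3 for K=0x1 and to 0..31 for K=0x2/0x3; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* K=0x1: replace byte L of A with byte 0 of B (byte 0 assumed = bits 7:0). */
uint32_t cust5_k1_replace_byte(uint32_t a, uint32_t b, unsigned l) {
    assert(l < 4);
    unsigned shift = 8u * l;
    return (a & ~(0xFFu << shift)) | ((b & 0xFFu) << shift);
}

/* K=0x2: set bit L of A. */
uint32_t cust5_k2_set_bit(uint32_t a, unsigned l) {
    assert(l < 32);
    return a | (1u << l);
}

/* K=0x3: unset (clear) bit L of A. */
uint32_t cust5_k3_clear_bit(uint32_t a, unsigned l) {
    assert(l < 32);
    return a & ~(1u << l);
}
```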
l.cust Commands Implementation

l.cust5 parameterized command: the K immediate selects the command, the L immediate defines options

  • K=0x4
            • Concatenate the MSBs of A with the L LSBs of B; result in D >> D = {A[31:L], B[L-1:0]}
  • K=0x5
            • Concatenate the MSBs of B with the L LSBs of A; result in D >> D = {B[31:L], A[L-1:0]}
  • K=0x6
            • Reverse the bit order of A >> D = A[0:31]
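The slice and bit-reverse commands can be modelled as below. This sketch reads the K=0x4 slice as keeping the upper 32-L bits of A and the lower L bits of B, i.e. D = {A[31:L], B[L-1:0]}, with K=0x5 swapping the roles of A and B, and reads K=0x6 as D[i] = A[31-i]. These interpretations and the function names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Mask of the low l bits; valid for 0 <= l <= 32. */
static uint32_t low_mask(unsigned l) {
    return l >= 32 ? 0xFFFFFFFFu : ((1u << l) - 1u);
}

/* K=0x4: D = {A[31:L], B[L-1:0]} */
uint32_t cust5_k4_slice(uint32_t a, uint32_t b, unsigned l) {
    assert(l <= 32);
    return (a & ~low_mask(l)) | (b & low_mask(l));
}

/* K=0x5: D = {B[31:L], A[L-1:0]}, the same slice with A and B swapped. */
uint32_t cust5_k5_slice(uint32_t a, uint32_t b, unsigned l) {
    return cust5_k4_slice(b, a, l);
}

/* K=0x6: D = A[0:31], i.e. D[i] = A[31-i]. */
uint32_t cust5_k6_bitrev(uint32_t a) {
    uint32_t d = 0;
    for (unsigned i = 0; i < 32; i++)
        d |= ((a >> i) & 1u) << (31u - i);
    return d;
}
```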
l.cust Commands Implementation

l.cust5 parameterized command: the K immediate selects the command, the L immediate defines options

  • K=0x7
            • Reverse the bits within each half-word of A >> D = {A[16:31], A[0:15]}
  • K=0x8
            • Reverse the bits within each byte of A >> D = {A[24:31], A[16:23], A[8:15], A[0:7]}
  • K=0xa
            • Check if A is even. If true, D=1 and the flag is set; else D=0
  • K=0xb
            • Check if A is odd. If true, D=1 and the flag is set; else D=0
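K=0x7 and K=0x8 reverse the bit order inside each half-word or byte without moving the half-words or bytes themselves, as the Verilog-style concatenations show. The sketch below models them together with K=0xa/0xb. The slides do not say what happens to the flag when the test fails, so this model leaves it unchanged in that case; that choice and all function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reverse the bit order of the low `width` bits of x. */
static uint32_t rev_bits(uint32_t x, unsigned width) {
    assert(width >= 1 && width <= 32);
    uint32_t r = 0;
    for (unsigned i = 0; i < width; i++)
        r |= ((x >> i) & 1u) << (width - 1u - i);
    return r;
}

/* K=0x7: D = {A[16:31], A[0:15]}: bit-reverse each half-word in place. */
uint32_t cust5_k7(uint32_t a) {
    return (rev_bits(a >> 16, 16) << 16) | rev_bits(a & 0xFFFFu, 16);
}

/* K=0x8: D = {A[24:31], A[16:23], A[8:15], A[0:7]}: bit-reverse each byte in place. */
uint32_t cust5_k8(uint32_t a) {
    uint32_t d = 0;
    for (unsigned byte = 0; byte < 4; byte++)
        d |= rev_bits((a >> (8u * byte)) & 0xFFu, 8) << (8u * byte);
    return d;
}

/* K=0xa: D = 1 and flag set if A is even, else D = 0 (flag left unchanged). */
uint32_t cust5_ka_even(uint32_t a, bool *flag) {
    bool even = (a & 1u) == 0;
    if (even) *flag = true;
    return even ? 1u : 0u;
}

/* K=0xb: D = 1 and flag set if A is odd, else D = 0 (flag left unchanged). */
uint32_t cust5_kb_odd(uint32_t a, bool *flag) {
    bool odd = (a & 1u) != 0;
    if (odd) *flag = true;
    return odd ? 1u : 0u;
}
```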
l.cust Commands Implementation

l.cust5 parameterized command: the K immediate selects the command, the L immediate defines options

  • K=0xe
              • L=2: Swap the 2 MSBytes of A with its 2 LSBytes >> D = {A[15:0], A[31:16]}
              • L=4: Reverse the byte order of A >> D = {A[7:0], A[15:8], A[23:16], A[31:24]}
              • L=8: Reverse the half-byte (nibble) order of A >> D = {A[3:0], A[7:4], A[11:8], A[15:12], A[19:16], A[23:20], A[27:24], A[31:28]}
  • K=0xf
              • L=0: Mirror the LSBs >> D = {A[0:15], A[15:0]}
              • L=1: Mirror the MSBs >> D = {A[31:16], A[16:31]}
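For K=0xe the L option effectively gives the number of chunks (2 half-words, 4 bytes, 8 nibbles) whose order is reversed, which suggests one shared helper. The C sketch below is a hypothetical model under that reading, together with the two K=0xf mirror modes; all function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Reverse the order of the (32 / chunk_bits) chunks of a. */
static uint32_t rev_chunks(uint32_t a, unsigned chunk_bits) {
    assert(chunk_bits == 16 || chunk_bits == 8 || chunk_bits == 4);
    unsigned n = 32u / chunk_bits;
    uint32_t mask = (1u << chunk_bits) - 1u;
    uint32_t d = 0;
    for (unsigned i = 0; i < n; i++) {
        uint32_t chunk = (a >> (i * chunk_bits)) & mask;
        d |= chunk << ((n - 1u - i) * chunk_bits);
    }
    return d;
}

/* K=0xe: L=2 swaps half-words, L=4 reverses bytes, L=8 reverses nibbles. */
uint32_t cust5_ke(uint32_t a, unsigned l) {
    assert(l == 2 || l == 4 || l == 8);
    return rev_chunks(a, 32u / l);
}

/* Bit-reverse a 16-bit value. */
static uint32_t rev16(uint32_t h) {
    uint32_t r = 0;
    for (unsigned i = 0; i < 16; i++)
        r |= ((h >> i) & 1u) << (15u - i);
    return r;
}

/* K=0xf: L=0 -> D = {A[0:15], A[15:0]}; L=1 -> D = {A[31:16], A[16:31]}. */
uint32_t cust5_kf(uint32_t a, unsigned l) {
    assert(l == 0 || l == 1);
    if (l == 0) {
        uint32_t lo = a & 0xFFFFu;
        return (rev16(lo) << 16) | lo;   /* mirror the LSBs into the MSBs */
    }
    uint32_t hi = a >> 16;
    return (hi << 16) | rev16(hi);       /* mirror the MSBs into the LSBs */
}
```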
FPGA Utilization

[Figure: FPGA utilization, new RTL vs. old RTL: ~1% change]

Conclusions

  • The given implementation is not suitable for any significant micro-architectural improvements.
  • Out-of-Order / Super-Scalar OR1200 implementations are possible, but should be done from scratch.
  • Software written in assembly can easily be optimized for a specific application using the l.cust instructions (2048 instructions with 5 operands).
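As a sketch of the C-plus-assembly usage described in the ISA extension part, the wrapper below issues l.cust5 (K=0x2, set bit L) through GCC extended inline assembly when compiling for OpenRISC, and falls back to a portable C model elsewhere. It assumes the upstream GCC target macro `__or1k__` (older or32 toolchains may define a different one), an assembler that accepts `l.cust5 rD, rA, rB, L, K`, and a CPU with this project's decoder; `SET_BIT`, `set_bit_model` and `set_bit_3` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of l.cust5 with K=0x2 (set bit L of A), used off-target. */
static inline uint32_t set_bit_model(uint32_t a, unsigned l) {
    assert(l < 32);
    return a | (1u << l);
}

#if defined(__or1k__)
/* On the extended OR1200: emit the custom instruction directly.
   L must be a literal constant because it is pasted into the assembly text;
   rB is unused by this command, so r0 is passed. */
#define SET_BIT(d, a, L) \
    __asm__ volatile ("l.cust5 %0, %1, r0, " #L ", 0x2" : "=r"(d) : "r"(a))
#else
#define SET_BIT(d, a, L) ((d) = set_bit_model((a), (L)))
#endif

/* Example caller: set bit 3 of a. */
uint32_t set_bit_3(uint32_t a) {
    uint32_t d;
    SET_BIT(d, a, 3);
    return d;
}
```

Keeping the C model next to the assembly path lets the same source be unit-tested on a host and then compared against simulation results on the target.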