Generating a software loop with memory accesses

Generating a software loop with memory accesses. TigerSHARC assembly syntax. Concepts. Learning just enough TigerSHARC assembly code to make a software loop “work” Comparing the timings for rectification of integer and floating point arrays, using debug C++ code, Release C++ code

Presentation Transcript

Concepts

  • Learning just enough TigerSHARC assembly code to make a software loop “work”

  • Comparing the timings for rectification of integer and floating point arrays, using

    • debug C++ code,

    • Release C++ code

    • Our FIRST_ASM code

  • Looking in “MIXED mode” at the code generated by the compiler

TigerSHARC assemble code 2, M. Smith, ECE, University of Calgary, Canada


Test Driven Development

Work with customer to check that the tests properly express what the customer wants done. Iterative process with customer “heavily involved” – “Agile” methodology.

CUSTOMER

DEVELOPER


Note

Special marker

Compiler optimization

FLOATS: 927 → 304 (three-fold)

INTS: 960 → 150 (six-fold)

Why the difference, can we do better, and do we want to?

Note the failures – what are they?

Write tests about passing values back from an assembly code routine

More detailed look at the code

As with 68K and Blackfin, needs a .section

But name and format different

As with 68K, need .align statement

Is the “4” in bytes (8 bits) or words (32 bits)?

As with 68K, need .global to tell other code that this function exists

Single semi-colons

Double semi-colons

Start function label

End function label

Used for “profiling code”

Label format similar to 68K

Needs leading underscore and final colon

Return registers

  • There are many, depending on what you need to return

  • Here we need to use J8 as the return register to pass back “integer” pointer

  • Many registers available – need ability to control usage

    • J0 to J31 – registers (integers and pointers) (SISD mode)

    • XR0 to XR31 – registers (integers) (SISD mode)

    • XFR0 to XFR31 – registers (floats) (SISD mode)

  • Did I also mention

    • I0 to I31 – registers (integers and pointers) (SISD mode)

    • YR0 to YR31 , YFR0 to YFR31 (SIMD mode)

    • XYR, YXR and R registers (SIMD mode)

    • And also the MIMD modes

    • And the double registers and the quad registers …….

      #define return_pt_J8 J8 // J8 is a VOLATILE, NON-PRESERVED register

Parameter passing

  • SPACES for first four parameters ARE ALWAYS present on the stack (as with 68K)

  • But the first four parameters are passed in registers (J4, J5, J6 and J7 most of the time) (as with MIPS and Blackfin)

  • The parameters passed in registers are often stored into the spaces on the stack (like the MIPS) as the first step when assembly code functions call assembly code functions

  • J4, J5, J6 and J7 are volatile, non-preserved registers

Can we pass back the start of the final array?

Still passing tests by accident, and this needs to be a conditional return value

What we need to know based on experiences from other processors

  • Can we return from an assembly language routine without crashing the processor?

  • Return a parameter from assembly language routine

    • (Is it same for ints and floats?)

  • Pass parameters into assembly language

    • (Is it same for ints and floats?)

  • Do IF THEN ELSE statements

  • Read and write values to memory

  • Read and write values in a loop

  • Do some mathematics on the values fetched from memory

    All this stuff is demonstrated by coding HalfWaveRectifyASM( )

Why is ELSE a keyword?

FOUR PART ELSE INSTRUCTION IS LEGAL

IF JLT; ELSE, J1 = J2 + J3;     // Conditional execution – if true
        ELSE, XR1 = XR2 + XR3;  // Conditional – if true
        YFR1 = YFR2 + YFR3;;    // Unconditional -- always

IF JLT; DO, J1 = J2 + J3;       // Conditional execution -- if true
        DO, XR1 = XR2 + XR3;    // Conditional -- if true
        YFR1 = YFR2 + YFR3;;    // Unconditional -- always

Having this sort of format means that the instruction pipeline is not disrupted when we do IF statements

Label name is not the problem

NOTE: This is “C-like” syntax, but it is not “C”

Statement must end in ;; not ;

ONE semicolon = end of instruction

TWO semicolons = end of parallel instruction line

Add dual semicolons everywhere. Worry about “multiple issues” later

This dual semicolon is so important that you MUST code review for it all the time, or else you waste so much time in the lab. Key in exams / quizzes.

At last an error I know how to fix

Well I thought I understood it!!!

  • Speed issue – JUMP instructions can’t be too close together when stored in memory

    • Not normally a problem when “if” code is larger

Add a single instruction of 4 NOPs: nop; nop; nop; nop;; (TEMPORARY)

  • Fix the remaining error in handling the IF THEN ELSE as part of Assignment 1

  • Worry about code efficiency later (refactor) when all code is working

What we need to know based on experiences from other processors

  • Can we return from an assembly language routine without crashing the processor?

  • Return a parameter from assembly language routine

    • (Is it same for ints and floats?)

  • Pass parameters into assembly language

    • (Is it same for ints and floats?)

  • Do IF THEN ELSE statements

  • Read and write values to memory

  • Read and write values in a loop

  • Do some mathematics on the values fetched from memory

    All this stuff is demonstrated by coding HalfWaveRectifyASM( )

Target: changing this C++ code into assembly (to get “more” speed)

  • Code we generated yesterday was similar to parts of this, but not equivalent.

  • Re-factor the code to make the assembly code and C++ functionality equivalent

The code was not exactly what we designed (C++ equivalent) – re-factor and retest after the re-factoring

NEXT STEP

Refactored C++ code

I THINK I UNDERSTAND ENOUGH TO CHANGE THE FORMAT OF THE IF-THEN-ELSE TO OPTIMIZE THIS PARTICULAR CODE BIT

USE: IF TRUE, EXECUTE THIS STATEMENT – SINGLE LINE

Avoiding JUMPS in the main flow of the code will speed the flow of the code

Almost right. SYNTAX ERROR

Look in the manual to find the correct syntax

IF NJLE; DO, J8 = 0

No syntax errors (no CODE ERRORS). Code does not work (CODE DEFECTS)

We don’t have enough code to pass all the tests, but we are failing tests we did not expect to fail

Run “forensic tests” to find out where DEFECT is being introduced

Identify mistake by removing “code sections”

Without the IF

Add another line to the code. Can now spot the error

New format of IF-THEN-ELSE is doing exactly the opposite of what we want

IF NOT TRUE return NULL (0)

Need JLE not NJLE
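
The intended behaviour can be written down in C++ as a reference for the tests. This is only a sketch: the function name, parameter names and the exact condition (return NULL when the element count is less than or equal to zero) are assumptions based on the "JLE" and "return NULL (0)" notes above, not the course's actual code.

```cpp
#include <cassert>

// Hypothetical C++ stand-in for the assembly routine.
// Returns NULL when there is nothing to process (the "JLE" case),
// otherwise the start of the final array.
int* HalfWaveRectifySketch(const int initial_array[], int final_array[], int N) {
    if (N <= 0) {
        return nullptr;              // NULL (0) goes back in the return register
    }
    for (int count = 0; count < N; count++) {
        int value = initial_array[count];
        final_array[count] = (value < 0) ? 0 : value;   // negative values become 0
    }
    return final_array;
}

// Small usage check, in the spirit of the tests in the slides.
void DemoHalfWaveRectifySketch() {
    int in[3] = {-1, 0, 5};
    int out[3] = {9, 9, 9};
    assert(HalfWaveRectifySketch(in, out, 0) == nullptr);
    assert(HalfWaveRectifySketch(in, out, 3) == out);
    assert(out[0] == 0 && out[1] == 0 && out[2] == 5);
}
```

Testing with the inverted condition (the NJLE mistake) would return NULL for every valid array, which matches the unexpected test failures described above.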

Assignment 1 – code the following as a software loop – follow MIPS / Blackfin approach

DONE DURING TUTORIAL

int CalculateSum(void) {
    int sum = 0;
    for (int count = 0; count < 6; count++) {
        sum = sum + count;
    }
    return sum;
}

Reminder – software for-loop becomes “while loop” with initial test

int CalculateSum(void) {
    int sum = 0;
    int count = 0;
    while (count < 6) {
        sum = sum + count;
        count++;
    }
    return sum;
}

Do line by line translation into assembly code
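
One way to prepare for the line-by-line translation is to rewrite the while loop once more in C++ using only labels, a conditional jump and an unconditional jump, so that each statement corresponds to roughly one assembly step. The sketch below is an illustration only; the function and label names are hypothetical and do not come from the slides.

```cpp
#include <cassert>

// Hypothetical "assembly-shaped" version of CalculateSum():
// loop control first, then the loop body, then a jump back to the test.
int CalculateSumGotoStyle() {
    int sum = 0;
    int count = 0;
LOOP_TEST:
    if (!(count < 6)) goto LOOP_END;   // initial test: leave when count >= 6
    sum = sum + count;                 // loop body
    count++;
    goto LOOP_TEST;                    // jump back to the test
LOOP_END:
    return sum;
}

// Usage check: same answer as the for-loop form, 0 + 1 + 2 + 3 + 4 + 5.
void DemoCalculateSumGotoStyle() {
    int expected = 0;
    for (int count = 0; count < 6; count++) {
        expected = expected + count;
    }
    assert(CalculateSumGotoStyle() == expected);
}
```

Note that the exit test is written as "not less than", which lines up with the later remark that JGE is not available and NJLT is used instead.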

USE SOFTWARE LOOP HERE. Do loop control first

  • Have some jumps too close together

NOTE: JGE is ILLEGAL. USE NJLT

Customize? #define JGE NJLT

Run the tests with 4 nop padding to check that get out of loop as expected

Adding 4 nops -- lose 1 cycle, gain an hour not trying to solve the problem

If need the 1 cycle, refactor the code later

Accessing memory

  • Basic mode

    • Special register J31 – acts as zero when used in additions

    • Pt_J5 is a pointer register into an array

    • Value_J1 is being used as a data register

    • J registers like MIPS registers (used as pointer and data). NOT like 68K or Blackfin registers – those can be used as either data or address registers but not both

    • NOTE: Later we will find that using TigerSHARC registers for data operations is a BAD idea

  • Value_J1 = [Pt_J5];; read value from memory location pointed to by J5 -- Compare to Blackfin Value_R0 = [Pt_P0];;

  • Value_J1 = [Pt_J5 + J31];; read value from memory location pointed to by J5 – but read somewhere that this CAN be faster than just Value_J1 = [Pt_J5];; -- NEED TO CONFIRM

Accessing memory – step 2

  • Basic mode

    • Pt_J5 is a pointer register into an array

    • Offset_J4 is used as an offset

    • Value_J1 is being used as a data register to receive the memory value – load / store architecture

  • Read_J1 = [Pt_J5 + Offset_J4];; read value from memory location pointed to by (J5 + J4)

    PRE-MODIFY – address used J5 + J4, no change in J5

  • Read_J1 = [Pt_J5 += Offset_J4];; read value from memory location pointed to by J5, and then perform add operation on the J5 register (points to NEXT location)

    POST-MODIFY – address used J5, then perform J5 = J5 + J4
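
The difference between the two addressing forms can be modelled in C++. This is a sketch under simplifying assumptions: memory is an ordinary int array, the pointer and offset "registers" are array indices, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical register pair: pt models Pt_J5, offset models Offset_J4.
struct AddressRegs {
    std::size_t pt;
    std::size_t offset;
};

// Models  Read_J1 = [Pt_J5 + Offset_J4];;
// PRE-MODIFY: address used is pt + offset, pt itself is unchanged.
int ReadPreModify(const int memory[], const AddressRegs& regs) {
    return memory[regs.pt + regs.offset];
}

// Models  Read_J1 = [Pt_J5 += Offset_J4];;
// POST-MODIFY: address used is pt, then pt = pt + offset.
int ReadPostModify(const int memory[], AddressRegs& regs) {
    int value = memory[regs.pt];
    regs.pt += regs.offset;
    return value;
}

// Usage check on a 4-element array.
void DemoAddressing() {
    const int memory[4] = {10, 20, 30, 40};
    AddressRegs regs = {1, 2};
    assert(ReadPreModify(memory, regs) == 40);   // memory[1 + 2]
    assert(regs.pt == 1);                        // pointer not changed
    assert(ReadPostModify(memory, regs) == 20);  // memory[1]
    assert(regs.pt == 3);                        // pointer now moved on
}
```

With an offset of 1, repeated post-modify reads walk through an array one element at a time, which is the pattern a software loop over an array needs.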

Add in the memory accesses. FORGET TigerSHARC = RISC PROCESSOR

LOAD/STORE ONLY. Like MIPS and Blackfin

Must place value into register, and then copy register to memory

NO: [J5 + J0] = 0;

NO: J3 = 0; [J5 + J0] = J3; Uses wrong J3 – Remember TigerSHARC can handle parallel instructions

YES: J3 = 0;; [J5 + J0] = J3;
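
Why the single-semicolon version "uses wrong J3" can be illustrated with a small C++ model. It assumes that every instruction on one parallel instruction line reads the register values that existed before the line started, which is what the note above suggests; the Machine type and function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical machine state: 32 J registers and a small data memory.
struct Machine {
    int J[32];
    int memory[16];
};

// Models  J3 = 0; [J5 + J0] = J3;;  as ONE instruction line.
// Both instructions see the registers as they were before the line.
void ZeroAndStoreSameLine(Machine& m) {
    const Machine before = m;                            // snapshot of old values
    m.J[3] = 0;
    m.memory[before.J[5] + before.J[0]] = before.J[3];   // stores the OLD J3
}

// Models  J3 = 0;;  [J5 + J0] = J3;;  as TWO instruction lines.
void ZeroThenStoreTwoLines(Machine& m) {
    m.J[3] = 0;                                          // first line completes
    m.memory[m.J[5] + m.J[0]] = m.J[3];                  // second line sees J3 == 0
}

// Usage check: J3 starts with a leftover value of 99.
void DemoParallelLine() {
    Machine a = {};
    a.J[3] = 99; a.J[5] = 2; a.J[0] = 1;
    Machine b = a;
    ZeroAndStoreSameLine(a);
    ZeroThenStoreTwoLines(b);
    assert(a.memory[3] == 99);   // wrong value written
    assert(b.memory[3] == 0);    // intended value written
}
```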

Understand the error message: Too many J resource usage = missing ;;

Unintentionally doing the parallel instruction line

[J5 + J0] = J2; J0 = J0 + 1;;

Note: Missing label is not an assembler error, it’s a linker error

Fix warnings. DEFECT: may be days before you try to link, then hard to find

NOW that the assembler knows where “CONTINUE” is, it can tell you that you have two JUMP instructions too close together

  • Fix with magic 4 nops; and lose one cycle / loop

Not getting expected test results. Something is logically wrong (DEFECT)

Obvious question – are we even getting into the loop? Add BREAKPOINT to TEST code flow. (We don’t add BREAKPOINTS to follow the code in detail)

CODE NEVER GOT TO BREAKPOINT means code never entered loop

Forgot to do count = 0

So not even getting into loop as there is a garbage value already in Count_J0 from code we executed earlier -- DEFECT

Not bad for a first effort. Faster than compiler in debug mode

Where did the float ASM code suddenly appear from?

  • Integer 0 has bit pattern 0x0000 0000

  • Float 0.0 has bit pattern 0x0000 0000

  • Integer +6 has format b 0??? ???? ???? ???? ???? ???? ???? ????

  • Float +6.0 has format b 0??? ???????? ???? ???? ???? ???? ????

  • Integer -6 has format b 1??? ???? ???? ???? ???? ???? ???? ????

  • Float -6.0 has format b 1??? ???????? ???? ???? ???? ???? ????

  • Formats are very different, but the sign bit is in the same place

  • Float algorithm - if S == 1 (negative) set to zero

    Otherwise leave unchanged – same as integer algorithm

  • Just re-use integer algorithm with a change of name

EXPONENT
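
The sign-bit argument can be checked in C++ by applying one bit-level routine to both integer and float values. This is a sketch, assuming a 32-bit two's complement int and a 32-bit IEEE 754 float (checked with static_assert for size only); the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(int) == 4, "sketch assumes 32-bit int");
static_assert(sizeof(float) == 4, "sketch assumes 32-bit float");

// Rectify one 32-bit pattern by looking only at the sign bit (bit 31):
// if S == 1 (negative) set to zero, otherwise leave unchanged.
std::uint32_t RectifyBits(std::uint32_t bits) {
    return (bits & 0x80000000u) ? 0u : bits;
}

// Integer version: reinterpret the bits, rectify, reinterpret back.
int RectifyInt(int value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits = RectifyBits(bits);
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Float version: exactly the same algorithm with a change of name.
float RectifyFloat(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    bits = RectifyBits(bits);
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Usage check with the +6 / -6 examples from the list above.
void DemoRectify() {
    assert(RectifyInt(6) == 6);
    assert(RectifyInt(-6) == 0);
    assert(RectifyFloat(6.0f) == 6.0f);
    assert(RectifyFloat(-6.0f) == 0.0f);
}
```

This works because the all-zero bit pattern means 0 as an integer and 0.0 as a float, so writing zero bits gives the right answer for both types.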

Final code – Float rectify code just has a different name

What we NOW KNOW

  • Can we return from an assembly language routine without crashing the processor?

  • Return a parameter from assembly language routine

    • (Is it same for ints and floats?)

  • Pass parameters into assembly language

    • (Is it same for ints and floats?)

  • Do IF THEN ELSE statements

  • Read and write values to memory

  • Read and write values in a loop

  • Do some mathematics on the values fetched from memory

    All this stuff is demonstrated by coding HalfWaveRectifyASM( )
