Writing assembler functions that are called from C with avr-gcc: why to use assembly, which registers may be used without saving, how function arguments are passed in registers, and a simple and a more complex example.
Why Assembly?
• Speed
• Not affected by compiler optimization: the instruction sequence, and therefore the cycle count, stays the same at any optimization level
Registers that can be used without saving
• r0
• r18-r25
• r26-r27 (X)
• r30-r31 (Z)
• r1 (the zero register; must be cleared before returning)
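The register rules above can be written down as a lookup. The sketch below is a host-side C illustration, not part of any toolchain; the function name `avr_reg_class` and the enum are hypothetical, and it assumes the standard avr-gcc ABI as described in the avr-libc FAQ, in which r2-r17 and r28-r29 (Y) are call-saved and must be pushed and popped if an assembler routine uses them.

```c
#include <assert.h>

/* Hypothetical classification of r0..r31 under the avr-gcc conventions. */
typedef enum {
    REG_FREE,      /* call-clobbered: use without saving */
    REG_FREE_ZERO, /* r1: usable, but must hold 0 again before returning */
    REG_SAVE       /* call-saved: push/pop if the routine uses it */
} reg_class;

static reg_class avr_reg_class(int r)
{
    assert(r >= 0 && r <= 31);          /* only r0..r31 exist */
    if (r == 1)
        return REG_FREE_ZERO;           /* gcc assumes r1 == 0 everywhere */
    if (r == 0 || (r >= 18 && r <= 27) || r == 30 || r == 31)
        return REG_FREE;                /* r0, r18-r25, X (r26-r27), Z (r30-r31) */
    return REG_SAVE;                    /* r2-r17 and Y (r28-r29) */
}
```

For example, a routine that needs a scratch pointer can take Z for free, whereas borrowing Y would cost a push/pop pair for each of r28 and r29.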
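Assembler function arguments
• Arguments are allocated left to right, from r25 down to r8; arguments that do not fit are passed on the stack
• Each argument starts in an even-numbered register (a uint8_t argument still occupies two registers, leaving the upper one unused)
• Return values: 8-bit in r24, 16-bit in r25:r24, 32-bit in r22-r25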
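The allocation rule can be modelled in a few lines. This is a sketch of the convention as described in the avr-libc FAQ, for fixed (non-variadic) argument lists of integer types; `avr_assign_arg_regs` is a hypothetical helper, sizes are in bytes, and each result is the lowest register of the argument, which holds its least significant byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of avr-gcc argument passing (non-varargs).
   regs[i] receives the lowest register number used by argument i,
   or -1 if the argument is passed on the stack. */
static void avr_assign_arg_regs(const unsigned sizes[], int regs[], size_t n)
{
    int next = 26;   /* allocation starts just above r25 and moves down */
    int left = 18;   /* r8..r25: 18 argument registers in total */
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] > 0);
        int bytes = (int)((sizes[i] + 1u) & ~1u); /* round up to even */
        if (bytes <= left) {
            next -= bytes;
            left -= bytes;
            regs[i] = next;   /* LSB in the lowest register */
        } else {
            regs[i] = -1;     /* does not fit: passed on the stack */
            left = 0;         /* later arguments go on the stack too */
        }
    }
}
```

For subit(uint32_t, uint8_t) this yields r22 for ul and r20 for b, the registers used in the simple example; for delay_cycles(uint32_t) it yields r22, matching the cycles0..cycles3 = r22..r25 aliases in the more complex one.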
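Simple assembler example
C equivalent:
    uint32_t subit(uint32_t ul, uint8_t b) { return ul - b; }
Assembler version (in a .S file, so that the C preprocessor handles the #include):
    #include <avr/io.h>
        .text
        .global subit
    subit:
        sub r22, r20    ; subtract b (r20) from ul (r25-r22)
        sbc r23, r1     ; ..  NOTE: gcc makes sure r1 is always 0
        sbc r24, r1     ; ..
        sbc r25, r1     ; ..
        ret             ; the 32-bit result is returned in r22-r25
        .end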
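The sub/sbc chain propagates the borrow through the carry flag one byte at a time, with r1 supplying the zero for the upper bytes. The host-side C sketch below emulates that chain so the result can be compared with ordinary 32-bit subtraction; `subit_emulated` is a hypothetical name and the array `r` merely stands in for r22-r25.

```c
#include <assert.h>
#include <stdint.h>

/* Emulates subit's sub/sbc chain byte by byte (host sketch, not AVR code).
   r[0]..r[3] model r22..r25 holding ul, LSB first; b models r20. */
static uint32_t subit_emulated(uint32_t ul, uint8_t b)
{
    uint8_t r[4];
    for (int i = 0; i < 4; i++)
        r[i] = (uint8_t)(ul >> (8 * i));

    unsigned borrow = 0;   /* AVR C flag; "sub" ignores it, so start at 0 */
    uint8_t rhs = b;       /* first step: sub r22, r20 */
    for (int i = 0; i < 4; i++) {
        unsigned diff = (unsigned)r[i] - rhs - borrow;
        borrow = (diff >> 8) & 1u;  /* set when the byte result wrapped */
        r[i] = (uint8_t)diff;
        rhs = 0;                    /* later steps: sbc rN, r1 with r1 == 0 */
    }
    assert(borrow == (unsigned)(ul < b)); /* final C flag = underflow */
    return (uint32_t)r[0] | (uint32_t)r[1] << 8 |
           (uint32_t)r[2] << 16 | (uint32_t)r[3] << 24;
}
```

A borrow out of the low byte ripples upward, e.g. 0x00010000 - 1 clears the third byte and sets the two below it to 0xFF.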
More complex example:
    #include <avr/io.h>

    ; defines the # of cpu cycles of overhead
    ; (includes the ldi r22,byte0; ldi r23,byte1; ldi r24,byte2;
    ;  ldi r25,byte3 and the call to delay_cycles)
    OVERHEAD = 24

    ; some register aliases
    cycles0 = 22
    cycles1 = 23
    cycles2 = 24
    cycles3 = 25
    temp = 19

        .text
        .global delay_cycles
    delay_cycles:
        ;; subtract the overhead
        subi cycles0,OVERHEAD   ; subtract the overhead
        sbc  cycles1,r1         ; ..
        sbc  cycles2,r1         ; ..
        sbc  cycles3,r1         ; ..
        brcs dcx                ; return if req'd delay too short

        ;; delay the lsb
        mov  r30,cycles0        ; Z = jtable offset to delay 0-7 cycles
        com  r30                ; ..
        andi r30,7              ; ..
        clr  r31                ; ..
        subi r30,lo8(-(gs(jtable)))  ; add the table offset
        sbci r31,hi8(-(gs(jtable)))  ; ..
        ijmp                    ; vector into table for partial delay
    jtable:
        nop
        nop
        nop
        nop
        nop
        nop
        nop

        ;; delay the remaining cycles in 8-cycle steps
    loop:
        subi cycles0,8          ; decrement the count (8 cycles per loop)
        sbc  cycles1,r1         ; ..
        sbc  cycles2,r1         ; ..
        sbc  cycles3,r1         ; ..
        brcs dcx                ; exit if done
        nop                     ; .. add delay to make 8 cycles per loop
        rjmp loop               ; ..
    dcx:
        ret
        .end

C prototype:
    void delay_cycles(uint32_t cpucycles);
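The timing trick in delay_cycles is that the jump table absorbs the low three bits of the count (0-7 single-cycle nops) and the loop absorbs the rest in 8-cycle passes, so together they consume exactly the requested count minus OVERHEAD. The C sketch below models only that bookkeeping on the host; `plan_delay` and `delay_plan` are hypothetical names, and the fixed costs (call sequence, the final borrowing loop pass, and `ret`) are assumed to be what OVERHEAD covers. The model does not check that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OVERHEAD 24u  /* mirrors the constant in the assembler listing */

typedef struct {
    uint8_t  table_index; /* (~cycles0) & 7: jtable entry ijmp lands on */
    uint8_t  nops;        /* nops executed before falling into loop */
    uint32_t full_loops;  /* 8-cycle loop passes that do not exit */
} delay_plan;

/* Returns false for requests shorter than OVERHEAD (the first brcs dcx). */
static bool plan_delay(uint32_t cpucycles, delay_plan *p)
{
    if (cpucycles < OVERHEAD)
        return false;                      /* subi/sbc borrowed */
    uint32_t v = cpucycles - OVERHEAD;     /* cycles3:cycles0 after subtract */

    p->table_index = (uint8_t)(~v & 7u);   /* mov, com, andi r30,7 */
    p->nops = (uint8_t)(7u - p->table_index); /* 7 nops, skip table_index */

    /* Emulate loop: subtract 8 until the subtraction borrows. */
    p->full_loops = 0;
    uint32_t count = v;
    while (count >= 8u) {                  /* no borrow: 8-cycle pass */
        count -= 8u;
        p->full_loops++;
    }
    /* Jump table plus full passes account for every cycle of v. */
    assert(p->nops + 8u * p->full_loops == v);
    return true;
}
```

For example, a request of OVERHEAD + 13 cycles lands on jtable entry 2, runs 5 nops and one full loop pass (5 + 8 = 13).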