
Optimizing User Code in Allegro CL 5.0

by Duane Rettig


Presentation Transcript


  1. Optimizing User Code in Allegro CL 5.0 by Duane Rettig

  2. Optimizing User Code in Allegro CL 5.0 • Introduction • Optimization-related lisp architecture • Undocumented tools in Allegro CL • Optimization methodology • Speed optimizations • Space optimizations • Speed vs space tradeoffs • Lisp heap management

  3. Optimizing User Code in Allegro CL 5.0 • Introduction • Optimization-related lisp architecture • Undocumented tools in Allegro CL • Optimization methodology • Speed optimizations • Space optimizations • Speed vs space tradeoffs • Lisp heap management

  4. Optimization-related Lisp Architecture • Static vs dynamic - Function dispatch • Closure structure • Foreign functions - entry-vec struct • Disassemble extensions

  5. Architecture: Static Vs Dynamic • Pure static: absolute • Relocatable • Shared libraries • Dynamic shared libraries • Dynamic functions

  6. Static Programs • Absolute addresses • Fast startup • Fast running • Large • Not reconfigurable [Diagram: address space starting at 0: Code, Data, sbrk]

  7. Relocatable Programs • Not tied to a base address • Slightly longer startup times • Fast running • Large • Not reconfigurable [Diagram labels: Code, Data, Reloc]

  8. Programs that Use Shared Libraries • Usually need relocation • Smaller than non-shared libraries • Faster startup times • Medium speed; may start slow and gain speed after first use • Not reconfigurable [Diagram labels: Main, Lib 1, Lib 2, Lib 3]

  9. Programs that Use Dynamic Shared Libraries • May be absolute or relocatable • May be very small • Very fast startup • Medium speed, amortized over lib loading • Reconfigurable [Diagram labels: Main, Lib 1, Lib 2, Lib 3]

  10. Programs that Dynamically Define Functions • May be absolute or relocatable • May be very small • Very fast startup • Medium speed, amortized over function definitions • Extremely reconfigurable [Diagram labels: Main, Lisp lib, Heap, Lib 1, Lib 2, Functions]

  11. Lisp Data Availability [Diagram labels: func, pc, glob table, nil, function, codevector]

  12. C Data Availability [Diagram labels: lib1, lib2]

  13. Registers

  14. Calling Sequence: Lisp • Caller: store caller-saves registers; set up arguments and count; load name register; call trampoline *; restore caller-saves registers • Callee: establish stack; save function; execute body; restore stack; restore caller's function; return

  15. Calling Sequence: C • Caller: store caller-saves registers; set up args (no count); store caller's context; call function, function desc, or stub; restore caller's context; restore caller-saves registers • Callee: set up callee's context; establish stack; store callee-saves registers; execute body; restore callee-saves registers; restore stack; return

  16. Lisp's Symbol Trampoline • Required: get function register from name register; get start address from function; jump to start • Optional: save argument registers; check for stack overflow; jump to call-count code; jump to single-step code
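Because a Lisp call goes through the name rather than a fixed address, redefining the callee takes effect without recompiling the caller. The following is a minimal portable sketch of that behaviour, not a description of Allegro CL internals. The names `double-it` and `wrap` are hypothetical, and `notinline` is declared so that the compiler has to go through the symbol's function cell.

```lisp
;; Hypothetical names; portable Common Lisp, nothing Allegro-specific.
(declaim (notinline double-it))

(defun double-it (x)
  (* 2 x))

(defun wrap (x)
  ;; The call to DOUBLE-IT is resolved through the symbol at call time.
  (list (double-it x)))

(compile 'double-it)
(compile 'wrap)

;; Result with the original definition.
(defparameter *before* (wrap 3))

;; Replace the callee; WRAP itself is not recompiled.
(setf (symbol-function 'double-it)
      (lambda (x) (+ x 100)))

;; Result after the redefinition.
(defparameter *after* (wrap 3))
```

Here `*before*` holds `(6)` and `*after*` holds `(103)`: the compiled caller picked up the new definition through the name.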

  17. Architecture: Closures [Diagram labels: External vec, Internal vec]

  18. Architecture: Foreign Functions • Entry-vec struct
      (in-package :excl)
      (defstruct (entry-vec
                   (:type (vector excl::foreign (*)))
                   (:constructor make-entry-vec-boa ()))
        name             ; entry point name
        (address 0)      ; jump address for foreign code
        (handle 0)       ; shared-lib handle
        (flags 0)        ; ep-* flags
        (alt-address 0)  ; sometimes holds the real func addr
        )

  19. Architecture: Foreign Functions • Entry-vec flags
      ;; Entry-point constants:
      (defconstant excl::ep-call-semidirect 1)      ; Real address stored in alt-address slot
      (defconstant excl::ep-never-release 2)        ; Never release the heap
      (defconstant excl::ep-always-release 4)       ; Always release the heap
      (defconstant excl::ep-release-when-ok 8)      ; Release the heap unless without-interrupts
      (defconstant excl::ep-tramp-calls #x70)       ; Make calls through special trampolines
      (defconstant excl::ep-tramp-shift 4)
      (defconstant excl::ep-variable-address #x100) ; Entry-point contains address of C var
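To make the bit layout above concrete, here is a small sketch that decodes a flags word into readable keywords. It is an illustration only: the `+sketch-...+` constants are hypothetical local copies of the values on the slide (nothing is read from the `excl` package), and treating `ep-tramp-calls` as a field mask that is shifted right by `ep-tramp-shift` is an assumption based on the names.

```lisp
;; Hypothetical local copies of the slide's values; not the real excl:: constants.
(defconstant +sketch-ep-tramp-calls+ #x70)
(defconstant +sketch-ep-tramp-shift+ 4)

(defparameter *sketch-ep-bits*
  '((#x001 . :call-semidirect)
    (#x002 . :never-release)
    (#x004 . :always-release)
    (#x008 . :release-when-ok)
    (#x100 . :variable-address))
  "Single-bit flags and the keyword used to report each one.")

(defun decode-ep-flags (flags)
  "Return a list of keywords for the single-bit flags set in FLAGS,
followed by (:tramp N) when the assumed trampoline field is non-zero."
  (let ((result '()))
    (dolist (entry *sketch-ep-bits*)
      (when (logtest flags (car entry))
        (push (cdr entry) result)))
    ;; Assumption: bits 4-6 hold a small trampoline index.
    (let ((tramp (ash (logand flags +sketch-ep-tramp-calls+)
                      (- +sketch-ep-tramp-shift+))))
      (unless (zerop tramp)
        (push (list :tramp tramp) result)))
    (nreverse result)))
```

For example, `(decode-ep-flags #x129)` reports `:call-semidirect`, `:release-when-ok`, `:variable-address` and a trampoline field of 2.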

  20. Architecture: Foreign Functions [Diagram labels: Entry vec "foo", missing_entry_point, bind_and_call, call_semidirect, foo()]

  21. Architecture: Foreign Functions • excl::.saved-entry-points. table [Diagram labels: "foo", "bar", "bas", Entry vec (five instances)]

  22. Architecture: Disassemble • Extensions • non-lisp names • :absolute • :addr-list • :find-callee • :find-pc • :references-only • :recurse • :target-class

  23. Disassembling non-lisp names • A string representing a C entry point • Allows for viewing of non-lisp assembler code • Some instructions are interpreted automatically

  24. (disassemble "qcons")
      ;; disassembly of #("qcons" 1074935746)
      ;; code start: #x401237c2:
         0: 8b 8f ff fd ff ff     movl ecx,[edi-513] ; C_GSGC_NEWCONSLOC
         6: 3b 8f 03 fe ff ff     cmpl ecx,[edi-509] ; C_GSGC_NEWCONSEND
        12: 0f 84 3c 1e 00 00     jz 7758 ; cons+0
        18: 89 41 0f              movl [ecx+15],eax
        21: 89 c8                 movl eax,ecx
        23: 89 50 13              movl [eax+19],edx
        26: 83 87 ff fd ff ff 08  addl [edi-513],$8 ; C_GSGC_NEWCONSLOC
        33: c3                    ret

  25. excl::*c-symbol-table* build: • dirty (excl::*rebuild-c-symbol-table-p* is non-nil): • at lisp start • after load or unload of shared library • rebuilt: • for disassemble of a string • for profiler analysis • for “:zoom :all t :verbose t” invocation
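The dirty-flag behaviour described above can be sketched as a lazily rebuilt cache. This is an illustration of the general pattern only, under the assumption that the table is rebuilt on demand the next time a consumer needs it. All names here (`*sketch-table*`, `*sketch-dirty-p*`, `scan-symbols`, and so on) are hypothetical and do not correspond to Allegro CL internals.

```lisp
;; Hypothetical names throughout; a sketch of the dirty-flag pattern only.
(defvar *sketch-table* nil
  "Cached symbol table, valid only when *SKETCH-DIRTY-P* is NIL.")

(defvar *sketch-dirty-p* t
  "Starts out true, mirroring 'dirty at lisp start'.")

(defvar *sketch-rebuild-count* 0
  "How many times the table has been rebuilt (for demonstration).")

(defun scan-symbols ()
  "Stand-in for the expensive scan of loaded libraries."
  (incf *sketch-rebuild-count*)
  (vector (list "unidentified" 0)))

(defun note-library-change ()
  "Call after loading or unloading a shared library."
  (setf *sketch-dirty-p* t))

(defun ensure-symbol-table ()
  "Rebuild the table only if it is dirty, then return it."
  (when *sketch-dirty-p*
    (setf *sketch-table* (scan-symbols)
          *sketch-dirty-p* nil))
  *sketch-table*)
```

Consumers such as a disassembler or profiler would call `ensure-symbol-table`; repeated calls cost nothing until `note-library-change` marks the table dirty again.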

  26. (inspect excl::*c-symbol-table*)
      A simple T vector (3538) @ #x2039c352
         0-> cstruct (2) = #("unidentified" 0)
         1-> cstruct (2) = #("_init" 134514576)
         2-> cstruct (2) = #("strcpy" 134514600)
         3-> cstruct (2) = #("dlerror" 134514616)
         4-> cstruct (2) = #("getenv" 134514632)
         5-> cstruct (2) = #("fgets" 134514648)
         6-> cstruct (2) = #("perror" 134514664)
         7-> cstruct (2) = #("readlink" 134514680)
         8-> cstruct (2) = #("malloc" 134514696)
         9-> cstruct (2) = #("malloc" 134514696)
        10-> cstruct (2) = #("_lxstat" 134514712)
        11-> cstruct (2) = #("isspace" 134514728)
        12-> cstruct (2) = #("_xstat" 134514744)
        13-> cstruct (2) = #("__libc_init" 134514760)
        14-> cstruct (2) = #("strrchr" 134514776)
        15-> cstruct (2) = #("fprintf" 134514792)
        16-> cstruct (2) = #("fprintf" 134514792)
        17-> cstruct (2) = #("strcat" 134514808)
        18-> cstruct (2) = #("chdir" 134514824)
        19-> cstruct (2) = #("strncmp" 134514840)
        ...
      3537-> cstruct (2) = #("__bss_start" 1075102200)

  27. (simple function for next examples)
      USER(1): (defun foo (x) (list (bar x)))
      FOO
      USER(2): (compile 'foo)
      Warning: While compiling these undefined functions were referenced: BAR.
      FOO
      NIL
      NIL
      USER(3):

  28. (disassemble 'foo)
      ;; disassembly of #<Function FOO>
      ;; formals: X
      ;; constant vector:
      0: BAR
      ;; code start: #x203dcddc:
         0: 55          pushl ebp
         1: 8b ec       movl ebp,esp
         3: 56          pushl esi
         4: 83 ec 24    subl esp,$36
         7: 83 f9 01    cmpl ecx,$1
        10: 74 02       jz 14
        12: cd 61       int $97 ; trap-argerr
        14: d0 7f a3    sarb [edi-93],$1 ; C_INTERRUPT
        17: 74 02       jz 21
        19: cd 64       int $100 ; trap-signal-hit
        21: 8b 5e 32    movl ebx,[esi+50] ; BAR
        24: b1 01       movb cl,$1
        26: ff d7       call *edi
        28: 8b d7       movl edx,edi
        30: ff 57 2b    call *[edi+43] ; QCONS
        33: c9          leave
        34: 8b 75 fc    movl esi,[ebp-4]
        37: c3          ret

  29. Disassembling with absolute addresses • :absolute • Allows debug at absolute addresses • Warning: addresses may not be in sync after gc, though per-disassemble consistency is maintained

  30. (disassemble 'foo :absolute t)
      ;; disassembly of #<Function FOO>
      ;; formals: X
      ;; constant vector:
      0: BAR
      204cb5a4: 55          pushl ebp
      204cb5a5: 8b ec       movl ebp,esp
      204cb5a7: 56          pushl esi
      204cb5a8: 83 ec 24    subl esp,$36
      204cb5ab: 83 f9 01    cmpl ecx,$1
      204cb5ae: 74 02       jz 0x204cb5b2
      204cb5b0: cd 61       int $97 ; trap-argerr
      204cb5b2: d0 7f a3    sarb [edi-93],$1 ; C_INTERRUPT
      204cb5b5: 74 02       jz 0x204cb5b9
      204cb5b7: cd 64       int $100 ; trap-signal-hit
      204cb5b9: 8b 5e 32    movl ebx,[esi+50] ; BAR
      204cb5bc: b1 01       movb cl,$1
      204cb5be: ff d7       call *edi
      204cb5c0: 8b d7       movl edx,edi
      204cb5c2: ff 57 2b    call *[edi+43] ; QCONS
      204cb5c5: c9          leave
      204cb5c6: 8b 75 fc    movl esi,[ebp-4]
      204cb5c9: c3          ret

  31. Disassemble support for the profiler • :addr-list • Marks a specific instruction • Allows for exact profiler hits to be recorded

  32. (disassemble 'foo :addr-list -10)
      ;; disassembly of #<Function FOO>
      ;; formals: X
      ;; constant vector:
      0: BAR
      ;; code start: #x204cb5a4:
                     0: 55          pushl ebp
                     1: 8b ec       movl ebp,esp
                     3: 56          pushl esi
                     4: 83 ec 24    subl esp,$36
                     7: 83 f9 01    cmpl ecx,$1
      stopped -->   10: 74 02       jz 14
                    12: cd 61       int $97 ; trap-argerr
                    14: d0 7f a3    sarb [edi-93],$1 ; C_INTERRUPT
                    17: 74 02       jz 21
                    19: cd 64       int $100 ; trap-signal-hit
                    21: 8b 5e 32    movl ebx,[esi+50] ; BAR
                    24: b1 01       movb cl,$1
                    26: ff d7       call *edi
                    28: 8b d7       movl edx,edi
                    30: ff 57 2b    call *[edi+43] ; QCONS
                    33: c9          leave
                    34: 8b 75 fc    movl esi,[ebp-4]
                    37: c3          ret

  33. (disassemble 'foo :addr-list '(11 (#x204cb5ae . 4) (#x204cb5b9 . 4) (#x204cb5c5 . 3)))
      ;; disassembly of #<Function FOO>
      ;; formals: X
      ;; constant vector:
      0: BAR
      ;; code start: #x204cb5a4:
                     0: 55          pushl ebp
                     1: 8b ec       movl ebp,esp
                     3: 56          pushl esi
                     4: 83 ec 24    subl esp,$36
                     7: 83 f9 01    cmpl ecx,$1
          4 (36%)   10: 74 02       jz 14
                    12: cd 61       int $97 ; trap-argerr
                    14: d0 7f a3    sarb [edi-93],$1 ; C_INTERRUPT
                    17: 74 02       jz 21
                    19: cd 64       int $100 ; trap-signal-hit
          4 (36%)   21: 8b 5e 32    movl ebx,[esi+50] ; BAR
                    24: b1 01       movb cl,$1
                    26: ff d7       call *edi
                    28: 8b d7       movl edx,edi
                    30: ff 57 2b    call *[edi+43] ; QCONS
          3 (27%)   33: c9          leave
                    34: 8b 75 fc    movl esi,[ebp-4]
                    37: c3          ret
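The percentages in the margin of the listing above follow from the hit counts. As a small worked example, assuming the leading 11 in the `:addr-list` argument is the total number of hits and each pair carries the count for one address, the figures can be reproduced with ordinary rounding. The function name `hit-percentages` is hypothetical.

```lisp
(defun hit-percentages (total hits)
  "Return each count in HITS as a whole-number percentage of TOTAL."
  (mapcar (lambda (h) (round (* 100 h) total)) hits))

;; Counts taken from the slide: 4, 4 and 3 hits out of 11.
(defparameter *slide-percentages* (hit-percentages 11 '(4 4 3)))
```

`*slide-percentages*` comes out as `(36 36 27)`, matching the 36%, 36% and 27% annotations.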

  34. Disassemble support for the debugger • :find-callee • Returns information given a relative pc • :find-pc • Returns information about instruction sequencing, or prints an instruction • :references-only • Returns references from function or glob table

  35. USER(22): (disassemble 'foo :find-callee 26)
      BAR
      :CONST
      -1
      USER(23): (disassemble 'foo :find-callee 28)
      BAR
      :CALL
      0
      USER(24): (disassemble 'foo)
      ;; disassembly of #<Function FOO>
      ...
        14: d0 7f a3    sarb [edi-93],$1 ; C_INTERRUPT
        17: 74 02       jz 21
        19: cd 64       int $100 ; trap-signal-hit
        21: 8b 5e 32    movl ebx,[esi+50] ; BAR
        24: b1 01       movb cl,$1
        26: ff d7       call *edi
        28: 8b d7       movl edx,edi
        30: ff 57 2b    call *[edi+43] ; QCONS
        33: c9          leave
        34: 8b 75 fc    movl esi,[ebp-4]
        37: c3          ret
      USER(25):

  36. USER(28): (disassemble 'foo :find-pc 14)
      14
      17
      NIL
      NIL
      USER(29): (disassemble 'foo :find-pc 17)
      17
      19
      21
      :BCC
      USER(30): (disassemble 'foo :find-pc '(:print 17))
        17: 74 02       jz 21
      USER(31): (disassemble 'foo :find-pc '(:print 21))
        21: 8b 5e 32    movl ebx,[esi+50] ; BAR
      USER(32):

  37. USER(26): (disassemble 'foo :references-only t)
      (SYSTEM::QCONS BAR SYSTEM::C_INTERRUPT)
      USER(27):

  38. Miscellaneous Disassembler modes • :recurse • Useful to control the amount of output • :target-class • Used only in cross-porting

  39. Optimizing User Code in Allegro CL 5.0 • Introduction • Optimization-related lisp architecture • Undocumented tools in Allegro CL • Optimization methodology • Speed optimizations • Space optimizations • Speed vs space tradeoffs • Lisp heap management

  40. Undocumented Tools in Allegro CL • excl::get-objects (att #42-1) • excl::get-references (typo in your notes) • excl::create-box / excl::box-value (att #42-2) • excl::atomically • allows compiler to guarantee atomic body • Autoloading facilities (described later)

  41. Atomic forms • Generally a form is atomic if it has • no interrupt-checks • no consing • no non-atomic forms or calls • Use excl::atomically like progn; if it compiles, the body is atomic • Atomic primcalls: • gsgc-setf-protect gsgc-set-protect fd-stack-real qcar qcdr • Atomic calls: • error excl::.error excl::eq-hash-fcn excl::eql-not-eq excl::get_2op-atomic excl::sxhash-if-fast excl::symbol-hash-fcn

  42. Optimizing User Code in Allegro CL 5.0 • Introduction • Optimization-related lisp architecture • Undocumented tools in Allegro CL • Optimization methodology • Speed optimizations • Space optimizations • Speed vs space tradeoffs • Lisp heap management

  43. Optimization Methodology • Get it right first • Profile it • The time macro • The Allegro CL profiler • Hit the high cost items • Implementations • Algorithms

  44. Optimizing User Code in Allegro CL 5.0 • Introduction • Optimization-related lisp architecture • Undocumented tools in Allegro CL • Optimization methodology • Speed optimizations • Space optimizations • Speed vs space tradeoffs • Lisp heap management

  45. Speed Optimizations • Profiling • Efficient compilation • Immediate compilation • Foreign function optimizations • Hash tables • CLOS optimizations • Miscellaneous optimizations

  46. Speed Optimizations: Profiling • Always compile top-level test functions • Example profile run (att #48-1) • Do not use time macro with profiler • Avoid simultaneous time/call-count profiles • When using time macro, beware of new closures

  47. Time macro: extra closures • This driver is not as simple as it looks!
      (defun test-driver (n)
        (time (dotimes (i n)
                (test-it))))
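The slide does not spell out a remedy. One possible arrangement, assuming the concern is that `time` may wrap a body that captures `n` and `i` in a fresh closure, is to move the loop into its own compiled function so that the timed form is a single call. The helper `run-tests`, the counter `*calls*` and the body of `test-it` are hypothetical stand-ins.

```lisp
(defvar *calls* 0
  "Counts invocations of TEST-IT (stand-in for real work).")

(defun test-it ()
  (incf *calls*))

(defun run-tests (n)
  "Run TEST-IT N times and return the running call count."
  (dotimes (i n)
    (test-it))
  *calls*)

(defun test-driver (n)
  ;; The timed form is now one call; the loop lives in RUN-TESTS.
  (time (run-tests n)))

;; Compile everything before measuring, as the profiling slide advises.
(compile 'test-it)
(compile 'run-tests)
(compile 'test-driver)
```

Whether this removes the extra closure entirely depends on how the implementation expands `time`; it does keep the loop itself out of whatever wrapper is created.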

  48. Speed Optimizations: Efficient Compilation • :explain • excl::atomically • excl:add-typep-transformer (att #50-1,2)

  49. Speed Optimizations: Immediate Compilation • Inlining and unboxing • Immediate-args • defun-immediate (att #51-1,2,3)
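The general idea behind inlining and unboxing can be illustrated with standard Common Lisp declarations. This sketch does not use the Allegro-specific immediate-args or `defun-immediate` facilities named on the slide. It only shows the portable declarations that give a compiler the type information it needs to keep double-floats unboxed inside a loop. The function names are hypothetical.

```lisp
;; Ask the compiler to inline SQUARE at its call sites.
(declaim (inline square))

(defun square (x)
  (declare (double-float x))
  (* x x))

(defun sum-of-squares (v)
  "Sum the squares of the elements of a specialized double-float vector."
  (declare (type (simple-array double-float (*)) v)
           (optimize (speed 3) (safety 1)))
  (let ((acc 0d0))
    (declare (double-float acc))
    (dotimes (i (length v) acc)
      ;; With SQUARE inlined and all types declared, the arithmetic
      ;; can be done on raw floats without allocating intermediates.
      (incf acc (square (aref v i))))))

(defparameter *sample*
  (make-array 3 :element-type 'double-float
                :initial-contents '(1d0 2d0 3d0)))
```

`(sum-of-squares *sample*)` evaluates to `14.0d0`. How much boxing is actually avoided depends on the compiler, so inspecting the output of `disassemble` is the way to confirm it.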

  50. Speed Optimizations: Foreign Functions • Call-direct (att #52-1,2) • comp:list-call-direct-possibilities
