
Design, Implementation, and Evaluation of a Virtual Machine for Oz (German original: Design, Implementierung und Evaluierung einer virtuellen Maschine für Oz)

Ralf Scheidhauer, PS Lab, DFKI. May 18, 1999.


Presentation Transcript


  1. Design, Implementation, and Evaluation of a Virtual Machine for Oz
     Ralf Scheidhauer, PS Lab, DFKI
     May 18, 1999

  2. Oz
     • Developed at DFKI since 1991
     • DFKI Oz 1.0 (1995), DFKI Oz 2.0 (1998)
     • Mozart 1.0 (1999)
     • 180 000 lines of C++
     • 140 000 lines of Oz
     • 65 000 lines of documentation
     • Since 1996 collaboration with SICS and UCL
     • Application-strength system: multi-agent systems (DFKI, SICS), computer-bus scheduling (Daimler), gate scheduling (Singapore), NL (SFB), computational biology (LMU), ...

  3. Related Work
     • LP, CLP [Warren 77], [Jaffar Lassez 86]
     • Concurrency [Saraswat 93]
     • AKL [Janson Haridi 90, Janson 94]
     • FP [Appel 92]

  4. Overview
     • Language L
     • Virtual machine
     • Implementation
     • Evaluation

  5. The Language L
     • Core language of Oz
     • Presented as an extension of a sublanguage of SML
     • Logic variables
     • Threads
     • Synchronization
     • Dynamic type system
     • Extensions via predefined functions:
         lvar()        logic variable
         unify(x,y)    unification
         spawn(f)      thread creation

  6. Graph Model
     • Integers
     • Tuples
     • Functions
     • Cells (references)
     • Constructors
     • Strict evaluation of expressions: e0  e1  ...
     [Diagram: graph with nodes TUPLE, INT/3, TUPLE, CELL, INT/5, CON]

  7. Why Logic Variables?
     • Programming techniques: backpatching, difference lists, ...
     • Cyclic data structures
     • Tail-recursive definition of many functions (append, map, ...)
     • Synchronization of threads
     • Search

  8. Logic Variables: Creation and Representation
       let val x = lvar()
       in (4,x,23)
       end
     [Diagram: TUPLE node with fields INT/4, VAR, INT/23]
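
The graph on this slide can be mimicked with a few node classes. This is only a sketch in Python, not the Mozart representation: the class names `Int`, `Var`, and `Tuple` and the `binding` field are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Int:
    value: int


@dataclass
class Var:
    # Hypothetical field: None means the logic variable is still unbound.
    binding: Optional[object] = None


@dataclass
class Tuple:
    fields: tuple


def lvar() -> Var:
    """Create a fresh, unbound logic variable (the lvar() of language L)."""
    return Var()


# let val x = lvar() in (4,x,23) end
x = lvar()
t = Tuple((Int(4), x, Int(23)))
```

The tuple holds the variable node itself, so a later binding of `x` is visible through `t` without copying.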

  9. Logic Variables: Unification
     unify( , )
     [Diagram, before: two TUPLE graphs with nodes INT/3, VAR, INT/2 and INT/3, INT/5, VAR]
     [Diagram, after: TUPLE, TUPLE, INT/3, INT/2, INT/3, INT/5; no VAR nodes remain]
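
A minimal sketch of unification over such graphs, written in Python under stated assumptions: integers and tuples are plain Python values, `Var` and its `ref` field are hypothetical names, and the two example tuples are a guess at what the diagram shows. There is no trail, so a failed unification leaves earlier bindings in place, and cyclic structures are not handled.

```python
class Var:
    """Logic variable; `ref` stays None until the variable is bound."""

    def __init__(self):
        self.ref = None


def deref(node):
    # Follow bindings until an unbound variable or a value is reached.
    while isinstance(node, Var) and node.ref is not None:
        node = node.ref
    return node


def unify(a, b):
    a, b = deref(a), deref(b)
    if a is b:
        return True
    if isinstance(a, Var):
        a.ref = b                      # bind the variable to the other node
        return True
    if isinstance(b, Var):
        b.ref = a
        return True
    if isinstance(a, tuple) and isinstance(b, tuple):
        return len(a) == len(b) and all(unify(p, q) for p, q in zip(a, b))
    return a == b                      # integers must be equal


def resolve(node):
    """Replace bound variables by their values, for inspection only."""
    node = deref(node)
    if isinstance(node, tuple):
        return tuple(resolve(f) for f in node)
    return node


x, y = Var(), Var()
left, right = (3, x, 2), (3, 5, y)
ok = unify(left, right)
```

After the call both tuples denote the same value, which matches the slide's result graph without VAR nodes.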

  10. Threads
      • Creation: spawn(f)
      • Synchronization: logic variables (x+y)
      • Fairness
      [Diagram: thread1 ... threadn evaluating e1 ... en over a shared store; threadn+1 running f()]
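
Synchronization through logic variables can be imitated with OS threads. The sketch below is Python and makes several assumptions: `LVar` is a hypothetical single-assignment cell built on `threading.Event`, `spawn` returns the thread handle so the example can join it (in L it returns unit), and Python threads are far heavier than the threads discussed in this talk.

```python
import threading


class LVar:
    """Hypothetical single-assignment variable: readers block until bound."""

    def __init__(self):
        self._bound = threading.Event()
        self._value = None

    def bind(self, value):
        self._value = value
        self._bound.set()              # wake up every waiting reader

    def wait(self):
        self._bound.wait()             # block while the variable is unbound
        return self._value


def spawn(f):
    thread = threading.Thread(target=f)
    thread.start()
    return thread


x, y, result = LVar(), LVar(), LVar()
# The spawned thread computes x+y and blocks until both are bound.
adder = spawn(lambda: result.bind(x.wait() + y.wait()))
x.bind(19)
y.bind(23)
adder.join()
total = result.wait()
```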

  11. Virtual Machine

  12. Model
      [Diagram: X-regs, stack, threads, heap, scheduler, code]
      Code:
        ...
        move Y3 X0
        move G5 X1
        apply G2 2
        return
        ...
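
To make the instruction sequence on this slide concrete, here is a toy emulator loop in Python. It is a sketch under assumptions, not the Mozart emulator: the tuple encoding of instructions, the `Closure` class, the treatment of Python callables as builtins, and the convention that results are returned in X0 are all invented for the example.

```python
import operator
from dataclasses import dataclass


@dataclass
class Closure:
    code: list          # list of instruction tuples
    G: list             # G registers: captured free variables
    n_locals: int = 0   # number of Y registers per activation


def run(entry, n_x=8):
    X = [None] * n_x                      # X registers exist once per machine
    stack = []                            # saved continuations: (code, pc, Y, G)
    code, pc, Y, G = entry.code, 0, [None] * entry.n_locals, entry.G

    def bank(kind):                       # select a register bank by name
        return {"X": X, "Y": Y, "G": G}[kind]

    while True:
        op, *args = code[pc]
        pc += 1
        if op == "move":                  # move SRC DST
            (src_kind, src), (dst_kind, dst) = args
            bank(dst_kind)[dst] = bank(src_kind)[src]
        elif op == "apply":               # apply REG ARITY
            (f_kind, f_index), arity = args
            f = bank(f_kind)[f_index]
            if callable(f):               # builtin: arguments in X0.., result in X0
                X[0] = f(*X[:arity])
            else:                         # closure: save continuation, enter callee
                stack.append((code, pc, Y, G))
                code, pc, Y, G = f.code, 0, [None] * f.n_locals, f.G
        elif op == "return":
            if not stack:
                return X[0]
            code, pc, Y, G = stack.pop()
        else:
            raise ValueError(f"unknown instruction {op!r}")


inner = Closure(
    code=[
        ("move", ("G", 0), ("Y", 3)),     # set up Y3 for the example
        ("move", ("Y", 3), ("X", 0)),     # move Y3 X0
        ("move", ("G", 5), ("X", 1)),     # move G5 X1
        ("apply", ("G", 2), 2),           # apply G2 2
        ("return",),
    ],
    G=[19, None, operator.add, None, None, 23],
    n_locals=4,
)
outer = Closure(code=[("apply", ("G", 0), 0), ("return",)], G=[inner])
```

`run(inner)` executes the slide's four instructions directly; `run(outer)` reaches them through a closure call, which exercises the continuation stack.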

  13. V-Addressing
      • Address toplevel variables via V-registers
      • Loader builds data on the heap; code contains direct references into the heap
      • Example:
          fun f(l,u) = map(fn(x)=>h(x)+g(x)+u, l)
      • h and g in V-registers → reduced memory consumption

  14. Dynamic Code Specialization
      [Diagram: the instruction apply V3 2 and the specialized forms specApply V3 2 and fastApply V3]
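
The slide does not spell out what specApply and fastApply do, so the following Python sketch only illustrates the general idea of an instruction that rewrites itself after its first execution. Everything here is an assumption: the tuple encoding, the rule that a generic `apply` is patched into a `fastApply` carrying the resolved callee, and the use of Python callables as functions.

```python
def execute(code, V, args):
    """Run a straight-line instruction list and return the last call result."""
    result = None
    for pc in range(len(code)):
        instr = code[pc]
        if instr[0] == "apply":
            # Generic form: look up the V register and check the callee.
            _, v_index, arity = instr
            target = V[v_index]
            if not callable(target):
                raise TypeError("apply: V register does not hold a function")
            # Patch the code in place so later runs skip lookup and check.
            code[pc] = ("fastApply", target, arity)
            result = target(*args[:arity])
        elif instr[0] == "fastApply":
            # Specialized form: callee was resolved on first execution.
            _, target, arity = instr
            result = target(*args[:arity])
    return result


code = [("apply", 3, 2)]
V = [None, None, None, lambda a, b: a + b]
first = execute(code, V, [19, 23])     # runs the generic instruction
patched_op = code[0][0]                # the instruction has been rewritten
second = execute(code, V, [1, 2])      # runs the specialized instruction
```

Such rewriting is only safe when the register contents cannot change afterwards, which fits registers that hold program-lifetime constants.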

  15. Unification in the Machine Model
      unify( , )
      [Diagram, before: two TUPLE graphs with nodes INT/3, VAR, INT/2 and INT/3, INT/5, VAR]
      [Diagram, after: TUPLE, TUPLE, INT/3, REF, INT/2, INT/3, INT/5, REF; each VAR has become a REF]

  16. Synchronization = Suspension + Wakeup
      [Diagram: a thread evaluating (x+y) ...; x: VAR with a suspension entry for the thread; y: VAR]

  17. Synchronization = Suspension + Wakeup
      • Wakeup: unify(x,23)
      [Diagram: x: REF to INT/23; y: VAR; the thread evaluating (x+y) ... is handed to the scheduler]
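
Suspension and wakeup can be sketched with a cooperative scheduler in Python. The design is an assumption for illustration: threads are generators that yield the variable they need, each `Var` keeps a list of suspended threads, and `bind` plays the role of unify(x,23) by moving those threads back to the run queue.

```python
from collections import deque


class Var:
    def __init__(self):
        self.value = None
        self.bound = False
        self.suspensions = []          # threads waiting for this variable


class Scheduler:
    def __init__(self):
        self.runnable = deque()

    def spawn(self, thread):
        self.runnable.append(thread)

    def bind(self, var, value):
        """Bind var and hand its suspended threads back to the scheduler."""
        var.value, var.bound = value, True
        self.runnable.extend(var.suspensions)
        var.suspensions.clear()

    def run(self):
        while self.runnable:
            thread = self.runnable.popleft()
            try:
                var = next(thread)     # run until the thread needs a variable
            except StopIteration:
                continue               # thread finished
            if var.bound:
                self.runnable.append(thread)
            else:
                var.suspensions.append(thread)   # suspension


def add(x, y, out):
    for v in (x, y):
        while not v.bound:
            yield v                    # suspend on the unbound variable
    out.append(x.value + y.value)


sched = Scheduler()
x, y, out = Var(), Var(), []
sched.spawn(add(x, y, out))
sched.run()                            # thread suspends on x
suspended_on_x = len(x.suspensions)
sched.bind(x, 23)                      # wakeup, as in unify(x,23)
sched.run()                            # thread resumes, then suspends on y
suspended_on_y = len(y.suspensions)
sched.bind(y, 19)
sched.run()                            # thread finishes
```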

  18. Implementation

  19. Emulator vs. Native Code
      Virtual machine implementation:
      • Emulator: portable, flexible
      • Native code: fast (?)

  20. Threads
      • X registers: once per machine, not per thread
      • Save live X registers upon preemption/suspension: pessimistic guess per function
      • Exact determination during GC by code interpretation

  21. Representation of the Graph: Naive
      [Diagram: register, heap; labels: type, INT 23]

  22. Representation of the Graph: Optimized
      [Diagram: register, heap; labels: INT 23, type, PTR]

  23. Representation of the Graph: Logic Variables
      [Diagram: register, heap; labels: INT 23, VAR, PTR, REF, PTR]

  24. Logic Variables: Optimized
      [Diagram: register, heap; labels: INT 23, type, PTR, VAR, REF; a second panel labelled WAM with register, REF, REF]

  25. Moving More Tags
      [Diagram: register, heap; labels: INT 23, type, PTR, REF, TPL]

  26. Evaluation

  27. Comparison with Emulators
      • Mozart is one of the fastest emulators
      • Competitive with OCaml and Java
      • Significantly faster than Moscow ML
      • Twice as fast as SICStus Prolog and Erlang

  28. Comparison with Native Code Systems
      • Few memory accesses (i.e. arithmetic): Mozart is easily one order of magnitude slower
      • Memory intensive (symbolic computation):
        • Difference only approx. a factor of 2-3
        • In single cases Mozart is faster than native ML or C++

  29. Threads
      • Threads in Mozart are very lightweight
      • Leading position both for creation and communication
      • Up to nearly 2 orders of magnitude faster than Java (creation)

  30. Summary
      • Extended a sublanguage of SML by logic variables and threads
      • Machine model
        • V-registers
        • Dynamic code specialization
        • Synchronization
      • Implementation
        • Efficient implementation of threads
        • Tagging scheme
      • Evaluation
        • Mozart is one of the fastest emulators
        • Compares well with native code systems on its target applications
        • Mozart has very lightweight threads

  31. Backup Slides for the Discussion

  32. Logic Variables vs. Functions
      • Runtime
                     fibonacci   takeuchi
          speedup    1.18        1.45
      • Memory (large scale applications)
        • Use approx. 18 % of heap memory
        • Approx. twice as much as objects
        • Approx. as much as records

  33. Memory Profile

  34. Mandelbrot (Floats)
      [Bar chart, relative values: 1.00, 2.65, 1/1.11, 1/1.58, 1/8.77, 1/11.23, 1.37, 1/39.24]

  35. Quicksort with Lists
      [Bar chart, relative values: 1.00, 2.43, 1.57, 5.19, 1/2.59, 1/3.69, 1/2.99, 1/3.46]

  36. Quicksort with Arrays
      [Bar chart, relative values: 1.00, 1.25, 1/1.48, 1/4.01, 1/7.92, 1/1.52, 1/20.86]

  37. Naive Reverse
      [Bar chart, relative values: 1.00, 1.81, 1.59, 1.51, 11.82, 1.04, 1/1.60, 2.05, 1.70]

  38. Threads: Creation

  39. Threads: fib(20)
      [Bar chart, relative values: 1.0, 1.09, 4.73, 708.06, 1/1.14]

  40. Tagging Scheme of Mozart
      • 4-bit tag, but only 2 bits lost for address space (= 1 GB): align structures on word boundaries
      • Lists, tuples: no need to unmask before type test
      • REF tag
        • no unmask before test necessary
        • no unmask before deref
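
The arithmetic behind "4-bit tag, only 2 bits lost" can be checked with a small Python model of a 32-bit word. This is a sketch under assumptions, not the Mozart source: the concrete tag values are hypothetical, and treating a word whose two low bits are zero as a REF (so that an aligned address is used as is) is one possible reading of the REF bullet.

```python
WORD_MASK = 0xFFFFFFFF      # model a 32-bit machine word
TAG_BITS = 4
TAG_MASK = (1 << TAG_BITS) - 1

TAG_TUPLE = 0b0010          # hypothetical tag values; low two bits nonzero
TAG_LIST = 0b0110


def make_tagged(addr, tag):
    # Word-aligned addresses have two zero low bits, so shifting by 2
    # (not 4) already frees 4 bits for the tag: 30 address bits = 1 GB.
    assert addr % 4 == 0 and 0 <= addr < (1 << 30)
    assert tag & 0b11 != 0
    return ((addr << 2) | tag) & WORD_MASK


def get_tag(word):
    return word & TAG_MASK


def get_addr(word):
    return (word >> TAG_BITS) << 2


def make_ref(addr):
    # Assumption: an aligned address is itself a valid REF word.
    assert addr % 4 == 0
    return addr


def is_ref(word):
    return (word & 0b11) == 0   # test without unmasking


highest = 0x3FFFFFFC            # largest aligned address below 1 GB
w = make_tagged(highest, TAG_TUPLE)
r = make_ref(0x1000)
```

Under this reading a REF can be dereferenced by using the word directly as an address, which would explain why no unmasking is needed before the test or the deref.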

  41. Threads
      [Diagram: a thread with its tasks; each task holds PC, L, and G; X registers; the PC points into the code]
      Code:
        move Y3 X0
        move G5 X1
        apply G2 2
        ...

  42. Emulators: Optimization Techniques
      • Threaded code
      • Instruction collapsing
      • Register access
      • Specialization
      • Example:
          move Y5 X3
          move Y6 X1
        34   11 (SPARC)
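
Instruction collapsing merges frequent instruction pairs into one instruction so that the emulator dispatches once instead of twice. The Python sketch below shows the idea as a peephole pass over a list of instruction tuples. The encoding and the opcode name `moveMove` are assumptions, and the pass ignores jump targets, which a real code loader would have to relocate.

```python
def collapse_moves(code):
    """Merge each pair of adjacent move instructions into one moveMove."""
    out = []
    i = 0
    while i < len(code):
        pair_ok = (
            i + 1 < len(code)
            and code[i][0] == "move"
            and code[i + 1][0] == "move"
        )
        if pair_ok:
            (_, src1, dst1), (_, src2, dst2) = code[i], code[i + 1]
            out.append(("moveMove", src1, dst1, src2, dst2))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out


program = [("move", "Y5", "X3"), ("move", "Y6", "X1"), ("return",)]
collapsed = collapse_moves(program)
```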

  43. Address Modes (Registers)
      name      liveness    notation   usage
      X         thread      Xi         temp. values, parameters
      local     fct-body    Li         local variables
      global    function    Gi         free variables
      virtual   program     Vi         constants

  44. Threads
      • Fairness: status register check on every function call (and return)
      [Diagram: status register with flags ... PRE GC IO]

  45. L
      e ::= x                                          variable
          | n                                          integer
          | (e1,...,en)                                tuple
          | fn(x1,...,xn) => e                         function
          | e0(e1,...,en)                              application
          | let val x = e in e end                     variable declaration
          | let con x in e end                         constructor declaration
          | case e of p1 => e1 | ... | pn => en        pattern matching
      Operators:
        lvar : () -> logic variable
        unify :  -> () unification
        spawn : (() -> ) -> () thread creation

  46. add Xi Xk Xn
      Tagged Xi = X[*(PC+1)];                    2        0 (2)
      DEREF(Xi);                                 2        0
      if (isInt(getTag(Xi))) {                   1+2      0
        Tagged Xk = X[*(PC+2)];                  2        2
        DEREF(Xk);                               2        0
        if (isInt(getTag(Xk))) {                 1+2      0
          int aux = intValue(Xi)+intValue(Xk);   1+1+1    2
          XPC(3) = oz_int(aux);                  3+2+2    0 (2)    ovflw+shifttag+store
          DISPATCH(4);                           3        3
        }
      }
      ---------------
                                                 27       7 (11)
      no derefs        23
      no type tests    17
      overflow          6
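
The control flow of this instruction (dereference, type test, add, overflow check, store) can be restated in Python. This is a hedged sketch only: `Ref` is a hypothetical stand-in for a bound variable, the 28-bit small-integer range is an assumption derived from a 4-bit tag in a 32-bit word, and returning False stands in for whatever slow path the real emulator takes.

```python
class Ref:
    """Hypothetical bound variable that points at another node."""

    def __init__(self, target):
        self.target = target


# Assumed small-integer range: 32-bit word minus a 4-bit tag.
SMALL_MIN, SMALL_MAX = -(1 << 27), (1 << 27) - 1


def deref(value):
    while isinstance(value, Ref):
        value = value.target
    return value


def add(X, i, k, n):
    """Emulate add Xi Xk Xn; True on the fast path, False otherwise."""
    xi = deref(X[i])                       # DEREF(Xi)
    if isinstance(xi, int):                # isInt(getTag(Xi))
        xk = deref(X[k])                   # DEREF(Xk)
        if isinstance(xk, int):            # isInt(getTag(Xk))
            aux = xi + xk
            if SMALL_MIN <= aux <= SMALL_MAX:   # overflow check
                X[n] = aux                 # store the result
                return True
    return False


X = [Ref(Ref(19)), 23, None]
fast = add(X, 0, 1, 2)
not_int = add([(1, 2), 3, None], 0, 1, 2)
overflow = add([SMALL_MAX, 1, None], 0, 1, 2)
```

The slide's cost breakdown maps onto these steps: dropping the two derefs or the two type tests removes the corresponding lines from the fast path.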

  47. Java: JIT vs. Emulator
      benchmark            speedup
      quicksort (array)    18.8
      fib (int)            14.2
      fib (float)          4.9
      queens               6.1
      nrev                 2.0
      quicksort (list)     2.3
      fib (thread)         1.1
      mandelbrot           5.4
      deriv (virtual)      1.9
