
Koichi Kodama (Tokyo Institute of Technology) Kohei Suenaga (University of Tokyo)


Presentation Transcript


  1. Translation of Tree-processing Programs into Stream-processing Programs based on Ordered Linear Type
     Koichi Kodama (Tokyo Institute of Technology)
     Kohei Suenaga (University of Tokyo)
     Naoki Kobayashi (Tohoku University)

  2. Two main methods of XML processing
     • Tree-processing
       • Views XML documents as tree structures
       • Manipulates the tree structure of the input data
       • Trees are kept in memory in many implementations
       • e.g., DOM API, XDuce, CDuce
     • Stream-processing
       • Views XML documents as streams of tokens
       • Reads/writes tokens from/to streams
       • e.g., SAX
     The Second Asian Symposium on Programming Languages and Systems

  3. Tree-processing
     [Figure: XML documents on disk or the network are held as trees in memory; the label "Programmers write here" marks the level at which programs are written. Example documents: <node> <leaf>2</leaf> <leaf>3</leaf></node> and <node> <leaf>3</leaf> <leaf>4</leaf></node>]

  4. Stream-processing
     [Figure: labels "memory", "disk, network", and "Programmers write here". Example documents: <node> <leaf>2</leaf> <leaf>3</leaf></node> and <node> <leaf>3</leaf> <leaf>4</leaf></node>]

  5. Example of programs in each style
     • Incrementing the value of each leaf
     • Tree-processing
         fix f. λt. case t of
             leaf x → leaf (x + 1)
           | node x1 x2 → node (f x1) (f x2)
     • Stream-processing
         fix f. λt. case read() of
             leaf → write(leaf); write(read() + 1)
           | node → write(node); f (); f ()
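The two styles on this slide can be mirrored in Python as a rough sketch. The tuple encoding of trees, the token list, and the names `inc_tree`, `inc_stream`, and `run_stream` are assumptions made for illustration; they do not come from the slides.

```python
# Assumed encoding (not from the slides):
#   tree   = ("leaf", n) or ("node", left, right)
#   stream = flat list of tokens such as ["node", "leaf", 2, "leaf", 3]

def inc_tree(t):
    """Tree-processing style: pattern-match on an in-memory tree."""
    if t[0] == "leaf":
        return ("leaf", t[1] + 1)
    _, left, right = t
    return ("node", inc_tree(left), inc_tree(right))

def inc_stream(read, write):
    """Stream-processing style: read one token at a time and write as we go."""
    tag = read()
    if tag == "leaf":
        write("leaf")
        write(read() + 1)
    else:  # tag == "node"
        write("node")
        inc_stream(read, write)  # left subtree
        inc_stream(read, write)  # right subtree

def run_stream(tokens):
    """Hypothetical helper: run inc_stream over a token list, collect the output."""
    it = iter(tokens)
    out = []
    inc_stream(lambda: next(it), out.append)
    return out
```

Note that `inc_stream` never holds more than one token at a time, whereas `inc_tree` needs the whole input tree in memory.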

  6. Comparison of two styles

  7. Our goal
     • Taking the best of both worlds
       • Readability and writability of tree-processing
       • High memory efficiency of stream-processing
     • Approach
       • Automatic translation of tree-processing programs into stream-processing programs

  8. Key observation (1/2)
     • In order for stream-processing to be effective, a program should access an input tree from left to right, in depth-first order.
     • The following program does not: it processes x2 before x1.
         fix f. λt. case t of
             leaf x → leaf (x + 1)
           | node x1 x2 → node (f x2) (f x1)

  9. Key observation (2/2)
     • If a program accesses an input tree in that order, then there is a simple, structure-preserving translation into an equivalent stream-processing program.
     • Tree-processing program:
         fix f. λt. case t of
             leaf x → leaf (x + 1)
           | node x1 x2 → node (f x1) (f x2)
     • Equivalent stream-processing program:
         fix f. λt. case read() of
             leaf → write(leaf); write(read() + 1)
           | node → write(node); f (); f ()

  10. Our solution
      • Use the idea of ordered linear types [Polakow 2000]
      • Guarantee “correct” access order by assigning ordered linear types to input trees

  11. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  12. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  13. Languages
      • Source language
        • Call-by-value λ-calculus + primitives for binary tree processing
      • Target language
        • Call-by-value λ-calculus + primitives for stream processing

  14. Source language
      M (terms) ::= i (integer)
        | M1 + M2 (addition)
        | x (variable)
        | λx.M (abstraction)
        | M1 M2 (application)
        | fix f. M (recursive function)
        | leaf M (leaf with an integer)
        | node M1 M2 (branch)
        | (case M of leaf x → M1 | node x1 x2 → M2) (case analysis)

  15. Target language
      e (terms) ::= i (integer)
        | e1 + e2 (addition)
        | () (unit)
        | x (variable)
        | λx.e (abstraction)
        | e1 e2 (application)
        | fix f. e (recursive function)
        | leaf | node (tag)
        | read () | write e (stream manipulation)
        | (case e of leaf → e1 | node → e2) (case analysis)

  16. Representation of trees in each language
      [Figure: a node with two children, leaf 1 and leaf 2]
      • Source language
        • node (leaf 1) (leaf 2)
      • Target language
        • node; leaf; 1; leaf; 2
      We do not consider closing tags because we only focus on binary trees for now
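The correspondence between the two representations can be sketched as a preorder flattening. This is a minimal illustration assuming trees are encoded as Python tuples and streams as token lists; the name `to_stream` is hypothetical.

```python
# Assumed encoding (not from the slides):
#   tree = ("leaf", n) or ("node", left, right)

def to_stream(t):
    """Flatten a binary tree into its token stream in left-to-right, depth-first order."""
    if t[0] == "leaf":
        return ["leaf", t[1]]
    _, left, right = t
    # No closing token is emitted: a "node" tag is always followed by exactly two subtrees.
    return ["node"] + to_stream(left) + to_stream(right)
```

For the tree on this slide, `to_stream(("node", ("leaf", 1), ("leaf", 2)))` yields the token sequence node; leaf; 1; leaf; 2.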

  17. Example of programs
      • Incrementing the value of each leaf
      • Source language
          fix f. λt. case t of
              leaf x → leaf (x + 1)
            | node x1 x2 → node (f x1) (f x2)
      • Target language
          fix f. λt. case read() of
              leaf → write(leaf); write(read() + 1)
            | node → write(node); f (); f ()

  18. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  19. Type system
      • Utilizes ordered linear types
      • Properties of well-typed programs:
        • access each node of the input tree exactly once, in left-to-right, depth-first order
        • do not construct trees in memory

  20. Types
      τ (types) ::= Int (integer)
        | τ1 → τ2 (function)
        | InTree (input tree)
        | OutTree (output tree)

  21. Type judgment
      Γ | Δ ⊢ M : τ
      • (Non-ordered) type environment: Γ
        • { x1:τ1, x2:τ2, ..., xn:τn } (τi ∉ {InTree, OutTree})
        • Set of bindings
      • Ordered linear type environment: Δ
        • x1:InTree, x2:InTree, ..., xn:InTree
        • Sequence of bindings
        • Represents that each of x1, x2, ..., xn is accessed exactly once, in this order

  22. Example of type judgment
      Γ = f : InTree → OutTree
      Δ = x1 : InTree, x2 : InTree
      • Γ | Δ ⊢ node (f x1) (f x2) : OutTree (derivable: x1 and x2 are each used once, in order)
      • Γ | Δ ⊢ node (f x2) (f x1) : OutTree (not derivable: x2 is used before x1)
      • Γ | Δ ⊢ node (f x1) (f x1) : OutTree (not derivable: x1 is used twice and x2 is never used)

  23. Typing rules

      Γ | Δ1 ⊢ M1 : OutTree    Γ | Δ2 ⊢ M2 : OutTree
      ------------------------------------------------ (T-NODE)
      Γ | Δ1, Δ2 ⊢ node M1 M2 : OutTree

      Γ | Δ1 ⊢ M : InTree    Γ, x:Int | Δ2 ⊢ M1 : τ    Γ | x1:InTree, x2:InTree, Δ2 ⊢ M2 : τ
      ----------------------------------------------------------------------------------------- (T-CASE)
      Γ | Δ1, Δ2 ⊢ case M of leaf x → M1 | node x1 x2 → M2 : τ
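The way T-NODE and T-CASE split the ordered environment can be approximated by a checker that threads Δ through a term and lets each subterm consume a prefix of it. The sketch below is a simplification under stated assumptions: it checks only the ordering and linearity discipline (not full types), it covers a fragment without λ or fix, variable names are assumed to be distinct, and the tuple encoding and the names `consume` and `well_ordered` are hypothetical.

```python
# Assumed term encoding (not from the slides):
#   ("int", i) | ("var", x) | ("add", M1, M2) | ("app", M1, M2)
#   ("leaf", M) | ("node", M1, M2) | ("case", M, x, M1, x1, x2, M2)

def consume(m, delta, ordered):
    """Return what is left of delta after m uses a prefix of it, or None on a violation.

    delta   : list of InTree variables that must still be used, in order
    ordered : set of all variable names that have type InTree
    """
    kind = m[0]
    if kind == "int":
        return delta
    if kind == "var":
        name = m[1]
        if name not in ordered:
            return delta          # non-ordered variable: no restriction
        if delta and delta[0] == name:
            return delta[1:]      # next variable in order: consume it
        return None               # out of order, or already used
    if kind in ("add", "app", "node"):
        rest = consume(m[1], delta, ordered)   # left subterm takes its share first
        return None if rest is None else consume(m[2], rest, ordered)
    if kind == "leaf":
        return consume(m[1], delta, ordered)
    if kind == "case":
        _, scrutinee, x, m1, x1, x2, m2 = m
        rest = consume(scrutinee, delta, ordered)
        if rest is None:
            return None
        after_leaf = consume(m1, rest, ordered)
        # In the node branch, x1 and x2 are placed in front of the remaining variables.
        after_node = consume(m2, [x1, x2] + rest, ordered | {x1, x2})
        if after_leaf is None or after_leaf != after_node:
            return None
        return after_leaf
    raise ValueError(f"unsupported term: {kind}")

def well_ordered(m, delta):
    """True if m uses every variable of delta exactly once, in order."""
    return consume(m, list(delta), set(delta)) == []
```

With `f` treated as a non-ordered variable, this checker accepts node (f x1) (f x2) under Δ = x1, x2 and rejects both node (f x2) (f x1) and node (f x1) (f x1).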

  24. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  25. Translation algorithm (1/2)
      • A(i) = i
      • A(x) = x
      • A(M1 + M2) = A(M1) + A(M2)
      • A(λx.M) = λx. A(M)
      • A(M1 M2) = A(M1) A(M2)
      • A(fix f.M) = fix f. A(M)

  26. Translation algorithm (2/2)
      • A(leaf M) = write(leaf); write(A(M))
      • A(node M1 M2) = write(node); A(M1); A(M2)
      • A(case M of leaf x → M1 | node x1 x2 → M2) =
          (case A(M); read() of
              leaf → let x = read() in A(M1)
            | node → [()/x1, ()/x2]A(M2))
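The rules for A on these two slides can be transcribed almost line for line. The following is a sketch assuming terms are encoded as Python tuples; the constructors "seq", "let", and "tcase" and the function names `translate` and `subst_unit` are illustrative choices, not notation from the slides.

```python
# Assumed encodings (not from the slides):
#   source: ("int", i) | ("var", x) | ("add", M1, M2) | ("lam", x, M) | ("app", M1, M2)
#           | ("fix", f, M) | ("leaf", M) | ("node", M1, M2) | ("case", M, x, M1, x1, x2, M2)
#   target: the shared forms plus ("unit",) | ("tag", name) | ("read",) | ("write", e)
#           | ("seq", e1, e2) | ("let", x, e1, e2) | ("tcase", e, e_leaf, e_node)

def subst_unit(e, names):
    """[()/x] for every x in names: replace free occurrences by unit."""
    kind = e[0]
    if kind == "var":
        return ("unit",) if e[1] in names else e
    if kind in ("lam", "fix"):
        return (kind, e[1], subst_unit(e[2], names - {e[1]}))
    if kind == "let":
        return ("let", e[1], subst_unit(e[2], names), subst_unit(e[3], names - {e[1]}))
    # Generic case: recurse into sub-terms, keep atoms (ints, tag names) unchanged.
    return (kind,) + tuple(subst_unit(c, names) if isinstance(c, tuple) else c for c in e[1:])

def translate(m):
    """A(M): map a source term to a target term."""
    kind = m[0]
    if kind in ("int", "var"):
        return m
    if kind in ("add", "app"):
        return (kind, translate(m[1]), translate(m[2]))
    if kind in ("lam", "fix"):
        return (kind, m[1], translate(m[2]))
    if kind == "leaf":
        return ("seq", ("write", ("tag", "leaf")), ("write", translate(m[1])))
    if kind == "node":
        return ("seq", ("write", ("tag", "node")), ("seq", translate(m[1]), translate(m[2])))
    if kind == "case":
        _, scrutinee, x, m1, x1, x2, m2 = m
        return ("tcase",
                ("seq", translate(scrutinee), ("read",)),
                ("let", x, ("read",), translate(m1)),
                subst_unit(translate(m2), {x1, x2}))
    raise ValueError(f"unsupported term: {kind}")
```

Applied to the increment program, this produces `case t; read() of ...` with `f ()` in place of `f x1` and `f x2`, which matches the stream-processing program shown earlier up to the leftover `t;` and the `let` binding for x.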

  27. Correctness of the translation
      If φ | x : InTree ⊢ M : OutTree, then
        (M, {x = V}) →* W   iff   (A(M), <V>, φ) →* ((), φ, <W>)
      In a target configuration, the second component is the input stream and the third is the output stream; <V> and <W> are the stream representations of the trees V and W, and φ is empty.

  28. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  29. Extensions
      • Introduce buffering of trees
        • Introduce primitives for buffered tree construction and destruction
        • Extend typing rules and translation algorithm for those primitives

  30. Source language
      M (terms) ::= ...
        | mleaf M (buffered leaf)
        | mnode M1 M2 (buffered branch)
        | (mcase M of mleaf x → M1 | mnode x1 x2 → M2) (case analysis for buffered tree)

  31. Target language
      e (terms) ::= ...
        | mleaf e (buffered leaf)
        | mnode e1 e2 (buffered branch)
        | (mcase e of mleaf x → e1 | mnode x1 x2 → e2) (case analysis for buffered tree)

  32. Types
      τ (types) ::= ...
        | MTree (type of buffered trees; kept in the non-ordered environment)

  33. Example of type judgment
      Γ = f : MTree → OutTree
      Γ’ = g : InTree → OutTree
      • Γ, x : MTree | φ ⊢ node (f x) (f x) : OutTree (derivable: a buffered tree lives in the non-ordered environment, so x may be used more than once)
      • Γ’ | x : InTree ⊢ node (g x) (g x) : OutTree (not derivable: the input tree x is used twice)

  34. Typing rules

      Γ | Δ1 ⊢ M : MTree    Γ, x:Int | Δ2 ⊢ M1 : τ    Γ, x1:MTree, x2:MTree | Δ2 ⊢ M2 : τ
      -------------------------------------------------------------------------------------- (T-MCASE)
      Γ | Δ1, Δ2 ⊢ mcase M of mleaf x → M1 | mnode x1 x2 → M2 : τ

  35. Translation algorithm
      • A(mleaf M) = mleaf A(M)
      • A(mnode M1 M2) = mnode A(M1) A(M2)
      • A(mcase M of mleaf x → M1 | mnode x1 x2 → M2) =
          (mcase A(M) of
              mleaf x → A(M1)
            | mnode x1 x2 → A(M2))

  36. Example of programs
      • fix s2m. λt. case t of
            leaf x → mleaf x
          | node x1 x2 → mnode (s2m x1) (s2m x2)
      • fix m2s. λt. mcase t of
            mleaf x → leaf x
          | mnode x1 x2 → node (m2s x1) (m2s x2)
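To see why buffering helps, here is a sketch of stream-level counterparts of s2m and m2s, together with a hypothetical use: swapping the two children of the root, which is impossible in a single left-to-right pass unless the left subtree is buffered. The tuple encoding, the `read`/`write` callbacks, and the helpers `copy` and `swap_children` are assumptions made for this illustration.

```python
# Assumed encoding (not from the slides):
#   buffered tree = ("mleaf", n) or ("mnode", left, right)
#   read()  returns the next input token; write(tok) appends a token to the output

def s2m(read):
    """Stream to memory: consume one tree from the input stream and buffer it."""
    tag = read()
    if tag == "leaf":
        return ("mleaf", read())
    left = s2m(read)
    right = s2m(read)
    return ("mnode", left, right)

def m2s(t, write):
    """Memory to stream: write a buffered tree to the output stream."""
    if t[0] == "mleaf":
        write("leaf")
        write(t[1])
    else:
        write("node")
        m2s(t[1], write)
        m2s(t[2], write)

def copy(read, write):
    """Pass one tree from input to output without buffering it."""
    tag = read()
    write(tag)
    if tag == "leaf":
        write(read())
    else:
        copy(read, write)
        copy(read, write)

def swap_children(read, write):
    """Hypothetical example: swap the root's two children, buffering only the left one."""
    tag = read()
    write(tag)
    if tag == "leaf":
        write(read())
        return
    left = s2m(read)    # the left subtree arrives first but must be written last
    copy(read, write)   # the right subtree can be streamed straight through
    m2s(left, write)
```

Only the left subtree is ever held in memory, so the cost of buffering is limited to the part of the input that is used out of order.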

  37. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  38. Related work
      • Attribute grammar based approach [Nakano et al. 2004]
        • Translates tree-processing attribute grammars into stream-processing attribute grammars
        • Pros: programmers need not be aware of which parts of the stream must be buffered
        • Cons: programmers cannot specify the evaluation order
          • Introducing side effects may be problematic

  39. Related work
      • Deforestation [Wadler 1988]
        • Removes intermediate trees from programs
        • Puts syntactic restrictions (treelessness) on programs
          • Variables have to occur only once
          • Only variables can be passed to functions
        • Does not require order restrictions
          • Translated programs may not be suitable for stream-processing

  40. Related work
      • Ordered linear logic [Polakow 2000]
        • Formalizes ordered linear logic
      • Type system for memory allocation and data layout [Petersen et al. 2003]
        • Uses ordered linear types to express spatial order in memory

  41. Outline
      • Syntax of languages
      • Type system
      • Translation algorithm
      • Extension
      • Related work
      • Conclusion

  42. Conclusion
      • Proposed a method for translating tree-processing programs into stream-processing programs using ordered linear types

  43. Future work
      • Providing automatic insertion of buffering primitives
      • Extension to general XML documents
        • Dealing with rose trees
        • Considering closing tags

  44. Fin
