Translation of Tree-processing Programs into Stream-processing Programs based on Ordered Linear Type Koichi Kodama (Tokyo Institute of Technology) Kohei Suenaga (University of Tokyo) Naoki Kobayashi (Tohoku University)
Two main methods of XML processing
• Tree-processing
  • Views XML documents as tree structures
  • Manipulates the tree structure of the input data
  • Trees are kept in memory in many implementations
  • e.g., DOM API, XDuce, CDuce
• Stream-processing
  • Views XML documents as streams of tokens
  • Reads/writes tokens from/to streams
  • e.g., SAX
The Second Asian Symposium on Programming Languages and Systems
Tree-processing
[Figure: the input document <node> <leaf>2</leaf> <leaf>3</leaf> </node> is read from disk or network and held in memory as a tree (a node with leaves 2 and 3). Programmers write the transformation on the in-memory trees; the result (a node with leaves 3 and 4) is written back as <node> <leaf>3</leaf> <leaf>4</leaf> </node>.]
Stream-processing
[Figure: programmers write the code that reads tokens from disk or network and writes tokens back, with no tree built in memory. Input: <node> <leaf>2</leaf> <leaf>3</leaf> </node>; output: <node> <leaf>3</leaf> <leaf>4</leaf> </node>.]
Example of programs in each style
• Incrementing the value of each leaf
• Tree-processing
    fix f. λt. case t of
        leaf x → leaf (x + 1)
      | node x1 x2 → node (f x1) (f x2)
• Stream-processing
    fix f. λt. case read() of
        leaf → write(leaf); write(read() + 1)
      | node → write(node); f (); f ()
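The two styles above can be sketched in Python. This is only an illustration under assumptions that are not in the slides: trees are encoded as tuples `("leaf", n)` and `("node", left, right)`, the input stream is a Python iterator of tags and integers, and the output stream is a list. The names `inc_tree` and `inc_stream` are hypothetical.

```python
from typing import Iterator, List, Union

# Hypothetical encodings (not from the slides): a tree is ("leaf", n) or
# ("node", left, right); a stream is a flat sequence of tags and integers.
Token = Union[str, int]


def inc_tree(t: tuple) -> tuple:
    """Tree-processing style: pattern-match on an in-memory tree."""
    if t[0] == "leaf":
        return ("leaf", t[1] + 1)
    _, left, right = t
    return ("node", inc_tree(left), inc_tree(right))


def inc_stream(inp: Iterator[Token], out: List[Token]) -> None:
    """Stream-processing style: read tokens, write tokens, build no tree."""
    tag = next(inp)                  # read()
    if tag == "leaf":
        out.append("leaf")           # write(leaf)
        out.append(next(inp) + 1)    # write(read() + 1)
    else:
        out.append("node")           # write(node)
        inc_stream(inp, out)         # f ()
        inc_stream(inp, out)         # f ()


tree_result = inc_tree(("node", ("leaf", 2), ("leaf", 3)))
stream_result: List[Token] = []
inc_stream(iter(["node", "leaf", 2, "leaf", 3]), stream_result)
```

The streaming version never holds more than the current token, but its control flow mirrors the tree version branch for branch, which is the correspondence the translation relies on.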
Comparison of two styles
Our goal
• Taking the best of both worlds
  • Readability and writability of tree-processing
  • High memory efficiency of stream-processing
• Approach
  • Automatic translation of tree-processing programs into stream-processing programs
Key observation (1/2)
• For stream-processing to be effective, a program should access the input tree from left to right, in depth-first order.
• Counterexample: the following program processes x2 before x1, so it does not follow that order.
    fix f. λt. case t of
        leaf x → leaf (x + 1)
      | node x1 x2 → node (f x2) (f x1)
Key observation (2/2)
• If a program accesses the input tree in that order, then there is a simple, structure-preserving translation into an equivalent stream-processing program.
    fix f. λt. case t of
        leaf x → leaf (x + 1)
      | node x1 x2 → node (f x1) (f x2)
  is translated into
    fix f. λt. case read() of
        leaf → write(leaf); write(read() + 1)
      | node → write(node); f (); f ()
Our solution
• Use the idea of ordered linear types [Polakow 2000]
• Guarantee the “correct” access order by assigning ordered linear types to input trees
Outline
• Syntax of languages
• Type system
• Translation algorithm
• Extension
• Related work
• Conclusion
Languages
• Source language
  • Call-by-value λ-calculus + primitives for binary tree processing
• Target language
  • Call-by-value λ-calculus + primitives for stream processing
Source language
M (terms) ::= i                (integer)
            | M1 + M2          (addition)
            | x                (variable)
            | λx.M             (abstraction)
            | M1 M2            (application)
            | fix f. M         (recursive function)
            | leaf M           (leaf with an integer)
            | node M1 M2       (branch)
            | (case M of leaf x → M1 | node x1 x2 → M2)    (case analysis)
Target language
e (terms) ::= i                (integer)
            | e1 + e2          (addition)
            | ()               (unit)
            | x                (variable)
            | λx.e             (abstraction)
            | e1 e2            (application)
            | fix f. e         (recursive function)
            | leaf | node      (tag)
            | read () | write e    (stream manipulation)
            | (case e of leaf → e1 | node → e2)    (case analysis)
Representation of trees in each language
[Figure: a tree whose root node has two children, leaf 1 and leaf 2]
• Source language
  • node (leaf 1) (leaf 2)
• Target language
  • node; leaf; 1; leaf; 2
We do not consider closing tags because we focus only on binary trees for now.
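Why closing tags can be omitted for binary trees can be seen in a short sketch. It assumes a hypothetical tuple encoding of trees and a Python list for the token stream; `to_stream` and `subtree_end` are illustrative names, not part of the talk.

```python
from typing import List, Union

Token = Union[str, int]


def to_stream(t: tuple) -> List[Token]:
    """Flatten a tree in left-to-right, depth-first order, without closing tags."""
    if t[0] == "leaf":
        return ["leaf", t[1]]
    return ["node"] + to_stream(t[1]) + to_stream(t[2])


def subtree_end(tokens: List[Token], start: int = 0) -> int:
    """Index just past the subtree that begins at tokens[start].

    Because every node has exactly two children, the extent of a subtree is
    determined by its tags alone.
    """
    if tokens[start] == "leaf":
        return start + 2                     # the tag plus its integer
    end_of_left = subtree_end(tokens, start + 1)
    return subtree_end(tokens, end_of_left)  # the right child follows the left


encoded = to_stream(("node", ("leaf", 1), ("leaf", 2)))
end = subtree_end(encoded)
```

With trees of arbitrary arity the second function could not know how many children to skip, which is where closing tags become necessary.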
Example of programs
• Incrementing the value of each leaf
• Source language
    fix f. λt. case t of
        leaf x → leaf (x + 1)
      | node x1 x2 → node (f x1) (f x2)
• Target language
    fix f. λt. case read() of
        leaf → write(leaf); write(read() + 1)
      | node → write(node); f (); f ()
Type system
• Utilizes ordered linear types
• Properties of well-typed programs:
  • They access each node of the input tree exactly once, in left-to-right, depth-first order
  • They do not construct trees in memory
Types
τ (types) ::= Int        (integer)
            | τ1 → τ2    (function)
            | InTree     (input tree)
            | OutTree    (output tree)
Type judgment: Γ | Δ ⊢ M : τ
• (Non-ordered) type environment: Γ
  • { x1:τ1, x2:τ2, ..., xn:τn } (τi ∉ {InTree, OutTree})
  • Set of bindings
• Ordered linear type environment: Δ
  • x1:InTree, x2:InTree, ..., xn:InTree
  • Sequence of bindings
  • Represents that each of x1, x2, ..., xn is accessed exactly once, in this order
Example of type judgment
Γ = f : InTree → OutTree
Δ = x1 : InTree, x2 : InTree
• Γ | Δ ⊢ node (f x1) (f x2) : OutTree    (well-typed: x1, then x2)
• Γ | Δ ⊢ node (f x2) (f x1) : OutTree    (ill-typed: x2 is accessed before x1)
• Γ | Δ ⊢ node (f x1) (f x1) : OutTree    (ill-typed: x1 is accessed twice and x2 never)
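The ordering condition behind these three judgments can be approximated with a small check. This is a deliberately simplified sketch, not the type system itself: it covers only variables, application and `node`, and it uses a hypothetical tuple encoding of terms. The names `uses`, `respects_order` and `call` are assumptions introduced for illustration.

```python
from typing import List

# Hypothetical term encoding: ("var", name), ("app", fun, arg), ("node", m1, m2).


def uses(term: tuple) -> List[str]:
    """Variable occurrences in left-to-right (evaluation) order."""
    if term[0] == "var":
        return [term[1]]
    # "app" and "node" both evaluate their two subterms from left to right.
    return uses(term[1]) + uses(term[2])


def respects_order(term: tuple, delta: List[str]) -> bool:
    """True if the variables of delta are each used exactly once, in delta's order."""
    ordered_uses = [x for x in uses(term) if x in delta]
    return ordered_uses == delta


def call(f: str, x: str) -> tuple:
    """Build the application term `f x`."""
    return ("app", ("var", f), ("var", x))


delta = ["x1", "x2"]
in_order = respects_order(("node", call("f", "x1"), call("f", "x2")), delta)
swapped = respects_order(("node", call("f", "x2"), call("f", "x1")), delta)
repeated = respects_order(("node", call("f", "x1"), call("f", "x1")), delta)
```

Only the first term passes; `f` is ignored by the check because it lives in the non-ordered environment Γ.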
Typing rules

Γ | Δ1 ⊢ M1 : OutTree    Γ | Δ2 ⊢ M2 : OutTree
------------------------------------------------ (T-NODE)
Γ | Δ1, Δ2 ⊢ node M1 M2 : OutTree

Γ | Δ1 ⊢ M : InTree    Γ, x:Int | Δ2 ⊢ M1 : τ    Γ | x1:InTree, x2:InTree, Δ2 ⊢ M2 : τ
---------------------------------------------------------------------------------------- (T-CASE)
Γ | Δ1, Δ2 ⊢ case M of leaf x → M1 | node x1 x2 → M2 : τ
Translation algorithm (1/2)
• A(i) = i
• A(x) = x
• A(M1 + M2) = A(M1) + A(M2)
• A(λx.M) = λx. A(M)
• A(M1 M2) = A(M1) A(M2)
• A(fix f.M) = fix f. A(M)
Translation algorithm (2/2)
• A(leaf M) = write(leaf); write(A(M))
• A(node M1 M2) = write(node); A(M1); A(M2)
• A(case M of leaf x → M1 | node x1 x2 → M2) =
    (case A(M); read() of
        leaf → let x = read() in A(M1)
      | node → [()/x1, ()/x2]A(M2))
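The rules for A can be transcribed almost line by line. The following is a minimal sketch under stated assumptions: source terms use a hypothetical tuple encoding, the target program is emitted as a plain string with no parenthesization, and the substitution [()/x1, ()/x2] is realized by carrying a set of variables that must print as `()`. The name `translate` is illustrative.

```python
from typing import FrozenSet

# Hypothetical source encoding: ("int", i), ("var", x), ("add", m1, m2),
# ("lam", x, m), ("app", m1, m2), ("fix", f, m), ("leaf", m), ("node", m1, m2),
# ("case", scrutinee, x, leaf_body, x1, x2, node_body).


def translate(m: tuple, units: FrozenSet[str] = frozenset()) -> str:
    """Apply A to m; variables in `units` are replaced by ()."""
    kind = m[0]
    if kind == "int":
        return str(m[1])
    if kind == "var":
        return "()" if m[1] in units else m[1]
    if kind == "add":
        return f"{translate(m[1], units)} + {translate(m[2], units)}"
    if kind == "lam":                      # a binder shadows any pending substitution
        return f"λ{m[1]}. {translate(m[2], units - {m[1]})}"
    if kind == "fix":
        return f"fix {m[1]}. {translate(m[2], units - {m[1]})}"
    if kind == "app":
        return f"{translate(m[1], units)} {translate(m[2], units)}"
    if kind == "leaf":
        return f"write(leaf); write({translate(m[1], units)})"
    if kind == "node":
        return f"write(node); {translate(m[1], units)}; {translate(m[2], units)}"
    if kind == "case":
        _, scrutinee, x, leaf_body, x1, x2, node_body = m
        leaf_branch = translate(leaf_body, units - {x})
        node_branch = translate(node_body, units | {x1, x2})  # [()/x1, ()/x2]
        return (f"case {translate(scrutinee, units)}; read() of "
                f"leaf → let {x} = read() in {leaf_branch} "
                f"| node → {node_branch}")
    raise ValueError(f"unknown term: {kind}")


increment = ("fix", "f", ("lam", "t", ("case", ("var", "t"), "x",
    ("leaf", ("add", ("var", "x"), ("int", 1))),
    "x1", "x2",
    ("node", ("app", ("var", "f"), ("var", "x1")),
             ("app", ("var", "f"), ("var", "x2"))))))
translated = translate(increment)
```

The result is the unsimplified output of the rules, `case t; read() of ... let x = read() in ...`, rather than the hand-simplified stream program shown on the example slides.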
Correctness of the translation
If φ | x : InTree ⊢ M : OutTree, then
    (M, {x = V}) →* W   iff   (A(M), <V>, ∅) →* ((), ∅, <W>)
In a target configuration, the second component is the input stream and the third is the output stream.
Extensions
• Introduce buffering of trees
  • Introduce primitives for buffered tree construction and destruction
  • Extend the typing rules and the translation algorithm for those primitives
Source language
M (terms) ::= ...
            | mleaf M          (buffered leaf)
            | mnode M1 M2      (buffered branch)
            | (mcase M of mleaf x → M1 | mnode x1 x2 → M2)    (case analysis for buffered tree)
Target language
e (terms) ::= ...
            | mleaf e          (buffered leaf)
            | mnode e1 e2      (buffered branch)
            | (mcase e of mleaf x → e1 | mnode x1 x2 → e2)    (case analysis for buffered tree)
Types
τ (types) ::= ...
            | MTree    (type of buffered trees; kept in the non-ordered environment)
Example of type judgment
Γ = f : MTree → OutTree
Γ’ = g : InTree → OutTree
• Γ, x : MTree | φ ⊢ node (f x) (f x) : OutTree    (well-typed: a buffered tree may be used more than once)
• Γ’ | x : InTree ⊢ node (g x) (g x) : OutTree    (ill-typed: the input tree x is accessed twice)
Typing rules

Γ | Δ1 ⊢ M : MTree    Γ, x:Int | Δ2 ⊢ M1 : τ    Γ, x1:MTree, x2:MTree | Δ2 ⊢ M2 : τ
------------------------------------------------------------------------------------- (T-MCASE)
Γ | Δ1, Δ2 ⊢ mcase M of mleaf x → M1 | mnode x1 x2 → M2 : τ
Translation algorithm
• A(mleaf M) = mleaf A(M)
• A(mnode M1 M2) = mnode A(M1) A(M2)
• A(mcase M of mleaf x → M1 | mnode x1 x2 → M2) =
    (mcase A(M) of
        mleaf x → A(M1)
      | mnode x1 x2 → A(M2))
Example of programs
• fix s2m. λt. case t of
      leaf x → mleaf x
    | node x1 x2 → mnode (s2m x1) (s2m x2)
• fix m2s. λt. mcase t of
      mleaf x → leaf x
    | mnode x1 x2 → node (m2s x1) (m2s x2)
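The behaviour of these two programs after translation can be sketched as follows. The tuple encoding of buffered trees, the iterator for the input stream and the list for the output stream are assumptions for illustration; the function names simply reuse `s2m` and `m2s` from the slide.

```python
from typing import Iterator, List, Union

Token = Union[str, int]


def s2m(inp: Iterator[Token]) -> tuple:
    """Consume one tree from the input stream and buffer it in memory."""
    tag = next(inp)
    if tag == "leaf":
        return ("mleaf", next(inp))
    left = s2m(inp)       # the left subtree is read first
    right = s2m(inp)
    return ("mnode", left, right)


def m2s(t: tuple, out: List[Token]) -> None:
    """Write a buffered tree to the output stream."""
    if t[0] == "mleaf":
        out.extend(["leaf", t[1]])
    else:
        out.append("node")
        m2s(t[1], out)
        m2s(t[2], out)


buffered = s2m(iter(["node", "leaf", 1, "leaf", 2]))

# A buffered tree is not subject to the ordered linear discipline, so it can be
# written out twice, as in the term node (f x) (f x) with x : MTree.
doubled: List[Token] = ["node"]
m2s(buffered, doubled)
m2s(buffered, doubled)
```

The cost of that freedom is memory: the whole subtree stays buffered between the two traversals.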
Related work
• Attribute grammar based approach [Nakano et al. 2004]
  • Translates tree-processing attribute grammars into stream-processing attribute grammars
  • Pros: programmers need not be aware of which part of the stream is buffered
  • Cons: programmers cannot specify the evaluation order
    • Introducing side effects may be problematic
Related work
• Deforestation [Wadler 1988]
  • Removes intermediate trees from programs
  • Puts syntactic restrictions (treelessness) on programs
    • Variables have to occur only once
    • Only variables can be passed to functions
  • Does not require order restrictions
    • Translated programs may not be suitable for stream-processing
Related work
• Ordered linear logic [Polakow 2000]
  • Formalizes ordered linear logic
• Type system for memory allocation and data layout [Petersen et al. 2003]
  • Uses ordered linear types to express spatial order in memory
Conclusion
• Proposed a method for translating tree-processing programs into stream-processing programs using ordered linear types
Future work
• Providing automatic insertion of buffering primitives
• Extension to general XML documents
  • Dealing with rose trees
  • Considering closing tags