The Complexity of Tree Transducer Output Languages

The Complexity of Tree Transducer Output Languages ToPS 2009/02/20 Kazuhiro Inaba@ U-TokyoJoint work with Sebastian Maneth@ NICTA&UNSW

“Complexity of Output Languages” • Given… • A tree languageL ⊆ TΣ • A (nondeterministic) tree-to-tree translationτ ⊆ TΣ×TΔ • How complex is the languageτ(L)⊆ TΔ?(i.e., for a tree t ∈ TΔ, how is it computationally hard to determine whether t ∈τ(L)or not ?)

Classic Results • τ: Program of the Turing-Machine  Undecidable • L : Regular String Language • τ: Nondet. Finite State Transduction  τ(L)is regular! •  The membership of τ(L)issolved in O(n) time, O(1) space • Corollary: for τ∈ Finitely Many Compositionsof Nondet-FST, τ(L) is regular b/bb b/b a/ a/a

Today’s Topic • L : Regular Tree Language • τ: Composition of NondeterministicMacro Tree Transducers •  the membership problem ofτ(L) is NP-complete and in DSPACE(n). • Macro Tree Transducers? • A very powerful translation model containing most known classes of string/tree translations! • Yet still has many decidable properties

Outline • What is/Why Macro Tree Transducers? • Review on Previous Work • Output Language of Deterministic MTTs • The Proof • “Translation Membership” Problem • “Garbage-Free Form”

Brief Introduction toMacro Tree Transducers (MTTs)

RHS ::= F( RHS, … , RHS ) | q(xi, RHS, …, RHS) | yi MTT start( A(x1) ) → double( x1, double(x1, E) ) double( A(x1), y1) → double( x1, double(x1, y1) ) double( B, y1 ) → F( y1, y1) double( B, y1 ) → G( y1, y1) • An MTT M = (Q, start, Σ, Δ, R) is a set of first-order functions of type Tree(Σ) × Tree(Δ)k Tree(Δ) • Defined by mutual induction on the1st parameter (the input tree) • Application in the right-hand side is restricted only to the direct children of the current node • Can take output tree fragments via other parameters, but not allowed to inspect or decompose them Nondeterminism!

Evaluation Strategy • Two Evaluation Strategies(analogous to the λ-calculus…) • Call-by-Value (IO : Inside-Out) • Call-by-Name (OI : Outside-In)

Example (IO / call-by-value) double( A(x1), y1) → double( x1, double(x2, y1) ) double( B, y1 ) → F( y1, y1) double( B, y1 ) → G( y1, y1) start(A(B))  double( B, double(B, E) )  double( B, F(E, E) ) F( F(E,E), F(E,E) ) or G( F(E,E), F(E,E) ) or  double( B, G(E, E) )  F( G(E, E), G(E, E) ) or  G( G(E, E), G(E, E) )

Example (OI / call-by-name) double( A(x1), y1) → double( x1, double(x2, y1) ) double( B, y1 ) → F( y1, y1) double( B, y1 ) → G( y1, y1) start(A(B))  double( B, double(B, E) ) F( double(B, E), double(B, E) ) F( F(E,E), double(B, E) ) F( F(E,E), F(E,E) ) F( F(E,E), G(E,E) ) !! F( G(E,E), double(B, E) ) F( G(E,E), F(E,E) ) !! F( G(E,E), G(E,E) ) G( double(B, E), double(B, E) ) G( F(E,E), double(B, E) ) G( F(E,E), F(E,E) ) G( F(E,E), G(E,E) ) !! G( G(E,E), double(B, E) ) G( G(E,E), F(E,E) ) !! G( G(E,E), G(E,E) )

Evaluation Strategy • Two Strategies(analogous to the λ-calculus…) • Call-by-Value (IO : Inside-Out) • Call-by-Name (OI : Outside-In) • Today, we consider OI evaluation only. • Why? • MTTIO* = MTTOI* (even though MTTIO≠ MTTOI) • MTTOI has better compositionality (as shown later)

Importance of the class MTT* • MTT* subsumes almost allwell-studied tree transducer models • Equals to • (Attributed Tree Transducer)* • (Pebble Tree Transducer)* • (Higher-Order Tree Transducer)* • Properly Contains • (MSO Definable Translation)* • (Top-Down Tree Transducer)* • (Bottom-Up Tree Transducer)*

Importance of the class MTT* • MTT*(REGT) subsumes the“IO/OI-hierarchy” • Context-Free String/Tree Language • Indexed Language [Aho 1968] • Macro Language [Fischer 1968] • CFG with each nonterminals are parameterized by other nonterminals • Level-n Language [Damm 1982] • ＝ Higher-order Macro Language

[Our Work]⊆ DSPACE(n)NP-complete OI-hierarchy [Damm82] [Kuroda64]= NSPACE(n) MTT*(REGT) Context-Sensitive Language Level-n OI Language Level-3 OI Language Level-2 OI Language Level-1 OI Language ＝ Macro Language ≒ CF Tree Language CFG where nonterminals are parameterized by other nonterminals [Aho68] ⊆NSPACE(n) [Rounds73] NP-complete Context-Free Language [CYK, Earley, …] ⊆PTIME

Review on Previous Work (= the Basis of Our Approach)

Related Work • T*(REGT) is in DSPACE(n) [Baker1978] • T : the class of translations realizable by MTTs with no accumulating params. • DMTT*(REGT) is in DSPACE(n) [Maneth2002] • DMTT : that of deterministic MTTs

DSPACE(n) Membership for DMTT*(REGT)≪Problem Statement≫ • Given • A regular tree language L and • A composition sequence τ1; … ; τnof deterministic MTTs • and a tree t, • How can we testt ∈ (τ1 ; … ; τn)( L )in linear space with respect to |t|?

DSPACE(n) Membership for DMTT*(REGT)≪Approach: Generate & Test ≫ • Guess the input s ∈ L • Calculate (τ1 ; … ; τn )(s) • If (τ1 ; … ; τn )(s) = t, then t is in the output language! • Otherwise, try another input tree s Is this a possible output from τ1 ; … ; τn ? Guess an input: s Compute s1:=τ1(s) Compute s2:=τ2(s) Compute sn-1 Compute sn τ1 τ2 τn t s s1 s2 Sn-1 sn =?

DSPACE(n) Membership for DMTT*(REGT)≪Approach: Generate & Test ≫ • In order to carry out the algorithm in DSPACE(|t|) … • The sizes |s|, |s1|, |s2|, …, |sn| must be linearly bounded by |t| • i.e., there must be a constant c independent from tsuch that |s| ≦ c|t| • Each step τiof the computation must be done in linear space The translation must have ‘no garbage’! τ1 τ2 τn t s s1 s2 Sn-1 sn =?

DSPACE(n) Membership for DMTT*(REGT)≪Approach: Generate & Test≫ • Lemma 1: Linear Time Computation (Folklore) • For a tree s and a deterministic MTT τ,we can computeτ(s) in time O( |s| + |τ(s)| )if: • τ is linear, or • τ is a top-down tree transducer • Lemma 2: DMTT = DT ; DLMTT [EV85] • Corollary: DMTT* = (DT ∪ DLMTT)*

DSPACE(n) Membership for DMTT*(REGT)≪Approach: Generate & Test≫ • Lemma 3: “Garbage-Free Form” • For anyL∈REGT and τ1; …;τn∈DMTT*,there existsL’∈REGT and ρ1, …, ρ2n∈(T∪DLMTT)*such that(τ1 ; … ; τn)( L ) ＝ (ρ1 ; … ; ρ2n)( L’)and every ρi is “non-deleting”, i.e, ∀s: |s| ≦ 2|ρi(s)|

Main Result:DSPACE(n) Membership of MTT*(REGT)

? ? ? ∈ ∈ ∈ ≪Approach: Generate & Test≫ • Guess the input s ∈ Land all the intermediate trees s1, …, sn-1 • Check whether(s,s1)∈τ1, (s1,s2)∈τ2, …, (sn-1, t) ∈τn • If it is, then t is in the output language! • Otherwise, try another s, s2, …, sn-1 τ1 τ2 τn s s1 s2 Sn-1 t

As you might expect… • Two Lemmas are needed • Nondeterminstic versionof the ”Garbage-Free” Form • Linear space “Translation Membership”for a single (T ∪ LMTT) translation • We already have MTT* = (T ∪ LMTT)* [EV85]

Linear Space“Translation Membership” Actually, we’ve shown the lemma for any “path-linear” MTT. Linear: f(x1, g(x2), h(x3)) Path-linear(but not linear): f(x1, g(x2), h(x2)) Not path-linear: f(x1, g(x1), h(x2)) • Lemma 4: Let τ∈ (T ∪ LMTT). For trees s and t, we can check whether t∈τ(s) or not within O(|s|+|t|) space. • Proof: Basically, try all nondeterministic computation by backtracking. (Some clever stack-sharing is required)

“Garbage-Free Form” • Lemma 5: • For any L∈REGT and τ1; …;τn∈MTT*,there existsL’∈REGT and ρ1, …, ρ2n∈(PL-MTT)*such that(τ1 ; … ; τn)( L ) ＝ (ρ1 ; … ; ρ2n)( L’)and every ρi is “non-deleting”, i.e,∀t∈out(ρ):∃s1,..,s2n:si+1∈ρi(si), t∈ρ2n(s2n), |si|≦ 2|si+1|, |s2n|≦2|t|

Proof Sketch ofthe Garbage-Free Form Decompose τ2 to ‘deleting part’ D and ‘nondeleting’ τ’2 • “Factor out” the deletion • τ1 ; τ2== τ1 ; (D ; ρ2)== (τ1 ; D) ; ρ2== τ’1 ; ρ2 Associativity Compose τ1 with D (Right-Compositionality of OI MTTs)

Three Types of Deletion • “Input-Deletion” • E.g., f( A(x1, x2) )  B( f(x1) ) • Discarding the “x2” subtree! • “Skipping” • E.g., f( A(x1) )  f(x1) • No output is generated at the unary node A. • “Erasure” • E.g., f( L, y1 )  y1 • (Mainly at leaf nodes) No new output symbol is generated at the node.

Three Types of Deletion • If there is no input-deletion, skipping, and erasure during the computation,|in| ≦ 2|out| • Intuition: • No input-deletion  visits all nodes • No skipping  outputs at least one node for each input unary node • No erasure  outputs at least one node for each input leaf node • For any tree, 2*(#unary + #leaf) ≧ #nodes(cf., the number of matches in a knockout tournament)

How to eliminate“erasing” rules • Look-ahead + Inline-Expansion • Decompose τ into τ = E ; τne • E : Bottom-up translation that annotates each input tree with • τne: τ without erasing rules

A A B B B C C B C C C C How to eliminate“erasing” rules : τ = E ; τne • Example τ: f( C, y1 ) → y1 g( B(x1,x2) ) → X( f(x1, Y) ) Applying f to the 1st child invokes erasure… E Applying f to the 1st child invokes erasure…

A A B B B C C B C C C C How to eliminate“erasing” rules : τ = E ; τne τne: f( C, y1 ) → y1 g( B (x1,x2) ) → X( Y) Applying f to the 1st child invokes erasure… Applying f to the 1st child invokes erasure… E Applying f to the 1st child invokes erasure…

A A B B B C C B C C C C How to eliminate“erasing” rules : τ = E ; τne • More complex case τ: f( C, y1 ) → y1 h( B(x1,x2), y1 ) → f(x1, y1) h(1st)  erasure f(1st)  erasureh(2nd) erasure E f(1st)  erasure

Problem! • “Inline-Expansion” is not always possible for nondeterministic MTTs τ: f( C, y1, y2 )  y1 f( C, y1, y2 )  y2 g( B (x1,x2) )  h(x1, f(x1, Y, Z) ) h( C, y1 )  X( y1, y1 ) er… τne: g( B (x1,x2) )  h(x1, Y ) g( B (x1,x2) )  h(x1, Z ) h( C, y1 )  X( y1, y1 ) DifferentTranslation! er… er…

Solution! • Extend MTTs with “inline-nondeterminism” τ: f( C, y1, y2 )  y1 f( C, y1, y2 )  y2 g( B (x1,x2) )  h(x1, f(x1, Y, Z) ) h( C, y1 )  X( y1, y1 ) er… SameTranslation! τne: g( B (x1,x2) )  h(x1, +(Y,Z) ) h( C, y1 )  X( y1, y1 ) er…

Solution:“MTT with Choice and Failure” • MTT + inline-nondeterminism + inline-partiality • Same expressiveness: MTT = MTTCF • Much more flexible syntax • Allows inline-expansion for free! RHS ::= F( RHS, … , RHS ) | q(xi, RHS, …, RHS) | yi | +(RHS, RHS) | θ

Note on Path-linearity • Inline-Expansion • …does not preserve linearity. • …does preserve path-linearity. This is the reason for using PL-MTTs. τ: f( C, y1 ) X( y1, y1 ) g( B(x1,x2) )  f(x1, h(x2, Y) ) τne: g( B(x1,x2) ) X( h(x2, Y), h(x2, Y) )

A B C B C C How to eliminate“Input-Deletion”: τne = I ; τnei • Exhaustively trying all deletion by using nondeterminism A1 A0 or or A1 or B00 B10 A0 C I B01 or… B11 C C

How to eliminate“Input-Deletion”: τne = I ; τnei • Example τne: f( B(x1,x2) )  g(x1, g(x1, Y) ) f( B(x1,x2) )  g(x1, g(x2, Y) ) f( B(x1,x2) )  g(x2, g(x1, Y) ) τnei: f( B10(x1) )  g(x1, g(x1, Y) ) f( B10(x1) )  g(x1, θ ) f( B10(x1) )  g( x2, g(x1, Y) ) …

A A B BA B C CBB CB How to eliminate“Skipping”: τnei = S ; τneis • Same as for the “input-deletion” • Try all possible deletion nondeterminisitically… or or... S or CABB

Proof Sketch ofthe Garbage-Free Form Decompose τ2 to ‘deleting part’ E;I;S and ‘nondeleting’ τ’2 • “Factor out” the deletion • τ1 ; τ2== τ1 ; (E;I;S ; ρ2)== (τ1 ; E;I;S) ; ρ2== τ’1 ; ρ2 Associativity Compose τ1 with D (Right-Compositionality of OI MTTs with linear top-down transducers)

Summary • Membership problem of MTT*(REGT) is in DSPACE(n) • Note: Almost the same proof shows that it is in NP • Note: NP-hardness was already known [Rounds73] • Corollary: OI-Hierarchy, PTT*(REGT), ATT*(REGT), etc is in DSPACE(n) and NP-complete • Corollary: MTT*(＊), MTT*(CFTG), etcis in DSPACE(n) and NP-complete

Summary • Key Lemma: “Garbage-Free Form” • Any composition sequence of MTTscan be transformed to the one withno deletion τ1 τ1 τ2 τ2 τk τk s1 s1 s2 s2 t t s s Sk-1 Sk-1

A B C B C C Future Work • “Strong” Garbage-Free Form • ↓ This is tooooo wasteful A1 A0 or or A1 or B00 B10 A0 C I B01 or… B11 C C

Future Work • “Strong” Garbage-Free Form • (τ1 ; … ; τn)( L ) = (ρ1 ; … ; ρ2n)( L’)and range(ρi-1) ⊆ domain(ρi) • Application to … • Finiteness Problem of the out(MTT*) • Nondeterministic Execution of MTT*

The Complexity of Tree Transducer Output Languages