Dynamic Document Generation with XMLambda in a Haskell-like Language

XMλ

Contents
• What is the problem?
• Hosoya’s approach
• Shields’ approach
• XMLambda and the UHC: Conclusion

What is the problem?
XML is a standard language of first-order, tree-like datatypes. It works well for describing static documents, but documents are typically dynamic, generated by a server.
Implementing a server for dynamic documents in conventional languages is hard:
• no direct support for XML or scripting language syntax
• no compile-time checks to ensure valid documents
Can custom languages developed for XML be embedded as combinator libraries within a Haskell-like language?

XML
    element Msg  = ( ( (To|Bcc)* & From), Body)
    element To   = String
    element Bcc  = String
    element From = String
    element Body = P*
    element P    = String
    <Msg>
      <To>jrommes@cs.uu.nl</To>
      <Bcc>doaitse@cs.uu.nl</Bcc>
      <From>joep@geevers.com</From>
      <Body>
        <P>Our presentation is finished!</P>
      </Body>
    </Msg>

XML
    element Msg  = ( ( (To|Bcc)* & From), Body)
    element To   = String
    element Bcc  = String
    element From = String
    element Body = P*
    element P    = String
Operators:
    |  union
    *  sequence
    &  unordered tuple
    ,  ordered tuple

What we are looking for: XML → Functional Program
    document-type definition → type definitions
    regular expression       → type
    element                  → term
    document validation      → type checking

Possible solutions
• Using a universal datatype
    data Element = Atom String | Node String (List Element)

    data Element = Atom String | Node String (List Element)
    Node "Msg" [
      Node "To"   [Atom "jrommes@cs.uu.nl"],
      Node "Bcc"  [Atom "doaitse@cs.uu.nl"],
      Node "From" [Atom "joep@geevers.com"],
      Node "Body" [
        Node "P" [Atom "Our..."] ] ]
No validation possible
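Why "no validation possible" holds can be shown with a minimal Haskell sketch of the universal datatype. The Haskell list type stands in for the slides' `List`, and `invalidMsg` and `childTags` are hypothetical names introduced only for this illustration.

```haskell
-- Universal tree type: every XML document maps to the same Haskell type.
data Element
  = Atom String
  | Node String [Element]
  deriving (Show, Eq)

-- A message that follows the Msg DTD from the slides.
validMsg :: Element
validMsg =
  Node "Msg"
    [ Node "To"   [Atom "jrommes@cs.uu.nl"]
    , Node "Bcc"  [Atom "doaitse@cs.uu.nl"]
    , Node "From" [Atom "joep@geevers.com"]
    , Node "Body" [Node "P" [Atom "Our presentation is finished!"]]
    ]

-- Violates the DTD (no From, two Body elements), yet has the same type,
-- so the type checker cannot reject it.
invalidMsg :: Element
invalidMsg = Node "Msg" [Node "Body" [], Node "Body" []]

-- Hypothetical helper: tags of the direct children of a node.
childTags :: Element -> [String]
childTags (Node _ cs) = [t | Node t _ <- cs]
childTags (Atom _)    = []
```

Both values have type `Element`, so any DTD check would have to happen at run time, for instance by inspecting `childTags`.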

Possible solutions
• Using a universal datatype
• Using newtype declarations
    newtype Msg  = Msg (List (Either To Bcc), From, Body)
    newtype From = From String
    newtype To   = To String
    newtype Bcc  = Bcc String
    newtype Body = Body (List P)
    newtype P    = P String

    newtype Msg  = Msg (List (Either To Bcc), From, Body)
    newtype From = From String
    newtype To   = To String
    newtype Bcc  = Bcc String
    newtype Body = Body (List P)
    newtype P    = P String
    Msg ( [ Left  (To "jrommes@cs.uu.nl"),
            Right (Bcc "doaitse@cs.uu.nl") ],
          From "joep@geevers.com",
          Body [ P "Our..." ] )
Sound, but not complete.
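The newtype encoding can be written out as ordinary Haskell, as in the sketch below. The helper `recipients` is hypothetical. One reading of "not complete", stated here as an assumption, is that the unordered tuple `&` is forced into a fixed order: a document with `From` before the recipients is valid XML but has no corresponding term.

```haskell
-- One newtype per element declaration of the Msg DTD.
newtype To   = To String   deriving Show
newtype Bcc  = Bcc String  deriving Show
newtype From = From String deriving Show
newtype P    = P String    deriving Show
newtype Body = Body [P]    deriving Show

-- (To|Bcc)* becomes a list of Either; the unordered tuple (&) is
-- approximated by an ordinary, ordered tuple.
newtype Msg = Msg ([Either To Bcc], From, Body) deriving Show

example :: Msg
example =
  Msg ( [ Left  (To "jrommes@cs.uu.nl")
        , Right (Bcc "doaitse@cs.uu.nl") ]
      , From "joep@geevers.com"
      , Body [P "Our presentation is finished!"] )

-- Hypothetical helper: all recipient addresses, To and Bcc alike.
recipients :: Msg -> [String]
recipients (Msg (rs, _, _)) = map addr rs
  where
    addr (Left  (To s))  = s
    addr (Right (Bcc s)) = s
```

Unlike the universal datatype, a term such as `Msg ([], Body [])` is rejected by the type checker, which is the sense in which the encoding is sound.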

Possible solutions
• Using a universal datatype
• Using newtype declarations
• Using regular expression types as primitive (Hosoya)

Possible solutions
• Using a universal datatype
• Using newtype declarations
• Using regular expression types as primitive
• Using type-indexed rows (Shields)

Hosoya’s approach

Why Regular Expression Types?
• Static typechecking: generated XML documents conform to the DTD
• Or: invalid documents can never arise
• For example: a <table> must have at least one <tr>

Why Regular Expression Patterns?
• Convenient programming constructs for manipulating documents
• For instance, jump over arbitrary-length data and extract specific data:
    type Person = person[Name,Email*,Tel?]
    match p with
      person[Name,Email+,Tel] -> …
      …

XDuce: Values
• Primitives represent XML documents (trees)
• For example:
    person[name["Joep"],email["Joep@geevers.com"]]
• I.e. a value is a sequence of nodes

XDuce: Regular Expression Types
• Types correspond to document schemas
• Familiar XML regular expressions:
    type Tel   = tel[String]
    type Tels  = Tel*
    type Recip = Bcc|Cc
    (Name, Tel*), Addr
    T? = T|()
    T+ = T,T*

Subtyping
• Many algebraic laws:
  • Associativity of concatenation and union: A|(B|C) ≡ (A|B)|C
  • Commutativity of union: A|B ≡ B|A
• These laws are crucial for XML processing, but lead to a complicated specification

Subtyping
• Subtyping as set inclusion
• First define which values belong to a type
• One type is a subtype of another if the former denotes a subset of the latter
• For example: (Name*, Tel*) <: (Name|Tel)*
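The set-inclusion view can be made concrete with a small Haskell sketch. It simplifies heavily: values are flat sequences of element labels, membership is decided with Brzozowski derivatives, and inclusion is only checked for sequences up to a length bound. This bounded enumeration is an assumption of the sketch, not a decision procedure and not how XDuce implements subtyping; all names are hypothetical.

```haskell
import Control.Monad (replicateM)

-- Regular expression types over element labels (element content is ignored).
data Ty
  = Empty        -- the empty sequence ()
  | None         -- contains no values
  | Lab String   -- one element with the given label
  | Seq Ty Ty    -- T , U
  | Alt Ty Ty    -- T | U
  | Star Ty      -- T*

-- Does the type contain the empty sequence?
nullable :: Ty -> Bool
nullable Empty     = True
nullable None      = False
nullable (Lab _)   = False
nullable (Seq a b) = nullable a && nullable b
nullable (Alt a b) = nullable a || nullable b
nullable (Star _)  = True

-- Brzozowski derivative: what may follow after reading label l.
deriv :: String -> Ty -> Ty
deriv _ Empty   = None
deriv _ None    = None
deriv l (Lab m) = if l == m then Empty else None
deriv l (Seq a b)
  | nullable a  = Alt (Seq (deriv l a) b) (deriv l b)
  | otherwise   = Seq (deriv l a) b
deriv l (Alt a b) = Alt (deriv l a) (deriv l b)
deriv l (Star a)  = Seq (deriv l a) (Star a)

-- "First define which values belong to a type."
member :: [String] -> Ty -> Bool
member ls t = nullable (foldl (flip deriv) t ls)

-- Bounded approximation of s <: t: every sequence of length <= n over
-- the given labels that belongs to s also belongs to t.
subtypeUpTo :: Int -> [String] -> Ty -> Ty -> Bool
subtypeUpTo n labels s t =
  and [ member v t | k <- [0 .. n], v <- replicateM k labels, member v s ]

nameT, telT :: Ty
nameT = Lab "name"
telT  = Lab "tel"
```

With these definitions, `subtypeUpTo 4 ["name","tel"] (Seq (Star nameT) (Star telT)) (Star (Alt nameT telT))` holds, while the reverse direction fails because the sequence `tel, name` belongs to `(Name|Tel)*` but not to `(Name*, Tel*)`.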

Pattern Matching: Exhaustiveness
    type Person = person[Name,Email*,Tel?]
    match p with
      person[Name,Email+,Tel?] -> …
      person[Name,Email*,Tel]  -> …
• Not exhaustive: a person with no Email and no Tel matches neither clause
• Use subtyping to check: the input type must be a subtype of the union of the pattern types

Pattern Matching: Irredundancy
    match p with
      person[Name,Email*,Tel?] -> …
      person[Name,Email+,Tel]  -> …
• Second clause redundant
• A clause is redundant iff all the input values that can be matched by the pattern can also be matched by preceding patterns
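Both checks can be illustrated on the Person patterns with a deliberately simplified Haskell model. It assumes a `person[Name,Email*,Tel?]` value is fully described by its number of Email children and whether a Tel is present, and it enumerates such shapes up to a bound. XDuce decides these properties through subtyping; the enumeration and every name below are hypothetical stand-ins.

```haskell
-- Shape of a person[Name,Email*,Tel?] value:
-- (number of Email children, Tel present?).
type Shape   = (Int, Bool)
type Pattern = Shape -> Bool

emailPlusTelOpt, emailStarTel, emailStarTelOpt, emailPlusTel :: Pattern
emailPlusTelOpt (e, _) = e >= 1        -- person[Name,Email+,Tel?]
emailStarTel    (_, t) = t             -- person[Name,Email*,Tel]
emailStarTelOpt _      = True          -- person[Name,Email*,Tel?]
emailPlusTel    (e, t) = e >= 1 && t   -- person[Name,Email+,Tel]

-- All shapes with at most n Email children.
shapes :: Int -> [Shape]
shapes n = [(e, t) | e <- [0 .. n], t <- [False, True]]

-- Exhaustiveness: shapes of the input type matched by no clause.
uncovered :: Int -> [Pattern] -> [Shape]
uncovered n ps = [s | s <- shapes n, not (any ($ s) ps)]

-- Irredundancy: a clause is redundant if every shape it matches is
-- already matched by some preceding clause.
redundant :: Int -> [Pattern] -> Pattern -> Bool
redundant n earlier p = all (\s -> any ($ s) earlier) (filter p (shapes n))
```

For the exhaustiveness slide, `uncovered 3 [emailPlusTelOpt, emailStarTel]` yields `[(0,False)]`, the person without Email and Tel. For the irredundancy slide, `redundant 3 [emailStarTelOpt] emailPlusTel` is `True`.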

Pattern Matching: Type Inference
    type Name = name[String]
    match (ps as Person*) with
      person[name[val n as String],Email*,Tel?],rest -> …
• Avoid excessive type annotations
• Use input type and pattern to infer types of
  • bare variables (rest)
  • bound variables (n)

Functions
• First-order functions (explicitly typed):
    fun f(P):T = e
• For example:
    fun tels(val ps as Person*):Tel* =
      match ps with
        person[Name,Email*,tel[val t]],rest -> tel[t],tels(rest)
        person[Name,Email*],rest            -> tels(rest)
        ()                                  -> ()
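For comparison, here is a rough Haskell counterpart of `tels`. It assumes a hypothetical `Person` record in which `Email*` becomes a list and `Tel?` becomes a `Maybe`; this is one possible encoding, not something the slides prescribe.

```haskell
-- Hypothetical record standing in for person[Name,Email*,Tel?].
data Person = Person
  { name   :: String
  , emails :: [String]
  , tel    :: Maybe String
  } deriving Show

-- Collect the telephone numbers of all persons that have one.
tels :: [Person] -> [String]
tels []                           = []
tels (Person _ _ (Just t) : rest) = t : tels rest
tels (Person _ _ Nothing  : rest) = tels rest
```

The regular expression pattern `Email*` skips over any number of emails in one step. In the Haskell version that skipping is hidden inside the record field rather than expressed in the pattern.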

Higher-order Functions
• Functions as first-class citizens
• Why desirable?
  • Abstraction
• Not supported by XDuce
• What is needed?
  • Subtyping for arrow types
• So why not support higher-order functions?

Higher-order Functions
• Function definitions given by fixed set G
• G is used in T-APP (instead of the standard rule)
• Consequence: T-ABS fails
• Fix: redefine T-APP
• Type annotations needed for the check of pattern matches

Parametric Polymorphism
• Generic typing using type variables instead of actual types
• Why desirable?
  • Abstraction from the structure of the problem
• What is needed?
  • Type abstraction
  • Type application
• So why no parametric polymorphism?

Parametric Polymorphism
• Problems:
    forall X . (U|X) -> (T|X)
• Pattern matching problems:
  • Exhaustiveness / irredundancy checks
  • Type inference
• Typing constraints cannot be represented:
    forall X {U,T}.(U|X) -> (T|X)

Conclusions
• Typed language with XML documents as primitive values
• Regular expression types are fundamental
• Regular expression pattern matching
• No higher-order functions
• No parametric polymorphism

Shields’ approach
“It is required that content models in element type declarations be deterministic”
Consequence 1: regular expressions must be 1-unambiguous
Unions and unordered tuples are formed from distinct members.
    ( (To, Bcc) & (Bcc, To) )  is 1-unambiguous
    ( (Bcc, To) & Bcc )        is not
    ( (To | Bcc) & Bcc )       is not

Shields’ approach
“It is required that content models in element type declarations be deterministic”
Consequence 2: it is possible to transform any XML element into a term:
    *  sequence        → list
    ,  tuple           → tuple
    |  union           → type-indexed sum
    &  unordered tuple → type-indexed product
| and & are both formed from Type-Indexed Rows

Type-Indexed Rows
A type-indexed row is a list of types
Type constructors
• Empty: Row
• (_#_): Type → Row → Row
For example: (Int # Bool # Empty)

• Type-indexed product (TIP):
    (All _): Row → Type
• Type-indexed coproduct (TIC):
    (One _): Row → Type

Insertion Constraints
Insertion constraints are used to guarantee distinctness of elements:
    a ins (Int # Bool # Empty)
constrains a to be any type other than Int or Bool
    (List b) ins (Int # Bool # Empty)
is true

• Type-indexed product (TIP):
  • Triv: All Empty
  • (_ && _): extension
      forall (a: Type) (b: Row) . a ins b => a → All b → All (a#b)
• Type-indexed coproduct (TIC):
  • (Inj _): injection
      forall (a: Type) (b: Row) . a ins b => a → One (a#b)
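The constructors above can be approximated in GHC Haskell with type-level lists. The sketch below is only an analogy under stated assumptions: rows become ordered type-level lists (Shields' rows are unordered), `Ins` is a hypothetical closed type family playing the role of the `ins` constraint, and `Here`/`There` replace `Inj` because plain Haskell needs an explicit position for the injection.

```haskell
{-# LANGUAGE ConstraintKinds      #-}
{-# LANGUAGE DataKinds            #-}
{-# LANGUAGE GADTs                #-}
{-# LANGUAGE KindSignatures       #-}
{-# LANGUAGE TypeFamilies         #-}
{-# LANGUAGE TypeOperators        #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Kind (Constraint, Type)
import GHC.TypeLits (ErrorMessage (..), TypeError)

-- Hypothetical counterpart of "a ins r": satisfiable only when the
-- type a does not already occur in the row r.
type family Ins (a :: Type) (r :: [Type]) :: Constraint where
  Ins a '[]      = ()
  Ins a (a ': r) = TypeError ('Text "type already occurs in row")
  Ins a (b ': r) = Ins a r

-- Type-indexed product: Triv is the empty product, (:&&) extends it.
data All (r :: [Type]) where
  Triv  :: All '[]
  (:&&) :: Ins a r => a -> All r -> All (a ': r)

infixr 5 :&&

-- Type-indexed coproduct with explicit positions instead of Inj.
data One (r :: [Type]) where
  Here  :: a -> One (a ': r)
  There :: One r -> One (a ': r)

-- Counterpart of \(x && y && Triv) . (x, y) for a fixed row order.
tuple :: All '[a, b] -> (a, b)
tuple (x :&& y :&& Triv) = (x, y)
```

In this encoding `tuple (True :&& (1 :: Int) :&& Triv)` evaluates to `(True, 1)`, and `True :&& False :&& Triv` is rejected at compile time because `Bool` would occur twice in the row.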

    let tuple = \(x && y && Triv) . (x, y)
    in tuple (True && 1 && Triv)
Type checking:
    Unify All (x # y # Empty) and All (Int # Bool # Empty)
    under the constraint x ins (y # Empty)
Overall term has type (Int, Bool) or (Bool, Int)!

Equality constraints
    (c # d # Empty) eq (Int # Bool # Empty)
The constraint propagates until enough information is available to simplify it

Simplifying constraints
• Simple unification:
    (a → Int) eq (Bool → b)  ⟹  a eq Bool, Int eq b
• Row unification:
    (Int # a # Empty) eq (Bool # b # Empty)  ⟹  (Int eq b), (a # Empty) eq (Bool # Empty)
• Insertion:
    (a,b) ins (Bool # c # Empty)  ⟹  (a,b) ins (c # Empty)
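The "simple unification" step can be sketched as a textbook first-order unifier in Haskell. It covers only ordinary type equalities; row unification and insertion constraints are deliberately left out. The representation (`Ty`, `Subst`, `fn`) is hypothetical and not taken from the XMLambda implementation.

```haskell
-- Types: variables, or a constructor applied to arguments.
data Ty = TVar String | TCon String [Ty]
  deriving (Eq, Show)

-- Substitution as an association list from variable names to types.
type Subst = [(String, Ty)]

-- Apply a substitution, following bindings until none applies.
apply :: Subst -> Ty -> Ty
apply s (TVar v)    = maybe (TVar v) (apply s) (lookup v s)
apply s (TCon c ts) = TCon c (map (apply s) ts)

-- Occurs check: does variable v appear in the type?
occurs :: String -> Ty -> Bool
occurs v (TVar w)    = v == w
occurs v (TCon _ ts) = any (occurs v) ts

-- Solve a list of equality constraints, extending the substitution.
unify :: [(Ty, Ty)] -> Subst -> Maybe Subst
unify [] s = Just s
unify ((t1, t2) : rest) s =
  case (apply s t1, apply s t2) of
    (TVar a, TVar b) | a == b -> unify rest s
    (TVar a, t)               -> bind a t
    (t, TVar a)               -> bind a t
    (TCon c xs, TCon d ys)
      | c == d && length xs == length ys -> unify (zip xs ys ++ rest) s
      | otherwise                        -> Nothing
  where
    bind a t
      | occurs a t = Nothing
      | otherwise  = unify rest ((a, t) : s)

-- Helpers for the example from the slide.
fn :: Ty -> Ty -> Ty
fn a b = TCon "->" [a, b]

int, bool :: Ty
int  = TCon "Int" []
bool = TCon "Bool" []
```

Solving `(a → Int) eq (Bool → b)` with `unify [(fn (TVar "a") int, fn bool (TVar "b"))] []` decomposes the arrow into `a eq Bool` and `Int eq b`, so the resulting substitution maps `a` to `Bool` and `b` to `Int`.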

Introducing fresh typenames
• Monomorphic:
    newtype xCoord = Int
    All (xCoord # Int # Empty)
• Polymorphic:
    newtype xCoord = \(a:Type).a
  Allows the same newtype to occur more than once within a record!
• Introducing opaque newtypes: type arguments are ignored in insertion constraints:
    newtype opaque xCoord = \(a:Type).a

XMLambda and UHC: Conclusion
• Why regular expression types (Hosoya)?
  • Fundamental regular expression types
  • Powerful pattern matching
  • No higher-order functions and polymorphism
  • Subtyping and parametric polymorphism?
• Why type-indexed rows (Shields)?
  • Flexibility: more general than regular expression types
  • All nice characteristics of FP
  • Constraint system?
