EXCHANGING INTENSIONAL XML DATA

Tova Milo (INRIA & Tel-Aviv U.); Serge Abiteboul (INRIA);

Bernd Amann (Cedric-CNAM); Omar Benjelloun (INRIA);

Fred Dang Ngoc (INRIA)

H. GÜL ÇALIKLI 2002700743

MURAT KORAŞ 2002700797

- The emergence of Web Services as a standard means of publishing and accessing data on the web introduced a new class of XML documents called “intensional documents”.
- Intensional Documents: XML documents where:
- some parts of the document are defined explicitly
- some parts are defined by programs that generate data.

- Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results.
- GOAL of this PAPER:
- Study the new issues raised by the exchange of intensional XML documents between applications
- Decide which data should be materialised before it is sent and which should not

CONSIDERATIONS for MATERIALISATION

- Performance:
- current system load
- cost of communication

- Capabilities:
- inability to handle intensional parts of a document
- lack of access rights (to a particular service)

- Security:
- invoking service calls from an untrusted party may cause severe security violations

- Functionalities:
- confidentiality reasons
- calling services may involve fees to be paid.

[Figure: Data exchange scenario for intensional documents. A Sender and a Receiver, each with its own capabilities, ACL, cost, ..., exchange intensional documents (trees containing function nodes such as f, g, q and r) that must conform to an agreed Data Exchange Schema.]

- SIMPLE INTENSIONAL XML:
- Model intensional XML documents as labelled trees consisting of two types of nodes:
- Data Nodes
- Function Nodes, which correspond to “Service Calls”

- Assume the existence of some disjoint domains:
- N : domain of NODES
- L : domain of LABELS
- F : domain of FUNCTION NAMES
- D : domain of DATA VALUES

- SIMPLE INTENSIONAL XML (cont’d)
- DEFINITION 1: An intensional document d is an expression (T, λ) where:
- T = (N, E, <) is an ordered tree:
- N : a finite set of nodes, drawn from the domain of nodes
- E ⊆ N × N : the edges
- < : associates with each node in N a total order on its children.

- λ : N → L ∪ F ∪ D is a labelling function for the nodes, where L, F and D are the domains of labels, function names and data values.
NOTE: only leaf nodes may be assigned data values from D

- SIMPLE INTENSIONAL XML (cont’d)
- Nodes with a label in L U D are called Data Nodes.
- Nodes with a label in F are called Function Nodes.
- The children subtrees of a function node are the Function Parameters
- When the function is called;
- These subtrees are passed to it
- The return value replaces the function node in the document.
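
The tree model and the effect of calling a function can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Node` class, the `materialise` helper and the stub services are hypothetical names, the stubs return made-up values instead of calling real Web services, and calls returned by a service are not evaluated further.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    label: str                      # a label, a function name or a data value
    children: List["Node"] = field(default_factory=list)
    is_function: bool = False       # True for function nodes (service calls)


Service = Callable[[List[Node]], List[Node]]


def materialise(node: Node, services: Dict[str, Service]) -> List[Node]:
    """Return the forest that replaces `node` once its calls are evaluated."""
    # Evaluate calls nested in the children first (an assumed bottom-up order).
    kids = [out for child in node.children
            for out in materialise(child, services)]
    if node.is_function:
        # The parameter subtrees are passed to the service and the
        # returned forest takes the place of the function node.
        return services[node.label](kids)
    return [Node(node.label, kids)]


# Stub services standing in for real Web service calls (hypothetical results).
services = {
    "Get_Temp": lambda params: [Node("temp", [Node("16 ºC")])],
    "TimeOut": lambda params: [Node("exhibit", [Node("(exhibit data)")])],
}

doc = Node("newspaper", [
    Node("title", [Node("The Sun")]),
    Node("date", [Node("04/10/2002")]),
    Node("Get_Temp", [Node("city", [Node("Paris")])], is_function=True),
    Node("TimeOut", [Node("Exhibits")], is_function=True),
])

result = materialise(doc, services)[0]
```

Here every call is materialised; deciding which calls to materialise before sending is exactly the question the rest of the presentation addresses.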

THE MODEL and THE PROBLEM

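[Figure: an intensional document drawn as a tree. The root newspaper has the data nodes title (“The Sun”) and date (“04/10/2002”) and the function nodes Get_Temp (parameter city, “Paris”) and TimeOut (parameter “Exhibits”); the figure also shows a temp node with value “16 ºC”.]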

- SIMPLE SCHEMA:
- DEFINITION 2: A document schema s is an expression (L, F, τ) where:
- L : a finite set of labels
- F : a finite set of function names
- τ : a function that maps:
- each label name l ∈ L to a regular expression over L ∪ F or to the keyword data
- each function name f ∈ F to a pair of expressions called:
- τin(f), the input type of f
- τout(f), the output type of f

- SIMPLE SCHEMA (cont’d)
- Example of a Schema:
- data:
- τ(newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit)
- τ(title) = data
- τ(date) = data
- τ(temp) = data
- τ(city) = data
- τ(exhibit) = data

- functions:
- τin(Get_Temp) = city
- τout(Get_Temp) = temp
- τin(TimeOut) = data
- τout(TimeOut) = (exhibit|performance)
- τin(Get_Date) = title
- τout(Get_Date) = date

- SIMPLE SCHEMA (cont’d):
- DEFINITION 3: An intensional document t is an instance of a schema s=(L,F,τ) if:
- for each data node n ∈ t with label l ∈ L, the labels of n’s children form a word in lang(τ(l)), where lang(τ(l)) denotes the regular language defined by τ(l)
- for each function node n ∈ t with label f ∈ F, the labels of n’s children form a word in lang(τin(f)).

- DEFINITION 3 (cont’d):
- Let f be a function name and t1,...,tn a sequence of intensional trees.
- IF the labels of the roots of t1,...,tn form a word in lang(τin(f)) (respectively lang(τout(f))) AND all the trees are instances of s
- THEN t1,...,tn is an input instance (respectively output instance) of f.
- Note: every subtree conforms to the same schema as the whole document.
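
The instance test of Definition 3 can be sketched with Python's `re` module. The encoding is an assumption made for illustration: a tree is either a data value (a string) or a pair `(label, children)`, the helper names `to_regex` and `is_instance` are hypothetical, and the schema is the newspaper example given earlier.

```python
import re

# tau for labels, and the input types tau_in for function names.
TAU = {
    "newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit)",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
}
TAU_IN = {"Get_Temp": "city", "TimeOut": "data"}


def to_regex(expr):
    """Turn a schema expression over labels into a regex over 'label ' tokens."""
    tokens = re.sub(r"\w+", lambda m: "(?:%s )" % m.group(0), expr)
    return re.compile(tokens.replace(".", ""))   # '.' is concatenation


def is_instance(tree):
    """Check a tree (a str data value, or a (label, children) pair)."""
    if isinstance(tree, str):
        return True
    label, children = tree
    expected = TAU.get(label) or TAU_IN.get(label)
    if expected is None:
        return False                              # unknown label or function
    if expected == "data":
        return len(children) == 1 and isinstance(children[0], str)
    if any(isinstance(c, str) for c in children):
        return False                              # data value where labels are expected
    word = "".join(c[0] + " " for c in children)
    return (to_regex(expected).fullmatch(word) is not None
            and all(is_instance(c) for c in children))


doc = ("newspaper", [
    ("title", ["The Sun"]),
    ("date", ["04/10/2002"]),
    ("Get_Temp", [("city", ["Paris"])]),
    ("TimeOut", ["Exhibits"]),
])
```

With this encoding `is_instance(doc)` holds, while a newspaper that lacks its date child is rejected because its word of child labels is not in lang(τ(newspaper)).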

- SIMPLE SCHEMA (cont’d):
- DEFINITION 4: (about Rewritings)
- Let t, t’ be trees.
- IF t’ is obtained from t by:
- selecting a function node v in t with some label f, and
- replacing it by an arbitrary output instance of f
- THEN we say that t rewrites into t’ in one step, written t →[v] t’.
- IF t →[v1] t1 →[v2] t2 ... →[vn] tn THEN:
- we say that t rewrites into tn, written t →* tn
- the nodes v1,...,vn are called a rewriting sequence
- the set of all trees t’ such that t →* t’ is denoted ext(t).
- SIMPLE SCHEMA (cont’d):
- DEFINITION 5: (about Rewritings)
- Let:
- t be a tree
- s be a schema

- 1. IF ext(t) contains some instance of s THEN t possibly rewrites into s.
- 2. IF either t is already an instance of s, or there exists some node v in t such that all trees t’ obtained by rewriting t at v (t →[v] t’) safely rewrite into s, THEN we say that t safely rewrites into s.

- SIMPLE SCHEMA (cont’d):
- DEFINITION 6:
- Let:
- s and s’ be schemas
- r be a distinguished label of s called the root label

- IF all the instances t of s with root label r rewrite safely into instances of s’ THEN we say that s safely rewrites into s’.

- A Richer Data Model:
- Function Patterns:
- The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document.
- But sometimes one does not know in advance which functions will be used at a given place.
- A common intensional schema for such documents should not require the use of a particular function, but rather allow for a set of functions that have a proper signature.
- To specify such a set of functions we use Function Patterns.
- Function Patterns: a function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one.
- EX:
- τname(Forecast) = UDDIF ∧ InACL
- τin(Forecast) = city
- τout(Forecast) = temp
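
A function pattern can be pictured as a name predicate plus a required signature. The sketch below is hypothetical throughout: the `Service` and `FunctionPattern` classes, the two name sets standing in for the UDDIF and InACL predicates, and the service names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    output_type: str


@dataclass(frozen=True)
class FunctionPattern:
    predicate: Callable[[str], bool]   # boolean predicate on the function name
    input_type: str
    output_type: str

    def matches(self, service: Service) -> bool:
        # The name must satisfy the predicate and the signature must agree.
        return (self.predicate(service.name)
                and service.input_type == self.input_type
                and service.output_type == self.output_type)


# Stand-ins for the two predicates of the Forecast example.
IN_UDDI = {"Get_Temp", "Weather_Now"}   # names registered in a directory
IN_ACL = {"Get_Temp"}                   # names the peer is allowed to call

forecast = FunctionPattern(
    predicate=lambda name: name in IN_UDDI and name in IN_ACL,
    input_type="city",
    output_type="temp",
)
```

For example, `forecast.matches(Service("Get_Temp", "city", "temp"))` holds, whereas a service with the right signature but a name outside the access list does not belong to the pattern.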

- A Richer Data Model (cont’d):
- Restricted Service Invocations:
- We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema.
- This is not always the case, for reasons such as:
- security,
- cost,
- access rights , etc.

- THUS, function names/patterns in the schema can be partitioned into two disjoint groups of invocable and noninvocable ones.
- A legal rewriting is then one that invokes only invocable functions.

- Rewriting Process:
1. Safe Rewriting:

- check if t safely rewrites to s
- if so, find a rewriting sequence.
- rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure
- preferred rewriting sequence: the shortest/cheapest one

- Rewriting Process (cont’d):
2. Possible Rewriting:

- IF a safe rewriting does not exist:
- check whether t may at least possibly rewrite to s.
- IF it is acceptable to do so (the sender accepts that the rewriting may fail),
- try to find a successful rewriting sequence, if one exists
- preferred rewriting sequence: the one with the least cost.

- Rewriting Process (cont’d):
3. Mixed Approach:

In the mixed approach, one could

- first invoke some function calls
- then attempt from there to find safe rewritings.

- Rewriting Process (cont’d):
- DEFINITION 7:
- For a rewriting sequence t →[v1] t1 ... →[vn] tn,
- IF vj ∈ ti but vj ∉ ti-1,
- THEN we say that function node vj depends on function node vi.
- IF the dependency graph among the nodes contains no paths of length greater than k,
- THEN we say that the rewriting sequence is of depth k.

RESTRICTION:

“Consider only k-depth left-to-right rewritings.”
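
The depth of a rewriting sequence can be computed from the dependency relation of Definition 7. The sketch below rests on assumptions: the helper name `rewriting_depth` and the `created_by` encoding are hypothetical, and a path is measured by the number of function nodes on it, so a sequence that only invokes calls present in the original document has depth 1.

```python
def rewriting_depth(created_by):
    """Depth of a rewriting sequence.

    `created_by` maps each invoked function node to the invoked node whose
    result introduced it, or to None if it was in the original document.
    """
    def chain_length(v):
        # Count the function nodes on the dependency chain ending at v.
        n = 0
        while v is not None:
            n += 1
            v = created_by[v]
        return n

    return max((chain_length(v) for v in created_by), default=0)


# v2 was returned by the call v1; v3 was already in the document.
example = {"v1": None, "v2": "v1", "v3": None}
```

With this convention `rewriting_depth(example)` is 2, so the sequence is admissible for any k of at least 2.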

- Algorithm for k-depth left-to-right safe rewriting
- The algorithm is decomposed into three parts:
- 1. Rewriting Function Parameters:
- to invoke a function, its parameters should be of the right type
- if they are not, they should be rewritten to fit that type
- when rewriting the parameters, the functions in them can be invoked ONLY IF their own parameters are of, or can be rewritten into, the expected input type.

- Algorithm is decomposed into three parts (cont’d)
- 1. Rewriting Function Parameters (cont’d)
- For the deepest functions:
- verify that their parameters are instances of the corresponding input types
- if not, rewriting fails.

- Move upward (until all functions in the tree (forest) are done); for each function f:
- try to safely rewrite f’s own parameters into the required structure
- if this is not possible, rewriting fails.

- Algorithm is decomposed into three parts (cont’d)
- 2. Top Down Traversal:
- In each iteration of the recursive procedure “Rewriting Function Parameters”, the parameters of the outermost functions of the tree (forest) are handled.
- In this part, safely rewrite the tree (forest) by invoking only these outermost functions.
- THUS:
- traverse the tree (forest) top down
- at each step, treat a single node and its children.

- Algorithm is decomposed into three parts (cont’d)
- 2. Top Down Traversal (cont’d)
- Consider a node n with children whose labels form a word w.
- The subtree rooted at node n can be rewritten into the target schema s=(L,F,τ) IF and ONLY IF:
- 1. w can be safely rewritten into a word in lang(τ(label(n))) AND
- 2. each of n’s children can be safely rewritten into an instance of s.

- Algorithm is decomposed into three parts (cont’d)
- 3. Rewriting the children of a node n:
- Given:
- w: a word (the sequence of labels of n’s children)

- Goal:
- rewrite w so that it becomes a word in the regular language R = τ(label(n))

- The process of rewriting involves:
- choosing some functions in w and replacing them by a possible output
- then choosing some other functions (which might have been returned by previous calls) and replacing them by their output
- and so on, up to the depth k

- Safe Rewriting Algorithm:
- Given:
- a word w
- the output types Rf1,...,Rfn of the available functions
- a target regular language R

- Purpose of the algorithm:
- to test if w can be safely rewritten into a word in R
- if so, to find a safe rewriting sequence

- Safe Rewriting Algorithm:
- Note: For illustration purposes we use the newspaper document.
- w = title.date.Get_Temp.TimeOut : the word formed by the labels of the newspaper node’s children
- R = title.date.temp.(TimeOut|exhibit*) : the target language; we look for a safe rewriting of the above word into a word in R

- The Algorithm:
- 1) Build finite state automata for the following regular languages:
- 1.1) An automaton Aw accepting w as a single word.

- The Algorithm (cont’d)
- 1.2) Automata Afi, i=1,...,n, each accepting the regular language Rfi.
- 1.3) An automaton A accepting the complement of the regular language R. The automaton should be deterministic and complete.

[Figure: the complement automaton A for the schema τ’(newspaper) = title.date.temp.(TimeOut|exhibit*), with states p0 to p6 and edges labelled title, date, temp, TimeOut, exhibit and *.]

- The Algorithm (cont’d)
- 2) Let Aw^k := Aw
- 3) For j = 1,...,k:
- Consider all the edges e=(v,u) in Aw^k that are labelled by a function name fi and were not treated in previous iterations:
- 3.1) extend Aw^k by attaching a copy of the automaton Afi, with its initial and final states linked to v and u respectively by ε-moves
- 3.2) denote v as a fork node (for the edge e)
- 3.3) the two fork options of v are e itself and the new outgoing ε-edge

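[Figure: the 1-depth automaton Aw^1 for the word w = title.date.Get_Temp.TimeOut, with states q0 to q7. The states where the Get_Temp and TimeOut edges start are fork nodes: following the function-labelled edge represents the choice of not invoking the function, and following the ε-edge into the copy of its output automaton (temp for Get_Temp; exhibit or performance for TimeOut) represents the choice of invoking it.]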

- The Algorithm (cont’d)
- 4) Construct the cartesian product automaton A× = Aw^k × A

- The fork nodes and fork options in A× reflect those of Aw^k:
- 4.1) the fork nodes are the nodes [q p] ∈ A× where q was a fork node in Aw^k
- 4.2) a fork option in A× consists of all edges originating from one fork option edge in Aw^k.

[Figure 6: the cartesian product automaton A× = Aw × A, with states of the form [q,p] such as [q0,p0], [q1,p1], [q2,p2], [q3,p3] and [q4,p4].]

- The Algorithm (cont’d):
- 5) Mark nodes in A×:
- 5.1) mark the states [q p] whose components q and p are accepting states in Aw^k and A respectively
- 5.2) iteratively mark:
- non-fork (regular) nodes: IF one of their outgoing edges points to a marked node
- fork nodes: IF both of their fork options (for some fi) contain an edge that points to a marked node.


- The Algorithm (cont’d):
- 6) Try to obtain a SAFE REWRITING.
- “A safe rewriting exists IFF the initial state is not marked”
- 6.1) Follow a non-marked path (corresponding to w) starting from the initial state of A× to a state [q p] where q is an accepting state of Aw^k
- 6.1.1) non-marked fork options on the path determine the rewriting choices (i.e. which functions to call)
- 6.1.2) when a function is invoked, we continue the path with the new rewritten word rather than the word w

- The Algorithm (cont’d):
- 6.2) To minimize the rewriting cost, choose a path with minimal number/cost of function invocations.

- EXIT % End of the algorithm
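
The safety test can be sketched in Python for the special case k = 1. This is not the product-automaton marking described above but an equivalent recursive formulation, chosen for brevity: at every function symbol the sender either keeps it or invokes it, and an invocation is safe only if every word the function may return leads to success. The automaton encoding and the names `TARGET`, `OUTPUT`, `after_call` and `is_safe` are assumptions, and calls returned by an invoked function are never invoked themselves.

```python
from functools import lru_cache

# A DFA is (transitions, start, accepting); a missing transition leads to
# an implicit rejecting sink, represented by the state None.
# Target language R = title.date.temp.(TimeOut | exhibit*)
TARGET = ({(0, "title"): 1, (1, "date"): 2, (2, "temp"): 3,
           (3, "TimeOut"): 4, (3, "exhibit"): 5, (5, "exhibit"): 5},
          0, frozenset({3, 4, 5}))

# Output types of the functions, also encoded as DFAs over labels.
OUTPUT = {
    "Get_Temp": ({(0, "temp"): 1}, 0, {1}),
    "TimeOut": ({(0, "exhibit"): 1, (0, "performance"): 1}, 0, {1}),
}


def after_call(delta, q, f):
    """Target states reachable from q by reading any output word of f."""
    fdelta, fstart, faccept = OUTPUT[f]
    seen, stack, results = {(q, fstart)}, [(q, fstart)], set()
    while stack:
        tq, fq = stack.pop()
        if fq in faccept:
            results.add(tq)
        for (src, label), dst in fdelta.items():
            if src == fq:
                nxt = (delta.get((tq, label)), dst)
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return results


def is_safe(word, target):
    """True if `word` safely rewrites (depth 1, left to right) into the target."""
    delta, start, accepting = target

    @lru_cache(maxsize=None)
    def solve(q, i):
        if q is None:                       # fell into the rejecting sink
            return False
        if i == len(word):
            return q in accepting
        label = word[i]
        if solve(delta.get((q, label)), i + 1):
            return True                     # keep the symbol as it is
        if label in OUTPUT:                 # invoke: all answers must succeed
            return all(solve(q2, i + 1) for q2 in after_call(delta, q, label))
        return False

    return solve(start, 0)


w = ("title", "date", "Get_Temp", "TimeOut")
# Second target, used in the later figures: title.date.temp.exhibit*
STRICT = ({(0, "title"): 1, (1, "date"): 2, (2, "temp"): 3,
           (3, "exhibit"): 3}, 0, frozenset({3}))
```

Under these assumptions `is_safe(w, TARGET)` holds (invoke Get_Temp, keep TimeOut), while `is_safe(w, STRICT)` does not, because TimeOut may return a performance element that the stricter schema rejects.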

[Figure 7: the complement automaton A for the schema τ’’(newspaper) = title.date.temp.exhibit*.]

[Figure 8: the cartesian product automaton A× = Aw^1 × A for this schema.]

- Complexity of the Algorithm:
- s0 : schema of the sender
- s : agreed data exchange schema
- Complexity is determined by the size of the cartesian product automaton:
- 1. Construct the cartesian product
- 2. Traverse and mark the nodes of the resulting product
- THUS complexity is bounded by:
- O(|A×|^2) = O((|Aw^k| × |A|)^2)
- Maximum size of Aw^k: O((|s0| + |w|)^k)
- Complexity is polynomial in the size of the schemas s and s0 (with the exponent determined by k)

- The Algorithm (possible rewriting)
- 1. Build finite state automata for the following languages:
- 1.1. The automaton Aw^k
- 1.2. An automaton A accepting the regular language R

[Figure 10: an automaton A for the schema τ’’(newspaper) = title.date.temp.exhibit*, with states p0 to p4.]

- The Algorithm (cont’d)
- 2. Construct the cartesian product automaton A× = Aw^k × A

[Figure 11: the cartesian product automaton A× for possible rewriting.]

- The Algorithm (cont’d)
- 3. Mark all nodes in A× having some outgoing path leading to a final state
- 4. IF the initial state is marked THEN a rewriting may exist.
- To obtain such a rewriting:
- Follow a marked path from the initial state of A× to a final one, with the fork options on the path determining the rewriting choices.
- Backtrack when a call returns a value that does not allow the path to continue to an accepting state.
- To minimize the rewriting cost, choose a path with the minimal number/cost of function invocations.
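
The existence test for a possible rewriting amounts to a reachability search in the product automaton. The sketch below assumes k = 1 and a hypothetical encoding: a configuration is a triple (position in w, state inside the output automaton of the call being expanded or None, state of the target automaton), and the names `TARGET`, `OUTPUT` and `possibly_rewrites` are illustrative only.

```python
from collections import deque

# A DFA is (transitions, start, accepting); missing transitions reject.
# Target language R = title.date.temp.exhibit*
TARGET = ({(0, "title"): 1, (1, "date"): 2, (2, "temp"): 3,
           (3, "exhibit"): 3}, 0, {3})
OUTPUT = {
    "Get_Temp": ({(0, "temp"): 1}, 0, {1}),
    "TimeOut": ({(0, "exhibit"): 1, (0, "performance"): 1}, 0, {1}),
}


def possibly_rewrites(word, target=TARGET, output=OUTPUT):
    """True if some choice of calls and answers turns `word` into a word of R."""
    delta, start, accepting = target
    first = (0, None, start)
    seen, queue = {first}, deque([first])
    while queue:
        i, fq, q = queue.popleft()
        successors = []
        if fq is None and i == len(word):
            if q in accepting:
                return True                    # reached a final product state
        elif fq is None:
            label = word[i]
            if (q, label) in delta:            # keep the symbol as it is
                successors.append((i + 1, None, delta[(q, label)]))
            if label in output:                # epsilon move: invoke the call
                successors.append((i, output[label][1], q))
        else:
            fdelta, _, faccept = output[word[i]]
            if fq in faccept:                  # epsilon move: the call returns
                successors.append((i + 1, None, q))
            for (src, lab), dst in fdelta.items():
                if src == fq and (q, lab) in delta:
                    successors.append((i, dst, delta[(q, lab)]))
        for s in successors:
            if s not in seen:
                seen.add(s)
                queue.append(s)
    return False


w = ("title", "date", "Get_Temp", "TimeOut")
```

Here `possibly_rewrites(w)` holds because TimeOut may return an exhibit element; at run time the rewriting can still fail if the call returns a performance instead, which is the case where the algorithm backtracks or gives up.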


- The implementation was performed in the Schema Enforcement Module of ActiveXML.
- We’ll describe:
- how the intensional document and schema model map to:
- XML
- XML Schema
- SOAP
- WSDL

- ActiveXML and the Schema Enforcement Module

- In the implementation:
- an intensional XML document is a syntactically well-formed XML document

- To distinguish intensional parts from the rest of the document:
- the namespace http://www.activexml.com/ns/int, defined for function (service) calls, is used.

[Figure: XML representation of the newspaper document, with the data nodes title (“The Sun”) and date (“04/10/2002”) and the function nodes Get_Temp (parameter city, “Paris”) and TimeOut (parameter “Exhibits”) in the namespace defined for function (service) calls.]

- Three attributes of the function nodes provide the necessary information to call the SOAP service:
- 1. URL of the server
- 2. Method name
- 3. Associated namespace
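
How a receiver might separate function nodes from data nodes can be sketched with Python's standard `xml.etree.ElementTree`. Only the namespace URI comes from the slides; the element name `fun`, the attribute names `endpointURL`, `methodName` and `namespaceURI`, and the example.org addresses are hypothetical placeholders for the three attributes listed above.

```python
import xml.etree.ElementTree as ET

INT_NS = "http://www.activexml.com/ns/int"

# Element and attribute names below are assumptions for illustration.
DOC = """<newspaper xmlns:int="http://www.activexml.com/ns/int">
  <title>The Sun</title>
  <date>04/10/2002</date>
  <int:fun endpointURL="http://example.org/forecast"
           methodName="Get_Temp"
           namespaceURI="urn:example-weather">
    <city>Paris</city>
  </int:fun>
  <int:fun endpointURL="http://example.org/timeout"
           methodName="TimeOut"
           namespaceURI="urn:example-timeout">Exhibits</int:fun>
</newspaper>"""

root = ET.fromstring(DOC)

# Function nodes: every element in the intensional namespace.
calls = [(el.get("methodName"), el.get("endpointURL"), el.get("namespaceURI"))
         for el in root.iter("{%s}fun" % INT_NS)]

# Data nodes: children of the root outside that namespace.
data = [child.tag for child in root
        if not child.tag.startswith("{%s}" % INT_NS)]
```

The namespace check is what lets an ordinary XML parser treat the intensional parts as plain elements while a schema-aware peer recognises them as service calls.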

- XML Representation of Function Attributes
- id attribute: identifies the function attributes
- Attributes: designate the SOAP function that implements the boolean predicate used for the function pattern
- The “contents” detail the function signature, i.e. the expected types of the input parameters and of the result of function calls

- Function Pattern “Forecast”:
- captures any function with one input parameter of element type “city”
- returns an element of type “temp”

- Newspaper element with structure title.date.(Forecast|temp).(TimeOut|exhibit*)

- ActiveXML System:
- Active XML is a peer-to-peer system centered around intensional XML documents.
- Each peer:
- contains a repository of intensional documents
- provides active features to enrich them by automatically triggering the function calls they contain
- also provides some Web Services, defined declaratively as queries/updates on top of the repository documents.

- All exchanges between the ActiveXML peers and with Web Service providers/consumers use the SOAP protocol.

- The Role of the Schema Enforcement Module:
- 1. to verify whether the call parameters conform to the WSDLint description of the service
- 2. if not, to try to rewrite them into the required structure
- 3. if step 2 fails, to report an error.
NOTE:

- Similarly, before an ActiveXML service returns its answer, the Schema Enforcement Module performs the same three steps on the returned data.

- Implementation of the Schema Enforcement Module:
- Parser: uses a standard SAX parser.
- does not cover all the features of XML Schema
- implements the important features such as:
- complex types
- element/type references
- schema import
- does not check simple types, inheritance and keys, but these could easily be added to the code.

- Unlike the algorithm proposed, the implementation builds the automaton in a lazy mode:
- start from the initial state and construct only the needed parts
- the construction is pruned whenever a node can be marked directly without looking at the remaining, unexplored branches.

- Main ideas that guide this process:
- 1. Sink Nodes: once you get there, you can’t get out
- 2. Marked Nodes

[Figure 12: the pruned automaton.]

- XML documents with embedded calls to Web services are already present in several existing products.

WHAT’S NEW?

- However, the proposed extension of XML Schema with function types is a first step towards a more precise description of XML documents embedding computation.

MAIN PROBLEM:

- whether Safe Rewriting remains decidable when the k-depth restriction is removed.