Exchanging intensional xml data


EXCHANGING INTENSIONAL XML DATA

Tova Milo INRIA & Tel-Aviv U. ; Serge Abiteboul INRIA ;

Bernd Amann Cedric-CNAM ; Omar Benjelloun INRIA ;

Fred Dang Ngoc INRIA

H. GÜL ÇALIKLI 2002700743

MURAT KORAŞ 2002700797


Introduction

INTRODUCTION

  • The emergence of Web Services as a standard means of publishing and accessing data on the web introduced a new class of XML documents called “intensional documents”.

  • Intensional Documents: XML documents where:

    • some of the data is defined explicitly

    • some is defined by programs that generate data.


Introduction1

INTRODUCTION

  • Materialisation: the process of evaluating some of the programs included in an XML document and replacing them by their results.

  • GOAL of this PAPER:

    • Study the new issues raised by the exchange of intensional XML documents between applications

    • Decide which data should be materialised before it is sent and which should not


Introduction2

INTRODUCTION

CONSIDERATIONS for MATERIALISATION

  • Performance:

    • current system load

    • cost of communication

  • Capabilities:

    • inability to handle intensional parts of a document

    • lack of access rights (to a particular service)

  • Security:

    • invoking service calls from an untrusted party may cause severe security violations

  • Functionalities:

    • confidentiality reasons

    • calling services may involve fees to be paid.


Introduction3

INTRODUCTION

Data exchange scenario for intensional documents

[Figure: a Sender and a Receiver, each with its own capabilities, ACL, cost, ..., exchange documents containing function calls (f, g, q, r) through an agreed Data Exchange Schema.]


The model and the problem

THE MODEL and THE PROBLEM

  • SIMPLE INTENSIONAL XML:

    • Model intensional XML documents as labelled trees consisting of two types of nodes:

      • Data nodes

      • Function nodes, which correspond to “Service Calls”

    • Assume the existence of some disjoint domains:

      • N : domain of NODES

      • L : domain of LABELS

      • F : domain of FUNCTION NAMES

      • D : domain of DATA VALUES


The model and the problem1

THE MODEL and THE PROBLEM

  • SIMPLE INTENSIONAL XML (cont’d)

    • DEFINITION 1: An intensional document d is an expression (T, λ) where:

      • T = (N, E, <) is an ordered tree.

        • N : a finite set of nodes, drawn from the domain of nodes

        • E ⊆ N × N : edges

        • < : associates with each node in N a total order on its children.

      • λ : N → L ∪ F ∪ D is a labeling function for the nodes.

        NOTE: only leaf nodes may be assigned data values from D


The model and the problem2

THE MODEL and THE PROBLEM

  • SIMPLE INTENSIONAL XML (cont’d)

    • Nodes with a label in L U D are called Data Nodes.

    • Nodes with a label in F are called Function Nodes.

      • The children subtrees of a function node are the Function Parameters

      • When the function is called;

        • These subtrees are passed to it

        • The return value replaces the function node in the document.
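The tree model and the effect of a call can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, the `invoke` helper and the `get_temp_stub` service are hypothetical names, and the stub stands in for a real remote service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    label: str                      # an element of L, F or D
    is_function: bool = False       # True when the label is a function name
    children: List["Node"] = field(default_factory=list)


# A service takes the parameter subtrees and returns a forest of trees.
Service = Callable[[List[Node]], List[Node]]


def invoke(parent: Node, index: int, services: Dict[str, Service]) -> None:
    """Call the function node parent.children[index] and splice in its result."""
    call = parent.children[index]
    if not call.is_function:
        raise ValueError(f"{call.label!r} is not a function node")
    result = services[call.label](call.children)   # the subtrees are passed in
    parent.children[index:index + 1] = result      # the result replaces the call


def get_temp_stub(params: List[Node]) -> List[Node]:
    # Hypothetical stand-in for a remote service; it always answers 16 ºC.
    return [Node("temp", children=[Node("16 ºC")])]


doc = Node("newspaper", children=[
    Node("title", children=[Node("The Sun")]),
    Node("date", children=[Node("04/10/2002")]),
    Node("Get_Temp", is_function=True,
         children=[Node("city", children=[Node("Paris")])]),
    Node("TimeOut", is_function=True, children=[Node("Exhibits")]),
])

invoke(doc, 2, {"Get_Temp": get_temp_stub})
print([child.label for child in doc.children])
```

After the call, the `Get_Temp` node is gone and a `temp` subtree sits in its place, while `TimeOut` stays intensional.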


Exchanging intensional xml data

THE MODEL and THE PROBLEM

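[Figure: example intensional document. The root newspaper has data children title (“The Sun”) and date (“04/10/2002”), and function nodes Get_Temp (parameter: city “Paris”) and TimeOut (parameter: “Exhibits”); a temp node (“16 ºC”) is also shown.]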


Exchanging intensional xml data

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA:

    • DEFINITION 2: A document schema s is an expression (L, F, τ) where:

      • L : a finite set of labels

      • F : a finite set of function names

      • τ : a function that maps:

        • each label name l ∈ L to a regular expression over L ∪ F or to the keyword data

        • each function name f ∈ F to a pair of expressions called

          • τin(f), the input type of f

          • τout(f), the output type of f


Exchanging intensional xml data

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d)

    • Example of a Schema:

      • data:

      • τ (newspaper) = title.date.(Get_Temp|temp).(TimeOut|exhibit*)

      • τ (title) = data

      • τ (date) = data

      • τ (temp) = data

      • τ (city) = data

      • τ (exhibit) = data


Exchanging intensional xml data

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d)

    • Example of a Schema (cont’d):

      • functions:

      • τin (Get_Temp)= city

      • τout (Get_Temp)= temp

      • τin (TimeOut)= data

      • τout (TimeOut) = (exhibit|performance)*

      • τin (Get_Date) = title

      • τout (Get_Date) = date
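As a rough illustration, the example schema can be written down as two Python dictionaries, and a type in the slides' dot notation can be translated into an ordinary regular expression. The helper names `compile_type` and `conforms` are hypothetical. The newspaper type is written with `exhibit*`, as it appears later in the slides, and the Kleene star on the output of `TimeOut` is an assumption. The keyword `data` is kept only as a marker and is not interpreted.

```python
import re

# Label types from the newspaper example; "data" marks a data-valued element.
TYPES = {
    "newspaper": "title.date.(Get_Temp|temp).(TimeOut|exhibit*)",
    "title": "data", "date": "data", "temp": "data",
    "city": "data", "exhibit": "data",
}

# Function signatures as (input type, output type) pairs.
SIGNATURES = {
    "Get_Temp": ("city", "temp"),
    "TimeOut": ("data", "(exhibit|performance)*"),   # star is an assumption
    "Get_Date": ("title", "date"),
}


def compile_type(expr: str) -> re.Pattern:
    """Translate dot notation into a Python regex over 'label ' tokens."""
    tokens = re.sub(r"\w+", lambda m: f"(?:{m.group(0)} )", expr)
    return re.compile(tokens.replace(".", ""))


def conforms(expr: str, labels: list) -> bool:
    """True if the sequence of labels is a word in lang(expr)."""
    word = "".join(label + " " for label in labels)
    return compile_type(expr).fullmatch(word) is not None


print(conforms(TYPES["newspaper"], ["title", "date", "Get_Temp", "TimeOut"]))
print(conforms(TYPES["newspaper"], ["title", "date", "temp", "exhibit", "exhibit"]))
print(conforms(TYPES["newspaper"], ["title", "temp"]))
```

Each label is turned into a token followed by a space so that label names which are prefixes of one another cannot be confused during matching.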


The model and the problem3

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d):

    • DEFINITION 3: An intensional document t is an instance of a schema s=(L,F,τ) if:

      • for each data node n ∈ t with label l ∈ L, the labels of n’s children form a word in lang(τ(l)), where lang(τ(l)) denotes the regular language defined by τ(l)

      • the same holds for each function node n ∈ t with label f ∈ F, whose children’s labels must form a word in lang(τin(f)).


The model and the problem4

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d):

    • DEFINITION 3 (cont’d):

      f : a function name

      t1,......,tn : a sequence of intensional trees

      IF the labels of the roots of t1,......,tn form a word in lang(τin(f)) (respectively lang(τout(f)))

      AND

      all the trees are instances of s (every subtree conforms to the same schema as the whole document)

      THEN

      t1,......,tn is an input instance (respectively output instance) of f


The model and the problem5

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d):

    • DEFINITION 4: (about Rewritings)

      • t,t’: trees

      • IF t’ is obtained from t by:

        • selecting a function node v in t with some label f, and

        • replacing it by an arbitrary output instance of f

      • THEN we say that t →v t’ (an arrow labelled by the node v)


The model and the problem6

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d):

    • DEFINITION 4: (about Rewritings) (cont’d)

    • IF t →v1 t1 →v2 t2 ... →vn tn THEN

    • we say that t →* tn (t rewrites into tn)

    • the nodes v1,........, vn are called a rewriting sequence

    • the set of all trees t’ such that t →* t’ is denoted ext(t).


The model and the problem7

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d):

    • DEFINITION 5: (about Rewritings)

    • Let:

      • t be a tree

      • s be a schema

    • 1. IF ext(t) contains some instance of s THEN

      t possibly rewrites into s.

    • 2. IF either t is already an instance of s

      or there exists some node v in t such that all trees t’ where t →v t’ safely rewrite into s

      THEN we say that t safely rewrites into s
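The difference between possible and safe rewriting can be made concrete with a small brute-force sketch on words of child labels. It rests on several simplifying assumptions that are not in the slides: calls are considered left to right and only at depth 1, each function has a finite list of sample outputs (real output types are regular languages), and the target is given as a Python regex over space-terminated labels. The name `rewrites` is hypothetical.

```python
import re
from typing import Dict, List, Sequence, Tuple

Word = Tuple[str, ...]


def in_lang(word: Sequence[str], pattern: str) -> bool:
    """Membership test; the pattern is a Python regex over 'label ' tokens."""
    return re.fullmatch(pattern, "".join(l + " " for l in word)) is not None


def rewrites(word: Word, outputs: Dict[str, List[Word]], pattern: str,
             safe: bool, i: int = 0, prefix: Word = ()) -> bool:
    """Left-to-right, depth-1 rewriting of a word of child labels.

    safe=True  : the chosen calls must succeed for ALL listed outputs.
    safe=False : it is enough that SOME output leads into the target.
    """
    if i == len(word):
        return in_lang(prefix, pattern)
    label = word[i]
    # Option 1: keep the label as it is (the only option for data labels).
    if rewrites(word, outputs, pattern, safe, i + 1, prefix + (label,)):
        return True
    if label not in outputs:
        return False
    # Option 2: invoke the function; its answer is outside our control.
    quantifier = all if safe else any
    return quantifier(
        rewrites(word, outputs, pattern, safe, i + 1, prefix + out)
        for out in outputs[label])


w = ("title", "date", "Get_Temp", "TimeOut")
outputs = {
    "Get_Temp": [("temp",)],
    "TimeOut": [(), ("exhibit",), ("performance",), ("exhibit", "performance")],
}
loose = r"title date temp (TimeOut |(exhibit )*)"
strict = r"title date temp (exhibit )*"

print(rewrites(w, outputs, loose, safe=True))
print(rewrites(w, outputs, strict, safe=True))
print(rewrites(w, outputs, strict, safe=False))
```

With the loose target the word rewrites safely: calling `Get_Temp` always yields `temp`, and `TimeOut` may stay as a call. With the strict target `TimeOut` has to be invoked and may answer with a `performance`, so the rewriting is only possible, not safe.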


The model and the problem8

THE MODEL and THE PROBLEM

  • SIMPLE SCHEMA (cont’d):

    • DEFINITION 6:

    • Let:

      • s be a schema

      • r be a distinguished label called the root label

    • IF all the instances t of s with root label r rewrite safely into instances of s’

      THEN we say that:

      s safely rewrites into s’


The model and the problem9

THE MODEL and THE PROBLEM

  • A Richer Data Model :

    Function Patterns:

    • The schemas we have seen so far specify that a particular function, identified by its name, may appear in the document.

    • But sometimes, one does not know in advance which functions will be used at a given place.

    • A common intensional schema for such documents should not require the use of a particular function, but rather allow for a set of functions, which have a proper signature.


The model and the problem10

THE MODEL and THE PROBLEM

  • To specify such a set of functions we use Function Patterns

  • Function Patterns: a function belongs to the pattern if its name satisfies the boolean predicate and its signature is the same as the required one

  • EX:

    • τname(Forecast) = UDDIF ∧ InACL

    • τin(Forecast) = city

    • τout(Forecast) = temp
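A function pattern can be pictured as a predicate on names plus a required signature. The sketch below is an illustration only: `FunctionPattern`, `uddif`, `in_acl` and the two registries are hypothetical stand-ins for a UDDI lookup and an access control list.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registries standing in for a UDDI lookup and an access list.
UDDI_REGISTRY = {"Get_Temp", "Weather_Now", "TimeOut"}
ACCESS_LIST = {"Get_Temp", "TimeOut"}


def uddif(name: str) -> bool:
    return name in UDDI_REGISTRY


def in_acl(name: str) -> bool:
    return name in ACCESS_LIST


@dataclass(frozen=True)
class FunctionPattern:
    name_predicate: Callable[[str], bool]   # plays the role of τname
    input_type: str                          # τin
    output_type: str                         # τout

    def admits(self, name: str, input_type: str, output_type: str) -> bool:
        """A function belongs to the pattern if its name passes the predicate
        and its signature equals the required one."""
        return (self.name_predicate(name)
                and input_type == self.input_type
                and output_type == self.output_type)


forecast = FunctionPattern(lambda n: uddif(n) and in_acl(n), "city", "temp")
print(forecast.admits("Get_Temp", "city", "temp"))
print(forecast.admits("Weather_Now", "city", "temp"))
print(forecast.admits("TimeOut", "data", "(exhibit|performance)*"))
```

Only `Get_Temp` is admitted here: `Weather_Now` fails the access list check and `TimeOut` has a different signature.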


The model and the problem11

THE MODEL and THE PROBLEM

  • A Richer Data Model (cont’d):

  • Restricted Service Invocations:

    • We assumed so far that all the functions appearing in a document may be invoked in a rewriting, in order to match a given schema.

    • This is not always the case, for reasons such as:

      • security,

      • cost,

      • access rights, etc.

    • THUS, function names/patterns in the schema can be partitioned into two disjoint groups of invocable and noninvocable ones.

    • A legal rewriting is then one that invokes only invocable functions.


Exchanging intensional data

EXCHANGING INTENSIONAL DATA

  • Rewriting Process:

    1. Safe Rewriting:

    • check if t safely rewrites to s

      • if so, find a rewriting sequence.

      • rewriting sequence: a sequence of functions that need to be invoked to transform t into the required structure

      • preferred rewriting sequence: the shortest/cheapest one


Exchanging intensional data1

EXCHANGING INTENSIONAL DATA

  • Rewriting Process(cont’d):

    2. Possible Rewriting:

    • IF a safe rewriting does not exist

      • check whether t may at least possibly rewrite to s.

      • IF it is acceptable to do so (the sender accepts that the rewriting may fail),

      • try to find a successful rewriting sequence if one exists

      • preferred rewriting sequence: the one with the least cost.


Exchanging intensional data2

EXCHANGING INTENSIONAL DATA

  • Rewriting Process(cont’d):

    3. Mixed Approach:

    In a mixed approach, one could

    • first invoke some function calls

    • then attempt from there to find safe rewritings.


Exchanging intensional data3

EXCHANGING INTENSIONAL DATA

  • Rewriting Process(cont’d):

    • DEFINITION 7:

    • For a rewriting sequence t →v1 t1 ... →vn tn,

      • IF vj ∈ ti but vj ∉ ti-1 (the call vj first appears in the result of invoking vi),

      • THEN we say that function node vj depends on function node vi.

      • IF the dependency graph among the nodes contains no paths of length greater than k,

      • THEN we say that the rewriting sequence is of depth k


Exchanging intensional data4

EXCHANGING INTENSIONAL DATA

RESTRICTION:

“Consider only k-depth left-to-right rewritings.”


Safe rewriting

SAFE REWRITING

  • Algorithm for k-depth left to right safe rewriting

  • Algorithm is decomposed into three parts:

    • 1.Rewriting Function Parameters:

      • to invoke a function

        • its parameters should be of right type

      • if not

        • they should be rewritten to fit that type.

      • when rewriting the parameters;

        • the functions in them can be invoked

          ONLY IF their own parameters are of, or can be rewritten into, the expected input type.


Safe rewriting1

SAFE REWRITING

  • Algorithm is decomposed into three parts (cont’d)

    • 1.Rewriting Function Parameters (cont’d)

      • For the deepest functions

        • Verify that their parameters are instances of the corresponding input types.

        • If not, rewriting fails.

      • Move upward (repeat until all functions in the tree (forest) are done)

        • Try to safely rewrite f ’s own parameters into the required structure.

        • If this is not possible, rewriting fails.


Safe rewriting2

SAFE REWRITING

  • Algorithm is decomposed into three parts (cont’d)

    • 2.Top Down Traversal:

      • In each iteration of the recursive procedure “Rewriting Function Parameters”, the parameters of the outermost functions of the tree (forest) are handled.

      • In this part: safely rewrite the tree (forest) by invoking only these outermost functions.

      • THUS:

        • traverse the tree (forest) top down

        • At each step treat a single node and its children.


Safe rewriting3

SAFE REWRITING

  • Algorithm is decomposed into three parts (cont’d)

    • 2.Top Down Traversal (cont’d)

      • Consider a node n whose children’s labels form a word w

      • The subtree rooted at node n can be safely rewritten into the target schema s=(L,F,τ) IF and ONLY IF:

        • 1. w can be safely rewritten into a word in lang(τ(label(n)))

          AND

        • 2. each of n’s children can be safely rewritten into an instance of s.


Safe rewriting4

SAFE REWRITING

  • Algorithm is decomposed into three parts (cont’d)

    • 3.Rewriting the children of a node n:

    • Given:

      • wword (sequence of labels of n’s children)

    • Goal:

      • rewrite w so that it becomes a word in the regular language R=τ(label(n))

    • The process of rewriting involves:

      • choosing some functions in wand replacing them by a possible output

      • then choosing some other functions (which might have been returned by previous calls) and replacing them by their output

      • and so on up to the depth k


Safe rewriting5

SAFE REWRITING

  • Safe Rewriting Algorithm:

    • Given:

      • word w

      • the output types Rf1,.....,Rfn of the available functions

      • target regular language R

    • Purpose of the algorithm:

      • to test if w can be safely rewritten into a word in R

      • if so, to find a safe rewriting sequence


Safe rewriting6

SAFE REWRITING

  • Safe Rewriting Algorithm:

  • Note: for illustration purposes we use the newspaper document

    • w = title.date.Get_Temp.TimeOut (the word formed by the labels of the children)

    • R = title.date.temp.(TimeOut|exhibit*) (the target language: we look for a safe rewriting of w into a word in R)

  • The Algorithm:

  • 1) Build finite state automata for the following regular languages

    • 1.1) An automaton A_w accepting w as a single word.


Safe rewriting7

SAFE REWRITING

  • The Algorithm (cont’d)

    • 1.2) Build automata A_fi, i=1,...,n, each accepting the regular language Rfi

    • 1.3) Build an automaton Ā accepting the complement of the regular language R. The automaton should be deterministic and complete.
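Step 1.3 relies on a standard construction: once a deterministic automaton is made complete by adding a sink state, swapping accepting and non-accepting states yields the complement. The sketch below assumes the deterministic automaton for R is already available as a transition table; the function names and the `SINK` state name are hypothetical, and the target used is the simpler type title.date.temp.exhibit*.

```python
from typing import Dict, Iterable, Set, Tuple

Delta = Dict[Tuple[str, str], str]
SINK = "sink"   # hypothetical name for the added dead state


def complement(states: Set[str], alphabet: Set[str], delta: Delta,
               finals: Set[str]) -> Tuple[Set[str], Delta, Set[str]]:
    """Complete a deterministic automaton with a sink, then flip acceptance."""
    all_states = set(states) | {SINK}
    full: Delta = {}
    for s in all_states:
        for a in alphabet:
            # Missing transitions (and everything from the sink) go to the sink.
            full[(s, a)] = delta.get((s, a), SINK)
    return all_states, full, all_states - set(finals)


def accepts(delta: Delta, start: str, finals: Set[str],
            word: Iterable[str]) -> bool:
    state = start
    for a in word:
        state = delta[(state, a)]
    return state in finals


ALPHABET = {"title", "date", "temp", "exhibit", "performance",
            "Get_Temp", "TimeOut"}
# Deterministic automaton for R = title.date.temp.exhibit*
R_DELTA = {("p0", "title"): "p1", ("p1", "date"): "p2",
           ("p2", "temp"): "p3", ("p3", "exhibit"): "p3"}
states, comp_delta, comp_finals = complement(
    {"p0", "p1", "p2", "p3"}, ALPHABET, R_DELTA, {"p3"})

print(accepts(comp_delta, "p0", comp_finals, ["title", "date", "temp", "exhibit"]))
print(accepts(comp_delta, "p0", comp_finals, ["title", "date", "temp", "performance"]))
```

The complement automaton rejects words of R and accepts everything else, which is why it has to be deterministic and complete before acceptance is flipped.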


Safe rewriting8

SAFE REWRITING

  • The complement automaton Ā for schema τ’(newspaper)=title.date.temp.(TimeOut|exhibit*)

[Figure: a deterministic, complete automaton with states p0 to p6; its edges are labelled title, date, temp, TimeOut, exhibit and *.]


Safe rewriting9

SAFE REWRITING

  • The Algorithm (cont’d)

  • 2) Let A_w^k := A_w

  • 3) For j=1,...,k

    • Consider all the edges e=(v,u) in A_w^k that are labelled by a function name fi and were not treated in previous iterations

    • 3.1) extend A_w^k by attaching a copy of the automaton A_fi, with its initial and final states linked to v and u respectively by ε moves.

    • 3.2) denote v as a fork node (for the edge e)

    • 3.3) the two fork options of v are e itself and the new outgoing ε edge
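Steps 2 and 3 can be sketched as follows. The representation is an assumption made for the example: states are integers, edges are (source, label, target) triples, and each output automaton is a small dictionary with hypothetical keys `size`, `start`, `finals` and `edges`. The output type of `TimeOut` is taken to be (exhibit|performance)*.

```python
from typing import Dict, List, Tuple

EPS = "ε"
Edge = Tuple[int, str, int]


def extend_word_automaton(word: List[str], out_automata: Dict[str, dict], k: int):
    """Build A_w, then attach output automata to function edges k times."""
    edges: List[Edge] = [(i, label, i + 1) for i, label in enumerate(word)]
    n_states = len(word) + 1            # states 0..len(word); the last accepts
    forks: Dict[Edge, Edge] = {}        # function edge e -> its ε alternative
    for _ in range(k):
        todo = [e for e in edges if e[1] in out_automata and e not in forks]
        for e in todo:
            v, f, u = e
            auto = out_automata[f]
            offset = n_states           # fresh state numbers for this copy
            n_states += auto["size"]
            edges += [(s + offset, lab, t + offset) for s, lab, t in auto["edges"]]
            eps_in = (v, EPS, auto["start"] + offset)
            edges.append(eps_in)
            edges += [(fin + offset, EPS, u) for fin in auto["finals"]]
            forks[e] = eps_in           # v is a fork node with options e / eps_in
    return edges, n_states, forks


OUT = {
    # Get_Temp returns a single temp element.
    "Get_Temp": {"size": 2, "start": 0, "finals": [1], "edges": [(0, "temp", 1)]},
    # TimeOut is assumed to return (exhibit|performance)*.
    "TimeOut": {"size": 1, "start": 0, "finals": [0],
                "edges": [(0, "exhibit", 0), (0, "performance", 0)]},
}
edges, n_states, forks = extend_word_automaton(
    ["title", "date", "Get_Temp", "TimeOut"], OUT, k=1)
print(n_states, sorted({e[0] for e in forks}))
```

For k=1 this yields eight states, numbered like q0 to q7 in the slides, with fork nodes at states 2 and 3. Larger k only matters when an output type itself contains function names.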


Safe rewriting10

SAFE REWRITING

  • The 1-depth automaton A_w^1 for the word w=title.date.Get_Temp.TimeOut

[Figure: states q0 to q4 spell out w (title, date, Get_Temp, TimeOut). Two states are fork nodes: the function edge represents the choice of not invoking the function, and the ε edge into the attached copy of the output automaton (temp via q5 and q6; exhibit and performance at q7) represents the choice of invoking it.]


Safe rewriting11

SAFE REWRITING

  • The Algorithm (cont’d)

  • 4) Construct the cartesian product automaton

    A× = A_w^k × Ā

  • The fork nodes and fork options in A× reflect those of A_w^k :

  • 4.1) the fork nodes of A× are the nodes [q,p] where q was a fork node in A_w^k

  • 4.2) a fork option in A× consists of all edges originating from one fork option edge in A_w^k.


Safe rewriting12

SAFE REWRITING

  • The cartesian product automaton A× = A_w^1 × Ā

[Figure 6: product automaton over states [qi,pj], from the initial state [q0,p0] through [q1,p1], [q2,p2], [q3,p3] and [q4,p4], with ε edges into the branches for invoked functions (for example [q5,p2], [q6,p3], [q7,p3], [q7,p5], [q7,p6]); edges are labelled title, date, Get_Temp, TimeOut, temp, exhibit, performance and ε.]


Safe rewriting13

SAFE REWRITING

  • The Algorithm (cont’d):

  • 5) Mark nodes in A×:

    • 5.1) mark the states [q,p] where q is accepting in A_w^k and p is accepting in Ā

    • 5.2) iteratively mark:

      • nonfork (regular) nodes: IF one of their outgoing edges points to a marked node

      • fork nodes: IF both of their fork options (for some fi) contain an edge that points to a marked node.
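The marking step is a fixpoint computation and can be sketched independently of how the product automaton was built. In the sketch below the product is given as a plain graph, which is an assumption made to keep the example short; the function name `mark` and all state names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set

State = Hashable


def mark(regular: Dict[State, Set[State]],
         forks: Dict[State, List[Set[State]]],
         bad_finals: Iterable[State]) -> Set[State]:
    """Fixpoint marking of a product automaton given as a plain graph.

    regular    : non-fork state -> successor states
    forks      : fork state -> its fork options, each a set of successors
    bad_finals : states accepting in both A_w^k and the complement automaton
    """
    marked = set(bad_finals)
    changed = True
    while changed:
        changed = False
        for state in list(regular) + list(forks):
            if state in marked:
                continue
            if state in forks:
                # Every option must be able to reach a marked state.
                hit = all(option & marked for option in forks[state])
            else:
                hit = bool(regular[state] & marked)
            if hit:
                marked.add(state)
                changed = True
    return marked


# Toy graph: invoking f always ends well, keeping f ends outside the target.
regular = {"kept": set(), "called": {"good_end"}, "good_end": set()}
forks = {"start": [{"kept"}, {"called"}]}
print("start" in mark(regular, forks, bad_finals={"kept"}))

# If the call may also return something unacceptable, both options are lost.
regular2 = {**regular, "called": {"good_end", "bad_end"}, "bad_end": set()}
print("start" in mark(regular2, forks, bad_finals={"kept", "bad_end"}))
```

A marked state is one from which some function answer can force a word outside the target. At a regular node one bad edge is enough, because the answer is not under the sender's control. At a fork node the sender chooses, so the node is lost only when both options are bad.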




Safe rewriting15

SAFE REWRITING

  • The Algorithm (cont’d):

  • 6) Try to obtain a SAFE REWRITING.

    • “A safe rewriting exists IFF the initial state is not marked”

    • 6.1) Follow a non-marked path (corresponding to w) starting from the initial state of A× to a state [q,p] where q is an accepting state of A_w^k

      • 6.1.1) non-marked fork options on the path determine the rewriting choices (i.e. which functions to call)

      • 6.1.2) when a function is invoked, we continue the path with the new rewritten word rather than the word w


Safe rewriting16

SAFE REWRITING

  • The Algorithm (cont’d):

    • 6.2) To minimize the rewriting cost, choose a path with minimal number/cost of function invocations.

  • EXIT % End of the algorithm


Safe rewriting17

SAFE REWRITING

  • The complement automaton Ā for schema τ’(newspaper)=title.date.temp.exhibit*

[Figure 7: a deterministic, complete automaton with edges labelled title, date, temp and exhibit, plus * edges.]


Safe rewriting18

SAFE REWRITING

  • The cartesian product automaton A× = A_w^1 × Ā

[Figure 8: product automaton for the schema of Figure 7, over states [qi,pj] starting at [q0,p0]; edges are labelled title, date, Get_Temp, TimeOut, temp, exhibit, performance and ε.]


Safe rewriting19

SAFE REWRITING

  • Complexity of the Algorithm:

  • s0 schema of the sender

  • s agreed data exchange schema

  • Complexity is determined by the size of thecartesian product of the automaton.

    • 1. Construct the cartesian product

    • 2. Traverse and mark the nodes of the resulting product

    • THUS complexity is bounded by:

    • O(|Ax| )=O( ( | Aw | X | A |) )

2

2

k


Safe rewriting20

SAFE REWRITING

  • Complexity of the Algorithm: (cont’d)

    • $O(|A_\times|^2) = O\big((|A_w^k| \times |\bar{A}|)^2\big)$

    • Maximum size of $A_w^k$: $O\big((|s_0| + |w|)^k\big)$

    • Complexity is polynomial in the size of schemas s and s0 (with the exponent determined by k)


Possible rewriting

POSSIBLE REWRITING

  • The Algorithm

  • 1. Build finite state automata for the following languages:

    • 1.1. An automaton A_w^k

    • 1.2. An automaton A accepting the regular language R


Possible rewriting1

POSSIBLE REWRITING

  • An automaton A for schema τ’’(newspaper)=title.date.temp.exhibit*

[Figure 10: automaton with states p0 to p4 and edges labelled title, date, temp and exhibit.]


Possible rewriting2

POSSIBLE REWRITING

  • The Algorithm (cont’d)

  • 2. Construct the cartesian product automaton A× = A_w^k × A

[Figure 11: product automaton over states [qi,pj], from [q0,p0] through [q1,p1] and [q2,p2] to [q3,p3], [q4,p3], [q4,p4], [q7,p3] and [q7,p4]; edges are labelled title, date, temp, exhibit and ε.]


Possible rewriting3

POSSIBLE REWRITING

  • The Algorithm (cont’d)

  • 3. Mark all nodes in A× having some outgoing path leading to a final state

  • 4. IF the initial state is marked THEN a rewriting may exist.

  • To obtain such a rewriting:

    • Follow a marked path from the initial state of A× to a final one, with the fork options on the path determining the rewriting choices.

    • Backtrack when a call returns a value that does not allow the path to continue to an accepting state

    • To minimize the rewriting cost, choose a path with the minimal number/cost of function invocations.
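The possible-rewriting test amounts to a reachability check in the product of A_w^k with an automaton for R itself (not its complement). The sketch below makes several assumptions: the 1-depth automaton for w = title.date.Get_Temp.TimeOut is written out by hand as an edge list with integer states, the automaton for R is a deterministic transition table, and the name `possible_rewriting` is hypothetical.

```python
from collections import deque

EPS = "ε"


def possible_rewriting(nfa_edges, nfa_start, nfa_finals,
                       dfa_delta, dfa_start, dfa_finals) -> bool:
    """True if the initial product state can reach a state final in both."""
    start = (nfa_start, dfa_start)
    succ = {}
    queue = deque([start])
    while queue:                                  # explore the product lazily
        q, p = state = queue.popleft()
        if state in succ:
            continue
        nxt = set()
        for src, label, dst in nfa_edges:
            if src != q:
                continue
            if label == EPS:
                nxt.add((dst, p))                 # ε moves only in A_w^k
            elif (p, label) in dfa_delta:
                nxt.add((dst, dfa_delta[(p, label)]))
        succ[state] = nxt
        queue.extend(nxt)
    # Mark final states, then every state with an edge into a marked state.
    marked = {(q, p) for (q, p) in succ if q in nfa_finals and p in dfa_finals}
    changed = True
    while changed:
        changed = False
        for state, nxt in succ.items():
            if state not in marked and nxt & marked:
                marked.add(state)
                changed = True
    return start in marked


# Hand-written 1-depth automaton for w; state 4 is accepting.
A_W1 = [(0, "title", 1), (1, "date", 2), (2, "Get_Temp", 3), (3, "TimeOut", 4),
        (2, EPS, 5), (5, "temp", 6), (6, EPS, 3),
        (3, EPS, 7), (7, "exhibit", 7), (7, "performance", 7), (7, EPS, 4)]
# Deterministic automaton for R = title.date.temp.exhibit*; state 3 accepts.
R = {(0, "title"): 1, (1, "date"): 2, (2, "temp"): 3, (3, "exhibit"): 3}

print(possible_rewriting(A_W1, 0, {4}, R, 0, {3}))
```

Here a rewriting is possible: both functions are invoked and `TimeOut` must happen to return only exhibits. Nothing guarantees that answer, which is why the slides ask for backtracking when a call returns an unsuitable value.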




Implementation

IMPLEMENTATION

  • The implementation was carried out in the Schema Enforcement Module of ActiveXML.

  • We’ll describe:

    • how the intensional document and schema model map to:

      • XML

      • XML schema

      • SOAP

      • WSDL

    • ActiveXML and the Schema Enforcement Module


Implementation1

IMPLEMENTATION

  • In the implementation:

    • an intensional XML document is a syntactically well-formed XML document

  • To distinguish intensional parts from the rest of the document:

    • the namespace http://www.activexml.com/ns/int is used.

    • this namespace is defined for function (service) calls.


Implementation2

IMPLEMENTATION

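[Figure: the intensional newspaper document as a tree, with data nodes title (“The Sun”) and date (“04/10/2002”), and function nodes Get_Temp (parameter: city “Paris”) and TimeOut (parameter: “Exhibits”).]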


Implementation3

IMPLEMENTATION

[Figure: XML listing of the newspaper document. Callouts: the namespace defined for function (service) calls; the data nodes title and date; and the three attributes of the function nodes, which provide the information necessary to call the SOAP service: 1. URL of the server, 2. method name, 3. associated namespace.]


Implementation4

IMPLEMENTATION

[Figure: XML listing of the function TimeOut. Callouts: 1. URL of the server, 2. method name, 3. associated namespace.]


Implementation5

IMPLEMENTATION

  • XML Representation of Function Attributes

id attribute: identifies the function attributes

Attributes: designate the SOAP function that implements the boolean predicate used for the function pattern

The “contents” detail the function signature, i.e. the expected types of the input parameters and of the result of function calls


Implementation6

IMPLEMENTATION

  • Function Pattern “Forecast”

Returns an element of type “temp”

Captures any function with one input parameter of element type “city”


Implementation7

IMPLEMENTATION

  • Newspaper element with structure title.date.(Forecast|temp).(TimeOut|exhibit*)


Implementation8

IMPLEMENTATION

  • ActiveXML System:

  • Active XML is a peer-to-peer system centered around intensional XML documents.

    • Each peer;

      • contains a repository of intensional documents

      • provides active features to enrich them by automatically triggering the function calls they contain.

      • also provides some Web Services defined declaratively as queries/updates on top of the repository documents.

    • All exchanges between the ActiveXML peers and with Web Service providers/consumers use the SOAP Protocol


Implementation9

IMPLEMENTATION

  • The Role of the Schema Enforcement Module:

  • 1. to verify whether the call parameters conform to the WSDLint description of the service.

  • 2. if not, to try to rewrite them into the required structure.

  • 3. if 2 fails, to report an error.

    NOTE:

  • Similarly, before an ActiveXML service returns its answer, the Schema Enforcement Module performs the same three steps on the returned data.
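The three steps can be summarised as a small control-flow sketch. It is not the module's actual code: `enforce`, `SchemaEnforcementError` and the two callbacks are hypothetical, and data is simplified to a flat list of labels.

```python
from typing import Callable, List, Optional

Word = List[str]


class SchemaEnforcementError(Exception):
    """Raised when data neither conforms nor can be rewritten."""


def enforce(data: Word,
            conforms: Callable[[Word], bool],
            try_rewrite: Callable[[Word], Optional[Word]]) -> Word:
    # Step 1: accept data that already has the required structure.
    if conforms(data):
        return data
    # Step 2: otherwise try to rewrite it into the required structure.
    rewritten = try_rewrite(data)
    if rewritten is not None and conforms(rewritten):
        return rewritten
    # Step 3: report an error when rewriting fails.
    raise SchemaEnforcementError(f"cannot enforce schema on {data!r}")


# Hypothetical stand-ins: the target accepts title.date.temp only, and the
# rewriter replaces a Get_Temp call by a temp element.
target = ["title", "date", "temp"]
conforms = lambda word: word == target
rewrite = lambda word: [("temp" if l == "Get_Temp" else l) for l in word]

print(enforce(["title", "date", "Get_Temp"], conforms, rewrite))
```

The same function can be applied on the way in (call parameters) and on the way out (returned data), matching the note above.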


Implementation10

IMPLEMENTATION

  • Implementation of the Schema Enforcement Module:

  • The parser is built on a standard SAX parser.

    • it does not cover all the features of XML Schema

    • it implements the important features such as:

      • complex types

      • element/type references

      • schema import

    • it does not check simple types, inheritance and keys, but these could easily be added to the code.


Implementation11

IMPLEMENTATION

  • Unlike the algorithm as presented, the implementation builds the automaton lazily:

    • start from the initial state and construct only the needed parts

    • The construction is pruned whenever a node can be marked directly without looking at the remaining, unexplored branches.

  • Main ideas that guide this process:

    • 1. Sink Nodes: once you get there you can’t get out

    • 2. Marked Nodes


Implementation12

IMPLEMENTATION

  • The pruned automaton

[Figure 12: the product automaton over states [qi,pj] after pruning; edges are labelled title, date, Get_Temp, TimeOut, temp, exhibit, performance and ε.]


Conclusion and related work

CONCLUSION and RELATED WORK

  • XML documents with embedded calls to Web services are already present in several existing products.

    WHAT’S NEW ?

  • However, the proposed extension of the XML Schema with function types is a first step towards a more precise description of XML documents embedding computation.

    MAIN PROBLEM:

  • whether Safe Rewriting remains decidable when the k-depth restriction is removed.

