A Type System for a Semistructured and XML Data Base Management System

Ph. D. Thesis Proposal

Dario Colazzo

Thesis Goals
  • Formal development and study of a type system for XML querying
  • Implementation of a concrete type system for an XML data base management system: the Xtasy system
Presentation outline
  • Semistructured data and XML
  • Data models
  • Type languages: DTD, XML Schema
  • Querying XML data: Tequyla
  • Processing XML data: XDuce
  • Thesis goals
Semistructured data
  • Irregular and unstable structure
  • Self-describing representation
  • No separate schema information: few guarantees about the reliability and efficiency of applications
OEM graph

[Figure: OEM graph of the address book. Node labels: addrbook, person (twice), name, age, addr, email, first, second; atomic values "Dario Colazzo", 30 (twice), "Pisa", "sartia@xyz.com", "Carlo", "Sartiani".]

XML syntax

<addrbook>
  <person>
    <name>Dario Colazzo</name>
    <addr>Pisa</addr>
  </person>
  <person>
    <name>
      <first>Carlo</first>
      <second>Sartiani</second>
    </name>
    <addr>Pisa</addr>
    <email>sartia@xyz.com</email>
  </person>
</addrbook>

Attributes and element reference

<db>
  <state id="01">
    <name>Italy</name>
    <code>IT</code>
  </state>
  .......
  <city region="Toscana" state-of="01">
    <name>Pisa</name>
    <code>PI</code>
  </city>
</db>

XML Query Data Model
  • Based on forests of node-labeled trees (sets of documents)
  • Several kinds of nodes:
    • element node
    • attribute node
    • value node
  • Identifier and reference attributes modeled as general attributes
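XML Tree

[Figure: XML tree of the address book in the XML Query Data Model. Legend: element node, attribute node, value node. Element labels: addrbook, person (twice), name, email, addr, age, first, second; values "Dario Colazzo", "Pisa" (twice), 30 (twice), "sartia@xyz.com", "Carlo", "Sartiani".]
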
XML schema languages
  • Document Type Definitions (DTDs): schemas as grammars for documents, based on regular type expressions
  • XML Schemas: closer to traditional type languages
DTD
  • Regular type expressions:
    • T | U union
    • T, U sequence
    • T* zero or more
    • T? zero or one
    • X = T[X] recursive definitions
  • coupled-tag element declarations
  • global definitions
  • only one base type: string (PCDATA)
  • no type reuse
DTD, example

<!DOCTYPE addrbook [
  <!ELEMENT addrbook (person*)>          <!-- person*: zero or more -->
  <!ELEMENT person (name, addr, tel?)>   <!-- tel?: zero or one -->
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT addr (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
]>
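
A DTD content model is a regular expression over the names of child elements, so checking an element's children against it can be done with an ordinary regular-expression engine. The Python sketch below is illustrative only: `content_model_to_regex` and `conforms` are hypothetical helpers, and they handle just the operators used in the example (`,`, `|`, `?`, `*`, `+`), not `#PCDATA` or mixed content.

```python
import re


def content_model_to_regex(model: str) -> "re.Pattern[str]":
    """Compile a DTD-style content model such as "(name, addr, tel?)".

    Each element name becomes the atomic token "<name>", so "?" and "*"
    apply to whole element names; "," (sequence) becomes concatenation,
    while "|", "?", "*", "+" and parentheses keep their regex meaning.
    """
    tokens = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda m: "(?:<" + re.escape(m.group(0)) + ">)",
        model,
    )
    return re.compile(tokens.replace(",", "").replace(" ", ""))


def conforms(children: list, model: str) -> bool:
    """True if the sequence of child tag names matches the content model."""
    text = "".join("<" + tag + ">" for tag in children)
    return content_model_to_regex(model).fullmatch(text) is not None


person = "(name, addr, tel?)"
print(conforms(["name", "addr"], person))         # True
print(conforms(["name", "addr", "tel"], person))  # True
print(conforms(["addr", "name"], person))         # False: wrong order
print(conforms([], "(person*)"))                  # True: zero persons
```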

XML Schema
  • decoupled-tag: elements and types may be defined separately
  • local definitions
  • base types: integers, string, decimal, ...
  • type reuse:
    • type refinement
    • type extension with subtyping
XML Schema, example

<xsd:complexType name="person">
  <xsd:sequence>
    <xsd:element name="name" type="xsd:string"/>
    <xsd:element name="age" type="ageType"/>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="newPerson">
  <xsd:complexContent>
    <xsd:extension base="person">
      <xsd:sequence>
        <xsd:element name="car" type="xsd:string"/>
      </xsd:sequence>
    </xsd:extension>
  </xsd:complexContent>
</xsd:complexType>

(ageType is assumed to be a user-defined simple type declared elsewhere in the schema.)

Querying XML data
  • XML querying is based on the use of patterns to select portions of documents
  • Untyped query languages:
    • XQL
    • XML-QL
    • Quilt
  • Typed:
    • Tequyla
    • XDuce (functional language)
  • Forthcoming W3C query language:
    • probably Quilt
Tequyla
  • SQL-like query language
  • query free-nesting
  • typed:
    • query correctness
    • query typing
  • Currently: only non-algorithmic definitions, and weak subtyping
Tequyla queries
  • The body of a Tequyla query is a from clause composed of XPath patterns
  • x=addressbook.xml;
    • bind to x the root element of addressbook.xml
  • y in x//person/addr
    • starting from the root (x), search for a person element at an arbitrary depth (//), then for an addr subelement (/), and finally bind each node found to y
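
To experiment with the binding y in x//person/addr, it can be approximated with Python's standard `xml.etree.ElementTree`, whose limited XPath support includes `//` and child steps. This is a sketch, not Tequyla's implementation: the document string is a hypothetical stand-in for addressbook.xml, and `.//person/addr` is evaluated relative to the root element bound to x.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for addressbook.xml, modelled on the XML syntax slide.
DOC = """<addrbook>
  <person><name>Dario Colazzo</name><addr>Pisa</addr></person>
  <person>
    <name><first>Carlo</first><second>Sartiani</second></name>
    <addr>Pisa</addr><email>sartia@xyz.com</email>
  </person>
</addrbook>"""

# x = addressbook.xml: x is bound to the root element.
x = ET.fromstring(DOC)

# y in x//person/addr: y ranges over every addr child of a person element
# found at any depth below x.
bindings = x.findall(".//person/addr")
for y in bindings:
    print(y.tag, y.text)
```

Each element returned by `findall` corresponds to one binding of y.
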
A Tequyla query

Q =
  from x=addressbook.xml;
       y in x//person/addr;
       z in x//person/name;
  where y="Pisa"
  select nome[z]

(x//person/addr and x//person/name are XPath patterns.)
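
One way to read Q operationally, assuming that the from-clause bindings are iterated as nested loops (as in an SQL FROM clause) and that the where clause filters each combination, is the Python sketch below. It uses `xml.etree.ElementTree`; the function name `eval_q` and the inline document are hypothetical. Under this reading y and z are not correlated, so each name is returned once for every addr equal to "Pisa".

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for addressbook.xml.
DOC = """<addrbook>
  <person><name>Dario Colazzo</name><addr>Pisa</addr></person>
  <person>
    <name><first>Carlo</first><second>Sartiani</second></name>
    <addr>Pisa</addr><email>sartia@xyz.com</email>
  </person>
</addrbook>"""


def eval_q(doc: str) -> list:
    x = ET.fromstring(doc)                      # from x=addressbook.xml;
    results = []
    for y in x.findall(".//person/addr"):       # y in x//person/addr;
        # where y="Pisa": the filter mentions only y, so it is applied
        # before iterating over z.
        if (y.text or "").strip() != "Pisa":
            continue
        for z in x.findall(".//person/name"):   # z in x//person/name;
            nome = ET.Element("nome")           # select nome[z]
            nome.append(copy.deepcopy(z))
            results.append(nome)
    return results


for r in eval_q(DOC):
    print(ET.tostring(r, encoding="unicode"))
```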

XDuce
  • Typed functional language
  • Regular expression types
  • Type-based pattern language
XDuce schema
  • A schema is a set of type definitions

E = {
  Addressbook = addrbook[(Name, Addr, Tel?)*]
  Name = name[String]
  Addr = addr[String]
  Tel = tel[String]
}

An XDuce function: telephone list
  • Consider T = (Name, Addr, Tel?) in

fun mkTelList : T* --> (Name, Tel)* =
    name[n], addr[a], tel[t], rest:T*
        --> name[n], tel[t], mkTelList(rest)
  | name[n], addr[a], rest:T*
        --> mkTelList(rest)
  | ()  --> ()

XDuce subtyping: language inclusion
  • XDuce provides a simple but rather powerful notion of subtyping based on inclusion between sets of values
  • Examples
    • Name, Addr <: Name, Addr, Tel?
    • Name, Addr, Tel <: Name, Addr, Tel?
  • Subtyping by XML Schema type extension is not captured (values of an extended type such as newPerson carry extra elements, so they are not values of the base type)
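
For star-free types, language inclusion can be checked naively by enumerating the finite set of label sequences each type denotes. The sketch below does only that: it ignores element contents and does not handle * or recursive types, which require an automata-based algorithm such as the one used in XDuce. The tuple encoding of types and the helper names are hypothetical.

```python
from itertools import product

# Star-free regular types over element labels (hypothetical tuple encoding):
#   ("eps",)         the empty sequence ()
#   ("lab", "Name")  one element of type Name (contents ignored)
#   ("seq", T, U)    T, U
#   ("alt", T, U)    T | U
#   ("opt", T)       T?  (shorthand for T | ())


def lang(t) -> set:
    """Finite set of label sequences denoted by a star-free type."""
    kind = t[0]
    if kind == "eps":
        return {()}
    if kind == "lab":
        return {(t[1],)}
    if kind == "seq":
        return {u + v for u, v in product(lang(t[1]), lang(t[2]))}
    if kind == "alt":
        return lang(t[1]) | lang(t[2])
    if kind == "opt":
        return lang(t[1]) | {()}
    raise ValueError(f"unknown type constructor: {kind}")


def is_subtype(t, u) -> bool:
    """T <: U iff every value of T is also a value of U (language inclusion)."""
    return lang(t) <= lang(u)


NA = ("seq", ("lab", "Name"), ("lab", "Addr"))
NAT = ("seq", NA, ("lab", "Tel"))
NA_OPT_T = ("seq", NA, ("opt", ("lab", "Tel")))

print(is_subtype(NA, NA_OPT_T))   # True:  Name, Addr <: Name, Addr, Tel?
print(is_subtype(NAT, NA_OPT_T))  # True:  Name, Addr, Tel <: Name, Addr, Tel?
print(is_subtype(NA_OPT_T, NA))   # False: Name, Addr, Tel is not a value of Name, Addr
```
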
Type language
  • As expressive as DTD and XML Schema
  • Base types
  • Attributes and id/idref types
  • Type refining and extension
  • Local type definitions
  • Unordered sequence types
Schema extraction and schema inference
  • For untyped data, a schema will be inferred in the XML Schema style
  • For typed XML data, the schema will be converted into the internal schema representation
  • Type inference for query results
Data conformity
  • An algorithm will be defined to check data conformity to a schema
  • The problem is EXPTIME-complete
  • Optimization techniques exist
  • Further ones have to be found to deal with unordered sequence types and id/idref types
Query correctness
  • Only type correct queries will be executed
  • Type correctness is based on successful matching between the query structural requirements and the type of the data to be queried
Correct queries, an example (1/2)

Consider

E = {
  Addressbook = addrbook[Person*]
  Person = person[Name, Addr, Tel?]
  Name = name[String]
  Addr = addr[String]
  Tel = tel[String]
}

Correct queries, an example (2/2)
  • A correct query:

Q =
  from x=addressbook.xml;
       y in x//person/addr;
       z in x//person/name;
  where y="Pisa"
  select nome[z]

Correctness & union types
  • Consider:

Q' = from x=addressbook.xml;
          y in x//person/addr;
          z in x//person/tel;
     where y="Pisa"
     select results[z]

  • Should we consider this query correct?
Correctness & union types: existential approach
  • The previous query is considered as correct
  • The user will be warned about optional elements required by patterns
Total approach
  • The previous query is considered not correct
  • Too severe a discipline
  • Many queries with non-empty results would be rejected
Type equivalences
  • Several type equivalence laws will be considered
  • In particular:
    • (T | U), S = (T, S) | (U, S)
  • Useful to simplify schema definitions
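
The distributivity law can be justified by a short calculation on the sets of values denoted by types. Here $[\![T]\!]$ is assumed to be the set of forests of type T and $f\,g$ the concatenation of forests f and g; both readings are assumptions consistent with the notation used later in the talk.

```latex
\begin{aligned}
[\![ (T \mid U), S ]\!]
  &= \{\, f\,g \mid f \in [\![T]\!] \cup [\![U]\!],\; g \in [\![S]\!] \,\} \\
  &= \{\, f\,g \mid f \in [\![T]\!],\; g \in [\![S]\!] \,\}
     \cup \{\, f\,g \mid f \in [\![U]\!],\; g \in [\![S]\!] \,\} \\
  &= [\![ T, S ]\!] \cup [\![ U, S ]\!]
   = [\![ (T, S) \mid (U, S) ]\!]
\end{aligned}
```
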
Subtyping
  • A subtype relation E <: E' will be defined such that:
    • if a query Q is correct wrt E' then it is also correct wrt E
  • Type extension will be supported: if E is an extension of E' then E <: E'
Parametric polymorphism (1/3)
  • Used in some functional languages (e.g. ML and Haskell) to define generic functions, for example:

function Sort (t: Type; L: List t; Ord: t × t → Bool): List t
begin
  .....
end.

  • It will allow us to define generic queries
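
Python has no explicit type parameter like t: Type, but the same generic signature can be sketched with `typing.TypeVar`; the type variable is implicit and only checked by static tools such as mypy. The function name `sort` and the insertion-sort strategy are illustrative choices, with `ord_` read as "the first argument may precede the second".

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # plays the role of the parameter t: Type


def sort(items: List[T], ord_: Callable[[T, T], bool]) -> List[T]:
    """Generic insertion sort: works for any element type T given an order on T."""
    result: List[T] = []
    for item in items:
        i = 0
        # Skip past every element that may precede (or equal) the new item,
        # so equal elements keep their input order (stable sort).
        while i < len(result) and ord_(result[i], item):
            i += 1
        result.insert(i, item)
    return result


print(sort([3, 1, 2], lambda a, b: a <= b))                               # [1, 2, 3]
print(sort(["pisa", "carlo", "dario"], lambda a, b: len(a) <= len(b)))   # ['pisa', 'carlo', 'dario']
```
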
Parametric polymorphism (2/3)
  • Parametric types fit well the description of irregular data structures
  • For example E(t) =

{ Addressbook = addrbook[(Name, Addr, Tel?)*]
  Name = name[String]
  Addr = addr[t]
  Tel = tel[String] }

  • the content of addr elements can, for example, consist of street and city subelements
Parametric polymorphism (3/3)
  • A generic query:

Q = λt: Type; λa: E(t).
    from x = a;
         y in x//person/addr;
         z in x//person/name;
    where z="dario"
    select indirizzo[y]

  • More precise typing: the type t* differs from, and is more precise than, Any*
Conclusions
  • The type system will provide:
    • union types
    • reference types
    • recursive types
    • subtyping
    • parametric polymorphism
Presentation outline
  • Proposal
  • What has been done
  • Ongoing and future work
Thesis Goals
  • Formal development and study of a type system for XML querying
  • The query language is an abstract version of XQuery (W3C)
  • The type language is expressive enough to capture the essence of current standards
XQuery type system
  • Only result analysis: the XQuery type system is defined to determine and check, at query-analysis time, the output type of a query on documents conforming to an expected input type.
  • Query correctness is neither defined nor checked (only some ideas are sketched).
What has been done
  • We have:
    • formally defined the notion of query type correctness
    • defined a type system to statically check it and to perform result analysis; the rules define a terminating algorithm
    • introduced an approach to dealing with recursive types that is an alternative to the XQuery one
Observations
  • Our type system also performs query analysis and, in this respect, presents some differences wrt the XQuery approach
  • So far, we have considered a type system featuring product, union and recursive types
  • We have discovered that these type mechanisms are sufficient to make the study interesting and (as we will see) rather subtle.
Observations
  • We discovered that for particular queries (fortunately not frequent ones) the type system is not able to exactly capture the semantic characterization of correctness
  • We introduced a further notion of correctness, path covering, and provided rules to check this property
Papers
  • A first definition of the type system can be found in "A Typed Text Retrieval Query Language for XML Documents", Journal of the American Society for Information Science and Technology (JASIS), Special Issue, 2001
  • In "Types for Correctness of Queries over Semistructured Data", the system has been improved by a finer notion of query correctness and by the notion of path covering.

The work will be submitted to the WebDB 2002 workshop.

Tequyla (or µXQuery)
  • SQL-like query language
  • query free-nesting
  • typed:
    • type conformance of data
    • query correctness
    • query typing (result analysis)
Tequyla queries
  • The body of a Tequyla query is a from clause composed of XPath patterns
  • x=addressbook.xml;
    • bind to x the root element of addressbook.xml
  • y in x//person/addr
    • starting from the root (x), search for a person element at an arbitrary depth (//), then for an addr subelement (/), and finally bind each node found to y
Types
  • T, U ::= ()       empty sequence
          | B        atomic type (char, int, …)
          | T + U    union
          | T; U     sequence
          | l[T]     element type
          | X        type name
  • Type environments: type definitions + type bindings for query free variables

E ::= () | X = T, E | x : X, E
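
Data conformity for this type language can be sketched as a backtracking matcher that, given a forest and a start position, returns every position where a match of the type could end. This is an illustration under assumptions, not the thesis algorithm: it adds a `*` constructor (used in the environments of this talk but absent from the grammar above), assumes every recursive type name is guarded by an element label, and encodes types as tuples and forests as Python lists; `ends`, `conforms` and `seq` are hypothetical names.

```python
# Types:  ("empty",) | ("base", "Int" | "String") | ("union", T, U)
#         | ("seq", T, U) | ("elem", label, T) | ("name", X) | ("star", T)
# Forests: lists whose items are atomic values (int or str) or
#          element nodes (label, children_forest).
BASE = {"Int": int, "String": str}


def ends(t, forest, i, env):
    """Positions j such that forest[i:j] conforms to type t."""
    kind = t[0]
    if kind == "empty":
        return {i}
    if kind == "base":
        ok = i < len(forest) and isinstance(forest[i], BASE[t[1]])
        return {i + 1} if ok else set()
    if kind == "elem":
        if i < len(forest) and isinstance(forest[i], tuple):
            label, children = forest[i]
            if label == t[1] and len(children) in ends(t[2], children, 0, env):
                return {i + 1}
        return set()
    if kind == "union":
        return ends(t[1], forest, i, env) | ends(t[2], forest, i, env)
    if kind == "seq":
        return {k for j in ends(t[1], forest, i, env) for k in ends(t[2], forest, j, env)}
    if kind == "name":
        return ends(env[t[1]], forest, i, env)
    if kind == "star":
        reached, frontier = {i}, [i]
        while frontier:  # repeat T until no new end position appears
            j = frontier.pop()
            for k in ends(t[1], forest, j, env):
                if k not in reached:
                    reached.add(k)
                    frontier.append(k)
        return reached
    raise ValueError(f"unknown type constructor: {kind}")


def conforms(forest, t, env):
    return len(forest) in ends(t, forest, 0, env)


def seq(*ts):
    """Right-nested sequence T1; T2; ...; Tn."""
    out = ts[-1]
    for t in reversed(ts[:-1]):
        out = ("seq", t, out)
    return out


# The type environment used on the next slide.
ENV = {
    "Addressbook": ("elem", "addrbook", ("star", ("name", "Person"))),
    "Person": ("elem", "person", seq(("name", "Name"), ("name", "Addr"),
                                    ("union", ("name", "Tel"), ("name", "EMail")))),
    "Name": ("elem", "name", ("base", "String")),
    "Addr": ("elem", "addr", ("base", "String")),
    "Tel": ("elem", "tel", ("base", "String")),
    "EMail": ("elem", "email", ("base", "String")),
}

carlo = ("person", [("name", ["Carlo Sartiani"]), ("addr", ["Pisa"]),
                    ("email", ["sartia@xyz.com"])])
no_contact = ("person", [("name", ["Dario Colazzo"]), ("addr", ["Pisa"])])

print(conforms([("addrbook", [carlo])], ("name", "Addressbook"), ENV))       # True
print(conforms([("addrbook", [no_contact])], ("name", "Addressbook"), ENV))  # False
```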

A type environment
  • E =

Addressbook = addrbook[Person*],
Person = person[Name, Addr, (Tel + EMail)],
Name = name[String],
Addr = addr[String],
Tel = tel[String],
EMail = email[String],
x : Addressbook

A correct query

Q ::=
  from y in x//person/addr;
       z in x//person/name;
  where y="Pisa"
  select nome[z]

(x//person/addr and x//person/name are XPath patterns.)

An incorrect query

Q ::=
  from x=addressbook.xml;
       y in x//person/address;
       z in x/name;
  where y="Pisa"
  select nome[z]

(Under E, no person element has an address child, and name is not a child of the root element addrbook.)

Queries:

Q1, Q2 ::= ()
         | VB
         | l[Q]
         | Q1; Q2
         | from x=Q1 select Q2
         | from x in Q1 select Q2
         | x
         | Q p

  • Observe: no where clauses.
Some notation
  • Given $s = \{x_1 = f_1, \ldots, x_n = f_n\}$,

$s :: E$

means that

$x_i = f_i \in s$ iff $x_i : T \in E$ and $f_i \in [\![T]\!]$

  • $E \vdash Q$ means that each free variable x of Q is typed in E ($x : T \in E$)
Definition of correctness: first step
  • Given a query Q, a schema E for its free variables, and $s :: E$, either:

1. $[\![Q]\!]_s = \langle f, F \rangle$, or

2. $[\![Q]\!]_s = \langle f, NF \rangle$

  • Essentially, under s, Q correctly returns a forest f (case 1) if, for every subquery $Q'\,p$ in Q, the path p finds a match in the forest returned by Q'
Query correctness

Given a query Q and E such that $E \vdash Q$:

  • Q is strongly correct iff for each $s :: E$, $[\![Q]\!]_s = \langle f, F \rangle$
  • Q is weakly correct iff there exists $s :: E$ such that $[\![Q]\!]_s = \langle f, F \rangle$
  • Q is incorrect iff for each $s :: E$, $[\![Q]\!]_s = \langle f, NF \rangle$

Example: strongly correct query

Consider the type environment

X = a[Y],
Y = b[Int] + c[Int],
x : X

and the query

x(/b+/c)

Example: weakly correct query

Consider the query

x/b

Only some instances of type X contain the path /b:

X = a[Y],
Y = b[Int] + c[Int],
x : X

Example: incorrect query

Consider the query

x/d

No instance of type X contains the path /d:

X = a[Y],
Y = b[Int] + c[Int],
x : X
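
The three definitions can be checked by brute force when the type of x has finitely many instances: enumerate them, evaluate the path selection on each, and see whether it matches in all, some, or none. The Python sketch below does this for the environment above. It is not the thesis type system (which works on types, not instances); the tuple encoding of types and forests, the helper names, and the abstraction of every Int value to the placeholder 0 (paths only test labels) are assumptions.

```python
# Types: ("base",) | ("elem", label, T) | ("union", T, U) | ("seq", T, U) | ("name", X)
# Forests are tuples of nodes; an element node is (label, forest) and every
# atomic value is abstracted to the placeholder 0.


def instances(t, env):
    """All forests of a finite (non-recursive) type, up to atomic values."""
    kind = t[0]
    if kind == "base":
        return [(0,)]
    if kind == "elem":
        return [((t[1], f),) for f in instances(t[2], env)]
    if kind == "union":
        return instances(t[1], env) + instances(t[2], env)
    if kind == "seq":
        return [f + g for f in instances(t[1], env) for g in instances(t[2], env)]
    if kind == "name":
        return instances(env[t[1]], env)
    raise ValueError(kind)


def follow(forest, steps):
    """Child-axis navigation: the nodes reached by x/l1/l2/..."""
    current = list(forest)
    for label in steps:
        current = [c for n in current if isinstance(n, tuple)
                   for c in n[1] if isinstance(c, tuple) and c[0] == label]
    return current


def classify(alternatives, t, env):
    """alternatives: child paths of a selection, e.g. [["b"], ["c"]] for x(/b+/c)."""
    hits = [any(follow(v, p) for p in alternatives) for v in instances(t, env)]
    if all(hits):
        return "strongly correct"
    if any(hits):
        return "weakly correct"
    return "incorrect"


ENV = {
    "X": ("elem", "a", ("name", "Y")),
    "Y": ("union", ("elem", "b", ("base",)), ("elem", "c", ("base",))),
}
X = ("name", "X")
print(classify([["b"], ["c"]], X, ENV))  # strongly correct
print(classify([["b"]], X, ENV))         # weakly correct
print(classify([["d"]], X, ENV))         # incorrect
```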

Type system
  • To check correctness and to infer the type of query results, we have defined a set of rules that:
    • define an algorithm: determinism + termination
    • deal with recursion differently from the XQuery type system
    • in some cases (// + guarded recursion) infer context-free types
    • do not rely on any notion of type inclusion: only matching between paths and types
Some properties

Given $E \vdash Q$, if the system returns

$E \vdash Q : \langle T, \theta \rangle$ with $\theta \in \{s, w, i\}$

then

$[\![Q]\!] \subseteq [\![T]\!]$

and

$\theta = s$ (resp. $\theta = i$) $\Rightarrow$ Q is strongly correct (resp. incorrect)

If $\theta = w$ then in most cases Q is weakly correct, but in some cases Q is strongly correct or, even worse, incorrect.

Weak correctness problem (1)
  • Unsoundness for the case θ=w (and incorrect queries) is due to particular queries in which two different paths start from the same root (x) and follow two "disjoint" branches
  • Example:

x/b; x/c where
x : X,
X = a[Y],
Y = b[Int] + c[Int]

Observations
  • Observe that the problem does not arise for

x/b; x/b   or   x/b; y/c

where

x : X,
y : X,
X = a[Y],
Y = b[Int] + c[Int]

Both queries are weakly correct, as inferred by the type system.
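
Extending the same brute-force enumeration to sequences of paths and to several variables makes the issue concrete: each path of x/b; x/c is weakly correct on its own, yet no single instance of X satisfies both, while x/b; y/c is weakly correct because x and y are bound independently. The sketch is hypothetical in the same way (tuple encoding, helper names, atomic values abstracted to 0) and only checks the semantic definitions; it does not reproduce the type rules.

```python
from itertools import product

# Element nodes are (label, forest) tuples; atomic values are abstracted to 0.


def instances(t, env):
    kind = t[0]
    if kind == "base":
        return [(0,)]
    if kind == "elem":
        return [((t[1], f),) for f in instances(t[2], env)]
    if kind == "union":
        return instances(t[1], env) + instances(t[2], env)
    if kind == "seq":
        return [f + g for f in instances(t[1], env) for g in instances(t[2], env)]
    if kind == "name":
        return instances(env[t[1]], env)
    raise ValueError(kind)


def follow(forest, steps):
    current = list(forest)
    for label in steps:
        current = [c for n in current if isinstance(n, tuple)
                   for c in n[1] if isinstance(c, tuple) and c[0] == label]
    return current


def verdict(outcomes):
    if all(outcomes):
        return "strongly correct"
    return "weakly correct" if any(outcomes) else "incorrect"


def joint(query, var_types, env):
    """query: [(variable, child_path), ...] read as p1; p2; ...
    Under a substitution s the sequence succeeds only if every path matches."""
    names = list(var_types)
    outcomes = []
    for values in product(*(instances(var_types[v], env) for v in names)):
        s = dict(zip(names, values))
        outcomes.append(all(follow(s[var], path) for var, path in query))
    return verdict(outcomes)


def per_path(query, var_types, env):
    """Verdict of each path taken in isolation."""
    return [joint([step], var_types, env) for step in query]


ENV = {
    "X": ("elem", "a", ("name", "Y")),
    "Y": ("union", ("elem", "b", ("base",)), ("elem", "c", ("base",))),
}
VARS = {"x": ("name", "X"), "y": ("name", "X")}
q1 = [("x", ["b"]), ("x", ["c"])]  # x/b; x/c
q2 = [("x", ["b"]), ("y", ["c"])]  # x/b; y/c
print(per_path(q1, VARS, ENV))  # ['weakly correct', 'weakly correct']
print(joint(q1, VARS, ENV))     # incorrect
print(joint(q2, VARS, ENV))     # weakly correct
```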

Strong correctness problem
  • Consider the strongly correct query

x(/b+/c)

where

x : X,
X = a[Y],
Y = b[Int] + c[Int],

In this case the type system infers:

<b[Int]? + c[Int]?, w>

Solution
  • We have a possible solution for these problems
  • It is based on a different representation of union types
  • Currently we are working on the definition of simple rules that implement this approach
Path covering
  • In strong correctness we require that for each alternative path in the input type there is a path selection in the query.
  • In path covering we require that each alternative expressed in the query appears in the input type.
Path covering, examples

Consider

X = a[Y],
Y = b[Int] + c[Int],
x : X

and the query

x(/b+/c+/d)

This path selection is not path-covered wrt X: the path /d is superfluous.

The same holds for x(/b+/d), while both x(/b+/c) and x(/b) are path-covered.
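
Path covering only asks whether each alternative in the query can be followed in some instance of the input type, so it can be sketched as a walk over the type structure. This is an illustrative Python sketch, not the rules of the thesis: the tuple encoding, the function names, and the restriction to child steps (no //) are assumptions. Recursive type names are handled as long as they are guarded by an element label, since each step of the walk consumes one label of the path.

```python
# Types: ("base",) | ("elem", label, T) | ("union", T, U) | ("seq", T, U) | ("name", X)


def top_elements(t, env):
    """(label, content type) for each element that may appear at top level in t."""
    kind = t[0]
    if kind == "base":
        return []
    if kind == "elem":
        return [(t[1], t[2])]
    if kind in ("union", "seq"):
        return top_elements(t[1], env) + top_elements(t[2], env)
    if kind == "name":
        return top_elements(env[t[1]], env)
    raise ValueError(kind)


def path_occurs(t, steps, env):
    """True if the child path (a list of labels) can be followed inside type t."""
    if not steps:
        return True
    return any(label == steps[0] and path_occurs(content, steps[1:], env)
               for label, content in top_elements(t, env))


def superfluous(alternatives, var_type, env):
    """Alternatives of x(/p1+/p2+...) that occur in no instance of x's type.
    x is bound to an element, so paths start from that element's content."""
    roots = [content for _, content in top_elements(var_type, env)]
    return [p for p in alternatives
            if not any(path_occurs(r, p, env) for r in roots)]


ENV = {
    "X": ("elem", "a", ("name", "Y")),
    "Y": ("union", ("elem", "b", ("base",)), ("elem", "c", ("base",))),
}
X = ("name", "X")
print(superfluous([["b"], ["c"], ["d"]], X, ENV))  # [['d']]
print(superfluous([["b"], ["c"]], X, ENV))         # []
```

A selection is path-covered exactly when `superfluous` returns an empty list.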

Path covering
  • It is useful for programmers, as they are statically informed about extra paths that may inefficiently attempt to match input data
  • Moreover, they can improve and simplify their queries by eliminating superfluous paths or by substituting them with actually occurring ones
Path covering
  • The type system defined for correctness has been easily extended to check path covering
  • The system constitutes a formal framework where several other notions of correctness can be defined and compared
Ongoing and future work
  • Currently we are working on:
    • the definition of (simple) rules that solve the unsoundness problems outlined above
    • the formal proofs of properties of the current system
  • In the next months we will:
    • complete the development of the formal framework for both the query-correctness system and the path-covering system
    • extend the language with where clauses