Representing and Querying XML with Incomplete Information


Serge Abiteboul, INRIA

Luc Segoufin, INRIA

Victor Vianu, UCSD


Organization

  • Motivations

  • Simplifying assumptions

  • Model of incompleteness

  • Answering queries

  • Results

  • Discussion

  • Conclusion

Abiteboul-Segoufin-Vianu


Motivations


The Web is a world of incompleteness

  • Information you get from the web is seldom complete:

    • Queries return some, but not all, of the data

    • Limited storage capability

    • Documents change on the Web: expiration

    • Sites are unavailable…

  • Context: A warehouse of XML documents from the Web, Xyleme


This work

  • This work: simple, practically appealing approach to managing incomplete information

  • Sequence of queries to the web

    • (q1,A1)+(q2,A2)+…

    • Answers are cached

  • Process a new query without access to the web

    • Give an incomplete answer

    • Explain incompleteness to user

    • Seek additional information, i.e., find a minimal set of queries that fully answers it


Related works

  • Semantic caching

  • Answering queries using views

    • keep (Qi,Ai)

    • try to rewrite query Q into Q’(A1,...,An)

    • reject if you cannot

  • Incomplete database

    • (Qi,Ai) is some incomplete knowledge of DB

    • Related to querying incomplete information – e.g. Lipski-Imielinski


Challenge: balance expressiveness and tractability

  • Choice of data model

  • Choice of the query language

  • Choice of a representation of incompleteness

  • Results

    • Simple, practical solution

    • Extra features lead to serious problems

Simplifying Assumptions


Data is XML: trees

<dealer>
  <UsedCars>
    <ad>
      <model>Honda</model>
      <year>96</year>
    </ad>
  </UsedCars>
  <NewCars>
    <ad>
      <model>Acura</model>
    </ad>
  </NewCars>
</dealer>

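[Figure: the same document drawn as a tree. dealer has children UsedCars and NewCars; the ad under UsedCars has model (Honda) and year (96); the ad under NewCars has model (Acura).]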


Simplified XML

  • Unordered trees

  • Labelling function

  • Value function

[Figure: a catalog tree with two product nodes. One product has children name (=nik), price (=234) and category (=electronic); the other has name (=can), price (=444), cat (=electronique) and picture (=c.jpg). Each product also has a subcategory node (=camera).]


Simple XML types

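Multiplicity annotations on children:

  • 1 : exactly 1 child (default)

  • * : 0 or more

  • + : 1 or more

  • ? : 0 or 1

[Figure: a tree type rooted at catalog, with product* below catalog; name, price, cat and picture* below product; and subcategory below cat.]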
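The multiplicity annotations can be read as bounds on how many children with a given label a node may have. The following is a minimal sketch of that reading, not the authors' implementation. The dictionary encoding, the names CATALOG_TYPE, BOUNDS and conforms, and the exact placement of the picture* and subcategory rules are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical encoding of a simple tree type: for each element label,
# the allowed child labels with a multiplicity in {"1", "?", "+", "*"}.
CATALOG_TYPE = {
    "catalog": {"product": "*"},
    "product": {"name": "1", "price": "1", "cat": "1", "picture": "*"},
    "cat": {"subcategory": "1"},
}

# Inclusive (min, max) child counts per multiplicity; None means unbounded.
BOUNDS = {"1": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}


def conforms(node, tree_type):
    """Check an unordered tree against a simple tree type.

    A node is a (label, children) pair, where children is a list of nodes.
    Labels without a rule in tree_type are treated as leaves.
    """
    label, children = node
    rule = tree_type.get(label, {})
    counts = Counter(child_label for child_label, _ in children)
    # No child may carry a label that the rule does not mention.
    if any(child_label not in rule for child_label in counts):
        return False
    # Every mentioned label must occur an allowed number of times.
    for child_label, multiplicity in rule.items():
        low, high = BOUNDS[multiplicity]
        n = counts[child_label]
        if n < low or (high is not None and n > high):
            return False
    return all(conforms(child, tree_type) for child in children)
```

Because children are counted per label and never compared by position, the check is insensitive to order, which matches the unordered-tree assumption.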


Prefix Selection Queries (ps-queries)

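[Figure: two ps-queries drawn as tree patterns rooted at catalog. Query1: product with children name, price (<200) and cat (=elec), with subcategory below. Query2: product with children name and picture.]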
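A ps-query can be read as a tree pattern whose answer is the prefix of the data tree that matches it. The sketch below is one hedged way to evaluate such a pattern, assuming that every child pattern must find at least one matching child. The tuple encoding, the helper names, the CATALOG data and the olympus product are illustrative assumptions, not material from the slides.

```python
def leaf(label, value):
    """A data node without children: (label, value, children)."""
    return (label, value, [])


def product(name, price, cat, subcat, pictures=()):
    """Build an illustrative product subtree."""
    return ("product", None, [
        leaf("name", name),
        leaf("price", price),
        ("cat", cat, [leaf("subcategory", subcat)]),
        *[leaf("picture", p) for p in pictures],
    ])


def evaluate(query, node):
    """Return the prefix of node matched by query, or None if no match.

    A query node is (label, condition, child_queries); condition is a
    predicate on the data value, or None when any value is accepted.
    """
    q_label, condition, q_children = query
    label, value, children = node
    if label != q_label or (condition is not None and not condition(value)):
        return None
    kept = []
    for q_child in q_children:
        matches = [evaluate(q_child, child) for child in children]
        matches = [m for m in matches if m is not None]
        if not matches:
            return None  # a required child pattern has no match
        kept.extend(matches)
    return (label, value, kept)


CATALOG = ("catalog", None, [
    product("canon", 120, "elec", "camera", ["c.jpg"]),
    product("nikon", 199, "elec", "camera"),
    product("sony", 175, "elec", "cdplayer"),
    product("olympus", 250, "elec", "camera"),  # hypothetical extra product
])

# Query1: name, price and subcategory of electronic products under 200.
QUERY1 = ("catalog", None, [
    ("product", None, [
        ("name", None, []),
        ("price", lambda v: v < 200, []),
        ("cat", lambda v: v == "elec", [("subcategory", None, [])]),
    ]),
])

answer1 = evaluate(QUERY1, CATALOG)
```

In this sketch the answer keeps only nodes named in the pattern, so pictures are absent from answer1 even for products that have them.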


Simplifications

  • Data

    • No order

    • No distinction between attributes and elements

    • No recursion

    • No links

  • Query

    • No complex path expressions

    • No join

    • No repeated child

[Figure: a pattern marked NO: product with children name, cat=elec and cat=toy, where the cat child is repeated.]


Crucial assumption: XID

[Figure: two partial views of the node prod &245, one carrying canon, 120, elec and camera, the other carrying c.jpg, are combined (+) into a single prod &245 node carrying canon, 120, elec, camera and c.jpg.]

  • URLs

  • ID/IDrefs
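Persistent node identifiers are what make it possible to combine answers to different queries: two answer fragments that mention the same id describe the same node. A minimal sketch of such a merge follows, assuming each answer is a dictionary from node id to (label, value, child ids). Only &245 comes from the slides; the other ids and the function name merge are hypothetical.

```python
def merge(answer_a, answer_b):
    """Merge two answer prefixes whose nodes carry persistent ids."""
    merged = {}
    for answer in (answer_a, answer_b):
        for node_id, (label, value, children) in answer.items():
            if node_id in merged:
                old_label, old_value, old_children = merged[node_id]
                # The same id must denote the same node in both answers.
                if (old_label, old_value) != (label, value):
                    raise ValueError(f"conflicting data for node {node_id}")
                merged[node_id] = (label, value, old_children | frozenset(children))
            else:
                merged[node_id] = (label, value, frozenset(children))
    return merged


# First view: name, price, category and subcategory of prod &245.
A1 = {
    "&245": ("prod", None, {"&246", "&247", "&248"}),
    "&246": ("name", "canon", set()),
    "&247": ("price", 120, set()),
    "&248": ("cat", "elec", {"&249"}),
    "&249": ("subcategory", "camera", set()),
}

# Second view: name and picture of the same prod &245.
A2 = {
    "&245": ("prod", None, {"&246", "&250"}),
    "&246": ("name", "canon", set()),
    "&250": ("picture", "c.jpg", set()),
}

combined = merge(A1, A2)
```

Without ids, the merge would have to decide by content alone whether the two prod nodes are the same product, which is the case analysis the node-Id assumption avoids.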


Representation of incomplete information: Incomplete trees


Document Type Definitions (DTDs) are used to represent incompleteness

  • Set of rules: e → r

    • e: element name

    • r: regular expression

  • Set of trees satisfying a DTD d: tree(d)

  • Shortcoming of DTDs

    • An element has a single definition, independent of the context

    • Here the type of ad depends on the context

[Figure: dealer with children usedcar and newcar; the ad under usedcar has model and year, the ad under newcar has only model.]


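Solution: specialization (decoupled tags)

  • Specialized tags adused and adnew

  • h(adused) = h(adnew) = ad

[Figure: a tree over specialized tags, dealer with usedcar → adused (model, year) and newcar → adnew (model); the mapping h sends it to the tree in which both adused and adnew are labelled ad.]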
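The role of the mapping h can be sketched as a relabelling: types are written over the specialized tags, and h maps a specialized tree back to the tags that actually appear in the document. The snippet is only an illustration under assumed names (H, apply_h, SPECIALIZED) and an assumed (label, children) encoding.

```python
# Hypothetical specialization mapping h: specialized tag -> original tag.
H = {"adused": "ad", "adnew": "ad"}


def apply_h(node):
    """Relabel a (label, children) tree by replacing specialized tags."""
    label, children = node
    return (H.get(label, label), [apply_h(child) for child in children])


# A tree over specialized tags: the two kinds of ad have different content.
SPECIALIZED = ("dealer", [
    ("usedcar", [("adused", [("model", []), ("year", [])])]),
    ("newcar", [("adnew", [("model", [])])]),
])

document = apply_h(SPECIALIZED)
```

After applying h, both nodes are labelled ad, yet one has a year child and the other does not, which a single context-free rule for ad could not enforce.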


DTDs + Specialization

The sets of trees that can be specified are the regular unranked tree languages [Brüggemann-Klein, Murata, Wood]

  • Same closure properties: intersection, union, complement

  • Same complexity


Example

Q1: name, subcat, price of electronic products with price less than $200

Q2: name, pictures of cameras pictured at least once

----------------------------

Q3: name, price, pictures of cameras costing less than $100 and at least pictured once

can be completely answered using A1, A2

Q4: list all cameras

can be partially answered using A1, A2


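Q1: name, subcat, price of electronic products with price less than 200

[Figure: the incomplete tree after Q1. Known data under catalog: three product nodes, (canon, 120, elec, camera), (nikon, 199, elec, camera) and (sony, 175, elec, cdplayer). Missing data: starred (*) children of types product1 and product2.]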


Missing data after Q1

[Figure: the two types of missing products. product1: name, price (>200), cat (=elec) and picture*, with subcategory below. product2: name, price, cat (!=elec) and picture*, with subcategory below.]


Q2: name, pictures of cameras pictured at least once

[Figure: the incomplete tree after Q1 and Q2. Known data under catalog: canon (120, elec, camera, c.jpg), nikon (199, elec, camera), sony (175, elec, cdplayer) and akai (elec, camera, a.jpg). Missing data: types labelled product1, product2, product2a, product2b and product2c.]


Incomplete information

  • Known information

    • Prefix of the real data tree

  • Missing information

    • Extended tree type

    • Conditions on data values

    • Specializations, disjunctions


[Figure: missing data versus known data after Q1 and Q2. The missing product types shown are product1, product2a, product2b, product2c and product3, each a pattern over name, price, cat, picture and subcategory with conditions such as cat =elec or !=elec, price >200, subcategory !=camera, and "no picture".]


Answering Queries


Complete answer to Q3

  • Q3: name, price, pictures of cameras costing less than $150 and having at least one picture

  • Can be fully answered using available information

  • Need to check whether answer is complete

[Figure: the answer tree: catalog with one prod node carrying canon, 120 and c.jpg.]


Incomplete answer to Q4

  • Provide known cameras

  • Explain incompleteness

[Figure: the names akai, canon, nikon and sony, together with a "more products" node annotated "price>200 and no picture".]


Completing answer to Q4

  • It suffices to ask:

[Figure: the query to ask: product with children name, price (>200), cat (=elec) with sub=camera, and picture marked 0.]


Revisit the types

  • DTD

  • Conditions

  • Specialization: the same element name may have several types

  • Not sufficient

  • Need to extend the types again: disjunctions

[Figure: the type product2b: name, price (>200), cat (=elec) and picture*, with subcategory (!=camera) below.]


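Disjunction

[Figure: two queries, Query1' and Query2', drawn as patterns over vehicle nodes built from engine, sail, data and description, with conditions data="…." and description="….". The slide also shows the node vehicle &322, optional (?) children, and an answer marked "Empty!".]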


Disjunction continued

  • Type of &322: vehicle1 + vehicle2

[Figure: the two alternatives vehicle1 and vehicle2, built from engine, sail, data and description nodes.]

  • The type of &322 cannot be described independently of that of the data below


Results


Representation System: Lipski's + Imielinski's

[Figure: commutative diagram. A representation of information T denotes the set of possible worlds rep(T). Applying q to T yields the representation of the result, q(T); applying q to rep(T) yields the set of possible answers, q(rep(T)). The two paths agree: q(rep(T)) = rep(q(T)).]


Representation System for PS-queries

  • Incomplete tree T to represent

    $q_1^{-1}(A_1) \cap \dots \cap q_k^{-1}(A_k)$

  • PS-query q

  • q(T) can be computed in ptime

    (a representation of the answer can be computed in ptime)


Querying Incomplete Trees

  • Given T and a query q, one can

    • Give in ptime the sure answers up to our current knowledge

    • Check in ptime whether query q can be fully answered

    • Generate in ptime queries to complete the answer


Comparison with IL

  • Imielinski-Lipski (IL)

    • Relational model

    • Relational calculus/algebra

    • Conditional table

    • Closed or open world

    • Representation system

  • This work

    • XML tree model

    • Weaker language (no join)

    • Weaker system (no variable)

    • + Closed and open world

    • Representation system


Drawback: exponential blowup

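  • The representation of incomplete information may grow exponentially in the length of the query/answer sequence q1/A1; q2/A2; …

[Figure: the type is database with exactly one (1) a child and one (1) b child; query qi asks for database with a=i and b=i.]

Answers are empty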
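One reading of this example, offered as an assumption rather than as the authors' construction: each empty answer to qi says that a=i and b=i do not hold together, so a≠i or b≠i. A representation that cannot share disjunctions then has to list one case per way of picking a disjunct for every i. The sketch enumerates those cases; conjunctive_cases and admits are hypothetical helper names.

```python
from itertools import product


def conjunctive_cases(n):
    """Expand the n constraints (a != i or b != i) into conjunctive cases.

    Each case is a pair (values a must avoid, values b must avoid).
    """
    cases = set()
    for choice in product("ab", repeat=n):
        a_forbidden = frozenset(i for i, c in enumerate(choice, start=1) if c == "a")
        b_forbidden = frozenset(i for i, c in enumerate(choice, start=1) if c == "b")
        cases.add((a_forbidden, b_forbidden))
    return cases


def admits(cases, a, b):
    """True if some case allows a database with a-value a and b-value b."""
    return any(a not in fa and b not in fb for fa, fb in cases)


for n in range(1, 6):
    # The number of distinct cases doubles with each additional query.
    print(n, len(conjunctive_cases(n)))
```

Under this reading, n queries with empty answers give 2^n pairwise different cases, none of which can be dropped without losing possible worlds.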


Dealing with exponential blowup

  • Make the representation more complex using disjunctions of types

    • Size of representation stays polynomial

    • Manipulations much more complex

  • Restrict tree types and PS-queries

    • Already very (too?) simple

  • Accept losing some information

  • Ask extra queries to simplify representation


Discussion


Discussion: extend language

  • Some results in paper

  • Extensions often lead to intractability

  • E.g., k-pebble transducers [Milo, Suciu, Vianu], which in some sense subsume XML-QL and XSL

    • No (known) representation system

    • Testing whether rep(T) is empty is non-elementary


Discussion: node Ids

Without node Ids

  • much less information to integrate results

  • more complex

  • tedious case analysis


Discussion: ordering

  • Ordering in XML, DTD, queries

  • Problem is totally different and very complex

  • Example:

    • Q1/A1: list of males; Q2/A2: list of females; Q3: list all

  • Depending on the type of input

    • (Male)*(Female)*: A3 = A1 || A2

    • (Male Female)*: A3 = shuffle(A1, A2)

    • (Male + Female)*: A3 cannot be determined

  • Regular expression processing
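The three cases of the ordering example can be made concrete with lists. The sketch below rests on two assumptions: A1 || A2 is read as concatenation, and shuffle under the type (Male Female)* is read as strict alternation. The function names are hypothetical.

```python
from itertools import chain, combinations


def concatenate(a1, a2):
    """Type (Male)*(Female)*: all males come first, so A3 = A1 followed by A2."""
    return a1 + a2


def alternate(a1, a2):
    """Type (Male Female)*: strict alternation, so the lists have equal length."""
    if len(a1) != len(a2):
        raise ValueError("strict alternation needs lists of equal length")
    return list(chain.from_iterable(zip(a1, a2)))


def interleavings(a1, a2):
    """Type (Male + Female)*: every order-preserving merge is possible."""
    n = len(a1) + len(a2)
    result = []
    for positions in combinations(range(n), len(a1)):
        slots = set(positions)  # positions taken by elements of a1
        it1, it2 = iter(a1), iter(a2)
        result.append([next(it1) if i in slots else next(it2) for i in range(n)])
    return result


males = ["m1", "m2"]
females = ["f1", "f2"]
candidates = interleavings(males, females)
```

With two males and two females there are already several candidate orders in the last case, so A1 and A2 alone do not determine A3.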


Conclusion

  • Framework for acquiring, maintaining, querying incomplete XML data

  • Limitations:

    • simple queries

    • no order, and the node Id assumption

    • small extensions lead to problems

  • Possible to represent the incompleteness

  • Possible to answer with incompleteness

  • Possible to obtain queries to provide full answer
