Trees, semistructured data, and other strange ways to go beyond tables
Serge Abiteboul, INRIA & ENS Cachan
PODS 30th Anniversary, 2011


Another one of these No-SQL talks?

IMS, hierarchical model, V-relations, Jacobs’s calculus, Hardgrave’s broom, nested relations, format model, complex objects, logical data model, object databases, lambda calculus, regular trees, F-logic, NF1F, NF2, COL, IFO, LDL, IQL, SGML, HTML, ASN.1, XML, YAML, JSON…

Luc

Véro


Theorem: Information lives in trees and not in relations

Proof: the Bible does not say « But of the two dimensional table of knowledge of good and evil … » 

Introduction

Trees are useless

Knowledge lives in trees

But of the tree of the knowledge of good and evil, thou shalt not eat of it: for in the day that thou eatest thereof thou shalt surely die. Genesis 2:17

  • A tree is a tree. How many more do you have to look at?
  • Ronald Reagan, governor of California, opposing the expansion of Redwood National Park (1966)
  • We don’t need anything beyond relations. These things are useless. Reject!
  • Anonymous referee (circa 1990)
Organization
  • Introduction
  • Hierarchical data model (60s)
  • Nested relations (80s)
  • Complex objects (early 90s)
  • Semistructured data & unranked labeled trees (late 90s)
  • Unranked labeled ordered trees, aka XML (early 00s)
  • Evolving trees, aka Active XML (mid 00s)
  • Cycles (90s to now)
  • Conclusion

More or less chronological


  • For lack of time, we will ignore IMS and the hierarchical model
    • The language was purely navigational anyway
  • We will also ignore early works such as Makinouchi, Jacobs or Hardgrave
  • We will start with N1NF
    • François Bancilhon in France
    • Hans Schek in Germany
    • PhD thesis of Nicole Bidoit
Non-First-Normal-Form N1NF

A quarter on tables. Now what?

Data live in 1NF relations

Data would prefer to live in infamous nested relations, aka V-relations, aka N1NF relations, aka NF2 relations

Trees!

DB101
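A minimal sketch of what moving from 1NF to nested relations means: the flat relation below is grouped on its first attribute (nest) and flattened back (unnest). The relation, its values, and the helper names `nest` and `unnest` are illustrative assumptions, not taken from the slides.

```python
from collections import defaultdict

# A flat (1NF) relation: one row per (parent, child) pair. Values are illustrative.
flat = [
    ("Peter", "Toto"),
    ("Peter", "Zaza"),
    ("Paul", "Mimi"),
]

def nest(rows):
    """Group the second column into one set per value of the first column."""
    groups = defaultdict(set)
    for a, b in rows:
        groups[a].add(b)
    return {a: frozenset(bs) for a, bs in groups.items()}

def unnest(nested):
    """Flatten a nested relation back into sorted (a, b) rows."""
    return sorted((a, b) for a, bs in nested.items() for b in bs)

nested = nest(flat)
print(nested["Peter"])   # the set of Peter's children
print(unnest(nested))    # the original rows, in sorted order
```

Because the result is a dictionary, the first attribute acts as a key of the nested relation.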

The devil is in the details

V-relations: A is a key. No new power.

N1NF-relations: A is not a key. The size is now possibly exponential in the size of the domain.
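A small sketch of where the exponential comes from, assuming a nested relation with an atomic attribute A and a set-valued attribute over a three-element domain (both made up for illustration). When A is not a key, a single A-value can be paired with every subset of the domain.

```python
from itertools import combinations

def all_subsets(domain):
    """Every subset of the domain, as frozensets so they can sit inside a set."""
    items = sorted(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

domain = {"a", "b", "c"}

# A is not a key: one A-value may appear with every possible set of B-values.
not_keyed = {(1, s) for s in all_subsets(domain)}

# A is a key: each A-value determines a single set of B-values.
keyed = {1: frozenset(domain)}

print(len(not_keyed), len(keyed))
```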

Complex object model: tuple and set constructors used freely

[Figure: a Families complex object built from tuple and set (*) constructors, with Name, Children (Name, Sex) and Cars (Name, Year); values include Peter, Mimi, Toto, Zaza, 2CV, BMW, 1976, 2010]

A logic and algebra for complex objects
  • Logic: main novelty is set variables – non first-order
  • Example: Abou Banat query
  • { T.Father | Families(T) ∧ ∀X ∈ T.Children ( X.Sex = F ) }
  • Algebra: powerset operation, unnest/nest
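The example query reads naturally as a comprehension in which the inner variable ranges over a set-valued attribute. This is a sketch over made-up data: the `Father` and `Children` fields follow the query, but the two families are assumptions for illustration.

```python
# Illustrative instance of Families; each tuple has a set-valued Children attribute.
families = [
    {"Father": "Peter", "Children": [{"Name": "Mimi", "Sex": "F"}]},
    {"Father": "Paul", "Children": [{"Name": "Toto", "Sex": "M"},
                                    {"Name": "Zaza", "Sex": "F"}]},
]

# Fathers all of whose children are daughters: X ranges over T.Children.
abou_banat = {
    t["Father"]
    for t in families
    if all(x["Sex"] == "F" for x in t["Children"])
}
print(abou_banat)
```

Note that a father with no children satisfies the universal condition vacuously, exactly as in the logical formula.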
Results
  • Equivalence theorem: algebra and logic have same expressive power
  • Remark: one can compute transitive closure (TC) using the algebra/logic (wow! Cool!)
  • Also studied: fixpoint, datalog, while…
  • Complexity: each new level of nesting introduces one more exponential
  • Need to control the use of powerset

2^n, 2^(2^n), …
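One standard way to see why powerset gives transitive closure, and why it needs controlling: TC(R) is the intersection of all transitive relations over the same nodes that contain R. The sketch below enumerates the powerset of all node pairs, so it is only usable on tiny inputs. The function names and the sample edges are assumptions for illustration.

```python
from itertools import chain, combinations, product

def powerset(items):
    """All subsets of items, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def is_transitive(rel):
    """True if (a, b) and (b, d) in rel always imply (a, d) in rel."""
    return all((a, d) in rel for (a, b) in rel for (c, d) in rel if b == c)

def tc_via_powerset(edges):
    """Intersect every transitive relation that contains edges (exponential cost)."""
    nodes = {x for e in edges for x in e}
    candidates = (set(s) for s in powerset(product(nodes, repeat=2)))
    closed = [s for s in candidates if edges <= s and is_transitive(s)]
    return set.intersection(*closed)

edges = {(1, 2), (2, 3)}
tc = tc_via_powerset(edges)
print(sorted(tc))
```

With three nodes there are already 2^9 candidate relations, which is the blow-up the complexity bullet refers to.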

From complex objects to semistructured data

[Figure: the Families complex object, with tuple and set (*) constructors over Name, Children (Name, Sex) and Cars (Name, Year)]

Revolution 1: more flexibility

[Figure: the Families tree extended with extra Annotations and Trash nodes]

Revolution 2: Remove some nodes; name all

[Figure: the Families tree with some constructor nodes removed and every node named: Families, Family, Name, Children, Child, Cars, Car, Year, Sex, Ann., Trash]

Unranked labeled trees

[Figure: an unranked labeled tree rooted at Families, with Family, Name, Children, Child, Cars, Car, Year, Sex, Ann. and Trash nodes and leaf values Peter, Toto, Zaza, 2CV, BMW, 1976, 2010, M, F]

This is better adapted to a Web context
  • Self describing data: No separation between schema and data
  • Flexibility
  • Not such a big deal
  • Maybe the main contribution is the format?
  • <Families><Family><Name>Peter</Name><Cars><Car><Name>BMW</Name><Year>2010</Year></Car></Cars><Children><Child> …
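To make "self-describing" concrete, here is a sketch that parses a small document of this shape with Python's standard `xml.etree.ElementTree` and lists every leaf with its label path, without consulting any schema. The document is an assumed, closed-off completion of the truncated fragment; the Child values are illustrative, and `paths` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed completion of the slide's fragment; not the author's full document.
doc = """
<Families>
  <Family>
    <Name>Peter</Name>
    <Cars>
      <Car><Name>BMW</Name><Year>2010</Year></Car>
    </Cars>
    <Children>
      <Child><Name>Toto</Name><Sex>M</Sex></Child>
    </Children>
  </Family>
</Families>
"""

root = ET.fromstring(doc)

def paths(elem, prefix=""):
    """Yield (label path, text) for every leaf; labels come from the data itself."""
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        yield path, (elem.text or "").strip()
    for child in children:
        yield from paths(child, path)

leaves = list(paths(root))
for path, text in leaves:
    print(path, "=", text)
```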

Plus ça change, plus c’est la même chose

The more things change, the more they stay the same

What else? The trees are unbounded

[Figure: a tree rooted at r whose branches are chains of a nodes, with $ markers and b nodes]

  • Like nested relations, trees are unbounded in width
  • Unlike nested relations, they are unbounded in depth
  • One can simulate 2 counter machines with 2 branches
    • Do applications simulate 2 counter machines with XML documents?
    • I am still looking for one
    • XML documents are rarely deep
  • But even for bounded trees there are fun questions: e.g., is the equivalence of monadic datalog decidable for bounded data trees?
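A sketch of the two kinds of unboundedness, using (label, children) pairs as trees. The `branch` helper builds a unary chain whose length can stand for a counter value; that reading of the two-branch remark is an assumption, and all names here are hypothetical.

```python
def node(label, *children):
    """A tree is a (label, list of subtrees) pair."""
    return (label, list(children))

def depth(tree):
    """Number of nodes on the longest root-to-leaf path."""
    _, children = tree
    return 1 + max((depth(c) for c in children), default=0)

def width(tree):
    """Largest number of children at any single node."""
    _, children = tree
    return max([len(children)] + [width(c) for c in children])

def branch(label, n):
    """A unary chain of n >= 1 nodes; its length can encode a counter value."""
    tree = node(label)
    for _ in range(n - 1):
        tree = node(label, tree)
    return tree

# Two branches under one root, encoding the counter values 3 and 2.
two_counters = node("r", branch("a", 3), branch("b", 2))
print(depth(two_counters), width(two_counters))
```

Incrementing a counter then amounts to lengthening one branch, which no fixed nesting depth can accommodate.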
What else? The trees are ordered. Unranked labeled ordered trees = XML
  • Ignore order
    • Classical optimization
  • Respect order
    • Totally new ball game
    • Bring in tree automata
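A sketch of why order calls for automata: each label gets a regular expression over the ordered sequence of its children's labels, and a tree is accepted when every node matches. This is a DTD-like special case of an unranked tree automaton, not a general one, and the content models in `CONTENT` are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content models: a regular expression over the space-terminated
# sequence of child labels, in document order.
CONTENT = {
    "Families": r"(Family )*",
    "Family": r"Name (Cars )?(Children )?",
    "Cars": r"(Car )*",
    "Car": r"Name Year ",
    "Children": r"(Child )*",
    "Child": r"Name Sex ",
}

def accepts(elem):
    """Recursive check that each node's ordered child sequence matches its model."""
    word = "".join(child.tag + " " for child in elem)
    model = CONTENT.get(elem.tag, "")  # labels without a model must be leaves
    return re.fullmatch(model, word) is not None and all(accepts(c) for c in elem)

ok = ET.fromstring("<Family><Name>Peter</Name><Cars/><Children/></Family>")
swapped = ET.fromstring("<Family><Cars/><Name>Peter</Name></Family>")
print(accepts(ok), accepts(swapped))
```

The two documents contain compatible labels; only the order of children distinguishes the accepted one from the rejected one.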

Order is often painful for optimization

Reconcile

Selling argument is the Web…
  • The move from relations to trees is interesting
  • But so is the move from centralized to distributed
  • and it is much less investigated
  • Where the fun is:
    • Scale is beyond what we thought was thinkable
    • Machines are totally autonomous
    • Schema replaced by numerous ontologies
    • True/false logic replaced by inconsistency, probabilities, trust, belief…
And the trees are evolving (aka Active XML)
  • An old idea from object databases: mix data and computation

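[Figure: an Active XML tree. Resorts > Resort with State: Colorado, Name: Aspen, and snowcond and hotels subtrees; embedded service calls !Unisys.com/snow("Aspen") and !Yahoo.com/GetHotels(<city name="Aspen"/>); a snow node with Unit: Meter, Depth: 1]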
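A sketch of mixing data and computation in a tree: some nodes are calls, and materializing the tree replaces each call by the subtrees it returns. The `services` table is a local stand-in for remote Web services, and `elem`, `call` and `materialize` are hypothetical helpers; real Active XML invokes actual services.

```python
def elem(label, *children):
    """An ordinary data node; children are nodes, call nodes, or text."""
    return {"label": label, "children": list(children)}

def call(service, arg):
    """A node that stands for a service call still to be made."""
    return {"call": service, "arg": arg}

def materialize(tree, services):
    """Return a copy of the tree where each call node is replaced by its result."""
    out = []
    for child in tree["children"]:
        if isinstance(child, str):
            out.append(child)
        elif "call" in child:
            out.extend(services[child["call"]](child["arg"]))
        else:
            out.append(materialize(child, services))
    return elem(tree["label"], *out)

# Local function standing in for the remote snow-condition service (assumption).
services = {
    "Unisys.com/snow": lambda resort: [elem("Unit", "Meter"), elem("Depth", "1")],
}

resort = elem(
    "Resort",
    elem("State", "Colorado"),
    elem("Name", "Aspen"),
    elem("snowcond", call("Unisys.com/snow", "Aspen")),
)
result = materialize(resort, services)
print(result["children"][2])
```

The original tree is left untouched, so the same document can be materialized again later when the underlying data has changed.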

And there are cycles

[Figure: a Person node with Name and Spouse]

  • For lack of time, I will not mention the network model [Codasyl 1969]
    • The language was purely navigational anyway
  • If I added references to XML, I would get cycles
  • Lots of models for graph data, e.g., IQL
  • Some fun results: e.g., a copy elimination problem when trying to obtain Chandra-Harel completeness for IQL
    • Similar issue for unordered trees [recent result with Vianu]
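A sketch of how references turn a tree into a graph: each Person is stored under an identifier, and Spouse holds another Person's identifier instead of a nested subtree. The identifiers `p1` and `p2` and the `has_cycle` helper are assumptions for illustration.

```python
# Persons keyed by hypothetical identifiers; Spouse is a reference, not a subtree.
people = {
    "p1": {"Name": "Adam", "Spouse": "p2"},
    "p2": {"Name": "Eve", "Spouse": "p1"},
}

def has_cycle(objects, ref_field):
    """Follow ref_field from every object; report whether some walk revisits a node."""
    for start in objects:
        seen = set()
        current = start
        while current in objects:
            if current in seen:
                return True
            seen.add(current)
            current = objects[current].get(ref_field)
    return False

print(has_cycle(people, "Spouse"))
```

Without the Spouse references the same data is a plain forest, and the walk from each object stops immediately.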

[Figure: two Person nodes, Adam and Eve, each with Name and Spouse]

Paris C. Kanellakis

Conclusion
  • Is this a good time to do research on trees in databases?
  • The best time to plant a tree was 20 years ago. 
  • The next best time is now. 
  • Chinese Proverb

Advertisement
Book on Web data management to appear at Cambridge University Press
http://webdam.inria.fr/Jorge