interactive query formulation over web service accessed sources
Download
Skip this Video
Download Presentation
Interactive Query Formulation over Web Service-Accessed Sources

Loading in 2 Seconds...

play fullscreen
1 / 31

Interactive Query Formulation over Web Service-Accessed Sources - PowerPoint PPT Presentation


  • 81 Views
  • Uploaded on

Interactive Query Formulation over Web Service-Accessed Sources. Michalis Petropoulos Alin Deutsch Yannis Papakonstantinou. ACM SIGMOD, June 2006. Large-Scale Data Integration Systems. . Web Domain. Web Forms & Reports. CNET’s Top Combinations CNET’s Search Desktops

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Interactive Query Formulation over Web Service-Accessed Sources' - elvis-jensen


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
interactive query formulation over web service accessed sources

Interactive Query Formulationover Web Service-Accessed Sources

Michalis Petropoulos

Alin Deutsch

Yannis Papakonstantinou

ACM SIGMOD, June 2006

large scale data integration systems
Large-Scale Data Integration Systems

Web

Domain

Web Forms

& Reports

  • CNET’s Top Combinations
  • CNET’s Search Desktops
  • PCWorld’s Product Finder

End User

Application

Domain

  • CNET Computer
  • PCWorld Portals

Developer

Application

Application

Integration

Domain

Mediator

Integrated

Schema

Compatible Combinations

of Computers, Routers

and Printers

Integration

Engineer

Source

Domain

  • Dell Computers by CPU
  • Cisco Routers by Rate
  • HP Printers by Speed

Web

Service

Web

Service

Web

Service

Source

Owner

Data

Source

Data

Source

  • Dell Computers
  • Cisco Routers
  • HP Printers

Source

Schema

Source

Schema

large scale data integration systems1
Large-Scale Data Integration Systems

Web

Domain

Web Forms

& Reports

End User

Application

Domain

What queries can the

mediator answer for me?

CLIDE

Developer

Application

Application

Integration

Domain

Mediator

Integrated

Schema

Integration

Engineer

Source

Domain

Web

Service

Web

Service

Web

Service

Source

Owner

Data

Source

Data

Source

Source

Schema

Source

Schema

running example
Running Example

Parameterized Views

Schema

Computers(cid, cpu, ram, price)

NetCards(cid, rate, standard, interface)

Views

V1ComByCpu(cpu)  (Computer)*

SELECT DISTINCT Com1.*

FROM Computers Com1

WHERE Com1.cpu=cpu

V2 ComNetByCpuRate(cpu, rate) 

(Computer, NetCard)*

SELECT DISTINCT Com1.*, Net1.*

FROM Computers Com1, Network Net1

WHERE Com1.cid=Net1.cid

AND Com1.cpu=cpu

AND Net1.rate=rate

Dell

Cisco

Schema

Routers(rate, standard, price, type)

Views

V3 RouWired()  (Router)*

SELECT DISTINCT Rou1.*

FROM Routers Rou1

WHERE Rou1.type='Wired'

V4 RouWireless()  (Router)*

SELECT DISTINCT Rou1.*

FROM Routers Rou1

WHERE Rou1.type='Wireless'

Wired Routers

Computers

for a givencpu

Wireless Routers

Computers & NetCards

for a givencpu & rate

Conjunctive Queries CQ

  • Equality & Comparison Conditions
  • Parameters
running example1
Running Example

Integrated Schema

Developer

Application

Mediator

Integrated

Schema

  • Integrated schema puts togetherthe Dell and Cisco schemas

Attribute Associations

  • (Computers.cid, NetCards.cid)
  • (NetCards.rate, Routers.rate)
  • (NetCards.standard, Routers.standard)

V1

V2

V3

V4

Dell

Cisco

sophisticated mediators make feasible queries hard to predict
Sophisticated Mediators MakeFeasible Queries Hard to Predict

Feasible Queries FQ

  • Equivalent CQ query rewritings using the views
  • Might involve more than one views
  • Order might matter

Query:

Get all ‘P4’ Computers, together with their NetCards

and their compatible ‘Wireless’ Routers

Query:

Get all Computers

Feasible

Infeasible

E

Mediator

B

D

Mediator

A

C

RouWireless()

ComNetByCpuRate(‘P4’, ‘10’)

V1

ComNetByCpuRate(‘P4’, ‘54’)

V4

V2

problem
Problem
  • Large number of sources
  • Large number of views (web-services)
  • Mediator capabilities

Developer formulates an application query

  • Is an application query feasible?
  • If not, how do I know which ones are feasible?

Previous options:

    • The developer had to browse the view definitions and somehow formulate a feasible query
    • Or formulate queries until a feasible one is found(trial-and-error)

No system-provided guidance

the clide solution
The CLIDE Solution

CLIDE

  • A query formulation interface, which interactivelyguides the developer toward feasible queries by employing a coloring scheme

Developer

Application

Mediator

Integrated

Schema

V1

V2

V3

V4

Dell

Cisco

qbe like interfaces
QBE-Like Interfaces

Microsoft SQL-Server

clide interface
CLIDE Interface

Feasibility Flag

Selection Boxes

  • Table, selection, projection and join actions
  • Feasibility Flag
  • Color-based suggestions
example interaction
Example Interaction

Snapshot 1

Yellow required action

  • All feasible queries require this action

White optional action

  • Feasible queries can be formulatedw/ or w/o these actions
example interaction1
Example Interaction

Snapshot 2

Blue required choice of action

    • At least one of them is required to be taken in order to reach a feasible query

Join Lines:

  • Only yellow and blue are displayed
  • Must appear in Attribute Associations
clide properties
CLIDE Properties
  • Completeness of Suggestions
    • Every feasible query can be formulated by performing yellow and blue actions at every step
  • Minimality of Suggestions
    • At every step, only a minimal number of actions is suggested, i.e., the ones that are needed to preserve completeness
  • Rapid Convergence By Following Suggestions
    • The shortest sequence of actions from a query to any feasible query consists of suggested actions
example interaction2
Example Interaction

Snapshot 3

  • * any other constant
  • Red prohibited action
    • Does not appear in any feasible query
    • Lead to “Dead End” state
example interaction3
Example Interaction

Snapshot 4

F

Mediator

A

D

RouWireless()

ComNetByCpuRate(‘P4’, rate)

B

E

V4

V2

clide properties1
CLIDE Properties
  • Completeness of Suggestions
    • Every feasible query can be formulated by performing yellow and blue actions at every step
  • Minimality of Suggestions
    • At every step, only a minimal number of actions is suggested, i.e., the ones that are needed to preserve completeness
  • Rapid Convergence By Following Suggestions
    • The shortest sequence of actions from a query to any feasible query consists of suggested actions
interaction graph
Interaction Graph

Selection

Action

Table

Action

Join Action

Com1

Com1.ram

Com1.price

Com1.cpu=‘P4’

Net1

Com1.cid=Net1.cid

Rou1

  • Nodes are queries
    • One for each qCQ
  • Edges are actions
    • Table, selection, projection and join actions
  • Green nodes are feasible queries
  • Infinitely big structure
    • All CQ queries
    • All possible combinations of actions formulating them
interaction graph colors
Interaction Graph: Colors
  • Yellow action 
    • Every path from current node n to a feasible node contains 
  • Blue action 
    • At least one path from current node n to a feasible node nF contains 
    • There is not path from n to nF that contains a feasible node
  • Red action 
    • No path to a feasible node contains 

Current

Node

Com1.cid

Com1.cpu

Current

Node

Com1.cid=*

Com1.cpu=*

Com1.cpu=*

Net1

Com1.cid=Net1.cid

Com1.cid=Net1.cid

Net1.rate=’54Mbps’

Com1.ram=*

Com1.cid=Net1.cid

Net1.rate=’54Mbps’

Net1.rate=’54Mbps’

Com1.price=*

Com1.cpu=*

Rou1

Rou1

Com1.cid=Net1.cid

Net1.rate=Rou1.rate

Net1

Rou1

Com2

Com2

Com2

Com2.cid=Net1.cid

Com2.cpu=‘P4’

Net1.rate=‘54Mbps’

clide properties2
CLIDE Properties
  • Rapid Convergence By Following Suggestions
    • The shortest path from the current node to any feasible node consists of suggested actions
clide architecture
CLIDE Architecture

Actions

  • Back-End invoked every time the user performs an action
    • i.e., the user arrives at a new node in the interactions graph

Front-End

User

Current Query

Colored Actions + Feasibility Flag

Back-End

Color Algorithm

Seed Queries SQ

Parameters Algorithm

Closest Feasible Queries FQC

Closest Feasible Queries Algorithm

Aliases Collapse Rule

Minimal Feasible

Extension Queries

Maximally-Contained Rewriter

Schemas

Views

Attribute

Associations

color determined by a finite set of feasible queries
Color DeterminedBy a Finite Set of Feasible Queries

Challenge: Infinitely Many Feasible Queries

?

Radius

n

Closest Feasible

Queries FQC

Solution:

  • Finite Set of Feasible Queries
  • Closest Feasible Queries FQC
  • FQC is sufficient to color actions

Challenge: How far can the Closest Feasible Queries FQC be?

closest feasible queries fq c algorithm
Closest Feasible Queries FQC Algorithm

Solution: Based on Maximally Contained Queries FQMC

Maximally Contained

Queries FQMC

pL Radius

n

Closest Feasible

Queries FQC

  • Compute maximally contained queries FQMC
  • Radius pL is the longest path to a maximally contained query
  • Theorem: All FQC queries are reachable via a path of length p  pL

Challenge: The set of nodes within pL can be too many to explore

closest feasible queries fq c algorithm1
Closest Feasible Queries FQC Algorithm

Solution: Find the Closest Feasible Queries Directly

Maximally Contained

Queries FQMC

n

Closest Feasible

Queries FQC

More feasible nodes

  • Theorem: All queries in FQMC are in FQC
  • But not all queries in FQC are in FQMC
  • Collapse Aliases to compute FQC \ FQMC
remaining issues of the algorithm
Remaining Issues of the Algorithm
  • Color remaining actions white or red
  • Handling projections
  • Dealing with parameterized queries
clide implementation issues
CLIDE Implementation Issues
  • Maximally contained queries need to be minimized(affect CLIDE’s rapid convergence and minimality)

 Implemented the minimization module from scratch

 MiniCon computes mappings between the query and the views

 Exposed these mappings to guide minimization

  • Optimizations to bring the response time below 3 sec

 Amortize the computation across multiple interactions steps by observing that query is changed minimally at every step

 Instead of calling MiniCon from scratch at every step, we incrementally compute the closest feasible queries

thank you
Thank you

Play with our demo:

http://www.clide.info

maximally contained queries fq mc
Maximally Contained Queries FQMC

Not Maximally Contained

Query: Q3

Get all Computers

with a given cpu & ram

Maximally Contained Query

Query: Q4

Get all Computers

with a given ram

Maximally Contained Query

Query: Q2

Get all Computerswith a given cpu

Query: Q1

Get all Computers

  • Assuming fixed SELECT clause (projection list)
  • Covered extensively in literature
    • MiniCon, Bucket, InverseRules Algorithms
  • FQMC is finite
closest feasible queries fq c algorithm2
Closest Feasible Queries FQC Algorithm

Solution: Collapse Aliases

Maximally

Contained

Feasible

Queries FQMC

Closest

Feasible

Queries FQC

n

  • Collapse Aliases to compute FQC \ FQMC
  • Check satisfiability
color algorithm
Color Algorithm

Yellow and Blue

  • An action  is colored based on which closest feasible queries it appear in
  • Yellow, if  appears in all queries in FQC
  • Blue, if  appears in at least one (but not all) query in FQC

White and Red

  • Attach Maximum Projection Lists to Closest Feasible Queries
    • Projections that can be added to a feasible query, without compromising feasibility
  • Projection  is white if in the maximum projection list
  • Color selections based on projections
clide implementation optimizations
CLIDE Implementation & Optimizations

Front-End

Maximally-Contained Rewriter

Current Query

Colored Actions + Feasibility Flag

Minimal Feasible Extension Queries FQME

+ Maximum Projection Lists

Back-End

Redundant Actions Removal

Color Algorithm

Maximally-Contained Feasible Extension Queries

+ Maximum Projection Lists

Seed Queries SQ

Redundant Queries Removal

Parameters Algorithm

Feasible Extension Queries

+ Maximum Projection Lists

Closest Feasible Queries FQC

Closest Feasible Queries Algorithm

Views Expansion

Aliases Collapse Rule

Maximally-Contained Feasible Queries over Views

+ Containment Mappings

Minimal Feasible

Extension Queries

Maximally-Contained Rewriter

MiniCon

Containment Mappings Logging

Schemas

Views

Attribute

Associations

  • Views expansion introduce redundancy
    • Affects CLIDE’s rapid convergence and minimality
  • Efficient containment test crucial to redundancy removal
clide performance
CLIDE Performance
  • Views
  • Schema

C1

C1

B1

B1

Ci

Ci1

B1

C1

CiM

CL

CL

A

A

Bi

Bi1

B2

C1

BiM

BK

BK

Chains of Stars

  • Queries

A-span = 7

B-span = 3

Selections = 4,6,8,10

A

ad